Applied Math Derivations
Thaddeus H. Black
This book is free software. You can redistribute and/or modify it under the
terms of the GNU General Public License [22], version 2.
Contents
Preface xvii
1 Introduction 1
1.1 Applied mathematics 1
1.2 Rigor 2
1.2.1 Axiom and definition 2
1.2.2 Mathematical extension 4
1.3 Complex numbers and complex variables 5
1.4 On the text 5
3 Trigonometry 47
3.1 Definitions 47
3.2 Simple properties 49
3.3 Scalars, vectors, and vector notation 49
3.4 Rotation 53
3.5 Trigonometric sums and differences 55
3.5.1 Variations on the sums and differences 56
3.5.2 Trigonometric functions of double and half angles 57
3.6 Trigonometrics of the hour angles 57
3.7 The laws of sines and cosines 61
3.8 Summary of properties 62
3.9 Cylindrical and spherical coordinates 64
3.10 The complex triangle inequalities 67
3.11 De Moivre’s theorem 67
4 The derivative 69
4.1 Infinitesimals and limits 69
4.1.1 The infinitesimal 70
4.1.2 Limits 71
4.2 Combinatorics 72
Appendices 511
List of Tables
10.1 The method to extract the three roots of the general cubic 227
10.2 The method to extract the four roots of the general quartic 234

List of Figures
Preface
I suppose that what is true for me is true for many of them also: we begin by
organizing notes for our own use, then observe that the same notes might
prove useful to others, and then undertake to revise the notes and to bring
them into a form which actually is useful to others. Whether this book
succeeds in the last point is for the reader to judge.
then surely also to others who came before; and even where a proof is new
the idea proven probably is not.
Among the bibliography’s entries stands a reference [10] to my doctoral
adviser G.S. Brown, though the book’s narrative seldom explicitly invokes
the reference. Prof. Brown had nothing directly to do with the book’s devel-
opment, for a substantial bulk of the manuscript, or of the notes underlying
it, had been drafted years before I had ever heard of Prof. Brown or he of
me, and my work under him did not regard the book in any case. However,
the ways in which a good doctoral adviser influences his student are both
complex and deep. Prof. Brown’s style and insight touch the book in many
places and in many ways, usually too implicitly to cite coherently.
Steady encouragement from my wife and children contributes to the book
in ways only an author can properly appreciate.
More and much earlier than to Prof. Brown or to my wife and children,
the book owes a debt to my mother and, separately, to my father, without
either of whom the book would never have come to be. Admittedly, any
author of any book might say as much in a certain respect, but it is no
office of a mathematical book’s preface to burden readers with an author’s
expressions of filial piety. No, it is in entirely another respect that I lay the
matter here. My mother taught me at her own hand most of the mathe-
matics I ever learned as a child, patiently building a foundation that cannot
but be said to undergird the whole book today. More recently, my mother
has edited tracts of the book’s manuscript. My father generously financed
much of my formal education but—more than this—one would have had
to grow up with my brother and me in my father’s home to appreciate the
grand sweep of the man’s curiosity, the general depth of his knowledge, the
profound wisdom of his practicality and the enduring example of his love of
excellence.
May the book deserve such a heritage.
THB
Chapter 1
Introduction
1.2 Rigor
It is impossible to write such a book as this without some discussion of math-
ematical rigor. Applied and professional mathematics differ principally and
essentially in the layer of abstract definitions the latter subimposes beneath
the physical ideas the former seeks to model. Notions of mathematical rigor
fit far more comfortably in the abstract realm of the professional mathe-
matician; they do not always translate so gracefully to the applied realm.
The applied mathematical reader and practitioner needs to be aware of this
difference.
(Figure: two triangles of base b and height h; the first splits at its apex into right triangles of bases b1 and b2, while the second overhangs its base, the labels b1, b2 and −b2 marking the right triangles' bases.)
A_1 = b_1 h/2,
A_2 = b_2 h/2
(each right triangle is exactly half a rectangle). Hence the main triangle’s
area is
A = A_1 + A_2 = (b_1 + b_2)h/2 = bh/2.
Very well. What about the triangle on the right? Its b1 is not shown on the
figure, and what is that −b2 , anyway? Answer: the triangle is composed of
the difference of two right triangles, with b1 the base of the larger, overall
one: b1 = b + (−b2 ). The b2 is negative because the sense of the small right
triangle’s area in the proof is negative: the small area is subtracted from
the large rather than added. By extension on this basis, the main triangle’s
area is again seen to be A = bh/2. The proof is exactly the same. In fact,
once the central idea of adding two right triangles is grasped, the extension
is really rather obvious—too obvious to be allowed to burden such a book
as this.
Excepting the uncommon cases where extension reveals something in-
teresting or new, this book generally leaves the mere extension of proofs—
including the validation of edge cases and over-the-edge cases—as an exercise
to the interested reader.
Licensed to the public under the GNU General Public Licence [22], ver-
sion 2, this book meets the Debian Free Software Guidelines [16].
If you cite an equation, section, chapter, figure or other item from this
book, it is recommended that you include in your citation the book’s precise
draft date as given on the title page. The reason is that equation numbers,
chapter numbers and the like are numbered automatically by the LaTeX
typesetting software: such numbers can change arbitrarily from draft to
draft. If an exemplary citation helps, see [8] in the bibliography.
Part I
Chapter 2
Classical algebra and geometry
A × B × C = A × (B × C).
The sense of it is that the thing on the left (A × ) operates on the thing on
the right (B × C). (In the rare case in which the question arises, you may
want to use parentheses anyway.)
Consider that
(+a)(+b) = +ab,
(+a)(−b) = −ab,
(−a)(+b) = −ab,
(−a)(−b) = +ab.
The first three of the four equations are unsurprising, but the last is inter-
esting. Why would a negative count −a of a negative quantity −b come to a positive product +ab? To see why, consider the following progression.
2
The fine C and C++ programming languages are unfortunately stuck with the reverse
order of association, along with division inharmoniously on the same level of syntactic
precedence as multiplication. Standard mathematical notation is more elegant:
abc/uvw = (a)(bc)/[(u)(vw)].
3
The nonassociative cross product B × C is introduced in § 15.2.2.
...
(+3)(−b) = −3b,
(+2)(−b) = −2b,
(+1)(−b) = −1b,
(0)(−b) = 0b,
(−1)(−b) = +1b,
(−2)(−b) = +2b,
(−3)(−b) = +3b,
...
The logic of arithmetic demands that the product of two negative numbers
be positive for this reason.
2.1.3 Inequality
If4
a < b,
then necessarily
a + x < b + x.
However, the relationship between ua and ub depends on the sign of u:
ua < ub if u > 0;
ua > ub if u < 0.
Also, so long as a and b are either both positive or both negative,
1/a > 1/b.
Q ← P.
This means, “in place of P , put Q”; or, “let Q now equal P .” For example,
if a2 + b2 = c2 , then the change of variable 2µ ← a yields the new form
(2µ)2 + b2 = c2 .
Similar to the change of variable notation is the definition notation
Q ≡ P.
2.2 Quadratics
Differences and sums of squares are conveniently factored as
a^2 − b^2 = (a + b)(a − b),
a^2 + b^2 = (a + ib)(a − ib),
a^2 − 2ab + b^2 = (a − b)^2,
a^2 + 2ab + b^2 = (a + b)^2   (2.1)
(where i is the imaginary unit, a number defined such that i2 = −1, in-
troduced in more detail in § 2.12 below). Useful as these four forms are,
5
There appears to exist no broadly established standard mathematical notation for
the change of variable, other than the = equal sign, which regrettably does not fill the
role well. One can indeed use the equal sign, but then what does the change of variable
k = k + 1 mean? It looks like a claim that k and k + 1 are the same, which is impossible.
The notation k ← k + 1 by contrast is unambiguous; it means to increment k by one.
However, the latter notation admittedly has seen only scattered use in the literature.
The C and C++ programming languages use == for equality and = for assignment
(change of variable), as the reader may be aware.
6
One would never write k ≡ k + 1. Even k ← k + 1 can confuse readers inasmuch as
it appears to imply two different values for the same symbol k, but the latter notation is
sometimes used anyway when new symbols are unwanted or because more precise alter-
natives (like kn = kn−1 + 1) seem overwrought. Still, usually it is better to introduce a
new symbol, as in j ← k + 1.
In some books, ≡ is printed as ≜.
however, none of them can directly factor the more general quadratic7 ex-
pression
z 2 − 2βz + γ 2 .
To factor this, we complete the square, writing
z 2 − 2βz + γ 2 = z 2 − 2βz + γ 2 + (β 2 − γ 2 ) − (β 2 − γ 2 )
= z 2 − 2βz + β 2 − (β 2 − γ 2 )
= (z − β)2 − (β 2 − γ 2 ).
The expression evidently has roots8 where
(z − β)2 = (β 2 − γ 2 ),
or in other words where9
z = β ± √(β^2 − γ^2).   (2.2)
This suggests the factoring10
z 2 − 2βz + γ 2 = (z − z1 )(z − z2 ), (2.3)
where z1 and z2 are the two values of z given by (2.2).
It follows that the two solutions of the quadratic equation
z 2 = 2βz − γ 2 (2.4)
are those given by (2.2), which is called the quadratic formula.11 (Cubic and
quartic formulas also exist respectively to extract the roots of polynomials
of third and fourth order, but they are much harder. See Ch. 10 and its
Tables 10.1 and 10.2.)
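For the reader who likes to check such formulas numerically, the following brief Python sketch (purely illustrative; the function name is ours, not the book's) applies (2.2) to a small example.

```python
import cmath

# Roots of z^2 - 2*beta*z + gamma2 = 0 per eqn. (2.2): z = beta +/- sqrt(beta^2 - gamma2).
def quadratic_roots(beta, gamma2):
    d = cmath.sqrt(beta**2 - gamma2)
    return beta + d, beta - d

# The quadratic z^2 = 3z - 2 has beta = 3/2 and gamma2 = 2; expect the roots 2 and 1.
print(quadratic_roots(1.5, 2.0))   # ((2+0j), (1+0j))
```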
7
The adjective quadratic refers to the algebra of expressions in which no term has
greater than second order. Examples of quadratic expressions include x2 , 2x2 − 7x + 3 and
x2 +2xy +y 2 . By contrast, the expressions x3 −1 and 5x2 y are cubic not quadratic because
they contain third-order terms. First-order expressions like x + 1 are linear; zeroth-order
expressions like 3 are constant. Expressions of fourth and fifth order are quartic and
quintic, respectively. (If not already clear from the context, order basically refers to the
number of variables multiplied together in a term. The term 5x2 y = 5[x][x][y] is of third
order, for instance.)
8
A root of f (z) is a value of z for which f (z) = 0. See § 2.11.
9
The symbol ± means “+ or −.” In conjunction with this symbol, the alternate
symbol ∓ occasionally also appears, meaning “− or +”—which is the same thing except
that, where the two symbols appear together, (±z) + (∓z) = 0.
10
It suggests it because the expressions on the left and right sides of (2.3) are both
quadratic (the highest power is z 2 ) and have the same roots. Substituting into the equation
the values of z1 and z2 and simplifying proves the suggestion correct.
11
The form of the quadratic formula which usually appears in print is
x = [−b ± √(b^2 − 4ac)]/(2a),
Σ_{k=3}^{6} k^2 = 3^2 + 4^2 + 5^2 + 6^2 = 0x56.
The notation
Π_{j=a}^{b} f(j)
means to multiply the several f(j) rather than to add them. The symbols Σ and Π come respectively from the Greek letters for S and P, and may be regarded as standing for “Sum” and “Product.” The j or k is a dummy variable, index of summation or loop counter—a variable with no independent existence, used only to facilitate the addition or multiplication of the series.13 (Nothing prevents one from writing Π_k rather than Π_j, incidentally. For a dummy variable, one can use any letter one likes. However, the general habit of writing Σ_k and Π_j proves convenient at least in § 4.5.2 and Ch. 8, so we start now.)
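If it helps to see the summation notation in action, here is a one-line numerical check (an illustration only, not part of the book's argument) of the sum just given.

```python
# Check the example sum 3^2 + 4^2 + 5^2 + 6^2 and display it in hexadecimal.
total = sum(k**2 for k in range(3, 7))
print(total, hex(total))   # 86 0x56
```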
which solves the quadratic ax2 + bx + c = 0. However, this writer finds the form (2.2)
easier to remember. For example, by (2.2) in light of (2.4), the quadratic
z^2 = 3z − 2 has the solutions z = 3/2 ± √[(3/2)^2 − 2] = 1 or 2.
0! = 1. (2.5)
14
One reason among others for this is that factorials rapidly multiply to extremely large
sizes, overflowing computer registers during numerical computation. If you can avoid
unnecessary multiplication by regarding n!/m! as a single unit, this is a win.
15
The extant mathematical literature lacks an established standard on the order of
multiplication implied by the “Π” symbol, but this is the order we shall use in this book.
z^n ≡ Π_{j=1}^{n} z,  n ≥ 0.

z = (z^{1/n})^n = (z^n)^{1/n}
√z ≡ z^{1/2}
(uv)^a = u^a v^a
z^{p/q} = (z^{1/q})^p = (z^p)^{1/q}
z^{ab} = (z^a)^b = (z^b)^a
z^{a+b} = z^a z^b
z^{a−b} = z^a / z^b
z^{−b} = 1 / z^b
j, n, p, q ∈ Z
Success with this arithmetic series leads one to wonder about the geometric series Σ_{k=0}^{∞} z^k. Section 2.6.4 addresses that point.
Here, the symbols j, k, m, n, p, q, r, s ∈ Z are integers, but the exponents a and b are arbitrary real numbers.
For example,19
z 3 = (z)(z)(z),
z 2 = (z)(z),
z 1 = z,
z 0 = 1.
z^{m+n} = z^m z^n,
z^{m−n} = z^m / z^n,   (2.9)
for any integral m and n. For similar reasons,
z mn = (z m )n = (z n )m . (2.10)
(uv)n = un v n . (2.11)
18
The symbol “≡” means “=”, but it further usually indicates that the expression on
its right serves to define the expression on its left. Refer to § 2.1.4.
19
The case 0^0 is interesting because it lacks an obvious interpretation. The specific interpretation depends on the nature and meaning of the two zeros. For interest, if E ≡ 1/ǫ, then
lim_{ǫ→0+} ǫ^ǫ = lim_{E→∞} (1/E)^{1/E} = lim_{E→∞} E^{−1/E} = lim_{E→∞} e^{−(ln E)/E} = e^0 = 1.
2.5.2 Roots
Fractional powers are not something we have defined yet, so for consistency
with (2.10) we let
(u1/n )n = u.
This has u1/n as the number which, raised to the nth power, yields u. Setting
v = u1/n ,
v n = u,
(v n )1/n = u1/n ,
(v n )1/n = v.
When z is real and nonnegative, the last notation is usually implicitly taken
to mean the real, nonnegative square root. In any case, the power and root
operations mutually invert one another.
What about powers expressible neither as n nor as 1/n, such as the 3/2
power? If z and w are numbers related by
wq = z,
then
wpq = z p .
Taking the qth root,
wp = (z p )1/q .
But w = z 1/q , so this is
(z 1/q )p = (z p )1/q ,
which says that it does not matter whether one applies the power or the
root first; the result is the same. Extending (2.10) therefore, we define z p/q
such that
(z 1/q )p = z p/q = (z p )1/q . (2.13)
Since any real number can be approximated arbitrarily closely by a ratio of
integers, (2.13) implies a power definition for all real exponents.
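A quick numerical illustration (ours, not the book's) of (2.13): for a positive z, applying the power and the root in either order yields the same z^{p/q}.

```python
z, p, q = 5.0, 3, 7

# Power-then-root, root-then-power, and the fractional power directly.
print((z**p) ** (1/q), (z ** (1/q)) ** p, z ** (p/q))   # all approximately 1.9933
```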
Equation (2.13) is this subsection’s main result. However, § 2.5.3 will
find it useful if we can also show here that
(z^{1/q})^{1/s} = z^{1/qs} = (z^{1/s})^{1/q}.   (2.14)
To show it, let
w ≡ z^{1/qs}.
Then
w^{qs} = z,
(w^s)^q = z,
w^s = z^{1/q},
w = (z^{1/q})^{1/s}.
By identical reasoning,
w = (z^{1/s})^{1/q}.
But since w ≡ z^{1/qs}, the last two equations imply (2.14), as we have sought.
In other words
(uv)a = ua v a (2.15)
for any real a.
On the other hand, per (2.10),
z pr = (z p )r .
Raising this equation to the 1/qs power and applying (2.10), (2.13) and
(2.14) to reorder the powers, we have that
z^{pr/qs} = (z^{p/q})^{r/s}.
By identical reasoning,
z^{pr/qs} = (z^{r/s})^{p/q}.
Since p/q and r/s can approximate any real numbers with arbitrary preci-
sion, this implies that
(z a )b = z ab = (z b )a (2.16)
for any real a and b.
1 = z −b+b = z −b z b ,
z a−b = z a z −b ,
A(z) = Σ_k a_k z^k,   (2.20)
where the several a_k are arbitrary constants. This section discusses the
multiplication and division of power series.
20
Another name for the power series is polynomial. The word “polynomial” usually
connotes a power series with a finite number of terms, but the two names in fact refer to
essentially the same thing.
Professional mathematicians use the terms more precisely. Equation (2.20), they call a
“power series” only if ak = 0 for all k < 0—in other words, technically, not if it includes
negative powers of z. They call it a “polynomial” only if it is a “power series” with a
finite number of terms. They call (2.20) in general a Laurent series.
The name “Laurent series” is a name we shall meet again in § 8.14. In the meantime
however we admit that the professionals have vaguely daunted us by adding to the name
some pretty sophisticated connotations, to the point that we applied mathematicians (at
least in the author’s country) seem to feel somehow unlicensed actually to use the name.
We tend to call (2.20) a “power series with negative powers,” or just “a power series.”
This book follows the last usage. You however can call (2.20) a Laurent series if you
prefer (and if you pronounce it right: “lor-ON”). That is after all exactly what it is.
Nevertheless if you do use the name “Laurent series,” be prepared for people subjectively—
for no particular reason—to expect you to establish complex radii of convergence, to
sketch some annulus in the Argand plane, and/or to engage in other maybe unnecessary
formalities. If that is not what you seek, then you may find it better just to call the thing
by the less lofty name of “power series”—or better, if it has a finite number of terms, by
the even humbler name of “polynomial.”
Semantics. All these names mean about the same thing, but one is expected most
(2z^2 − 3z + 3)/(z − 2) = (2z^2 − 4z)/(z − 2) + (z + 3)/(z − 2) = 2z + (z + 3)/(z − 2)
= 2z + (z − 2)/(z − 2) + 5/(z − 2) = 2z + 1 + 5/(z − 2).
The strategy is to take the dividend21 B(z) piece by piece, purposely choos-
ing pieces easily divided by A(z).
carefully always to give the right name in the right place. What a bother! (Someone
once told the writer that the Japanese language can give different names to the same
object, depending on whether the speaker is male or female. The power-series terminology
seems to share a spirit of that kin.) If you seek just one word for the thing, the writer
recommends that you call it a “power series” and then not worry too much about it
until someone objects. When someone does object, you can snow him with the big word
“Laurent series,” instead.
The experienced scientist or engineer may notice that the above vocabulary omits the
name “Taylor series.” The vocabulary omits the name because that name fortunately
remains unconfused in usage—it means quite specifically a power series without negative
powers and tends to connote a representation of some particular function of interest—as
we shall see in Ch. 8.
21
If Q(z) is a quotient and R(z) a remainder, then B(z) is a dividend (or numerator )
and A(z) a divisor (or denominator ). Such are the Latin-derived names of the parts of a
long division.
If you feel that you understand the example, then that is really all there
is to it, and you can skip the rest of the subsection if you like. One sometimes
wants to express the long division of power series more formally, however.
That is what the rest of the subsection is about. (Be advised however that
the cleverer technique of § 2.6.3, though less direct, is often easier and faster.)
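To make the iteration concrete, here is a small Python sketch of the descending-power long division (the data structure and function name are merely illustrative assumptions; the book itself states the procedure only in mathematical notation).

```python
# Divide B(z) = 2z^2 - 3z + 3 by A(z) = z - 2 in descending powers.
# Polynomials are dicts mapping the power k to the coefficient of z^k.
def poly_divide(B, A, steps):
    K = max(A)                      # order of the divisor's leading term
    Q, R = {}, dict(B)              # quotient so far; remainder, initially B(z)
    for _ in range(steps):
        n = max(R)                  # leading power remaining in R(z)
        q = R[n] / A[K]             # next quotient coefficient, r_nn / a_K
        Q[n - K] = q
        for k, a in A.items():      # R <- R - q z^(n-K) A(z)
            R[n - K + k] = R.get(n - K + k, 0) - q * a
        R = {k: c for k, c in R.items() if abs(c) > 1e-12}
        if not R:
            break
    return Q, R

Q, R = poly_divide({2: 2, 1: -3, 0: 3}, {1: 1, 0: -2}, steps=2)
print(Q, R)   # {1: 2.0, 0: 1.0} {0: 5.0}, i.e. quotient 2z + 1, remainder 5
```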
Formally, we prepare the long division B(z)/A(z) by writing
obtain
B(z) = A(z) [Q_n(z) + (r_{nn}/a_K) z^{n−K}] + [R_n(z) − (r_{nn}/a_K) z^{n−K} A(z)].
Matching this equation against the desired iterate
B(z) = A(z)Qn−1 (z) + Rn−1 (z)
and observing from the definition of Qn (z) that Qn−1 (z) = Qn (z) +
qn−K z n−K , we find that
q_{n−K} = r_{nn} / a_K,
R_{n−1}(z) = R_n(z) − q_{n−K} z^{n−K} A(z),   (2.25)
where no term remains in Rn−1 (z) higher than a z n−1 term.
To begin the actual long division, we initialize
RN (z) = B(z),
for which (2.23) is trivially true. Then we iterate per (2.25) as many times
as desired. If an infinite number of times, then so long as Rn (z) tends to
vanish as n → −∞, it follows from (2.23) that
B(z)/A(z) = Q_{−∞}(z).   (2.26)
Iterating only a finite number of times leaves a remainder,
B(z)/A(z) = Q_n(z) + R_n(z)/A(z),   (2.27)
except that it may happen that Rn (z) = 0 for sufficiently small n.
Table 2.3 summarizes the long-division procedure.22 In its qn−K equa-
tion, the table includes also the result of § 2.6.3 below.
It should be observed in light of Table 2.3 that if23
A(z) = Σ_{k=Ko}^{K} a_k z^k,
B(z) = Σ_{k=No}^{N} b_k z^k,
22
[59, § 3.2]
23
The notations Ko , ak and z k are usually pronounced, respectively, as “K naught,” “a
sub k” and “z to the k” (or, more fully, “z to the kth power”)—at least in the author’s
country.
then
R_n(z) = Σ_{k=n−(K−Ko)+1}^{n} r_{nk} z^k for all n < No + (K − Ko).   (2.28)
That is, the remainder has order one less than the divisor has. The reason
for this, of course, is that we have strategically planned the long-division
iteration precisely to cause the leading term of the divisor to cancel the
leading term of the remainder at each step.24
The long-division procedure of Table 2.3 extends the quotient Qn (z)
through successively smaller powers of z. Often, however, one prefers to
extend the quotient through successively larger powers of z, where a z K
term is A(z)’s term of least order. In this case, the long division goes by the
complementary rules of Table 2.4.
But for this to hold for all z, the coefficients must match for each n:
Σ_{k=N−K}^{n−K} a_{n−k} q_k = b_n,  n ≥ N.
Transferring all terms but aK qn−K to the equation’s right side and dividing
by aK , we have that
q_{n−K} = (1/a_K) [ b_n − Σ_{k=N−K}^{n−K−1} a_{n−k} q_k ],  n ≥ N.   (2.30)
In this case,
Q_∞(z) = B(z)/A(z) = Q_n(z) + R_n(z)/A(z)   (2.31)
and convergence is not an issue. Solving (2.31) for Rn (z),
Rn (z) = B(z) − A(z)Qn (z). (2.32)
as was to be demonstrated.
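The coefficient-matching rule (2.30) likewise lends itself to a short numerical sketch (illustrative only), here dividing B(z) = 1 by A(z) = 1 − z to recover the geometric series 1 + z + z^2 + ···.

```python
# Ascending-power division per (2.30): q_(n-K) = (b_n - sum a_(n-k) q_k) / a_K.
a = {0: 1.0, 1: -1.0}           # A(z) = 1 - z
b = {0: 1.0}                    # B(z) = 1
K, N = min(a), min(b)           # least orders of A(z) and B(z)
q = {}
for n in range(N, N + 8):       # extend the quotient through eight terms
    acc = b.get(n, 0.0)
    for k in range(N - K, n - K):
        acc -= a.get(n - k, 0.0) * q.get(k, 0.0)
    q[n - K] = acc / a[K]
print(q)                        # {0: 1.0, 1: 1.0, ..., 7: 1.0}
```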
27
The notation |z| represents the magnitude of z. For example, |5| = 5, but also
|−5| = 5.
To compute the series
S_1 ≡ Σ_{k=0}^{∞} k z^k
(which arises in, among others, Planck’s quantum blackbody radiation calculation28), we can compute as follows. We multiply the unknown S_1 by z, producing
z S_1 = Σ_{k=0}^{∞} k z^{k+1} = Σ_{k=1}^{∞} (k − 1) z^k.
t = ∆r/vsound, where ∆r is the distance from source to listener and vsound is the speed of
sound. Here, vsound is an indeterminate constant (given particular atmo-
spheric conditions, it doesn’t vary), ∆r is an independent variable, and t
is a dependent variable. The model gives t as a function of ∆r; so, if you
tell the model how far the listener sits from the sound source, the model
returns the time needed for the sound to propagate from one to the other.
Note that the abstract validity of the model does not necessarily depend on
whether we actually know the right figure for vsound (if I tell you that sound
goes at 500 m/s, but later you find out that the real figure is 331 m/s, it
probably doesn’t ruin the theoretical part of your analysis; you just have
to recalculate numerically). Knowing the figure is not the point. The point
is that conceptually there preëxists some right figure for the indeterminate
constant; that sound goes at some constant speed—whatever it is—and that
we can calculate the delay in terms of this.
Although there exists a definite philosophical distinction between the
three kinds of quantity, nevertheless it cannot be denied that which par-
ticular quantity is an indeterminate constant, an independent variable or
a dependent variable often depends upon one’s immediate point of view.
The same model in the example would remain valid if atmospheric condi-
tions were changing (vsound would then be an independent variable) or if the
model were used in designing a musical concert hall29 to suffer a maximum
acceptable sound time lag from the stage to the hall’s back row (t would
then be an independent variable; ∆r, dependent). Occasionally we go so far
as deliberately to change our point of view in mid-analysis, now regarding
as an independent variable what a moment ago we had regarded as an inde-
terminate constant, for instance (a typical case of this arises in the solution
of differential equations by the method of unknown coefficients, § 9.4). Such
a shift of viewpoint is fine, so long as we remember that there is a difference
29
Math books are funny about examples like this. Such examples remind one of the kind
of calculation one encounters in a childhood arithmetic textbook, as of the quantity of air
contained in an astronaut’s round helmet. One could calculate the quantity of water in
a kitchen mixing bowl just as well, but astronauts’ helmets are so much more interesting
than bowls, you see.
The chance that the typical reader will ever specify the dimensions of a real musical
concert hall is of course vanishingly small. However, it is the idea of the example that mat-
ters here, because the chance that the typical reader will ever specify something technical
is quite large. Although sophisticated models with many factors and terms do indeed play
a major role in engineering, the great majority of practical engineering calculations—for
quick, day-to-day decisions where small sums of money and negligible risk to life are at
stake—are done with models hardly more sophisticated than the one shown here. So,
maybe the concert-hall example is not so unreasonable, after all.
between the three kinds of quantity and we keep track of which quantity is
which kind to us at the moment.
The main reason it matters which symbol represents which of the three
kinds of quantity is that in calculus, one analyzes how change in indepen-
dent variables affects dependent variables as indeterminate constants remain
fixed.
(Section 2.3 has introduced the dummy variable, which the present sec-
tion’s threefold taxonomy seems to exclude. However, in fact, most dummy
variables are just independent variables—a few are dependent variables—
whose scope is restricted to a particular expression. Such a dummy variable
does not seem very “independent,” of course; but its dependence is on the
operator controlling the expression, not on some other variable within the
expression. Within the expression, the dummy variable fills the role of an
independent variable; without, it fills no role because logically it does not
exist there. Refer to §§ 2.3 and 7.3.)
Consider the exponential operation
a^z,
where the variable z is the exponent and the constant a is the base.
The logarithm loga w answers the question, “What power must I raise a to,
to get w?”
Raising a to the power of the last equation, we have that
a^{log_a(a^z)} = a^z.
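The inverse relationship between the exponential and the logarithm is easy to confirm numerically; the following brief sketch (ours, for illustration) does so.

```python
import math

# log_a(a**z) recovers z, since the logarithm inverts the exponential.
a, z = 3.0, 2.7
print(math.isclose(math.log(a**z, a), z))   # True
```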
Figure 2.2: The sum of a triangle’s inner angles: turning at the corner.
A triangle’s three interior angles32 sum to 2π/2. One way to see the truth
of this fact is to imagine a small car rolling along one of the triangle’s sides.
Reaching the corner, the car turns to travel along the next side, and so on
round all three corners to complete a circuit, returning to the start. Since
the car again faces the original direction, we reason that it has turned a
total of 2π: a full revolution. But the angle φ the car turns at a corner
and the triangle’s inner angle ψ there together form the straight angle 2π/2
(the sharper the inner angle, the more the car turns: see Fig. 2.2). In
31
Section 13.9 proves the triangle inequalities more generally, though regrettably with-
out recourse to this subsection’s properly picturesque geometrical argument.
32
Many or most readers will already know the notation 2π and its meaning as the angle
of full revolution. For those who do not, the notation is introduced more properly in
§§ 3.1, 3.6 and 8.11 below. Briefly, however, the symbol 2π represents a complete turn,
a full circle, a spin to face the same direction as before. Hence 2π/4 represents a square
turn or right angle.
You may be used to the notation 360◦ in place of 2π, but for the reasons explained in
Appendix A and in footnote 16 of Ch. 3, this book tends to avoid the former notation.
mathematical notation,
φ1 + φ2 + φ3 = 2π,
φ_k + ψ_k = 2π/2,  k = 1, 2, 3,
where ψk and φk are respectively the triangle’s inner angles and the an-
gles through which the car turns. Solving the latter equations for φk and
substituting into the former yields
ψ_1 + ψ_2 + ψ_3 = 2π/2,   (2.45)
which was to be demonstrated.
Extending the same technique to the case of an n-sided polygon, we have
that
Σ_{k=1}^{n} φ_k = 2π,
φ_k + ψ_k = 2π/2.
Solving the latter equations for φk and substituting into the former yields
Σ_{k=1}^{n} (2π/2 − ψ_k) = 2π,
or in other words
Σ_{k=1}^{n} ψ_k = (n − 2)(2π/2).   (2.46)
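Equation (2.46) is easily spot-checked numerically; the sketch below (illustrative only) does so for a regular polygon, whose interior angles are all equal.

```python
import math

# For a regular n-sided polygon each interior angle is 2*pi/2 - 2*pi/n,
# so the n angles should total (n - 2)*(2*pi/2), per (2.46).
n = 7
total = n * (math.pi - 2*math.pi/n)
print(math.isclose(total, (n - 2) * math.pi))   # True
```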
The Pythagorean theorem holds that
a^2 + b^2 = c^2,   (2.47)
where a, b and c are the lengths of the legs and diagonal of a right triangle,
as in Fig. 2.3. Many proofs of the theorem are known.
(Figure 2.3 depicts a right triangle with legs a and b and diagonal c; Figure 2.4 depicts a square of side a + b with a tilted square of side c inscribed.)
One such proof posits a square of side length a + b with a tilted square of
side length c inscribed as in Fig. 2.4. The area of each of the four triangles
in the figure is evidently ab/2. The area of the tilted inner square is c2 .
The area of the large outer square is (a + b)2 . But the large outer square is
comprised of the tilted inner square plus the four triangles, hence the area
of the large outer square equals the area of the tilted inner square plus the
areas of the four triangles. In mathematical symbols, this is
(a + b)^2 = c^2 + 4(ab/2),
which, when expanded and the common term 2ab canceled from both sides, is (2.47).
The theorem extends to three dimensions as
a^2 + b^2 + h^2 = r^2,   (2.48)
where h is an altitude perpendicular to both a and b and r is the corresponding three-dimensional diagonal.
2.11 Functions
This book is not the place for a gentle introduction to the concept of the
function. Briefly, however, a function is a mapping from one number (or
vector of several numbers) to another. For example, f (x) = x2 − 1 is a
function which maps 1 to 0 and −3 to 8, among others.
One often speaks of domains and ranges when discussing functions. The
domain of a function is the set of numbers one can put into it. The range
of a function is the corresponding set of numbers one can get out of it. In
the example, if the domain is restricted to real x such that |x| ≤ 3, then the
corresponding range is −1 ≤ f (x) ≤ 8.
33
This elegant proof is far simpler than the one famously given by the ancient geometer
Euclid, yet more appealing than alternate proofs often found in print. Whether Euclid was
acquainted with the simple proof given here this writer does not know, but it is possible
[67, “Pythagorean theorem,” 02:32, 31 March 2006] that Euclid chose his proof because
it comported better with the restricted set of geometrical elements he permitted himself
to work with. Be that as it may, the present writer encountered the proof this section
gives somewhere years ago and has never seen it in print since, so can claim no credit for
originating it. Unfortunately the citation is now long lost. A current source for the proof
is [67] as cited earlier in this footnote.
Other terms which arise when discussing functions are root (or zero),
singularity and pole. A root (or zero) of a function is a domain point at
which the function evaluates to zero (the example has roots at x = ±1). A
singularity of a function is a domain point at which the function’s output
diverges; that is, where the function’s output is infinite.34 A pole is a singularity that behaves locally like 1/x (rather than, for example, like 1/√x).
A singularity that behaves as 1/xN is a multiple pole, which (§ 9.6.2) can be
thought of as N poles. The example’s function f (x) has no singularities for
finite x; however, the function h(x) = 1/(x2 − 1) has poles at x = ±1.
(Besides the root, the singularity and the pole, there is also the trouble-
some branch point, an infamous example of which is z = 0 in the function
g[z] = √z. Branch points are important, but the book must lay a more
extensive foundation before introducing them properly in § 8.5.35 )
lim_{x→xo} |f(x)| = ∞.
The applied mathematician tends to avoid such formalism where there seems no immediate
use for it.
35
There is further the essential singularity, an example of which is z = 0 in p(z) =
exp(1/z), but the best way to handle such unreasonable singularities is almost always to
change a variable, as w ← 1/z, or otherwise to frame the problem such that one need
not approach the singularity. This book will have little to say of such singularities. (Such
singularities are however sometimes the thing one implicitly uses an asymptotic series to
route around, particularly in work with special functions—but special functions are an
advanced topic the book won’t even begin to treat until [chapters not yet written]; and,
even then, the book will not speak of the essential singularity explicitly.)
36
The English word imaginary is evocative, but perhaps not of quite the right concept
in this usage. Imaginary numbers are not to mathematics as, say, imaginary elfs are to the
physical world. In the physical world, imaginary elfs are (presumably) not substantial ob-
jects. However, in the mathematical realm, imaginary numbers are substantial. The word
Figure 2.5: The complex (or Argand) plane, and a complex number z
therein.
Imaginary numbers are given their own number line, plotted at right
angles to the familiar real number line as in Fig. 2.5. The sum of a real
number x and an imaginary number iy is the complex number
z = x + iy.
The conjugate z ∗ of this complex number is defined to be37
z ∗ = x − iy.
imaginary in the mathematical sense is thus more of a technical term than a descriptive
adjective.
The number i is just a concept, of course, but then so is the number 1 (though you and
I have often met one of something—one apple, one chair, one summer afternoon, etc.—
neither of us has ever met just 1). The reason imaginary numbers are called “imaginary”
probably has to do with the fact that they emerge from mathematical operations only,
never directly from counting things. Notice, however, that the number 1/2 never emerges
directly from counting things, either. If for some reason the iyear were offered as a unit
of time, then the period separating your fourteenth and twenty-first birthdays could have
been measured as −i7 iyears. Madness? No, let us not call it that; let us call it a useful
formalism, rather.
The unpersuaded reader is asked to suspend judgment a while. He will soon see the
use.
37
For some inscrutable reason, in the author’s country at least, professional mathe-
maticians seem universally to write z instead of z ∗ , whereas rising engineers take the
mathematicians’ classes and then, having passed them, promptly start writing z ∗ for the
rest of their lives. The writer has his preference between the two notations and this book
The magnitude (or modulus, or absolute value) |z| of the complex number is
defined to be the length ρ in Fig. 2.5, which per the Pythagorean theorem
(§ 2.10) is such that
|z|2 = x2 + y 2 . (2.50)
The phase arg z of the complex number is defined to be the angle φ in the
figure, which in terms of the trigonometric functions of § 3.138 is such that
tan(arg z) = y/x.   (2.51)
Specifically to extract the real and imaginary parts of a complex number,
the notations
ℜ(z) = x,
ℑ(z) = y,   (2.52)
are conventionally recognized (although often the symbols ℜ[·] and ℑ[·] are
written Re[·] and Im[·], particularly when printed by hand).
(−j)2 = −1 = j 2 ,
and thus that all the basic properties of complex numbers in the j system
held just as well as they did in the i system. The units i and j would differ
indeed, but would perfectly mirror one another in every respect.
That is the basic idea. To establish it symbolically needs a page or so of
slightly abstract algebra as follows, the goal of which will be to show that
[f (z)]∗ = f (z ∗ ) for some unspecified function f (z) with specified properties.
To begin with, if
z = x + iy,
then
z ∗ = x − iy
by definition. Proposing that (z k−1 )∗ = (z ∗ )k−1 (which may or may not be
true but for the moment we assume it), we can write
z^{k−1} = s_{k−1} + i t_{k−1},
(z∗)^{k−1} = s_{k−1} − i t_{k−1},
where s_{k−1} and t_{k−1} are symbols introduced to represent the real and imag-
inary parts of z k−1 . Multiplying the former equation by z = x + iy and the
latter by z ∗ = x − iy, we have that
z k = sk + itk ,
(z ∗ )k = sk − itk .
39
[19, § I:22-5]
where ak and bk are real and imaginary parts of the coefficients peculiar to
the function f (·), then
[f (z)]∗ = f ∗ (z ∗ ). (2.61)
In the common case where all bk = 0 and zo = xo is a real number, then
f (·) and f ∗ (·) are the same function, so (2.61) reduces to the desired form
[f (z)]∗ = f (z ∗ ), (2.62)
which says that the effect of conjugating the function’s input is merely to
conjugate its output.
Equation (2.62) expresses a significant, general rule of complex numbers
and complex variables which is better explained in words than in mathemat-
ical symbols. The rule is this: for most equations and systems of equations
used to model physical systems, one can produce an equally valid alter-
nate model simply by simultaneously conjugating all the complex quantities
present.41
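The rule is simple to observe numerically. In the Python sketch below (an illustration of ours, with a polynomial chosen arbitrarily), the coefficients are all real, so conjugating the input merely conjugates the output, as (2.62) promises.

```python
def f(z):
    return 3*z**3 - 2*z + 5       # real coefficients only

z = 1.2 + 0.7j
print(f(z).conjugate())            # approximately (2.492-6.643j)
print(f(z.conjugate()))            # the same value
```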
40
Mathematical induction is an elegant old technique for the construction of mathemat-
ical proofs. Section 8.1 elaborates on the technique and offers a more extensive example.
Beyond the present book, a very good introduction to mathematical induction is found
in [27].
41
[27][57]
42
[31, § 10.8]
43
That is a pretty impressive-sounding statement: “Such power series have broad appli-
cation.” However, air, words and molecules also have “broad application”; merely stating
the fact does not tell us much. In fact the general power series is a sort of one-size-fits-all
mathematical latex glove, which can be stretched to fit around almost any function. The
interesting part is not so much in the general form (2.59) of the series as it is in the specific
choice of ak and bk , which this section does not discuss.
Observe that the Taylor series (which this section also does not discuss; see § 8.3) is a
power series with ak = bk = 0 for k < 0.
44
The careful reader might observe that this statement neglects Gibbs’ phenomenon,
but that curious matter will be dealt with in § 17.6.
45
To illustrate, from the fact that (1 + i2)(2 + i3) + (1 − i) = −3 + i6 the conjugation
property infers immediately that (1 − i2)(2 − i3) + (1 + i) = −3 − i6. Observe however
that no such property holds for the real parts: (−1 + i2)(−2 + i3) + (−1 − i) ≠ 3 + i6.
Chapter 3
Trigonometry
3.1 Definitions
Consider the circle-inscribed right triangle of Fig. 3.1.
In considering the circle, we shall find some terminology useful: the
angle φ in the diagram is measured in radians, where a radian is the an-
gle which, when centered in a unit circle, describes an arc of unit length.1
Measured in radians, an angle φ intercepts an arc of curved length ρφ on
a circle of radius ρ (that is, of distance ρ from the circle’s center to its
perimeter). An angle in radians is a dimensionless number, so one need not
write “φ = 2π/4 radians”; it suffices to write “φ = 2π/4.” In mathematical
theory, we express angles in radians.
The angle of full revolution is given the symbol 2π—which thus is the
circumference of a unit circle.2 A quarter revolution, 2π/4, is then the right
angle, or square angle.
The trigonometric functions sin φ and cos φ (the “sine” and “cosine” of φ)
relate the angle φ to the lengths shown in Fig. 3.1. The tangent function is
then defined as
tan φ ≡ sin φ / cos φ,   (3.1)
1
The word “unit” means “one” in this context. A unit length is a length of 1 (not one
centimeter or one mile, just an abstract 1). A unit circle is a circle of radius 1.
2
Section 8.11 computes the numerical value of 2π.
Figure 3.1: The sine and the cosine (shown on a circle-inscribed right trian-
gle, with the circle centered at the triangle’s point).
which is the “rise” per unit “run,” or slope, of the triangle’s diagonal. In-
verses of the three trigonometric functions can also be defined:
arcsin (sin φ) = φ,
arccos (cos φ) = φ,
arctan (tan φ) = φ.
When the last of these is written in the form
arctan(y/x),
it is normally implied that x and y are to be interpreted as rectangular
coordinates3,4 and that the arctan function is to return φ in the correct
quadrant −π < φ ≤ π (for example, arctan[1/(−1)] = [+3/8][2π], whereas
arctan[(−1)/1] = [−1/8][2π]). This is similarly the usual interpretation
when an equation like
tan φ = y/x
3
Rectangular coordinates are pairs of numbers (x, y) which uniquely specify points in
a plane. Conventionally, the x coordinate indicates distance eastward; the y coordinate,
northward. For instance, the coordinates (3, −4) mean the point three units eastward and
four units southward (that is, −4 units northward) from the origin (0, 0). A third rectan-
gular coordinate can also be added—(x, y, z)—where the z indicates distance upward.
4
Because the “oo” of “coordinates” is not the monophthongal “oo” of “boot” and
“door,” the old publishing convention this book generally follows should style the word
as “coördinates.” The book uses the word however as a technical term. For better or for
worse, every English-language technical publisher the author knows of styles the technical
term as “coordinates.” The author neither has nor desires a mandate to reform technical
publishing practice, so “coordinates” the word shall be.
is written.
By the Pythagorean theorem (§ 2.10), it is seen generally that5
cos^2 φ + sin^2 φ = 1.
Fig. 3.2 plots the sine function. The shape in the plot is called a sinusoid.
tan(−φ) = − tan φ
tan(2π/4 − φ) = +1/ tan φ
tan(2π/2 − φ) = − tan φ
tan(φ ± 2π/4) = −1/ tan φ
tan(φ ± 2π/2) = + tan φ
tan(φ + n2π) = tan φ
sin φ / cos φ = tan φ
cos^2 φ + sin^2 φ = 1
1 + tan^2 φ = 1 / cos^2 φ
1 + 1/tan^2 φ = 1 / sin^2 φ
Figure 3.3: A two-dimensional vector u = x̂x + ŷy, shown with its rectan-
gular components.
Many readers will already find the basic vector concept familiar, but for
those who do not, a brief review: Vectors such as the
u = x̂x + ŷy,
v = x̂x + ŷy + ẑz
of the figures are composed of multiples of the unit basis vectors x̂, ŷ and ẑ,
which themselves are vectors of unit length pointing in the cardinal direc-
tions their respective symbols suggest.7 Any vector a can be factored into
an amplitude a and a unit vector â, as
a = âa,
where the â represents direction only and has unit magnitude by definition,
and where the a represents amplitude only and carries the physical units
if any.8 For example, a = 55 miles per hour, â = northwestward. The
unit vector â itself can be expressed in terms of the unit basis vectors: for example, if x̂ points east and ŷ points north, then â = −x̂(1/√2) + ŷ(1/√2), where per the Pythagorean theorem (−1/√2)^2 + (1/√2)^2 = 1^2.
A single number which is not a vector or a matrix (Ch. 11) is called
a scalar. In the example, a = 55 miles per hour is a scalar. Though the
scalar a in the example happens to be real, scalars can be complex, too—
which might surprise one, since scalars by definition lack direction and the
Argand phase φ of Fig. 2.5 so strongly resembles a direction. However,
phase is not an actual direction in the vector sense (the real number line
7
Printing by hand, one customarily writes a general vector like u as “ ~u ” or just “ u ”,
and a unit vector like x̂ as “ x̂ ”.
8
The word “unit” here is unfortunately overloaded. As an adjective in mathematics,
or in its nounal form “unity,” it refers to the number one (1)—not one mile per hour,
one kilogram, one Japanese yen or anything like that; just an abstract 1. The word
“unit” itself as a noun however usually signifies a physical or financial reference quantity
of measure, like a mile per hour, a kilogram or even a Japanese yen. There is no inherent
mathematical unity to 1 mile per hour (otherwise known as 0.447 meters per second,
among other names). By contrast, a “unitless 1”—a 1 with no physical unit attached—
does represent mathematical unity.
Consider the ratio r = h1 /ho of your height h1 to my height ho . Maybe you are taller
than I am and so r = 1.05 (not 1.05 cm or 1.05 feet, just 1.05). Now consider the ratio
h1 /h1 of your height to your own height. That ratio is of course unity, exactly 1.
There is nothing ephemeral in the concept of mathematical unity, nor in the concept of
unitless quantities in general. The concept is quite straightforward and entirely practical.
That r > 1 means neither more nor less than that you are taller than I am. In applications,
one often puts physical quantities in ratio precisely to strip the physical units from them,
comparing the ratio to unity without regard to physical units.
3.4 Rotation
A fundamental problem in trigonometry arises when a vector
u = x̂x + ŷy (3.4)
must be expressed in terms of alternate unit vectors x̂′ and ŷ′ , where x̂′
and ŷ′ stand at right angles to one another and lie in the plane11 of x̂ and ŷ,
9
Some books print |v| as kvk or even kvk2 to emphasize that it represents the real,
scalar magnitude of a complex vector. The reason the last notation subscripts a numeral 2
is obscure, having to do with the professional mathematician’s generalized definition of a
thing he calls the “norm.” This book just renders it |v|.
10
The writer does not know the etymology for certain, but verbal lore in American
engineering has it that the name “right-handed” comes from experience with a standard
right-handed wood screw or machine screw. If you hold the screwdriver in your right hand
and turn the screw in the natural manner clockwise, turning the screw slot from the x
orientation toward the y, the screw advances away from you in the z direction into the
wood or hole. If somehow you came across a left-handed screw, you’d probably find it
easier to drive that screw with the screwdriver in your left hand.
11
A plane, as the reader on this tier undoubtedly knows, is a flat (but not necessarily
level) surface, infinite in extent unless otherwise specified. Space is three-dimensional. A
(Figure 3.5: the unit vectors x̂′ and ŷ′, rotated from x̂ and ŷ by the angle φ, with a vector u shown.)
but are rotated from the latter by an angle φ as depicted in Fig. 3.5.12 In
terms of the trigonometric functions of § 3.1, evidently
x̂′ = +x̂ cos φ + ŷ sin φ,
ŷ′ = −x̂ sin φ + ŷ cos φ.
If one asks, “what if I rotate not the unit basis vectors but rather the vector u instead?” the answer is
that it amounts to the same thing, except that the sense of the rotation is
reversed:
u′ = x̂(x cos φ − y sin φ) + ŷ(x sin φ + y cos φ). (3.8)
Whether it is the basis or the vector which rotates thus depends on your
point of view.13
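A brief numerical illustration (ours, not the book's) of rotating the vector per (3.8):

```python
import math

# Rotate the vector (x, y) counterclockwise through the angle phi, per (3.8).
def rotate(x, y, phi):
    return (x*math.cos(phi) - y*math.sin(phi),
            x*math.sin(phi) + y*math.cos(phi))

print(rotate(1.0, 0.0, math.pi/2))   # approximately (0.0, 1.0)
```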
Much later in the book, § 15.1 will extend rotation in two dimensions to
reorientation in three dimensions.
sin(α + β) = sin α cos β + cos α sin β,
cos(α + β) = cos α cos β − sin α sin β.   (3.9)
With the change of variable β ← −β and the observations from Table 3.1
that sin(−φ) = − sin φ and cos(−φ) = + cos φ, eqns. (3.9) become
sin(α − β) = sin α cos β − cos α sin β,
cos(α − β) = cos α cos β + sin α sin β.   (3.10)
Equations (3.9) and (3.10) are the basic formulas for trigonometric functions
of sums and differences of angles.
sin α sin β = [cos(α − β) − cos(α + β)]/2,
sin α cos β = [sin(α − β) + sin(α + β)]/2,
cos α cos β = [cos(α − β) + cos(α + β)]/2.   (3.11)
Solving (3.13) for sin2 α and cos2 α yields the half-angle formulas
sin^2 α = (1 − cos 2α)/2,
cos^2 α = (1 + cos 2α)/2.   (3.14)
ANGLE φ [radians]   ANGLE φ [hours]   sin φ            tan φ            cos φ
0                   0                 0                0                1
2π/0x18             1                 (√3−1)/(2√2)     (√3−1)/(√3+1)    (√3+1)/(2√2)
2π/0xC              2                 1/2              1/√3             √3/2
2π/8                3                 1/√2             1                1/√2
2π/6                4                 √3/2             √3               1/2
(5)(2π)/0x18        5                 (√3+1)/(2√2)     (√3+1)/(√3−1)    (√3−1)/(2√2)
2π/4                6                 1                ∞                0
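The tabulated values are easy to confirm numerically; for instance (an illustrative check of ours), the one-hour row:

```python
import math

# One hour of arc is 2*pi/0x18 radians, i.e. one twenty-fourth of a revolution.
phi = 2*math.pi / 0x18
print(math.isclose(math.sin(phi), (math.sqrt(3) - 1) / (2*math.sqrt(2))))   # True
print(math.isclose(math.cos(phi), (math.sqrt(3) + 1) / (2*math.sqrt(2))))   # True
```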
(Figure 3.7: a unit square with its diagonal of length √2, and an equilateral triangle of unit side with a perpendicular of length √3/2 splitting its base into halves of 1/2.)
equilateral triangle17 of Fig. 3.7. Each of the square’s four angles naturally
measures six hours; and since a triangle’s angles always total twelve hours
(§ 2.9.3), by symmetry each of the angles of the equilateral triangle in the
figure measures four. Also by symmetry, the perpendicular splits the trian-
gle’s top angle into equal halves of two hours each and its bottom leg into
equal segments of length 1/2 each; and the diagonal splits the square’s cor-
ner into equal halves of three hours each. The Pythagorean theorem (§ 2.10)
then supplies the various other lengths in the figure, after which we observe
(Figure 3.8: a triangle with sides a, b, c opposite the respective angles α, β, γ, and an altitude h.)
• the tangent is the opposite leg’s length divided by the adjacent leg’s,
and
With this observation and the lengths in the figure, one can calculate the
sine, tangent and cosine of angles of two, three and four hours.
The values for one and five hours are found by applying (3.9) and (3.10)
against the values for two and three hours just calculated. The values for
zero and six hours are, of course, seen by inspection.18
But there is nothing special about β and γ; what is true for them must be
true for α, too.19 Hence,
sin α / a = sin β / b = sin γ / c.
a = x̂a,
b = x̂b cos γ + ŷb sin γ,
then
c2 = |b − a|2
= (b cos γ − a)2 + (b sin γ)2
= a2 + (b2 )(cos2 γ + sin2 γ) − 2ab cos γ.
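The result, c^2 = a^2 + b^2 − 2ab cos γ, is the law of cosines; it can be spot-checked numerically as below (an illustration of ours, with an arbitrary triangle).

```python
import math

# Place the triangle as in the derivation: a along x, b at the angle gamma.
a, b, gamma = 3.0, 4.0, 1.2
c = math.dist((a, 0.0), (b*math.cos(gamma), b*math.sin(gamma)))
print(math.isclose(c**2, a**2 + b**2 - 2*a*b*math.cos(gamma)))   # True
```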
It doesn’t work that way. Nevertheless, variable unit basis vectors are de-
fined:
ρ̂ ≡ +x̂ cos φ + ŷ sin φ,
φ̂ ≡ −x̂ sin φ + ŷ cos φ,
r̂ ≡ +ẑ cos θ + ρ̂ sin θ,
θ̂ ≡ −ẑ sin θ + ρ̂ cos θ;   (3.17)
21
Orthonormal in this context means “of unit length and at right angles to the other
vectors in the set.” [67, “Orthonormality,” 14:19, 7 May 2006]
22
Notice that the φ is conventionally written second in cylindrical (ρ; φ, z) but third in
spherical (r; θ; φ) coordinates. This odd-seeming convention is to maintain proper right-
handed coordinate rotation. (The explanation will seem clearer once Chs. 15 and 16 are
read.)
(Figure: the spherical coordinates r, θ, φ and the cylindrical ρ, z, drawn against the rectangular unit vectors x̂, ŷ, ẑ.)
Table 3.4: Relations among the rectangular, cylindrical and spherical coor-
dinates.
ρ^2 = x^2 + y^2
r^2 = ρ^2 + z^2 = x^2 + y^2 + z^2
tan θ = ρ/z
tan φ = y/x
z = r cos θ
ρ = r sin θ
x = ρ cos φ = r sin θ cos φ
y = ρ sin φ = r sin θ sin φ
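The relations invert one another, as a short numerical round trip (illustrative only) confirms:

```python
import math

# Spherical (r; theta; phi) to rectangular (x, y, z), then back again.
r, theta, phi = 2.0, 0.8, 2.5
x = r*math.sin(theta)*math.cos(phi)
y = r*math.sin(theta)*math.sin(phi)
z = r*math.cos(theta)
print(math.isclose(r, math.sqrt(x*x + y*y + z*z)))            # True
print(math.isclose(theta, math.atan2(math.hypot(x, y), z)))   # True
print(math.isclose(phi, math.atan2(y, x)))                    # True
```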
ρ̂ = (x̂x + ŷy)/ρ,
φ̂ = (−x̂y + ŷx)/ρ,
r̂ = (ẑz + ρ̂ρ)/r = (x̂x + ŷy + ẑz)/r,
θ̂ = (−ẑρ + ρ̂z)/r.   (3.18)
Such variable unit basis vectors point locally in the directions in which their
respective coordinates advance.
Combining pairs of (3.17)’s equations appropriately, we have also that
(ρx)^2 = y^2 + z^2,   (ρy)^2 = z^2 + x^2,
tan θx = ρx/x,   tan θy = ρy/y,
tan φx = z/y,   tan φy = x/z,   (3.20)
instead if needed.23
23
Symbols like ρx are logical but, as far as this writer is aware, not standard. The writer
is not aware of any conventionally established symbols for quantities like these, but § 15.6
at least will use the ρx -style symbology.
These are just the triangle inequalities of § 2.9.2 in vector notation.24 But
if the triangle inequalities hold for real vectors in a plane, then why not
equally for complex scalars? Consider the geometric interpretation of the
Argand plane of Fig. 2.5 on page 42. Evidently,
|z1| − |z2| ≤ |z1 + z2| ≤ |z1| + |z2|   (3.21)
for any two complex numbers z1 and z2 . Extending the sum inequality, we
have that
|Σ_k z_k| ≤ Σ_k |z_k|.   (3.22)
Naturally, (3.21) and (3.22) hold equally well for real numbers as for com-
plex; one may find the latter formula useful for sums of real numbers, for
example, when some of the numbers summed are positive and others nega-
tive.25
An important consequence of (3.22) is that if Σ |z_k| converges, then Σ z_k also converges. Such a consequence is important because mathematical derivations sometimes need the convergence of Σ z_k established, which can be hard to do directly. Convergence of Σ |z_k|, which per (3.22) implies convergence of Σ z_k, is often easier to establish.
See also (9.15). Equation (3.22) will find use among other places in
§ 8.10.3.
z = ρ cis φ,   (3.23)
where
cis φ ≡ cos φ + i sin φ. (3.24)
If z = x + iy, then evidently
x = ρ cos φ,
y = ρ sin φ.   (3.25)
Per (2.53),
z1 z2 = (x1 x2 − y1 y2 ) + i(y1 x2 + x1 y2 ).
Applying (3.25) to the equation yields
(z1 z2)/(ρ1 ρ2) = (cos φ1 cos φ2 − sin φ1 sin φ2) + i(sin φ1 cos φ2 + cos φ1 sin φ2).
But according to (3.10), this is just
(z1 z2)/(ρ1 ρ2) = cos(φ1 + φ2) + i sin(φ1 + φ2),
or in other words
z1 z2 = ρ1 ρ2 cis(φ1 + φ2 ). (3.26)
Equation (3.26) is an important result. It says that if you want to multiply
complex numbers, it suffices
• to multiply their magnitudes and
• to add their phases.
It follows by parallel reasoning (or by extension) that
z1/z2 = (ρ1/ρ2) cis(φ1 − φ2)   (3.27)
and by extension that
z a = ρa cis aφ. (3.28)
Equations (3.26), (3.27) and (3.28) are known as de Moivre’s theorem.26 ,27
We have not shown yet, but shall in § 5.4, that
cis φ = exp iφ = eiφ ,
where exp(·) is the natural exponential function and e is the natural loga-
rithmic base, both defined in Ch. 5. De Moivre’s theorem is most useful in
this light.
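A numerical illustration (ours) of (3.26): the product's magnitude is the product of the magnitudes, and its phase is the sum of the phases.

```python
import cmath

z1 = 3 * cmath.exp(1j * 0.4)
z2 = 2 * cmath.exp(1j * 1.1)
product = z1 * z2
print(abs(product), cmath.phase(product))   # 6.0 and 1.5, within rounding
```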
26
Also called de Moivre’s formula. Some authors apply the name of de Moivre directly
only to (3.28), or to some variation thereof; but since the three equations express essentially
the same idea, if you refer to any of them as de Moivre’s theorem then you are unlikely
to be misunderstood.
27
[57][67]
Chapter 4
The derivative
4.1.2 Limits
The notation limz→zo indicates that z draws as near to zo as it possibly
can. When written limz→zo+ , the implication is that z draws toward zo from
the positive side such that z > zo . Similarly, when written limz→zo− , the
implication is that z draws toward zo from the negative side.
The reason for the notation is to provide a way to handle expressions
like
3z/(2z)
as z vanishes:
lim_{z→0} 3z/(2z) = 3/2.
The symbol “limQ ” is short for “in the limit as Q.”
Notice that lim is not a function like log or sin. It is just a reminder
that a quantity approaches some value, used when saying that the quantity
2
Among scientists and engineers who study wave phenomena, there is an old rule
of thumb that sinusoidal waveforms be discretized not less finely than ten points per
wavelength. In keeping with this book’s adecimal theme (Appendix A) and the concept of
the hour of arc (§ 3.6), we should probably render the rule as twelve points per wavelength
here. In any case, even very roughly speaking, a quantity greater than 1/0xC of the
principal to which it compares probably cannot rightly be regarded as infinitesimal. On the
other hand, a quantity less than 1/0x10000 of the principal is indeed infinitesimal for most
practical purposes (but not all: for example, positions of spacecraft and concentrations of
chemical impurities must sometimes be accounted more precisely). For quantities between
1/0xC and 1/0x10000, it depends on the accuracy one seeks.
lim_{z→2−} (z + 2) = 4
4.2 Combinatorics
In its general form, the problem of selecting k specific items out of a set of n
available items belongs to probability theory ([chapter not yet written]). In
its basic form, however, the same problem also applies to the handling of
polynomials or power series. This section treats the problem in its basic
form.3
(0 0)
(1 0) (1 1)
(2 0) (2 1) (2 2)
(3 0) (3 1) (3 2) (3 3)
(4 0) (4 1) (4 2) (4 3) (4 4)
. . .
4
The author is given to understand that, by an heroic derivational effort, (4.11) can be
extended to nonintegral n. However, since applied mathematics does not usually concern
itself with hard theorems of little known practical use, the extension is not covered in this
book. The Taylor series for (1 + z)a−1 for complex z and complex a is indeed covered
notwithstanding, in Table 8.1, and this amounts to much the same thing.
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 A A 5 1
1 6 F 14 F 6 1
1 7 15 23 23 15 7 1
. . .
(actually this holds for any ǫ, small or large; but the typical case of interest
has |ǫ| ≪ 1). In either form, the binomial theorem is a direct consequence
of the combinatorics of § 4.2. Since
each (a+b) factor corresponds to one of the “wooden blocks,” where a means
rejecting the block and b, accepting it.
that5
1 + mǫo ≈ (1 + ǫo )m
5
The symbol ≈ means “approximately equals.”
(1 + ǫ)x ≈ 1 + xǫ (4.13)
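The approximation (4.13) is easily tried numerically (an illustration of ours), for instance with a small ǫ:

```python
eps, x = 0.001, 2.5

# (1 + eps)**x versus its first-order approximation 1 + x*eps.
print((1 + eps)**x, 1 + x*eps)   # 1.00250... versus 1.0025
```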
Section 5.4 will investigate the extremely interesting effects which arise
when ℜ(ǫ) = 0 and the power z in (4.14) grows large, but for the moment
we shall use the equation in a more ordinary manner to develop the concept
and basic application of the derivative, as follows.
f′(t) ≡ lim_{ǫ→0+} [f(t + ǫ/2) − f(t − ǫ/2)]/ǫ.   (4.15)
Alternately, one can define the same derivative in the unbalanced form
f′(t) = lim_{ǫ→0+} [f(t + ǫ) − f(t)]/ǫ,
but this book generally prefers the more elegant balanced form (4.15), which
we will now use in developing the derivative’s several properties through the
rest of the chapter.7
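To see one practical reason for the preference, a minimal Python sketch (the test function f(t) = t³ and the step size are arbitrary choices for illustration) compares the two difference quotients at a finite ǫ; the balanced form errs far less:

    # Balanced versus unbalanced difference quotients of f(t) = t**3 at t = 1,
    # where the exact derivative is 3.
    def f(t):
        return t ** 3

    t, eps = 1.0, 1e-4
    balanced = (f(t + eps / 2) - f(t - eps / 2)) / eps
    unbalanced = (f(t + eps) - f(t)) / eps
    print(balanced - 3.0, unbalanced - 3.0)  # the balanced error is far smaller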
which simplifies to

   f′(t) = Σ_{k=−∞}^{∞} c_k k t^{k−1}.   (4.17)
dt = ǫ,
df = f (t + dt/2) − f (t − dt/2),
10
If you do not fully understand this sentence, reread it carefully with reference to (4.15)
and (4.18) until you do; it’s important.
11
This is difficult, yet the author can think of no clearer, more concise way to state it.
The quantities dt and df represent coordinated infinitesimal changes in t and f respectively,
so there is usually no trouble with treating dt and df as though they were the same kind
of thing. However, at the fundamental level they really aren’t.
If t is an independent variable, then dt is just an infinitesimal of some kind, whose
specific size could be a function of t but more likely is just a constant. If a constant,
then dt does not fundamentally have anything to do with t as such. In fact, if s and t
are both independent variables, then we can (and in complex analysis sometimes do) say
that ds = dt = ǫ, after which nothing prevents us from using the symbols ds and dt
interchangeably. Maybe it would be clearer in some cases to write ǫ instead of dt, but the
latter is how it is conventionally written.
By contrast, if f is a dependent variable, then df or d(f ) is the amount by which f
changes as t changes by dt. The df is infinitesimal but not constant; it is a function of t.
Maybe it would be clearer in some cases to write dt f instead of df , but for most cases the
former notation is unnecessarily cluttered; the latter is how it is conventionally written.
Now, most of the time, what we are interested in is not dt or df as such, but rather the
ratio df/dt or the sum Σ_k f(k dt) dt = ∫ f(t) dt. For this reason, we do not usually worry
about which of df and dt is the independent infinitesimal, nor do we usually worry about
the precise value of dt. This leads one to forget that dt does indeed have a precise value.
What confuses is when one changes perspective in mid-analysis, now regarding f as the
independent variable. Changing perspective is allowed and perfectly proper, but one must
take care: the dt and df after the change are not the same as the dt and df before the
change. However, the ratio df /dt remains the same in any case.
Sometimes when writing a differential equation like the potential-kinetic energy equation
ma dx = mv dv, we do not necessarily have either v or x in mind as the independent
variable. This is fine. The important point is that dv and dx be coordinated so that the
ratio dv/dx has a definite value no matter which of the two be regarded as independent,
or whether the independent be some third variable (like t) not in the equation.
One can avoid the confusion simply by keeping the dv/dx or df /dt always in ratio, never
treating the infinitesimals individually. Many applied mathematicians do precisely that.
That is okay as far as it goes, but it really denies the entire point of the Leibnitz notation.
One might as well just stay with the Newton notation in that case. Instead, this writer
recommends that you learn the Leibnitz notation properly, developing the ability to treat
the infinitesimals individually.
Because the book is a book of applied mathematics, this footnote does not attempt to
say everything there is to say about infinitesimals. For instance, it has not yet pointed
out (but does so now) that even if s and t are equally independent variables, one can have
dt = ǫ(t), ds = δ(s, t), such that dt has prior independence to ds. The point is not to
fathom all the possible implications from the start; you can do that as the need arises.
The point is to develop a clear picture in your mind of what a Leibnitz infinitesimal really
is. Once you have the picture, you can go from there.
At first glance, the distinction between dt and d(t) seems a distinction with-
out a difference; and for most practical cases of interest, so indeed it is.
However, when switching perspective in mid-analysis as to which variables
are dependent and which are independent, or when changing multiple inde-
pendent complex variables simultaneously, the math can get a little tricky.
In such cases, it may be wise to use the symbol dt to mean d(t) only, in-
troducing some unambiguous symbol like ǫ to represent the independent
infinitesimal. In any case you should appreciate the conceptual difference
between dt = ǫ and d(t).
Where two or more independent variables are at work in the same equa-
tion, it is conventional to use the symbol ∂ instead of d, as a reminder that
the reader needs to pay attention to which ∂ tracks which independent vari-
able.12 A derivative ∂f /∂t or ∂f /∂s in this case is sometimes called by the
slightly misleading name of partial derivative. (If needed or desired, one
can write ∂t (·) when tracking t, ∂s (·) when tracking s, etc. Use discretion,
though. Such notation appears only rarely in the literature, so your audi-
ence might not understand it when you write it.) Conventional shorthand
for d(df) is d²f; for (dt)², dt²; so

   d(df/dt)/dt = d²f/dt²

is a derivative of a derivative, or second derivative. By extension, the notation

   d^k f / dt^k

represents the kth derivative.
   df/dz = lim_{ǫ→0} [f(z + ǫ/2) − f(z − ǫ/2)] / ǫ,   (4.19)
one should like it to evaluate the same in the limit regardless of the complex
phase of ǫ. That is, if δ is a positive real infinitesimal, then it should be
equally valid to let ǫ = δ, ǫ = −δ, ǫ = iδ, ǫ = −iδ, ǫ = (4 − i3)δ or
12
The writer confesses that he remains unsure why this minor distinction merits the
separate symbol ∂, but he accepts the notation as conventional nevertheless.
any other infinitesimal value, so long as 0 < |ǫ| ≪ 1. One should like the
derivative (4.19) to come out the same regardless of the Argand direction
from which ǫ approaches 0 (see Fig. 2.5). In fact for the sake of robustness,
one normally demands that derivatives do come out the same regardless
of the Argand direction; and (4.19) rather than (4.15) is the definition we
normally use for the derivative for this reason. Where the limit (4.19) is
sensitive to the Argand direction or complex phase of ǫ, there we normally
say that the derivative does not exist.
Where the derivative (4.19) does exist—where the derivative is finite
and insensitive to Argand direction—there we say that the function f (z) is
differentiable.
Excepting the nonanalytic parts of complex numbers (|·|, arg[·], [·]∗ , ℜ[·]
and ℑ[·]; see § 2.12.3), plus the Heaviside unit step u(t) and the Dirac
delta δ(t) (§ 7.7), most functions encountered in applications do meet the
criterion (4.19) except at isolated nonanalytic points (like z = 0 in h[z] = 1/z
or g[z] = √z). Meeting the criterion, such functions are fully differentiable
except at their poles (where the derivative goes infinite in any case) and
other nonanalytic points. Particularly, the key formula (4.14), written here
as
   (1 + ǫ)^w ≈ 1 + wǫ,
works without modification when ǫ is complex; so the derivative (4.17) of
the general power series,
   d/dz Σ_{k=−∞}^{∞} c_k z^k = Σ_{k=−∞}^{∞} c_k k z^{k−1}   (4.20)
which simplifies to

   d(z^a)/dz = a z^{a−1}   (4.21)
for any complex z and a.
How exactly to evaluate z^a or z^{a−1} when a is complex is another matter,
treated in § 5.4 and its (5.12); but in any case you can use (4.21) for real a
right now.
Since the product of two or more dfj is negligible compared to the first-order
infinitesimals to which they are added here, this simplifies to

   d[∏_j f_j(z)] = [∏_j f_j(z)][Σ_k df_k/(2f_k(z))] − [∏_j f_j(z)][Σ_k (−df_k)/(2f_k(z))],
in the form

   f(w) = w^{1/2},
   w(z) = 3z² − 1.

Then

   df/dw = 1/(2w^{1/2}) = 1/(2√(3z² − 1)),
   dw/dz = 6z,

so by (4.23),

   df/dz = (df/dw)(dw/dz) = 6z/(2√(3z² − 1)) = 3z/√(3z² − 1).
14
It bears emphasizing to readers who may inadvertently have picked up unhelpful ideas
about the Leibnitz notation in the past: the dw factor in the denominator cancels the dw
factor in the numerator; a thing divided by itself is 1. That’s it. There is nothing more
to the proof of the derivative chain rule than that.
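A quick numerical check of the worked example, as a minimal Python sketch (illustrative only), compares the balanced difference quotient of f(z) = √(3z² − 1) against the chain-rule result 3z/√(3z² − 1):

    import math

    # f(z) = sqrt(3z^2 - 1); the chain rule gives df/dz = 3z / sqrt(3z^2 - 1).
    def f(z):
        return math.sqrt(3.0 * z * z - 1.0)

    z, eps = 2.0, 1e-6
    numeric = (f(z + eps / 2) - f(z - eps / 2)) / eps
    exact = 3.0 * z / math.sqrt(3.0 * z * z - 1.0)
    print(numeric, exact)  # the two agree to many digits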
[Figure: a curve f(x) with the point (xo, f(xo)) marked.]
Swapping the equation’s left and right sides then dividing through by z a
yields
   df/dz + a f/z = (1/z^a) d(z^a f)/dz,   (4.28)
a pattern worth committing to memory, emerging among other places in
§ 16.9.
At the extremum, the slope is zero. The curve momentarily runs level there.
One solves (4.29) to find the extremum.
Whether the extremum be a minimum or a maximum depends on wheth-
er the curve turn from a downward slope to an upward, or from an upward
17. The notation P|Q means "P when Q," "P, given Q," or "P evaluated at Q." Sometimes it is alternately written P|_Q or [P]_Q.
[Figure: the curve f(x) near a local extremum at x = xo.]
   d²f/dx² |_{x=xo} > 0 implies a local minimum at xo;
   d²f/dx² |_{x=xo} < 0 implies a local maximum at xo.
   lim_{z→zo} f(z)/g(z) = lim_{z→zo} [f(z) − 0]/[g(z) − 0]
In the case where z = zo is a pole, new functions F (z) ≡ 1/f (z) and
G(z) ≡ 1/g(z) of which z = zo is a root are defined, with which
where we have used the fact from (4.21) that d(1/u) = −du/u² for any u.
Canceling the minus signs and multiplying by g²/f², we have that

   lim_{z→zo} g(z)/f(z) = lim_{z→zo} dg/df.

Inverting,

   lim_{z→zo} f(z)/g(z) = lim_{z→zo} df/dg = lim_{z→zo} (df/dz)/(dg/dz).
And if zo itself is infinite? Then, whether it represents a root or a pole, we
define the new variable Z = 1/z and the new functions Φ(Z) = f (1/Z) =
f (z) and Γ(Z) = g(1/Z) = g(z), with which we apply l’Hôpital’s rule for
Z → 0 to obtain
   lim_{z→∞} f(z)/g(z) = lim_{Z→0} Φ(Z)/Γ(Z) = lim_{Z→0} (dΦ/dZ)/(dΓ/dZ) = lim_{Z→0} (df/dZ)/(dg/dZ)
      = lim_{z→∞, Z→0} [(df/dz)(dz/dZ)] / [(dg/dz)(dz/dZ)] = lim_{z→∞} [(df/dz)(−z²)] / [(dg/dz)(−z²)] = lim_{z→∞} (df/dz)/(dg/dz).
19
Partly with reference to [67, “L’Hopital’s rule,” 03:40, 5 April 2006].
   lim_{x→∞} (ln x)/√x = lim_{x→∞} (1/x)/(1/(2√x)) = lim_{x→∞} 2/√x = 0.
The example incidentally shows that natural logarithms grow slower than
square roots, an instance of a more general principle we shall meet in § 5.3.
Section 5.3 will put l’Hôpital’s rule to work.
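A minimal Python sketch (the sample points are arbitrary) lets one watch the example limit settle, alongside the ratio of derivatives that l'Hôpital's rule produces:

    import math

    # lim_{x->oo} ln(x)/sqrt(x) = 0; l'Hopital's rule reduces it to lim 2/sqrt(x).
    for x in (1e2, 1e4, 1e6, 1e8):
        print(x, math.log(x) / math.sqrt(x), 2.0 / math.sqrt(x))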
One begins the iteration by guessing the root and calling the guess z0 .
Then z1 , z2 , z3 , etc., calculated in turn by the iteration (4.31), give suc-
cessively better estimates of the true root z∞ .
20
Consider for example the ratio limx→0 (x3 + x)2 /x2 , which is 0/0. The easier way to
resolve this particular ratio would naturally be to cancel a factor of x2 from it; but just to
make the point let us apply l’Hôpital’s rule instead, reducing the ratio to limx→0 2(x3 +
x)(3x2 + 1)/2x, which is still 0/0. Applying l’Hôpital’s rule again to the result yields
limx→0 2[(3x2 +1)2 +(x3 +x)(6x)]/2 = 2/2 = 1. Where expressions involving trigonometric
or special functions (Chs. 3, 5 and [not yet written]) appear in ratio, a recursive application
of l’Hôpital’s rule can be just the thing one needs.
Observe that one must stop applying l’Hôpital’s rule once the ratio is no longer 0/0 or
∞/∞. In the example, applying the rule a third time would have ruined the result.
21
This paragraph is optional reading for the moment. You can read Ch. 5 first, then
come back here and read the paragraph if you prefer.
22
[55, § 10-2]
[Figure 4.5: the Newton-Raphson iteration; the plot of f(x), with the successive estimates xk and xk+1 marked on the x axis.]
It then approximates the root xk+1 as the point at which f̃k(xk+1) = 0:

   f̃k(xk+1) = 0 = f(xk) + [df/dx]|_{x=xk} (xk+1 − xk).

   f(x) = x^n − p
25
Equations (4.32) and (4.33) work not only for real p but also usually for complex.
Given x0 = 1, however, they converge reliably and orderly only for real, nonnegative p.
(To see why, sketch f [x] in the fashion of Fig. 4.5.)
If reliable, orderly convergence is needed for complex p = u + iv = σ cis ψ, σ ≥ 0, you
can decompose p^{1/n} per de Moivre's theorem (3.28) as p^{1/n} = σ^{1/n} cis(ψ/n), in which
cis(ψ/n) = cos(ψ/n) + i sin(ψ/n) is calculated by the Taylor series of Table 8.1. Then σ
is real and nonnegative, upon which (4.33) reliably, orderly computes σ^{1/n}.
The Newton-Raphson iteration however excels as a practical root-finding technique, so
it often pays to be a little less theoretically rigid in applying it. If so, then don't bother to
decompose; seek p^{1/n} directly, using complex zk in place of the real xk. In the uncommon
event that the direct iteration does not seem to converge, start over again with some
randomly chosen complex z0 . This saves effort and usually works.
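As a concrete illustration of the iteration applied to f(x) = x^n − p, here is a minimal Python sketch (the starting guess x0 = 1 and the fixed step count are illustrative choices):

    # Newton-Raphson for the nth root of p:  f(x) = x**n - p,  f'(x) = n*x**(n-1),
    # so x_{k+1} = x_k - (x_k**n - p) / (n * x_k**(n-1)).
    def nth_root(p, n, x0=1.0, steps=40):
        x = x0
        for _ in range(steps):
            x = x - (x ** n - p) / (n * x ** (n - 1))
        return x

    print(nth_root(2.0, 2))          # ~1.414..., the square root of 2
    print(nth_root(float(0x10), 4))  # ~2.0, the fourth root of 0x10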
26
[55, § 4-9][45, § 6.1.1][66]
Chapter 5

The complex exponential
which says that the slope and height of the exponential function are both
unity at x = 0, implying that the straight line which best approximates
the exponential function in that neighborhood—the tangent line, which just
grazes the curve—is
y(x) = 1 + x.
With the tangent line y(x) found, the next step toward putting a concrete
bound on e is to show that y(x) ≤ exp x for all real x, that the curve runs
nowhere below the line. To show this, we observe per (5.1) that the essential
action of the exponential function is to multiply repeatedly by 1 + ǫ as x
increases, to divide repeatedly by 1 + ǫ as x decreases. Since 1 + ǫ > 1, this
1
Excepting (5.4), the author would prefer to omit much of the rest of this section, but
even at the applied level cannot think of a logically permissible way to do it. It seems
nonobvious that the limit limǫ→0 (1 + ǫ)1/ǫ actually does exist. The rest of this section
shows why it does.
exp x1 ≤ exp x2 if x1 ≤ x2 .
However, a positive number remains positive no matter how many times one
multiplies or divides it by 1 + ǫ, so the same action also means that
0 ≤ exp x
for all real x. In light of (5.4), the last two equations imply further that
   (d/dx) exp x1 ≤ (d/dx) exp x2   if x1 ≤ x2,
   0 ≤ (d/dx) exp x.
But we have purposely defined the tangent line y(x) = 1 + x such that
   exp 0 = y(0) = 1,
   (d/dx) exp 0 = (d/dx) y(0) = 1;
that is, such that the line just grazes the curve of exp x at x = 0. Rightward,
at x > 0, evidently the curve’s slope only increases, bending upward away
from the line. Leftward, at x < 0, evidently the curve’s slope only decreases,
again bending upward away from the line. Either way, the curve never
crosses below the line for real x. In symbols,
y(x) ≤ exp x.
   1/2 ≤ e^{−1/2},
   2 ≤ e^1,
[Figure: the curve of exp x plotted against x.]
or in other words,
2 ≤ e ≤ 4, (5.5)
which in consideration of (5.2) puts the desired bound on the exponential
function. The limit does exist.
By the Taylor series of Table 8.1, the value
e ≈ 0x2.B7E1
can readily be calculated, but the derivation of that series does not come
until Ch. 8.
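A minimal Python sketch (illustrative) evaluates (1 + ǫ)^{1/ǫ} for shrinking ǫ; the values stay within the bound 2 ≤ e ≤ 4 and creep toward e:

    # Evaluate (1 + eps)**(1/eps) for shrinking eps; the values creep toward e.
    for k in range(1, 7):
        eps = 10.0 ** (-k)
        print(eps, (1.0 + eps) ** (1.0 / eps))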
and call it the natural logarithm. Just as for any other base b, so also for
base b = e; thus the natural logarithm inverts the natural exponential and
vice versa:
ln exp x = ln ex = x,
(5.6)
exp ln x = eln x = x.
[Figure: the curve of ln x plotted against x.]
That is,

   lim_{x→∞} exp(+x)/x^a = ∞,
   lim_{x→∞} exp(−x)/x^a = 0,   (5.10)
But per § 5.1, the essential operation of the exponential function is to multi-
ply repeatedly by some factor, where the factor is not quite exactly unity—in
this case, by 1 + iǫ. So let us multiply a complex number z = x + iy by
1 + iǫ, obtaining
[Figure: an Argand-plane diagram of z = ρ e^{iφ} and the increment ∆z produced by multiplying z by 1 + iǫ.]
∆ρ = 0,
∆φ = ǫ.
That is, exp iθ is the complex number which lies on the Argand unit circle at
phase angle θ. Had we known that θ was an Argand phase angle, naturally
we should have represented it by the symbol φ from the start. Changing
φ ← θ now, we have for real φ that
|exp iφ| = 1,
arg [exp iφ] = φ,
z = x + iy = ρ exp iφ.
   w = u + iv = σ exp iψ,

   w^z = exp[ln w^z]
       = exp[z ln w]
       = exp[(x + iy)(iψ + ln σ)]
       = exp[(x ln σ − ψy) + i(y ln σ + ψx)],
where

   x = ρ cos φ,   y = ρ sin φ,
   σ = √(u² + v²),   tan ψ = v/u.
Equation (5.12) serves to raise any complex number to a complex power.
Curious consequences of Euler's formula (5.11) include that

   e^{±i2π/4} = ±i,
   e^{±i2π/2} = −1,   (5.13)
   e^{in2π} = 1.
For the natural logarithm of a complex number in light of Euler’s formula,
we have that
   ln w = ln(σ e^{iψ}) = ln σ + iψ.   (5.14)
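A short numerical check of the complex power, sketched in Python (the particular w and z are arbitrary choices): the direct power is compared against exp[z(ln σ + iψ)]:

    import cmath
    import math

    w = 3.0 - 4.0j            # sigma = |w| = 5, psi = arg w
    z = 0.5 + 2.0j

    direct = w ** z
    via_log = cmath.exp(z * (math.log(abs(w)) + 1j * cmath.phase(w)))
    print(direct, via_log)    # the two agree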
Summary of properties of the complex exponential:

   i² = −1 = (−i)²
   1/i = −i
   e^{iφ} = cos φ + i sin φ
   e^{iz} = cos z + i sin z
   z1 z2 = ρ1 ρ2 e^{i(φ1+φ2)} = (x1 x2 − y1 y2) + i(y1 x2 + x1 y2)
   z1/z2 = (ρ1/ρ2) e^{i(φ1−φ2)} = [(x1 x2 + y1 y2) + i(y1 x2 − x1 y2)] / (x2² + y2²)
   z^a = ρ^a e^{iaφ}
   w^z = e^{x ln σ − ψy} e^{i(y ln σ + ψx)}
   ln w = ln σ + iψ

   z ≡ x + iy = ρ e^{iφ}
   w ≡ u + iv = σ e^{iψ}
   exp z ≡ e^z
   cis z ≡ cos z + i sin z = e^{iz}

   (d/dz) exp z = exp z
   (d/dw) ln w = 1/w
   (df/dz)/f(z) = (d/dz) ln f(z)
   log_b w = (ln w)/(ln b)
[Figure 5.4: the point z = ρ e^{i(ωt+φo)} traveling about a circle of radius ρ in the Argand plane, with its rate dz/dt.]
Refer to Fig. 5.4. Suppose that the point z in the figure is not fixed but
travels steadily about the circle such that
How fast then is the rate dz/dt, and in what Argand direction?
   dz/dt = (ρ)[(d/dt) cos(ωt + φo) + i (d/dt) sin(ωt + φo)].   (5.25)
Evidently,
Matching the real and imaginary parts of (5.25) against those of (5.26), we
have that

   (d/dt) cos(ωt + φo) = −ω sin(ωt + φo),
   (d/dt) sin(ωt + φo) = +ω cos(ωt + φo).   (5.27)

   (d/dt) cos t = −sin t,
   (d/dt) sin t = +cos t.   (5.28)
   (d/dz) exp z = exp z,
   (d/dw) ln w = 1/w.
8
[55, back endpaper]
Table 5.2: Derivatives of the exponential, trigonometric and hyperbolic functions.

   (d/dz) exp z = +exp z          (d/dz)(1/exp z) = −1/exp z
   (d/dz) sin z = +cos z          (d/dz)(1/sin z) = −1/(tan z sin z)
   (d/dz) cos z = −sin z          (d/dz)(1/cos z) = +tan z/cos z
   (d/dz) tan z = +(1 + tan² z) = +1/cos² z
   (d/dz)(1/tan z) = −(1 + 1/tan² z) = −1/sin² z
   (d/dz) sinh z = +cosh z        (d/dz)(1/sinh z) = −1/(tanh z sinh z)
   (d/dz) cosh z = +sinh z        (d/dz)(1/cosh z) = −tanh z/cosh z
   (d/dz) tanh z = 1 − tanh² z = +1/cosh² z
   (d/dz)(1/tanh z) = 1 − 1/tanh² z = −1/sinh² z
that

   arcsin w = z,
   w = sin z,
   dw/dz = cos z,
   dw/dz = ±√(1 − sin² z),
   dw/dz = ±√(1 − w²),
   dz/dw = ±1/√(1 − w²),

   (d/dw) arcsin w = ±1/√(1 − w²).   (5.29)
Similarly,

   arctan w = z,
   w = tan z,
   dw/dz = 1 + tan² z,
   dw/dz = 1 + w²,
   dz/dw = 1/(1 + w²),

   (d/dw) arctan w = 1/(1 + w²).   (5.30)
Derivatives of the other inverse trigonometrics are found in the same way.
Table 5.3 summarizes.
Table 5.3: Derivatives of the inverse trigonometrics.

   (d/dw) ln w = 1/w
   (d/dw) arcsin w = ±1/√(1 − w²)
   (d/dw) arccos w = ∓1/√(1 − w²)
   (d/dw) arctan w = 1/(1 + w²)
   (d/dw) arcsinh w = ±1/√(w² + 1)
   (d/dw) arccosh w = ±1/√(w² − 1)
   (d/dw) arctanh w = 1/(1 − w²)
Suppose for example that by the Newton-Raphson iteration (§ 4.8) you have found a root of the polynomial x³ + 2x² + 3x + 4 at x ≈ −0x0.2D + i0x1.8C. Where is there another root? Answer: at the complex conjugate, x ≈ −0x0.2D − i0x1.8C. One need not actually run the Newton-Raphson again to find the conjugate root.
11
[61, Ch. 12]
Chapter 6

Primes, roots and averages

This chapter gathers a few significant topics, each of whose treatment seems too brief for a chapter of its own.
2
[58]
3
[52, Appendix 1][67, “Reductio ad absurdum,” 02:36, 28 April 2006]
4
Unfortunately the author knows no more elegant proof than this, yet cannot even cite
this one properly. The author encountered the proof in some book over a decade ago. The
identity of that book is now long forgotten.
B = C − p1 q 1 ,
which might be prime or composite (or unity), but which either way enjoys
a unique prime factorization because B < C, with C the least positive
integer factorable two ways. Observing that some integer s which divides C
necessarily also divides C ± ns, we note that each of p1 and q1 necessarily
divides B. This means that B’s unique factorization includes both p1 and q1 ,
which further means that the product p1 q1 divides B. But if p1 q1 divides B,
then it divides B + p1 q1 = C, also.
Let E represent the positive integer which results from dividing C by p1 q1:

   E ≡ C/(p1 q1).

Then,

   E q1 = C/p1 = Ap,
   E p1 = C/q1 = Aq.
are typical on the shadowed frontier where applied shades into pure math-
ematics: with sufficient experience and with a firm grasp of the model at
hand, if you think that it’s true, then it probably is. Judging when to delve
into the mathematics anyway, seeking a more rigorous demonstration of a
proposition one feels pretty sure is correct, is a matter of applied mathe-
matical style. It depends on how sure one feels, and more importantly on
whether the unsureness felt is true uncertainty or is just an unaccountable
desire for more precise mathematical definition (if the latter, then unlike
the author you may have the right temperament to become a professional
mathematician). The author does judge the present subsection’s proof to
be worth the applied effort; but nevertheless, when one lets logical minutiae
distract him to too great a degree, he admittedly begins to drift out of the
applied mathematical realm that is the subject of this book.
   p²/q² = n,
which form is evidently also fully reduced. But if q > 1, then the fully
reduced n = p²/q² is not an integer as we had assumed that it was. The
contradiction proves false the assumption which gave rise to it. Hence there
exists no rational, nonintegral √n, as was to be demonstrated. The proof is
readily extended to show that any x = n^{j/k} is irrational if nonintegral, the
5
Section 2.3 explains the ∈ Z notation.
6
A proof somewhat like the one presented here is found in [52, Appendix 1].
   A(z) = z − α,
   B(z) = Σ_{k=0}^{N} b_k z^k,   N > 0,   b_N ≠ 0,
   B(α) = 0.
B(α) = ρ.
After all, B(z) is everywhere differentiable, so the string can only sweep
as ρ decreases; it can never skip. The Argand origin lies inside the loops at
the start but outside at the end. If so, then the values of ρ and φ precisely
where the string has swept through the origin by definition constitute a
root B(ρeiφ ) = 0. Thus as we were required to show, B(z) does have at
least one root, which observation completes the applied demonstration of
the fundamental theorem of algebra.
The fact that the roots exist is one thing. Actually finding the roots nu-
merically is another matter. For a quadratic (second order) polynomial, (2.2)
gives the roots. For cubic (third order) and quartic (fourth order) polyno-
mials, formulas for the roots are known (see Ch. 10) though seemingly not
so for quintic (fifth order) and higher-order polynomials;9 but the Newton-
Raphson iteration (§ 4.8) can be used to locate a root numerically in any
case. The Newton-Raphson is used to extract one root (any root) at each
step as described above, reducing the polynomial step by step until all the
roots are found.
The reverse problem, finding the polynomial given the roots, is much easier: one just multiplies out ∏_j (z − αj), as in (6.1).
Now suppose that we are told that Adam can lay a brick every 30 seconds;
Brian, every 40 seconds; Charles, every 60 seconds. How much time do the
9
In a celebrated theorem of pure mathematics [65, “Abel’s impossibility theorem”], it is
said to be shown that no such formula even exists, given that the formula be constructed
according to certain rules. Undoubtedly the theorem is interesting to the professional
mathematician, but to the applied mathematician it probably suffices to observe merely
that no such formula is known.
10
The figures in the example are in decimal notation.
The two problems are precisely equivalent. Neither is stated in simpler terms
than the other. The notation used to solve the second is less elegant, but
fortunately there exists a better notation:
where

   1/(30 ∥ 40 ∥ 60) = 1/30 + 1/40 + 1/60.
The operator ∥ is called the parallel addition operator. It works according to the law

   1/(a ∥ b) = 1/a + 1/b,   (6.2)

where the familiar operator + is verbally distinguished from the ∥ when necessary by calling the + the serial addition or series addition operator. With (6.2) and a bit of arithmetic, the several parallel-addition identities of Table 6.1 are soon derived.
The writer knows of no conventional notation for parallel sums of series, but suggests that the notation which appears in the table,

   ∥_{k=a}^{b} f(k) ≡ f(a) ∥ f(a+1) ∥ f(a+2) ∥ · · · ∥ f(b),

   a ∥ x ≤ a.   (6.4)
Table 6.1: Parallel and serial addition identities.

   1/(a ∥ b) = 1/a + 1/b               1/(a + b) = 1/a ∥ 1/b
   a ∥ b = ab/(a + b)                   a + b = ab/(a ∥ b)
   a ∥ (1/b) = a/(1 + ab)               a + 1/b = a/(1 ∥ ab)
   a ∥ b = b ∥ a                        a + b = b + a
   a ∥ (b ∥ c) = (a ∥ b) ∥ c            a + (b + c) = (a + b) + c
   a ∥ (−a) = ∞                         a + (−a) = 0
   (a)(b ∥ c) = ab ∥ ac                 (a)(b + c) = ab + ac
   1/(∥_k a_k) = Σ_k (1/a_k)            1/(Σ_k a_k) = ∥_k (1/a_k)
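A minimal Python sketch (illustrative) implements parallel addition and applies it to the masons' figures of 30, 40 and 60 seconds per brick:

    # Parallel addition: 1/(a par b) = 1/a + 1/b.
    def par(*xs):
        return 1.0 / sum(1.0 / x for x in xs)

    print(par(30.0, 40.0, 60.0))      # ~13.33 s per brick, all three masons at work
    print(par(2.0, 2.0), 2.0 + 2.0)   # 1.0 versus 4.0: parallel versus serial addition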
more natural than parallel addition (∥) does. The psychological barrier is hard to breach, yet for many purposes parallel addition is in fact no less fundamental. Its rules are inherently neither more nor less complicated, as Table 6.1 illustrates; yet outside the electrical engineering literature the parallel addition notation is seldom seen.12 Now that you have seen it, you can use it. There is profit in learning to think both ways. (Exercise: counting from zero serially goes 0, 1, 2, 3, 4, 5, . . .; how does the parallel analog go?)13

Convention brings no special symbol for parallel subtraction, incidentally. One merely writes

   a ∥ (−b),

which means exactly what it appears to mean.
6.3.2 Averages
Let us return to the problem of the preceding section. Among the three
masons, what is their average productivity? The answer depends on how
you look at it. On the one hand,
The geometric mean does not have the problem that either of the two averages discussed above has. The inverse geometric mean
where the xk are the several samples and the wk are weights. For two samples weighted equally, these are

   µ = (a + b)/2,   (6.8)
   µΠ = √(ab),   (6.9)
   µ∥ = 2(a ∥ b).   (6.10)
out there and this is not one of them.
The fact remains nevertheless that businesspeople sometimes use mathematics in pecu-
liar ways, making relatively easy problems harder and more mysterious than the problems
need to be. If you have ever encountered the little monstrosity of an approximation banks
(at least in the author’s country) actually use in place of (9.12) to accrue interest and
amortize loans, then you have met the difficulty.
Trying to convince businesspeople that their math is wrong, incidentally, is in the au-
thor’s experience usually a waste of time. Some businesspeople are mathematically rather
sharp—as you presumably are if you are in business and are reading these words—but
as for most: when real mathematical ability is needed, that’s what they hire engineers,
architects and the like for. The author is not sure, but somehow he doubts that many
boards of directors would be willing to bet the company on a financial formula containing
some mysterious-looking ex . Business demands other talents.
   0 ≤ (a − b)²,
   0 ≤ a² − 2ab + b²,
   4ab ≤ a² + 2ab + b²,
   2√(ab) ≤ a + b,
   2√(ab)/(a + b) ≤ 1 ≤ (a + b)/(2√(ab)),
   2ab/(a + b) ≤ √(ab) ≤ (a + b)/2,
   2(a ∥ b) ≤ √(ab) ≤ (a + b)/2.
That is,
   µ∥ ≤ µΠ ≤ µ.   (6.11)
The arithmetic mean is greatest and the harmonic mean, least; with the
geometric mean falling between.
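The ordering is easy to spot-check numerically. A minimal Python sketch (the random samples are arbitrary) computes the three means of equal-weight positive samples and confirms (6.11):

    import math
    import random

    # Harmonic, geometric and arithmetic means of equal-weight positive samples.
    def means(xs):
        n = len(xs)
        harmonic = n / sum(1.0 / x for x in xs)
        geometric = math.exp(sum(math.log(x) for x in xs) / n)
        arithmetic = sum(xs) / n
        return harmonic, geometric, arithmetic

    random.seed(1)
    xs = [random.uniform(0.1, 10.0) for _ in range(8)]
    h, g, a = means(xs)
    print(h <= g <= a, h, g, a)   # True: harmonic <= geometric <= arithmetic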
Does (6.11) hold when there are several nonnegative samples of various
nonnegative weights? To show that it does, consider the case of N = 2m
nonnegative samples of equal weight. Nothing prevents one from dividing
such a set of samples in half, considering each subset separately, for if (6.11)
holds for each subset individually then surely it holds for the whole set (this
is so because the average of the whole set is itself the average of the two sub-
set averages, where the word “average” signifies the arithmetic, geometric
or harmonic mean as appropriate). But each subset can further be divided
in half, then each subsubset can be divided in half again, and so on until
each smallest group has two members only—in which case we already know
that (6.11) obtains. Starting there and recursing back, we have that (6.11)
15
The steps are logical enough, but the motivation behind them remains inscrutable
until the reader realizes that the writer originally worked the steps out backward with his
pencil, from the last step to the first. Only then did he reverse the order and write the
steps formally here. The writer had no idea that he was supposed to start from 0 ≤ (a−b)2
until his pencil working backward showed him. “Begin with the end in mind,” the saying
goes. In this case the saying is right.
The same reading strategy often clarifies inscrutable math. When you can follow the
logic but cannot understand what could possibly have inspired the writer to conceive the
logic in the first place, try reading backward.
obtains for the entire set. Now consider that a sample of any weight can
be approximated arbitrarily closely by several samples of weight 1/2m , pro-
vided that m is sufficiently large. By this reasoning, (6.11) holds for any
nonnegative weights of nonnegative samples, which was to be demonstrated.
Chapter 7

The integral
[Figure 7.1: the stairstep functions f1(τ) and f2(τ), of steps ∆τ = 1 and ∆τ = 1/2 respectively, whose shaded areas out to τ = 0x10 are S1 and S2.]
What do these sums represent? One way to think of them is in terms of the
shaded areas of Fig. 7.1. In the figure, S1 is composed of several tall, thin
rectangles of width 1 and height k; S2 , of rectangles of width 1/2 and height
k/2.1 As n grows, the shaded region in the figure looks more and more like
a triangle of base length b = 0x10 and height h = 0x10. In fact it appears
that
   lim_{n→∞} Sn = bh/2 = 0x80,
or more tersely
S∞ = 0x80,
is the area the increasingly fine stairsteps approach.
Notice how we have evaluated S∞ , the sum of an infinite number of
infinitely narrow rectangles, without actually adding anything up. We have
taken a shortcut directly to the total.
In the equation

   Sn = (1/n) Σ_{k=0}^{(0x10)n−1} (k/n),

make the substitutions

   τ ← k/n,
   ∆τ ← 1/n,

to obtain the representation

   Sn = Σ_{k=0}^{(0x10)n−1} ∆τ τ;

or more properly,

   Sn = Σ_{k=0}^{(k|τ=0x10)−1} τ ∆τ,
where the notation k|τ =0x10 indicates the value of k when τ = 0x10. Then
[Figure 7.2: the area S∞ under the curve f(τ) from 0 to 0x10.]
The symbol lim_{dτ→0+} Σ_{k=0}^{(k|τ=0x10)−1} is cumbersome, so we replace it with the new symbol2 ∫_0^{0x10} to obtain the form

   S∞ = ∫_0^{0x10} τ dτ.

2. Like the Greek S, Σ, denoting discrete summation, the seventeenth century-styled Roman S, ∫, stands for Latin "summa," English "sum." See [67, "Long s," 14:54, 7 April 2006].
   k|τ=a = an,
   k|τ=b = bn,
   (k, n) ∈ Z,   n ≠ 0.

In the limit,

   S∞ = lim_{dτ→0+} Σ_{k=(k|τ=a)}^{(k|τ=b)−1} f(τ) dτ = ∫_a^b f(τ) dτ.
This is the integral of f (τ ) in the interval a < τ < b. It represents the area
under the curve of f (τ ) in that interval.
Figure 7.3: Integration by the trapezoid rule (7.1). Notice that the shaded
and dashed areas total the same.
Here, the first and last integration samples are each balanced “on the edge,”
half within the integration domain and half without.
Equation (7.1) is known as the trapezoid rule. Figure 7.3 depicts it. The
name “trapezoid” comes of the shapes of the shaded integration elements in
the figure. Observe however that it makes no difference whether one regards
the shaded trapezoids or the dashed rectangles as the actual integration
elements; the total integration area is the same either way.3 The important
point to understand is that the integral is conceptually just a sum. It is a
sum of an infinite number of infinitesimal elements as dτ tends to vanish,
but a sum nevertheless; nothing more.
Nothing actually requires the integration element width dτ to remain
constant from element to element, incidentally. Constant widths are usually
easiest to handle but variable widths find use in some cases. The only
requirement is that dτ remain infinitesimal. (For further discussion of the
point, refer to the treatment of the Leibnitz notation in § 4.4.2.)
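A minimal Python sketch of the trapezoid rule (7.1) (illustrative; the integrand τ and the limits 0 to 0x10 merely echo the running example, whose exact area is 0x80):

    # Trapezoid rule: half-weight the first and last samples, full-weight the rest.
    def trapezoid(f, a, b, n):
        dtau = (b - a) / n
        total = 0.5 * (f(a) + f(b))
        for k in range(1, n):
            total += f(a + k * dtau)
        return total * dtau

    area = trapezoid(lambda tau: tau, 0.0, float(0x10), 1000)
    print(area, 0x80)   # 128.0 and 128 agree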
3
The trapezoid rule (7.1) is perhaps the most straightforward, general, robust way to
define the integral, but other schemes are possible, too. For example, one can give the
integration elements quadratically curved tops which more nearly track the actual curve.
That scheme is called Simpson’s rule. [A section on Simpson’s rule might be added to the
book at some later date.]
Table 7.1: Simple derivatives gathered for antiderivative use.

   τ^{a−1} = (d/dτ)(τ^a / a),   a ≠ 0
   1/τ = (d/dτ) ln τ,   ln 1 = 0
   exp τ = (d/dτ) exp τ,   exp 0 = 1
   cos τ = (d/dτ) sin τ,   sin 0 = 0
   sin τ = (d/dτ)(−cos τ),   cos 0 = 1
is urged to pause now and ponder the formula thoroughly until he feels
reasonably confident that indeed he does grasp it and the important idea
it represents. One is unlikely to do much higher mathematics without this
formula.
As an example of the formula's use, consider that because (d/dτ)(τ³/6) = τ²/2, it follows that

   ∫_2^x (τ²/2) dτ = ∫_2^x (d/dτ)(τ³/6) dτ = [τ³/6]_2^x = (x³ − 8)/6.
Gathering elements from (4.21) and from Tables 5.2 and 5.3, Table 7.1
lists a handful of the simplest, most useful derivatives for antiderivative use.
Section 9.1 speaks further of the antiderivative.
will do nicely—then on a separate set of axes directly above the first, sketch the corresponding slope function df/dτ. Mark two points a and b on the common horizontal axis; then on the upper, df/dτ plot, shade the integration area under the curve. Now consider (7.2) in light of your sketch.
There. Does the idea not dawn?
Another way to see the truth of the formula begins by canceling its (1/dτ) dτ to obtain the form ∫_{τ=a}^{b} df = f(τ)|_a^b. If this way works better for you, fine; but make sure that you understand it the other way, too.
7.3.1 Operators
An operator is a mathematical agent that combines several values of a func-
tion.
Such a definition, unfortunately, is extraordinarily unilluminating to
those who do not already know what it means. A better way to introduce
the operator is by giving examples. Operators include +, −, multiplication, division, Σ, ∏, ∫ and ∂. The essential action of an operator is to take several values of a function and combine them in some way. For example, ∏ is an operator in

   ∏_{j=1}^{5} (2j − 1) = (1)(3)(5)(7)(9) = 0x3B1.
Notice that the operator has acted to remove the variable j from the
expression 2j − 1. The j appears on the equation’s left side but not on its
right. The operator has used the variable up. Such a variable, used up by
an operator, is a dummy variable, as encountered earlier in § 2.3.
7.3.2 A formalism
But then how are + and − operators? They don’t use any dummy variables
up, do they? Well, it depends on how you look at it. Consider the sum
S = 3 + 5. One can write this as

   S = Σ_{k=0}^{1} f(k),

where

   f(k) ≡ { 3 if k = 0;  5 if k = 1;  undefined otherwise }.

Then,

   S = Σ_{k=0}^{1} f(k) = f(0) + f(1) = 3 + 5 = 8.
k=0
where

   Φ(k, z) ≡ { g(z) if k = 0;  h(z) if k = 1;  p(z) if k = 2;  0 if k = 3;  q(z) if k = 4;  undefined otherwise }.
Such unedifying formalism is essentially useless in applications, except as a vehicle for definition. Once you understand why + and − are operators just as Σ and ∫ are, you can forget the formalism. It doesn't help much.
7.3.3 Linearity
A function f (z) is linear iff (if and only if) it has the properties
f (z1 + z2 ) = f (z1 ) + f (z2 ),
f (αz) = αf (z),
f (0) = 0.
The functions f(z) = 3z, f(u, v) = 2u − v and f(z) = 0 are examples of linear functions. Nonlinear functions include6 f(z) = z², f(u, v) = √(uv), f(t) = cos ωt, f(z) = 3z + 1 and even f(z) = 1.
6
If 3z + 1 is a linear expression, then how is not f (z) = 3z + 1 a linear function?
Answer: it is partly a matter of purposeful definition, partly of semantics. The equation
y = 3x + 1 plots a line, so the expression 3z + 1 is literally “linear” in this sense; but the
definition has more purpose to it than merely this. When you see the linear expression
3z + 1, think 3z + 1 = 0, then g(z) = 3z = −1. The g(z) = 3z is linear; the −1 is the
constant value it targets. That’s the sense of it.
This is a sum of the several values of the expression x^k/j!, evaluated at every possible pair (j, k) in the indicated domain. Now consider the sum

   S2 = Σ_{j=p}^{q} [ Σ_{k=a}^{b} x^k/j! ].
This is evidently a sum of the same values, only added in a different order.
Apparently S1 = S2 . Reflection along these lines must soon lead the reader
to the conclusion that, in general,
   Σ_k Σ_j f(j, k) = Σ_j Σ_k f(j, k).
Now consider that an integral is just a sum of many elements, and that
a derivative is just a difference of two elements. Integrals and derivatives
must then have the same commutative property discrete sums have. For
7
You don’t see d in the list of linear operators? But d in this context is really just
another way of writing ∂, so, yes, d is linear, too. See § 4.4.2.
example,

   ∫_{v=−∞}^{∞} ∫_{u=a}^{b} f(u, v) du dv = ∫_{u=a}^{b} ∫_{v=−∞}^{∞} f(u, v) dv du;
   ∫ Σ_k f_k(v) dv = Σ_k ∫ f_k(v) dv;
   (∂/∂v) ∫ f du = ∫ (∂f/∂v) du.

In general,

   L_v L_u f(u, v) = L_u L_v f(u, v),   (7.3)

where L is any of the linear operators Σ, ∫ or ∂.
Some convergent summations, like

   Σ_{k=0}^{∞} Σ_{j=0}^{1} (−)^j / (2k + j + 1),
One cannot blithely swap operators here. This is not because swapping is
wrong, but rather because the inner sum after the swap diverges, hence the
outer sum after the swap has no concrete summand on which to work. (Why
does the inner sum after the swap diverge? Answer: 1 + 1/3 + 1/5 + · · · =
[1] + [1/3 + 1/5] + [1/7 + 1/9 + 1/0xB + 1/0xD] + · · · > 1[1/4] + 2[1/8] +
4[1/0x10] + · · · = 1/4 + 1/4 + 1/4 + · · · . See also § 8.10.5.) For a more
twisted example of the same phenomenon, consider8

   1 − 1/2 + 1/3 − 1/4 + · · · = 1 − 1/2 − 1/4 + 1/3 − 1/6 − 1/8 + · · · ,

which associates two negative terms with each positive, but still seems to omit no term. Paradoxically, then,

   1 − 1/2 + 1/3 − 1/4 + · · · = 1/2 − 1/4 + 1/6 − 1/8 + · · ·
                               = (1/2)(1 − 1/2 + 1/3 − 1/4 + · · ·),
8
[1, § 1.2.3]
or so it would seem, but cannot be, for it claims falsely that the sum is half
itself. A better way to have handled the example might have been to write
the series as

   lim_{n→∞} [ 1 − 1/2 + 1/3 − 1/4 + · · · + 1/(2n − 1) − 1/(2n) ]
in the first place, thus explicitly specifying equal numbers of positive and
negative terms.9 So specifying would have prevented the error.
The conditional convergence 10 of the last paragraph, which can occur in
integrals as well as in sums, seldom poses much of a dilemma in practice.
One can normally swap summational and integrodifferential operators with
little worry. The reader however should at least be aware that conditional
convergence troubles can arise where a summand or integrand varies in sign
or phase.
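The rearrangement paradox is easy to reproduce numerically. A minimal Python sketch (illustrative) sums the alternating series in its natural order and in the one-positive-two-negatives order; the partial sums head toward different values:

    import math

    N = 200000

    # Natural order: 1 - 1/2 + 1/3 - 1/4 + ...
    natural = sum((-1.0) ** (k + 1) / k for k in range(1, N + 1))

    # Rearranged order: 1 - 1/2 - 1/4 + 1/3 - 1/6 - 1/8 + ...
    rearranged = sum(1.0 / (2 * m - 1) - 1.0 / (4 * m - 2) - 1.0 / (4 * m)
                     for m in range(1, N + 1))

    print(natural, math.log(2.0))           # ~0.6931
    print(rearranged, math.log(2.0) / 2.0)  # ~0.3466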
its effect is to cut the area under the surface into flat, upright slices, then
the slices crosswise into tall, thin towers. The towers are integrated over v
to constitute the slice, then the slices over u to constitute the volume.
In light of § 7.3.4, evidently nothing prevents us from swapping the
integrations: u first, then v. Hence
   V = ∫_{v1}^{v2} ∫_{u1}^{u2} (u²/v) du dv.
And indeed this makes sense, doesn’t it? What difference does it make
whether we add the towers by rows first then by columns, or by columns
first then by rows? The total volume is the same in any case.
Double integrations arise very frequently in applications. Triple inte-
grations arise about as often. For instance, if µ(r) = µ(x, y, z) represents
the variable mass density of some soil,11 then the total soil mass in some
rectangular volume is
   M = ∫_{x1}^{x2} ∫_{y1}^{y2} ∫_{z1}^{z2} µ(x, y, z) dz dy dx.
where the V stands for “volume” and is understood to imply a triple inte-
gration. Similarly for the double integral,
   V = ∫_S f(ρ) dρ,
where the S stands for “surface” and is understood to imply a double inte-
gration.
Even more than three nested integrations are possible. If we integrated
over time as well as space, the integration would be fourfold. A spatial
Fourier transform ([section not yet written]) implies a triple integration; and
its inverse, another triple: a sixfold integration altogether. Manifold nesting
of integrals is thus not just a theoretical mathematical topic; it arises in
sophisticated real-world engineering models. The topic concerns us here for
this reason.
11
Conventionally the Greek letter ρ not µ is used for density, but it happens that we
need the letter ρ for a different purpose later in the paragraph.
[Figure: a circle divided into narrow wedges of angle dφ and radius ρ.]
A cross-section of a cone, cut parallel to the cone’s base, has the same shape
the base has but a different scale. If coordinates are chosen such that the
altitude h runs in the ẑ direction with z = 0 at the cone’s vertex, then
the cross-sectional area is evidently13 (B)(z/h)2 . For this reason, the cone’s
volume is

   Vcone = ∫_0^h (B)(z/h)² dz = (B/h²) ∫_0^h z² dz = (B/h²)(h³/3) = Bh/3.   (7.5)
[Figures: the spherical coordinate system (r, θ, φ) relative to the axes x̂, ŷ, ẑ, and a surface element of dimensions r dθ by ρ dφ on the sphere.]
The sphere's total surface area then is the sum of all such elements over the sphere's entire surface:

   Ssphere = ∫_{φ=−π}^{π} ∫_{θ=0}^{π} dS
           = ∫_{φ=−π}^{π} ∫_{θ=0}^{π} r² sin θ dθ dφ
           = ∫_{φ=−π}^{π} r² [−cos θ]_0^π dφ
           = ∫_{φ=−π}^{π} r² [2] dφ
           = 4πr²,   (7.6)
where we have used the fact from Table 7.1 that sin τ = (d/dτ )(− cos τ ).
Having computed the sphere’s surface area, one can find its volume just
as § 7.4.1 has found a circle’s area—except that instead of dividing the circle
into many narrow triangles, one divides the sphere into many narrow cones,
each cone with base area dS and altitude r, with the vertices of all the cones
meeting at the sphere’s center. Per (7.5), the volume of one such cone is
Vcone = r dS/3. Hence,
   Vsphere = ∮_S Vcone = ∮_S (r dS)/3 = (r/3) ∮_S dS = (r/3) Ssphere,

where the symbol ∮_S indicates integration over a closed surface. In light of (7.6), the total volume is

   Vsphere = 4πr³/3.   (7.7)
(One can compute the same spherical volume more prosaically, without reference to cones, by writing dV = r² sin θ dr dθ dφ and then integrating ∫_V dV.
The derivation given above, however, is preferred because it lends the addi-
tional insight that a sphere can sometimes be viewed as a great cone rolled
up about its own vertex. The circular area derivation of § 7.4.1 lends an
analogous insight: that a circle can sometimes be viewed as a great triangle
rolled up about its own vertex.)
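One can also confirm (7.6) and (7.7) by brute numerical summation of the same elements dS = r² sin θ dθ dφ. A minimal Python sketch (r = 1 for simplicity):

    import math

    # Sum the elements dS = r^2 sin(theta) dtheta dphi over the sphere, r = 1.
    # The integrand has no phi dependence, so the phi integration contributes 2*pi.
    n = 100000
    dtheta = math.pi / n
    area = 2.0 * math.pi * sum(math.sin((i + 0.5) * dtheta) * dtheta for i in range(n))
    volume = area / 3.0   # each surface element caps a cone of altitude r = 1
    print(area, 4.0 * math.pi)
    print(volume, 4.0 * math.pi / 3.0)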
with a pencil, how does one check the result? Answer: by differentiating

   (∂/∂b)[(b³ − a³)/6] |_{b=τ} = τ²/2.
like (9.14) below, which lack variable limits to differentiate. However, many
or most integrals one meets in practice have or can be given variable limits.
Equations (7.9) and (7.10) do serve such indefinite integrals.
It is a rare irony of mathematics that, although numerically differenti-
ation is indeed harder than integration, analytically precisely the opposite
is true. Analytically, differentiation is the easier. So far the book has in-
troduced only easy integrals, but Ch. 9 will bring much harder ones. Even
experienced mathematicians are apt to err in analyzing these. Reversing an
integration by taking an easy derivative is thus an excellent way to check a
hard-earned integration result.
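As a concrete instance of such a check, a minimal Python sketch (illustrative) differentiates the result (b³ − a³)/6 numerically with respect to b and recovers the integrand τ²/2 evaluated at τ = b:

    # Check the result (b**3 - a**3)/6 of integrating tau**2/2 from a to b:
    # differentiating with respect to b must return the integrand b**2/2.
    def result(a, b):
        return (b ** 3 - a ** 3) / 6.0

    a, b, eps = 1.0, 2.5, 1e-6
    numeric = (result(a, b + eps / 2) - result(a, b - eps / 2)) / eps
    print(numeric, b ** 2 / 2.0)   # both ~3.125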
where C stands for “contour” and means in this example the specific contour
of Fig. 7.8. In the example, x2 + y 2 = ρ2 (by the Pythagorean theorem) and
dℓ = ρ dφ, so

   S = ∫_C ρ² dℓ = ∫_0^{2π/4} ρ³ dφ = (2π/4) ρ³.
[Figure 7.8: a quarter-circle contour of radius ρ in the x-y plane, parameterized by the angle φ.]
In the example the contour is open, but closed contours which begin and
end at the same point are also possible, indeed common. The useful symbol ∮ indicates integration over a closed contour. It means that the contour ends
where it began: the loop is closed. The contour of Fig. 7.8 would be closed,
for instance, if it continued to r = 0 and then back to r = x̂ρ.
Besides applying where the variable of integration is a vector, contour
integration applies equally where the variable of integration is a complex
scalar. In the latter case some interesting mathematics emerge, as we shall
see in §§ 8.8 and 9.5.
7.7 Discontinuities
The polynomials and trigonometrics studied to this point in the book of-
fer flexible means to model many physical phenomena of interest, but one
thing they do not model gracefully is the simple discontinuity. Consider a
mechanical valve opened at time t = to . The flow x(t) past the valve is
   x(t) = 0,   t < to;
   x(t) = xo,  t > to.
[Figures: plots of the unit step u(t) and the Dirac delta δ(t).]
   δ(t) ≡ (d/dt) u(t),   (7.12)

plotted in Fig. 7.10. This function is zero everywhere except at t = 0, where it is infinite, with the property that

   ∫_{−∞}^{∞} δ(t) dt = 1,   (7.13)
for any function f(t). (Equation 7.14 is the sifting property of the Dirac delta.)16

The Dirac delta is defined for vectors, too, such that

   ∫_V δ(r) dr = 1.   (7.15)
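A rough numerical picture of the delta's sifting behavior, as a minimal Python sketch (the delta is modeled here as a narrow unit-area rectangle, an assumption of this sketch rather than anything from the text):

    import math

    # Model delta(t) as a narrow unit-area rectangle of width w centered on t = 0,
    # and integrate f(t) * delta_w(t); the result approaches f(0) as w shrinks.
    def sift(f, w, n=10000):
        dt = w / n
        return sum(f(-w / 2 + (k + 0.5) * dt) * (1.0 / w) * dt for k in range(n))

    f = math.cos
    for w in (1.0, 0.1, 0.01):
        print(w, sift(f, w), f(0.0))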
16
It seems inadvisable for the narrative to digress at this point to explore u(z) and δ(z),
the unit step and delta of a complex argument, although by means of Fourier analysis
(Ch. 18) or by conceiving the Dirac delta as an infinitely narrow Gaussian pulse ([chapter
or section not yet written]) it could perhaps do so. The book has more pressing topics to
treat. For the book’s present purpose the interesting action of the two functions is with
respect to the real argument t.
In the author’s country at least, a sort of debate seems to have run for decades between
professional and applied mathematicians over the Dirac delta δ(t). Some professional
mathematicians seem to have objected that δ(t) is not a function, inasmuch as it lacks
certain properties common to functions as they define them [48, § 2.4][17]. From the
applied point of view the objection is admittedly a little hard to understand, until one
realizes that it is more a dispute over methods and definitions than over facts. What the
professionals seem to be saying is that δ(t) does not fit as neatly as they would like into
the abstract mathematical framework they had established for functions in general before
Paul Dirac came along in 1930 [67, “Paul Dirac,” 05:48, 25 May 2006] and slapped his
disruptive δ(t) down on the table. The objection is not so much that δ(t) is not allowed
as it is that professional mathematics for years after 1930 lacked a fully coherent theory
for it.
It’s a little like the six-fingered man in Goldman’s The Princess Bride [26]. If I had
established a definition of “nobleman” which subsumed “human,” whose relevant traits
in my definition included five fingers on each hand, when the six-fingered Count Rugen
appeared on the scene, then you would expect me to adapt my definition, wouldn’t you?
By my preëxisting definition, strictly speaking, the six-fingered count is “not a nobleman”;
but such exclusion really tells one more about flaws in the definition than it does about
the count.
Whether the professional mathematician’s definition of the function is flawed, of course,
is not for this writer to judge. Even if not, however, the fact of the Dirac delta dispute,
coupled with the difficulty we applied mathematicians experience in trying to understand
the reason the dispute even exists, has unfortunately surrounded the Dirac delta with a
kind of mysterious aura, an elusive sense that δ(t) hides subtle mysteries—when what it
really hides is an internal discussion of words and means among the professionals. The
professionals who had established the theoretical framework before 1930 justifiably felt
reluctant to throw the whole framework away because some scientists and engineers like
us came along one day with a useful new function which didn’t quite fit, but that was
the professionals’ problem not ours. To us the Dirac delta δ(t) is just a function. The
internal discussion of words and means, we leave to the professionals, who know whereof
they speak.
7. *Evaluate the integral of the example of § 7.6 along the alternate contour suggested there, from x̂ρ to 0 to ŷρ.

8. Evaluate (a) ∫_0^x cos ωτ dτ; (b) ∫_0^x sin ωτ dτ; *(c)17 ∫_0^x τ sin ωτ dτ.

9. *Evaluate18 (a) ∫_1^x √(1 + 2τ) dτ; (b) ∫_x^a [(cos τ)/√τ] dτ.

10. *Evaluate19 (a) ∫_0^x [1/(1 + τ²)] dτ (answer: arctan x); (b) ∫_0^x [(4 + i3)/√(2 − 3τ²)] dτ (hint: the answer involves another inverse trigonometric).

11. **Evaluate (a) ∫_{−∞}^x exp[−τ²/2] dτ; (b) ∫_{−∞}^{∞} exp[−τ²/2] dτ.

17. [55, § 8-2]
18. [55, § 5-6]
19. [55, back endpaper]
Chapter 8

The Taylor series
The Taylor series is a power series which fits a function in a limited domain
neighborhood. Fitting a function in such a way brings two advantages:
This chapter introduces the Taylor series and some of its incidents. It also
derives Cauchy’s integral formula. The chapter’s early sections prepare the
ground for the treatment of the Taylor series proper in § 8.3.1
1
Because even at the applied level the proper derivation of the Taylor series involves
mathematical induction, analytic continuation and the matter of convergence domains,
no balance of rigor the chapter might strike seems wholly satisfactory. The chapter errs
maybe toward too much rigor; for, with a little less, most of §§ 8.1, 8.2, 8.4 and 8.6 would
cease to be necessary. For the impatient, to read only the following sections might not
be an unreasonable way to shorten the chapter: §§ 8.3, 8.5, 8.8, 8.9 and 8.11, plus the
introduction of § 8.1.
From another point of view, the chapter errs maybe toward too little rigor. Some pretty
constructs of pure mathematics serve the Taylor series and Cauchy’s integral formula.
However, such constructs drive the applied mathematician on too long a detour. The
chapter as written represents the most nearly satisfactory compromise the writer has been
able to attain.
i, j, k, m, n, K ∈ Z.
for |z| < 1. What about 1/(1 − z)², 1/(1 − z)³, 1/(1 − z)⁴, and so on? By the long-division procedure of Table 2.4, one can calculate the first few terms of 1/(1 − z)² to be

   1/(1 − z)² = 1/(1 − 2z + z²) = 1 + 2z + 3z² + 4z³ + · · ·
n + k ← m,
k ← j,
8.1.3 Convergence
The question remains as to the domain over which the sum (8.1) converges.2
To answer the question, consider that per (4.9),
   (m choose j) = [m/(m − j)] (m−1 choose j),   m > 0,

or more tersely,

   a_{nk} = [(n + k)/k] a_{n(k−1)},

where

   a_{nk} ≡ (n+k choose n)

are the coefficients of the power series (8.1). Rearranging factors,

   a_{nk}/a_{n(k−1)} = (n + k)/k = 1 + n/k.   (8.11)
2
The meaning of the verb to converge may seem clear enough from the context and
from earlier references, but if explanation here helps: a series converges if and only if it
approaches a specific, finite value after many terms. A more rigorous way of saying the
same thing is as follows: the series

   S = Σ_{k=0}^{∞} τ_k

converges iff (if and only if), for all possible positive constants ǫ, there exists a finite K ≥ −1 such that

   | Σ_{k=K+1}^{n} τ_k | < ǫ

for all n ≥ K (of course it is also required that the τ_k be finite, but you knew that already).
The professional mathematical literature calls such convergence “uniform convergence,”
distinguishing it through a test devised by Weierstrass from the weaker “pointwise con-
vergence” [1, § 1.5]. The applied mathematician can profit substantially by learning the
professional view in the matter, but the effect of trying to teach the professional view in a
book like this would not be pleasing. Here, we avoid error by keeping a clear view of the
physical phenomena the mathematics is meant to model.
It is interesting nevertheless to consider an example of an integral for which convergence
is not so simple, such as Frullani’s integral of § 9.7.
   [a_{nk} z^k] / [a_{n(k−1)} z^{k−1}] = (1 + n/k) z,

which is to say that the kth term of (8.1) is (1 + n/k)z times the (k − 1)th term. So long as the criterion3

   |(1 + n/k) z| ≤ 1 − δ

is satisfied for all sufficiently large k > K—where 0 < δ ≪ 1 is a small positive constant—then the series evidently converges (see § 2.6.4 and eqn. 3.22). But we can bind 1 + n/k as close to unity as desired by making K sufficiently large, so to meet the criterion it suffices that

   |z| < 1.   (8.12)

The bound (8.12) thus establishes a sure convergence domain for (8.1).
really face this question, you will understand the ideas behind
mathematical induction. It is only when you grasp the problem
clearly that the method becomes clear. [27, § 2.3]
Hamming also wrote,
The function of rigor is mainly critical and is seldom construc-
tive. Rigor is the hygiene of mathematics, which is needed to
protect us against careless thinking. [27, § 1.6]
The applied mathematician may tend to avoid rigor for which he finds no
immediate use, but he does not disdain mathematical rigor on principle.
The style lies in exercising rigor at the right level for the problem at hand.
Hamming, a professional mathematician who sympathized with the applied
mathematician’s needs, wrote further,
Ideally, when teaching a topic the degree of rigor should follow
the student’s perceived need for it. . . . It is necessary to require
a gradually rising level of rigor so that when faced with a real
need for it you are not left helpless. As a result, [one cannot
teach] a uniform level of rigor, but rather a gradually rising level.
Logically, this is indefensible, but psychologically there is little
else that can be done. [27, § 1.6]
Applied mathematics holds that the practice is defensible, on the ground
that the math serves the model; but Hamming nevertheless makes a perti-
nent point.
Mathematical induction is a broadly applicable technique for construct-
ing mathematical proofs. We shall not always write inductions out as ex-
plicitly in this book as we have done in the present section—often we shall
leave the induction as an implicit exercise for the interested reader—but this
section’s example at least lays out the general pattern of the technique.
Of the two subseries, the f−(z) is expanded term by term using (8.1), after which combining like powers of w yields the form

   f−(z) = Σ_{k=0}^{∞} q_k w^k,
   q_k ≡ Σ_{n=0}^{−(K+1)} (c_{[−(n+1)]}) (n+k choose n).   (8.17)

The f+(z) is even simpler to expand: one need only multiply the series out term by term per (4.12), combining like powers of w to reach the form

   f+(z) = Σ_{k=0}^{∞} p_k w^k,
   p_k ≡ Σ_{n=k}^{∞} (c_n) (n choose k).   (8.18)
with terms of nonnegative order k ≥ 0 only, how would you do it? The
procedure of § 8.1 worked well enough in the case of f (z) = 1/(1−z)n+1 , but
it is not immediately obvious that the same procedure works more generally.
What if f (z) = sin z, for example?5
Fortunately a different way to attack the power-series expansion problem
is known. It works by asking the question: what power series, having terms
of nonnegative order only, most resembles f (z) in the immediate neighbor-
hood of z = zo ? To resemble f (z), the desired power series should have
a0 = f (zo ); otherwise it would not have the right value at z = zo . Then it
should have a1 = f ′ (zo ) for the right slope. Then, a2 = f ′′ (zo )/2 for the
right second derivative, and so on. With this procedure,

   f(z) = Σ_{k=0}^{∞} ( d^k f/dz^k |_{z=zo} ) (z − zo)^k / k!.   (8.19)
Equation (8.19) is the Taylor series. Where it converges, it has all the same
derivatives f (z) has, so if f (z) is infinitely differentiable then the Taylor
series is an exact representation of the function.6
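A minimal Python sketch of (8.19) in action (the expansion of exp z about zo = 0 is chosen merely for illustration, anticipating the series of Table 8.1): since every derivative of exp at 0 equals 1, the partial sums Σ z^k/k! close in on exp z.

    import math

    # Taylor series of exp about zo = 0: every derivative there equals 1,
    # so the partial sums are sum over k of z**k / k!.
    def exp_taylor(z, terms):
        return sum(z ** k / math.factorial(k) for k in range(terms))

    z = 1.5
    for terms in (2, 4, 8, 16):
        print(terms, exp_taylor(z, terms), math.exp(z))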
5
The actual Taylor series for sin z is given in § 8.9.
6
Further proof details may be too tiresome to inflict on applied mathematical readers.
However, for readers who want a little more rigor nevertheless, the argument goes briefly
as follows. Consider an infinitely differentiable function F (z) and its Taylor series f (z)
about zo . Let ∆F (z) ≡ F (z) − f (z) be the part of F (z) not representable as a Taylor
series about zo .
If ∆F (z) is the part of F (z) not representable as a Taylor series, then ∆F (zo ) and
all its derivatives at zo must be identically zero (otherwise by the Taylor series formula
of eqn. 8.19, one could construct a nonzero Taylor series for ∆F [zo ] from the nonzero
derivatives). However, if F (z) is infinitely differentiable and if all the derivatives of ∆F (z)
are zero at z = zo , then by the unbalanced definition of the derivative from § 4.4, all the
derivatives must also be zero at z = zo ± ǫ, hence also at z = zo ± 2ǫ, and so on. This
means that ∆F (z) = 0. In other words, there is no part of F (z) not representable as a
Taylor series.
A more formal way to make the same argument would be to suppose that d^n ∆F/dz^n |_{z=zo+ǫ} = h for some integer n ≥ 0; whereas this would mean that d^{n+1} ∆F/dz^{n+1} |_{z=zo} = h/ǫ; but that, inasmuch as the latter is one of the derivatives of ∆F(z) at z = zo, it follows that h = 0.
The interested reader can fill the details in, but basically that is how the more rigorous
proof goes. (A more elegant rigorous proof, preferred by the professional mathemati-
However, neither functions like this f (z) nor more subtly unreasonable func-
tions normally arise in the modeling of physical phenomena. When such
functions do arise, one transforms, approximates, reduces, replaces and/or
avoids them. The full theory which classifies and encompasses—or explicitly
excludes—such functions is thus of limited interest to the applied mathe-
matician, and this book does not cover it.8
This does not mean that the scientist or engineer never encounters non-
analytic functions. On the contrary, he encounters several, but they are not
subtle: |z|; arg z; z ∗ ; ℜ(z); ℑ(z); u(t); δ(t). Refer to §§ 2.12 and 7.7. Such
functions are nonanalytic either because they lack proper derivatives in the
Argand plane according to (4.19) or because one has defined them only over
a real domain.
smoothly, that neither suddenly skip from one spot to another—then one
finds that f (z) ends in a different place than it began, even though z itself
has returned precisely to its own starting point. The range contour remains
open even though the domain contour is closed.
the strategy is to compose the contour to exclude the branch point, to shut
it out. Such a strategy of avoidance usually prospers.10
    tan z = sin z / cos z = −cos w / sin w,
    w ← z − (2n + 1)(2π/4),   n ∈ Z.
10
Traditionally associated with branch points in complex variable theory are the notions
of branch cuts and Riemann sheets. These ideas are interesting, but are not central to the
analysis as developed in this book and are not covered here. The interested reader might
consult a book on complex variables or advanced calculus like [31], among many others.
11 [66]
12 [40]
Section 8.9 and its Table 8.1, below, give Taylor series for cos z and sin z,
with which

    tan z = (−1 + w^2/2 − w^4/0x18 − ···) / (w − w^3/6 + w^5/0x78 − ···).

By long division,

    tan z = −1/w + (w/3 − w^3/0x1E + ···) / (1 − w^2/6 + w^4/0x78 − ···).
(On the other hand, if it is unclear that z = [2n + 1][2π/4] are the only
singularities tan z has—that it has no singularities of which ℑ[z] ≠ 0—then
consider that the singularities of tan z occur where cos z = 0, which by
Euler's formula, eqn. 5.17, occurs where exp[+iz] = exp[−iz]. This in turn
is possible only if |exp[+iz]| = |exp[−iz]|, which happens only for real z.)
Sections 8.14, 8.15 and 9.6 speak further of the matter.
and if
such that aK is the series’ first nonzero coefficient; then, in the immediate
neighborhood of the expansion point,
Evidently one can shift the output of an analytic function f (z) slightly in
any desired Argand direction by shifting slightly the function’s input z.
    S_n = ∫_{z_1}^{z_2} z^{n−1} dz,   n ∈ Z.        (8.21)
If z were always a real number, then by the antiderivative (§ 7.2) this inte-
gral would evaluate to (z2n − z1n )/n; or, in the case of n = 0, to ln(z2 /z1 ).
Inasmuch as z is complex, however, the correct evaluation is less obvious.
To evaluate the integral sensibly in the latter case, one must consider some
specific path of integration in the Argand plane. One must also consider the
meaning of the symbol dz.
13 Professional mathematicians tend to define the domain and its boundary more carefully.
14 [62][40]
    dz = [z + dz] − [z]
       = [ (ρ + dρ) e^{i(φ+dφ)} ] − [ ρ e^{iφ} ]
       = [ (ρ + dρ) e^{i dφ} e^{iφ} ] − [ ρ e^{iφ} ]
       = [ (ρ + dρ)(1 + i dφ) e^{iφ} ] − [ ρ e^{iφ} ].
Now consider the integration (8.21) along the contour of Fig. 8.1. Integrat-
15
The dropping of second-order infinitesimals like dρ dφ, added to first order infinites-
imals like dρ, is a standard calculus technique. One cannot always drop them, however.
Occasionally one encounters a sum in which not only do the finite terms cancel, but also
the first-order infinitesimals. In such a case, the second-order infinitesimals dominate and
cannot be dropped. An example of the type is
One typically notices that such a case has arisen when the dropping of second-order
infinitesimals has left an ambiguous 0/0. To fix the problem, you simply go back to the
step where you dropped the infinitesimal and you restore it, then you proceed from there.
Otherwise there isn’t much point in carrying second-order infinitesimals around. In the
relatively uncommon event that you need them, you’ll know it. The math itself will tell
you.
[Figure 8.1: a contour of integration in the Argand plane (x = ℜ(z), y = ℑ(z)), passing through the points z_a, z_b and z_c, with polar coordinates ρ and φ marked.]
    ∫_{z_b}^{z_c} z^{n−1} dz = ∫_{ρ_b}^{ρ_c} (ρ e^{iφ})^{n−1} (dρ + iρ dφ) e^{iφ}
                             = ∫_{ρ_b}^{ρ_c} (ρ e^{iφ})^{n−1} (dρ) e^{iφ}
                             = e^{inφ} ∫_{ρ_b}^{ρ_c} ρ^{n−1} dρ
                             = (e^{inφ}/n)(ρ_c^n − ρ_b^n)
                             = (z_c^n − z_b^n)/n.
surprisingly the same as for real z. Since any path of integration between
any two complex numbers z1 and z2 is approximated arbitrarily closely by a
succession of short constant-ρ and constant-φ segments, it follows generally
that

    ∫_{z_1}^{z_2} z^{n−1} dz = (z_2^n − z_1^n)/n,   n ∈ Z, n ≠ 0.        (8.23)
The applied mathematician might reasonably ask, “Was (8.23) really
worth the trouble? We knew that already. It’s the same as for real numbers.”
Well, we really didn’t know it before deriving it, but the point is well
taken nevertheless. However, notice the exemption of n = 0. Equation (8.23)
does not hold in that case. Consider the n = 0 integral
    S_0 = ∫_{z_1}^{z_2} dz/z.
Following the same steps as before and using (5.7) and (2.39), we find that

    ∫_{ρ_1}^{ρ_2} dz/z = ∫_{ρ_1}^{ρ_2} (dρ + iρ dφ) e^{iφ} / (ρ e^{iφ}) = ∫_{ρ_1}^{ρ_2} dρ/ρ = ln(ρ_2/ρ_1).        (8.24)
The odd thing about this is in what happens when the contour closes a
complete loop in the Argand plane about the z = 0 pole. In this case,
φ2 = φ1 + 2π, thus
S0 = i2π,
even though the integration ends where it begins.
Generalizing, we have that

    ∮ (z − z_o)^{n−1} dz = 0,   n ∈ Z, n ≠ 0;
    ∮ dz/(z − z_o) = i2π;        (8.26)

where as in § 7.6 the symbol ∮ represents integration about a closed contour
that ends where it begins, and where it is implied that the contour loops
positively (counterclockwise, in the direction of increasing φ) exactly once
about the z = zo pole.
Notice that the formula’s i2π does not depend on the precise path of
integration, but only on the fact that the path loops once positively about
the pole. Notice also that nothing in the derivation of (8.23) actually requires
that n be an integer, so one can write
    ∫_{z_1}^{z_2} z^{a−1} dz = (z_2^a − z_1^a)/a,   a ≠ 0.        (8.27)
However, (8.26) does not hold in the latter case; its integral comes to zero
for nonintegral a only if the contour does not enclose the branch point at
z = zo .
For a closed contour which encloses no pole or other nonanalytic point,
(8.27) has that ∮ z^{a−1} dz = 0, or with the change of variable z − z_o ← z,

    ∮ (z − z_o)^{a−1} dz = 0.
But because any analytic function can be expanded in the form
f(z) = Σ_k (c_k)(z − z_o)^{a_k − 1} (which is just a Taylor series if the a_k happen to be
positive integers), this means that

    ∮ f(z) dz = 0        (8.28)
    ∮ f(z)/(z − z_o) dz = i2π f(z_o).        (8.29)
But if the contour were not an infinitesimal circle but rather the larger
contour of Fig. 8.2? In this case, if the dashed detour which excludes the
pole is taken, then according to (8.28) the resulting integral totals zero;
but the two straight integral segments evidently cancel; and similarly as
we have just reasoned, the reverse-directed integral about the tiny detour
circle is −i2πf (zo ); so to bring the total integral to zero the integral about
the main contour must be i2πf (zo ). Thus, (8.29) holds for any positively-
directed contour which once encloses a pole and no other nonanalytic point,
whether the contour be small or large. Equation (8.29) is Cauchy’s integral
formula.
If the contour encloses multiple poles (§§ 2.11 and 9.6.2), then by the
principle of linear superposition (§ 7.3.3),

    ∮ [ f_o(z) + Σ_k f_k(z)/(z − z_k) ] dz = i2π Σ_k f_k(z_k),        (8.30)
where the fo (z) is a regular part;17 and again, where neither fo (z) nor any
of the several fk (z) has a pole or other nonanalytic point within (or on) the
an f (z) represented by a Taylor series with an infinite number of terms and a finite
convergence domain (for example, f [z] = ln[1 − z]). However, by § 8.2 one can transpose
such a series from zo to an overlapping convergence domain about z1 . Let the contour’s
interior be divided into several cells, each of which is small enough to enjoy a single
convergence domain. Integrate about each cell. Because the cells share boundaries within
the contour’s interior, each interior boundary is integrated twice, once in each direction,
canceling. The original contour—each piece of which is an exterior boundary of some
cell—is integrated once piecewise. This is the basis on which a more rigorous proof is
constructed.
17 [43, § 1.1]
[Figure 8.2: a contour in the Argand plane (axes ℜ(z) and ℑ(z)) enclosing the pole at z = z_o.]
contour. The values fk (zk ), which represent the strengths of the poles, are
called residues. In words, (8.30) says that an integral about a closed contour
in the Argand plane comes to i2π times the sum of the residues of the poles
(if any) thus enclosed. (Note however that eqn. 8.30 does not handle branch
points. If there is a branch point, the contour must exclude it or the formula
will not work.)
As we shall see in § 9.5, whether in the form of (8.29) or of (8.30) Cauchy’s
integral formula is an extremely useful result.18
    S = ∮ f(z)/(z − z_o)^{m+1} dz,   m ∈ Z, m ≥ 0,
18 [31, § 10.6][57][67, “Cauchy’s integral formula,” 14:13, 20 April 2006]
    S = ∮ [ d^m f/dz^m |_{z=z_o} ] dz / [ (m!)(z − z_o) ]
      = (1/m!) [ d^m f/dz^m |_{z=z_o} ] ∮ dz/(z − z_o)
      = (i2π/m!) [ d^m f/dz^m |_{z=z_o} ],
where the integral is evaluated in the last step according to (8.29). Alto-
gether,
    ∮ f(z)/(z − z_o)^{m+1} dz = (i2π/m!) [ d^m f/dz^m |_{z=z_o} ],   m ∈ Z, m ≥ 0.        (8.31)
With the general Taylor series formula (8.19), the derivatives of Tables 5.2
and 5.3, and the observation from (4.21) that
    d(z^a)/dz = a z^{a−1},
19 [40][57]
one can calculate Taylor series for many functions. For instance, expanding
about z = 1,

    ln z |_{z=1} = 0,
    (d/dz) ln z |_{z=1} = (1/z)|_{z=1} = 1,
    (d^2/dz^2) ln z |_{z=1} = (−1/z^2)|_{z=1} = −1,
    (d^3/dz^3) ln z |_{z=1} = (2/z^3)|_{z=1} = 2,
    ...
    (d^k/dz^k) ln z |_{z=1} = [ −(−)^k (k − 1)!/z^k ]|_{z=1} = −(−)^k (k − 1)!,   k > 0.
20 [55, § 11-7]
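To see the pattern of derivatives put to work, here is a minimal numerical sketch (Python; the 30-term truncation and the sample point are this illustration's own choices, not the book's):

    from math import log, factorial

    def ln_taylor(z, n=30):
        """Sum the Taylor series of ln z about z = 1, using the
        derivatives d^k/dz^k ln z |_{z=1} = -(-1)^k (k-1)! found above."""
        total = 0.0
        for k in range(1, n + 1):
            deriv = -(-1)**k * factorial(k - 1)        # k-th derivative at z = 1
            total += deriv * (z - 1)**k / factorial(k)
        return total

    print(ln_taylor(1.5), log(1.5))   # agree to many digits for |z - 1| < 1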
Table 8.1: Taylor series.

    f(z) = Σ_{k=0}^∞ [ d^k f/dz^k |_{z=z_o} ] Π_{j=1}^k (z − z_o)/j

    (1 + z)^{a−1} = Σ_{k=0}^∞ Π_{j=1}^k (a/j − 1) z

    exp z = Σ_{k=0}^∞ Π_{j=1}^k z/j = Σ_{k=0}^∞ z^k/k!

    sin z = z Σ_{k=0}^∞ Π_{j=1}^k [ −z^2 / ((2j)(2j + 1)) ]

    cos z = Σ_{k=0}^∞ Π_{j=1}^k [ −z^2 / ((2j − 1)(2j)) ]

    sinh z = z Σ_{k=0}^∞ Π_{j=1}^k [ z^2 / ((2j)(2j + 1)) ]

    cosh z = Σ_{k=0}^∞ Π_{j=1}^k [ z^2 / ((2j − 1)(2j)) ]

    −ln(1 − z) = Σ_{k=1}^∞ (1/k) Π_{j=1}^k z = Σ_{k=1}^∞ z^k/k

    arctan z = Σ_{k=0}^∞ [ 1/(2k + 1) ] z Π_{j=1}^k (−z^2) = Σ_{k=0}^∞ (−)^k z^{2k+1}/(2k + 1)
among others.
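A quick numerical check of the table's product forms (Python; the term counts and the sample argument are arbitrary choices of this sketch):

    from math import exp, sin

    def series_exp(z, terms=30):
        # exp z = sum_k prod_{j=1..k} (z/j)
        total, prod = 0.0, 1.0
        for k in range(terms):
            total += prod
            prod *= z / (k + 1)
        return total

    def series_sin(z, terms=30):
        # sin z = z * sum_k prod_{j=1..k} [-z^2 / ((2j)(2j+1))]
        total, prod = 0.0, 1.0
        for k in range(terms):
            total += prod
            prod *= -z * z / ((2 * (k + 1)) * (2 * (k + 1) + 1))
        return z * total

    print(series_exp(1.3), exp(1.3))
    print(series_sin(1.3), sin(1.3))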
8.10.1 Examples
Some series alternate sign. For these it is easy if the numbers involved
happen to be real. For example, from Table 8.1,

    ln(3/2) = ln(1 + 1/2) = 1/[(1)(2^1)] − 1/[(2)(2^2)] + 1/[(3)(2^3)] − 1/[(4)(2^4)] + ···

Each term is smaller in magnitude than the last, so the true value of ln(3/2)
necessarily lies between the sum of the series to n terms and the sum to
n + 1 terms. The last and next partial sums bound the result. Up to but
not including the fourth-order term, for instance,

    S_4 − 1/[(4)(2^4)] < ln(3/2) < S_4,
    S_4 = 1/[(1)(2^1)] − 1/[(2)(2^2)] + 1/[(3)(2^3)].
    R_5 = 1/[(5)(2^5)] + 1/[(6)(2^6)] + ···
The basic technique in such a case is to find a replacement series (or inte-
gral) Rn′ which one can collapse analytically, each of whose terms equals or
exceeds in magnitude the corresponding term of Rn . For the example, one
might choose
    R_5′ = (1/5) Σ_{k=5}^∞ 1/2^k = 2/[(5)(2^5)],

wherein (2.34) had been used to collapse the summation. Then,

    S_5 < ln 2 < S_5 + R_5′.
For real 0 ≤ x < 1 generally,

    S_n < −ln(1 − x) < S_n + R_n′,
    S_n ≡ Σ_{k=1}^{n−1} x^k/k,
    R_n′ ≡ Σ_{k=n}^∞ x^k/n = x^n/[(n)(1 − x)].
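In code, the bound brackets the unknown sum like this (a minimal sketch in Python; the sample x and n are arbitrary):

    from math import log

    def bounded_minus_ln_one_minus(x, n=10):
        """Return (Sn, Sn + Rn') bracketing -ln(1-x) for real 0 <= x < 1."""
        Sn = sum(x**k / k for k in range(1, n))
        Rn = x**n / (n * (1 - x))
        return Sn, Sn + Rn

    lo, hi = bounded_minus_ln_one_minus(0.5, n=10)
    print(lo, -log(1 - 0.5), hi)    # the true value falls between the bounds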
8.10.2 Majorization
To majorize in mathematics is to be, or to replace by virtue of being, ev-
erywhere at least as great as. This is best explained by example. Consider
the summation

    S = Σ_{k=1}^∞ 1/k^2 = 1 + 1/2^2 + 1/3^2 + 1/4^2 + ···

The exact value this summation totals to is unknown to us, but the sum-
mation does rather resemble the integral (refer to Table 7.1)

    I = ∫_1^∞ dx/x^2 = −1/x |_1^∞ = 1.
Figure 8.3: Majorization. The area I between the dashed curve and the x
axis majorizes the area S − 1 between the stairstep curve and the x axis,
because the height of the dashed curve is everywhere at least as great as
that of the stairstep curve.
[Plot: a stairstep curve of heights 1/2^2, 1/3^2, 1/4^2, … beneath the dashed curve y = 1/x^2, for x ≥ 1.]
are often integrals and/or series summations, the two of which are akin as
Fig. 8.3 illustrates. The choice of whether to majorize a particular unknown
quantity by an integral or by a series summation depends on the convenience
of the problem at hand.
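For instance, one can bracket the sum S above numerically by majorizing its tail with the integral (a Python sketch; the choice n = 100 is arbitrary):

    def zeta2_bounds(n=100):
        """Bracket S = sum 1/k^2 by majorizing the tail k > n with the
        integral of 1/x^2 from n to infinity, which equals 1/n."""
        partial = sum(1.0 / k**2 for k in range(1, n + 1))
        return partial, partial + 1.0 / n

    lo, hi = zeta2_bounds(100)
    print(lo, hi)    # the unknown sum lies between these two numbers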
The series S of this subsection is interesting, incidentally. It is a har-
monic series rather than a power series, because although its terms do
decrease in magnitude it has no z k factor (or seen from another point of
view, it does have a z k factor, but z = 1), and the ratio of adjacent terms’
magnitudes approaches unity as k grows. Harmonic series can be hard to
sum accurately, but clever majorization can help (and if majorization does
not help enough, the series transformations of Ch. [not yet written] can help
even more).
Harmonic series can be hard to sum as § 8.10.2 has observed, but more
common than harmonic series are true power series, easier to sum in that
they include a z k factor in each term. There is no one, ideal bound that
works equally well for all power series. However, the point of establishing
a bound is not to sum a power series exactly but rather to fence the sum
within some sufficiently (rather than optimally) small neighborhood. A
simple, general bound which works quite adequately for most power series
encountered in practice, including among many others all the Taylor series
of Table 8.1, is the geometric majorization
    |ε_n| < |τ_n| / (1 − |ρ_n|).        (8.33)
Here, τ_n represents the power series' nth-order term (in Table 8.1's series for
exp z, for example, τ_n = z^n/[n!]). The |ρ_n| is a positive real number chosen,
preferably as small as possible, such that

    | τ_{k+1}/τ_k | ≤ |ρ_n|   for all k ≥ n,        (8.34)
    | τ_{k+1}/τ_k | < |ρ_n|   for at least one k ≥ n,
n−1
X
Sn ≡ τk ,
k=0
(8.36)
ǫn ≡ S ∞ − S n ,
where ǫn is the error in the truncated sum.23 Here, |τk+1 /τk | = [k/(k +
1)] |z| < |z| for all k ≥ n > 0, so we have chosen |ρn | = |z|.
Second, if the Taylor series

    exp z = Σ_{k=0}^∞ Π_{j=1}^k z/j = Σ_{k=0}^∞ z^k/k!

also of Table 8.1 is truncated before the nth-order term, and if we choose to
stipulate that

    n + 1 > |z|,
22 Some scientists and engineers—as, for example, the authors of [47] and even this
writer in earlier years—prefer to define ε_n ≡ S_n − S_∞, oppositely as we define it here.
This choice is a matter of taste. Professional mathematicians—as, for example, the author
of [63]—seem to tend toward the ε_n ≡ S_∞ − S_n of (8.36).
23 This particular error bound fails for n = 0, but that is no flaw. There is no reason
to use the error bound for n = 0 when, merely by taking one or two more terms into the
truncated sum, one can quite conveniently let n = 1 or n = 2.
then

    exp z ≈ Σ_{k=0}^{n−1} Π_{j=1}^k z/j = Σ_{k=0}^{n−1} z^k/k!,
    |ε_n| < (|z|^n/n!) / (1 − |z|/(n + 1)).
Here, |τk+1 /τk | = |z| /(k + 1), whose maximum value for all k ≥ n occurs
when k = n, so we have chosen |ρn | = |z| /(n + 1).
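Numerically, the geometric majorization behaves like this for the truncated exp z series (Python sketch; the sample z and n are arbitrary, chosen so that n + 1 > |z|):

    from math import exp, factorial

    def exp_with_bound(z, n):
        """Partial sum of exp z through order n-1, plus the geometric
        error bound |eps_n| < (|z|^n / n!) / (1 - |z|/(n+1))."""
        Sn = sum(z**k / factorial(k) for k in range(n))
        bound = (abs(z)**n / factorial(n)) / (1 - abs(z) / (n + 1))
        return Sn, bound

    Sn, bound = exp_with_bound(1.5, 8)
    print(abs(exp(1.5) - Sn), "<", bound)   # the actual error respects the bound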
    −ln γ = ln(1/γ),
    γ^{a−1} = 1/(1/γ)^{a−1},

which leave the respective Taylor series to compute quantities like −ln(1/3)
and (1/3)^{a−1} they can handle.
Let f (1 + ζ) be a function whose Taylor series about ζ = 0 converges for
|ζ| < 1 and which obeys properties of the forms24
    f(γ) = g[ f(1/γ) ],
    f(αγ) = h[ f(α), f(γ) ],        (8.37)
where g[·] and h[·, ·] are functions we know how to compute like g[·] = −[·]
24 This paragraph's notation is necessarily abstract. To make it seem more concrete,
consider that the function f(1 + ζ) = −ln(1 − z) has ζ = −z, f(γ) = g[f(1/γ)] = −f(1/γ)
and f(αγ) = h[f(α), f(γ)] = f(α) + f(γ); and that the function f(1 + ζ) = (1 + z)^{a−1} has
ζ = z, f(γ) = g[f(1/γ)] = 1/f(1/γ) and f(αγ) = h[f(α), f(γ)] = f(α)f(γ).
or g[·] = 1/[·]; and like h[·, ·] = [·] + [·] or h[·, ·] = [·][·]. Identifying

    1/γ = 1 + ζ,
    γ = 1/(1 + ζ),        (8.38)
    (1 − γ)/γ = ζ,

we have that

    f(γ) = g[ f(1 + (1 − γ)/γ) ],        (8.39)

whose convergence domain |ζ| < 1 is |1 − γ|/|γ| < 1, which is |γ − 1| < |γ|
or in other words

    ℜ(γ) > 1/2.
Although the transformation from ζ to γ has not lifted the convergence
limit altogether, we see that it has apparently opened the limit to a broader
domain.
Though this writer knows no way to lift the convergence limit altogether
that does not cause more problems than it solves, one can take advantage of
the h[·, ·] property of (8.37) to sidestep the limit, computing f(ω) indirectly
for any ω ≠ 0 by any of several tactics. One nonoptimal but entirely effective
tactic is represented by the equations

    ω ≡ i^n 2^m γ,
    |ℑ(γ)| ≤ ℜ(γ),        (8.40)
    1 ≤ ℜ(γ) < 2,
    m, n ∈ Z,

whereupon the property (8.37) supplies

    f(ω) = h[ f(i^n 2^m), f(γ) ],        (8.41)

which calculates f(ω) fast for any ω ≠ 0—provided only that we have other means
to compute f(i^n 2^m), which not infrequently we do.25
25 Equation (8.41) admittedly leaves open the question of how to compute f(i^n 2^m),
but at least for the functions this subsection has used as examples this is not hard.
For the logarithm, −ln(i^n 2^m) = m ln(1/2) − in(2π/4). For the power, (i^n 2^m)^{a−1} =
cis[(n2π/4)(a − 1)]/[(1/2)^{a−1}]^m. The sine and cosine in the cis function are each calcu-
lated directly by Taylor series, as are the numbers ln(1/2) and (1/2)^{a−1}. The number 2π,
we have not calculated yet, but shall in § 8.11.
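As a concrete illustration of the tactic of (8.40) and (8.41), the following sketch computes ln ω for an arbitrary nonzero ω by factoring out i^n 2^m and summing the −ln(1 − z) series of Table 8.1 for what remains (Python; the particular reduction steps and the term count are this sketch's own choices, not prescribed by the text):

    import cmath, math

    def ln_via_reduction(w, terms=40):
        """ln w for w != 0, by factoring w = i^n * 2^m * gamma with
        1 <= Re(gamma) < 2 and |Im(gamma)| <= Re(gamma), then summing
        ln gamma = -sum_k (1 - 1/gamma)^k / k, convergent since Re(gamma) > 1/2."""
        # rotate by a power of i so the argument lies within 2pi/8 of the real axis
        n = round(cmath.phase(w) / (math.pi / 2))
        g = w / (1j ** n)
        # scale by a power of 2 so that 1 <= Re(gamma) < 2
        m = math.floor(math.log2(g.real))
        g /= 2.0 ** m
        zeta = 1 - 1 / g
        ln_g = -sum(zeta**k / k for k in range(1, terms + 1))
        return m * math.log(2) + 1j * n * (2 * math.pi / 4) + ln_g

    w = -3 + 4j
    print(ln_via_reduction(w))
    print(cmath.log(w))        # the two agree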
    Σ_{k=1}^∞ 1/k

    1/k > ∫_k^{k+1} dτ/τ;

hence,

    Σ_{k=1}^∞ 1/k > Σ_{k=1}^∞ ∫_k^{k+1} dτ/τ = ∫_1^∞ dτ/τ = ln ∞.
26 To draw another example from Table 8.1, consider that

    arctan ω = α + arctan ζ,
    ζ ≡ (ω cos α − sin α)/(ω sin α + cos α),
where arctan ω is interpreted as the geometrical angle the vector x̂ + ŷω makes with x̂.
Axes are rotated per (3.7) through some angle α to reduce the tangent from ω to ζ, thus
causing the Taylor series to converge faster or indeed to converge at all.
Any number of further examples and tactics of the kind will occur to the creative reader,
shrinking a function’s argument by some convenient means before feeding the argument
to the function’s Taylor series.
8.10.6 Remarks
The study of error bounds is not a matter of rules and formulas so much as of
ideas, suggestions and tactics. There is normally no such thing as an optimal
error bound—with sufficient cleverness, some tighter bound can usually be
discovered—but often easier and more effective than cleverness is simply
to add a few extra terms into the series before truncating it (that is, to
increase n a little). To eliminate the error entirely usually demands adding
an infinite number of terms, which is impossible; but since eliminating the
error entirely also requires recording the sum to infinite precision, which
is impossible anyway, eliminating the error entirely is not normally a goal
one seeks. To eliminate the error to the 0x34-bit (sixteen-decimal place)
precision of a computer’s double-type floating-point representation typically
requires something like 0x34 terms—if the series be wisely composed and if
care be taken to keep z moderately small and reasonably distant from the
edge of the series’ convergence domain. Besides, few engineering applications
really use much more than 0x10 bits (five decimal places) in any case. Perfect
precision is impossible, but adequate precision is usually not hard to achieve.
Occasionally nonetheless a series arises for which even adequate precision
is quite hard to achieve. An infamous example is
    S = −Σ_{k=1}^∞ (−)^k/√k = 1 − 1/√2 + 1/√3 − 1/√4 + ···,
which obviously converges, but sum it if you can! It is not easy to do.
Before closing the section, we ought to arrest one potential agent of
terminological confusion. The “error” in a series summation’s error bounds
is unrelated to the error of probability theory. The English word “error” is
thus overloaded here. A series sum converges to a definite value, and to the
same value every time the series is summed; no chance is involved. It is just
that we do not necessarily know exactly what that value is. What we can
do, by this section’s techniques or perhaps by other methods, is to establish
a definite neighborhood in which the unknown value is sure to lie; and we
can make that neighborhood as tight as we want, merely by including a
sufficient number of terms in the sum.
The topic of series error bounds is what G.S. Brown refers to as “trick-
based.”27 There is no general answer to the error-bound problem, but there
are several techniques which help, some of which this section has introduced.
Other techniques, we shall meet later in the book as the need for them arises.
27 [10]
8.11 Calculating 2π
The Taylor series for arctan z in Table 8.1 implies a neat way of calculating
the constant 2π. We already know that tan 2π/8 = 1, or in other words that
    arctan 1 = 2π/8.
Applying the Taylor series, we have that

    2π = 8 Σ_{k=0}^∞ (−)^k/(2k + 1).        (8.42)
The series (8.42) is simple but converges extremely slowly. Much faster con-
vergence is given by angles smaller than 2π/8. For example, from Table 3.2,
    arctan[ (√3 − 1)/(√3 + 1) ] = 2π/0x18.
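A short numerical comparison of the two series (Python; the term counts are arbitrary choices of this sketch):

    def two_pi_slow(terms=100000):
        # 2*pi = 8 * sum_k (-1)^k / (2k+1): converges very slowly
        return 8 * sum((-1)**k / (2 * k + 1) for k in range(terms))

    def two_pi_fast(terms=30):
        # 2*pi = 0x18 * arctan((sqrt(3)-1)/(sqrt(3)+1)), arctan by its Taylor series
        x = (3**0.5 - 1) / (3**0.5 + 1)
        arctan = sum((-1)**k * x**(2 * k + 1) / (2 * k + 1) for k in range(terms))
        return 0x18 * arctan

    print(two_pi_slow())   # roughly 6.2832 even after 100000 terms
    print(two_pi_fast())   # full double precision after about 30 terms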
functions are neither odd nor even, of course, but one can always split an
analytic function into two components—one odd, the other even—by the
simple expedient of sorting the odd-order terms from the even-order terms
in the function’s Taylor series. For example, exp z = sinh z + cosh z.
    1/sin z,   1/cos z,   1/tan z,   tan z,
    1/sinh z,   1/cosh z,   1/tanh z,   tanh z,
the poles and their respective residues are

    (z − kπ)/sin z |_{z=kπ} = (−)^k,
    [z − (k − 1/2)π]/cos z |_{z=(k−1/2)π} = (−)^k,
    (z − kπ)/tan z |_{z=kπ} = 1,
    [z − (k − 1/2)π] tan z |_{z=(k−1/2)π} = −1,
    (z − ikπ)/sinh z |_{z=ikπ} = (−)^k,
    [z − i(k − 1/2)π]/cosh z |_{z=i(k−1/2)π} = i(−)^k,
    (z − ikπ)/tanh z |_{z=ikπ} = 1,
    [z − i(k − 1/2)π] tanh z |_{z=i(k−1/2)π} = 1,
    k ∈ Z.        (8.44)
To support (8.44)’s claims, we shall marshal the identities of Tables 5.1
and 5.2 plus l’Hôpital’s rule (4.30). Before calculating residues and such,
however, we should like to verify that the poles (8.44) lists are in fact the
only poles that there are; that we have forgotten no poles. Consider for
instance the function 1/ sin z = i2/(eiz − e−iz ). This function evidently goes
infinite only when eiz = e−iz , which is possible only for real z; but for real z,
the sine function’s very definition establishes the poles z = kπ (refer to
Fig. 3.1). With the observations from Table 5.1 that i sinh z = sin iz and
cosh z = cos iz, similar reasoning for each of the eight trigonometrics forbids
poles other than those (8.44) lists. Satisfied that we have forgotten no poles,
therefore, we finally apply l’Hôpital’s rule to each of the ratios
    (z − kπ)/sin z,   [z − (k − 1/2)π]/cos z,   (z − kπ)/tan z,   [z − (k − 1/2)π]/(1/tan z),
    (z − ikπ)/sinh z,   [z − i(k − 1/2)π]/cosh z,   (z − ikπ)/tanh z,   [z − i(k − 1/2)π]/(1/tanh z)
to reach (8.44).
Trigonometric poles evidently are special only in that a trigonometric
function has an infinite number of them. The poles are ordinary, single
poles, with residues, subject to Cauchy’s integral formula and so on. The
trigonometrics are meromorphic functions (§ 8.6) for this reason.29
The six simpler trigonometrics sin z, cos z, sinh z, cosh z, exp z and
cis z—conspicuously excluded from this section’s gang of eight—have no
poles for finite z, because ez , eiz , ez ± e−z and eiz ± e−iz then likewise are
finite. These simpler trigonometrics are not only meromorphic but also en-
tire. Observe however that the inverse trigonometrics are multiple-valued
and have branch points, and thus are not meromorphic at all.
Any analytic function can be expanded in a Taylor series, but never about
a pole or branch point of the function. Sometimes one nevertheless wants
to expand at least about a pole. Consider for example expanding
    f(z) = e^{−z}/(1 − cos z)        (8.45)
29 [40]
    f(z) = (1 − z + z^2/2 − z^3/6 + ···) / (z^2/2 − z^4/0x18 + ···)
         = [ Σ_{j=0}^∞ (−)^j z^j/j! ] / [ −Σ_{k=1}^∞ (−)^k z^{2k}/(2k)! ]
         = [ Σ_{k=0}^∞ ( −z^{2k}/(2k)! + z^{2k+1}/(2k+1)! ) ] / [ Σ_{k=1}^∞ (−)^k z^{2k}/(2k)! ].
By long division,

    f(z) = 2/z^2 − 2/z + { (2/z − 2/z^2) Σ_{k=1}^∞ (−)^k z^{2k}/(2k)!
                           + Σ_{k=0}^∞ [ −z^{2k}/(2k)! + z^{2k+1}/(2k+1)! ] } / Σ_{k=1}^∞ (−)^k z^{2k}/(2k)!
         = 2/z^2 − 2/z + { Σ_{k=1}^∞ [ −(−)^k 2z^{2k−2}/(2k)! + (−)^k 2z^{2k−1}/(2k)! ]
                           + Σ_{k=0}^∞ [ −z^{2k}/(2k)! + z^{2k+1}/(2k+1)! ] } / Σ_{k=1}^∞ (−)^k z^{2k}/(2k)!
         = 2/z^2 − 2/z + { Σ_{k=0}^∞ [ (−)^k 2z^{2k}/(2k+2)! − (−)^k 2z^{2k+1}/(2k+2)! ]
                           + Σ_{k=0}^∞ [ −z^{2k}/(2k)! + z^{2k+1}/(2k+1)! ] } / Σ_{k=1}^∞ (−)^k z^{2k}/(2k)!
         = 2/z^2 − 2/z + Σ_{k=0}^∞ { [ −(2k+1)(2k+2) + (−)^k 2 ] z^{2k}/(2k+2)!
                           + [ (2k+2) − (−)^k 2 ] z^{2k+1}/(2k+2)! } / Σ_{k=1}^∞ (−)^k z^{2k}/(2k)!.
One can continue dividing to extract further terms if desired, and if all the
terms

    f(z) = 2/z^2 − 2/z + 7/6 − z/2 + ···
are extracted the result is the Laurent series proper,

    f(z) = Σ_{k=K}^∞ (a_k)(z − z_o)^k,   (k, K) ∈ Z, K ≤ 0.        (8.47)
However for many purposes (as in eqn. 8.46) the partial Laurent series

    f(z) = Σ_{k=K}^{−1} (a_k)(z − z_o)^k + [ Σ_{k=0}^∞ (b_k)(z − z_o)^k ] / [ Σ_{k=0}^∞ (c_k)(z − z_o)^k ],        (8.48)
    (k, K) ∈ Z, K ≤ 0, c_0 ≠ 0,
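One can check the first few extracted terms numerically (a Python sketch; the sample points are arbitrary):

    from math import exp, cos

    def f_exact(z):
        return exp(-z) / (1 - cos(z))

    def f_laurent(z):
        # the first few terms extracted by the long division above
        return 2 / z**2 - 2 / z + 7.0 / 6 - z / 2

    for z in (0.5, 0.1, 0.01):
        print(z, f_exact(z), f_laurent(z))   # agreement improves as z -> 0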
That expansion is good only for |z| < 2, but for |z| > 2 we also have that

    g(z) = (1/z^2)/(1 − 2/z)^2 = (1/z^2) Σ_{k=0}^∞ \binom{1+k}{1} (2/z)^k = Σ_{k=2}^∞ 2^{k−2}(k − 1)/z^k,
if N = 2 then
k = (k1 , k2 )
= (0, 0), (0, 1), (0, 2), (0, 3), . . . ;
(1, 0), (1, 1), (1, 2), (1, 3), . . . ;
(2, 0), (2, 1), (2, 2), (2, 3), . . . ;
(3, 0), (3, 1), (3, 2), (3, 3), . . . ;
...
• The (z − z_o)^k represents

    (z − z_o)^k ≡ Π_{n=1}^N (z_n − z_{on})^{k_n}.

• The k! represents

    k! ≡ Π_{n=1}^N k_n!.
With these definitions, the multidimensional Taylor series (8.50) yields all
the right derivatives and cross-derivatives at the expansion point z = zo .
Thus within some convergence domain about z = zo , the multidimensional
Taylor series (8.50) represents a function f (z) as accurately as the simple
Taylor series (8.19) represents a function f (z), and for the same reason.
Chapter 9
Integration techniques
For instance,

    ∫_1^x (1/τ) dτ = ln τ |_1^x = ln x.

One merely looks at the integrand 1/τ, recognizing it to be the derivative
of ln τ, then directly writes down the solution ln τ |_1^x. Refer to § 7.2.
1 The notation f(τ)|_a^z or [f(τ)]_a^z means f(z) − f(a).
    τ^{a−1} = (d/dτ)(τ^a/a),        (9.2)

Tables 7.1, 5.2, 5.3 and 9.1 provide several further good derivatives this
antiderivative technique can use.
One particular, nonobvious, useful variation on the antiderivative tech-
nique seems worth calling out specially here. If z = ρe^{iφ}, then (8.24)
and (8.25) have that

    ∫_{z_1}^{z_2} dz/z = ln(ρ_2/ρ_1) + i(φ_2 − φ_1).        (9.3)
This helps, for example, when z1 and z2 are real but negative numbers.
u ← 1 + x2 ,
d(u) = d(1 + x2 ),
du = 2x dx,
the integral is

    S = ∫_{x=x_1}^{x_2} x dx/u
      = ∫_{x=x_1}^{x_2} (1/2u) 2x dx
      = ∫_{u=1+x_1^2}^{1+x_2^2} du/(2u)
      = (1/2) ln u |_{u=1+x_1^2}^{1+x_2^2}
      = (1/2) ln[ (1 + x_2^2)/(1 + x_1^2) ].
To check the result, we can take the derivative per § 7.5 of the final expression
with respect to x_2:

    ∂/∂x_2 [ (1/2) ln( (1 + x_2^2)/(1 + x_1^2) ) ] |_{x_2=x}
      = (1/2) ∂/∂x_2 [ ln(1 + x_2^2) − ln(1 + x_1^2) ] |_{x_2=x}
      = x/(1 + x^2),
The technique is integration by substitution. It does not solve all inte-
grals but it does solve many, whether alone or in combination with other
techniques.
d(uv) = u dv + v du,
Unsure how to integrate this, we can begin by integrating part of it. We can
begin by integrating the cos ατ dτ part. Letting
u ← τ,
dv ← cos ατ dτ,
we find that2

    du = dτ,
    v = (sin ατ)/α.
    S(x) = [ τ sin ατ/α ]_0^x − ∫_0^x (sin ατ/α) dτ = (x/α) sin αx + (1/α^2)(cos αx − 1).
2
The careful reader will observe that v = (sin ατ )/α + C matches the chosen dv for
any value of C, not just for C = 0. This is true. However, nothing in the integration by
parts technique requires us to consider all possible v. Any convenient v suffices. In this
case, we choose v = (sin ατ )/α.
3 [43]
Letting

    u ← e^{−τ},
    dv ← τ^{z−1} dτ,
    du = −e^{−τ} dτ,
    v = τ^z/z.
Substituting these according to (9.4) into (9.5) yields

    Γ(z) = [ e^{−τ} τ^z/z ]_{τ=0}^∞ − ∫_0^∞ (τ^z/z)(−e^{−τ}) dτ
         = [0 − 0] + ∫_0^∞ (τ^z/z) e^{−τ} dτ
         = Γ(z + 1)/z.
When written
Γ(z + 1) = zΓ(z), (9.6)
this is an interesting result. Since per (9.5)

    Γ(1) = ∫_0^∞ e^{−τ} dτ = [ −e^{−τ} ]_0^∞ = 1,
Thus (9.5), called the gamma function, can be taken as an extended defi-
nition of the factorial (z − 1)! for all z, ℜ(z) > 0. Integration by parts has
made this finding possible.
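The recursion (9.6) is easy to confirm numerically. A crude sketch (Python; the truncation of the infinite integral and the step count are this illustration's own choices):

    from math import exp

    def gamma(z, upper=50.0, steps=200000):
        """Midpoint-rule evaluation of (9.5), Gamma(z) = int_0^inf e^{-tau} tau^{z-1} dtau,
        adequate for real z comfortably greater than zero."""
        d = upper / steps
        return sum(exp(-(k + 0.5) * d) * ((k + 0.5) * d)**(z - 1) * d
                   for k in range(steps))

    z = 3.5
    print(gamma(z + 1), z * gamma(z))   # Gamma(z+1) = z Gamma(z), eqn. (9.6)
    print(gamma(5))                     # Gamma(5) = 4! = 24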
If one does not know how to solve the integral in a more elegant way, one
can guess a likely-seeming antiderivative form, such as

    e^{−(ρ/σ)^2/2} ρ = (d/dρ)[ a e^{−(ρ/σ)^2/2} ],
where the a is an unknown coefficient. Having guessed, one has no guarantee
that the guess is right, but see: if the guess were right, then the antiderivative
would have the form

    e^{−(ρ/σ)^2/2} ρ = (d/dρ)[ a e^{−(ρ/σ)^2/2} ]
                     = −(aρ/σ^2) e^{−(ρ/σ)^2/2},

implying that

    a = −σ^2

(evidently the guess is right, after all). Using this value for a, one can write
the specific antiderivative

    e^{−(ρ/σ)^2/2} ρ = (d/dρ)[ −σ^2 e^{−(ρ/σ)^2/2} ],

with which one can solve the integral, concluding that

    S(x) = [ −σ^2 e^{−(ρ/σ)^2/2} ]_0^x = σ^2 [ 1 − e^{−(x/σ)^2/2} ].        (9.9)
The same technique solves differential equations, too. Consider for ex-
ample the differential equation
the loan off in the time T , then (perhaps after some bad guesses) we guess
the form
x(t) = Aeαt + B,
where α, A and B are unknown coefficients. The guess’ derivative is
dx = αAeαt dt.
Substituting the last two equations into (9.10) and dividing by dt yields
    αAe^{αt} = IAe^{αt} + IB − P,
    αAe^{αt} = IAe^{αt},
    0 = IB − P,
    α = I,
    B = P/I.
Substituting these coefficients into the x(t) equation above yields the general
solution

    x(t) = Ae^{It} + P/I        (9.11)
to (9.10). The constants A and P , we establish by applying the given bound-
ary conditions x|t=0 = xo and x|t=T = 0. For the former condition, (9.11)
is

    x_o = Ae^{(I)(0)} + P/I = A + P/I;

and for the latter condition,

    0 = Ae^{IT} + P/I.
Solving the last two equations simultaneously, we have that

    A = −e^{−IT} x_o/(1 − e^{−IT}),        (9.12)
    P = I x_o/(1 − e^{−IT}).
Applying these to the general solution (9.11) yields the specific solution

    x(t) = [ x_o/(1 − e^{−IT}) ] [ 1 − e^{(I)(t−T)} ]        (9.13)

to (9.10) meeting the boundary conditions, with the payment rate P required
of the borrower given by (9.12).
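With concrete (hypothetical) numbers for x_o, I and T, equations (9.12) and (9.13) read as follows (a minimal Python sketch):

    from math import exp

    # Hypothetical figures: borrow xo = 10000 at interest rate I = 0.05 per year,
    # to be repaid over T = 10 years.
    xo, I, T = 10000.0, 0.05, 10.0

    P = I * xo / (1 - exp(-I * T))                                   # eqn. (9.12)
    x = lambda t: xo * (1 - exp(I * (t - T))) / (1 - exp(-I * T))    # eqn. (9.13)

    print(P)          # constant payment rate required
    print(x(0.0))     # equals xo: the loan starts at the borrowed amount
    print(x(T))       # equals 0:  the loan is paid off at t = T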
The virtue of the method of unknown coefficients lies in that it permits
one to try an entire family of candidate solutions at once, with the family
members distinguished by the values of the coefficients. If a solution exists
anywhere in the family, the method usually finds it.
The method of unknown coefficients is an elephant. Slightly inelegant the
method may be, but it is pretty powerful, too—and it has surprise value (for
some reason people seem not to expect it). Such are the kinds of problems
the method can solve.
    I = ∮ z^a/(z + 1) dz = i2π z^a |_{z=−1} = i2π ( e^{i2π/2} )^a = i2π e^{i2πa/2}.
The trouble, of course, is that the integral S does not go about a closed
complex contour. One can however construct a closed complex contour I of
which S is a part, as in Fig 9.1. If the outer circle in the figure is of infinite
5 [43, § 1.2]
[Figure 9.1: the closed contour of integration in the Argand plane (axes ℜ(z), ℑ(z)), composed of the segments I_1, I_2, I_3 and I_4, with the pole at z = −1.]
radius and the inner, of infinitesimal, then the closed contour I is composed
of the four parts
I = I1 + I2 + I3 + I4
= (I1 + I3 ) + I2 + I4 .
The figure tempts one to make the mistake of writing that I1 = S = −I3 ,
but besides being incorrect this defeats the purpose of the closed contour
technique. More subtlety is needed. One must take care to interpret the
four parts correctly. The integrand z^a/(z + 1) is multiple-valued; so, in
fact, the two parts I_1 + I_3 ≠ 0 do not cancel. The integrand has a branch
point at z = 0, which, in passing from I3 through I4 to I1 , the contour has
circled. Even though z itself takes on the same values along I3 as along I1 ,
the multiple-valued integrand z a /(z + 1) does not. Indeed,
    I_1 = ∫_0^∞ (ρe^{i0})^a/[ (ρe^{i0}) + 1 ] dρ = ∫_0^∞ ρ^a/(ρ + 1) dρ = S,
    −I_3 = ∫_0^∞ (ρe^{i2π})^a/[ (ρe^{i2π}) + 1 ] dρ = e^{i2πa} ∫_0^∞ ρ^a/(ρ + 1) dρ = e^{i2πa} S.
Therefore,

    I = I_1 + I_2 + I_3 + I_4
      = (I_1 + I_3) + I_2 + I_4
      = (1 − e^{i2πa}) S + lim_{ρ→∞} ∫_{φ=0}^{2π} z^a/(z + 1) dz − lim_{ρ→0} ∫_{φ=0}^{2π} z^a/(z + 1) dz
      = (1 − e^{i2πa}) S + lim_{ρ→∞} ∫_{φ=0}^{2π} z^{a−1} dz − lim_{ρ→0} ∫_{φ=0}^{2π} z^a dz
      = (1 − e^{i2πa}) S + lim_{ρ→∞} [ z^a/a ]_{φ=0}^{2π} − lim_{ρ→0} [ z^{a+1}/(a + 1) ]_{φ=0}^{2π}.

Since a < 0, the first limit vanishes; and because a > −1, the second
limit vanishes, too, leaving

    I = (1 − e^{i2πa}) S.
As in the previous example, here again the contour is not closed. The
previous example closed the contour by extending it, excluding the branch
point. In this example there is no branch point to exclude, nor need one
extend the contour. Rather, one changes the variable
z ← eiθ
and takes advantage of the fact that z, unlike θ, begins and ends the inte-
gration at the same point. One thus obtains the equivalent integral
    T = ∮ (dz/iz)/[ 1 + (a/2)(z + 1/z) ] = −(i2/a) ∮ dz/(z^2 + 2z/a + 1)
      = −(i2/a) ∮ dz / { [ z − (−1 + √(1 − a^2))/a ] [ z − (−1 − √(1 − a^2))/a ] },
whose contour is the unit circle in the Argand plane. The integrand evidently
has poles at

    z = ( −1 ± √(1 − a^2) )/a,
    |z|^2 = ( 2 − a^2 ∓ 2√(1 − a^2) )/a^2.
One of the two magnitudes is less than unity and one is greater, meaning
that one of the two poles lies within the contour and one lies without, as is
    a^2 < 1,
    0 < 1 − a^2,
    (−a^2)(0) > (−a^2)(1 − a^2),
    0 > −a^2 + a^4,
    1 − a^2 > 1 − 2a^2 + a^4,
    1 − a^2 > (1 − a^2)^2,
    √(1 − a^2) > 1 − a^2,
    −√(1 − a^2) < −(1 − a^2) < √(1 − a^2),
    1 − √(1 − a^2) < a^2 < 1 + √(1 − a^2),
    2 − 2√(1 − a^2) < 2a^2 < 2 + 2√(1 − a^2),
    2 − a^2 − 2√(1 − a^2) < a^2 < 2 − a^2 + 2√(1 − a^2),
    [ 2 − a^2 − 2√(1 − a^2) ]/a^2 < 1 < [ 2 − a^2 + 2√(1 − a^2) ]/a^2.
Per Cauchy's integral formula (8.29), integrating about the pole within the
contour yields

    T = i2π [ −i2/a ] / [ z − (−1 − √(1 − a^2))/a ] |_{z=(−1+√(1−a^2))/a} = 2π/√(1 − a^2).
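Undoing the change of variable z ← e^{iθ} (so that dz/iz = dθ and (a/2)(z + 1/z) = a cos θ) shows that the integral just evaluated is T = ∫_0^{2π} dθ/(1 + a cos θ), which a direct numerical sum confirms (Python sketch; the sample a and step count are arbitrary):

    from math import cos, sqrt, pi

    def T_numeric(a, steps=200000):
        # T = int_0^{2pi} dtheta / (1 + a cos theta), midpoint rule
        d = 2 * pi / steps
        return sum(d / (1 + a * cos((k + 0.5) * d)) for k in range(steps))

    a = 0.6
    print(T_numeric(a), 2 * pi / sqrt(1 - a * a))   # the two agree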
The extension

    | ∫_{z_1}^{z_2} f(z) dz | ≤ ∫_{z_1}^{z_2} |f(z) dz|        (9.15)

of the complex triangle sum inequality (3.22) from the discrete to the con-
tinuous case sometimes proves useful in evaluating integrals by this section's
technique, as in § 17.5.4.
j, j ′ , k, ℓ, m, n, p, p(·) , M, N ∈ Z.
The trouble is that one is not always given the function in the amenable
form.
Given a rational function

    f(z) = [ Σ_{k=0}^{N−1} b_k z^k ] / [ Π_{j=1}^N (z − α_j) ]        (9.16)
9 [48, Appendix F][31, §§ 2.7 and 10.12]
10 Terminology (you probably knew this already): A fraction is the ratio of two numbers
or expressions B/A. In the fraction, B is the numerator and A is the denominator. The
quotient is Q = B/A.
in which no two of the several poles α_j are the same, the partial-fraction
expansion has the form

    f(z) = Σ_{k=1}^N A_k/(z − α_k),        (9.17)
puts the several fractions over a common denominator, yielding (9.16). Di-
viding (9.16) by (9.17) gives the ratio

    1 = { [ Σ_{k=0}^{N−1} b_k z^k ] / [ Π_{j=1}^N (z − α_j) ] } / { Σ_{k=1}^N A_k/(z − α_k) }.
where Am , the value of f (z) with the pole canceled, is called the residue
of f (z) at the pole z = αm . Equations (9.17) and (9.18) together give the
partial-fraction expansion of (9.16)’s rational function f (z).
insight, the present subsection treats the matter in a different way. Here,
we separate the poles.
Consider the function
    g(z) = Σ_{k=0}^{N−1} C e^{i2πk/N}/(z − εe^{i2πk/N}),   N > 1, 0 < ε ≪ 1,        (9.19)
But11

    Σ_{k=0}^{N−1} ( e^{i2πj/N} )^k = { N if j = mN;  0 otherwise },

so

    g(z) = NC Σ_{m=1}^∞ ε^{mN−1}/z^{mN}.
For |z| ≫ ε—that is, except in the immediate neighborhood of the small
circle of poles—the first term of the summation dominates. Hence,

    g(z) ≈ NC ε^{N−1}/z^N,   |z| ≫ ε.
11 If you don't see why, then for N = 8 and j = 3 plot the several (e^{i2πj/N})^k in the
Argand plane. Do the same for j = 2 then j = 8. Only in the j = 8 case do the terms
add coherently; in the other cases they cancel.
This effect—reinforcing when j = nN , canceling otherwise—is a classic manifestation
of Parseval’s principle, which § 17.1 will formally introduce later in the book.
    C = 1/(N ε^{N−1}),

then

    g(z) ≈ 1/z^N,   |z| ≫ ε;
    g(z) = [ 1/(N ε^{N−1}) ] Σ_{k=0}^{N−1} e^{i2πk/N}/(z − εe^{i2πk/N}),   N > 1, 0 < ε ≪ 1.

    1/(z − z_o)^N = lim_{ε→0} [ 1/(N ε^{N−1}) ] Σ_{k=0}^{N−1} e^{i2πk/N}/(z − z_o − εe^{i2πk/N}),   N > 1.        (9.20)
The significance of (9.20) is that it lets one replace an N-fold pole with
a small circle of ordinary poles, which per § 9.6.1 we already know how to
handle. Notice incidentally that 1/(N ε^{N−1}) is a large number not a small.
The poles are close together but very strong.
    f(z) = (z^2 − z + 6) / [ (z − 1)^2 (z + 2) ]
         = lim_{ε→0} (z^2 − z + 6) / { (z − [1 + εe^{i2π(0)/2}]) (z − [1 + εe^{i2π(1)/2}]) (z + 2) }
         = lim_{ε→0} (z^2 − z + 6) / { (z − [1 + ε]) (z − [1 − ε]) (z + 2) }
         = lim_{ε→0} { [ 1/(z − [1 + ε]) ] (z^2 − z + 6)/[ (z − [1 − ε])(z + 2) ] |_{z=1+ε}
                     + [ 1/(z − [1 − ε]) ] (z^2 − z + 6)/[ (z − [1 + ε])(z + 2) ] |_{z=1−ε}
                     + [ 1/(z + 2) ] (z^2 − z + 6)/[ (z − [1 + ε])(z − [1 − ε]) ] |_{z=−2} }
         = lim_{ε→0} { [ 1/(z − [1 + ε]) ] (6 + ε)/(6ε + 2ε^2)
                     + [ 1/(z − [1 − ε]) ] (6 − ε)/(−6ε + 2ε^2)
                     + [ 1/(z + 2) ] (0xC/9) }
         = lim_{ε→0} { (1/ε − 1/6)/(z − [1 + ε]) + (−1/ε − 1/6)/(z − [1 − ε]) + (4/3)/(z + 2) }
         = lim_{ε→0} { (1/ε)/(z − [1 + ε]) + (−1/ε)/(z − [1 − ε]) + (−1/3)/(z − 1) + (4/3)/(z + 2) }.
If one can find the poles of a rational function of the form (9.16), then one
can use (9.17) and (9.18)—and, if needed, (9.20)—to expand the function
into a sum of partial fractions, each of which one can integrate individually.
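A numerical check of the expansion just found (Python sketch; the small ε and the sample points are arbitrary choices):

    def f(z):
        return (z * z - z + 6) / ((z - 1)**2 * (z + 2))

    def f_expanded(z, eps=1e-6):
        # the epsilon form found above: the pole pair at 1 +/- eps stands in
        # for the double pole at z = 1
        return ((1 / eps) / (z - (1 + eps)) + (-1 / eps) / (z - (1 - eps))
                + (-1.0 / 3) / (z - 1) + (4.0 / 3) / (z + 2))

    for z in (0.3, 2.0, -5.0):
        print(f(z), f_expanded(z))   # the two agree closely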
    ∫_0^x f(τ) dτ = ∫_0^x (τ^2 − τ + 6)/[ (τ − 1)^2 (τ + 2) ] dτ
      = lim_{ε→0} ∫_0^x { (1/ε)/(τ − [1 + ε]) + (−1/ε)/(τ − [1 − ε]) + (−1/3)/(τ − 1) + (4/3)/(τ + 2) } dτ
      = lim_{ε→0} [ (1/ε) ln([1 + ε] − τ) − (1/ε) ln([1 − ε] − τ) − (1/3) ln(1 − τ) + (4/3) ln(τ + 2) ]_0^x
      = lim_{ε→0} [ (1/ε) ln{ ([1 + ε] − τ)/([1 − ε] − τ) } − (1/3) ln(1 − τ) + (4/3) ln(τ + 2) ]_0^x
      = lim_{ε→0} [ (1/ε) ln{ ([1 − τ] + ε)/([1 − τ] − ε) } − (1/3) ln(1 − τ) + (4/3) ln(τ + 2) ]_0^x
      = lim_{ε→0} [ (1/ε) ln{ 1 + 2ε/(1 − τ) } − (1/3) ln(1 − τ) + (4/3) ln(τ + 2) ]_0^x
      = lim_{ε→0} [ (1/ε)( 2ε/(1 − τ) ) − (1/3) ln(1 − τ) + (4/3) ln(τ + 2) ]_0^x
      = [ 2/(1 − τ) − (1/3) ln(1 − τ) + (4/3) ln(τ + 2) ]_0^x
      = 2/(1 − x) − 2 − (1/3) ln(1 − x) + (4/3) ln[ (x + 2)/2 ].
To check (§ 7.5) that the result is correct, we can take the derivative of the
final expression:
    d/dx [ 2/(1 − x) − 2 − (1/3) ln(1 − x) + (4/3) ln((x + 2)/2) ] |_{x=τ}
      = 2/(τ − 1)^2 + (−1/3)/(τ − 1) + (4/3)/(τ + 2)
      = (τ^2 − τ + 6)/[ (τ − 1)^2 (τ + 2) ],
which indeed has the form of the integrand we started with, confirming the
result. (Notice incidentally how much easier it is symbolically to differentiate
than to integrate!)
    d^kΦ/dw^k = w^{p−k} h_k(w)/[g(w)]^{k+1},   0 ≤ k ≤ p,        (9.22)
    d^kΦ/dw^k |_{w=0} = 0   for 0 ≤ k < p.        (9.23)

That is, the function and its first p − 1 derivatives are all zero at w = 0. The
reason is that (9.22)'s denominator is [g(w)]^{k+1} ≠ 0, whereas its numerator
has a w^{p−k} = 0 factor, when 0 ≤ k < p and w = 0.
What the partial-fraction expansion (9.25) lacks are the values of its several
coefficients Ajℓ .
One can determine the coefficients with respect to one (possibly re-
peated) pole at a time. To determine them with respect to the pm -fold
pole at z = αm , 1 ≤ m ≤ M , one multiplies (9.25) by (z − αm )pm to obtain
the form

    (z − α_m)^{p_m} f(z) = Σ_{j=1, j≠m}^M Σ_{ℓ=0}^{p_j−1} (A_{jℓ})(z − α_m)^{p_m}/(z − α_j)^{p_j−ℓ} + Σ_{ℓ=0}^{p_m−1} (A_{mℓ})(z − α_m)^ℓ.
But (9.23) with w = z − α_m reveals the double summation and its first
p_m − 1 derivatives all to be null at z = α_m; that is,

    d^k/dz^k [ Σ_{j=1, j≠m}^M Σ_{ℓ=0}^{p_j−1} (A_{jℓ})(z − α_m)^{p_m}/(z − α_j)^{p_j−ℓ} ] |_{z=α_m} = 0,   0 ≤ k < p_m;

so, the (z − α_m)^{p_m} f(z) equation's kth derivative reduces at that point to

    d^k/dz^k [ (z − α_m)^{p_m} f(z) ] |_{z=α_m} = Σ_{ℓ=0}^{p_m−1} d^k/dz^k [ (A_{mℓ})(z − α_m)^ℓ ] |_{z=α_m}
                                               = k! A_{mk},   0 ≤ k < p_m.
Changing j ← m and ℓ ← k and solving for A_{jℓ} then produces the coeffi-
cients

    A_{jℓ} = (1/ℓ!) d^ℓ/dz^ℓ [ (z − α_j)^{p_j} f(z) ] |_{z=α_j},   0 ≤ ℓ < p_j,        (9.26)
between them. Logically this difference must be zero for all z if the two
solutions are actually to represent the same function f (z). This however
is seen to be possible only if Bjℓ = Ajℓ for each (j, ℓ). Therefore, the two
solutions are one and the same.
Existence comes of combining the several fractions of (9.25) over a com-
mon denominator and comparing the resulting numerator against the numer-
ator of (9.24). Each coefficient bk is seen thereby to be a linear combination
of the several Ajℓ , where the combination’s weights depend solely on the
locations αj and multiplicities pj of f (z)’s several poles. From the N coeffi-
cients bk and the N coefficients Ajℓ , an N × N system of N linear equations
in N unknowns results—which might for example (if, say, N = 3) look like
We shall show in Chs. 11 through 14 that when such a system has no so-
lution, there always exist an alternate set of bk for which the same system
has multiple solutions. But uniqueness, which we have already established,
forbids such multiple solutions in all cases. Therefore it is not possible for
the system to have no solution—which is to say, the solution necessarily
exists.
We shall not often in this book prove existence and uniqueness explicitly,
but such proofs when desired tend to fit the pattern outlined here.
(here on the face of it, we have split the integration as though a ≤ b, but in
fact it does not matter which of a and b is the greater, as is easy to verify).
So long as each of f(ε) and f(1/ε) approaches a constant value as ε vanishes,
this is

    S = lim_{ε→0^+} { f(+∞) ∫_{a/ε}^{b/ε} dσ/σ − f(0^+) ∫_{aε}^{bε} dσ/σ }
      = lim_{ε→0^+} { f(+∞) ln[ (b/ε)/(a/ε) ] − f(0^+) ln(bε/aε) }
      = [f(τ)]_0^∞ ln(b/a).
Thus we have Frullani's integral,

    ∫_0^∞ [ f(bτ) − f(aτ) ]/τ dτ = [f(τ)]_0^∞ ln(b/a),        (9.27)
which, if a and b are both real and positive, works for any f (τ ) which has
definite f (0+ ) and f (+∞).12
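For example, with f(τ) = e^{−τ}—so that f(0^+) = 1 and f(+∞) = 0—a crude numerical integration reproduces (9.27) (Python sketch; the truncation and step count are arbitrary choices):

    from math import exp, log

    def frullani_numeric(f, a, b, hi=40.0, steps=400000):
        # int_0^inf [f(b tau) - f(a tau)] / tau dtau, midpoint rule
        d = hi / steps
        total = 0.0
        for k in range(steps):
            t = (k + 0.5) * d
            total += (f(b * t) - f(a * t)) / t * d
        return total

    f = lambda t: exp(-t)
    a, b = 2.0, 3.0
    print(frullani_numeric(f, a, b))
    print((0 - 1) * log(b / a))    # [f]_0^inf * ln(b/a), eqn. (9.27)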
    a_k = 1/[ α Π_{j=k+1}^n (−jα) ] = (−)^{n−k}/[ (n!/k!) α^{n−k+1} ],   0 ≤ k ≤ n.

Therefore,14

    exp(ατ) τ^n = (d/dτ) Σ_{k=0}^n [ (−)^{n−k}/((n!/k!) α^{n−k+1}) ] exp(ατ) τ^k,   n ∈ Z, n ≥ 0, α ≠ 0.        (9.28)
The right form to guess for the antiderivative of τ^{a−1} ln τ is less obvious.
Remembering however § 5.3's observation that ln τ is of zeroth order in τ,
after maybe some false tries we eventually do strike the right form

    τ^{a−1} ln τ = (d/dτ) { τ^a [ B ln τ + C ] }
                 = τ^{a−1} [ aB ln τ + (B + aC) ],

    τ^{a−1} ln τ = (d/dτ) [ (τ^a/a)( ln τ − 1/a ) ],   a ≠ 0.        (9.29)
14 [55, Appendix 2, eqn. 73]
15 [55, Appendix 2, eqn. 74]
[Table 9.1]

    exp(ατ) τ^n = (d/dτ) Σ_{k=0}^n [ (−)^{n−k}/((n!/k!) α^{n−k+1}) ] exp(ατ) τ^k,   n ∈ Z, n ≥ 0, α ≠ 0
    τ^{a−1} ln τ = (d/dτ) [ (τ^a/a)( ln τ − 1/a ) ],   a ≠ 0
    (ln τ)/τ = (d/dτ) [ (ln τ)^2/2 ]

    (ln τ)/τ = (d/dτ) [ (ln τ)^2/2 ].        (9.30)
    τ^a = exp(a ln τ),        (9.31)
With sufficient cleverness the techniques of the foregoing sections solve many,
many integrals. But not all. When all else fails, as sometimes it does, the
Taylor series of Ch. 8 and the antiderivative of § 9.1 together offer a concise,
practical way to integrate some functions, at the price of losing the functions’
222 CHAPTER 9. INTEGRATION TECHNIQUES
Then,

    ∫_0^x exp(−τ^2/2) dτ = myf x.
The myf z is no less a function than sin z is; it's just a function you hadn't
heard of before. You can plot the function, or take its derivative

    (d/dτ) myf τ = exp(−τ^2/2),

or calculate its value, or do with it whatever else one does with functions.
It works just the same.
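For instance, integrating the Taylor series of exp(−τ^2/2) term by term gives one practical way to evaluate myf x (Python sketch; the term count is an arbitrary choice):

    from math import factorial

    def myf(x, terms=40):
        """myf x = int_0^x exp(-tau^2/2) dtau, summed by integrating the
        integrand's Taylor series term by term:
        myf x = sum_k (-1)^k x^(2k+1) / ((2k+1) 2^k k!)."""
        return sum((-1)**k * x**(2 * k + 1) / ((2 * k + 1) * 2**k * factorial(k))
                   for k in range(terms))

    print(myf(1.0))    # roughly 0.8556
    print(myf(2.0))    # roughly 1.1963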
Chapter 10
Under the heat of noonday, between the hard work of the morning and the
heavy lifting of the afternoon, one likes to lay down one’s burden and rest a
spell in the shade. Chapters 2 through 9 have established the applied math-
ematical foundations upon which coming chapters will build; and Ch. 11,
hefting the weighty topic of the matrix, will indeed begin to build on those
foundations. But in this short chapter which rests between, we shall refresh
ourselves with an interesting but lighter mathematical topic: the topic of
cubics and quartics.
The expression
z + a0
is a linear polynomial, the lone root z = −a0 of which is plain to see. The
quadratic polynomial
z 2 + a1 z + a0
has of course two roots, which though not plain to see the quadratic for-
mula (2.2) extracts with little effort. So much algebra has been known since
antiquity. The roots of higher-order polynomials, the Newton-Raphson iter-
ation (4.31) locates swiftly, but that is an approximate iteration rather than
an exact formula like (2.2), and as we have seen in § 4.8 it can occasionally
fail to converge. One would prefer an actual formula to extract the roots.
No general formula to extract the roots of the nth-order polynomial
seems to be known.1 However, to extract the roots of the cubic and quartic
polynomials
z 3 + a2 z 2 + a1 z + a0 ,
z 4 + a3 z 3 + a2 z 2 + a1 z + a0 ,
1 Refer to Ch. 6's footnote 9.
though the ancients never discovered how, formulas do exist. The 16th-
century algebraists Ferrari, Vieta, Tartaglia and Cardano have given us the
clever technique. This chapter explains.2
10.2 Cubics
The general cubic polynomial is too hard to extract the roots of directly, so
one begins by changing the variable
x+h←z (10.3)
2
[66, “Cubic equation”][66, “Quartic equation”][67, “Quartic equation,” 00:26, 9 Nov.
2006][67, “François Viète,” 05:17, 1 Nov. 2006][67, “Gerolamo Cardano,” 22:35, 31 Oct.
2006][59, § 1.5]
3
This change of variable broadly recalls the sum-of-exponentials form (5.19) of the
cosh(·) function, inasmuch as exp[−φ] = 1/ exp φ.
4
Also called “Vieta’s substitution.” [66, “Vieta’s substitution”]
[Figure 10.1: Vieta's transform, plotted on logarithmic axes ln w and ln z.]
The choice

    h ≡ −a_2/3        (10.4)

casts the polynomial into the improved form

    x^3 + [ a_1 − a_2^2/3 ] x + [ a_0 − a_1 a_2/3 + 2(a_2/3)^3 ],

or better yet

    x^3 − px − q,

where

    p ≡ −a_1 + a_2^2/3,        (10.5)
    q ≡ −a_0 + a_1 a_2/3 − 2(a_2/3)^3.
The solutions to the equation
x3 = px + q, (10.6)
roots would follow immediately, but no very simple substitution like (10.3)
achieves this—or rather, such a substitution does achieve it, but at the
price of reintroducing an unwanted x2 or z 2 term. That way is no good.
Lacking guidance, one might try many, various substitutions, none of which
seems to help much; but after weeks or months of such frustration one might
eventually discover Vieta’s transform (10.1), with the idea of balancing the
equation between offsetting w and 1/w terms. This works.
Vieta-transforming (10.6) by the change of variable

    w + w_o^2/w ← x        (10.7)

we get the new equation

    w^3 + (3w_o^2 − p) w + (3w_o^2 − p) w_o^2/w + w_o^6/w^3 = q,        (10.8)

which invites the choice

    w_o^2 ≡ p/3,        (10.9)

reducing (10.8) to read

    w^3 + (p/3)^3/w^3 = q.

Multiplying by w^3 and rearranging terms, we have the quadratic equation

    (w^3)^2 = 2(q/2) w^3 − (p/3)^3,        (10.10)

which by (2.2) we know how to solve.
Vieta’s transform has reduced the original cubic to a quadratic.
The careful reader will observe that (10.10) seems to imply six roots,
double the three the fundamental theorem of algebra (§ 6.2.2) allows a cubic
polynomial to have. We shall return to this point in § 10.3. For the moment,
however, we should like to improve the notation by defining5
    P ← −p/3,        (10.11)
    Q ← +q/2,
5
Why did we not define P and Q so to begin with? Well, before unveiling (10.10),
we lacked motivation to do so. To define inscrutable coefficients unnecessarily before the
need for them is apparent seems poor applied mathematical style.
Table 10.1: The method to extract the three roots of the general cubic
polynomial. (In the definition of w^3, one can choose either sign.)

    0 = z^3 + a_2 z^2 + a_1 z + a_0
    P ≡ a_1/3 − (a_2/3)^2
    Q ≡ (1/2) [ −a_0 + 3(a_1/3)(a_2/3) − 2(a_2/3)^3 ]
    w^3 ≡ { 2Q                 if P = 0,
            Q ± √(Q^2 + P^3)   otherwise. }
    x ≡ { 0         if P = 0 and Q = 0,
          w − P/w   otherwise. }
    z = x − a_2/3
    x^3 = 2Q − 3Px,        (10.12)
    (w^3)^2 = 2Qw^3 + P^3.        (10.13)
Table 10.1 summarizes the complete cubic polynomial root extraction meth-
od in the revised notation—including a few fine points regarding superfluous
roots and edge cases, treated in §§ 10.3 and 10.4 below.
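Table 10.1 translates almost line for line into code. A minimal sketch (Python; the exact comparisons against zero and the sample polynomial are this illustration's own choices, adequate for demonstration but not for careful numerical work):

    import cmath

    def cubic_roots(a2, a1, a0):
        """Roots of z^3 + a2 z^2 + a1 z + a0 = 0 per Table 10.1."""
        P = a1 / 3 - (a2 / 3)**2
        Q = (-a0 + 3 * (a1 / 3) * (a2 / 3) - 2 * (a2 / 3)**3) / 2
        if P == 0 and Q == 0:
            return [-a2 / 3] * 3                       # triple root
        w3 = 2 * Q if P == 0 else Q + cmath.sqrt(Q**2 + P**3)
        roots = []
        for k in range(3):                             # the three cube roots of w3
            w = complex(w3)**(1 / 3) * cmath.exp(1j * 2 * cmath.pi * k / 3)
            x = w - P / w
            roots.append(x - a2 / 3)
        return roots

    # z^3 - 6z^2 + 11z - 6 = (z - 1)(z - 2)(z - 3)
    print(cubic_roots(-6, 11, -6))   # roots 3, 1, 2 (up to rounding)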
the guess is right, then the second w3 cannot but yield the same three roots,
which means that the second w3 is superfluous and can safely be overlooked.
But is the guess right? Does a single w3 in fact generate three distinct x?
To prove that it does, let us suppose that it did not. Let us suppose
that a single w3 did generate two w which led to the same x. Letting the
symbol w1 represent the third w, then (since all three w come from the
same w3 ) the two w are e+i2π/3 w1 and e−i2π/3 w1 . Because x ≡ w − P/w,
by successive steps,
    e^{+i2π/3} w_1 − P/(e^{+i2π/3} w_1) = e^{−i2π/3} w_1 − P/(e^{−i2π/3} w_1),
    e^{+i2π/3} w_1 + P/(e^{−i2π/3} w_1) = e^{−i2π/3} w_1 + P/(e^{+i2π/3} w_1),
    e^{+i2π/3} [ w_1 + P/w_1 ] = e^{−i2π/3} [ w_1 + P/w_1 ],
Squaring,

    Q^4 + 2Q^2 P^3 + P^6 = Q^4 + Q^2 P^3,

then canceling offsetting terms and factoring,

    (P^3)(Q^2 + P^3) = 0.
6
The verb to cube in this context means “to raise to the third power,” as to change y
to y 3 , just as the verb to square means “to raise to the second power.”
in which § 10.3 generally finds it sufficient to consider either of the two signs.
In the edge case P = 0,
w3 = 2Q or 0.
Both edge cases are interesting. In this section, we shall consider first the
edge cases themselves, then their effect on the proof of § 10.3.
The edge case P = 0, like the general non-edge case, gives two distinct
quadratic solutions w3 . One of the two however is w3 = Q − Q = 0, which
is awkward in light of Table 10.1’s definition that x ≡ w − P/w. For this
reason, in applying the table’s method when P = 0, one chooses the other
quadratic solution, w3 = Q + Q = 2Q.
The edge case P 3 = −Q2 gives only the one quadratic solution w3 = Q;
or more precisely, it gives two quadratic solutions which happen to have the
same value. This is fine. One merely accepts that w3 = Q, and does not
worry about choosing one w3 over the other.
The double edge case, or corner case, arises where the two edges meet—
where P = 0 and P 3 = −Q2 , or equivalently where P = 0 and Q = 0. At
the corner, the trouble is that w3 = 0 and that no alternate w3 is available.
However, according to (10.12), x3 = 2Q − 3P x, which in this case means
that x3 = 0 and thus that x = 0 absolutely, no other x being possible. This
implies the triple root z = −a2 /3.
Section 10.3 has excluded the edge cases from its proof of the sufficiency
of a single w3 . Let us now add the edge cases to the proof. In the edge case
P 3 = −Q2 , both w3 are the same, so the one w3 suffices by default because
the other w3 brings nothing different. The edge case P = 0 however does
give two distinct w3 , one of which is w3 = 0, which puts an awkward 0/0 in
the table’s definition of x. We address this edge in the spirit of l’Hôpital’s
rule, by sidestepping it, changing P infinitesimally from P = 0 to P = ε.
Then, choosing the − sign in the definition of w^3,

    w^3 = Q − √(Q^2 + ε^3) = Q − (Q)(1 + ε^3/2Q^2) = −ε^3/2Q,
    w = −ε/(2Q)^{1/3},
    x = w − ε/w = −ε/(2Q)^{1/3} + (2Q)^{1/3} = (2Q)^{1/3}.
10.5 Quartics
Having successfully extracted the roots of the general cubic polynomial, we
now turn our attention to the general quartic. The kernel of the cubic tech-
nique lay in reducing the cubic to a quadratic. The kernel of the quartic
technique lies likewise in reducing the quartic to a cubic. The details dif-
fer, though; and, strangely enough, in some ways the quartic reduction is
actually the simpler.8
As with the cubic, one begins solving the quartic by changing the variable
x+h←z (10.15)
to obtain the equation
x4 = sx2 + px + q, (10.16)
where

    h ≡ −a_3/4,
    s ≡ −a_2 + 6(a_3/4)^2,
    p ≡ −a_1 + 2a_2 (a_3/4) − 8(a_3/4)^3,        (10.17)
    q ≡ −a_0 + a_1 (a_3/4) − a_2 (a_3/4)^2 + 3(a_3/4)^4.
8 Even stranger, historically Ferrari discovered it earlier [66, "Quartic equation"]. Ap-
parently Ferrari discovered the quartic's resolvent cubic (10.22), which he could not solve
until Tartaglia applied Vieta's transform to it. What motivated Ferrari to chase the quar-
tic solution while the cubic solution remained still unknown, this writer does not know,
but one supposes that it might make an interesting story.
The reason the quartic is simpler to reduce is probably related to the fact that (1)^{1/4} =
±1, ±i, whereas (1)^{1/3} = 1, (−1 ± i√3)/2. The (1)^{1/4} brings a much neater result, the roots
lying nicely along the Argand axes. This may also be why the quintic is intractable—but
here we trespass the professional mathematician's territory and stray from the scope of
this book. See Ch. 6's footnote 9.
To reduce (10.16) further, one must be cleverer. Ferrari9 supplies the clev-
erness. The clever idea is to transfer some but not all of the sx^2 term to the
equation's left side by

    (x^2 + u)^2 = k^2 x^2 + px + j^2,        (10.18)

where

    k^2 ≡ 2u + s,        (10.19)
    j^2 ≡ u^2 + q.
Now, one must regard (10.18) and (10.19) properly. In these equations, s,
p and q have definite values fixed by (10.17), but not so u, j or k. The
variable u is completely free; we have introduced it ourselves and can assign
it any value we like. And though j 2 and k2 depend on u, still, even after
specifying u we remain free at least to choose signs for j and k. As for u,
though no choice would truly be wrong, one supposes that a wise choice
might at least render (10.18) easier to simplify.
So, what choice for u would be wise? Well, look at (10.18). The left
side of that equation is a perfect square. The right side would be, too, if
it were that p = ±2jk; so, arbitrarily choosing the + sign, we propose the
constraint that
p = 2jk, (10.20)
or, better expressed,

    j = p/2k.        (10.21)

Squaring (10.20) and substituting for j^2 and k^2 from (10.19), we have that

    0 = u^3 + (s/2) u^2 + qu + (4sq − p^2)/8.        (10.22)
9 [66, "Quartic equation"]
Equation (10.22) is the resolvent cubic, which we know by Table 10.1 how
to solve for u, and which we now specify as a second constraint. If the
constraints (10.21) and (10.22) are both honored, then we can safely substi-
tute (10.20) into (10.18) to reach the form
    (x^2 + u)^2 = k^2 x^2 + 2jkx + j^2,

which is

    (x^2 + u)^2 = (kx + j)^2.        (10.23)
The resolvent cubic (10.22) of course yields three u not one, but the
resolvent cubic is a voluntary constraint, so we can just pick one u and
ignore the other two. Equation (10.19) then gives k (again, we can just
pick one of the two signs), and (10.21) then gives j. With u, j and k
established, (10.23) implies the quadratic
x2 = ±(kx + j) − u, (10.24)
wherein the two ± signs are tied together but the third, ±o sign is indepen-
dent of the two. Equation (10.25), with the other equations and definitions
of this section, reveals the four roots of the general quartic polynomial.
In view of (10.25), the change of variables
    K ← k/2,        (10.26)
    J ← j,
improves the notation. Using the improved notation, Table 10.2 summarizes
the complete quartic polynomial root extraction method.
0 = [z − 1][z − i][z + i] = z 3 − z 2 + z − 1.
Table 10.2: The method to extract the four roots of the general quartic
polynomial. (In the table, the resolvent cubic is solved for u by the method of
Table 10.1, where any one of the three resulting u serves. Either of the two K
similarly serves. Of the three ± signs in x's definition, the ±_o is independent
but the other two are tied together, the four resulting combinations giving
the four roots of the general quartic.)

    0 = z^4 + a_3 z^3 + a_2 z^2 + a_1 z + a_0
    s ≡ −a_2 + 6(a_3/4)^2
    p ≡ −a_1 + 2a_2 (a_3/4) − 8(a_3/4)^3
    q ≡ −a_0 + a_1 (a_3/4) − a_2 (a_3/4)^2 + 3(a_3/4)^4
    0 = u^3 + (s/2) u^2 + qu + (4sq − p^2)/8
    K ≡ ± √(2u + s) / 2
    J ≡ { ± √(u^2 + q)   if K = 0,
          p/4K           otherwise. }
    x ≡ ±K ±_o √(K^2 ± J − u)
    z = x − a_3/4
    2z^5 − 7z^4 + 8z^3 + 1z^2 − 0xAz + 6,
which naturally has the same roots. If the roots are complex or irrational,
they are hard to guess; but if any of the roots happens to be real and rational,
it must belong to the set
    ±1, ±2, ±3, ±6, ±1/2, ±2/2, ±3/2, ±6/2.
No other real, rational root is possible. Trying the several candidates on the
polynomial, one finds that 1, −1 and 3/2 are indeed roots. Dividing these
out leaves a quadratic which is easy to solve for the remaining roots.
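Enumerating and testing the candidates mechanically (a Python sketch; the helper names are this illustration's own):

    from itertools import product

    def rational_root_candidates(leading, trailing):
        """All candidates +/- (factor of trailing)/(factor of leading)."""
        def divisors(n):
            n = abs(n)
            return [d for d in range(1, n + 1) if n % d == 0]
        cands = set()
        for num, den in product(divisors(trailing), divisors(leading)):
            cands.update({num / den, -num / den})
        return sorted(cands)

    def poly(z):   # 2z^5 - 7z^4 + 8z^3 + z^2 - 0xA z + 6
        return 2 * z**5 - 7 * z**4 + 8 * z**3 + z**2 - 0xA * z + 6

    print([c for c in rational_root_candidates(2, 6) if poly(c) == 0])
    # prints [-1.0, 1.0, 1.5], the three real, rational roots noted above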
The real, rational candidates are the factors of the polynomial’s trailing
coefficient (in the example, 6, whose factors are ±1, ±2, ±3 and ±6) divided
by the factors of the polynomial’s leading coefficient (in the example, 2,
whose factors are ±1 and ±2). The reason no other real, rational root is
10
At least, no better way is known to this author. If any reader can straightforwardly
simplify the expression without solving a cubic polynomial of some kind, the author would
like to hear of it.
where all the coefficients a_k are integers. Moving the q^n term to the equation's right side, we have that
11 The presentation here is quite informal. We do not want to spend many pages on this.
12 [59, § 3.2]
Part II
Chapter 11
The matrix
that nontrivial eigenvalues arise (though you cannot tell just by looking, the
eigenvalues of C happen to be −1 and [7 ± √0x49]/2). But, just what is an
eigenvalue? Answer: an eigenvalue is the value by which an object like C
scales an eigenvector without altering the eigenvector’s direction. Of course,
we have not yet said what an eigenvector is, either, or how C might scale
something, but it is to answer precisely such questions that this chapter and
the three which follow it are written.
So, we are getting ahead of ourselves. Let’s back up.
An object like C is called a matrix. It serves as a generalized coefficient or
multiplier. Where we have used single numbers as coefficients or multipliers
heretofore, one can with sufficient care often use matrices instead. The
matrix interests us for this reason among others.
The technical name for the “single number” is the scalar. Such a number,
as for instance 5 or −4 + i3, is called a scalar because its action alone
during multiplication is simply to scale the thing it multiplies. Besides acting
alone, however, scalars can also act in concert—in orderly formations—thus
constituting any of three basic kinds of arithmetical object:
• the scalar itself, a single number like α = 5 or β = −4 + i3;
• the vector, a column of m scalars like
u = \begin{bmatrix} 5 \\ -4 + i3 \end{bmatrix},
3
In most of its chapters, the book seeks a balance between terseness the determined
beginner cannot penetrate and prolixity the seasoned veteran will not abide. The matrix
upsets this balance.
Part of the trouble with the matrix is that its arithmetic is just that, an arithmetic,
no more likely to be mastered by mere theoretical study than was the classical arithmetic
of childhood. To master matrix arithmetic, one must drill it; yet the book you hold is
fundamentally one of theory not drill.
The reader who has previously drilled matrix arithmetic will meet here the essential
applied theory of the matrix. That reader will find this chapter and the next three te-
dious enough. The reader who has not previously drilled matrix arithmetic, however, is
likely to find these chapters positively hostile. Only the doggedly determined beginner
will learn the matrix here alone; others will find it more amenable to drill matrix arith-
metic first in the early chapters of an introductory linear algebra textbook, dull though
such chapters be (see [42] or better yet the fine, surprisingly less dull [30] for instance,
though the early chapters of almost any such book give the needed arithmetical drill.)
Returning here thereafter, the beginner can expect to find these chapters still tedious but
no longer impenetrable. The reward is worth the effort. That is the approach the author
recommends.
To the mathematical rebel, the young warrior with face painted and sword agleam,
still determined to learn the matrix here alone, the author salutes his honorable defiance.
Would the rebel consider alternate counsel? If so, then the rebel might compose a dozen
matrices of various sizes and shapes, broad, square and tall, decomposing each carefully
by pencil per the Gauss-Jordan method of § 12.3, checking results (again by pencil; using
a machine defeats the point of the exercise, and using a sword, well, it won’t work) by
multiplying factors to restore the original matrices. Several hours of such drill should build
the young warrior the practical arithmetical foundation to master—with commensurate
effort—the theory these chapters bring. The way of the warrior is hard, but conquest is
not impossible.
To the matrix veteran, the author presents these four chapters with grim enthusiasm.
Substantial, logical, necessary the chapters may be, but exciting they are not. At least, the
earlier parts are not very exciting (later parts are better). As a reasonable compromise, the
veteran seeking more interesting reading might skip directly to Chs. 13 and 14, referring
back to Chs. 11 and 12 as need arises.
Ax = b, (11.1)
For example,
A = \begin{bmatrix} 0 & 6 & 2 \\ 1 & 1 & -1 \end{bmatrix},
where
x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \qquad b = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}.
4
[5][21][30][42]
5
Professional mathematicians conventionally are careful to begin by drawing a clear
distinction between the ideas of the linear transformation, the basis set and the simul-
taneous system of linear equations—proving from suitable axioms that the three amount
more or less to the same thing, rather than implicitly assuming the fact. The professional
approach [5, Chs. 1 and 2][42, Chs. 1, 2 and 5] has much to recommend it, but it is not
the approach we will follow here.
aij ≡ [A]ij
is the element at the ith row and jth column of A, counting from top left
(in the example for instance, a12 = 6).
Besides representing linear transformations as such, matrices can also
represent simultaneous systems of linear equations. For example, the system
6x_2 + 2x_3 = 2,
x_1 + x_2 - x_3 = 4,
is compactly represented as
Ax = b,
with A as given above and b = [2 4]T . Seen from this point of view, a
simultaneous system of linear equations is itself neither more nor less than
a linear transformation.
X \equiv [\,x_1\ x_2\ \cdots\ x_p\,],
B \equiv [\,b_1\ b_2\ \cdots\ b_p\,],
AX = B,   (11.4)
b_{ik} = \sum_{j=1}^{n} a_{ij} x_{jk}.
AX \ne XA,   (11.6)
where [A]∗j is the jth column of A. Here x is not only a vector; it is also an
operator. It operates on A’s columns. By virtue of multiplying A from the
right, the vector x is a column operator acting on A.
If several vectors xk line up in a row to form a matrix X, such that
AX = B, then the matrix X is likewise a column operator:
[B]_{*k} = \sum_{j=1}^{n} [A]_{*j} x_{jk}.   (11.10)
The kth column of X weights the several columns of A to yield the kth
column of B.
If a matrix multiplying from the right is a column operator, is a matrix
multiplying from the left a row operator? Indeed it is. Another way to write
AX = B, besides (11.10), is
[B]_{i*} = \sum_{j=1}^{n} a_{ij} [X]_{j*}.   (11.11)
The ith row of A weights the several rows of X to yield the ith row of B.
The matrix A is a row operator. (Observe the notation. The ∗ here means
“any” or “all.” Hence [X]j∗ means “jth row, all columns of X”—that is,
the jth row of X. Similarly, [A]∗j means “all rows, jth column of A”—that
is, the jth column of A.)
Column operators attack from the right; row operators, from the left.
This rule is worth memorizing; the concept is important. In AX = B, the
matrix X operates on A’s columns; the matrix A operates on X’s rows.
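For readers who like to see the bookkeeping, a small numpy sketch (the particular matrices are arbitrary illustrations, not drawn from the text) computes B = AX all three ways and confirms they agree:

import numpy as np

A = np.array([[0., 6., 2.],
              [1., 1., -1.]])
X = np.array([[1., 4.],
              [2., 5.],
              [3., 6.]])

B = A @ X                                                    # the full product (11.4)

# Column operation (11.10): each column of X weights the columns of A.
B_cols = np.column_stack([A @ X[:, k] for k in range(X.shape[1])])

# Row operation (11.11): each row of A weights the rows of X.
B_rows = np.vstack([A[i, :] @ X for i in range(A.shape[0])])

assert np.allclose(B, B_cols) and np.allclose(B, B_rows)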
Since matrix multiplication produces the same result whether one views
it as a linear transformation (11.4), a column operation (11.10) or a row
operation (11.11), one might wonder what purpose lies in defining matrix
multiplication three separate ways. However, it is not so much for the sake
C = A^T, \quad c_{ij} = a_{ji},   (11.12)
C = A^*, \quad c_{ij} = a_{ji}^*,   (11.13)
and that
\sum_{j=-\infty}^{\infty} \delta_{ij} a_{jk} = a_{ik},   (11.19)
the latter of which is the Kronecker sifting property. The Kronecker equa-
tions (11.18) and (11.19) parallel the Dirac equations (7.13) and (7.14).
Chs. 11 and 14 will find frequent use for the Kronecker delta. Later,
§ 15.4.3 will revisit the Kronecker delta in another light.
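The sifting property is easy to exhibit in a few lines of Python (purely illustrative; the small array is arbitrary):

import numpy as np

delta = lambda i, j: 1 if i == j else 0
a = np.arange(1, 13).reshape(3, 4)                        # any matrix a_jk
i, k, n = 1, 2, 3
sifted = sum(delta(i, j) * a[j, k] for j in range(n))     # sum over j of delta_ij a_jk
assert sifted == a[i, k]                                  # the Kronecker sifting property (11.19)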
9
Recall from § 2.3 that \prod_k A_k = \cdots A_3 A_2 A_1, whereas \coprod_k A_k = A_1 A_2 A_3 \cdots.
10
[67, “Kronecker delta,” 15:59, 31 May 2006]
with zeros in the unused cells. As before, x11 = −4 and x32 = −1, but
now xij exists for all integral i and j; for instance, x(−1)(−1) = 0. For
such a matrix, indeed for all matrices, the matrix multiplication rule (11.4)
generalizes to
B = AX, \quad b_{ik} = \sum_{j=-\infty}^{\infty} a_{ij} x_{jk}.   (11.20)
The ∞ × ∞ matrix
A = \begin{bmatrix}
\ddots & & & & & & & & \\
\cdots & 1 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 1 & 0 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 1 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 5 & 1 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 1 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 0 & 1 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 0 & 0 & 1 & \cdots \\
& & & & & & & & \ddots
\end{bmatrix}
expresses the operation generally. As before, a11 = 1 and a21 = 5, but now
also a(−1)(−1) = 1 and a09 = 0, among others. By running ones infinitely
both ways out the main diagonal, we guarantee by (11.20) that when A
acts AX on a matrix X of any dimensionality whatsoever, A adds to the
second row of X, 5 times the first—and affects no other row. (But what if X
is a 1 × p matrix, and has no second row? Then the operation AX creates
a new second row, 5 times the first—or rather so fills in X’s previously null
second row.)
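A little Python makes the point concrete (an illustrative sketch only; the function name and the habit of numbering rows from 1 are conveniences of the sketch, not the text's notation):

import numpy as np

def apply_T_add(alpha, i, j, X):
    """Left-multiply X by the extended addition operator T_alpha[ij]:
    add alpha times row j to row i, filling in row i if X has no such row."""
    m, n = X.shape
    Y = np.zeros((max(m, i), n))
    Y[:m, :] = X                           # every other row is left untouched
    Y[i - 1, :] += alpha * X[j - 1, :]     # rows are numbered 1, 2, 3, ... here
    return Y

X1 = np.array([[1., 2., 3.]])              # a 1 x 3 matrix with no second row
print(apply_T_add(5., 2, 1, X1))           # the null second row fills in as 5 times the first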
In the infinite-dimensional view, the matrices A and X differ essen-
tially.11 This section explains, developing some nonstandard formalisms
the derivations of later sections and chapters can use.12
11
This particular section happens to use the symbols A and X to represent certain
specific matrix forms because such usage flows naturally from the usage Ax = b of § 11.1.
Such usage admittedly proves awkward in other contexts. Traditionally in matrix work
and elsewhere in the book, the letter A does not necessarily represent an extended operator
as it does here, but rather an arbitrary matrix of no particular form.
12
The idea of infinite dimensionality is sure to discomfit some readers, who have studied
matrices before and are used to thinking of a matrix as having some definite size. There is
nothing wrong with thinking of a matrix as having some definite size, only that that view
does not suit the present book’s development. And really, the idea of an ∞ × 1 vector or
an ∞ × ∞ matrix should not seem so strange. After all, consider the vector u such that
uℓ = sin ℓǫ,
where 0 < ǫ ≪ 1 and ℓ is an integer, which holds all values of the function sin θ of a real
argument θ. Of course one does not actually write down or store all the elements of an
infinite-dimensional vector or matrix, any more than one actually writes down or stores
all the bits (or digits) of 2π. Writing them down or storing them is not the point. The
point is that infinite dimensionality is all right; that the idea thereof does not threaten
to overturn the reader’s preëxisting matrix knowledge; that, though the construct seem
unfamiliar, no fundamental conceptual barrier rises against it.
Different ways of looking at the same mathematics can be extremely useful to the applied
mathematician. The applied mathematical reader who has never heretofore considered
or more compactly,
[0]ij = 0.
Special symbols like 0, 0 or O are possible for the null matrix, but usually a
simple 0 suffices. There are no surprises here; the null matrix brings all the
expected properties of a zero, like
0 + A = A,
[0][X] = 0.
The same symbol 0 used for the null scalar (zero) and the null matrix
is used for the null vector, too. Whether the scalar 0, the vector 0 and the
matrix 0 actually represent different things is a matter of semantics, but the
three are interchangeable for most practical purposes in any case. Basically,
a zero is a zero is a zero; there’s not much else to it.13
Now a formality: the ordinary m × n matrix X can be viewed, infinite-
dimensionally, as a variation on the null matrix, inasmuch as X differs from
the null matrix only in the mn elements xij , 1 ≤ i ≤ m, 1 ≤ j ≤ n. Though
the theoretical dimensionality of X be ∞ × ∞, one need record only the mn
elements, plus the values of m and n, to retain complete information about
such a matrix. So the semantics are these: when we call a matrix X an
m × n matrix, or more precisely a dimension-limited matrix with an m × n
infinite dimensionality in vectors and matrices would be well served to take the opportunity
to do so here. As we shall discover in Ch. 12, dimensionality is a poor measure of a matrix’s
size in any case. What really counts is not a matrix’s m × n dimensionality but rather its
rank.
13
Well, of course, there’s a lot else to it, when it comes to dividing by zero as in Ch. 4,
or to summing an infinity of zeros as in Ch. 7, but those aren’t what we were speaking of
here.
or more compactly,
[I]ij = δij , (11.22)
where δij is the Kronecker delta of § 11.2. The identity matrix I is a matrix 1,
as it were,14 bringing the essential property one expects of a 1:
IX = X = XI. (11.23)
or more compactly,
[λI]ij = λδij , (11.24)
14
In fact you can write it as 1 if you like. That is essentially what it is. The I can be
regarded as standing for “identity” or as the Roman numeral I.
The several αk control how the extended operator A differs from λI. One
need record only the several αk along with their respective addresses (ik , jk ),
plus the scale λ, to retain complete information about such a matrix. For
example, for an extended operator fitting the pattern
A = \begin{bmatrix}
\ddots & & & & & & \\
\cdots & \lambda & 0 & 0 & 0 & 0 & \cdots \\
\cdots & \lambda\alpha_1 & \lambda & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & \lambda & \lambda\alpha_2 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & \lambda(1 + \alpha_3) & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & \lambda & \cdots \\
& & & & & & \ddots
\end{bmatrix},
one need record only the values of α1 , α2 and α3 , the respective addresses
(2, 1), (3, 4) and (4, 4), and the value of the scale λ; this information alone
implies the entire ∞ × ∞ matrix A.
When we call a matrix A an extended n × n operator, or an extended
operator with an n × n active region, we shall mean formally that A is an
∞ × ∞ matrix and is further an extended operator for which
That is, an extended n × n operator is one whose several αk all lie within
the n × n square. The A in the example is an extended 4 × 4 operator (and
also a 5 × 5, a 6 × 6, etc., but not a 3 × 3).
(Often in practice for smaller operators—especially in the typical case
that λ = 1—one finds it easier just to record all the n × n elements of
the active region. This is fine. Large matrix operators however tend to be
sparse, meaning that they depart from λI in only a very few of their many
elements. It would waste a lot of computer memory explicitly to store all
those zeros, so one normally stores just the few elements, instead.)
Implicit in the definition of the extended operator is that the identity
matrix I and the scalar matrix λI, λ ≠ 0, are extended operators with 0 × 0
active regions (and also 1 × 1, 2 × 2, etc.). If λ = 0, however, the scalar
matrix λI is just the null matrix, which is no extended operator but rather
by definition a 0 × 0 dimension-limited matrix.
It’s probably easier just to sketch the matrices and look at them, though.
limited and extended-operational forms are normally the most useful, and
they are the ones we shall principally be handling in this book.
One reason to have defined specific infinite-dimensional matrix forms is
to show how straightforwardly one can fully represent a practical matrix of
an infinity of elements by a modest, finite quantity of information. Further
reasons to have defined such forms will soon occur.
then also
[C1 C2 ]i∗ = eTi ; (11.34)
and likewise that if
[C1 ]∗j = [C2 ]∗j = ej ,
then also
[C1 C2 ]∗j = ej . (11.35)
The product of matrices has off-diagonal entries in a row or column only
if at least one of the factors itself has off-diagonal entries in that row or
column. Or, less readably but more precisely, the ith row or jth column of
the product of matrices can depart from eTi or ej , respectively, only if the
corresponding row or column of at least one of the factors so departs. The
reason is that in (11.34), C1 acts as a row operator on C2 ; that if C1 ’s ith
row is eTi , then its action is merely to duplicate C2 ’s ith row, which itself is
just eTi . Parallel logic naturally applies to (11.35).
It is good to define a concept aesthetically. One should usually do so when one can; and
indeed in this case one might reasonably promote either definition on aesthetic grounds.
However, an applied mathematician ought not to let a mere definition entangle him. What
matters is the underlying concept. Where the definition does not serve the concept well,
the applied mathematician considers whether it were not worth the effort to adapt the
definition accordingly.
Note that none of these, and in fact no elementary operator of any kind,
differs from I in more than four elements.
11.4.1 Properties
Significantly, elementary operators as defined above are always invertible
(which is to say, reversible in effect), with
T_{[i\leftrightarrow j]}^{-1} = T_{[j\leftrightarrow i]} = T_{[i\leftrightarrow j]},
T_{\alpha[i]}^{-1} = T_{(1/\alpha)[i]},   (11.39)
T_{\alpha[ij]}^{-1} = T_{-\alpha[ij]},
being themselves elementary operators such that
T^{-1} T = I = T T^{-1}   (11.40)
in each case.20 This means that any sequence of elementaries \prod_k T_k can
safely be undone by the reverse sequence \coprod_k T_k^{-1}:
\coprod_k T_k^{-1} \prod_k T_k = I = \prod_k T_k \coprod_k T_k^{-1}.   (11.41)
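These inverse rules are easy to verify numerically. A small numpy sketch (the 4 × 4 size and the particular indices are arbitrary illustrations):

import numpy as np

n = 4
I = np.eye(n)

def T_interchange(i, j):
    T = np.eye(n); T[[i, j]] = T[[j, i]]; return T     # interchange elementary

def T_scale(alpha, i):
    T = np.eye(n); T[i, i] = alpha; return T           # scaling elementary

def T_add(alpha, i, j):
    T = np.eye(n); T[i, j] = alpha; return T           # addition elementary

pairs = [(T_interchange(0, 2), T_interchange(0, 2)),   # its own inverse
         (T_scale(5.0, 1),     T_scale(1/5.0, 1)),     # inverse scales by 1/alpha
         (T_add(3.0, 2, 0),    T_add(-3.0, 2, 0))]     # inverse adds -alpha
for T, Tinv in pairs:
    assert np.allclose(Tinv @ T, I) and np.allclose(T @ Tinv, I)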
or as
A = T−4[32] T(1/0xA)[21] T5[31] T(1/5)[2] T[2↔3] T[1↔3] ,
among other possible orderings. Though you probably cannot tell just by
looking, the three products above are different orderings of the same ele-
mentary chain; they yield the same A and thus represent exactly the same
matrix operation. Interesting is that the act of reordering the elementaries
has altered some of them into other elementaries of the same kind, but has
changed the kind of none of them.
One sorts a chain of elementary operators by repeatedly exchanging
adjacent pairs. This of course supposes that one can exchange adjacent
pairs, which seems impossible since matrix multiplication is not commuta-
tive: A1 A2 ≠ A2 A1 . However, at the moment we are dealing in elementary
operators only; and for most pairs T1 and T2 of elementary operators, though
indeed T1 T2 ≠ T2 T1 , it so happens that there exists either a T1′ such that
T1 T2 = T2 T1′ or a T2′ such that T1 T2 = T2′ T1 , where T1′ and T2′ are elemen-
taries of the same kinds respectively as T1 and T2 . The attempt sometimes
fails when both T1 and T2 are addition elementaries, but all other pairs
commute in this way. Significantly, elementaries of different kinds always
commute. And, though commutation can alter one (never both) of the two
elementaries, it changes the kind of neither.
Many qualitatively distinct pairs of elementaries exist; we shall list these
exhaustively in a moment. First, however, we should like to observe a natural
hierarchy among the three kinds of elementary: (i) interchange; (ii) scaling;
(iii) addition.
Tables 11.1, 11.2 and 11.3 list all possible pairs of elementary operators, as
the reader can check. The only pairs that fail to commute are the last three
of Table 11.3.
T A = (T A)(I) = (T A)(T −1 T ),
AT = (I)(AT ) = (T T −1 )(AT ),
T A = [T AT −1 ]T,
(11.44)
AT = T [T −1 AT ],
T[m↔n] = T[n↔m]
T[m↔m] = I
IT[m↔n] = T[m↔n] I
T[m↔n] T[m↔n] = T[m↔n] T[n↔m] = T[n↔m] T[m↔n] = I
T[m↔n] T[i↔n] = T[i↔n] T[m↔i] = T[i↔m] T[m↔n] = (T[i↔n] T[m↔n])^2
T[m↔n] T[i↔j] = T[i↔j] T[m↔n]
T[m↔n] Tα[m] = Tα[n] T[m↔n]
T[m↔n] Tα[i] = Tα[i] T[m↔n]
T[m↔n] Tα[ij] = Tα[ij] T[m↔n]
T[m↔n] Tα[in] = Tα[im] T[m↔n]
T[m↔n] Tα[mj] = Tα[nj] T[m↔n]
T[m↔n] Tα[mn] = Tα[nm] T[m↔n]
T1[m] = I
ITβ[m] = Tβ[m] I
T(1/β)[m] Tβ[m] = I
Tβ[m] Tα[m] = Tα[m] Tβ[m] = Tαβ[m]
Tβ[m] Tα[i] = Tα[i] Tβ[m]
Tβ[m] Tα[ij] = Tα[ij] Tβ[m]
Tβ[m] Tαβ[im] = Tα[im] Tβ[m]
Tβ[m] Tα[mj] = Tαβ[mj] Tβ[m]
T0[ij] = I
ITα[ij] = Tα[ij] I
T−α[ij] Tα[ij] = I
Tβ[ij] Tα[ij] = Tα[ij] Tβ[ij] = T(α+β)[ij]
Tβ[mj] Tα[ij] = Tα[ij] Tβ[mj]
Tβ[in] Tα[ij] = Tα[ij] Tβ[in]
Tβ[mn] Tα[ij] = Tα[ij] Tβ[mn]
Tβ[mi] Tα[ij] = Tα[ij] Tαβ[mj] Tβ[mi]
Tβ[jn] Tα[ij] = Tα[ij] T−αβ[in] Tβ[jn]
Tβ[ji] Tα[ij] ≠ Tα[ij] Tβ[ji]
T −1 T = I = T T −1 .
Matrix inversion is not for elementary operators only, though. Many more
general matrices C also have inverses such that
C −1 C = I = CC −1 . (11.45)
(Do all matrices have such inverses? No. For example, the null matrix has
no such inverse.) The broad question of how to invert a general matrix C,
we leave for Chs. 12 and 13 to address. For the moment however we should
like to observe three simple rules involving matrix inversion.
First, nothing in the logic leading to (11.44) actually requires the ma-
trix T there to be an elementary operator. Any matrix C for which C −1 is
known can fill the role. Hence,
CA = [CAC −1 ]C,
(11.46)
AC = C[C −1 AC].
Table 11.4: Matrix inversion properties. (The properties work equally for
C −1(r) as for C −1 if A honors an r ×r active region. The full notation C −1(r)
for the rank-r inverse incidentally is not standard, usually is not needed, and
normally is not used.)
C^{-1} C = I = C C^{-1}
C^{-1(r)} C = I_r = C C^{-1(r)}
(C^T)^{-1} = C^{-T} = (C^{-1})^T
(C^*)^{-1} = C^{-*} = (C^{-1})^*
CA = [C A C^{-1}] C
AC = C [C^{-1} A C]
\Bigl(\prod_k C_k\Bigr)^{-1} = \coprod_k C_k^{-1}
The full notation C −1(r) is not standard and usually is not needed, since
the context usually implies the rank. When so, one can abbreviate the
notation to C −1 . In either notation, (11.47) and (11.48) apply equally for
the rank-r inverse as for the infinite-dimensional inverse. Because of (11.31),
eqn. (11.46) too applies for the rank-r inverse if A’s active region is limited
to r × r. (Section 13.2 uses the rank-r inverse to solve an exactly determined
linear system. This is a famous way to use the inverse, with which many or
most readers will already be familiar; but before using it so in Ch. 13, we
shall first learn how to compute it reliably in Ch. 12.)
Table 11.4 summarizes.
11.6 Parity
Consider the sequence of integers or other objects 1, 2, 3, . . . , n. By succes-
sively interchanging pairs of the objects (any pairs, not just adjacent pairs),
one can achieve any desired permutation (§ 4.2.1). For example, beginning
. . . , v, a1 , a2 , . . . , am−1 , am , u, . . .
The interchange reverses with respect to one another just the pairs
(u, a1 ) (u, a2 ) · · · (u, am−1 ) (u, am )
(a1 , v) (a2 , v) · · · (am−1 , v) (am , v)
(u, v)
22
For readers who learned arithmetic in another language than English, the even integers
are . . . , −4, −2, 0, 2, 4, 6, . . .; the odd integers are . . . , −3, −1, 1, 3, 5, 7, . . . .
The number of pairs reversed is odd. Since each reversal alters p by ±1, the
net change in p apparently also is odd, reversing parity. It seems that re-
gardless of how distant the pair, interchanging any pair of elements reverses
the permutation’s parity.
The sole exception arises when an element is interchanged with itself.
This does not change parity, but it does not change anything else, either, so
in parity calculations we ignore it.23 All other interchanges reverse parity.
We discuss parity in this, a chapter on matrices, because parity concerns
the elementary interchange operator of § 11.4. The rows or columns of a
matrix can be considered elements in a sequence. If so, then the interchange
operator T_{[i\leftrightarrow j]}, i ≠ j, acts precisely in the manner described, interchanging
rows or columns and thus reversing parity. It follows that if i_k ≠ j_k and q is
odd, then \prod_{k=1}^{q} T_{[i_k\leftrightarrow j_k]} ≠ I. However, it is possible that \prod_{k=1}^{q} T_{[i_k\leftrightarrow j_k]} = I
if q is even. In any event, even q implies even p, which means even (positive)
parity; odd q implies odd p, which means odd (negative) parity.
We shall have more to say about parity in §§ 11.7.1 and 14.1.
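A brief Python sketch (illustrative only; it counts the interchanges needed to sort the sequence) exhibits the rule that interchanging any pair of distinct elements reverses parity:

def parity(perm):
    """Sign of a permutation of 0..n-1, found by sorting with pair interchanges."""
    p = list(perm)
    sign = 1
    for i in range(len(p)):
        while p[i] != i:
            j = p[i]
            p[i], p[j] = p[j], p[i]       # one interchange reverses parity
            sign = -sign
    return sign

perm = [2, 0, 4, 1, 3]
s = parity(perm)
perm[0], perm[3] = perm[3], perm[0]       # interchange one distant pair of elements
assert parity(perm) == -s                 # the parity reverses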
23
This is why some authors forbid self-interchanges, as explained in footnote 19.
This operator resembles I in that it has a single one in each row and in each
column, but the ones here do not necessarily run along the main diagonal.
The effect of the operator is to shuffle the rows or columns of the matrix it
operates on, without altering any of the rows or columns it shuffles.
By (11.41), (11.39), (11.43) and (11.15), the inverse of the general inter-
change operator is
P^{-1} = \Bigl(\prod_k T_{[i_k\leftrightarrow j_k]}\Bigr)^{-1} = \coprod_k T_{[i_k\leftrightarrow j_k]}^{-1}
       = \coprod_k T_{[i_k\leftrightarrow j_k]}
       = \coprod_k T_{[i_k\leftrightarrow j_k]}^{*} = \Bigl(\prod_k T_{[i_k\leftrightarrow j_k]}\Bigr)^{*}
       = P^* = P^T,   (11.51)
P^T P = P^* P = I = P P^* = P P^T.   (11.52)
24
The letter P here recalls the verb “to permute.”
25
The letter D here recalls the adjective “diagonal.”
elementary). An example is
D = T_{-5[4]} T_{4[2]} T_{7[1]} = \begin{bmatrix}
\ddots & & & & & & \\
\cdots & 7 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 4 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 1 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & -5 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 1 & \cdots \\
& & & & & & \ddots
\end{bmatrix}.
This operator resembles I in that all its entries run down the main diagonal;
but these entries, though never zeros, are not necessarily ones, either. They
are nonzero scaling factors. The effect of the operator is to scale the rows
or columns of the matrix it operates on.
The general scaling operator is a particularly simple matrix. Its inverse
is evidently
D^{-1} = \coprod_{i=-\infty}^{\infty} T_{(1/\alpha_i)[i]} = \prod_{i=-\infty}^{\infty} T_{(1/\alpha_i)[i]} = \sum_{i=-\infty}^{\infty} \frac{E_{ii}}{\alpha_i},   (11.54)
L_{[j]} = \coprod_{i=j+1}^{\infty} T_{\alpha_{ij}[ij]} = \prod_{i=j+1}^{\infty} T_{\alpha_{ij}[ij]}   (11.56)
        = I + \sum_{i=j+1}^{\infty} \alpha_{ij} E_{ij}
        = \begin{bmatrix}
\ddots & & & & & & \\
\cdots & 1 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 1 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 1 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & * & 1 & 0 & \cdots \\
\cdots & 0 & 0 & * & 0 & 1 & \cdots \\
& & & & & & \ddots
\end{bmatrix},
whose inverse is
L_{[j]}^{-1} = \coprod_{i=j+1}^{\infty} T_{-\alpha_{ij}[ij]} = \prod_{i=j+1}^{\infty} T_{-\alpha_{ij}[ij]}   (11.57)
             = I - \sum_{i=j+1}^{\infty} \alpha_{ij} E_{ij} = 2I - L_{[j]};
26
In this subsection the explanations are briefer than in the last two, but the pattern is
similar. The reader can fill in the details.
27
The letter L here recalls the adjective “lower.”
U_{[j]} = \coprod_{i=-\infty}^{j-1} T_{\alpha_{ij}[ij]} = \prod_{i=-\infty}^{j-1} T_{\alpha_{ij}[ij]}   (11.58)
        = I + \sum_{i=-\infty}^{j-1} \alpha_{ij} E_{ij}
        = \begin{bmatrix}
\ddots & & & & & & \\
\cdots & 1 & 0 & * & 0 & 0 & \cdots \\
\cdots & 0 & 1 & * & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 1 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 1 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 1 & \cdots \\
& & & & & & \ddots
\end{bmatrix},
whose inverse is
U_{[j]}^{-1} = \prod_{i=-\infty}^{j-1} T_{-\alpha_{ij}[ij]} = \coprod_{i=-\infty}^{j-1} T_{-\alpha_{ij}[ij]}   (11.59)
             = I - \sum_{i=-\infty}^{j-1} \alpha_{ij} E_{ij} = 2I - U_{[j]};
Yet more complicated than the quasielementary of § 11.7 is the unit trian-
gular matrix, with which we draw this necessary but tedious chapter toward
28
The letter U here recalls the adjective “upper.”
a long close:
L = I + \sum_{i=-\infty}^{\infty} \sum_{j=-\infty}^{i-1} \alpha_{ij} E_{ij} = I + \sum_{j=-\infty}^{\infty} \sum_{i=j+1}^{\infty} \alpha_{ij} E_{ij}   (11.60)
  = \begin{bmatrix}
\ddots & & & & & & \\
\cdots & 1 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & * & 1 & 0 & 0 & 0 & \cdots \\
\cdots & * & * & 1 & 0 & 0 & \cdots \\
\cdots & * & * & * & 1 & 0 & \cdots \\
\cdots & * & * & * & * & 1 & \cdots \\
& & & & & & \ddots
\end{bmatrix};
U = I + \sum_{i=-\infty}^{\infty} \sum_{j=i+1}^{\infty} \alpha_{ij} E_{ij} = I + \sum_{j=-\infty}^{\infty} \sum_{i=-\infty}^{j-1} \alpha_{ij} E_{ij}   (11.61)
  = \begin{bmatrix}
\ddots & & & & & & \\
\cdots & 1 & * & * & * & * & \cdots \\
\cdots & 0 & 1 & * & * & * & \cdots \\
\cdots & 0 & 0 & 1 & * & * & \cdots \\
\cdots & 0 & 0 & 0 & 1 & * & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 1 & \cdots \\
& & & & & & \ddots
\end{bmatrix}.
The former is a unit lower triangular matrix; the latter, a unit upper tri-
angular matrix. The unit triangular matrix is a generalized addition quasi-
elementary, which adds not only to multiple targets but also from multi-
ple sources—but in one direction only: downward or leftward for L or U T
(or U ∗ ); upward or rightward for U or LT (or L∗ ).
The general triangular matrix LS or US , which by definition can have
any values along its main diagonal, is sometimes of interest, as in the Schur
decomposition of § 14.10.29 The strictly triangular matrix L − I or U − I is
likewise sometimes of interest, as in Table 11.5.30 However, such matrices
cannot in general be expressed as products of elementary operators and this
section does not treat them.
This section presents and derives the basic properties of the unit trian-
gular matrix.
29
The subscript S here stands for Schur. Other books typically use the symbols L and U
for the general triangular matrix of Schur, but this book distinguishes by the subscript.
30
[67, “Schur decomposition,” 00:32, 30 Aug. 2007]
11.8.1 Construction
To make a unit triangular matrix is straightforward:
L = \coprod_{j=-\infty}^{\infty} L_{[j]}; \qquad U = \prod_{j=-\infty}^{\infty} U_{[j]}.   (11.62)
which is to say that the entries of L and U are respectively nothing more than
the relevant entries of the several L[j] and U[j]. Equation (11.63) enables one
to use (11.62) immediately and directly, without calculation, to build any
unit triangular matrix desired.
The correctness of (11.63) is most easily seen if the several L[j] and U[j]
are regarded as column operators acting sequentially on I:
L = (I) \coprod_{j=-\infty}^{\infty} L_{[j]}; \qquad U = (I) \prod_{j=-\infty}^{\infty} U_{[j]}.
The reader can construct an inductive proof symbolically on this basis with-
out too much difficulty if desired, but just thinking about how L[j] adds
columns leftward and U[j] , rightward, then considering the order in which
the several L[j] and U[j] act, (11.63) follows at once.
31
Recall again from § 2.3 that \prod_k A_k = \cdots A_3 A_2 A_1, whereas \coprod_k A_k = A_1 A_2 A_3 \cdots.
This means that (\prod_k A_k)(C) applies first A_1, then A_2, A_3 and so on, as row operators
to C; whereas (C)(\coprod_k A_k) applies first A_1, then A_2, A_3 and so on, as column operators
to C. The symbols \prod and \coprod as this book uses them can thus be thought of respectively
as row and column sequencers.
L1 L2 = L,
(11.64)
U1 U2 = U,
is another unit triangular matrix of the same type. The proof for unit lower
and unit upper triangular matrices is the same. In the unit lower triangular
case, one starts from a form of the definition of a unit lower triangular
matrix:
[L_1]_{ij} \text{ or } [L_2]_{ij} = \begin{cases} 0 & \text{if } i < j, \\ 1 & \text{if } i = j. \end{cases}
Then,
[L_1 L_2]_{ij} = \sum_{m=-\infty}^{\infty} [L_1]_{im} [L_2]_{mj}.
But as we have just observed, [L_1]_{im} is null when i < m, and [L_2]_{mj} is null
when m < j. Therefore,
[L_1 L_2]_{ij} = \begin{cases} 0 & \text{if } i < j, \\ \sum_{m=j}^{i} [L_1]_{im} [L_2]_{mj} & \text{if } i \ge j, \end{cases}
which again is the very definition of a unit lower triangular matrix. Hence
(11.64).
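Numerically the property looks like this (a numpy sketch with arbitrary random entries):

import numpy as np

rng = np.random.default_rng(0)
n = 5
L1 = np.tril(rng.standard_normal((n, n)), -1) + np.eye(n)   # unit lower triangular
L2 = np.tril(rng.standard_normal((n, n)), -1) + np.eye(n)
L = L1 @ L2
assert np.allclose(np.triu(L, 1), 0)          # the product is still lower triangular
assert np.allclose(np.diag(L), 1)             # and still unit along the main diagonal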
11.8.3 Inversion
Inasmuch as any unit triangular matrix can be constructed from addition
quasielementaries by (11.62), inasmuch as (11.63) supplies the specific quasi-
elementaries, and inasmuch as (11.57) or (11.59) gives the inverse of each
such quasielementary, one can always invert a unit triangular matrix easily
by
L^{-1} = \prod_{j=-\infty}^{\infty} L_{[j]}^{-1}, \qquad U^{-1} = \coprod_{j=-\infty}^{\infty} U_{[j]}^{-1}.   (11.65)
L_k^{\{k\}} = I + \sum_{j=-\infty}^{k} \sum_{i=k+1}^{\infty} \alpha_{ij} E_{ij}   (11.66)
            = \begin{bmatrix}
\ddots & & & & & & \\
\cdots & 1 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 1 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 1 & 0 & 0 & \cdots \\
\cdots & * & * & * & 1 & 0 & \cdots \\
\cdots & * & * & * & 0 & 1 & \cdots \\
& & & & & & \ddots
\end{bmatrix}
or
U_k^{\{k\}} = I + \sum_{j=k}^{\infty} \sum_{i=-\infty}^{k-1} \alpha_{ij} E_{ij}   (11.67)
            = \begin{bmatrix}
\ddots & & & & & & \\
\cdots & 1 & 0 & * & * & * & \cdots \\
\cdots & 0 & 1 & * & * & * & \cdots \\
\cdots & 0 & 0 & 1 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 1 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 1 & \cdots \\
& & & & & & \ddots
\end{bmatrix},
the parallel unit triangular matrix brings the useful property that
L_k^{\{k\}} = I + \sum_{j=-\infty}^{k} \sum_{i=k+1}^{\infty} \alpha_{ij} E_{ij}
            = \coprod_{j=-\infty}^{k} \prod_{i=k+1}^{\infty} T_{\alpha_{ij}[ij]} = \coprod_{j=-\infty}^{k} \coprod_{i=k+1}^{\infty} T_{\alpha_{ij}[ij]}
            = \prod_{j=-\infty}^{k} \prod_{i=k+1}^{\infty} T_{\alpha_{ij}[ij]} = \prod_{j=-\infty}^{k} \coprod_{i=k+1}^{\infty} T_{\alpha_{ij}[ij]}
            = \prod_{i=k+1}^{\infty} \coprod_{j=-\infty}^{k} T_{\alpha_{ij}[ij]} = \coprod_{i=k+1}^{\infty} \coprod_{j=-\infty}^{k} T_{\alpha_{ij}[ij]}   (11.68)
            = \prod_{i=k+1}^{\infty} \prod_{j=-\infty}^{k} T_{\alpha_{ij}[ij]} = \coprod_{i=k+1}^{\infty} \prod_{j=-\infty}^{k} T_{\alpha_{ij}[ij]},
U_k^{\{k\}} = I + \sum_{j=k}^{\infty} \sum_{i=-\infty}^{k-1} \alpha_{ij} E_{ij}
            = \prod_{j=k}^{\infty} \coprod_{i=-\infty}^{k-1} T_{\alpha_{ij}[ij]} = \cdots,
which says that one can build a parallel unit triangular matrix equally well
in any sequence—in contrast to the case of the general unit triangular ma-
trix, whose construction per (11.62) one must sequence carefully. (Though
eqn. 11.68 does not show them, even more sequences are possible. You can
scramble the factors’ ordering any random way you like. The multiplication
is fully commutative.) Under such conditions, the inverse of the parallel unit
triangular matrix is particularly simple:32
L_k^{\{k\}-1} = I - \sum_{j=-\infty}^{k} \sum_{i=k+1}^{\infty} \alpha_{ij} E_{ij} = 2I - L_k^{\{k\}}
              = \prod_{j=-\infty}^{k} \coprod_{i=k+1}^{\infty} T_{-\alpha_{ij}[ij]} = \cdots,
U_k^{\{k\}-1} = I - \sum_{j=k}^{\infty} \sum_{i=-\infty}^{k-1} \alpha_{ij} E_{ij} = 2I - U_k^{\{k\}}   (11.69)
              = \coprod_{j=k}^{\infty} \prod_{i=-\infty}^{k-1} T_{-\alpha_{ij}[ij]} = \cdots.
The inverse of a parallel unit triangular matrix is just the matrix itself,
only with each element off the main diagonal negated. Table 11.5 records a
few properties that come immediately of the last observation and from the
parallel unit triangular matrix’s basic layout.
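The observation is easy to confirm numerically. In the sketch below (numpy, with an arbitrary 6 × 6 active region; indices are zero-based, so the boundary index plays the role the text's k does), the inverse indeed equals 2I − L:

import numpy as np

n, k = 6, 3
rng = np.random.default_rng(1)
Lk = np.eye(n)
Lk[k:, :k] = rng.standard_normal((n - k, k))    # off-diagonal entries only below row k
                                                # and left of column k: a parallel form
assert np.allclose(np.linalg.inv(Lk), 2*np.eye(n) - Lk)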
32
There is some odd parochiality at play in applied mathematics when one calls such
collections of symbols as (11.69) “particularly simple.” Nevertheless, in the present context
the idea (11.69) represents is indeed simple: that one can multiply constituent elementaries
in any order and still reach the same parallel unit triangular matrix; that the elementaries
in this case do not interfere.
Table 11.5: Properties of the parallel unit triangular matrix. (In the table,
the notation I_a^b represents the generalized dimension-limited identity matrix
or truncator of eqn. 11.30. Note that the inverses L_k^{\{k\}-1} = L_k^{\{k\}\prime} and
U_k^{\{k\}-1} = U_k^{\{k\}\prime} are parallel unit triangular matrices themselves, such that
the table's properties hold for them, too.)
\frac{L_k^{\{k\}} + L_k^{\{k\}-1}}{2} = I = \frac{U_k^{\{k\}} + U_k^{\{k\}-1}}{2}
I_{k+1}^{\infty} L_k^{\{k\}} I_{-\infty}^{k} = L_k^{\{k\}} - I = I_{k+1}^{\infty} (L_k^{\{k\}} - I) I_{-\infty}^{k}
I_{-\infty}^{k-1} U_k^{\{k\}} I_k^{\infty} = U_k^{\{k\}} - I = I_{-\infty}^{k-1} (U_k^{\{k\}} - I) I_k^{\infty}
If L_k^{\{k\}} honors an n \times n active region, then
(I_n - I_k) L_k^{\{k\}} I_k = L_k^{\{k\}} - I = (I_n - I_k)(L_k^{\{k\}} - I) I_k
and (I - I_n)(L_k^{\{k\}} - I) = 0 = (L_k^{\{k\}} - I)(I - I_n).
If U_k^{\{k\}} honors an n \times n active region, then
I_{k-1} U_k^{\{k\}} (I_n - I_{k-1}) = U_k^{\{k\}} - I = I_{k-1} (U_k^{\{k\}} - I)(I_n - I_{k-1})
and (I - I_n)(U_k^{\{k\}} - I) = 0 = (U_k^{\{k\}} - I)(I - I_n).
Besides the notation L and U for the general unit lower and unit upper
triangular matrices and the notation L_k^{\{k\}} and U_k^{\{k\}} for the parallel unit
lower and unit upper triangular matrices, we shall find it useful to introduce
the additional notation
L^{[k]} = I + \sum_{j=k}^{\infty} \sum_{i=j+1}^{\infty} \alpha_{ij} E_{ij}   (11.70)
        = \begin{bmatrix}
\ddots & & & & & & \\
\cdots & 1 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 1 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 1 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & * & 1 & 0 & \cdots \\
\cdots & 0 & 0 & * & * & 1 & \cdots \\
& & & & & & \ddots
\end{bmatrix},
U^{[k]} = I + \sum_{j=-\infty}^{k} \sum_{i=-\infty}^{j-1} \alpha_{ij} E_{ij}   (11.71)
        = \begin{bmatrix}
\ddots & & & & & & \\
\cdots & 1 & * & * & 0 & 0 & \cdots \\
\cdots & 0 & 1 & * & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 1 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 1 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 1 & \cdots \\
& & & & & & \ddots
\end{bmatrix},
L^{\{k\}} = I + \sum_{j=-\infty}^{k} \sum_{i=j+1}^{\infty} \alpha_{ij} E_{ij}   (11.72)
          = \begin{bmatrix}
\ddots & & & & & & \\
\cdots & 1 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & * & 1 & 0 & 0 & 0 & \cdots \\
\cdots & * & * & 1 & 0 & 0 & \cdots \\
\cdots & * & * & * & 1 & 0 & \cdots \\
\cdots & * & * & * & 0 & 1 & \cdots \\
& & & & & & \ddots
\end{bmatrix},
U^{\{k\}} = I + \sum_{j=k}^{\infty} \sum_{i=-\infty}^{j-1} \alpha_{ij} E_{ij}   (11.73)
          = \begin{bmatrix}
\ddots & & & & & & \\
\cdots & 1 & 0 & * & * & * & \cdots \\
\cdots & 0 & 1 & * & * & * & \cdots \\
\cdots & 0 & 0 & 1 & * & * & \cdots \\
\cdots & 0 & 0 & 0 & 1 & * & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 1 & \cdots \\
& & & & & & \ddots
\end{bmatrix}
for the supplementary forms.33 Such notation is not standard in the liter-
ature, but it serves a purpose in this book and is introduced here for this
reason. If names are needed for L[k], U [k] , L{k} and U {k} , the former pair can
be called minor partial unit triangular matrices, and the latter pair, major
partial unit triangular matrices. Whether minor or major, the partial unit
triangular matrix is a matrix which leftward or rightward of the kth column
resembles I. Of course partial unit triangular matrices which resemble I
above or below the kth row are equally possible, and can be denoted L^{[k]T},
U^{[k]T}, L^{\{k\}T} and U^{\{k\}T}.
Observe that the parallel unit triangular matrices L_k^{\{k\}} and U_k^{\{k\}} of
§ 11.8.4 are in fact also major partial unit triangular matrices, as the nota-
tion suggests.
33
The notation is arguably imperfect in that L^{\{k\}} + L^{[k]} - I \ne L but rather that
L^{\{k\}} + L^{[k+1]} - I = L. The conventional notation \sum_{k=a}^{b} f(k) + \sum_{k=b}^{c} f(k) \ne \sum_{k=a}^{c} f(k)
suffers the same arguable imperfection.
This is called the Jacobian derivative, the Jacobian matrix, or just the Ja-
cobian.34 Each of its columns is the derivative with respect to one element
of x.
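A finite-difference sketch in Python (illustrative only; the step size and the example function are arbitrary choices) shows the convention, one column per element of x:

import numpy as np

def jacobian(f, x, eps=1e-6):
    """Numerical Jacobian df/dx: column j holds the derivative with respect to x[j]."""
    f0 = np.asarray(f(x))
    J = np.zeros((f0.size, x.size))
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx[j] = eps
        J[:, j] = (np.asarray(f(x + dx)) - f0) / eps
    return J

f = lambda x: np.array([x[0]*x[1], x[0] + x[1]**2])
print(jacobian(f, np.array([2.0, 3.0])))    # approximately [[3, 2], [1, 6]]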
The Jacobian derivative of a vector with respect to itself is
dx
= I. (11.78)
dx
The derivative is not In as one might think, because, even if x has only n
elements, still, one could vary x_{n+1} in principle, and ∂x_{n+1}/∂x_{n+1} ≠ 0.
The Jacobian derivative obeys the derivative product rule (4.25) in the
form35
" # " T #T
d T df dg
g Af = gTA + Af ,
dx dx dx
" # " ∗ #T (11.79)
d ∗ df dg
g Af = g∗ A
+ Af ,
dx dx dx
valid for any constant matrix A—as is seen by applying the definition (4.19)
of the derivative, which here is
and simplifying.
The shift operator of § 11.9 and the Jacobian derivative of this section
complete the family of matrix rudiments we shall need to begin to do in-
creasingly interesting things with matrices in Chs. 13 and 14. Before doing
interesting things, however, we must treat two more foundational matrix
matters. The two are the Gauss-Jordan decomposition and the matter of
matrix rank, which will be the subjects of Ch. 12, next.
34
[67, “Jacobian,” 00:50, 15 Sept. 2007]
35
Notice that the last term on (11.79)’s second line is transposed, not adjointed.
Chapter 12
Rank and the Gauss-Jordan
Chapter 11 has brought the matrix and its rudiments, the latter including
the dimension-limited and extended-operational forms, the elementary operator,
the quasielementary and the unit triangular matrix.
Such rudimentary forms have useful properties, as we have seen. The general
matrix A does not necessarily have any of these properties, but it turns out
that one can factor any matrix whatsoever into a product of rudiments which
do have the properties, and that several orderly procedures are known to
do so. The simplest of these, and indeed one of the more useful, is the
Gauss-Jordan decomposition. This chapter introduces it.
Section 11.3 has deëmphasized the concept of matrix dimensionality m×
n, supplying in its place the new concept of matrix rank. However, that
section has actually defined rank only for the rank-r identity matrix Ir . In
fact all matrices have rank. This chapter explains.
α_1 a_1 + α_2 a_2 + α_3 a_3 + \cdots + α_n a_n ≠ 0   (12.1)
β1 a1 + β2 a2 + β3 a3 + · · · + βn an = b,
possible? To answer the question, suppose that it were possible. The differ-
ence of the two equations then would be
could define the empty set to be linearly dependent if one really wanted to, but what
then of the observation that adding a vector to a linearly dependent set never renders
the set independent? Surely in this light it is preferable just to define the empty set as
independent in the first place. Similar thinking makes 0! = 1, \sum_{k=0}^{-1} a_k z^k = 0, and 2 not 1
the least prime, among other examples.
where
T[i↔j] I T[i↔j] = I
T[i↔j] P T[i↔j] = P′
T[i↔j] D T[i↔j] = D′ = D + ([D]jj − [D]ii) Eii + ([D]ii − [D]jj) Ejj
T[i↔j] D T[i↔j] = D if [D]ii = [D]jj
T[i↔j] L^{[k]} T[i↔j] = L^{[k]} if i < k and j < k
T[i↔j] U^{[k]} T[i↔j] = U^{[k]} if i > k and j > k
T[i↔j] L^{\{k\}} T[i↔j] = L^{\{k\}′} if i > k and j > k
T[i↔j] U^{\{k\}} T[i↔j] = U^{\{k\}′} if i < k and j < k
T[i↔j] L_k^{\{k\}} T[i↔j] = L_k^{\{k\}′} if i > k and j > k
T[i↔j] U_k^{\{k\}} T[i↔j] = U_k^{\{k\}′} if i < k and j < k
Tα[i] I T(1/α)[i] = I
Tα[i] D T(1/α)[i] = D
Tα[i] A T(1/α)[i] = A′ where A is any of L, U, L^{[k]}, U^{[k]}, L^{\{k\}}, U^{\{k\}}, L_k^{\{k\}}, U_k^{\{k\}}
Tα[ij] I T−α[ij] = I
Tα[ij] D T−α[ij] = D + ([D]jj − [D]ii) αEij ≠ D′
Tα[ij] D T−α[ij] = D if [D]ii = [D]jj
Tα[ij] L T−α[ij] = L′ if i > j
Tα[ij] U T−α[ij] = U′ if i < j
• L and U are respectively unit lower and unit upper triangular matrices
(§ 11.8);
• K = L_k^{\{r\}T} is the transpose of a parallel unit lower triangular matrix,
being thus a parallel unit upper triangular matrix (§ 11.8.4);
• r is an unspecified rank.
12.3.1 Motive
Equation (12.2) seems inscrutable. The equation itself is easy enough to
read, but just as there are many ways to factor a scalar (0xC = [4][3] =
[2]^2[3] = [2][6], for example), there are likewise many ways to factor a matrix.
Why choose this particular way?
There are indeed many ways. We shall meet some of the others in
§§ 13.11, 14.6, 14.10 and 14.12. The Gauss-Jordan decomposition we meet
here however has both significant theoretical properties and useful practical
applications, and in any case needs less advanced preparation to appreciate
than the others, and (at least as developed in this book) precedes the oth-
ers logically. It emerges naturally when one posits a pair of square, n × n
3
One can pronounce G> and G< respectively as “G acting rightward” and “G acting
leftward.” The letter G itself can be regarded as standing for “Gauss-Jordan,” but ad-
mittedly it is chosen as much because otherwise we were running out of available Roman
capitals!
matrices, A and A−1 , for which A−1 A = In , where A is known and A−1 is
to be determined. (The A−1 here is the A−1(n) of eqn. 11.49. However, it is
only supposed here that A−1 A = In ; it is not yet claimed that AA−1 = In .)
To determine A−1 is not an entirely trivial problem. The matrix A−1
such that A−1 A = In may or may not exist (usually it does exist if A is
square, but even then it may not, as we shall soon see), and even if it does
exist, how to determine it is not immediately obvious. And still, if one
can determine A−1 , that is only for square A; what if A is not square? In
the present subsection however we are not trying to prove anything, only
to motivate, so for the moment let us suppose a square A for which A^{-1}
does exist, and let us seek A^{-1} by left-multiplying A by a sequence \prod T
of elementary row operators, each of which makes the matrix more nearly
resemble I_n. When I_n is finally achieved, then we shall have that
\Bigl(\prod T\Bigr)(A) = I_n,
4
Theoretically, all elementary operators including the ones here have extended-
operational form (§ 11.3.2), but all those · · · ellipses clutter the page too much. Only
the 2 × 2 active regions are shown here.
Hence,
A^{-1} = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & \frac{1}{5} \end{bmatrix}\begin{bmatrix} 1 & 0 \\ -3 & 1 \end{bmatrix}\begin{bmatrix} \frac{1}{2} & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} -\frac{1}{0xA} & \frac{2}{5} \\ -\frac{3}{0xA} & \frac{1}{5} \end{bmatrix}.
Using the elementary commutation identity that Tβ[m] Tα[mj] = Tαβ[mj] Tβ[m],
from Table 11.2, to group like operators, we have that
A^{-1} = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ -\frac{3}{5} & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & \frac{1}{5} \end{bmatrix}\begin{bmatrix} \frac{1}{2} & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} -\frac{1}{0xA} & \frac{2}{5} \\ -\frac{3}{0xA} & \frac{1}{5} \end{bmatrix};
or, multiplying the two scaling elementaries to merge them into a single
general scaling operator (§ 11.7.2),
A^{-1} = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ -\frac{3}{5} & 1 \end{bmatrix}\begin{bmatrix} \frac{1}{2} & 0 \\ 0 & \frac{1}{5} \end{bmatrix} = \begin{bmatrix} -\frac{1}{0xA} & \frac{2}{5} \\ -\frac{3}{0xA} & \frac{1}{5} \end{bmatrix},
from which
A = DLU I_2 = \begin{bmatrix} 2 & 0 \\ 0 & 5 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ \frac{3}{5} & 1 \end{bmatrix}\begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & -4 \\ 3 & -1 \end{bmatrix}.
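The same bookkeeping is easily scripted. The numpy sketch below is an illustration only; the particular sequence of row operations is one workable reduction order, not necessarily the text's. It accumulates the elementary operators that reduce A to I_2 and thereby builds A^{-1}:

import numpy as np

A = np.array([[2., -4.],
              [3., -1.]])
I2 = np.eye(2)

def T_scale(alpha, i):
    T = np.eye(2); T[i, i] = alpha; return T

def T_add(alpha, i, j):
    T = np.eye(2); T[i, j] = alpha; return T

# Left-multiply A by elementary row operators until I2 is reached;
# the accumulated product of the same operators is then the inverse of A.
Ainv, work = I2.copy(), A.copy()
for T in (T_scale(1/2, 0),        # scale the first row by 1/2
          T_add(-3, 1, 0),        # subtract 3 times the first row from the second
          T_scale(1/5, 1),        # scale the second row by 1/5
          T_add(2, 0, 1)):        # add 2 times the second row to the first
    work = T @ work
    Ainv = T @ Ainv

assert np.allclose(work, I2) and np.allclose(Ainv @ A, I2)
print(Ainv)                        # [[-1/0xA, 2/5], [-3/0xA, 1/5]]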
12.3.2 Method
The Gauss-Jordan decomposition of a matrix A is not discovered at one
stroke but rather is gradually built up, elementary by elementary. It begins
with the equation
A = IIIIAII,
where the six I hold the places of the six Gauss-Jordan factors P , D, L, U ,
K and S of (12.2). By successive elementary operations, the A on the right
is gradually transformed into Ir , while the six I are gradually transformed
into the six Gauss-Jordan factors. The decomposition thus ends with the
equation
A = P DLU Ir KS,
which is (12.2). In between, while the several matrices are gradually being
transformed, the equation is represented as
A = \tilde{P}\tilde{D}\tilde{L}\tilde{U}\tilde{I}\tilde{K}\tilde{S},   (12.4)
where the initial value of \tilde{I} is A and the initial values of \tilde{P}, \tilde{D}, etc., are all I.
Each step of the transformation goes as follows. The matrix I˜ is left- or
right-multiplied by an elementary operator T . To compensate, one of the
six factors is right- or left-multiplied by T −1 . Intervening factors are mul-
tiplied by both T and T −1 , which multiplication constitutes an elementary
similarity transformation as described in § 12.2. For example,
A = \tilde{P}\,\tilde{D}T_{(1/\alpha)[i]}\,T_{\alpha[i]}\tilde{L}T_{(1/\alpha)[i]}\,T_{\alpha[i]}\tilde{U}T_{(1/\alpha)[i]}\,T_{\alpha[i]}\tilde{I}\,\tilde{K}\tilde{S},
which is just (12.4), inasmuch as the adjacent elementaries cancel one an-
other; then,
\tilde{I} \leftarrow T_{\alpha[i]}\tilde{I},
\tilde{U} \leftarrow T_{\alpha[i]}\tilde{U}T_{(1/\alpha)[i]},
\tilde{L} \leftarrow T_{\alpha[i]}\tilde{L}T_{(1/\alpha)[i]},
\tilde{D} \leftarrow \tilde{D}T_{(1/\alpha)[i]},
thus associating the operation with the appropriate factor—in this case, D̃.
Such elementary row and column operations are repeated until I˜ = Ir , at
which point (12.4) has become the Gauss-Jordan decomposition (12.2).
• reduces the now unit triangular I˜ further to the rank-r identity ma-
trix Ir (steps 9 through 13).
Specifically, the algorithm decrees the following steps. (The steps as written
include many parenthetical remarks—so many that some steps seem to con-
sist more of parenthetical remarks than of actual algorithm. The remarks
are unnecessary to execute the algorithm’s steps as such. They are however
necessary to explain and to justify the algorithm’s steps to the reader.)
1. Begin by initializing
P̃ ← I, D̃ ← I, L̃ ← I, Ũ ← I, K̃ ← I, S̃ ← I,
I˜ ← A,
i ← 1,
2. (Besides arriving at this point from step 1 above, the algorithm also
reënters here from step 7 below. From step 1, I˜ = A and L̃ = I, so
this step 2 though logical seems unneeded. The need grows clear once
one has read through step 7.) Observe that neither the ith row of I˜
nor any row below it has an entry left of the ith column, that I˜ is
all-zero below-leftward of and directly leftward of (though not directly
below) the pivot element ı̃ ii .5 Observe also that above the ith row, the
matrix has proper unit upper triangular form (§ 11.8). Regarding the
other factors, notice that L̃ enjoys the major partial unit triangular
form L{i−1} (§ 11.8.5) and that d˜kk = 1 for all k ≥ i. Pictorially,
\tilde{D} = \begin{bmatrix}
\ddots & & & & & & & & \\
\cdots & * & 0 & 0 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & * & 0 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & * & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 1 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 1 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 0 & 1 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 0 & 0 & 1 & \cdots \\
& & & & & & & & \ddots
\end{bmatrix},
\tilde{L} = L^{\{i-1\}} = \begin{bmatrix}
\ddots & & & & & & & & \\
\cdots & 1 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & * & 1 & 0 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & * & * & 1 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & * & * & * & 1 & 0 & 0 & 0 & \cdots \\
\cdots & * & * & * & 0 & 1 & 0 & 0 & \cdots \\
\cdots & * & * & * & 0 & 0 & 1 & 0 & \cdots \\
\cdots & * & * & * & 0 & 0 & 0 & 1 & \cdots \\
& & & & & & & & \ddots
\end{bmatrix},
\tilde{I} = \begin{bmatrix}
\ddots & & & & & & & & \\
\cdots & 1 & * & * & * & * & * & * & \cdots \\
\cdots & 0 & 1 & * & * & * & * & * & \cdots \\
\cdots & 0 & 0 & 1 & * & * & * & * & \cdots \\
\cdots & 0 & 0 & 0 & * & * & * & * & \cdots \\
\cdots & 0 & 0 & 0 & * & * & * & * & \cdots \\
\cdots & 0 & 0 & 0 & * & * & * & * & \cdots \\
\cdots & 0 & 0 & 0 & * & * & * & * & \cdots \\
& & & & & & & & \ddots
\end{bmatrix},
where the ith row and ith column are depicted at center.
5
The notation \tilde{\imath}_{ii} looks interesting, but this is accidental. The \tilde{\imath} relates not to the
doubled, subscribed index ii but to \tilde{I}. The notation \tilde{\imath}_{ii} thus means [\tilde{I}]_{ii}, in other words,
the current iith element of the variable working matrix \tilde{I}.
6
A typical Intel or AMD x86-class computer processor represents a C/C++ double-
type floating-point number, x = 2p b, in 0x40 bits of computer memory. Of the 0x40
bits, 0x34 are for the number’s mantissa 2.0 ≤ b < 4.0 (not 1.0 ≤ b < 2.0 as one might
expect), 0xB are for the number’s exponent −0x3FF ≤ p ≤ 0x3FE, and one is for the
number’s ± sign. (The mantissa’s high-order bit, which is always 1, is implied not stored,
thus is one neither of the 0x34 nor of the 0x40 bits.) The out-of-bounds exponents p =
−0x400 and p = 0x3FF serve specially respectively to encode 0 and ∞. All this is standard
computing practice. Such a floating-point representation is easily accurate enough for most
practical purposes, but of course it is not generally exact. [33, § 1-4.2.2]
7
The Gauss-Jordan’s floating-point errors come mainly from dividing by small pivots.
Such errors are naturally avoided by avoiding small pivots, at least until as late in the
algorithm as possible. Smallness however is relative: a small pivot in a row and a column
each populated by even smaller elements is unlikely to cause as much error as is a large
pivot in a row and a column each populated by even larger elements.
To choose a pivot, any of several heuristics are reasonable. The following heuristic if
programmed intelligently might not be too computationally expensive: Define the pivot-
smallness metric
\tilde{\eta}_{pq}^2 \equiv \frac{2\tilde{\imath}_{pq}^*\tilde{\imath}_{pq}}{\sum_{p'=i}^{m} \tilde{\imath}_{p'q}^*\tilde{\imath}_{p'q} + \sum_{q'=i}^{n} \tilde{\imath}_{pq'}^*\tilde{\imath}_{pq'}}.
Choose the p and q of least \tilde{\eta}_{pq}^2. If two are equally least, then choose first the lesser column
index q, then if necessary the lesser row index p.
let
\tilde{P} \leftarrow \tilde{P}T_{[p\leftrightarrow i]},
\tilde{L} \leftarrow T_{[p\leftrightarrow i]}\tilde{L}T_{[p\leftrightarrow i]},
\tilde{I} \leftarrow T_{[p\leftrightarrow i]}\tilde{I}T_{[i\leftrightarrow q]},
\tilde{S} \leftarrow T_{[i\leftrightarrow q]}\tilde{S},
thus interchanging the pth with the ith row and the qth with the
ith column, to bring the chosen element to the pivot position. (Re-
fer to Table 12.1 for the similarity transformations. The Ũ and K̃
transformations disappear because at this stage of the algorithm, still
Ũ = K̃ = I. The D̃ transformation disappears because p ≥ i and
because d˜kk = 1 for all k ≥ i. Regarding the L̃ transformation, it does
not disappear, but L̃ has major partial unit triangular form L{i−1} ,
which form according to Table 12.1 it retains since i − 1 < i ≤ p.)
A = \tilde{P}\,\tilde{D}T_{\tilde{\imath}_{ii}[i]}\,T_{(1/\tilde{\imath}_{ii})[i]}\tilde{L}T_{\tilde{\imath}_{ii}[i]}\,T_{(1/\tilde{\imath}_{ii})[i]}\tilde{U}T_{\tilde{\imath}_{ii}[i]}\,T_{(1/\tilde{\imath}_{ii})[i]}\tilde{I}\,\tilde{K}\tilde{S}
  = \tilde{P}\,\tilde{D}T_{\tilde{\imath}_{ii}[i]}\,T_{(1/\tilde{\imath}_{ii})[i]}\tilde{L}T_{\tilde{\imath}_{ii}[i]}\,\tilde{U}\,T_{(1/\tilde{\imath}_{ii})[i]}\tilde{I}\,\tilde{K}\tilde{S},
\tilde{D} \leftarrow \tilde{D}T_{\tilde{\imath}_{ii}[i]},
\tilde{L} \leftarrow T_{(1/\tilde{\imath}_{ii})[i]}\tilde{L}T_{\tilde{\imath}_{ii}[i]},
\tilde{I} \leftarrow T_{(1/\tilde{\imath}_{ii})[i]}\tilde{I}.
Pictorially, after
this step,
\tilde{D} = \begin{bmatrix}
\ddots & & & & & & & & \\
\cdots & * & 0 & 0 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & * & 0 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & * & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & * & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 1 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 0 & 1 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 0 & 0 & 1 & \cdots \\
& & & & & & & & \ddots
\end{bmatrix},
\tilde{I} = \begin{bmatrix}
\ddots & & & & & & & & \\
\cdots & 1 & * & * & * & * & * & * & \cdots \\
\cdots & 0 & 1 & * & * & * & * & * & \cdots \\
\cdots & 0 & 0 & 1 & * & * & * & * & \cdots \\
\cdots & 0 & 0 & 0 & 1 & * & * & * & \cdots \\
\cdots & 0 & 0 & 0 & * & * & * & * & \cdots \\
\cdots & 0 & 0 & 0 & * & * & * & * & \cdots \\
\cdots & 0 & 0 & 0 & * & * & * & * & \cdots \\
& & & & & & & & \ddots
\end{bmatrix}.
(Though the step changes L̃, too, again it leaves L̃ in the major partial
unit triangular form L{i−1} , because i − 1 < i. Refer to Table 12.1.)
This forces ı̃ ip = 0 for all p > i. It also fills in L̃’s ith column below the
pivot, advancing that matrix from the L{i−1} form to the L{i} form.
Pictorially,
\tilde{L} = L^{\{i\}} = \begin{bmatrix}
\ddots & & & & & & & & \\
\cdots & 1 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & * & 1 & 0 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & * & * & 1 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & * & * & * & 1 & 0 & 0 & 0 & \cdots \\
\cdots & * & * & * & * & 1 & 0 & 0 & \cdots \\
\cdots & * & * & * & * & 0 & 1 & 0 & \cdots \\
\cdots & * & * & * & * & 0 & 0 & 1 & \cdots \\
& & & & & & & & \ddots
\end{bmatrix},
\tilde{I} = \begin{bmatrix}
\ddots & & & & & & & & \\
\cdots & 1 & * & * & * & * & * & * & \cdots \\
\cdots & 0 & 1 & * & * & * & * & * & \cdots \\
\cdots & 0 & 0 & 1 & * & * & * & * & \cdots \\
\cdots & 0 & 0 & 0 & 1 & * & * & * & \cdots \\
\cdots & 0 & 0 & 0 & 0 & * & * & * & \cdots \\
\cdots & 0 & 0 & 0 & 0 & * & * & * & \cdots \\
\cdots & 0 & 0 & 0 & 0 & * & * & * & \cdots \\
& & & & & & & & \ddots
\end{bmatrix}.
7. Increment
i ← i + 1.
Go back to step 2.
8. Decrement
i←i−1
to undo the last instance of step 7 (even if there never was an instance
of step 7), thus letting i point to the matrix’s last nonzero row. After
decrementing, let the rank
r ≡ i.
Notice that, certainly, r ≤ m and r ≤ n.
9. (Besides arriving at this point from step 8 above, the algorithm also
reënters here from step 11 below.) If i = 0, then skip directly to
step 12.
This forces ı̃ ip = 0 for all p 6= i. It also fills in Ũ ’s ith column above the
pivot, advancing that matrix from the U {i+1} form to the U {i} form.
Pictorially,
\tilde{U} = U^{\{i\}} = \begin{bmatrix}
\ddots & & & & & & & & \\
\cdots & 1 & 0 & 0 & * & * & * & * & \cdots \\
\cdots & 0 & 1 & 0 & * & * & * & * & \cdots \\
\cdots & 0 & 0 & 1 & * & * & * & * & \cdots \\
\cdots & 0 & 0 & 0 & 1 & * & * & * & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 1 & * & * & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 0 & 1 & * & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 0 & 0 & 1 & \cdots \\
& & & & & & & & \ddots
\end{bmatrix},
\tilde{I} = \begin{bmatrix}
\ddots & & & & & & & & \\
\cdots & 1 & * & * & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 1 & * & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 1 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 1 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 1 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 0 & 1 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 0 & 0 & 1 & \cdots \\
& & & & & & & & \ddots
\end{bmatrix}.
(As in step 6, here again it is not necessary actually to apply the ad-
dition elementaries one by one. Together they easily form an addition
quasielementary U[i] . See § 11.7.3.)
12. Notice that I˜ now has the form of a rank-r identity matrix, except
with n − r extra columns dressing its right edge (often r = n however;
then there are no extra columns). Pictorially,
\tilde{I} = \begin{bmatrix}
\ddots & & & & & & & & \\
\cdots & 1 & 0 & 0 & 0 & * & * & * & \cdots \\
\cdots & 0 & 1 & 0 & 0 & * & * & * & \cdots \\
\cdots & 0 & 0 & 1 & 0 & * & * & * & \cdots \\
\cdots & 0 & 0 & 0 & 1 & * & * & * & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots \\
& & & & & & & & \ddots
\end{bmatrix}.
End.
Never stalling, the algorithm cannot fail to achieve I˜ = Ir and thus a com-
plete Gauss-Jordan decomposition of the form (12.2), though what value
the rank r might turn out to have is not normally known to us in advance.
(We have not yet proven, but shall in § 12.5, that the algorithm always pro-
duces the same Ir , the same rank r ≥ 0, regardless of which pivots \tilde{\imath}_{pq} ≠ 0
one happens to choose in step 3 along the way. We can safely ignore this
unproven fact however for the immediate moment.)
r ≤ m, \quad r ≤ n.   (12.5)
The rank r exceeds the number neither of the matrix’s rows nor of its
columns. This is unsurprising. Indeed the narrative of the algorithm’s
step 8 has already noticed the fact.
Observe also however that the rank always fully reaches r = m if the
rows of the original matrix A are linearly independent. The reason for this
observation is that the rank can fall short, r < m, only if step 3 finds a null
row i ≤ m; but step 3 can find such a null row only if step 6 has created one
(or if there were a null row in the original matrix A; but according to § 12.1,
such null rows never were linearly independent in the first place). How do
we know that step 6 can never create a null row? We know this because the
action of step 6 is to add multiples only of current and earlier pivot rows
to rows in I˜ which have not yet been on pivot.8 According to (12.1), such
action has no power to cancel the independent rows it targets.
8
If the truth of the sentence’s assertion regarding the action of step 6 seems nonobvious,
one can drown the assertion rigorously in symbols to prove it, but before going to that
extreme consider: The action of steps 3 and 4 is to choose a pivot row p ≥ i and to
shift it upward to the ith position. The action of step 6 then is to add multiples of the
chosen pivot row downward only—that is, only to rows which have not yet been on pivot.
This being so, steps 3 and 4 in the second iteration find no unmixed rows available to
choose as second pivot, but find only rows which already include multiples of the first
pivot row. Step 6 in the second iteration therefore adds multiples of the second pivot row
downward, which already include multiples of the first pivot row. Step 6 in the ith iteration
adds multiples of the ith pivot row downward, which already include multiples of the first
through (i − 1)th. So it comes to pass that multiples only of current and earlier pivot rows
are added to rows which have not yet been on pivot. To no row is ever added, directly
or indirectly, a multiple of itself—until step 10, which does not belong to the algorithm’s
main loop and has nothing to do with the availability of nonzero rows to step 3.
P̃ −1 ← I; D̃−1 ← I; L̃−1 ← I; Ũ −1 ← I; K̃ −1 ← I; S̃ −1 ← I.
(There is no I˜−1 , not because it would not be useful, but because its initial
value would be9 A−1(r) , unknown at algorithm’s start.) Then, for each
operation on any of P̃ , D̃, L̃, Ũ , K̃ or S̃, one operates inversely on the
corresponding inverse matrix. For example, in step 5,
With this simple extension, the algorithm yields all the factors not only
of the Gauss-Jordan decomposition (12.2) but simultaneously also of the
Gauss-Jordan’s complementary form (12.3).
A = I_m P DLU I_r KS I_n = I_m^7 P DLU I_r^2 KS I_n^3;
A = (Im P Im )(Im DIm )(Im LIm )(Im U Ir )(Ir KIn )(In SIn ), (12.6)
where the dimensionalities of the six factors on the equation’s right side are
respectively m × m, m × m, m × m, m × r, r × n and n × n. Equation (12.6)
expresses any dimension-limited rectangular matrix A as the product of six
particularly simple, dimension-limited rectangular factors.
By similar reasoning from (12.2),
P^* = P^{-1} = P^T
S^* = S^{-1} = S^T
P^{-*} = P = P^{-T}
S^{-*} = S = S^{-T}
\frac{K + K^{-1}}{2} = I
I_r K (I_n - I_r) = K - I = I_r (K - I)(I_n - I_r)
I_r K^{-1} (I_n - I_r) = K^{-1} - I = I_r (K^{-1} - I)(I_n - I_r)
(I - I_n)(K - I) = 0 = (K - I)(I - I_n)
(I - I_n)(K^{-1} - I) = 0 = (K^{-1} - I)(I - I_n)
A = P DLU KS,
from which, inasmuch as all the factors present but In are n × n extended
operators, the preceding equation results.
One can decompose only reversible extended operators so. The Gauss-
Jordan fails on irreversible extended operators, for which the rank of the
truncated form AIn is r < n. See § 12.5.
This subsection’s equations remain unnumbered because they say little
new. Their only point, really, is that what an operator does outside some
appropriately delimited active region is seldom interesting, because the vec-
tor on which the operator ultimately acts is probably null there in any event.
In such a context it may not matter whether one truncates the operator.
Indeed this was also the point of § 12.3.6 and, if you like, of (11.31), too.11
11
If “it may not matter,” as the narrative says, then one might just put all matrices
in dimension-limited form. Many books do. To put them all in dimension-limited form
however brings at least three effects the book you are reading prefers to avoid. First, it
leaves shift-and-truncate operations hard to express cleanly (refer to §§ 11.3.6 and 11.9
and, as a typical example of the usage, eqn. 13.7). Second, it confuses the otherwise natural
extension of discrete vectors into continuous functions. Third, it leaves one to consider
the ranks of reversible operators like T[1↔2] that naturally should have no rank. The last
{u, a1 , a2 , . . . , am }.
As a definition, the space these vectors address consists of all linear combina-
tions of the set’s several vectors. That is, the space consists of all vectors b
formable as
βo u + β1 a1 + β2 a2 + · · · + βm am = b. (12.9)
Now consider a specific vector v in the space,
ψo u + ψ1 a1 + ψ2 a2 + · · · + ψm am = v, (12.10)
for which
ψo ≠ 0.
Solving (12.10) for u, we find that
$$\frac{1}{\psi_o}\, v - \frac{\psi_1}{\psi_o}\, a_1 - \frac{\psi_2}{\psi_o}\, a_2 - \cdots - \frac{\psi_m}{\psi_o}\, a_m = u.$$
of the three is arguably most significant: matrix rank is such an important attribute that
one prefers to impute it only to those operators about which it actually says something
interesting.
Nevertheless, the extended-operational matrix form is hardly more than a formality. All
it says is that the extended operator unobtrusively leaves untouched anything it happens
to find outside its operational domain, whereas a dimension-limited operator would have
truncated whatever it found there. Since what is found outside the operational domain is
often uninteresting, this may be a distinction without a difference, which one can safely
ignore.
Defining
$$\phi_o \leftarrow \frac{1}{\psi_o}, \qquad \phi_1 \leftarrow -\frac{\psi_1}{\psi_o}, \qquad \phi_2 \leftarrow -\frac{\psi_2}{\psi_o}, \qquad \ldots, \qquad \phi_m \leftarrow -\frac{\psi_m}{\psi_o},$$
so that, inversely,
$$\psi_o = \frac{1}{\phi_o}, \qquad \psi_1 = -\frac{\phi_1}{\phi_o}, \qquad \psi_2 = -\frac{\phi_2}{\phi_o}, \qquad \ldots, \qquad \psi_m = -\frac{\phi_m}{\phi_o},$$
the solution is
φo v + φ1 a1 + φ2 a2 + · · · + φm am = u. (12.11)
Equation (12.11) has identical form to (12.10), only with the symbols u ↔ v
and ψ ↔ φ swapped. Since φo = 1/ψo , assuming that ψo is finite it even
appears that
φo ≠ 0;
so, the symmetry is complete. Table 12.3 summarizes.
Now further consider an arbitrary vector b which lies in the space ad-
dressed by the vectors
{u, a1 , a2 , . . . , am }.
Does the same b also lie in the space addressed by the vectors
{v, a1 , a2 , . . . , am }?
ψo u + ψ1 a1 + · · · + ψm am = v        φo v + φ1 a1 + · · · + φm am = u
0 ≠ 1/ψo = φo                          0 ≠ 1/φo = ψo
−ψ1 /ψo = φ1                           −φ1 /φo = ψ1
−ψ2 /ψo = φ2                           −φ2 /φo = ψ2
   ⋮                                      ⋮
−ψm /ψo = φm                           −φm /φo = ψm
To show that it does, we substitute into (12.9) the expression for u from
(12.11), obtaining the form
(βo )(φo v + φ1 a1 + φ2 a2 + · · · + φm am ) + β1 a1 + β2 a2 + · · · + βm am = b.
in which we see that, yes, b does indeed also lie in the latter space. Nat-
urally the problem’s u ↔ v symmetry then guarantees the converse, that
an arbitrary vector b which lies in the latter space also lies in the former.
Therefore, a vector b must lie in both spaces or neither, never just in one
or the other. The two spaces are, in fact, one and the same.
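A quick numerical check of this conclusion is possible. The sketch below (numpy assumed; the particular vectors and coefficients are hypothetical) forms a v that includes at least a little of u, then confirms that the old and new sets address one and the same space by comparing ranks.

```python
import numpy as np

rng = np.random.default_rng(0)
a1, a2, u = rng.standard_normal((3, 5))   # three 5-element vectors

# v includes at least a little of u (psi_o != 0), plus some of the a's.
psi_o, psi_1, psi_2 = 0.7, -1.3, 2.1
v = psi_o*u + psi_1*a1 + psi_2*a2

old_set = np.column_stack([u, a1, a2])
new_set = np.column_stack([v, a1, a2])

# The two spaces coincide: stacking both sets together adds no new dimension.
print(np.linalg.matrix_rank(old_set))                            # 3
print(np.linalg.matrix_rank(new_set))                            # 3
print(np.linalg.matrix_rank(np.column_stack([old_set, new_set])))  # still 3
```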
This leads to the following useful conclusion. Given a set of vectors
{u, a1 , a2 , . . . , am },
one can safely replace the u by a new vector v, obtaining the new set
{v, a1 , a2 , . . . , am },
provided that the replacement vector v includes at least a little of the replaced
vector u (ψo ≠ 0 in eqn. 12.10) and that v is otherwise an honest linear
combination of the several vectors of the original set.
γo v + γ1 a1 + γ2 a2 + · · · + γm am = 0
12.5 Rank
Sections 11.3.5 and 11.3.6 have introduced the rank-r identity matrix Ir ,
where the integer r is the number of ones the matrix has along its main
diagonal. Other matrices have rank, too. Commonly, an n × n matrix has
rank r = n, but consider the matrix
$$\begin{bmatrix} 5 & 1 & 6 \\ 3 & 6 & 9 \\ 2 & 4 & 6 \end{bmatrix}.$$
The third column of this matrix is the sum of the first and second columns.
Also, the third row is just two-thirds the second. Either way, by columns
or by rows, the matrix has only two independent vectors. The rank of this
3 × 3 matrix is not r = 3 but r = 2.
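A floating-point rank computation (a numpy sketch, standing in for the exact arithmetic this section otherwise assumes) agrees:

```python
import numpy as np

A = np.array([[5., 1., 6.],
              [3., 6., 9.],
              [2., 4., 6.]])

# Third column is the sum of the first two; third row is 2/3 the second.
print(np.linalg.matrix_rank(A))   # prints 2, not 3
```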
This section establishes properly the important concept of matrix rank.
The section demonstrates that every matrix has a definite, unambiguous
rank, and shows how this rank can be calculated.
To forestall potential confusion in the matter, we should immediately
observe that—like the rest of this chapter but unlike some other parts of the
book—this section explicitly trades in exact numbers. If a matrix element
One valid way to prove Q, then, would be to suppose P1 and show that it
led to Q, then alternately to suppose P2 and show that it separately led
to Q, then again to suppose P3 and show that it also led to Q. The final
step would be to show somehow that P1 , P2 and P3 could not possibly all be
false at once. Herein, the means is to assert several individually suspicious
claims, none of which one actually means to prove. The end which justifies
the means is the conclusion Q, which thereby one can and does prove.
It is a subtle maneuver. Once the reader feels that he grasps its logic,
he can proceed to the next subsection where the maneuver is put to use.
12
It is false to suppose that because applied mathematics permits imprecise quantities,
like 3.0 ± 0.1 inches for the length of your thumb, it also requires them. On the contrary,
the length of your thumb may indeed be 3.0±0.1 inches, but surely no triangle has 3.0±0.1
sides! A triangle has exactly three sides. The ratio of a circle’s circumference to its radius
is exactly 2π. The author has exactly one brother. A construction contract might require
the builder to finish within exactly 180 days (though the actual construction time might
be an inexact t = 172.6 ± 0.2 days), and so on. Exact quantities are every bit as valid
in applied mathematics as imprecise ones are. Where the distinction matters, it is the
applied mathematician’s responsibility to distinguish between the two kinds of quantity.
13
The maneuver’s name rings a bit sinister, does it not? The author recommends no
such maneuver in social or family life! Logically here however it helps the math.
AIr B = Is . (12.12)
(AIr )B = Is , (12.13)
where, because Ir attacking from the right is the column truncation oper-
ator (§ 11.3.6), the product AIr is a matrix with an unspecified number of
rows but only r columns—or, more precisely, with no more than r nonzero
columns. Viewed this way, per § 11.1.3, B operates on the r columns of AIr
to produce the s columns of Is .
The r columns of AIr are nothing more than the first through rth
columns of A. Let the symbols a1 , a2 , a3 , a4 , a5 , . . . , ar denote these columns.
The s columns of Is , then, are nothing more than the elementary vectors
e1 , e2 , e3 , e4 , e5 , . . . , es (§ 11.3.7). The claim (12.13) makes is thus that
the several vectors ak together address each of the several elementary vec-
tors ej —that is, that a linear combination14
{e1 , a2 , a3 , a4 , a5 , . . . , ar }.
The only restriction per § 12.4 is that e1 contain at least a little of the
vector ak it replaces—that bk1 ≠ 0. Of course there is no guarantee specif-
ically that b11 ≠ 0, so for e1 to replace a1 might not be allowed. However,
inasmuch as e1 is nonzero, then according to (12.14) at least one of the sev-
eral bk1 also is nonzero; and if bk1 is nonzero then e1 can replace ak . Some
of the ak might indeed be forbidden, but never all; there is always at least
one ak which e1 can replace. (For example, if a1 were forbidden because
b11 = 0, then a3 might be available instead because b31 ≠ 0. In this case the
new set would be {a1 , a2 , e1 , a4 , a5 , . . . , ar }.)
Here is where the logical maneuver of § 12.5.1 comes in. The book to this
point has established no general method to tell which of the several ak the
elementary vector e1 actually contains (§ 13.2 gives the method, but that
section depends logically on this one, so we cannot licitly appeal to it here).
According to (12.14), the vector e1 might contain some of the several ak
or all of them, but surely it contains at least one of them. Therefore, even
though it is illegal to replace an ak by an e1 which contains none of it, even
though we have no idea which of the several ak the vector e1 contains, even
though replacing the wrong ak logically invalidates any conclusion which
flows from the replacement, still we can proceed with the proof, provided
that the false choice and the true choice lead ultimately alike to the same
identical end. If they do, then all the logic requires is an assurance that
there does exist at least one true choice, even if we remain ignorant as to
which choice that is.
The claim (12.14) guarantees at least one true choice. Whether as the
maneuver also demands, all the choices, true and false, lead ultimately alike
to the same identical end remains to be determined.
{a1 , a2 , a3 , a4 , a5 , . . . , ar }.
Therefore as we have seen, e2 also lies in the space addressed by the new set
{e1 , a2 , a3 , a4 , a5 , . . . , ar }
Again it is impossible for all the coefficients βk2 to be zero, but moreover,
it is impossible for β12 to be the sole nonzero coefficient, for (as should
seem plain to the reader who grasps the concept of the elementary vector,
§ 11.3.7) no elementary vector can ever be a linear combination of other
elementary vectors alone! The linear combination which forms e2 evidently
includes a nonzero multiple of at least one of the remaining ak . At least
one of the βk2 attached to an ak (not β12 , which is attached to e1 ) must be
nonzero. Therefore by the same reasoning as before, we now choose an ak
with a nonzero coefficient βk2 ≠ 0 and replace it by e2 , obtaining an even
newer set of vectors like
{e1 , a2 , a3 , e2 , a5 , . . . , ar }.
This newer set addresses precisely the same space as the previous set, thus
also as the original set.
And so it goes, replacing one ak by an ej at a time, until all the ak are
gone and our set has become
{e1 , e2 , e3 , e4 , e5 , . . . , er },
which, as we have reasoned, addresses precisely the same space as did the
original set
{a1 , a2 , a3 , a4 , a5 , . . . , ar }.
And this is the one identical end the maneuver of § 12.5.1 has demanded. All
intermediate choices, true and false, ultimately lead to the single conclusion
of this paragraph, which thereby is properly established.
{a1 , a2 , a3 , a4 , a5 , . . . , ar }
{e1 , e2 , e3 , e4 , e5 , . . . , er }
A matrix A has rank r if and only if matrices B> and B< exist such that
$$B_> A B_< = I_r, \qquad A = B_>^{-1} I_r B_<^{-1}, \qquad B_>^{-1} B_> = I = B_> B_>^{-1}, \qquad B_<^{-1} B_< = I = B_< B_<^{-1}. \tag{12.15}$$
$$(B_> G_>^{-1})\, I_s\, (G_<^{-1} B_<) = I_r, \qquad (G_> B_>^{-1})\, I_r\, (B_<^{-1} G_<) = I_s.$$
Were it that r ≠ s, then one of these two equations would constitute the
demotion of an identity matrix and the other, a promotion. But according
to § 12.5.2 and its (12.12), promotion is impossible. Therefore r ≠ s is also
impossible, and
r = s
is guaranteed. No matrix has two different ranks. Matrix rank is unique.
This finding has two immediate implications:
The discovery that every matrix has a single, unambiguous rank and the
establishment of a failproof algorithm—the Gauss-Jordan—to ascertain that
rank have not been easy to achieve, but they are important achievements
nonetheless, worth the effort thereto. The reason these achievements matter
is that the mere dimensionality of a matrix is a chimerical measure of the
matrix’s true size—as for instance for the 3 × 3 example matrix at the head
of the section. Matrix rank by contrast is an entirely solid, dependable
measure. We shall rely on it often.
Section 12.5.8 comments further.
• a tall m × n matrix, m ≥ n, has full rank if and only if its columns are
linearly independent;
• a broad m × n matrix, m ≤ n, has full rank if and only if its rows are
linearly independent;
• a square n × n matrix, m = n, has full rank if and only if its columns
and/or its rows are linearly independent; and
• a square matrix has both independent columns and independent rows,
or neither; never just one or the other.
To say that a matrix has full column rank is to say that it is tall or
square and has full rank r = n ≤ m. To say that a matrix has full row
rank is to say that it is broad or square and has full rank r = m ≤ n. Only
a square matrix can have full column rank and full row rank at the same
time, because a tall or broad matrix cannot but include, respectively, more
columns or more rows than Ir .
$$I_r G_< x = G_>^{-1} b$$
for unrestricted b when A lacks full row rank. The error is easy to commit
because the equation looks right, because such an equation is indeed valid
over a broad domain of b and might very well have been written correctly in
that context, only not in the context of unrestricted b. Analysis including
such an error can lead to subtly absurd conclusions. It is never such an
analytical error however to require that
Ax = 0
because, whatever other solutions such a system might have, it has at least
the solution x = 0.
A = BC. (12.16)
A = P DLU Ir KS,
A = P DLU In . (12.17)
Observe however that just because one theoretically can set S = I does
not mean that one actually should. The column permutor S exists to be
used, after all—especially numerically to avoid small pivots during early
invocations of the algorithm’s step 5. Equation (12.17) is not mandatory
but optional for a matrix A of full column rank (though still r = n and thus
K = I for such a matrix, even when the unabbreviated eqn. 12.2 is used).
There are however times when it is nice to know that one theoretically could,
if doing exact arithmetic, set S = I if one wanted to.
Since P DLU acts as a row operator, (12.17) implies that each row of
the full-rank matrix A lies in the space the rows of In address. This is
obvious and boring, but interesting is the converse implication of (12.17)’s
complementary form,
U −1 L−1 D −1 P −1 A = In ,
that each row of In lies in the space the rows of A address. The rows of In
and the rows of A evidently address the same space. One can moreover say
the same of A’s columns, since B = AT has full rank just as A does. In the
whole, if a matrix A is square and has full rank r = n, then A’s columns
together, A’s rows together, In ’s columns together and In ’s rows together
each address the same, complete n-dimensional space.
has two independent rows or, alternately, two independent columns, and,
hence, rank r = 2. One can easily construct a phony 3 × 3 matrix from
the honest 2 × 2, however, simply by applying some 3 × 3 row and column
elementaries:
$$T_{(2/3)[32]} \begin{bmatrix} 5 & 1 \\ 3 & 6 \end{bmatrix} T_{1[13]}\, T_{1[23]} = \begin{bmatrix} 5 & 1 & 6 \\ 3 & 6 & 9 \\ 2 & 4 & 6 \end{bmatrix}.$$
The 3 × 3 matrix on the equation’s right is the one we met at the head
of the section. It looks like a rank-three matrix, but really has only two
independent columns and two independent rows. Its true rank is r = 2. We
have here caught a matrix impostor pretending to be bigger than it really
is.17
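One can replay the construction numerically (a sketch assuming numpy; the T matrices below read the book's $T_{\alpha[ij]}$ notation as the identity with α added at row i, column j, which reproduces the equation above):

```python
import numpy as np

A2 = np.array([[5., 1., 0.],
               [3., 6., 0.],
               [0., 0., 0.]])   # the honest 2x2, padded to 3x3

T_23_32 = np.eye(3); T_23_32[2, 1] = 2/3   # adds (2/3) x row 2 to row 3 (acting from the left)
T_1_13  = np.eye(3); T_1_13[0, 2] = 1.     # adds column 1 to column 3 (acting from the right)
T_1_23  = np.eye(3); T_1_23[1, 2] = 1.     # adds column 2 to column 3 (acting from the right)

A3 = T_23_32 @ A2 @ T_1_13 @ T_1_23
print(A3)                              # [[5 1 6], [3 6 9], [2 4 6]]
print(np.linalg.matrix_rank(A3))       # 2: the impostor's true rank
```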
Now, admittedly, adjectives like “honest” and “phony,” terms like “impostor,”
are a bit hyperbolic. The last paragraph has used them to convey
the subjective sense of the matter, but of course there is nothing mathemat-
ically improper or illegal about a matrix of less than full rank, so long as the
true rank is correctly recognized. When one models a physical phenomenon
by a set of equations, one sometimes is dismayed to discover that one of the
equations, thought to be independent, is really just a useless combination
of the others. This can happen in matrix work, too. The rank of a matrix
helps one to recognize how many truly independent vectors, dimensions or
equations one actually has available to work with, rather than how many
seem available at first glance. That is the sense of matrix rank.
17
An applied mathematician with some matrix experience actually probably recognizes
this particular 3 × 3 matrix as a fraud on sight, but it is a very simple example. No one
can just look at some arbitrary matrix and instantly perceive its true rank. Consider for
instance the 5 × 5 matrix (in hexadecimal notation)
[a 5 × 5 matrix written in hexadecimal notation, whose entries are garbled in this copy; its first row reads 12 9 3 1 0]
As the reader can verify by the Gauss-Jordan algorithm, the matrix’s rank is not r = 5
but r = 4.
Chapter 13
Inversion and orthonormalization
The undeniably tedious Chs. 11 and 12 have piled the matrix theory deep
while affording scant practical reward. Building upon the two tedious chap-
ters, this chapter brings the first rewarding matrix work.
One might be forgiven for forgetting after so many pages of abstract the-
ory that the matrix afforded any reward or had any use at all. Uses however
it has. Sections 11.1.1 and 12.5.5 have already broached1 the matrix’s most
basic use, the primary subject of this chapter, to represent a system of m
linear scalar equations in n unknowns neatly as
Ax = b
and to solve the whole system at once by inverting the matrix A that char-
acterizes it.
Now, before we go on, we want to confess that such a use alone, on
the surface of it—though interesting—might not have justified the whole
uncomfortable bulk of Chs. 11 and 12. We already knew how to solve a
simultaneous system of linear scalar equations in principle without recourse
to the formality of a matrix, after all, as in the last step to derive (3.9) as
far back as Ch. 3. Why should we have suffered two bulky chapters, if only
to prepare to do here something we already knew how to do?
The question is a fair one, but admits at least four answers. First,
the matrix neatly solves a linear system not only for a particular driving
vector b but for all possible driving vectors b at one stroke, as this chapter
1
The reader who has skipped Ch. 12 might at least review § 12.5.5.
explains. Second and yet more impressively, the matrix allows § 13.6 to
introduce the pseudoinverse to approximate the solution to an unsolvable
linear system and, moreover, to do so both optimally and efficiently, whereas
such overdetermined systems arise commonly in applications. Third, to solve
the linear system neatly is only the primary and most straightforward use
of the matrix, not its only use: the even more interesting eigenvalue and
its incidents await Ch. 14. Fourth, specific applications aside, one should
never underestimate the blunt practical benefit of reducing an arbitrarily
large grid of scalars to a single symbol A, which one can then manipulate
by known algebraic rules. Most students first learning the matrix have
wondered at this stage whether it were worth all the tedium; so, if the
reader now wonders, then he stands in good company. The matrix finally
begins to show its worth here.
The chapter opens in § 13.1 by inverting the square matrix to solve the
exactly determined, n × n linear system in § 13.2. It continues in § 13.3 by
computing the rectangular matrix’s kernel to solve the nonoverdetermined,
m × n linear system in § 13.4. In § 13.6, it brings forth the aforementioned
pseudoinverse, which rightly approximates the solution to the unsolvable
overdetermined linear system. After briefly revisiting the Newton-Raphson
iteration in § 13.7, it concludes by introducing the concept and practice of
vector orthonormalization in §§ 13.8 through 13.12.
$$G_>^{-1} G_> = I = G_> G_>^{-1}, \qquad G_<^{-1} G_< = I = G_< G_<^{-1}, \tag{13.1}$$
$$A = G_> I_n G_<.$$
Observing that
$$I_n A = A = A I_n, \qquad I_n G_<^{-1} G_>^{-1} = G_<^{-1} I_n G_>^{-1} = G_<^{-1} G_>^{-1} I_n,$$
we have by successive steps that
$$A = G_> I_n G_<,$$
$$I_n A = G_> G_< I_n,$$
$$G_<^{-1} G_>^{-1} I_n A = I_n,$$
$$(G_<^{-1} I_n G_>^{-1})(A) = I_n;$$
2
The symbology and associated terminology might disorient a reader who had skipped
Chs. 11 and 12. In this book, the symbol I theoretically represents an ∞ × ∞ identity
matrix. Outside the m × m or n × n square, the operators G> and G< each resemble
the ∞ × ∞ identity matrix I, which means that the operators affect respectively only the
first m rows or n columns of the thing they operate on. (In the present section it happens
that m = n because the matrix A of interest is square, but this footnote uses both symbols
because generally m ≠ n.)
The symbol Ir contrarily represents an identity matrix of only r ones, though it too
can be viewed as an ∞ × ∞ matrix with zeros in the unused regions. If interpreted as
an ∞ × ∞ matrix, the matrix A of the m × n system Ax = b has nonzero content only
within the m × n rectangle.
None of this is complicated, really. Its purpose is merely to separate the essential features
of a reversible operation like G> or G< from the dimensionality of the vector or matrix on
which the operation happens to operate. The definitions do however necessarily, slightly
diverge from definitions the reader may have been used to seeing in other books. In this
book, one can legally multiply any two matrices, because all matrices are theoretically
∞ × ∞, anyway (though whether it makes any sense in a given circumstance to multiply
mismatched matrices is another question; sometimes it does make sense, as in eqns. 13.24
and 14.49, but more often it does not—which naturally is why the other books tend to
forbid such multiplication).
To the extent that the definitions confuse, the reader might briefly review the earlier
chapters, especially § 11.3.
or alternately that
$$A = G_> I_n G_<,$$
$$A I_n = I_n G_> G_<,$$
$$A I_n\, G_<^{-1} G_>^{-1} = I_n,$$
$$(A)(G_<^{-1} I_n G_>^{-1}) = I_n.$$
Either way, the same matrix $G_<^{-1} I_n G_>^{-1}$ inverts $A$ from the left and from the right alike.
• Since A is square and has full rank (§ 12.5.4), its rows and, separately,
its columns are linearly independent, so it has only the one, unique
inverse A−1 . No other rank-n inverse of A exists.
• On the other hand, inasmuch as A is square and has full rank, it does
per (13.2) indeed have an inverse A−1 . The rank-n inverse exists.
That A−1 is an n × n square matrix of full rank and that A is itself the
inverse of A−1 proceed from the definition (13.2) of A−1 plus § 12.5.3’s find-
ing that reversible operations like $G_>^{-1}$ and $G_<^{-1}$ cannot change $I_n$'s rank.
That the inverse exists is plain, inasmuch as the Gauss-Jordan decompo-
sition plus (13.2) reliably calculate it. That the inverse is unique begins
from § 12.5.4’s observation that the columns (like the rows) of A are lin-
early independent because A is square and has full rank. From this begin-
ning and the fact that In = AA−1 , it follows that [A−1 ]∗1 represents3 the
one and only possible combination of A’s columns which achieves e1 , that
[A−1 ]∗2 represents the one and only possible combination of A’s columns
which achieves e2 , and so on through en . One could observe likewise re-
specting the independent rows of A. Either way, A−1 is unique. Moreover,
no other n × n matrix B ≠ A−1 satisfies either requirement of (13.2)—that
BA = In or that AB = In —much less both.
It is not claimed that the matrix factors G> and G< themselves are
unique, incidentally. On the contrary, many different pairs of matrix fac-
tors G> and G< can yield A = G> In G< , no less than that many different
pairs of scalar factors γ> and γ< can yield α = γ> 1γ< . Though the Gauss-
Jordan decomposition is a convenient means to G> and G< , it is hardly the
only means, and any proper G> and G< found by any means will serve so
long as they satisfy (13.1). What are unique are not the factors but the A
and A−1 they produce.
What of the degenerate n × n square matrix A′ , of rank r < n? Rank
promotion is impossible as §§ 12.5.2 and 12.5.3 have shown, so in the sense
of (13.2) such a matrix has no inverse; for, if it had, then A′−1 would by defi-
nition represent a row or column operation which impossibly promoted A′ to
the full rank r = n of In . Indeed, in that it has no inverse such a degenerate
matrix closely resembles the scalar 0, which has no reciprocal. Mathemati-
cal convention owns a special name for a square matrix which is degenerate
and thus has no inverse; it calls it a singular matrix.
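Numerically the distinction shows itself at once. The sketch below (numpy assumed, its library inverse standing in for the Gauss-Jordan arithmetic of this section) inverts a full-rank square matrix and then fails, as it must, on a singular one.

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 3.]])          # full rank: invertible
A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, np.eye(2)))   # True; the inverse is two-sided
print(np.allclose(A_inv @ A, np.eye(2)))   # True

S = np.array([[1., 2.],
              [2., 4.]])          # rank 1: singular, no inverse exists
try:
    np.linalg.inv(S)
except np.linalg.LinAlgError as err:
    print("singular:", err)
```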
3
The notation [A−1 ]∗j means “the jth column of A−1 .” Refer to § 11.1.3.
Ax = b (13.3)
shall soon meet additional interesting applications of the matrix which in any
case require the theoretical ground to have been prepared. Equation (13.4)
is only the first fruit of the effort.
Where the inverse does not exist, where the square matrix A is singu-
lar, the rows of the matrix are linearly dependent, meaning that the cor-
responding system actually contains fewer than n useful scalar equations.
Depending on the value of the driving vector b, the superfluous equations
either merely reproduce or flatly contradict information the other equations
already supply. Either way, no unique solution to a linear system described
by a singular square matrix is possible—though a good approximate solu-
tion is given by the pseudoinverse of § 13.6. In the language of § 12.5.5, the
singular square matrix characterizes a system that is both underdetermined
and overdetermined, thus degenerate.
• AK has full rank n − r (that is, the columns of AK are linearly inde-
pendent, which gives AK full column rank), and
• AK satisfies the equation
AAK = 0. (13.6)
The n−r independent columns of the kernel matrix AK address the complete
space x = AK a of vectors in the kernel, where the (n − r)-element vector a
can have any value. In symbols,
Ax = A(AK a) = (AAK )a = 0.
The definition does not pretend that the kernel matrix AK is unique.
Except when A has full column rank the kernel matrix is not unique; there
are infinitely many kernel matrices AK to choose from for a given matrix A.
What is unique is not the kernel matrix but rather the space its columns
address, and it is the latter space rather than AK as such that is technically
the kernel (if you forget and call AK “a kernel,” though, you’ll be all right).
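Before the Gauss-Jordan kernel formula arrives below, a numerical stand-in (an SVD-based numpy sketch, not the book's construction) can illustrate the definition: a kernel matrix has n − r independent columns and satisfies AAK = 0 per (13.6).

```python
import numpy as np

def kernel_matrix(A, tol=1e-12):
    """Return a matrix whose n - r orthonormal columns span A's kernel (SVD-based sketch)."""
    _, s, Vh = np.linalg.svd(A)
    r = int(np.sum(s > tol))          # numerical rank
    return Vh[r:].conj().T            # last n - r right singular vectors

A = np.array([[1., 2., 3., 4.],
              [2., 4., 6., 8.]])      # rank 1, so the kernel has 4 - 1 = 3 dimensions
AK = kernel_matrix(A)
print(AK.shape)                        # (4, 3)
print(np.allclose(A @ AK, 0))          # True: every column satisfies A x = 0
```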
The Gauss-Jordan kernel formula⁶
$$A_K = S^{-1} K^{-1} H_r I_{n-r} = G_<^{-1} H_r I_{n-r} \tag{13.7}$$
gives a complete kernel $A_K$ of $A$, where $S^{-1}$, $K^{-1}$ and $G_<^{-1}$ are the factors
their respective symbols indicate of the Gauss-Jordan decomposition’s com-
plementary form (12.3) and Hr is the shift operator of § 11.9. Section 13.3.1
derives the formula, next.
Rearranging terms,
$$I_r S x = G_>^{-1} b - I_r K (I_n - I_r) S x. \tag{13.9}$$
Equation (13.9) is interesting. It has Sx on both sides, where Sx is
the vector x with elements reordered in some particular way. The equation
has however on the left only Ir Sx, which is the first r elements of Sx; and
on the right only (In − Ir )Sx, which is the remaining n − r elements.7 No
element of Sx appears on both sides. Naturally this is no accident; we have
(probably after some trial and error not recorded here) planned the steps
leading to (13.9) to achieve precisely this effect. Equation (13.9) implies that
one can choose the last n − r elements of Sx freely, but that the choice then
determines the first r elements.
The implication is significant. To express the implication more clearly
we can rewrite (13.9) in the improved form
$$f = G_>^{-1} b - I_r K H_r a, \qquad
S x = \begin{bmatrix} f \\ a \end{bmatrix} = f + H_r a, \qquad
f \equiv I_r S x, \qquad
a \equiv H_{-r} (I_n - I_r) S x, \tag{13.10}$$
where a represents the n − r free elements of Sx and f represents the r
dependent elements. This makes f and thereby also x functions of the free
parameter a and the driving vector b:
$$f(a, b) = G_>^{-1} b - I_r K H_r a, \qquad
S x(a, b) = \begin{bmatrix} f(a, b) \\ a \end{bmatrix} = f(a, b) + H_r a. \tag{13.11}$$
If b = 0 as (13.5) requires, then
$$f(a, 0) = -I_r K H_r a, \qquad
S x(a, 0) = \begin{bmatrix} f(a, 0) \\ a \end{bmatrix} = f(a, 0) + H_r a.$$
7
Notice how we now associate the factor (In − Ir ) rightward as a row truncator, though
it had first entered acting leftward as a column truncator. The flexibility to reassociate
operators in such a way is one of many good reasons Chs. 11 and 12 have gone to such
considerable trouble to develop the basic theory of the matrix.
Sx(ej , 0) = (I − Ir K)Hr ej .
For all the ej at once, $Sx(I_{n-r}, 0) = (I - I_r K) H_r I_{n-r}$.
But if all the ej at once, the columns of In−r , exactly address the domain
of a, then the columns of x(In−r , 0) likewise exactly address the range of
x(a, 0). Equation (13.6) has already named this range AK , by which8
weighted by the elements of a. Seen from one perspective, this seems trivial; from another
perspective, baffling; until one grasps what is really going on here.
The idea is that if we can solve the problem for each elementary vector ej —that is,
in aggregate, if we can solve the problem for the identity matrix In−r —then we shall
implicitly have solved it for every a because a is a weighted combination of the ej and the
whole problem is linear. The solution
x = AK a
for a given choice of a becomes a weighted combination of the solutions for each ej , with
the elements of a again as the weights. And what are the solutions for each ej ? Answer:
the corresponding columns of AK , which by definition are the independent values of x
that cause b = 0.
where we have used Table 12.2 again in the last step. How to proceed
symbolically from (13.15) is not obvious, but if one sketches the matrices
of (13.15) schematically with a pencil, and if one remembers that K −1 is
just K with elements off the main diagonal negated, then it appears that
$$S A_K = K^{-1} H_r I_{n-r}. \tag{13.16}$$
The appearance is not entirely convincing,⁹ but (13.16) though unproven still
helps because it posits a hypothesis toward which to target the analysis.
Two variations on the identities of Table 12.2 also help. First, from the
identity that
$$\frac{K + K^{-1}}{2} = I,$$
we have that
K − I = I − K −1 . (13.17)
Second, right-multiplying by $I_r$ the identity that
$$I_r K^{-1} (I_n - I_r) = K^{-1} - I$$
and observing that $(I_n - I_r) I_r = 0$, we have that
$$K^{-1} I_r = I_r . \tag{13.18}$$
(which actually is pretty obvious if you think about it, since all of K’s
interesting content lies by construction right of its rth column). Now we
have enough to go on with. Substituting (13.17) and (13.18) into (13.15)
yields
SAK = [(In − K −1 Ir ) − (I − K −1 )]Hr .
Adding 0 = K −1 In Hr − K −1 In Hr and rearranging terms,
Factoring,
which, considering that the identity (11.76) has that (In − Ir )Hr = Hr In−r ,
proves (13.16). The final step is to left-multiply (13.16) by S −1 = S ∗ = S T ,
reaching (13.7) that was to be derived.
One would like to feel sure that the columns of (13.7)’s AK actually
addressed the whole kernel space of A rather than only part. One would
further like to feel sure that AK had no redundant columns; that is, that it
had full rank. Moreover, the definition of AK in the section’s introduction
demands both of these features. In general such features would be hard
to establish, but here the factors conveniently are Gauss-Jordan factors.
Regarding the whole kernel space, AK addresses it because AK comes from
all a. Regarding redundancy, AK lacks it because SAK lacks it, and SAK
lacks it because according to (13.13) the last rows of SAK are Hr In−r . So,
in fact, (13.7) has both features and does fit the definition.
Indeed this makes sense: because the columns of AK C address the same
space the columns of AK address, the two matrices necessarily represent the
same underlying kernel. Moreover, some C exists to convert AK into every
alternate kernel matrix A′K of A. We know this because § 12.4 lets one
replace the columns of AK with those of A′K , reversibly, one column at a
time, without altering the space addressed. (It might not let one replace the
columns in sequence, but if out of sequence then a reversible permutation
at the end corrects the order. Refer to §§ 12.5.1 and 12.5.2 for the pattern
by which this is done.)
The orthonormalizing column operator R−1 of (13.54) below incidentally
tends to make a good choice for C.
10
The author, who has never fired an artillery piece (unless an arrow from a Boy Scout
bow counts), invites any real artillerist among the readership to write in to improve the
example.
Ax = b, (13.20)
where we have substituted the last line of (13.22) for x. This holds for any a
and b. We are not free to choose the driving vector b, but since we need
only one particular solution, a can be anything we want. Why not
a = 0?
Then
$$f(0, b) = G_>^{-1} b, \qquad
S x_1(0, b) = \begin{bmatrix} f(0, b) \\ 0 \end{bmatrix} = f(0, b).$$
That is,
$$x_1 = S^{-1} G_>^{-1} b. \tag{13.23}$$
$$x = S^{-1}\bigl(G_>^{-1} b + K^{-1} H_r I_{n-r}\, a\bigr) \tag{13.24}$$
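The shape of (13.24), one particular solution plus an arbitrary combination of kernel columns, is easy to check numerically. The sketch below (numpy assumed) obtains the particular solution from a library pseudoinverse and the kernel columns from an SVD rather than from the Gauss-Jordan factors, but the structure is the same.

```python
import numpy as np

A = np.array([[1., 2., 0., 1.],
              [0., 1., 1., 2.]])      # broad, full row rank: nonoverdetermined
b = np.array([3., 5.])

x1 = np.linalg.pinv(A) @ b            # one particular solution (library stand-in for eqn. 13.23)

_, s, Vh = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))            # numerical rank
AK = Vh[r:].conj().T                  # n - r kernel columns, as in section 13.3

for a in (np.zeros(A.shape[1] - r), np.array([0.7, -2.0])):
    x = x1 + AK @ a                   # particular solution plus any kernel combination
    print(np.allclose(A @ x, b))      # True both times
```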
The smaller the squared residual norm
$$[r(x)]^* [r(x)] = \sum_i |r_i(x)|^2 \tag{13.27}$$
is, the more favorably we regard the candidate solution x.
[Figure 13.1: length of freeway newly completed versus number of workers, plotted before and after the additional data arrive.]
inverting the matrix per §§ 13.1 and 13.2 to solve for x ≡ [σ γ]T , in the hope
that the resulting line will predict future production accurately.
That is all mathematically irreproachable. By the fifth Saturday however
we shall have gathered more production data, plotted on the figure’s right,
to which we should like to fit a better line to predict production more accu-
rately. The added data present a problem. Statistically, the added data are
welcome, but geometrically we need only two points to specify a line; what
are we to do with the other three? The five points together overdetermine
the linear system
$$\begin{bmatrix} u_1 & 1 \\ u_2 & 1 \\ u_3 & 1 \\ u_4 & 1 \\ u_5 & 1 \end{bmatrix}
\begin{bmatrix} \sigma \\ \gamma \end{bmatrix}
= \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \end{bmatrix}.$$
There is no way to draw a single straight line b = σu + γ exactly through
all five, for in placing the line we enjoy only two degrees of freedom.13
The proper approach is to draw among the data points a single straight
line that misses the points as narrowly as possible. More precisely, the proper
13
Section 13.3.3 characterized a line as enjoying only one degree of freedom. Why now
two? The answer is that § 13.3.3 discussed travel along a line rather than placement of a
line as here. Though both involve lines, they differ as driving an automobile differs from
washing one. Do not let this confuse you.
$$A = \begin{bmatrix} u_1 & 1 \\ u_2 & 1 \\ u_3 & 1 \\ u_4 & 1 \\ u_5 & 1 \\ \vdots & \vdots \end{bmatrix}, \qquad
x = \begin{bmatrix} \sigma \\ \gamma \end{bmatrix}, \qquad
b = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \\ \vdots \end{bmatrix}.$$
$$r^T r = x^T A^T A x + b^T b - 2\, x^T A^T b = x^T A^T (Ax - 2b) + b^T b,$$
in which the transpose is used interchangeably for the adjoint because all
the numbers involved happen to be real. The norm is minimized where
$$\frac{d}{dx}\,\bigl(r^T r\bigr) = 0, \qquad \frac{d}{dx}\Bigl[x^T A^T (Ax - 2b) + b^T b\Bigr] = 0,$$
or, after transposing the equation, rearranging terms and dividing by 2, the
simplified equation
AT Ax = AT b.
Assuming (as warranted by § 13.6.2, next) that the n×n square matrix AT A
is invertible, the simplified equation implies the approximate but optimal
least-squares solution
−1 T
x = AT A A b (13.28)
to the unsolvable linear system (13.25) in the restricted but quite typical
case that A and b are real and A has full column rank.
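A sketch of the fit (numpy assumed; the production data are hypothetical) forms the normal equations of (13.28) and checks the answer against the library's least-squares routine.

```python
import numpy as np

u = np.array([1., 2., 3., 4., 5.])          # number of workers (hypothetical data)
b = np.array([0.2, 0.5, 0.9, 1.1, 1.5])     # length newly completed

A = np.column_stack([u, np.ones_like(u)])   # the tall matrix of (13.25)

x = np.linalg.inv(A.T @ A) @ (A.T @ b)      # eqn. (13.28): [sigma, gamma]
x_lib, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x, np.allclose(x, x_lib))             # same answer both ways

r = A @ x - b
print(r @ r)                                # the minimized squared residual norm (13.27)
```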
Equation (13.28) plots the line on Fig. 13.1’s right. As the reader can
see, the line does not pass through all the points, for no line can; but it does
pass pretty convincingly nearly among them. In fact it passes optimally
nearly among them. No line can pass more nearly, in the squared-residual
norm sense of (13.27).14
14
Here is a nice example of the use of the mathematical adjective optimal in its ad-
verbial form. “Optimal” means “best.” Many problems in applied mathematics involve
discovering the best of something. What constitutes the best however can be a matter
of judgment, even of dispute. We shall leave to the philosopher and the theologian the
important question of what constitutes objective good, for applied mathematics is a poor
guide to such mysteries. The role of applied mathematics is to construct suitable models
to calculate quantities needed to achieve some definite good; its role is not, usually, to
identify the good as good in the first place.
One generally establishes mathematical optimality by some suitable, nonnegative, real
cost function or metric, and the less, the better. Strictly speaking, the mathematics
cannot tell us which metric to use, but where no other consideration prevails the applied
mathematician tends to choose the metric that best simplifies the mathematics at hand—
and, really, that is about as good a way to choose a metric as any. The metric (13.27) is
so chosen.
“But,” comes the objection, “what if some more complicated metric is better?”
Well, if the other metric really, objectively is better, then one should probably use it. In
general however the mathematical question is: what does one mean by “better?” Better by
which metric? Each metric is better according to itself. This is where the mathematician’s
experience, taste and judgment come in.
In the present section’s example, too much labor on the freeway job might actually slow
construction rather than speed it. One could therefore seek to fit not a line but some
downward-turning curve to the data. Mathematics offers many downward-turning curves.
A circle, maybe? Not likely. An experienced mathematician would probably reject the
circle on the aesthetic yet practical ground that the parabola b = αu2 + σu + γ lends
$$A^* A u = 0, \quad I_n u \neq 0.$$
$$u^* A^* A u = 0, \quad I_n u \neq 0,$$
$$A u = 0, \quad I_n u \neq 0,$$
A = BC
Ax ≈ b,
BCx ≈ b,
$$(B^* B)^{-1} B^* B C x \approx (B^* B)^{-1} B^* b,$$
$$C x \approx (B^* B)^{-1} B^* b.$$
16
This subsection uses the symbols B and b for unrelated purposes, which is unfortunate
but conventional. See footnote 11.
C ∗ u ← x,
CC ∗ u ≈ (B ∗ B)−1 B ∗ b,
u ≈ (CC ∗ )−1 (B ∗ B)−1 B ∗ b.
Changing the variable back and (because we are conjecturing and can do as
we like), altering the “≈” sign to “=,”
$$x = C^* (C C^*)^{-1} (B^* B)^{-1} B^* b. \tag{13.31}$$
Equation (13.31) is the conjecture. Specifically, the conjecture is that
• the x of (13.31) achieves the least possible squared residual norm (13.27) for the system; and
• among all x that enjoy the same, minimal squared residual norm, the x
of (13.31) is strictly least in magnitude.
The conjecture is bold, but if you think about it in the right way it is not un-
warranted under the circumstance. After all, (13.31) does resemble (13.28),
the latter of which admittedly requires real A of full column rank but does
minimize the residual when its requirements are met; and, even if there were
more than one x which minimized the residual, one of them might be smaller
than the others: why not the x of (13.31)? One can but investigate.
The first point of the conjecture is symbolized
Reorganizing,
A∗ (b − Ax) = A∗ b − A∗ Ax
= [C ∗ B ∗ ][b] − [C ∗ B ∗ ][BC][C ∗ (CC ∗ )−1 (B ∗ B)−1 B ∗ b]
= C ∗ B ∗ b − C ∗ (B ∗ B)(CC ∗ )(CC ∗ )−1 (B ∗ B)−1 B ∗ b
= C ∗ B ∗ b − C ∗ B ∗ b = 0,
which reveals two of the inequality’s remaining three terms to be zero, leav-
ing an assertion that
0 ≤ ∆x∗ A∗ A ∆x.
Each step in the present paragraph is reversible,17 so the assertion in the
last form is logically equivalent to the conjecture’s first point, with which
the paragraph began. Moreover, the assertion in the last form is correct
because the product of any matrix and its adjoint according to § 13.6.3 is a
nonnegative definite operator, thus establishing the conjecture’s first point.
The conjecture’s first point, now established, has it that no x+∆x enjoys
a smaller squared residual norm than the x of (13.31) does. It does not claim
that no x + ∆x enjoys the same, minimal squared residual norm. The latter
case is symbolized
0 = ∆x∗ A∗ A ∆x;
or in other words,
A ∆x = 0.
But A = BC, so this is to claim that
B(C ∆x) = 0,
C ∆x = 0.
17
The paragraph might inscrutably but logically instead have ordered the steps in reverse
as in §§ 6.3.2 and 9.5. See Ch. 6’s footnote 15.
Considering the product ∆x∗ x in light of (13.31) and the last equation, we
observe that
for any
∆x ≠ 0
for which x+∆x achieves minimal squared residual norm (note that it’s “<”
this time, not “≤” as in the conjecture’s first point). Distributing factors
and canceling like terms,
But the last paragraph has found that ∆x∗ x = 0 for precisely such ∆x as
we are considering here, so the last inequality reduces to read
x = A† b, (13.33)
18
Some books print A† as A+ .
the Moore-Penrose solves the linear system (13.25) as well as the system
can be solved—exactly if possible, with minimal squared residual norm if
impossible. If A is square and invertible, then the Moore-Penrose A† = A−1
is just the inverse, and then of course (13.33) solves the system uniquely and
exactly. Nothing can solve the system uniquely if A has broad shape but the
Moore-Penrose still solves the system exactly in that case as long as A has
full row rank, moreover minimizing the solution’s squared magnitude x∗ x
(which the solution of eqn. 13.23 fails to do). If A lacks full row rank,
then the Moore-Penrose solves the system as nearly as the system can be
solved (as in Fig. 13.1) and as a side-benefit also minimizes x∗ x. The Moore-
Penrose is thus a general-purpose solver and approximator for linear systems.
It is a significant discovery.19
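The pseudoinverse recipe assembled above, the composition $C^*(CC^*)^{-1}(B^*B)^{-1}B^*$ applied to a full-rank factorization A = BC, can be verified numerically. In the sketch below (numpy assumed) the factors B and C are simply generated at random to have full column and row rank respectively; the library pseudoinverse serves as the reference.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 5, 4, 2
B = rng.standard_normal((m, r))        # full column rank (generically)
C = rng.standard_normal((r, n))        # full row rank (generically)
A = B @ C                              # an m x n matrix of rank r

# The pseudoinverse built from the full-rank factorization, as in the text above.
A_dag = C.conj().T @ np.linalg.inv(C @ C.conj().T) \
        @ np.linalg.inv(B.conj().T @ B) @ B.conj().T
print(np.allclose(A_dag, np.linalg.pinv(A)))      # True

b = rng.standard_normal(m)
x = A_dag @ b                          # eqn. (13.33): least-squares, least-magnitude solution
print(np.allclose(A.conj().T @ (A @ x - b), 0))   # residual is orthogonal to A's columns
```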
where B = Im in the first case and C = In in the last. It does not intend
to use the full (13.32). If both r < m and r < n—which is to say, if the
Jacobian is degenerate—then (13.35) fails, as though the curve of Fig. 4.5
ran horizontally at the test point—when one quits, restarting the iteration
from another point.
a · b = (a1 e1 + a2 e2 + · · · + an en ) · (b1 e1 + b2 e2 + · · · + bn en ).
ei · ej = δij . (13.36)
Therefore,
a · b = a1 b1 + a2 b2 + · · · + an bn ;
or, more concisely,
$$a \cdot b \equiv a^T b = \sum_{j=-\infty}^{\infty} a_j b_j. \tag{13.37}$$
21
The term inner product is often used to indicate a broader class of products than
the one defined here, especially in some of the older literature. Where used, the notation
usually resembles ha, bi or (b, a), both of which mean a∗ · b (or, more broadly, some
similar product), except that which of a and b is conjugated depends on the author. Most
recently, at least in the author’s country, the usage ha, bi ≡ a∗ · b seems to be emerging
as standard where the dot is not used [5, § 3.1][21, Ch. 4]. This book prefers the dot.
The dot notation does not worry whether its arguments are column or row
vectors, incidentally:
a · b = a · bT = aT · b = aT · bT = aT b.
That is, if either vector is wrongly oriented, the notation implicitly reorients
it before using it. (The more orderly notation aT b by contrast assumes that
both are proper column vectors.)
Where vectors may have complex elements, usually one is not interested
in a · b so much as in
$$a^* \cdot b \equiv a^* b = \sum_{j=-\infty}^{\infty} a_j^* b_j. \tag{13.38}$$
with the product of the imaginary parts added not subtracted, thus honoring
the right Argand sense of “the product of the two vectors to the extent to
which they run in the same direction.”
By the Pythagorean theorem, the dot product
|a|2 = a∗ · a (13.39)
gives the square of a vector’s magnitude, always real, never negative. The
unit vector in a’s direction then is
$$\hat a \equiv \frac{a}{|a|} = \frac{a}{\sqrt{a^* \cdot a}}, \tag{13.40}$$
from which
|â|2 = â∗ · â = 1. (13.41)
When two vectors do not run in the same direction at all, such that
a∗ · b = 0, (13.42)
the two vectors are said to lie orthogonal to one another. Geometrically this
puts them at right angles. For other angles θ between two vectors,
which formally defines the angle θ even when a and b have more than three
elements each.
Splitting a and b each into real and imaginary parts on the inequality’s left
side and then halving both sides,
f · g > |f | |g| ,
in which we observe that the left side must be positive because the right side
is nonnegative. (This naturally is impossible for any case in which f = 0 or
g = 0, among others, but wishing to establish impossibility for all cases we
pretend not to notice and continue reasoning as follows.) Squaring again,
Reordering factors,
$$\sum_{i,j} \bigl[(f_i g_j)(g_i f_j)\bigr] > \sum_{i,j} (f_i g_j)^2.$$
Subtracting $\sum_i (f_i g_i)^2$ from each side,
$$\sum_{i \neq j} \bigl[(f_i g_j)(g_i f_j)\bigr] > \sum_{i \neq j} (f_i g_j)^2,$$
This is
$$0 > \sum_{i<j} \bigl(f_i g_j - g_i f_j\bigr)^2,$$
which, since we have constructed the vectors f and g to have real elements
only, is impossible in all cases. The contradiction proves false the assumption
that gave rise to it, thus establishing the sum hypothesis of (13.44).
The difference hypothesis that |a| − |b| ≤ |a + b| is established by defin-
ing a vector c such that
a + b + c = 0,
whereupon according to the sum hypothesis (which we have already estab-
lished),
|a + c| ≤ |a| + |c| ,
|b + c| ≤ |b| + |c| .
That is,
A⊥ ≡ A∗K (13.46)
A∗K = A⊥ ,
(13.48)
A∗⊥ = AK .
$$a^* \cdot b_\perp = 0, \qquad b_\perp \equiv b - \beta a,$$
for which
$$\beta = \frac{a^* \cdot b}{a^* \cdot a}.$$
Hence,
$$a^* \cdot b_\perp = 0, \qquad b_\perp \equiv b - \frac{a^* \cdot b}{a^* \cdot a}\, a. \tag{13.49}$$
But according to (13.40), $a = \hat a \sqrt{a^* \cdot a}$; and according to (13.41), $\hat a^* \cdot \hat a = 1$; so,
b⊥ = b − â(â∗ · b); (13.50)
or, in matrix notation,
b⊥ = b − â(â∗ )(b).
This is arguably better written
b⊥ = [I − (â)(â∗ )] b (13.51)
(observe that it’s [â][â∗ ], a matrix, rather than the scalar [â∗ ][â]).
One orthonormalizes a set of vectors by orthogonalizing them with re-
spect to one another, then by normalizing each of them to unit magnitude.
The procedure to orthonormalize several vectors
{x1 , x2 , x3 , . . . , xn }
Besides obviating the matrix I − x̂i⊥ x̂∗i⊥ and the associated matrix multipli-
cation, the messier form (13.53) has the significant additional practical virtue
that it lets one forget each intermediate vector xji immediately after using
it. (A well written orthonormalizing computer program reserves memory for
one intermediate vector only, which memory it repeatedly overwrites—and,
actually, probably does not even reserve that much, working rather in the
memory space it has already reserved for x̂j⊥ .)24
Other equations one algorithmizes can likewise benefit from thoughtful
rendering.
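In that memory-thrifty spirit, here is a plain rendering of classical Gram-Schmidt orthonormalization (a numpy sketch; it follows the one-working-vector pattern described above rather than reproducing eqn. 13.53 symbol for symbol, and it simply drops columns found dependent on earlier ones).

```python
import numpy as np

def gram_schmidt(X, tol=1e-12):
    """Orthonormalize the columns of X, overwriting a single working vector at a time."""
    m, n = X.shape
    Q = []
    for j in range(n):
        w = X[:, j].astype(complex)            # the one working vector
        for q in Q:
            w = w - q * (q.conj() @ w)         # orthogonalize against each finished column
        norm = np.sqrt((w.conj() @ w).real)
        if norm > tol:                         # drop columns dependent on earlier ones
            Q.append(w / norm)
    return np.column_stack(Q)

A = np.array([[1., 1., 2.],
              [0., 1., 1.],
              [1., 0., 1.],
              [0., 1., 1.]])                   # third column = first + second: rank 2
Q = gram_schmidt(A)
print(Q.shape)                                 # (4, 2): only r = 2 orthonormal columns survive
print(np.allclose(Q.conj().T @ Q, np.eye(2)))  # True
```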
$$A = QR = QUDS, \qquad R \equiv UDS, \tag{13.54}$$
In reality, however, (13.53)’s middle line requires only that no x̂i⊥ be used
before it is fully calculated; otherwise that line does not care which (i, j)
pair follows which. The i-major nesting
bringing the very same index pairs in a different sequence, is just as valid.
We choose i-major nesting on the subtle ground that it affords better infor-
mation to the choice of column index p during the algorithm’s step 3.
The algorithm, in detail:
1. Begin by initializing
Ũ ← I, D̃ ← I, S̃ ← I,
Q̃ ← A,
i ← 1.
2. (Besides arriving at this point from step 1 above, the algorithm also
reënters here from step 9 below.) Observe that Ũ enjoys the major
partial unit triangular form L{i−1}T (§ 11.8.5), that D̃ is a general
scaling operator (§ 11.7.2) with d̃jj = 1 for all j ≥ i, that S̃ is a permutor
(§ 11.7.1), and that the first through (i − 1)th columns of Q̃ consist of
mutually orthonormal unit vectors.
where the latter line has applied a rule from Table 12.1, interchange
the chosen pth column to the ith position by
Q̃ ← Q̃T[i↔p],
Ũ ← T[i↔p]Ũ T[i↔p] ,
S̃ ← T[i↔p]S̃.
Q̃ ← Q̃T(1/α)[i] ,
Ũ ← Tα[i] Ũ T(1/α)[i] ,
D̃ ← Tα[i] D̃,
where
$$\alpha = \sqrt{\bigl[\tilde Q\bigr]_{*i}^{*} \cdot \bigl[\tilde Q\bigr]_{*i}}\,.$$
6. Initialize
j ← i + 1.
7. (Besides arriving at this point from step 6 above, the algorithm also
reënters here from step 8 below.) If j > n then skip directly to step 9.
Otherwise, observing that (13.55) can be expanded to read
A = Q̃T−β[ij] Tβ[ij]Ũ D̃S̃,
orthogonalize the jth column of Q̃ per (13.53) with respect to the ith
column by
Q̃ ← Q̃T−β[ij] ,
Ũ ← Tβ[ij] Ũ ,
where
$$\beta = \bigl[\tilde Q\bigr]_{*i}^{*} \cdot \bigl[\tilde Q\bigr]_{*j}\,.$$
8. Increment
j ←j+1
and return to step 7.
9. Increment
i←i+1
and return to step 2.
10. Let
Q ≡ Q̃, U ≡ Ũ , D ≡ D̃, S ≡ S̃,
r = i − 1.
End.
Though the Gram-Schmidt algorithm broadly resembles the Gauss-
Jordan, at least two significant differences stand out: (i) the Gram-Schmidt
is one-sided because it operates only on the columns of Q̃, never on the
rows; (ii) since Q is itself dimension-limited, the Gram-Schmidt decomposi-
tion (13.54) needs and has no explicit factor Ir .
As in § 12.5.7, here also one sometimes prefers that S = I. The algorithm
optionally supports this preference if the m × n matrix A has full column
rank r = n, when null columns cannot arise, if one always chooses p = i
during the algorithm’s step 3. Such optional discipline maintains S = I
when desired.
Whether S = I or not, the matrix Q = QIr has only r columns, so one
can write (13.54) as
A = (QIr )(R).
Reassociating factors, this is
A = (Q)(Ir R), (13.56)
which per (12.16) is a proper full-rank factorization with which one can
compute the pseudoinverse A† of A (see eqn. 13.32, above; but see also
eqn. 13.65, below).
If the Gram-Schmidt decomposition (13.54) looks useful, it is even more
useful than it looks. The most interesting of its several factors is the m × r
orthonormalized matrix Q, whose orthonormal columns address the same
space the columns of A themselves address. If Q reaches the maximum pos-
sible rank r = m, achieving square, m × m shape, then it becomes a unitary
matrix —the subject of § 13.12. Before treating the unitary matrix, however,
let us pause to extract a kernel from the Gram-Schmidt decomposition in
§ 13.11.3, next.
$$A' \equiv Q + I_m H_{-r} = \begin{bmatrix} Q & I_m \end{bmatrix}, \tag{13.58}$$
$$Q' = Q + A_\perp H_{-r} = \begin{bmatrix} Q & A_\perp \end{bmatrix} \tag{13.60}$$
consists of
• r columns on the left that address the same space the columns of A
address and
• m − r columns on the right that address the kernel space of A∗ , the
orthogonal complement of the former space.
Each column has unit magnitude and conveniently lies orthogonal to every
other column, left and right.
Equation (13.60) is probably the more useful form, but the Gram-
Schmidt kernel formula as such,
extracts the rightward columns that express the kernel, not of A, but of A∗ .
To compute the kernel of a matrix B by Gram-Schmidt one sets A = B ∗
and applies (13.57) through (13.61). Refer to (13.48).
In either the form (13.60) or the form (13.61), the Gram-Schmidt kernel
formula does everything the Gauss-Jordan kernel formula (13.7) does and
in at least one sense does it better; for, if one wants a Gauss-Jordan kernel
orthonormalized, then one must orthonormalize it as an extra step, whereas
the Gram-Schmidt kernel comes already orthonormalized.
Being square, the m × m matrix Q′ is a unitary matrix, as the last
paragraph of § 13.11.2 has alluded. The unitary matrix is the subject of
§ 13.12 that follows.
which says neither more nor less than that the columns of Q are orthonormal,
which is to say that Q is unitary, as was to be demonstrated.
Unitary operations preserve length. That is, operating on an m-element
vector by an m × m unitary matrix does not alter the vector’s magnitude.
To prove it, consider the system
Qx = b.
x∗ Q∗ Qx = b∗ b.
x∗ x = b∗ b,
as was to be demonstrated.
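A quick numerical confirmation (numpy assumed; the unitary Q here comes from a library QR factorization of a random complex matrix, merely so that there is a unitary matrix to test):

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((4, 4)) + 1j*rng.standard_normal((4, 4))
Q, _ = np.linalg.qr(M)                          # Q is unitary for a generic M

print(np.allclose(Q.conj().T @ Q, np.eye(4)))   # True: Q* Q = I
x = rng.standard_normal(4) + 1j*rng.standard_normal(4)
b = Q @ x
print(np.allclose(x.conj() @ x, b.conj() @ b))  # True: |x|^2 = |b|^2, length preserved
```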
Equation (13.63) lets one use the Gram-Schmidt decomposition (13.54)
to invert a square matrix as
Q∞ = Q + (I − Im ),
26
This is true only for 1 ≤ i ≤ m, but you knew that already.
which is just Q with ones running out the main diagonal from its active
region, itself meets the unitary criterion (13.62) for m = ∞.
Unitary matrices are so easy to handle that they can sometimes justify
significant effort to convert a model to work in terms of them if possible.
We shall meet the unitary matrix again in §§ 14.10 and 14.12.
The chapter as a whole has demonstrated at least in theory (and usu-
ally in practice) techniques to solve any linear system characterized by a
matrix of finite dimensionality, whatever the matrix’s rank or shape. It has
explained how to orthonormalize a set of vectors and has derived from the
explanation the useful Gram-Schmidt decomposition. As the chapter’s in-
troduction had promised, the matrix has shown its worth here; for without
the matrix’s notation, arithmetic and algebra most of the chapter’s findings
would have lain beyond practical reach. And even so, the single most inter-
esting agent of matrix arithmetic remains yet to be treated. This last is the
eigenvalue, and it is the subject of Ch. 14, next.
Chapter 14
The eigenvalue
Av = λv.
This chapter analyzes the eigenvalue and the associated eigenvector it scales.
Before treating the eigenvalue proper, the chapter gathers from across
Chs. 11 through 14 several properties all invertible square matrices share, as-
sembling them in § 14.2 for reference. One of these regards the determinant,
which opens the chapter.
det A3 = (a11 a22 a33 + a12 a23 a31 + a13 a21 a32 )
− (a13 a22 a31 + a12 a21 a33 + a11 a23 a32 );
is available especially to call the rank out, stating explicitly that the deter-
minant has exactly n! terms. (See also §§ 11.3.5 and 11.5 and eqn. 11.49.2 )
It is admitted3 that we have not, as yet, actually shown the determinant
to be a generally useful quantity; we have merely motivated and defined it.
Historically the determinant probably emerged not from abstract consider-
ations but for the mundane reason that the quantity it represents occurred
frequently in practice (as in the $A_2^{-1}$ example above). Nothing however log-
ically prevents one from simply defining some quantity which, at first, one
merely suspects will later prove useful. So we do here.4
• If
$$c_{i*} = \begin{cases} a_{i''*} & \text{when } i = i', \\ a_{i'*} & \text{when } i = i'', \\ a_{i*} & \text{otherwise}, \end{cases}
\qquad\text{or if}\qquad
c_{*j} = \begin{cases} a_{*j''} & \text{when } j = j', \\ a_{*j'} & \text{when } j = j'', \\ a_{*j} & \text{otherwise}, \end{cases}$$
then
$$\det C = -\det A. \tag{14.1}$$
Interchanging a pair of rows or columns negates the determinant.
• If
$$c_{i*} = \begin{cases} \alpha a_{i*} & \text{when } i = i', \\ a_{i*} & \text{otherwise}, \end{cases}$$
2
And further Ch. 13’s footnotes 5 and 22.
3
[21, § 1.2]
4
[21, Ch. 1]
or if
$$c_{*j} = \begin{cases} \alpha a_{*j} & \text{when } j = j', \\ a_{*j} & \text{otherwise}, \end{cases}$$
then
det C = α det A. (14.2)
Scaling a single row or column of a matrix scales the matrix’s deter-
minant by the same factor. (Equation 14.2 tracks the linear scaling
property of § 7.3.3 and of eqn. 11.2.)
• If
$$c_{i*} = \begin{cases} a_{i*} + b_{i*} & \text{when } i = i', \\ a_{i*} = b_{i*} & \text{otherwise}, \end{cases}
\qquad\text{or if}\qquad
c_{*j} = \begin{cases} a_{*j} + b_{*j} & \text{when } j = j', \\ a_{*j} = b_{*j} & \text{otherwise}, \end{cases}$$
then
det C = det A + det B. (14.3)
If one row or column of a matrix C is the sum of the corresponding rows
or columns of two other matrices A and B, while the three matrices
remain otherwise identical, then the determinant of the one matrix is
the sum of the determinants of the other two. (Equation 14.3 tracks
the linear superposition property of § 7.3.3 and of eqn. 11.2.)
• If
ci′ ∗ = 0,
or if
c∗j ′ = 0,
then
det C = 0. (14.4)
A matrix with a null row or column also has a null determinant.
• If
ci′′ ∗ = γci′ ∗ ,
or if
c∗j ′′ = γc∗j ′ ,
then
det C = 0. (14.5)
These basic properties are all fairly easy to see if the definition of the de-
terminant is clearly understood. Equations (14.2), (14.3) and (14.4) come
because each of the n! terms in the determinant’s expansion has exactly
one element from row i′ or column j ′ . Equation (14.1) comes because a
row or column interchange reverses parity. Equation (14.6) comes because
according to § 11.7.1, the permutors P and P ∗ always have the same par-
ity, and because the adjoint operation individually conjugates each element
of C. Finally, (14.5) comes because, in this case, every term in the deter-
minant’s expansion finds an equal term of opposite parity to offset it. Or,
more formally, (14.5) comes because the following procedure does not al-
ter the matrix: (i) scale row i′′ or column j ′′ by 1/γ; (ii) scale row i′ or
column j ′ by γ; (iii) interchange rows i′ ↔ i′′ or columns j ′ ↔ j ′′ . Not
altering the matrix, the procedure does not alter the determinant either;
and indeed according to (14.2), step (ii)’s effect on the determinant cancels
that of step (i). However, according to (14.1), step (iii) negates the determi-
nant. Hence the net effect of the procedure is to negate the determinant—to
negate the very determinant the procedure is not permitted to alter. The
apparent contradiction can be reconciled only if the determinant is zero to
begin with.
From the foregoing properties the following further property can be de-
duced.
• If
$$c_{i*} = \begin{cases} a_{i*} + \alpha a_{i'*} & \text{when } i = i'', \\ a_{i*} & \text{otherwise}, \end{cases}
\qquad\text{or if}\qquad
c_{*j} = \begin{cases} a_{*j} + \alpha a_{*j'} & \text{when } j = j'', \\ a_{*j} & \text{otherwise}, \end{cases}$$
$$\det A^{-1} = \frac{1}{\det A} \tag{14.12}$$
$$\det Q^* \det Q = 1, \qquad |\det Q| = 1. \tag{14.13}$$
The reason is that $|\det Q|^2 = (\det Q)^*(\det Q) = \det Q^* \det Q = \det Q^* Q = \det Q^{-1} Q = \det I_n = 1$.
$$C^T A = (\det A) I_n = A C^T; \qquad c_{ij} \equiv \det R_{ij};$$
$$[R_{ij}]_{i'j'} \equiv \begin{cases} 1 & \text{if } i' = i \text{ and } j' = j, \\ 0 & \text{if } i' = i \text{ or } j' = j \text{ but not both}, \\ a_{i'j'} & \text{otherwise}. \end{cases} \tag{14.14}$$
Pictorially,
$$R_{ij} = \begin{bmatrix}
\ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \\
\cdots & * & * & 0 & * & * & \cdots \\
\cdots & * & * & 0 & * & * & \cdots \\
\cdots & 0 & 0 & 1 & 0 & 0 & \cdots \\
\cdots & * & * & 0 & * & * & \cdots \\
\cdots & * & * & 0 & * & * & \cdots \\
 & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix},$$
which is the
same as A except in the ith row and jth column. The matrix C, called the
cofactor of A, then consists of the determinants of the various Rij .
Another way to write (14.14) is
wherein the equation $\sum_\ell a_{i\ell} \det R_{i\ell} = \det A$ states that $\det A$, being a deter-
minant, consists of several terms, each term including one factor from each
row of $A$, where $a_{i\ell}$ provides the $i$th row and $R_{i\ell}$ provides the other rows.⁷
In the case that $i \neq j$,
$$[AC^T]_{ij} = \sum_\ell a_{i\ell} c_{j\ell} = \sum_\ell a_{i\ell} \det R_{j\ell} = 0 = (\det A)(0) = (\det A)\,\delta_{ij},$$
wherein $\sum_\ell a_{i\ell} \det R_{j\ell}$ is the determinant, not of $A$ itself, but rather of $A$
with the jth row replaced by a copy of the ith, which according to (14.5)
evaluates to zero. Similar equations can be written for [C T A]ij in both cases.
The two cases together prove (14.15), hence also (14.14).
Dividing (14.14) by $\det A$, we have that⁸
$$A^{-1} A = I_n = A A^{-1}, \qquad A^{-1} = \frac{C^T}{\det A}. \tag{14.16}$$
Equation (14.16) inverts a matrix by determinant. In practice, it inverts
small matrices nicely, through about 4 × 4 dimensionality (the A−1 2 equation
at the head of the section is just eqn. 14.16 for n = 2). It inverts 5 × 5 and
even 6 × 6 matrices reasonably, too—especially with the help of a computer
to do the arithmetic. Though (14.16) still holds in theory for yet larger
matrices, and though symbolically it expresses the inverse of an abstract,
n × n matrix concisely whose entries remain unspecified, for concrete matri-
ces much bigger than 4 × 4 to 6 × 6 or so its several determinants begin to
grow too great and too many for practical calculation. The Gauss-Jordan
technique (or even the Gram-Schmidt technique) is preferred to invert con-
crete matrices above a certain size for this reason.9
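Though the book itself works by hand, a brief numerical sketch (Python with NumPy; the routine names and the sample matrix are illustrative only, not part of the text) can check (14.14) and (14.16) against a library inverse:

    import numpy as np

    def cofactor_inverse(A):
        """Invert A per (14.16): A^-1 = C^T / det A, with c_ij = det R_ij per (14.14)."""
        n = A.shape[0]
        C = np.empty_like(A, dtype=float)
        for i in range(n):
            for j in range(n):
                R = A.copy()
                R[i, :] = 0.0                # clear the ith row and the jth column,
                R[:, j] = 0.0
                R[i, j] = 1.0                # placing a 1 at their intersection
                C[i, j] = np.linalg.det(R)   # c_ij = det R_ij
        return C.T / np.linalg.det(A)

    A = np.array([[2., 1., 0.],
                  [1., 3., 1.],
                  [0., 1., 4.]])
    assert np.allclose(cofactor_inverse(A), np.linalg.inv(A))

For matrices much larger than this, of course, the many determinants grow costly, which is just the practical limitation described above.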
not feel the full impact of the coincidence when these properties are left
scattered across the long chapters; so, let us gather and summarize the
properties here. A square, n × n matrix evidently has either all of the
following properties or none of them, never some but not others.
• Its columns address the same space the columns of In address, and its
rows address the same space the rows of In address (§ 12.5.7).
The square matrix which has one of these properties, has all of them. The
square matrix which lacks one, lacks all. Assuming exact arithmetic, a
square matrix is either invertible, with all that that implies, or singular;
never both. The distinction between invertible and singular matrices is
theoretically as absolute as (and indeed is analogous to) the distinction
between nonzero and zero scalars.
Whether the distinction is always useful is another matter. Usually the
distinction is indeed useful, but a matrix can be almost singular just as a
scalar can be almost zero. Such a matrix is known, among other ways, by
its unexpectedly small determinant. Now it is true: in exact arithmetic, a
nonzero determinant, no matter how small, implies a theoretically invert-
ible matrix. Practical matrices however often have entries whose values are
imprecisely known; and even when they don’t, the computers which invert
them tend to do arithmetic imprecisely in floating-point. Matrices which live
on the hazy frontier between invertibility and singularity resemble the in-
finitesimals of § 4.1.1. They are called ill conditioned matrices. Section 14.8
develops the topic.
Av = λv, (14.18)
[A − λIn ]v = 0. (14.19)
Since In v is nonzero, the last equation is true if and only if the matrix
[A − λIn ] is singular—which in light of § 14.1.3 is to demand that
magnitude but not its direction. The eigenvalue alone takes the place of the
whole, hulking matrix. This is what (14.18) means. Of course it works only
when v happens to be the right eigenvector, which § 14.4 discusses.
When λ = 0, (14.20) makes det A = 0, which as we have said is the
sign of a singular matrix. Zero eigenvalues and singular matrices always
travel together. Singular matrices each have at least one zero eigenvalue;
nonsingular matrices never do.
The eigenvalues of a matrix’s inverse are the inverses of the matrix’s
eigenvalues. That is,

      A−1 vj = (1/λj ) vj .                                         (14.21)

The reason behind (14.21) comes from answering the question: if Avj
scales vj by the factor λj , then what does A−1 Avj = In vj do to vj ?
Naturally one must solve (14.20)’s nth-order polynomial to locate the
actual eigenvalues. One solves it by the same techniques by which one solves
any polynomial: the quadratic formula (2.2); the cubic and quartic methods
of Ch. 10; the Newton-Raphson iteration (4.31). On the other hand, the
determinant (14.20) can be impractical to expand for a large matrix; here
iterative techniques help: see [chapter not yet written].11
[A − λIn ]v = 0,
which is to say that the eigenvectors are the vectors of the kernel space of the
degenerate matrix [A − λIn ]—which one can calculate (among other ways)
by the Gauss-Jordan kernel formula (13.7) or the Gram-Schmidt kernel for-
mula (13.61).
An eigenvalue and its associated eigenvector, taken together, are some-
times called an eigensolution.
11
The inexpensive [21] also covers the topic competently and readably.
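As a hedged numerical sketch (Python with NumPy; the 2 × 2 sample and the use of the singular-value decomposition to extract the kernel are choices of convenience, not the text's), one can solve the characteristic polynomial for the eigenvalues and then read each eigenvector out of the kernel of the degenerate matrix [A − λIn]:

    import numpy as np

    A = np.array([[4., 1.],
                  [2., 3.]])

    # the characteristic polynomial of a 2 x 2 matrix is lam^2 - (tr A) lam + det A
    lams = np.roots([1.0, -np.trace(A), np.linalg.det(A)])

    for lam in lams:
        M = A - lam * np.eye(2)              # the degenerate matrix [A - lam I_n]
        v = np.linalg.svd(M)[2][-1]          # a kernel vector of M spans the eigenvector
        assert np.allclose(A @ v, lam * v)   # Av = lam v, per (14.18)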
• A matrix and its inverse share the same eigenvectors with inverted
eigenvalues. Refer to (14.21) and its explanation in § 14.3.
      vk = Σj=1..k−1 cj vj ,
14.6 Diagonalization
Any n × n matrix with n independent eigenvectors (which class per § 14.5
includes, but is not limited to, every n×n matrix with n distinct eigenvalues)
can be diagonalized as
A = V ΛV −1 , (14.22)
where
          [ λ1   0    ⋯    0      0   ]
          [ 0    λ2   ⋯    0      0   ]
      Λ = [ ⋮    ⋮    ⋱    ⋮      ⋮   ]
          [ 0    0    ⋯    λn−1   0   ]
          [ 0    0    ⋯    0      λn  ]
12
[29]
      AV = V Λ,
      A2 = (V ΛV −1 )(V ΛV −1 ) = V Λ2 V −1 ,

      Az = V Λz V −1 ,
                                                                    (14.24)
      [Λz ]ij = δij (λj )z ,
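A short numerical sketch (NumPy; the symmetric sample matrix is arbitrary) illustrates the diagonalization (14.22) and the power formula (14.24):

    import numpy as np

    A = np.array([[2., 1.],
                  [1., 2.]])
    lam, V = np.linalg.eig(A)                      # A = V Lambda V^-1, per (14.22)
    Lam = np.diag(lam)

    assert np.allclose(V @ Lam @ np.linalg.inv(V), A)
    # A^z = V Lambda^z V^-1, per (14.24), checked here for z = 3
    assert np.allclose(V @ np.diag(lam**3) @ np.linalg.inv(V),
                       np.linalg.matrix_power(A, 3))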
where δb is the error in b and δx is the resultant error in x, then one should
like to bound the ratio |δx| / |x| to ascertain the reliability of x as a solution.
Transferring A to the equation’s right side,
      x + δx = A−1 (b + δb),

      |δx| = |A−1 δb| ,

      |δx| / |x| = |A−1 δb| / |A−1 b| .

The quantity |A−1 δb| cannot exceed |λmin⁻¹ δb|. The quantity |A−1 b| cannot
fall short of |λmax⁻¹ b|. Thus,

      |δx| / |x| ≤ |λmin⁻¹ δb| / |λmax⁻¹ b| = (λmax /λmin ) (|δb| / |b|).

That is,

      |δx| / |x| ≤ κ (|δb| / |b|).                                  (14.27)
Condition, incidentally, might technically be said to apply to scalars as
well as to matrices, but ill condition remains a property of matrices alone.
According to (14.25), the condition of every nonzero scalar is happily κ = 1.
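To watch (14.27) at work, one can solve a nearly singular system numerically. A sketch follows (NumPy; the symmetric sample matrix is chosen so that the eigenvalue-based κ of the text coincides with the ordinary 2-norm condition number):

    import numpy as np

    A = np.array([[1.00, 0.99],
                  [0.99, 1.00]])                  # nearly singular, hence ill conditioned
    b = np.array([1.0, -1.0])
    x = np.linalg.solve(A, b)

    db = 1e-6 * np.array([1.0, 1.0])              # a small disturbance of b
    dx = np.linalg.solve(A, b + db) - x

    lam = np.linalg.eigvalsh(A)
    kappa = abs(lam).max() / abs(lam).min()       # kappa = |lambda_max / lambda_min|

    # (14.27): |dx|/|x| <= kappa |db|/|b|
    assert (np.linalg.norm(dx) / np.linalg.norm(x)
            <= kappa * np.linalg.norm(db) / np.linalg.norm(b))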
is (5, 1) in the basis B: five times the first basis vector plus once the second.
The basis provides the units from which other vectors can be built.
Particularly interesting is the n×n, invertible complete basis B, in which
the n basis vectors are independent and address the same full space the
columns of In address. If
x = Bu
then u represents x in the basis B. Left-multiplication by B evidently
converts out of the basis. Left-multiplication by B −1 ,
u = B −1 x,
then does the reverse, converting into the basis. One can therefore convert
any operator A to work within a complete basis B by the successive steps
      Ax = b,
      ABu = b,
      [B −1 AB]u = B −1 b,
by which the operator B −1 AB is seen to be the operator A, only transformed
to work within the basis17,18 B.
The conversion from A into B −1 AB is called a similarity transformation.
If B happens to be unitary (§ 13.12), then the conversion is also called a
unitary transformation. The matrix B −1 AB the transformation produces
is said to be similar (or, if B is unitary, unitarily similar ) to the matrix A.
We have already met the similarity transformation in §§ 11.5 and 12.2. Now
we have the theory to appreciate it properly.
Probably the most important property of the similarity transformation
is that it alters no eigenvalues. That is, if
Ax = λx,
then, by successive steps,
B −1 A(BB −1 )x = λB −1 x,
[B −1 AB]u = λu. (14.28)
17
The reader may need to ponder the basis concept a while to grasp it, but the concept
is simple once grasped and little purpose would be served by dwelling on it here. Basically,
the idea is that one can build the same vector from alternate building blocks, not only
from the standard building blocks e1 , e2 , e3 , etc.—except that the right word for the
relevant “building block” is basis vector. The books [30] and [42] introduce the basis more
gently; one might consult one of those if needed.
18
The professional matrix literature sometimes distinguishes by typeface between the
matrix B and the basis B its columns represent. Such semantical distinctions seem a little
too fine for applied use, though. This book just uses B.
The eigenvalues of A and the similar B −1 AB are the same for any square,
n × n matrix A and any invertible, square, n × n matrix B.
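A small numerical check (NumPy; the matrices are arbitrary, B merely being invertible) confirms the claim, in the spirit of (14.28):

    import numpy as np

    A = np.array([[1., 2., 0.],
                  [0., 3., 1.],
                  [1., 0., 2.]])
    B = np.array([[2., 1., 0.],
                  [0., 1., 1.],
                  [1., 0., 1.]])                  # invertible; its columns form the basis

    S = np.linalg.inv(B) @ A @ B                  # the similar matrix B^-1 A B

    # the similarity transformation alters neither the characteristic
    # polynomial nor, therefore, the eigenvalues
    assert np.allclose(np.poly(A), np.poly(S))
    assert np.allclose(np.sort_complex(np.linalg.eigvals(A)),
                       np.sort_complex(np.linalg.eigvals(S)))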
A = QUS Q∗ , (14.29)
14.10.1 Derivation
Suppose that20 (for some reason, which will shortly grow clear) we have a
matrix B of the form
          [ ⋱   ⋮   ⋮   ⋮   ⋮   ⋮   ⋮   ⋮        ]
          [ ⋯   ∗   ∗   ∗   ∗   ∗   ∗   ∗   ⋯  ]
          [ ⋯   0   ∗   ∗   ∗   ∗   ∗   ∗   ⋯  ]
          [ ⋯   0   0   ∗   ∗   ∗   ∗   ∗   ⋯  ]
      B = [ ⋯   0   0   0   ∗   ∗   ∗   ∗   ⋯  ] ,                 (14.30)
          [ ⋯   0   0   0   0   ∗   ∗   ∗   ⋯  ]
          [ ⋯   0   0   0   0   ∗   ∗   ∗   ⋯  ]
          [ ⋯   0   0   0   0   ∗   ∗   ∗   ⋯  ]
          [        ⋮   ⋮   ⋮   ⋮   ⋮   ⋮   ⋮   ⋱ ]
19
The alternative is to develop the interesting but difficult Jordan canonical form, which
for brevity’s sake this chapter prefers to omit.
20
This subsection assigns various capital Roman letters to represent the several matrices
and submatrices it manipulates. Its choice of letters except in (14.29) is not standard and
carries no meaning elsewhere. The writer had to choose some letters and these are ones
he chose.
This footnote mentions the fact because good mathematical style avoids assigning letters
that already bear a conventional meaning in a related context (for example, this book
avoids writing Ax = b as T e = i, not because the latter is wrong but because it would be
extremely confusing). The Roman alphabet provides only twenty-six capitals, though, of
which this subsection uses too many to be allowed to reserve any. See Appendix B.
where the ith row and ith column are depicted at center. Suppose further
that we wish to transform B not only similarly but unitarily into
                    [ ⋱   ⋮   ⋮   ⋮   ⋮   ⋮   ⋮   ⋮        ]
                    [ ⋯   ∗   ∗   ∗   ∗   ∗   ∗   ∗   ⋯  ]
                    [ ⋯   0   ∗   ∗   ∗   ∗   ∗   ∗   ⋯  ]
                    [ ⋯   0   0   ∗   ∗   ∗   ∗   ∗   ⋯  ]
      C ≡ W ∗ BW = [ ⋯   0   0   0   ∗   ∗   ∗   ∗   ⋯  ] ,        (14.31)
                    [ ⋯   0   0   0   0   ∗   ∗   ∗   ⋯  ]
                    [ ⋯   0   0   0   0   0   ∗   ∗   ⋯  ]
                    [ ⋯   0   0   0   0   0   ∗   ∗   ⋯  ]
                    [        ⋮   ⋮   ⋮   ⋮   ⋮   ⋮   ⋮   ⋱ ]
      C = W ∗ BW
        = (Ii + Hi Wo∗ H−i )(B)(Ii + Hi Wo H−i )
        = Ii BIi + Ii BHi Wo H−i + Hi Wo∗ H−i BIi + Hi Wo∗ H−i BHi Wo H−i .

The unitary submatrix Wo has only n − i columns and n − i rows, so
In−i Wo = Wo = Wo In−i . Thus,
where the last step has used (14.32) and the identity (11.76). The four terms
on the equation’s right, each term with rows and columns neatly truncated,
represent the four quarters of C ≡ W ∗ BW —upper left, upper right, lower
left and lower right, respectively. The lower left term is null because
(In − Ii )[Hi Wo∗ H−i B]Ii = (In − Ii )[Hi Wo∗ In−i H−i BIi ]Ii
= (In − Ii )[Hi Wo∗ H−i ][(In − Ii )BIi ]Ii
= (In − Ii )[Hi Wo∗ H−i ][0]Ii = 0,
leaving
But the upper left term makes the upper left areas of B and C the same,
and the upper right term does not bother us because we have not restricted
the content of C’s upper right area. Apparently any (n − i) × (n − i) unitary
submatrix Wo whatsoever obeys (14.31) in the lower left, upper left and
upper right.
That leaves the lower right. Left- and right-multiplying (14.31) by the
truncator (In −Ii ) to focus solely on the lower right area, we have the reduced
requirement that
F = Q F RF .
[QF ]∗1 = QF e1 = vo .
G ≡ Q∗F Bo QF ,
QF GQ∗F = Bo .
QF Ge1 = λo vo .
Left-multiplying by Q∗F ,
Ge1 = λo Q∗F vo .
Noting that the Gram-Schmidt process has rendered orthogonal to vo all
columns of QF but the first, which is vo , observe that
                                     [ λo ]
                                     [ 0  ]
      Ge1 = λo Q∗F vo = λo e1 =      [ 0  ] ,
                                     [ ⋮  ]
which fits the very form (14.32) the submatrix Co is required to have. Con-
clude therefore that
W o = QF ,
(14.36)
Co = G,
because the form (14.30) of B at i = i′ is nothing other than the form (14.31)
of C at i = i′ − 1. Therefore, if we let
B|i=0 = A, (14.38)
B|i=n = US , (14.39)
where per (14.30) the matrix US has the general upper triangular form the
Schur decomposition (14.29) requires. Moreover, because the product of
unitary matrices according to (13.64) is itself a unitary matrix, we have
that
      Q = ∐i′=0..n−1 (W |i=i′ ) ,                                   (14.40)
whose factors run straight down the main diagonal, where the determinant’s
n! − 1 other terms are all zero because each of them includes at least one
zero factor from below the main diagonal.21 Hence no element above the
main diagonal of US even influences the eigenvalues, which apparently are
λi = uSii , (14.41)
21
The determinant’s definition in § 14.1 makes the following two propositions equivalent:
(i) that a determinant’s term which includes one or more factors above the main diagonal
also includes one or more factors below; (ii) that the only permutor that marks no position
below the main diagonal is the one which also marks no position above. In either form,
the proposition’s truth might seem less than obvious until viewed from the proper angle.
Consider a permutor P . If P marked no position below the main diagonal, then it would
necessarily have pnn = 1, else the permutor’s bottom row would be empty which is not
allowed. In the next-to-bottom row, p(n−1)(n−1) = 1, because the nth column is already
occupied. In the next row up, p(n−2)(n−2) = 1; and so on, thus affirming the proposition.
A = QUS Q∗ ,
then the eigenvalues of A are just the values along the main diagonal of US .22
One might think that the Schur decomposition offered an easy way to cal-
culate eigenvalues, but it is less easy than it first appears because one must
calculate eigenvalues to reach the Schur decomposition in the first place.
Whatever practical merit the Schur decomposition might have or lack, how-
ever, it brings at least the theoretical benefit of (14.41): every square matrix
without exception has a Schur decomposition, whose triangular factor US
openly lists all eigenvalues along its main diagonal.
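Numerically the decomposition is easiest to reach through a library routine. A sketch (Python; SciPy's schur is used here, a tool the book itself of course never invokes, and the sample matrix is arbitrary):

    import numpy as np
    from scipy.linalg import schur

    A = np.array([[0., 2., 1.],
                  [1., 1., 0.],
                  [0., 3., 1.]])

    US, Q = schur(A, output='complex')            # A = Q US Q*, US upper triangular

    assert np.allclose(Q @ US @ Q.conj().T, A)            # per (14.29)
    assert np.allclose(Q.conj().T @ Q, np.eye(3))         # Q is unitary
    # per (14.41), the eigenvalues lie along the main diagonal of US
    assert np.allclose(np.sort_complex(np.diag(US)),
                       np.sort_complex(np.linalg.eigvals(A)))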
This theoretical benefit pays when some of the n eigenvalues of an n × n
square matrix A repeat. By the Schur decomposition, one can construct
a second square matrix A′ , as near as desired to A but having n distinct
eigenvalues, simply by perturbing the main diagonal of US to23
where |ǫ| ≪ 1 and where u is an arbitrary vector that meets the criterion
given. Though infinitesimally near A, the modified matrix A′ = QUS′ Q∗
unlike A has n (maybe infinitesimally) distinct eigenvalues. With sufficient
toil, one can analyze such perturbed eigenvalues and their associated eigen-
vectors similarly as § 9.6.2 has analyzed perturbed poles.
22
An unusually careful reader might worry that A and US had the same eigenvalues
with different multiplicities. It would be surprising if it actually were so; but, still, one
would like to give a sounder reason than the participle “surprising.” Consider however
that
det[A − λIn ] = det{Q[US − λIn ]Q∗ } = det Q det[US − λIn ] det Q∗ = det[US − λIn ],
which says that A and US have not only the same eigenvalues but also the same charac-
teristic polynomials, thus further the same eigenvalue multiplicities.
23
Equation (11.55) defines the diag{·} notation.
A∗ = A, (14.43)
24
[23, Ch. 7]
25
[21, Ch. 5]
26
[69]
A = V ΛV ∗ . (14.44)
That is,
λ∗ = λ,
which naturally is possible only if λ is real.
That eigenvectors corresponding to distinct eigenvalues lie orthogonal to
one another is proved27 by letting (λ1 , v1 ) and (λ2 , v2 ) represent eigensolu-
tions of A and constructing the product v2∗ Av1 , for which
That is,
λ∗2 = λ1 or v2∗ v1 = 0.
But according to the last paragraph all eigenvalues are real; the eigenval-
ues λ1 and λ2 are no exceptions. Hence,
λ2 = λ1 or v2∗ v1 = 0.
To prove the last hypothesis of the three needs first some definitions as
follows. Given an m × m matrix A, let the s columns of the m × s matrix Vo
represent the s independent eigenvectors of A such that (i) each column
has unit magnitude and (ii) columns whose eigenvectors share the same
eigenvalue lie orthogonal to one another. Let the s × s diagonal matrix Λo
carry the eigenvalues on its main diagonal such that
AVo = Vo Λo ,
27
[42, § 8.1]
where the distinction between the matrix Λo and the full eigenvalue matrix Λ
of (14.22) is that the latter always includes a p-fold eigenvalue p times,
whereas the former includes a p-fold eigenvalue only as many times as the
eigenvalue enjoys independent eigenvectors. Let the m−s columns of the m×
(m − s) matrix Vo⊥ represent the complete orthogonal complement (§ 13.10)
to Vo —perpendicular to all eigenvectors, each column of unit magnitude—
such that
Vo⊥∗ Vo = 0 and Vo⊥∗ Vo⊥ = Im−s .
Recall from § 14.5 that s ≠ 0 but 0 < s ≤ m because every square matrix
has at least one eigensolution. Recall from § 14.6 that s = m if and only
if A is diagonalizable.28
With these definitions in hand, we can now prove by contradiction that
all Hermitian matrices are diagonalizable, falsely supposing a nondiagonal-
izable Hermitian matrix A, whose Vo⊥ (since A is supposed to be nondiag-
onalizable, implying that s < m) would have at least one column. For such
a matrix A, s × (m − s) and (m − s) × (m − s) auxiliary matrices F and G
necessarily would exist such that
AVo⊥ = Vo F + Vo⊥ G,
not due to any unusual property of the product AVo⊥ but for the mundane
reason that the columns of Vo and Vo⊥ together by definition addressed
28
A concrete example: the invertible but nondiagonalizable matrix
          [ −1    0    0      0   ]
      A = [ −6    5   25/2  −5/2 ]
          [  0    0    5      0   ]
          [  0    0    0      5   ]
The orthogonal complement Vo⊥ supplies the missing vector, not an eigenvector but per-
pendicular to them all.
In the example, m = 4 and s = 3.
All vectors in the example are reported with unit magnitude. The two λ = 5 eigenvectors
are reported in mutually orthogonal form, but notice that eigenvectors corresponding to
distinct eigenvalues need not be orthogonal when A is not Hermitian.
where we had relied on the assumption that A were Hermitian and thus
that, as proved above, its distinctly eigenvalued eigenvectors lay orthogonal
to one another; in consequence of which A∗ = A and Vo∗ Vo = Is .
The finding that F = 0 reduces the AVo⊥ equation above to read
AVo⊥ = Vo⊥ G.
In the reduced equation the matrix G would have at least one eigensolution,
not due to any unusual property of G but because according to § 14.5 every
square matrix, 1 × 1 or larger, has at least one eigensolution. Let (µ, w)
represent an eigensolution of G. Right-multiplying by the (m − s)-element
vector w ≠ 0, we would have by successive steps that
The last equation claims that (µ, Vo⊥ w) were an eigensolution of A, when
we had supposed that all of A’s eigenvectors lay in the space addressed by
the columns of Vo , thus by construction did not lie in the space addressed
by the columns of Vo⊥ . The contradiction proves false the assumption that
gave rise to it. The assumption: that a nondiagonalizable Hermitian A
existed. We conclude that all Hermitian matrices are diagonalizable—and
conclude further that they are unitarily diagonalizable on the ground that
their eigenvectors lie orthogonal to one another—as was to be demonstrated.
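A brief numerical illustration of the three properties (NumPy; the 2 × 2 Hermitian sample is arbitrary):

    import numpy as np

    A = np.array([[2.0, 1.0 - 1.0j],
                  [1.0 + 1.0j, 3.0]])                   # A* = A, per (14.43)

    # even the general eigenvalue routine returns (numerically) real eigenvalues
    assert np.allclose(np.linalg.eigvals(A).imag, 0.0)

    lam, V = np.linalg.eigh(A)                          # A = V Lambda V*, per (14.44)
    assert np.allclose(V.conj().T @ V, np.eye(2))       # orthonormal eigenvectors
    assert np.allclose(V @ np.diag(lam) @ V.conj().T, A)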
Having proven that all Hermitian matrices are diagonalizable and have
real eigenvalues and orthogonal eigenvectors, one wonders whether the con-
verse holds: are all diagonalizable matrices with real eigenvalues and or-
thogonal eigenvectors Hermitian? To show that they are, one can construct
the matrix described by the diagonalization formula (14.22),
A = V ΛV ∗ ,
a useful fact; for just as a real, positive scalar has a real, positive square
root, so equally has Λ a real, positive square root under these conditions.
Let the symbol Σ = √Λ represent the n × n real, positive square root of the
eigenvalue matrix Λ such that
      Λ = Σ∗ Σ,                                                     (14.47)

                [ +√λ1    0      ⋯    0        0     ]
                [  0     +√λ2    ⋯    0        0     ]
      Σ∗ = Σ =  [  ⋮      ⋮      ⋱    ⋮        ⋮     ] ,
                [  0      0      ⋯   +√λn−1    0     ]
                [  0      0      ⋯    0       +√λn   ]
A∗ A = V Σ∗ ΣV ∗ ,
(14.48)
V ∗ A∗ AV = Σ∗ Σ.
Now consider the m × m matrix U such that
      AV Σ−1 = U In ,
      AV = U Σ,                                                     (14.49)
      A = U ΣV ∗ .
Substituting (14.49)’s second line into (14.48)’s second line gives the equa-
tion
Σ∗ U ∗ U Σ = Σ∗ Σ;
but ΣΣ−1 = In , so left- and right-multiplying respectively by Σ−∗ and Σ−1
leaves that
In U ∗ U In = In ,
which says neither more nor less than that the first n columns of U are
orthonormal. Equation (14.49) does not constrain the last m − n columns
of U , leaving us free to make them anything we want. Why not use Gram-
Schmidt to make them orthonormal, too, thus making U a unitary matrix?
If we do this, then the surprisingly simple (14.49) constitutes the singular-
value decomposition of A.
If A happens to have broad shape then we can decompose A∗ , instead,
so this case poses no special trouble. Apparently every full-rank matrix has
a singular-value decomposition.
But what of the matrix of less than full rank r < n? In this case the
product A∗ A is singular and has only s < n nonzero eigenvalues (it may be
      AV Σ−1 = U Is ,
      AV = U Σ,                                                     (14.50)
      A = U ΣV ∗ .
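The construction can be traced numerically step by step. The following sketch (NumPy; the tall, full-rank sample matrix is arbitrary) builds Σ from the eigenvalues of A∗A per (14.47) and (14.48), forms the first n columns of U per (14.49), and checks the result:

    import numpy as np

    A = np.array([[1., 2.],
                  [3., 4.],
                  [5., 6.]])                       # m x n with m >= n, full rank

    lam, V = np.linalg.eigh(A.conj().T @ A)        # A*A = V (Sigma* Sigma) V*, per (14.48)
    lam, V = lam[::-1], V[:, ::-1]                 # order the eigenvalues descending
    Sigma = np.diag(np.sqrt(lam))                  # Sigma = sqrt(Lambda), per (14.47)

    U = A @ V @ np.linalg.inv(Sigma)               # the first n columns of U, per (14.49)

    assert np.allclose(U.conj().T @ U, np.eye(2))  # those columns are orthonormal
    assert np.allclose(U @ Sigma @ V.conj().T, A)  # A = U Sigma V*

The remaining m − n columns of U, were they wanted, could be filled in by Gram-Schmidt as the text suggests.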
projectors other books32 include. What has given these chapters their hefty
bulk is not so much the immediate development of the essential agents as the
preparatory development of theoretical tools used to construct the essential
agents, yet most of the tools are of limited interest in themselves; it is the
agents that matter. Tools like the projector not used here tend to be omitted
here or deferred to later chapters, not because they are altogether useless but
because they are not used here and because the present chapters are already
too long. The reader who understands the Moore-Penrose pseudoinverse
and/or the Gram-Schmidt process reasonably well can after all pretty easily
figure out how to construct a projector without explicit instructions thereto,
should the need arise.33
Paradoxically and thankfully, more advanced and more specialized ma-
trix theory though often harder tends to come in smaller, more manageable
increments: the Cholesky decomposition, for instance; or the conjugate-
gradient algorithm. The theory develops endlessly. From the present pause
one could proceed directly to such topics. However, since this is the first
proper pause these several matrix chapters have afforded, since the book
is Derivations of Applied Mathematics rather than Derivations of Applied
Matrices, maybe we ought to take advantage to change the subject.
32
Such as [30, § 3.VI.3], a lengthy but well-knit tutorial this writer recommends.
33
Well, since we have brought it up (though only as an example of tools these chapters
have avoided bringing up), briefly: a projector is a matrix that flattens an arbitrary
vector b into its nearest shadow b̃ within some restricted subspace. If the columns of A
represent the subspace, then x represents b̃ in the subspace basis iff Ax = b̃, which is to
say that Ax ≈ b, whereupon x = A† b. That is, per (13.32), b̃ = Ax = A(A∗ A)−1 A∗ b,
in which the matrix A(A∗ A)−1 A∗ is the projector. Thence it is readily shown that the
deviation b − b̃ lies orthogonal to the shadow b̃. More broadly defined, any matrix M for
which M 2 = M is a projector. One can approach the projector in other ways, but there
are two ways at least.
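Since footnote 33 has brought the projector up, a minimal numerical sketch of it (NumPy; the subspace and the vector b are arbitrary):

    import numpy as np

    A = np.array([[1., 0.],
                  [1., 1.],
                  [0., 2.]])                  # the columns of A represent the subspace
    b = np.array([3., -1., 2.])

    M = A @ np.linalg.inv(A.T @ A) @ A.T      # the projector A (A*A)^-1 A*
    b_shadow = M @ b                          # the nearest shadow of b in the subspace

    assert np.allclose(M @ M, M)                            # M^2 = M, as a projector requires
    assert np.isclose(np.dot(b - b_shadow, b_shadow), 0.0)  # the deviation lies orthogonal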
Chapter 15
Vector analysis
Leaving the matrix, this chapter and the next turn to a curiously underap-
preciated agent of applied mathematics, the three-dimensional geometrical
vector, first met in §§ 3.3, 3.4 and 3.9. Seen from one perspective, the three-
dimensional geometrical vector is the n = 3 special case of the general,
n-dimensional vector of Chs. 11 through 14. Because its three elements
represent the three dimensions of the physical world, however, the three-
dimensional geometrical vector merits closer attention and special treat-
ment.1
It also merits a shorter name. Where the geometrical context is clear—as
it is in this chapter and the next—we shall call the three-dimensional geo-
metrical vector just a vector. A name like “matrix vector” or “n-dimensional
vector” can disambiguate the vector of Chs. 11 through 14 where necessary
but, since the three-dimensional geometrical vector is in fact a vector, it
usually is not necessary to disambiguate. The lone word vector serves.
In the present chapter’s context and according to § 3.3, a vector con-
sists of an amplitude of some kind plus a direction. Per § 3.9, three scalars
called coordinates suffice together to specify the amplitude and direction
and thus the vector, the three being (x, y, z) in the rectangular coordinate
system, (ρ; φ, z) in the cylindrical coordinate system, or (r; θ; φ) in the spher-
ical coordinate system—as Fig. 15.1 illustrates and Table 3.4 on
page 65 interrelates—among other, more exotic possibilities (§ 15.7).
The vector brings an elegant notation. This chapter and Ch. 16 detail
1
[12, Ch. 2]
[Figure 15.1: a point located relative to the axes x̂, ŷ, ẑ; the cylindrical coordinates ρ, φ, z and the spherical coordinates r, θ, φ are marked.]
for the aspect coefficient relative to a local surface normal (and if the sen-
tence’s words do not make sense to you yet, don’t worry; just look the
symbols over and appreciate the expression’s bulk). The same coefficient in
standard vector notation is
n̂ · ∆r̂.
Besides being more evocative (once one has learned to read it) and much
more compact, the standard vector notation brings the major advantage
of freeing a model’s geometry from reliance on any particular coordinate
system. Reorienting axes (§ 15.1) for example knots the former expression
like spaghetti but does not disturb the latter expression at all.
Two-dimensional geometrical vectors arise in practical modeling about
as often as three-dimensional geometrical vectors do. Fortunately, the two-
dimensional case needs little special treatment, for it is just the three-
dimensional with z = 0 or θ = 2π/4 (see however § 15.6).
Here at the outset, a word on complex numbers seems in order. Unlike
most of the rest of the book this chapter and the next will work chiefly
in real numbers, or at least in real coordinates. Notwithstanding, complex
coordinates are possible. Indeed, in the rectangular coordinate system com-
plex coordinates are perfectly appropriate and are straightforward enough to
handle. The cylindrical and spherical systems however, which these chapters
also treat, were not conceived with complex coordinates in mind; and, al-
though it might with some theoretical subtlety be possible to treat complex
radii, azimuths and elevations consistently as three-dimensional coordinates,
these chapters will not try to do so.2 (This is not to say that you cannot
have a complex vector like, say, ρ̂[3 + j2] − φ̂[1/4] in a nonrectangular basis.
You can have such a vector, it is fine, and these chapters will not avoid it.
What these chapters will avoid are complex nonrectangular coordinates like
[3 + j2; −1/4, 0].)
Vector addition will already be familiar to the reader from Ch. 3 or (quite
likely) from earlier work outside this book. This chapter therefore begins
with the reorientation of axes in § 15.1 and vector multiplication in § 15.2.
2
The author would be interested to learn if there existed an uncontrived scientific or
engineering application that actually used complex, nonrectangular coordinates.
15.1 Reorientation
Matrix notation expresses the rotation of axes (3.5) as
      [ x̂′ ]   [  cos φ   sin φ   0 ] [ x̂ ]
      [ ŷ′ ] = [ −sin φ   cos φ   0 ] [ ŷ ] .
      [ ẑ′ ]   [  0        0      1 ] [ ẑ ]
In three dimensions however one can do more than just to rotate the x and y
axes about the z. One can reorient the three axes generally as follows.
Notice in (15.1) and (15.2) that the transpose (though curiously not the
adjoint) of each 3 × 3 Tait-Bryan factor is also its inverse.
In concept, the Tait-Bryan equations (15.1) and (15.2) say nearly all
one needs to say about reorienting axes in three dimensions; but, still, the
equations can confuse the uninitiated. Consider a vector

      v = x̂x + ŷy + ẑz.                                            (15.3)
It is not the vector one reorients but rather the axes used to describe the
vector. Envisioning the axes as in Fig. 15.1 with the z axis upward, one first
yaws the x axis through an angle φ toward the y then pitches it downward
through an angle θ away from the z. Finally, one rolls the y and z axes
through an angle ψ about the new x, all the while maintaining the three
axes rigidly at right angles to one another. These three Tait-Bryan rotations
can orient axes any way. Yet, even once one has clearly visualized the Tait-
Bryan sequence, the prospect of applying (15.2) (which inversely represents
the sequence) to (15.3) can still seem daunting until one rewrites the latter
equation in the form
                          [ x ]
      v = [ x̂  ŷ  ẑ ]    [ y ] ,                                   (15.4)
                          [ z ]
where
      [ x′ ]   [ 1    0       0     ] [ cos θ   0   −sin θ ] [  cos φ   sin φ   0 ] [ x ]
      [ y′ ] ≡ [ 0    cos ψ   sin ψ ] [ 0       1    0     ] [ −sin φ   cos φ   0 ] [ y ] ,     (15.5)
      [ z′ ]   [ 0   −sin ψ   cos ψ ] [ sin θ   0    cos θ ] [  0        0      1 ] [ z ]
and where Table 3.4 converts to cylindrical or spherical coordinates if and
as desired. Since (15.5) resembles (15.1), it comes as no surprise that its
inverse,
      [ x ]   [ cos φ  −sin φ   0 ] [  cos θ   0   sin θ ] [ 1   0        0     ] [ x′ ]
      [ y ] = [ sin φ   cos φ   0 ] [  0       1   0     ] [ 0   cos ψ  −sin ψ ] [ y′ ] ,       (15.6)
      [ z ]   [ 0       0       1 ] [ −sin θ   0   cos θ ] [ 0   sin ψ   cos ψ ] [ z′ ]
resembles (15.2).
      [ x̂′ ]   [  cos ψ   sin ψ   0 ] [ cos θ   0   −sin θ ] [  cos φ   sin φ   0 ] [ x̂ ]
      [ ŷ′ ] = [ −sin ψ   cos ψ   0 ] [ 0       1    0     ] [ −sin φ   cos φ   0 ] [ ŷ ] ;     (15.7)
      [ ẑ′ ]   [  0        0      1 ] [ sin θ   0    cos θ ] [  0        0      1 ] [ ẑ ]
and inversely
      [ x̂ ]   [ cos φ  −sin φ   0 ] [  cos θ   0   sin θ ] [ cos ψ  −sin ψ   0 ] [ x̂′ ]
      [ ŷ ] = [ sin φ   cos φ   0 ] [  0       1   0     ] [ sin ψ   cos ψ   0 ] [ ŷ′ ] .      (15.8)
      [ ẑ ]   [ 0       0       1 ] [ −sin θ   0   cos θ ] [ 0       0       1 ] [ ẑ′ ]
Whereas the Tait-Bryan point the x axis first, the Euler tactic is to point
first the z.
So, that’s it. One can reorient three axes arbitrarily by rotating them in
pairs about the z, y and x or the z, y and z axes in sequence—or, general-
izing, in pairs about any of the three axes so long as the axis of the middle
rotation differs from the axes (Tait-Bryan) or axis (Euler) of the first and
last. A firmer grasp of the reorientation of axes in three dimensions comes
with practice, but those are the essentials of it.
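As a numerical aside (NumPy; the angles are arbitrary), one can form the Tait-Bryan product of (15.5) and confirm the observation above that the transpose of each factor, and hence of the whole product, is its inverse:

    import numpy as np

    def Rz(a):
        return np.array([[ np.cos(a), np.sin(a), 0.],
                         [-np.sin(a), np.cos(a), 0.],
                         [        0.,        0., 1.]])

    def Ry(a):
        return np.array([[ np.cos(a), 0., -np.sin(a)],
                         [        0., 1.,         0.],
                         [ np.sin(a), 0.,  np.cos(a)]])

    def Rx(a):
        return np.array([[1.,         0.,        0.],
                         [0.,  np.cos(a), np.sin(a)],
                         [0., -np.sin(a), np.cos(a)]])

    phi, theta, psi = 0.3, -0.7, 1.1
    T = Rx(psi) @ Ry(theta) @ Rz(phi)       # yaw about z, pitch about y, roll about x

    assert np.allclose(T @ T.T, np.eye(3))  # the transpose is the inverse
    assert np.allclose(np.linalg.inv(T), T.T)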
15.2 Multiplication
One can multiply a vector in any of three ways. The first, scalar multipli-
cation, is trivial: if a vector v is as defined by (15.3), then
v1 · v2 = x1 x2 + y1 y2 + z1 z2 , (15.10)
which, if the vectors v1 and v2 are real, is the product of the two vectors
to the extent to which they run in the same direction. It is the product to
the extent to which the vectors run in the same direction because one can
reorient axes to point x̂′ in the direction of v1 , whereupon v1 · v2 = x′1 x′2
since y1′ and z1′ have vanished.
Naturally, to be valid, the dot product must not vary under a reorienta-
tion of axes; and indeed if we write (15.10) in matrix notation,
                                 [ x2 ]
      v1 · v2 = [ x1  y1  z1 ]   [ y2 ] ,                           (15.11)
                                 [ z2 ]
and then expand each of the two factors on the right according to (15.6),
we see that the dot product does not in fact vary. As in (13.43) of § 13.8,
here too the relationship
gives the angle θ between two vectors according to Fig. 3.1’s cosine if the vectors
are real, by definition hereby if complex. Consequently, the two vectors are
mutually orthogonal—that is, the vectors run at right angles θ = 2π/4 to
one another—if and only if
v1∗ · v2 = 0.
That the dot product is commutative,
v2 · v1 = v1 · v2 , (15.13)
[Figure 15.2: the dot product; a · b = ab cos θ, with b cos θ the projection of b onto a.]
which they run in the same direction, the cross product is the product of two
vectors to the extent to which they run in different directions. Unlike the
dot product the cross product is a vector, defined in rectangular coordinates
as
                │ x̂   ŷ   ẑ  │
      v1 × v2 = │ x1  y1  z1 │                                      (15.14)
                │ x2  y2  z2 │
              ≡ x̂(y1 z2 − z1 y2 ) + ŷ(z1 x2 − x1 z2 ) + ẑ(x1 y2 − y1 x2 ),
of (15.14) arises again and again in vector analysis. Where the pro-
gression is honored, as in ẑx1 y2 , the associated term bears a + sign,
otherwise a − sign, due to § 11.6’s parity principle and the right-hand
rule.
v2 × v1 = −v1 × v2 , (15.16)
      (v1 × v2 ) × v3 ≠ v1 × (v2 × v3 ),
• The cross product runs perpendicularly to each of its two factors if the
vectors involved are real. That is,
• Unlike the dot product, the cross product is closely tied to three-
dimensional space. Two-dimensional space (a plane) can have a cross
product so long as one does not mind that the product points off
into the third dimension, but to speak of a cross product in four-
dimensional space would require arcane definitions and would oth-
erwise make little sense. Fortunately, the physical world is three-
dimensional (or, at least, the space in which we model all but a few,
exotic physical phenomena is three-dimensional), so the cross prod-
uct’s limitation as defined here to three dimensions will seldom if ever
disturb us.
• Section 15.2.1 has related the cosine of the angle between vectors to
the dot product. One can similarly relate the angle’s sine to the cross
product if the vectors involved are real, as
|v1 × v2 | = v1 v2 sin θ,
(15.18)
|v̂1 × v̂2 | = sin θ,
demonstrated by reorienting axes such that v̂1 = x̂′ , that v̂2 has no
component in the ẑ′ direction, and that v̂2 has only a nonnegative com-
ponent in the ŷ′ direction; by remembering that reorientation cannot
[Figure 15.3: the cross product; c = a × b = ĉ ab sin θ.]
8
Conventionally one would prefer the letter v to represent speed, with velocity as v
which in the present example would happen to be v = ℓ̂v. However, this section will
require the letter v for an unrelated purpose.
9
A complex orthogonal basis is also theoretically possible but is normally unnecessary
in geometrical applications and involves subtleties in the cross product. This chapter,
which specifically concerns three-dimensional geometrical vectors rather than the general,
n-dimensional vectors of Ch. 11, is content to consider real bases only. Note that one can
express a complex vector in a real basis.
constitutes such an orthogonal basis, from which other vectors can be built.
The geometries of some models suggest no particular basis, when one usually
just uses a constant [x̂ ŷ ẑ]. The geometries of other models however do
suggest a particular basis, often a variable one.
• Where the model features a contour like the example’s winding road,
an [ℓ̂ v̂ ŵ] basis (or a [û v̂ ℓ̂] basis or even a [û ℓ̂ ŵ] basis) can be
used, where ℓ̂ locally follows the contour. The variable unit vectors v̂
and ŵ (or û and v̂, etc.) can be defined in any convenient way so
long as they remain perpendicular to one another and to ℓ̂—such that
(ẑ × ℓ̂) · ŵ = 0 for instance (that is, such that ŵ lay in the plane of ẑ
and ℓ̂)—but if the geometry suggests a particular v̂ or ŵ (or û), like
the direction right-to-left across the example’s road, then that v̂ or ŵ
should probably be used. The letter ℓ here stands for “longitudinal.”10
• Where the model features a curved surface like the surface of a wavy
sea,11 a [û v̂ n̂] basis (or a [û n̂ ŵ] basis, etc.) can be used, where n̂
points locally perpendicularly to the surface. The letter n here stands
for “normal,” a synonym for “perpendicular.” Observe, incidentally
but significantly, that such a unit normal n̂ tells one everything one
needs to know about its surface’s local orientation.
• Combining the last two, where the model features a contour along a
curved surface, an [ℓ̂ v̂ n̂] basis can be used. One need not worry about
choosing a direction for v̂ in this case since necessarily v̂ = n̂ × ℓ̂.
• Where the model features a circle or cylinder, a [ρ̂ φ̂ ẑ] basis can
be used, where ẑ is constant and runs along the cylinder’s axis (or
perpendicularly through the circle’s center), ρ̂ is variable and points
locally away from the axis, and φ̂ is variable and runs locally along
the circle’s perimeter in the direction of increasing azimuth φ. Refer
to § 3.9 and Fig. 15.4.
10
The assertion wants a citation, which the author lacks.
11
[49]
[Figure 15.4: the cylindrical basis; the unit vectors ρ̂, φ̂ and ẑ at a point, with radius ρ and azimuth φ marked.]
• Where the model features a sphere, an [r̂ θ̂ φ̂] basis can be used,
where r̂ is variable and points locally away from the sphere’s cen-
ter, θ̂ is variable and runs locally tangentially to the sphere’s surface
in the direction of increasing elevation θ (that is, though not usually
in the −ẑ direction itself, as nearly as possible to the −ẑ direction
without departing from the sphere’s surface), and φ̂ is variable and
runs locally tangentially to the sphere’s surface in the direction of in-
creasing azimuth φ (that is, along the sphere’s surface perpendicularly
to ẑ). Standing on the earth’s surface, with the earth as the sphere, r̂
would be up, θ̂ south, and φ̂ east. Refer to § 3.9 and Fig. 15.5.
• Occasionally a model arises with two circles that share a center but
whose axes stand perpendicular to one another. In such a model one
conventionally establishes ẑ as the direction of the principal circle’s
axis but then is left with x̂ or ŷ as the direction of the secondary
x x x y y y
circle’s axis, upon which an [x̂ ρ̂x φ̂ ], [φ̂ r̂ θ̂ ], [φ̂ ŷ ρ̂y ] or [θ̂ φ̂ r̂]
basis can be used locally as appropriate. Refer to § 3.9.
Many other orthogonal bases are possible (as in § 15.7, for instance), but the
foregoing are the most common. Whether listed here or not, each orthogonal
basis orders its three unit vectors by the right-hand rule (15.19).
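A short numerical check of two of the bases just listed (NumPy; the sample azimuth and elevation are arbitrary, and the spherical θ is taken from the ẑ axis as in Table 3.4) confirms the right-handed ordering:

    import numpy as np

    phi, theta = 0.9, 0.4                         # arbitrary azimuth and elevation

    rho_hat = np.array([ np.cos(phi), np.sin(phi), 0.])
    phi_hat = np.array([-np.sin(phi), np.cos(phi), 0.])
    z_hat   = np.array([0., 0., 1.])

    r_hat     = np.array([np.sin(theta)*np.cos(phi), np.sin(theta)*np.sin(phi),  np.cos(theta)])
    theta_hat = np.array([np.cos(theta)*np.cos(phi), np.cos(theta)*np.sin(phi), -np.sin(theta)])

    for a, b, c in ([rho_hat, phi_hat, z_hat],    # the cylindrical basis [rho^ phi^ z^]
                    [r_hat, theta_hat, phi_hat]): # the spherical basis   [r^ theta^ phi^]
        assert np.allclose(np.cross(a, b), c)     # right-handed ordering
        assert np.allclose(np.cross(b, c), a)
        assert np.isclose(np.dot(a, b), 0.0) and np.isclose(np.dot(b, c), 0.0)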
[Figure 15.5: the spherical basis; the unit vectors r̂, θ̂ and φ̂ at a point.]
Quiz: what does the vector expression ρ̂3 − φ̂(1/4) + ẑ2 mean? Wrong
answer: it meant the cylindrical coordinates (3; −1/4, 2); or, it meant the
position vector x̂3 cos(−1/4) + ŷ3 sin(−1/4) + ẑ2 associated with those co-
ordinates. Right answer: the expression means nothing certain in itself
but acquires a definite meaning only when an azimuthal coordinate φ is
also supplied, after which the expression indicates the ordinary rectangular
vector x̂′ 3 − ŷ′ (1/4) + ẑ′ 2, where x̂′ = ρ̂ = x̂ cos φ + ŷ sin φ, ŷ′ = φ̂ =
−x̂ sin φ + ŷ cos φ, and ẑ′ = ẑ. But, if this is so—if the cylindrical basis
[ρ̂ φ̂ ẑ] is used solely to express rectangular vectors—then why should we
name this basis “cylindrical”? Answer: only because cylindrical coordinates
(supplied somewhere) determine the actual directions of its basis vectors.
Once directions are determined such a basis is used purely rectangularly,
like any other orthogonal basis.
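The quiz's right answer is easy to check by machine. A minimal sketch (NumPy; the components are the quiz's own and the azimuth is arbitrary):

    import numpy as np

    def cyl_basis_to_rect(a_rho, a_phi, a_z, phi):
        """Convert components given in the local [rho^ phi^ z^] basis at azimuth phi
        into the fixed rectangular [x^ y^ z^] basis."""
        rho_hat = np.array([ np.cos(phi), np.sin(phi), 0.])
        phi_hat = np.array([-np.sin(phi), np.cos(phi), 0.])
        z_hat   = np.array([0., 0., 1.])
        return a_rho * rho_hat + a_phi * phi_hat + a_z * z_hat

    phi = 0.8                                  # the azimuth must be supplied separately
    v = cyl_basis_to_rect(3.0, -0.25, 2.0, phi)
    assert np.allclose(v, [3*np.cos(phi) + 0.25*np.sin(phi),
                           3*np.sin(phi) - 0.25*np.cos(phi),
                           2.0])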
This can seem confusing until one has grasped what the so-called non-
rectangular bases are for. Consider the problem of air flow in a jet engine.
It probably suits such a problem that instantaneous local air velocity within
the engine cylinder be expressed in cylindrical coordinates, with the z axis
oriented along the engine’s axle; but this does not mean that the air flow
within the engine cylinder were everywhere ẑ-directed. On the contrary, a
local air velocity of q = [−ρ̂5.0 + φ̂30.0 − ẑ250.0] m/s would have air moving
through the point in question at 250.0 m/s aftward along the axle, 5.0 m/s
inward toward the axle and 30.0 m/s circulating about the engine cylinder.
In this model, it is true that the basis vectors ρ̂ and φ̂ indicate differ-
ent directions at different positions within the cylinder, but at a particular
position the basis vectors are still used rectangularly to express q, the in-
stantaneous local air velocity at that position. It’s just that the “rectangle”
is rotated locally to line up with the axle.
Naturally, you cannot make full sense of an air-velocity vector q unless
you have also the coordinates (ρ; φ, z) of the position within the engine
cylinder at which the air has the velocity the vector specifies—yet this is
when confusion can arise, for besides the air-velocity vector there is also,
separately, a position vector r = x̂ρ cos φ + ŷρ sin φ + ẑz. One may denote
the air-velocity vector as12 q(r), a function of position; yet, though the
position vector is as much a vector as the velocity vector is, one nonetheless
handles it differently. One will not normally express the position vector r in
the cylindrical basis.
It would make little sense to try to express the position vector r in the
cylindrical basis because the position vector is the very thing that determines
the cylindrical basis. In the cylindrical basis, after all, the position vector
is necessarily r = ρ̂ρ + ẑz (and consider: in the spherical basis it is the
even more cryptic r = r̂r), and how useful is that, really? Well, maybe it
is useful in some situations, but for the most part to express the position
vector in the cylindrical basis would be as to say, “My house is zero miles
away from home.” Or, “The time is presently now.” Such statements may
be tautologically true, perhaps, but they are confusing because they only
seem to give information. The position vector r determines the basis, after
which one expresses things other than position, like instantaneous local air
velocity q, in that basis. In fact, the only basis normally suitable to express
a position vector is a fixed rectangular basis like [x̂ ŷ ẑ]. Otherwise, one
uses cylindrical coordinates (ρ; φ, z), but not a cylindrical basis [ρ̂ φ̂ ẑ], to
express a position r in a cylindrical geometry.
Maybe the nonrectangular bases were more precisely called “rectangular
bases of the nonrectangular coordinate systems,” but those are too many
words and, anyway, that is not how the usage has evolved. Chapter 16 will
elaborate the story by considering spatial derivatives of quantities like air
velocity, when one must take the variation in ρ̂ and φ̂ from point to point
12
Conventionally, one is much more likely to denote a velocity vector as u(r) or v(r),
except that the present chapter is (as footnote 8 has observed) already using the letters u
and v for an unrelated purpose. To denote position as r however is entirely standard.
15.4 Notation
The vector notation of §§ 15.1 and 15.2 is correct, familiar and often expe-
dient but sometimes inconveniently prolix. This admittedly difficult section
augments the notation to render it much more concise.
the abbreviation lends a more amenable notation to the dot and cross prod-
ucts of (15.10) and (15.14):
      a · b = ax bx + ay by + az bz ;                               (15.21)

               │ x̂   ŷ   ẑ  │
      a × b =  │ ax  ay  az │ .                                     (15.22)
               │ bx  by  bz │
13
“Wait!” comes the objection. “I thought that you said that ax meant x̂ · a. Now you
claim that it means the x component of a?”
But there is no difference between x̂ · a and the x component of a. The two are one and
the same.
Because all those prime marks burden the notation and for professional
mathematical reasons, the general forms (15.23) and (15.24) are sometimes
rendered
      a · b = a1 b1 + a2 b2 + a3 b3 ,

               │ ê1  ê2  ê3 │
      a × b =  │ a1  a2  a3 │ ,
               │ b1  b2  b3 │
but you have to be careful about that in applied usage because people are
not always sure whether a symbol like a3 means “the third component of
the vector a” (as it does here) or “the third vector’s component in the â
direction” (as it would in eqn. 15.10). Typically, applied mathematicians will
write in the manner of (15.21) and (15.22) with the implied understanding
that they really mean (15.23) and (15.24) but prefer not to burden the
notation with extra little strokes—that is, with the implied understanding
that x, y and z could just as well be ρ, φ and z or the coordinates of any
other orthogonal, right-handed, three-dimensional basis.
Some pretty powerful confusion can afflict the student regarding the
roles of the cylindrical symbols ρ, φ and z; or, worse, of the spherical sym-
bols r, θ and φ. Such confusion reflects a pardonable but remediable lack
of understanding of the relationship between coordinates like ρ, φ and z
and their corresponding unit vectors ρ̂, φ̂ and ẑ. Section 15.3 has already
written of the matter; but, further to dispel the confusion, one can now
ask the student what the cylindrical coordinates of the vectors ρ̂, φ̂ and ẑ
are. The correct answer: (1; φ, 0), (1; φ + 2π/4, 0) and (0; 0, 1), respectively.
Then, to reinforce, one can ask the student which cylindrical coordinates
the variable vectors ρ̂ and φ̂ are functions of. The correct answer: both are
functions of the coordinate φ only (ẑ, a constant vector, is not a function
of anything). What the student needs to understand is that, among the
cylindrical coordinates, φ is a different kind of thing than z and ρ are:
• but ρ̂, φ̂ and ẑ are all the same kind of thing, unit vectors;
• and, separately, aρ , aφ and az are all the same kind of thing, lengths.
Now to ask the student a harder question: in the cylindrical basis, what is
the vector representation of (ρ1 ; φ1 , z1 )? The correct answer: ρ̂ρ1 cos(φ1 −
φ) + φ̂ρ1 sin(φ1 − φ) + ẑz1 . The student that gives this answer probably
grasps the cylindrical symbols.
If the reader feels that the notation begins to confuse more than it de-
scribes, the writer empathizes but regrets to inform the reader that the rest
of the section, far from granting the reader a comfortable respite to absorb
the elaborated notation as it stands, shall not delay to elaborate the no-
tation yet further! The confusion however is subjective. The trouble with
vector work is that one has to learn to abbreviate or the expressions in-
volved grow repetitive and unreadably long. For vectors, the abbreviated
notation really is the proper notation. Eventually one accepts the need and
takes the trouble to master the conventional vector abbreviation this section
presents; and, indeed, the abbreviation is rather elegant once one becomes
used to it. So, study closely and take heart! The notation is not actually as
impenetrable as it at first will seem.
14
[32]
15
Some professional mathematicians now write a superscript ai in certain cases in place
of a subscript ai , where the superscript bears some additional semantics [67, “Einstein
notation,” 05:36, 10 February 2008]. Scientists and engineers however tend to prefer
Einstein’s original, subscript-only notation.
which is (15.23), except that Einstein’s form (15.25) expresses it more suc-
cinctly. Likewise,
a × b = ı̂(ai+1 bi−1 − bi+1 ai−1 ) (15.26)
is (15.24)—although an experienced applied mathematician would probably
apply the Levi-Civita epsilon of § 15.4.3, below, to further abbreviate this
last equation to the form of (15.27) before presenting it.
Einstein’s summation convention is also called the Einstein notation, a
term sometimes taken loosely to include also the Kronecker delta and Levi-
Civita epsilon of § 15.4.3.
What is important to understand about Einstein’s summation conven-
tion is that, in and of itself, it brings no new mathematics. It is rather a
notational convenience.16 It asks a reader to regard a repeated index like
the i in “ai bi ” as a dummy index (§ 2.3) and thus to read “ai bi ” as “Σi ai bi .”
It does not magically create a summation where none existed; it just hides
the summation sign to keep it from cluttering the page. It is the kind of
notational trick an accountant might appreciate. Under the convention, the
summational operator Σi is implied not written, but the operator is still
there. Admittedly confusing on first encounter, the convention’s utility and
charm are felt after only a little practice.
Incidentally, nothing requires you to invoke Einstein’s summation con-
vention everywhere and for all purposes. You can waive the convention,
writing the summation symbol out explicitly whenever you like.17 In con-
texts outside vector analysis, to invoke the convention at all may make little
sense. Nevertheless, you should indeed learn the convention—if only because
you must learn it to understand the rest of this chapter—but once having
learned it you should naturally use it only where it actually serves to clarify.
Fortunately, in vector work, it often does just that.
Quiz:18 if δij is the Kronecker delta of § 11.2, then what does the sym-
bol δii represent where Einstein’s summation convention is in force?
Table 15.1: Properties of the Kronecker delta and the Levi-Civita epsilon,
with Einstein’s summation convention in force.
δjk = δkj
δij δjk = δik
δii = 3
δjk ǫijk = 0
δnk ǫijk = ǫijn
ǫijk = ǫjki = ǫkij = −ǫikj = −ǫjik = −ǫkji
ǫijk ǫijk = 6
ǫijn ǫijk = 2δnk
ǫimn ǫijk = δmj δnk − δmk δnj
property that ǫimn ǫijk = δmj δnk − δmk δnj is proved by observing that, in
the case that i = x′ , either (j, k) = (y ′ , z ′ ) or (j, k) = (z ′ , y ′ ), and also
either (m, n) = (y ′ , z ′ ) or (m, n) = (z ′ , y ′ ); and similarly in the cases that
i = y ′ and i = z ′ (more precisely, in each case the several indices can take
any values, but combinations other than the ones listed drive ǫimn or ǫijk ,
or both, to zero, thus contributing nothing to the sum). This implies that
either (j, k) = (m, n) or (j, k) = (n, m)—which, when one takes parity into
account, is exactly what the property in question asserts. The property that
ǫijn ǫijk = 2δnk is proved by observing that, in any given term of the Einstein
sum, i is either x′ or y ′ or z ′ and that j is one of the remaining two, which
leaves the third to be shared by both k and n. The factor 2 appears because,
for k = n = x′ , an (i, j) = (y ′ , z ′ ) term and an (i, j) = (z ′ , y ′ ) term both
contribute positively to the sum; and similarly for k = n = y ′ and again for
k = n = z′ .
Unfortunately, the last paragraph likely makes sense to few who do not
already know what it means. A concrete example helps. Consider the
compound product c × (a × b). In this section’s notation and with the use
      c × (a × b) = c × (ǫijk ı̂ aj bk )
                  = ǫmni m̂ cn (ǫijk ı̂ aj bk )i
                  = ǫmni ǫijk m̂ cn aj bk
                  = ǫimn ǫijk m̂ cn aj bk
                  = (δmj δnk − δmk δnj ) m̂ cn aj bk
                  = δmj δnk m̂ cn aj bk − δmk δnj m̂ cn aj bk
                  = ̂ ck aj bk − k̂ cj aj bk
                  = (̂ aj )(ck bk ) − (k̂ bk )(cj aj ).
      c × (a × b) = Σi,j,k,m,n ǫimn ǫijk m̂ cn aj bk
                  = Σj,k,m,n (δmj δnk − δmk δnj ) m̂ cn aj bk ,
which makes sense if you think about it hard enough,24 and justifies the
24
If thinking about it hard enough does not work, then here it is in interminable detail:
      Σi,j,k,m,n ǫimn ǫijk f (j, k, m, n)
         = [ f (y′, z′, y′, z′) + f (z′, x′, z′, x′) + f (x′, y′, x′, y′)
           + f (z′, y′, z′, y′) + f (x′, z′, x′, z′) + f (y′, x′, y′, x′) ]
         − [ f (y′, z′, z′, y′) + f (z′, x′, x′, z′) + f (x′, y′, y′, x′)
           + f (z′, y′, y′, z′) + f (x′, z′, z′, x′) + f (y′, x′, x′, y′) ]
         = [ f (y′, z′, y′, z′) + f (z′, x′, z′, x′) + f (x′, y′, x′, y′)
           + f (z′, y′, z′, y′) + f (x′, z′, x′, z′) + f (y′, x′, y′, x′)
           + f (x′, x′, x′, x′) + f (y′, y′, y′, y′) + f (z′, z′, z′, z′) ]
         − [ f (y′, z′, z′, y′) + f (z′, x′, x′, z′) + f (x′, y′, y′, x′)
           + f (z′, y′, y′, z′) + f (x′, z′, z′, x′) + f (y′, x′, x′, y′)
           + f (x′, x′, x′, x′) + f (y′, y′, y′, y′) + f (z′, z′, z′, z′) ]
         = Σj,k,m,n (δmj δnk − δmk δnj ) f (j, k, m, n).

That is for the property that ǫimn ǫijk = δmj δnk − δmk δnj . For the property that
ǫijn ǫijk = 2δnk , the corresponding calculation is

      Σi,j,k,n ǫijn ǫijk f (k, n)
         = ǫy′z′x′ ǫy′z′x′ f (x′, x′) + ǫz′y′x′ ǫz′y′x′ f (x′, x′)
         + ǫz′x′y′ ǫz′x′y′ f (y′, y′) + ǫx′z′y′ ǫx′z′y′ f (y′, y′)
         + ǫx′y′z′ ǫx′y′z′ f (z′, z′) + ǫy′x′z′ ǫy′x′z′ f (z′, z′)
         = f (x′, x′) + f (x′, x′) + f (y′, y′) + f (y′, y′) + f (z′, z′) + f (z′, z′)
         = 2 [ f (x′, x′) + f (y′, y′) + f (z′, z′) ]
         = 2 Σk,n δnk f (k, n).
table’s claim that ǫimn ǫijk = δmj δnk − δmk δnj . (Notice that the compound
Kronecker operator δmj δnk includes nonzero terms for the case that j = k =
m = n = x′ , for the case that j = k = m = n = y ′ and for the case that
j = k = m = n = z ′ , whereas the compound Levi-Civita operator ǫimn ǫijk
does not. However, the compound Kronecker operator −δmk δnj includes
canceling terms for these same three cases. This is why the table’s claim is
valid as written.)
To belabor the topic further here would serve little purpose. The reader
who does not feel entirely sure that he understands what is going on might
work out the table’s several properties with his own pencil, in something like
the style of the example, until he is satisfied that he adequately understands
the several properties and their correct use.
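Those who prefer to let a machine do the pencil work can spot-check the table numerically. A sketch (NumPy; einsum here plays the part of the implied summation):

    import numpy as np

    delta = np.eye(3)
    eps = np.zeros((3, 3, 3))
    for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
        eps[i, j, k], eps[i, k, j] = 1.0, -1.0       # the Levi-Civita epsilon

    # epsilon_imn epsilon_ijk = delta_mj delta_nk - delta_mk delta_nj
    lhs = np.einsum('imn,ijk->mnjk', eps, eps)
    rhs = (np.einsum('mj,nk->mnjk', delta, delta)
           - np.einsum('mk,nj->mnjk', delta, delta))
    assert np.allclose(lhs, rhs)

    # epsilon_ijn epsilon_ijk = 2 delta_nk, and epsilon_ijk epsilon_ijk = 6
    assert np.allclose(np.einsum('ijn,ijk->nk', eps, eps), 2.0 * delta)
    assert np.isclose(np.einsum('ijk,ijk->', eps, eps), 6.0)

    # the compound product worked in the example: c x (a x b) = a(c.b) - b(c.a)
    a, b, c = np.array([1., 2., 3.]), np.array([-1., 0., 2.]), np.array([0.5, 1., -2.])
    assert np.allclose(np.cross(c, np.cross(a, b)),
                       a * np.dot(c, b) - b * np.dot(c, a))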
Section 16.7 will refine the notation for use when derivatives with respect
to angles come into play but, before leaving the present section, we might
pause for a moment to appreciate (15.29) in the special case that b = c = n̂:

      −n̂ × (n̂ × a) = a − n̂(n̂ · a).                                 (15.30)
The difference a − n̂(n̂·a) evidently projects a vector a onto the plane whose
unit normal is n̂. Equation (15.30) reveals that the double cross product
−n̂ × (n̂ × a) projects the same vector onto the same plane. Figure 15.6
illustrates.
[Figure 15.6: the unit normal n̂, the component n̂(n̂ · a), and the projection −n̂ × (n̂ × a) = a − n̂(n̂ · a) onto the plane.]
That is,
c · (a × b) = a · (b × c) = b · (c × a). (15.31)
Besides the several vector identities, the table also includes the three vector
products in Einstein notation.27
Each definition and identity of Table 15.2 is invariant under reorientation
of axes.
15.6 Isotropy
A real,28 three-dimensional coordinate system29 (α; β; γ) is isotropic at a
point r = r1 if and only if
β̂(r1 ) · γ̂(r1 ) = 0,
γ̂(r1 ) · α̂(r1 ) = 0, (15.32)
α̂(r1 ) · β̂(r1 ) = 0,
and

      |∂r/∂α| at r=r1 = |∂r/∂β| at r=r1 = |∂r/∂γ| at r=r1 .         (15.33)
where ργ = α̂α + β̂β represents position in the α-β plane. (If the α-β plane
happens to be the x-y plane, as is often the case, then ργ = ρz = ρ and per
eqn. 3.20 one can omit the superscript.) The two-dimensional rectangular
system (x, y) naturally is isotropic. Because |∂ρ/∂φ| = (ρ) |∂ρ/∂ρ| the
standard two-dimensional cylindrical system (ρ; φ) as such is nonisotropic,
but the change of coordinate
      λ ≡ ln(ρ/ρo ),                                                (15.36)
called upon to devise. The two three-dimensional parabolic systems are the
parabolic cylindrical system (σ, τ, z) of § 15.7.4 and the circular paraboloidal
system31 (η; φ, ξ) of § 15.7.5, where the angle φ and the length z are familiar
to us but σ, τ , η and ξ—neither angles nor lengths but root-lengths (that
is, coordinates having dimensions of [length]1/2 )—are new.32 Both three-
dimensional parabolic systems derive from the two-dimensional parabolic
system (σ, τ ) of § 15.7.2.33
However, before handling any parabolic system we ought formally to
introduce the parabola itself, next.
[Figure 15.7: the parabola; its directrix, its focus, and the two segments labeled a, which have equal length.]
any way (though naturally in that case eqns. 15.37 and 15.40 would have
to be modified). Observe also the geometrical fact that the parabola’s track
necessarily bisects the angle between the two line segments labeled “a” in
Fig. 15.7. One of the consequences of this geometrical fact—a fact it seems
better to let the reader visualize and ponder than to try to justify in so many
words36 —is that a parabolic mirror reflects precisely37 toward its focus all
light rays that arrive perpendicularly to its directrix (which for instance is
why satellite dish antennas have parabolic cross-sections).
[Figure 15.8: locating a point by parabolic construction; the equal segments a, the parabolas labeled σ² and τ², and the origin ρ = 0.]
conclude, significantly, that the two parabolas cross precisely at right angles
to one another.
Figure 15.9 lays out the parabolic coordinate grid. Notice in the figure
that one of the grid’s several cells is subdivided at its quarter-marks for
illustration’s sake, to show how one can subgrid at need to locate points like,
for example, (σ, τ ) = ( 72 , − 94 ) visually. (That the subgrid’s cells approach
square shape implies that the parabolic system is isotropic, a significant fact
§ 15.7.3 will demonstrate formally.)
Using the Pythagorean theorem, one can symbolically express the equi-
distant construction rule above as
a = σ 2 + y = τ 2 − y,
(15.41)
a2 = ρ2 = x2 + y 2 .
The first line of (15.41) gives

      y = (τ 2 − σ 2 )/2.                                           (15.42)
On the other hand, combining the two lines of (15.41),
(σ 2 + y)2 = x2 + y 2 = (τ 2 − y)2 ,
or, subtracting y 2 ,
σ 4 + 2σ 2 y = x2 = τ 4 − 2τ 2 y.
Substituting (15.42)’s expression for y,
x2 = (στ )2 .
[Figure 15.9: the parabolic coordinate grid; curves of constant σ = 0, ±1, ±2, ±3 and of constant τ = 0, ±1, ±2, ±3.]
x = στ. (15.43)
      ρ = (τ 2 + σ 2 )/2.                                           (15.44)
Combining (15.42) and (15.44) to isolate σ 2 and τ 2 yields
σ 2 = ρ − y,
(15.45)
τ 2 = ρ + y.
15.7.3 Properties
The derivatives of (15.43), (15.42) and (15.44) are
dx = σ dτ + τ dσ,
dy = τ dτ − σ dσ, (15.46)
dρ = τ dτ + σ dσ.
Solving the first two lines of (15.46) simultaneously for dσ and dτ and then
collapsing the resultant subexpression τ 2 + σ 2 per (15.44) yields
      dσ = (τ dx − σ dy)/(2ρ),
                                                                    (15.47)
      dτ = (σ dx + τ dy)/(2ρ),
from which it is apparent that
      σ̂ = (x̂τ − ŷσ)/√(τ 2 + σ 2 ),
      τ̂ = (x̂σ + ŷτ )/√(τ 2 + σ 2 );

or, collapsing again per (15.44), that

      σ̂ = (x̂τ − ŷσ)/√(2ρ),
                                                                    (15.48)
      τ̂ = (x̂σ + ŷτ )/√(2ρ),
of which the dot product
      σ̂ · τ̂ = 0 if ρ ≠ 0                                           (15.49)
is null, confirming our earlier finding that the various grid parabolas cross
always at right angles to one another. Solving (15.48) simultaneously for x̂
and ŷ then produces
      x̂ = (τ̂ σ + σ̂τ )/√(2ρ),
                                                                    (15.50)
      ŷ = (τ̂ τ − σ̂σ)/√(2ρ).
One can express an infinitesimal change in position in the plane as
dρ = x̂ dx + ŷ dy
= x̂(σ dτ + τ dσ) + ŷ(τ dτ − σ dσ)
= (x̂τ − ŷσ) dσ + (x̂σ + ŷτ ) dτ,
in which (15.46) has expanded the differentials and from which
      ∂ρ/∂σ = x̂τ − ŷσ,
      ∂ρ/∂τ = x̂σ + ŷτ ,
Table 15.3: Parabolic coordinate properties.

      x = στ                       x̂ = (τ̂ σ + σ̂τ )/√(2ρ)
      y = (τ 2 − σ 2 )/2           ŷ = (τ̂ τ − σ̂σ)/√(2ρ)
      ρ = (τ 2 + σ 2 )/2           σ̂ = (x̂τ − ŷσ)/√(2ρ)
      ρ2 = x2 + y 2                τ̂ = (x̂σ + ŷτ )/√(2ρ)
      σ 2 = ρ − y                  σ̂ × τ̂ = ẑ
      τ 2 = ρ + y                  σ̂ · τ̂ = 0
                                   |∂ρ/∂σ| = |∂ρ/∂τ |
and thus

      |∂ρ/∂σ| = |∂ρ/∂τ | .                                          (15.51)
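A small numerical spot check of these parabolic properties (NumPy; the sample point is arbitrary, with ρ ≠ 0):

    import numpy as np

    sigma, tau = 1.3, -0.7                              # an arbitrary point

    x, y = sigma * tau, (tau**2 - sigma**2) / 2         # (15.43) and (15.42)
    rho = (tau**2 + sigma**2) / 2                       # (15.44)
    assert np.isclose(rho, np.hypot(x, y))              # rho^2 = x^2 + y^2

    xhat, yhat = np.array([1., 0.]), np.array([0., 1.])
    sigma_hat = (xhat * tau - yhat * sigma) / np.sqrt(2 * rho)     # (15.48)
    tau_hat   = (xhat * sigma + yhat * tau) / np.sqrt(2 * rho)

    assert np.isclose(np.dot(sigma_hat, tau_hat), 0.0)  # (15.49): right-angle crossing
    assert np.isclose(np.linalg.norm(sigma_hat), 1.0)   # unit magnitudes
    assert np.isclose(np.linalg.norm(tau_hat), 1.0)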
Table 15.4: Circular paraboloidal coordinate properties.

      ρ = ηξ                               ρ̂ = (ξ̂η + η̂ξ)/√(2r)
      z = (ξ 2 − η 2 )/2                   ẑ = (ξ̂ξ − η̂η)/√(2r)
      r = (ξ 2 + η 2 )/2                   η̂ = (ρ̂ξ − ẑη)/√(2r)
      r2 = ρ2 + z 2 = x2 + y 2 + z 2       ξ̂ = (ρ̂η + ẑξ)/√(2r)
      η 2 = r − z                          η̂ × ξ̂ = −φ̂
      ξ 2 = r + z                          η̂ · ξ̂ = 0
                                           |∂r/∂η| = |∂r/∂ξ|
Sometimes one would like to extend the parabolic system to three dimensions
by adding an azimuth φ rather than a height z. This is possible, but then
one tends to prefer the parabolas, foci and directrices of Figs. 15.8 and 15.9
to run in the ρ-z plane rather than in the x-y. Therefore, one defines the
coordinates η and ξ to represent in the ρ-z plane what the letters σ and τ
have represented in the x-y. The properties of Table 15.4 result, which are
just the properties of Table 15.3 with coordinates changed. The system is
the circular paraboloidal system (η; φ, ξ).
The surfaces of constant η and of constant ξ in the circular paraboloidal
system are paraboloids, parabolas rotated about the z axis (and the surfaces
of constant φ are planes, or half planes if you like, just as in the cylindrical
system). Like the parabolic cylindrical system, the circular paraboloidal
system too is isotropic in two dimensions.
Notice that, given the usual definition of the φ̂ unit basis vector, η̂ × ξ̂ =
−φ̂ rather than +φ̂ as one might first guess. The correct, right-handed
sequence of the orthogonal circular paraboloidal basis therefore would be
[η̂ φ̂ ξ̂].38
This concludes the present chapter on the algebra of vector analysis.
Chapter 16, next, will venture hence into the larger and even more interest-
ing realm of vector calculus.
38
See footnote 31.
Chapter 16
Vector calculus
but, since
q(r) = x̂′ qx′ (r) + ŷ′ qy′ (r) + ẑ′ qz ′ (r)
for any orthogonal basis [x′ y′ z′ ] as well, the specific scalar fields qx (r),
qy (r) and qz (r) are no more essential to the vector field q(r) than the specific
scalars bx , by and bz are to a vector b. As we said, the three components
come tactically; typically, such components are uninteresting in themselves.
The field q(r) as a whole is the interesting thing.
Scalar and vector fields are of utmost use in the modeling of physical
phenomena.
As one can take the derivative dσ/dt or df /dt with respect to time t
of a function σ(t) or f (t), one can likewise take the derivative with respect
to position r of a field ψ(r) or a(r). However, derivatives with respect to
position create a notational problem, for it is not obvious what symbology
like dψ/dr or da/dr would actually mean. The notation dσ/dt means “the
rate of σ as time t advances,” but if the notation dψ/dr likewise meant “the
rate of ψ as position r advances” then it would necessarily prompt one to
ask, “advances in which direction?” The notation offers no hint. In fact
dψ/dr and da/dr mean nothing very distinct in most contexts and we shall
avoid such notation. If we will speak of a field’s derivative with respect to
position r then we shall be more precise.
Section 15.2 has given the vector three distinct kinds of product. This
section gives the field no fewer than four distinct kinds of derivative: the
directional derivative; the gradient; the divergence; and the curl.3
So many derivatives bring the student a conceptual difficulty one could
call “the caveman problem.” Imagine a caveman. Suppose that you tried
to describe to the caveman a house or building of more than one floor. He
might not understand. You and I who already grasp the concept of upstairs
and downstairs do not find a building of two floors, or three or even thirty,
especially hard to envision, but our caveman is used to thinking of the ground
and the floor as more or less the same thing. To try to think of upstairs
and downstairs might confuse him with partly false images of sitting in a
tree or of clambering onto (and breaking) the roof of his hut. “There are
many trees and antelopes but only one sky and floor. How can one speak of
many skies or many floors?” The student’s principal conceptual difficulty
with the several vector derivatives is of this kind.
3
Vector veterans may notice that the Laplacian is not listed. This is not because the
Laplacian were uninteresting but rather because the Laplacian is actually a second-order
derivative—a derivative of a derivative. We shall address the Laplacian in § 16.4.
If you think that the latter does not look very much like a vector, then the
writer thinks as you do, but consider:
The writer does not know how to interpret a nonsensical term like
“[Tuesday]ax ” any more than the reader does, but the point is that c behaves
as though it were a vector insofar as vector operations like the dot product
are concerned. What matters in this context is not that c have amplitude
and direction (it has neither) but rather that it have the three orthonormal
components it needs to participate formally in relevant vector operations.
It has these. That the components’ amplitudes seem nonsensical is beside
the point. Maybe there exists a model in which “[Tuesday]” knows how to
operate on a scalar like ax . (Operate on? Yes. Nothing in the dot product’s
definition requires the component amplitudes of c to multiply those of a.
Multiplication is what the component amplitudes of true vectors do, but c
is not a true vector, so “[Tuesday]” might do something to ax other than
to multiply it. Section 16.1.2 elaborates the point.) If there did exist such
a model, then the dot product c · a could be licit in that model. As if this
were not enough, the cross product c × a too could be licit in that model,
composed according to the usual rule for cross products. The model might
allow it. The dot and cross products in and of themselves do not forbid it.
Now consider a “vector”
    ∇ = x̂ ∂/∂x + ŷ ∂/∂y + ẑ ∂/∂z.                          (16.1)
This ∇ is not a true vector any more than c is, maybe, but if we treat it as
one then we have that
    ∇ · a = ∂ax/∂x + ∂ay/∂y + ∂az/∂z.
Such a dot product might or might not prove useful; but, unlike the terms
in the earlier dot product, at least we know what this one’s terms mean.
Well, days of the week, partial derivatives, ersatz vectors—it all seems
rather abstract. What’s the point? The answer is that there wouldn’t be
any point if the only nonvector “vectors” in question were of c’s nonsensical
kind. The operator ∇ however shares more in common with a true vector
than merely having x, y and z components; for, like a true vector, the
operator ∇ is amenable to having its axes reoriented by (15.1), (15.2), (15.7)
and (15.8). This is easier to see at first with respect to the true vector a, as
follows. Consider rotating the x and y axes through an angle φ about the z
axis. There ensues
where the final expression has different axes than the original but, relative
to those axes, exactly the same form. Further rotation about other axes
would further reorient but naturally also would not alter the form. Now
consider ∇. The partial differential operators ∂/∂x, ∂/∂y and ∂/∂z change
no differently under reorientation than the component amplitudes ax , ay
and az do. Hence,
    ∇ = ı̂ ∂/∂i = x̂′ ∂/∂x′ + ŷ′ ∂/∂y′ + ẑ′ ∂/∂z′,           (16.2)
evidently the same operator regardless of the choice of basis [x̂′ ŷ′ ẑ′ ]. It is
this invariance under reorientation that makes the ∇ operator useful.
If ∇ takes the place of the ambiguous d/dr, then what takes the place
of the ambiguous d/dr′, d/dro, d/dr̃, d/dr† and so on? Answer: ∇′, ∇o, ∇̃,
∇† and so on. Whatever mark distinguishes the special r, the same mark
distinguishes the corresponding special ∇. For example, where ro = ı̂io ,
there ∇o = ı̂ ∂/∂io . That is the convention.4
Introduced by Oliver Heaviside, informally pronounced “del” (in the
author’s country at least), the vector differential operator ∇ finds extensive
use in the modeling of physical phenomena. After a brief digression to
discuss operator notation, the subsections that follow will use the operator
to develop and present the four basic kinds of vector derivative.
4
A few readers not fully conversant with the material of Ch. 15, to whom this chapter
had been making sense until the last two sentences, may suddenly find the notation
confusing; in a form like ∇ = ı̂ ∂/∂i (the leftmost form of 16.2) the summation sign is
implied, not written. Refer to § 15.4.
5
[10]
    (b · ∇) = bi ∂/∂i                                       (16.3)
to express the derivative unambiguously. This operator applies equally to
the scalar field,
    (b · ∇)ψ(r) = bi ∂ψ/∂i,
as to the vector field,
    (b · ∇)a(r) = bi ∂a/∂i = ̂ bi ∂aj/∂i.                   (16.4)
For the scalar field the parentheses are unnecessary and conventionally are
omitted, as
    b · ∇ψ(r) = bi ∂ψ/∂i.                                   (16.5)
In the case (16.4) of the vector field, however, ∇a(r) itself means nothing
coherent6 so the parentheses usually are retained. Equations (16.4) and (16.5)
define the directional derivative.
Note that the directional derivative is the derivative not of the reference
vector b but only of the field ψ(r) or a(r). The vector b just directs and
scales the derivative; it is not the object of it. Nothing requires b to be
constant, though. It can be a vector field b(r) that varies from place to
place; the directional derivative does not care.
Within (16.5), the quantity
    ∇ψ(r) = ı̂ ∂ψ/∂i                                        (16.6)
is called the gradient of the scalar field ψ(r). Though both scalar and
vector fields have directional derivatives, only scalar fields have gradients.
The gradient represents the amplitude and direction of a scalar field’s locally
steepest rate.
Formally a dot product, the directional derivative operator b · ∇ is in-
variant under reorientation of axes, whereupon the directional derivative is
invariant, too. The result of a ∇ operation, the gradient ∇ψ(r) is likewise
invariant.
6
Well, it does mean something coherent in dyadic analysis [9, Appendix B], but this
book won’t treat that.
16.1.4 Divergence
There exist other vector derivatives than the directional derivative and gra-
dient of § 16.1.3. One of these is divergence. It is not easy to motivate
divergence directly, however, so we shall approach it indirectly, through the
concept of flux as follows.
The flux of a vector field a(r) outward from a region in space is

    Φ ≡ ∮_S a(r) · ds,                                      (16.7)

where

    ds ≡ n̂ · ds                                             (16.8)
The outward flux Φ of a vector field a(r) through a closed surface bound-
ing some definite region in space is evidently
    Φ = ∬ ∆ax(y, z) dy dz + ∬ ∆ay(z, x) dz dx + ∬ ∆az(x, y) dx dy,
where
    ∆ax(y, z) = ∫_{xmin(y,z)}^{xmax(y,z)} (∂ax/∂x) dx,
    ∆ay(z, x) = ∫_{ymin(z,x)}^{ymax(z,x)} (∂ay/∂y) dy,
    ∆az(x, y) = ∫_{zmin(x,y)}^{zmax(x,y)} (∂az/∂z) dz
    ∆ax(y, z) = (∂ax/∂x) ∆x(y, z),
    ∆ay(z, x) = (∂ay/∂y) ∆y(z, x),
    ∆az(x, y) = (∂az/∂z) ∆z(x, y),
upon which
    Φ = ∬ (∂ax/∂x) ∆x(y, z) dy dz + ∬ (∂ay/∂y) ∆y(z, x) dz dx
          + ∬ (∂az/∂z) ∆z(x, y) dx dy.
But each of the last equation’s three integrals represents the region’s vol-
ume V , so
    Φ = (V) (∂ax/∂x + ∂ay/∂y + ∂az/∂z);
8
Naturally, if the region’s boundary happens to be concave, then some lines might enter
and exit the region more than once, but this merely elaborates the limits of integration
along those lines. It changes the problem in no essential way.
16.1.5 Curl
Curl is to divergence as the cross product is to the dot product. Curl is a
little trickier to visualize, though. It needs first the concept of circulation
as follows.
The circulation of a vector field a(r) about a closed contour in space is
    Γ ≡ ∮ a(r) · dℓ,                                        (16.11)

where, unlike the ∮_S of (16.7) which represented a double integration over a
surface, the ∮ here represents only a single integration. One can in general
contemplate circulation about any closed contour, but it suits our purpose
here to consider specifically a closed contour that happens not to depart
from a single, flat plane in space.
Let [û v̂ n̂] be an orthogonal basis with n̂ normal to the contour’s plane
such that travel positively along the contour tends from û toward v̂ rather
than the reverse. The circulation Γ of a vector field a(r) about this contour
is evidently

    Γ = ∫ ∆av(v) dv − ∫ ∆au(u) du,
where

    ∆av(v) = ∫_{umin(v)}^{umax(v)} (∂av/∂u) du,
    ∆au(u) = ∫_{vmin(u)}^{vmax(u)} (∂au/∂v) dv
represent the increase across the contour’s interior respectively of av or au
along a û- or v̂-directed line. If the field has constant derivatives ∂a/∂i, or
the name directional curl, representing the intensity of circulation, the degree
of twist so to speak, about a specified axis. The cross product in (16.13),
    ∇ × a(r) = ǫijk ı̂ ∂ak/∂j,                               (16.14)
we call curl.
Curl (16.14) is an interesting quantity. Although it emerges from di-
rectional curl (16.13) and although we have developed directional curl with
respect to a contour in some specified plane, curl (16.14) itself turns out
to be altogether independent of any particular plane. We might have cho-
sen another plane and though n̂ would then be different the same (16.14)
would necessarily result. Directional curl, a scalar, is a property of the field
and the plane. Curl, a vector, unexpectedly is a property of the field only.
Directional curl evidently cannot exceed curl in magnitude, but will equal
it when n̂ points in its direction, so it may be said that curl is the locally
greatest directional curl, oriented normally to the locally greatest directional
curl’s plane.
We have needed n̂ and (16.13) to motivate and develop the concept
(16.14) of curl. Once developed, however, the concept of curl stands on its
own, whereupon one can return to define directional curl more generally
than (16.13) has defined it. As in (16.4) here too any reference vector b or
vector field b(r) can serve to direct the curl, not only n̂. Hence,
    b · [∇ × a(r)] = b · ǫijk ı̂ ∂ak/∂j = ǫijk bi ∂ak/∂j.    (16.15)
This would be the actual definition of directional curl. Note however that
directional curl so defined is not a distinct kind of derivative but rather is
just curl, dot-multiplied by a reference vector.
Formally a cross product, curl is invariant under reorientation of axes.
An ordinary dot product, directional curl is likewise invariant.
    b × ∇ψ = ǫijk ı̂ bj ∂ψ/∂k,
    b × ∇ × a = ̂ bi (∂ai/∂j − ∂aj/∂i).                     (16.16)
9
The author is unaware of a conventional name for these derivatives. The name cross-
directional seems as apt as any.
where the Levi-Civita identity that ǫmni ǫijk = ǫimn ǫijk = δmj δnk − δmk δnj
comes from Table 15.1.
If one subdivides a large volume into infinitesimal volume elements dv, then
the flux from a single volume element is
    Φelement = ∮_{Selement} a · ds.
Even a single volume element however can have two distinct kinds of surface
area: inner surface area shared with another element; and outer surface area
shared with no other element because it belongs to the surface of the larger,
overall volume. Interior elements naturally have only the former kind but
boundary elements have both kinds of surface area, so one can elaborate the
last equation to read
    Φelement = ∫_{Sinner} a · ds + ∫_{Souter} a · ds

for a single element, where ∮_{Selement} = ∫_{Sinner} + ∫_{Souter}. Adding all the
elements together, we have that

    Σ_{elements} Φelement = Σ_{elements} ∫_{Sinner} a · ds + Σ_{elements} ∫_{Souter} a · ds;
but the inner sum is null because it includes each interior surface twice,
because each interior surface is shared by two elements such that ds2 = −ds1
(in other words, such that the one volume element’s ds on the surface the
two elements share points oppositely to the other volume element’s ds on
the same surface), so
    Σ_{elements} Φelement = Σ_{elements} ∫_{Souter} a · ds = ∮_S a · ds.
In this equation, the last integration is over the surface of the larger, overall
volume, which surface after all consists of nothing other than the several
boundary elements’ outer surface patches. Applying (16.9) to the equation’s
left side to express the flux Φelement from a single volume element yields
    Σ_{elements} ∇ · a dv = ∮_S a · ds.
That is,

    ∫_V ∇ · a dv = ∮_S a · ds.                              (16.17)
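As a plain numerical illustration of (16.17) (a sketch only, not from the text; it assumes NumPy and an arbitrarily chosen test field), one can compare the volume integral of ∇ · a over the unit cube against the outward flux through the cube's six faces.

import numpy as np

def a(x, y, z):
    # an arbitrary smooth test field a = (x*y, y*z, z*x); its divergence is x + y + z
    return np.array([x*y, y*z, z*x])

n = 200
h = 1.0/n
c = (np.arange(n) + 0.5)*h                      # cell midpoints on [0, 1]

X, Y, Z = np.meshgrid(c, c, c, indexing='ij')   # volume integral of div a
vol = np.sum(X + Y + Z)*h**3

U, V = np.meshgrid(c, c, indexing='ij')         # outward flux, face pair by face pair
flux  = np.sum(a(1.0, U, V)[0] - a(0.0, U, V)[0])*h**2
flux += np.sum(a(U, 1.0, V)[1] - a(U, 0.0, V)[1])*h**2
flux += np.sum(a(U, V, 1.0)[2] - a(U, V, 0.0)[2])*h**2

print(vol, flux)   # both come out 1.5, as (16.17) demands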
Equation (16.18) is Stokes' theorem,12,13 neatly relating the directional curl
over a (possibly nonplanar) surface to the circulation about it. Like the di-
vergence theorem (16.17), Stokes’ theorem (16.18) serves to swap one vector
integral for another where such a maneuver is needed.
    ∇(ψω) = ı̂ ∂(ψω)/∂i = ω ı̂ ∂ψ/∂i + ψ ı̂ ∂ω/∂i = ω∇ψ + ψ∇ω.
    ∇ × (a × b) = ∇ × (ǫijk ı̂ aj bk)
        = ǫmni m̂ ∂(ǫijk ı̂ aj bk)i/∂n = ǫmni ǫijk m̂ ∂(aj bk)/∂n
        = (δmj δnk − δmk δnj) m̂ ∂(aj bk)/∂n
        = ̂ ∂(aj bk)/∂k − k̂ ∂(aj bk)/∂j = ̂ ∂(aj bi)/∂i − ̂ ∂(ai bj)/∂i
        = ̂ bi ∂aj/∂i + ̂ aj ∂bi/∂i − ̂ ai ∂bj/∂i − ̂ bj ∂ai/∂i
        = (b · ∇ + ∇ · b)a − (a · ∇ + ∇ · a)b.
Table 16.1: Definitions and identities of vector calculus (see also Table 15.2
on page 425).
    ∇ ≡ ı̂ ∂/∂i                        b · ∇ = bi ∂/∂i
    ∇ψ = ı̂ ∂ψ/∂i                      b · ∇ψ = bi ∂ψ/∂i
    ∇ · a = ∂ai/∂i                    (b · ∇)a = bi ∂a/∂i = ̂ bi ∂aj/∂i
    ∇ × a = ǫijk ı̂ ∂ak/∂j             b · ∇ × a = ǫijk bi ∂ak/∂j
    b × ∇ψ = ǫijk ı̂ bj ∂ψ/∂k          b × ∇ × a = ̂ bi (∂ai/∂j − ∂aj/∂i)

    Φ ≡ ∫_S a · ds                    ∫_V ∇ · a dv = ∮_S a · ds
    Γ ≡ ∫_C a · dℓ                    ∫_S (∇ × a) · ds = ∮ a · dℓ

    ∇ · (a + b) = ∇ · a + ∇ · b
    ∇ × (a + b) = ∇ × a + ∇ × b
    ∇(ψ + ω) = ∇ψ + ∇ω
    ∇(ψω) = ω∇ψ + ψ∇ω
    ∇ · (ψa) = a · ∇ψ + ψ∇ · a
    ∇ × (ψa) = ψ∇ × a − a × ∇ψ
    ∇(a · b) = (b · ∇ + b × ∇ ×)a + (a · ∇ + a × ∇ ×)b
    ∇ · (a × b) = b · ∇ × a − a · ∇ × b
    ∇ × (a × b) = (b · ∇ + ∇ · b)a − (a · ∇ + ∇ · a)b

    ∇² ≡ ∂²/∂i²                       ∇∇ · a = ̂ ∂²ai/∂j ∂i
    ∇²ψ = ∇ · ∇ψ = ∂²ψ/∂i²            ∇²a = ∂²a/∂i² = ̂ ∂²aj/∂i² = ̂ ∇²(̂ · a)
    ∇ × ∇ψ = 0                        ∇ · ∇ × a = 0
    ∇ × ∇ × a = ̂ (∂/∂i)(∂ai/∂j − ∂aj/∂i)
    ∇∇ · a = ∇²a + ∇ × ∇ × a
Combining the various second-order vector derivatives yields the useful iden-
tity that
    ∇∇ · a = ∇²a + ∇ × ∇ × a.                               (16.21)
Table 16.1 summarizes.
The table includes two curious null identities,

    ∇ × ∇ψ = 0,
    ∇ · ∇ × a = 0.                                          (16.22)
In words, (16.22) states that gradients do not curl and curl does not diverge.
This is unexpected but is a direct consequence of the definitions of the
gradient, curl and divergence:
    ∇ × ∇ψ = ∇ × ı̂ ∂ψ/∂i = ǫmni m̂ ∂²ψ/∂n ∂i = 0;
    ∇ · ∇ × a = ∇ · ǫijk ı̂ ∂ak/∂j = ǫijk ∂²ak/∂i ∂j = 0.
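The two null identities are easy to confirm symbolically. The following small check (a sketch, assuming the SymPy library; the fields chosen are arbitrary) computes ∇ × ∇ψ and ∇ · ∇ × a directly in rectangular coordinates.

import sympy as sp

x, y, z = sp.symbols('x y z')
psi = x**2*sp.sin(y) + z*sp.exp(x)               # an arbitrary smooth scalar field
a   = sp.Matrix([y*z**2, sp.cos(x*y), x + z*y])  # an arbitrary smooth vector field

def grad(f): return sp.Matrix([sp.diff(f, v) for v in (x, y, z)])
def div(F):  return sum(sp.diff(F[i], v) for i, v in enumerate((x, y, z)))
def curl(F): return sp.Matrix([sp.diff(F[2], y) - sp.diff(F[1], z),
                               sp.diff(F[0], z) - sp.diff(F[2], x),
                               sp.diff(F[1], x) - sp.diff(F[0], y)])

print(sp.simplify(curl(grad(psi))))  # zero vector: gradients do not curl
print(sp.simplify(div(curl(a))))     # 0: curl does not diverge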
A field like ∇ψ that does not curl is called an irrotational field. A field like
∇×a that does not diverge is called a solenoidal, source-free or (prosaically)
divergenceless field.17
17
In the writer’s country, the United States, there has been a mistaken belief afoot
that, if two fields b1 (r) and b2 (r) had everywhere the same divergence and curl, then the
two fields could differ only by an additive constant. Even at least one widely distributed
textbook expresses this belief, naming it Helmholtz’s theorem; but it is not just the one
textbook, for the writer has heard it verbally from at least two engineers, unacquainted
with one other, who had earned Ph.D.s in different eras in different regions of the country.
So the belief must be correct, mustn’t it?
Well, maybe it is, but the writer remains unconvinced. Consider the admittedly con-
trived counterexample of b1 = x̂y + ŷx, b2 = 0.
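The footnote's counterexample is likewise easy to check symbolically (a sketch, assuming SymPy): the field b1 = x̂y + ŷx has null divergence and null curl everywhere, exactly as the null field b2 does, yet the two fields differ by more than an additive constant.

import sympy as sp

x, y, z = sp.symbols('x y z')
b1 = sp.Matrix([y, x, 0])     # the footnote's b1 = x-hat y + y-hat x
b2 = sp.Matrix([0, 0, 0])     # and b2 = 0

div_b1  = sp.diff(b1[0], x) + sp.diff(b1[1], y) + sp.diff(b1[2], z)
curl_b1 = sp.Matrix([sp.diff(b1[2], y) - sp.diff(b1[1], z),
                     sp.diff(b1[0], z) - sp.diff(b1[2], x),
                     sp.diff(b1[1], x) - sp.diff(b1[0], y)])

print(div_b1, curl_b1.T)   # 0 and the zero vector, the same as b2's
print((b1 - b2).T)         # [y, x, 0]: not an additive constant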
ω(ℓ) of the scalar distance ℓ along the contour, whereupon (4.25) applies—
for (16.23) never evaluates ψ(r) or ω(r) but along the contour. The same
naturally goes for the vector fields a(r) and b(r), which one can treat as
vector functions a(ℓ) and b(ℓ) of the scalar distance ℓ; so the second and
third lines of (16.23) are true, too, since one can write the second line in the
form
    ı̂ ∂(ψai)/∂ℓ = ı̂ (ai ∂ψ/∂ℓ + ψ ∂ai/∂ℓ)

and the third line in the form

    ∂(ai bi)/∂ℓ = bi ∂ai/∂ℓ + ai ∂bi/∂ℓ,
each of which, according to the first line, is true separately for i = x, for
i = y and for i = z.
The truth of (16.23)’s last line is slightly less obvious. Nevertheless, one
can reorder factors to write the line as
    ∂(a × b)/∂ℓ = (∂a/∂ℓ) × b + a × (∂b/∂ℓ),

the Levi-Civita form (§ 15.4.3) of which is

    ǫijk ı̂ ∂(aj bk)/∂ℓ = ǫijk ı̂ (bk ∂aj/∂ℓ + aj ∂bk/∂ℓ).
The Levi-Civita form is true separately for (i, j, k) = (x, y, z), for (i, j, k) =
(x, z, y), and so forth, so (16.23)’s last line as a whole is true, too, which
completes the proof of (16.23).
Table 16.2: The metric coefficients of the rectangular, cylindrical and spher-
ical coordinate systems.

    RECTANGULAR    hx = 1    hy = 1    hz = 1
    CYLINDRICAL    hρ = 1    hφ = ρ    hz = 1
    SPHERICAL      hr = 1    hθ = r    hφ = r sin θ
d/di, notation which per § 15.4.2 stands for d/dx′ , d/dy ′ or d/dz ′ where the
coordinates x′ , y ′ and z ′ represent lengths. But among the cylindrical and
spherical coordinates are θ and φ, angles rather than lengths. Because one
cannot use an angle as though it were a length, the notation d/di cannot
stand for d/dθ or d/dφ and, thus, one cannot use the table in cylindrical or
spherical coordinates as the table stands.
We therefore want factors to convert the angles in question to lengths (or,
more generally, when special coordinate systems like the parabolic systems of
§ 15.7 come into play, to convert coordinates other than lengths to lengths).
Such factors are called metric coefficients and Table 16.2 lists them.20 The
use of the table is this: that for any metric coefficient hα a change dα in
its coordinate α sweeps out a length hα dα. For example, in cylindrical
coordinates hφ = ρ according to table, so a change dφ in the azimuthal
coordinate φ sweeps out a length ρ dφ—a fact we have already informally
observed as far back as § 7.4.1, which the table now formalizes.
In the table, incidentally, the metric coefficient hφ seems to have two
different values, one value in cylindrical coordinates and another in spherical.
The two are really the same value, though, since ρ = r sin θ per Table 3.4.
ds = α̂hβ hγ dβ dγ (16.24)
represents an area infinitesimal normal to α̂. For example, the area infinites-
imal on a spherical surface of radius r is ds = r̂ hθ hφ dθ dφ = r̂ r² sin θ dθ dφ.
Again in any orthogonal, right-handed, three-dimensional coordinate
system (α; β; γ), the product
dv = hα hβ hγ dα dβ dγ (16.25)
Section 16.10 will have more to say about vector infinitesimals in non-
rectangular coordinates.
The scalar fields aρ (r), ar (r), aθ (r) and aφ (r) in and of themselves do not
differ in nature from ax (r), ay (r), az (r), ψ(r) or any other scalar field. One
does tend to use them differently, though, because constant unit vectors x̂,
ŷ and ẑ exist to combine the scalar fields ax (r), ay (r), az (r) to compose the
vector field a(r) whereas no such constant unit vectors exist to combine the
scalar fields aρ (r), ar (r), aθ (r) and aφ (r). Of course there are the variable
unit vectors ρ̂(r), r̂(r), θ̂(r) and φ̂(r), but the practical and philosophical
differences between these and the constant unit vectors is greater than it
might seem. For instance, it is true that ρ̂ · φ̂ = 0, so long as what is meant
by this is that ρ̂(r) · φ̂(r) = 0. However, ρ̂(r1) · φ̂(r2) ≠ 0, an algebraic error
fairly easy to commit. On the other hand, that x̂ · ŷ = 0 is always true.
(One might ask why such a subsection as this would appear in a section
on metric coefficients. The subsection is here because no obviously better
spot for it presents itself, but moreover because we shall need the under-
standing the subsection conveys to apply metric coefficients consistently and
correctly in § 16.9 to come.)
The symbols ı̂, ̂ and k̂ need no modification even when modified symbols
like ı̃, ̃ and k̃ are in use, since ı̂, ̂ and k̂ are taken to represent unit vectors
and [ı̂ ̂ k̂], a proper orthogonal basis irrespective of the coordinate system—
so long, naturally, as the coordinate system is an orthogonal, right-handed
coordinate system as are all the coordinate systems in this book.
The modified notation will find use in § 16.9.3.
    ∂ρ̂/∂φ = ∂(x̂ cos φ + ŷ sin φ)/∂φ = −x̂ sin φ + ŷ cos φ = +φ̂.

    ∇ψ = ρ̂ ∂ψ/∂ρ + φ̂ ∂ψ/(ρ ∂φ) + ẑ ∂ψ/∂z.                  (16.27)
Table 16.3: Derivatives of the basis vectors.

    RECTANGULAR
    ∂x̂/∂x = 0        ∂x̂/∂y = 0        ∂x̂/∂z = 0
    ∂ŷ/∂x = 0        ∂ŷ/∂y = 0        ∂ŷ/∂z = 0
    ∂ẑ/∂x = 0        ∂ẑ/∂y = 0        ∂ẑ/∂z = 0

    CYLINDRICAL
    ∂ρ̂/∂ρ = 0        ∂ρ̂/∂φ = +φ̂       ∂ρ̂/∂z = 0
    ∂φ̂/∂ρ = 0        ∂φ̂/∂φ = −ρ̂       ∂φ̂/∂z = 0
    ∂ẑ/∂ρ = 0        ∂ẑ/∂φ = 0        ∂ẑ/∂z = 0

    SPHERICAL
    ∂r̂/∂r = 0        ∂r̂/∂θ = +θ̂       ∂r̂/∂φ = +φ̂ sin θ
    ∂θ̂/∂r = 0        ∂θ̂/∂θ = −r̂       ∂θ̂/∂φ = +φ̂ cos θ
    ∂φ̂/∂r = 0        ∂φ̂/∂θ = 0        ∂φ̂/∂φ = −ρ̂ = −r̂ sin θ − θ̂ cos θ
    (b · ∇)a = bi ∂a/∂i.

In cylindrical coordinates, this is

    (b · ∇)a = bρ ∂a/∂ρ + bφ ∂a/(ρ ∂φ) + bz ∂a/∂z.          (16.28)
Here are three derivatives of three terms, each term of two factors. Evaluat-
ing the derivatives according to the contour derivative product rule (16.23)
yields (3)(3)(2) = 0x12 (eighteen) terms in the result. Half the 0x12 terms
involve derivatives of the basis vectors, which Table 16.3 computes. Some
of the 0x12 terms turn out to be null. The result is that
    (b · ∇)a = bρ (ρ̂ ∂aρ/∂ρ + φ̂ ∂aφ/∂ρ + ẑ ∂az/∂ρ)
        + (bφ/ρ) [ρ̂ (∂aρ/∂φ − aφ) + φ̂ (∂aφ/∂φ + aρ) + ẑ ∂az/∂φ]
        + bz (ρ̂ ∂aρ/∂z + φ̂ ∂aφ/∂z + ẑ ∂az/∂z).             (16.29)
To evaluate divergence and curl wants more care. It also wants a constant
basis to work in, whereas [x̂ ŷ ẑ] is awkward in a cylindrical geometry and
[ρ̂ φ̂ ẑ] is not constant. Fortunately, nothing prevents us from defining a
constant basis [ρ̂o φ̂o ẑ] such that [ρ̂ φ̂ ẑ] = [ρ̂o φ̂o ẑ] at the point r = ro at
which the derivative is evaluated. If this is done, then the basis [ρ̂o φ̂o ẑ] is
constant like [x̂ ŷ ẑ] but not awkward like it.
According to Table 16.1,
    ∇ · a = ∂ai/∂i.
In cylindrical coordinates and the [ρ̂o φ̂o ẑ] basis, this is22
    ∇ · a = ∂(ρ̂o · a)/∂ρ + ∂(φ̂o · a)/(ρ ∂φ) + ∂(ẑ · a)/∂z.
Applying the contour derivative product rule (16.23),
    ∇ · a = ρ̂o · ∂a/∂ρ + (∂ρ̂o/∂ρ) · a + φ̂o · ∂a/(ρ ∂φ) + (∂φ̂o/(ρ ∂φ)) · a
              + ẑ · ∂a/∂z + (∂ẑ/∂z) · a.

But [ρ̂o φ̂o ẑ] are constant unit vectors, so

    ∇ · a = ρ̂o · ∂a/∂ρ + φ̂o · ∂a/(ρ ∂φ) + ẑ · ∂a/∂z.
That is,
    ∇ · a = ρ̂ · ∂a/∂ρ + φ̂ · ∂a/(ρ ∂φ) + ẑ · ∂a/∂z.

Expanding the field in the cylindrical basis,

    ∇ · a = (ρ̂ · ∂/∂ρ + φ̂ · ∂/(ρ ∂φ) + ẑ · ∂/∂z)(ρ̂aρ + φ̂aφ + ẑaz).
As above, here again the expansion yields 0x12 terms. Fortunately, this time
most of the terms turn out to be null. The result is that
    ∇ · a = ∂aρ/∂ρ + aρ/ρ + ∂aφ/(ρ ∂φ) + ∂az/∂z,

or, expressed more cleverly in light of (4.28), that

    ∇ · a = ∂(ρaρ)/(ρ ∂ρ) + ∂aφ/(ρ ∂φ) + ∂az/∂z.            (16.30)
Again according to Table 16.1,
    ∇ × a = ǫijk ı̂ ∂ak/∂j
        = ρ̂o [∂(ẑ · a)/(ρ ∂φ) − ∂(φ̂o · a)/∂z] + φ̂o [∂(ρ̂o · a)/∂z − ∂(ẑ · a)/∂ρ]
            + ẑ [∂(φ̂o · a)/∂ρ − ∂(ρ̂o · a)/(ρ ∂φ)].
22
Mistakenly to write here that

    ∇ · a = ∂aρ/∂ρ + ∂aφ/(ρ ∂φ) + ∂az/∂z,

which is not true, would be a ghastly error, leading to any number of hard-to-detect false
conclusions. Refer to § 16.6.2.
That is,
    ∇ × a = ρ̂ [ẑ · ∂a/(ρ ∂φ) − φ̂ · ∂a/∂z] + φ̂ [ρ̂ · ∂a/∂z − ẑ · ∂a/∂ρ]
              + ẑ [φ̂ · ∂a/∂ρ − ρ̂ · ∂a/(ρ ∂φ)].

Here the expansion yields 0x24 terms, but fortunately as last time this time
most of the terms again turn out to be null. The result is that

    ∇ × a = ρ̂ [∂az/(ρ ∂φ) − ∂aφ/∂z] + φ̂ [∂aρ/∂z − ∂az/∂ρ]
              + ẑ [∂aφ/∂ρ + aφ/ρ − ∂aρ/(ρ ∂φ)],
Vector derivatives in cylindrical coordinates:

    ∇ψ = ρ̂ ∂ψ/∂ρ + φ̂ ∂ψ/(ρ ∂φ) + ẑ ∂ψ/∂z

    (b · ∇)a = bρ ∂a/∂ρ + bφ ∂a/(ρ ∂φ) + bz ∂a/∂z
        = bρ (ρ̂ ∂aρ/∂ρ + φ̂ ∂aφ/∂ρ + ẑ ∂az/∂ρ)
          + (bφ/ρ) [ρ̂ (∂aρ/∂φ − aφ) + φ̂ (∂aφ/∂φ + aρ) + ẑ ∂az/∂φ]
          + bz (ρ̂ ∂aρ/∂z + φ̂ ∂aφ/∂z + ẑ ∂az/∂z)

    ∇ · a = ∂(ρaρ)/(ρ ∂ρ) + ∂aφ/(ρ ∂φ) + ∂az/∂z

    ∇ × a = ρ̂ [∂az/(ρ ∂φ) − ∂aφ/∂z] + φ̂ [∂aρ/∂z − ∂az/∂ρ]
              + (ẑ/ρ) [∂(ρaφ)/∂ρ − ∂aρ/∂φ]
less proper, less insightful, even more tedious and not recommended, be-
ing to take the Laplacian in rectangular coordinates and then to convert
back to the cylindrical domain; for to work cylindrical problems directly in
cylindrical coordinates is almost always advisable.)
    ∇ · a(r) ≡ lim_{∆V→0} Φ/∆V,                             (16.37)
Vector derivatives in spherical coordinates:

    ∇ψ = r̂ ∂ψ/∂r + θ̂ ∂ψ/(r ∂θ) + φ̂ ∂ψ/((r sin θ) ∂φ)

    (b · ∇)a = br ∂a/∂r + bθ ∂a/(r ∂θ) + bφ ∂a/((r sin θ) ∂φ)
        = br (r̂ ∂ar/∂r + θ̂ ∂aθ/∂r + φ̂ ∂aφ/∂r)
          + (bθ/r) [r̂ (∂ar/∂θ − aθ) + θ̂ (∂aθ/∂θ + ar) + φ̂ ∂aφ/∂θ]
          + (bφ/(r sin θ)) [r̂ (∂ar/∂φ − aφ sin θ) + θ̂ (∂aθ/∂φ − aφ cos θ)
                            + φ̂ (∂aφ/∂φ + ar sin θ + aθ cos θ)]

    ∇ · a = (1/r) [∂(r²ar)/(r ∂r) + ∂(aθ sin θ)/((sin θ) ∂θ) + ∂aφ/((sin θ) ∂φ)]

    ∇ × a = (r̂/(r sin θ)) [∂(aφ sin θ)/∂θ − ∂aθ/∂φ]
              + (θ̂/r) [∂ar/((sin θ) ∂φ) − ∂(raφ)/∂r]
              + (φ̂/r) [∂(raθ)/∂r − ∂ar/∂θ]
So long as the test volume ∆V includes the point r and is otherwise in-
finitesimal in extent, we remain free to shape the volume as we like,24 so let
us give it six sides and shape it as an almost rectangular box that conforms
precisely to the coordinate system (α; β; γ) in use:
    α − ∆α/2 ≤ α′ ≤ α + ∆α/2;
    β − ∆β/2 ≤ β′ ≤ β + ∆β/2;
    γ − ∆γ/2 ≤ γ′ ≤ γ + ∆γ/2.
The fluxes outward through the box’s +α- and −α-ward sides will then be25
    Φα = Φ+α + Φ−α
       = [aα hβ hγ |_{r′=r(α+∆α/2; β; γ)} − aα hβ hγ |_{r′=r(α−∆α/2; β; γ)}] ∆β ∆γ
       = [∂(aα hβ hγ)/∂α] ∆α ∆β ∆γ
       = (∆V/(hα hβ hγ)) ∂(aα hβ hγ)/∂α.
Naturally, the same goes for the other two pairs of sides:
    Φβ = (∆V/(hα hβ hγ)) ∂(aβ hγ hα)/∂β;
    Φγ = (∆V/(hα hβ hγ)) ∂(aγ hα hβ)/∂γ.
The three equations are better written
    Φα = (∆V/h³) ∂(h³ aα/hα)/∂α,
    Φβ = (∆V/h³) ∂(h³ aβ/hβ)/∂β,
    Φγ = (∆V/h³) ∂(h³ aγ/hγ)/∂γ,

where

    h³ ≡ hα hβ hγ.                                          (16.38)
The total flux from the test volume then is

    Φ = Φα + Φβ + Φγ
      = (∆V/h³) [∂(h³ aα/hα)/∂α + ∂(h³ aβ/hβ)/∂β + ∂(h³ aγ/hγ)/∂γ];

or, invoking Einstein's summation convention in § 16.7's modified style,

    Φ = (∆V/h³) ∂(h³ aı̃/hı̃)/∂ı̃.
Finally, substituting the last equation into (16.37),

    ∇ · a = (1/h³) ∂(h³ aı̃/hı̃)/∂ı̃.                          (16.39)
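A quick symbolic check of (16.39) (a sketch, assuming SymPy; the spherical components are arbitrary and the metric coefficients are those of Table 16.2) reproduces the spherical divergence written out above.

import sympy as sp

r, theta, phi = sp.symbols('r theta phi', positive=True)

h = {'r': 1, 'theta': r, 'phi': r*sp.sin(theta)}     # spherical metric coefficients
h3 = h['r']*h['theta']*h['phi']
coord = {'r': r, 'theta': theta, 'phi': phi}

a = {'r': r**2*sp.cos(theta),                        # arbitrary smooth components
     'theta': r*sp.sin(phi),
     'phi': sp.sin(theta)*sp.cos(phi)}

# equation (16.39)
div_general = sum(sp.diff(h3*a[i]/h[i], coord[i]) for i in ('r', 'theta', 'phi'))/h3

# the spherical divergence written out directly
div_spherical = (sp.diff(r**2*a['r'], r)/r**2
                 + sp.diff(a['theta']*sp.sin(theta), theta)/(r*sp.sin(theta))
                 + sp.diff(a['phi'], phi)/(r*sp.sin(theta)))

print(sp.simplify(div_general - div_spherical))   # 0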
An analogous formula for curl is not much harder to derive but is harder
to approach directly, so we shall approach it by deriving first the formula
for γ̂-directed directional curl. Equation (16.12) has it that26
    γ̂ · ∇ × a(r) ≡ lim_{∆A→0} Γ/∆A,                         (16.40)

where per (16.11) Γ = ∮_γ a(r′) · dℓ and the notation ∮_γ reminds us that the
contour of integration lies in the α-β plane, perpendicular to γ̂. In this case
the contour of integration bounds not a test volume but a test surface, which
we give four edges and an almost rectangular shape that conforms precisely
to the coordinate system (α; β; γ) in use:
    α − ∆α/2 ≤ α′ ≤ α + ∆α/2;
    β − ∆β/2 ≤ β′ ≤ β + ∆β/2;
    γ′ = γ.
and likewise the circulations along the −β- and +β-ward edges will be
Likewise,
    α̂ · ∇ × a = (hα/h³) [∂(hγ aγ)/∂β − ∂(hβ aβ)/∂γ],
    β̂ · ∇ × a = (hβ/h³) [∂(hα aα)/∂γ − ∂(hγ aγ)/∂α].
But one can split any vector v into locally rectangular components as v =
α̂(α̂ · v) + β̂(β̂ · v) + γ̂(γ̂ · v), so
consisting of a step in the γ̂ direction plus the corresponding steps in the or-
thogonal α̂ and β̂ directions. This is easy once you see how to do it. Harder
is the surface infinitesimal, but one can nevertheless correctly construct it
as the cross product
    ds = [α̂ hα + γ̂ hγ (∂γ/∂α)] dα × [β̂ hβ + γ̂ hγ (∂γ/∂β)] dβ
       = [γ̂/hγ − (α̂/hα)(∂γ/∂α) − (β̂/hβ)(∂γ/∂β)] h³ dα dβ   (16.44)
of two vectors that lie on the surface, one vector normal to β̂ and the other
to α̂, edges not of a rectangular patch of the surface but of a patch whose
projection onto the α-β plane is an (hα dα)-by-(hβ dβ) rectangle.
So, that’s it. Those are the essentials of the three-dimensional geomet-
rical vector—of its analysis and of its calculus. The geometrical vector of
Chs. 15 and 16 and the matrix of Chs. 11 through 14 have in common
that they represent well developed ways of marshaling several quantities
together to a common purpose: three quantities in the specialized case of
Chapter 17

The Fourier series
The book starts on its most advanced, most interesting mathematics from
here.
It might fairly be said that, among advanced mathematical techniques,
none is so useful, and few so appealing, as the one Lord Kelvin has acclaimed
“a great mathematical poem,”1 the Fourier transform, which this chapter
and the next will develop. This first of the two chapters will develop the
Fourier transform in its primitive guise as the Fourier series.
The Fourier series is an analog of the Taylor series of Ch. 8 but meant
for repeating waveforms, functions f (t) of which
1
[35, Ch. 17]
[Figure 17.1: A square wave of amplitude A and period T1.]
Algebraically,
    f(t) = (8A/2π) [cos((2π)t/T1) − (1/3) cos(3(2π)t/T1)
                     + (1/5) cos(5(2π)t/T1) − (1/7) cos(7(2π)t/T1) + ···].    (17.2)
How faithfully (17.2) really represents the repeating waveform and why its
coefficients happen to be 1, −1/3, 1/5, −1/7, . . . are among the questions this chap-
ter will try to answer; but, visually at least, it looks as though superimposing
sinusoids worked.
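One can see the superposition at work with a few lines of Python (a sketch only, not from the text; it assumes NumPy, and the partial sums are those of (17.2), truncated after n nonzero terms). At t = 0 the partial sums creep, slowly, toward the wave's value A there.

import numpy as np

T1, A = 1.0, 1.0

def partial_sum(t, n):
    # the first n nonzero terms of (17.2)
    f = np.zeros_like(np.asarray(t, dtype=float))
    for k in range(n):
        j = 2*k + 1                                  # odd harmonics only
        f += (-1)**k/j*np.cos(j*2*np.pi*t/T1)
    return (8*A/(2*np.pi))*f

for n in (1, 3, 10, 100):
    print(n, partial_sum(0.0, n))   # 1.273, 1.104, 0.968, 0.997 ... toward A = 1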
The chapter begins in preliminaries, starting with a discussion of Parse-
val’s principle.
    ∆ω T1 = 2π,
    ℑ(∆ω) = 0,
    ℑ(to) = 0,
    ℑ(T1) = 0,
    T1 ≠ 0,                                                 (17.3)
[Figure 17.2: The square wave of Fig. 17.1 approximated by superpositions of sinusoids; several partial sums of f(t) are plotted against t.]
The angular frequency ω will have dimensions of inverse time like radians
per second.
The applied mathematician should make himself aware, and thereafter
keep in mind, that the cycle per second and the radian per second do not dif-
fer dimensionally from one another. Both are technically units of [second]−1 ,
whereas the words “cycle” and “radian” in the contexts of the phrases “cy-
cle per second” and “radian per second” are verbal cues that, in and of
themselves, play no actual part in the mathematics. This is not because the
cycle and the radian were ephemeral but rather because the second is un-
fundamental. The second is an arbitrary unit of measure. The cycle and the
radian are definite, discrete, inherently countable things; and, where things
are counted, it is ultimately up to the mathematician to interpret the count
(consider for instance that nine baseball hats may imply nine baseball play-
ers and one baseball team, but that there is nothing in the number nine itself
to tell us so). To distinguish angular frequencies from cyclic frequencies, it
remains to the mathematician to lend factors of 2π where needed.
The word “frequency” without a qualifying adjective is usually taken to
mean cyclic frequency unless the surrounding context implies otherwise.
Frequencies exist in space as well as in time:
kλ = 2π. (17.8)
    v = λ/T = ω/k                                           (17.9)
relates periods and frequencies in space and time.
Now, we must admit that we fibbed when we said that T had to have
dimensions of time. Physically, that is the usual interpretation, but math-
ematically T (and T1 , t, to , τ , etc.) can bear any units and indeed are not
required to bear units at all, as § 17.1 has observed. The only mathematical
form “hertz” thus is singular as well as plural.
7
The wavenumber k is no integer, notwithstanding that the letter k tends to represent
integers in other contexts.
for the Dirac delta, both of which pulses evidently share Dirac’s property
that
    ∫_{−∞}^{∞} (1/T) δ((τ − to)/T) dτ = 1,
    ∫_{−∞}^{∞} (1/T) Π((τ − to)/T) dτ = 1,
    ∫_{−∞}^{∞} (1/T) Λ((τ − to)/T) dτ = 1,                  (17.11)
[Figure 17.3: The square pulse Π(t), nonzero for |t| < 1/2, and the triangular pulse Λ(t), nonzero for |t| < 1, each of unit height.]
    f(t) = Σ_{j=−∞}^{∞} aj e^{ij ∆ω t}                      (17.13)

    aj = (1/T1) ∫_{to−T1/2}^{to+T1/2} e^{−ij ∆ω τ} f(τ) dτ.  (17.14)
Why does (17.14) work? How does it recover a Fourier coefficient aj ? The
answer is that it recovers a Fourier coefficient aj by isolating it, and that it
isolates it by shifting frequencies and integrating. It shifts the jth component
aj eij ∆ω t of (17.13)’s waveform—whose angular frequency is ω = j ∆ω—
down to a frequency of zero, incidentally shifting the waveform’s several
other components to various nonzero frequencies as well. Significantly, it
leaves each shifted frequency to be a whole multiple of the waveform’s funda-
mental frequency ∆ω. By Parseval’s principle (17.5), the integral of (17.14)
then kills all the thus frequency-shifted components except the zero-shifted
one by integrating the components over complete cycles, passing only the
zero-shifted component which, once shifted, has no cycle. Changing dummy
variables τ ← t and ℓ ← j in (17.13) and then substituting into (17.14)’s
right side the resulting expression for f (τ ), we have by successive steps that
    (1/T1) ∫_{to−T1/2}^{to+T1/2} e^{−ij ∆ω τ} f(τ) dτ
        = (1/T1) ∫_{to−T1/2}^{to+T1/2} e^{−ij ∆ω τ} Σ_{ℓ=−∞}^{∞} aℓ e^{iℓ ∆ω τ} dτ
        = (1/T1) Σ_{ℓ=−∞}^{∞} aℓ ∫_{to−T1/2}^{to+T1/2} e^{i(ℓ−j) ∆ω τ} dτ
        = (aj/T1) ∫_{to−T1/2}^{to+T1/2} e^{i(j−j) ∆ω τ} dτ
        = (aj/T1) ∫_{to−T1/2}^{to+T1/2} dτ = aj,
in which Parseval’s principle (17.5) has killed all but the ℓ = j term in the
summation. Thus is (17.14) formally proven.
But

    e^{−ij ∆ω τ} |_{τ=T1/4} = (−i)^j,

so

    [e^{−ij ∆ω τ}]_{−T1/4}^{T1/4} − [e^{−ij ∆ω τ}]_{T1/4}^{3T1/4} = 2[(−i)^j − i^j].

Therefore,

    aj = (i2A/2πj) [(−i)^j − i^j]
       = { (−)^{(j−1)/2} 4A/2πj   for odd j,
         { 0                      for even j                (17.15)
are the square wave's Fourier coefficients which, when the coefficients are
applied to (17.13) and when (5.17) is invoked, indeed yield the specific series
of sinusoids that (17.2) and Fig. 17.2 have proposed.
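A numerical check (a sketch, assuming NumPy; the names are illustrative) integrates (17.14) directly for the square wave and compares the result against (17.15).

import numpy as np

T1, A = 1.0, 1.0
dw = 2*np.pi/T1

def f(t):
    # the square wave: +A for |t| < T1/4, -A over the rest of each period
    t = (t + T1/2) % T1 - T1/2
    return np.where(np.abs(t) < T1/4, A, -A)

def a_j(j, n=100000):
    # (17.14), integrated over one period by the midpoint rule
    tau = -T1/2 + (np.arange(n) + 0.5)*(T1/n)
    return np.sum(np.exp(-1j*j*dw*tau)*f(tau))*(T1/n)/T1

for j in range(1, 8):
    exact = (-1)**((j - 1)//2)*4*A/(2*np.pi*j) if j % 2 else 0.0   # (17.15)
    print(j, round(a_j(j).real, 6), round(exact, 6))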
[Figure 17.4: A rectangular pulse train of period T1; each pulse has height A and width ηT1.]
where Π(·) is the square pulse of (17.10); the symbol A represents the pulse’s
full height rather than the half-height of Fig. 17.1; and the dimensionless
factor 0 ≤ η ≤ 1 is the train’s duty cycle, the fraction of each cycle its pulse
is as it were on duty. By the routine of § 17.4.2,
    aj = (1/T1) ∫_{−T1/2}^{T1/2} e^{−ij ∆ω τ} f(τ) dτ
       = (A/T1) ∫_{−ηT1/2}^{ηT1/2} e^{−ij ∆ω τ} dτ
       = (iA/2πj) e^{−ij ∆ω τ} |_{−ηT1/2}^{ηT1/2} = (2A/2πj) sin(2πηj/2);

    a0 = (1/T1) ∫_{−T1/2}^{T1/2} f(τ) dτ = (A/T1) ∫_{−ηT1/2}^{ηT1/2} dτ = ηA,
[Figure 17.5: A Dirac delta pulse train with period T1.]
    aj = { (2A/2πj) sin(2πηj/2)   if j ≠ 0,
         { ηA                     if j = 0,                 (17.17)
the same for every index j. As the duty cycle η tends to vanish the pulse
tends to disappear and the Fourier coefficients along with it; but we can com-
pensate for vanishing duty if we wish by increasing the pulse’s amplitude A
proportionally, maintaining the product
ηT1 A = 1 (17.19)
of the pulse’s width ηT1 and its height A, thus preserving unit area8 under
the pulse. In the limit η → 0+ , the pulse then by definition becomes the
Dirac delta of Fig. 7.10, and the pulse train by construction becomes the
Dirac delta pulse train of Fig. 17.5. Enforcing (17.19) on (17.18) yields the
8
In light of the discussion of time, space and frequency in § 17.2, we should clarify
that we do not here mean a physical area measurable in square meters or the like. We
merely mean the dimensionless product of the width (probably measured in units of time
like seconds) and the height (correspondingly probably measured in units of frequency like
inverse seconds) of the rectangle a single pulse encloses in Fig. 17.4. The term area is thus
overloaded.
    aj = 1/T1.                                              (17.20)
    aj = lim_{M→∞} (1/T1) Σ_{ℓ=−M}^{M} e^{(−ij ∆ω)(to+ℓ ∆τM)} f(to + ℓ ∆τM) ∆τM,
    ∆τM ≡ T1/(2M + 1),

    aj = lim_{M→∞} (∆τM/T1) Σ_{ℓ=−M}^{M} Σ_{p=−∞}^{∞} e^{(−ij ∆ω)(to+ℓ ∆τM)} f(to + p ∆τM) Π(ℓ − p)
       = lim_{M→∞} (∆τM/T1) Σ_{ℓ=−M}^{M} e^{(−ij ∆ω)(to+ℓ ∆τM)} f(to + ℓ ∆τM).
    lim_{M→∞} aM = lim_{M→∞} CM fM,

whereby

    lim_{M→∞} fM = lim_{M→∞} C_M^{−1} aM,

    [C_M^{−1}]_{ℓj} = (T1/((2M + 1) ∆τM)) e^{(+ij ∆ω)(to+ℓ ∆τM)},

such that11 CM C_M^{−1} = I_{−M}^{M} and thus per (13.2) also that C_M^{−1} CM = I_{−M}^{M}.
So, the answer to our question is that, yes, CM is invertible.
Because CM is invertible, § 14.2 has it that neither fM nor aM can be
null unless both are. In the limit M → ∞, this implies that no continuous,
11
Equation (11.30) has defined the notation I_{−M}^{M}, representing a (2M + 1)-dimensional
identity matrix whose string of ones extends along its main diagonal from j = ℓ = −M
through j = ℓ = M.
It is usually best, or at least neatest and cleanest, and moreover more evoca-
tive, to calculate Fourier coefficients and express Fourier series in terms of
complex exponentials as (17.13) and (17.14) do. Occasionally, though, when
the repeating waveform f (t) is real, one prefers to work in sines and cosines
rather than in complex exponentials. One writes (17.13) by Euler’s for-
mula (5.11) as
    f(t) = a0 + Σ_{j=1}^{∞} [(aj + a−j) cos j ∆ω t + i(aj − a−j) sin j ∆ω t].
12
Chapter 8’s footnote 6 has argued in a similar style, earlier in the book.
13
Where this subsection’s conclusion cannot be made to apply is where unreasonable
waveforms like A sin[B/ sin ωt] come into play. We shall leave to the professional mathe-
matician the classification of such unreasonable waveforms, the investigation of the wave-
forms’ Fourier series and the provision of greater rigor generally.
If the waveform happens to be real then f ∗ (t) = f (t), which in light of the
last equation and (17.14) implies that
    aj = ηA Sa(2πηj/2),                                     (17.25)
where

    Sa z ≡ (sin z)/z                                        (17.26)
[Figure 17.6: The sine-argument function Sa t ≡ (sin t)/t, plotted for −2π ≤ t ≤ 2π; its global maximum is 1 at t = 0.]
    (d/dz) Sa z = (cos z − Sa z)/z.                         (17.28)
The function’s integral is expressed as a Taylor series after integrating the
function’s own Taylor series (17.27) term by term to obtain the form
    Si z ≡ ∫_0^z Sa τ dτ = Σ_{j=0}^{∞} [z/(2j + 1)] ∏_{m=1}^{j} [−z²/((2m)(2m + 1))],    (17.29)
14
Many (including the author himself in other contexts) call it the sinc function, denot-
ing it sinc(·) and pronouncing it as “sink.” Unfortunately, some [48, § 4.3][13, § 2.2][18]
use the sinc(·) notation for another function,
    sinc_alternate z ≡ Sa(2πz/2) = sin(2πz/2)/(2πz/2).
The unambiguous Sa(·) suits this particular book better, anyway, so this is the notation
we will use.
[Figure 17.7: The sine integral Si t ≡ ∫_0^t Sa τ dτ, plotted for −2π ≤ t ≤ 2π; the vertical scale is marked at ±1 and ±2π/4.]
plotted in Fig. 17.7. Convention gives this integrated function its own name
and notation: it calls it the sine integral15,16 and denotes it by Si(·).
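The Taylor form (17.29) is convenient to compute. The following sketch (assuming NumPy and SciPy; the function names are illustrative) evaluates it term by term and compares the result against a direct numerical integration of Sa.

import numpy as np
from scipy.integrate import quad

def Si_series(z, terms=30):
    # (17.29): each successive product factor is -z**2/((2m)(2m+1))
    total, term = 0.0, z
    for j in range(terms):
        total += term/(2*j + 1)
        term *= -z**2/((2*j + 2)*(2*j + 3))
    return total

def Si_quad(z):
    # direct integration of Sa tau = sin(tau)/tau; np.sinc(x) = sin(pi x)/(pi x)
    val, _ = quad(lambda tau: np.sinc(tau/np.pi), 0.0, z)
    return val

for z in (np.pi/2, np.pi, 2*np.pi):
    print(z, Si_series(z), Si_quad(z))   # the two columns agree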
• The sine-argument function is real over the real domain. That is, if
ℑ(t) = 0 then ℑ(Sa t) = 0.
• It is that |Sa t| < 1 over the real domain ℑ(t) = 0 except at the global
maximum t = 0, where
Sa 0 = 1. (17.30)
• Each of the sine-argument’s lobes has but a single peak. That is, over
the real domain ℑ(t) = 0, the derivative (d/dt) Sa t = 0 is zero at only
a single value of t on each lobe.
15
[43, § 3.3]
16
The name “sine-argument” incidentally seems to have been back-constructed from the
name “sine integral.”
    lim_{t→±∞} Sa t = 0,
    lim_{t→±∞} (d/dt) Sa t = 0.                             (17.31)
    tan t = t,    nπ ≤ t < (n + 1)π,  n ≥ 0,
• Over the real domain, ℑ(t) = 0, the sine integral Si t is positive for
positive t, negative for negative t and, of course, zero for t = 0.
• The local extrema of Si t over the real domain ℑ(t) = 0 occur at the
zeros of Sa t.
[Figure 17.8: The points at which f(t) = t intersects tan t.]
• The global maximum and minimum of Si t over the real domain ℑ(t) =
0 occur respectively at the first positive and negative zeros of Sa t,
which are t = ±π.
That the sine integral should reach its local extrema at the sine-argument’s
zeros ought to be obvious to the extent to which the concept of integration is
understood. To explain the other properties it helps first to have expressed
the sine integral in the form
    Si t = Sn + ∫_{nπ}^{t} Sa τ dτ,
    Sn ≡ Σ_{j=0}^{n−1} Uj,
    Uj ≡ ∫_{jπ}^{(j+1)π} Sa τ dτ,
    nπ ≤ t < (n + 1)π,  0 ≤ n,  (j, n) ∈ Z,
where each partial integral Uj integrates over a single lobe of the sine-
argument. The several Uj alternate in sign but, because each lobe majorizes
the next (§ 8.10.2)—that is, because,17 in the integrand, |Sa τ | ≥ |Sa τ + π|
for all τ ≥ 0—the magnitude of the area under each lobe exceeds that under
the next, such that
    0 ≤ (−)^j ∫_{jπ}^{t} Sa τ dτ < (−)^j Uj < (−)^{j−1} Uj−1,
        jπ ≤ t < (j + 1)π,  0 ≤ j,  j ∈ Z

(except that the Uj−1 term of the inequality does not apply when j = 0,
since there is no U−1) and thus that

    0 = S0 < S2m < S2m+2 < S∞ < S2m+3 < S2m+1 < S1
        for all m > 0, m ∈ Z.
The foregoing applies only when t ≥ 0 but naturally one can reason similarly
for t ≤ 0, concluding that the integral’s global maximum and minimum
over the real domain occur respectively at the sine-argument function’s first
positive and negative zeros, t = ±π; and further concluding that the integral
is positive for all positive t and negative for all negative t.
Equation (17.32) wants some cleverness to calculate and will be the sub-
ject of the next subsection.
    Sa z = (e^{+iz} − e^{−iz})/(i2z),
rather than trying to integrate the sine-argument function all at once let us
first try to integrate just one of its two complex terms, leaving the other
17
More rigorously, to give the reason perfectly unambiguously, one could fuss here for a
third of a page or so over signs, edges and the like. To do so is left as an exercise to those
that aspire to the pure mathematics profession.
[Figure 17.9: A contour of integration in the complex z plane, composed of the segments I1 through I6.]
only possible if
I3 = 0.
The integral up the contour’s right segment is
    I2 = ∫_{C2} e^{iz} dz/(i2z) = lim_{a→∞} ∫_0^a e^{i(a+iy)} dy/(2z)
       = lim_{a→∞} ∫_0^a e^{ia} e^{−y} dy/(2z),
I4 = 0
when one truncates the Fourier series. At a discontinuity, the Fourier series
oscillates and overshoots.21
Henry Wilbraham investigated this phenomenon as early as 1848.
J. Willard Gibbs explored its engineering implications in 1899.22 Let us
along with them refer to the square wave of Fig. 17.2 on page 481. As
further Fourier components are added the Fourier waveform better approx-
imates the square wave, but, as we said, it oscillates and overshoots—it
“rings,” in the electrical engineer’s vernacular—about the square wave’s
discontinuities. This oscillation and overshot turn out to be irreducible, and
moreover they can have significant physical effects.
Changing t − T1 /4 ← t in (17.2) to delay the square wave by a quarter
cycle yields
    f(t) = (8A/2π) Σ_{j=0}^{∞} [1/(2j + 1)] sin((2j + 1)(2π)t/T1),
Again changing

    ∆v ← 2(2π)t/T1

makes this

    f((T1/(2(2π))) ∆v) = lim_{N→∞} (4A/2π) Σ_{j=0}^{N−1} Sa((j + 1/2) ∆v) ∆v,
    0 < ∆v ≪ 1,
[Figure 17.10: Gibbs' phenomenon: a truncated Fourier series overshooting the square wave of amplitude A near a discontinuity.]
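A short numerical experiment (a sketch only, assuming NumPy) exhibits the irreducible overshoot. For each N it evaluates the N-term partial sum of the delayed series above on a fine grid just to the right of the discontinuity at t = 0; the maximum does not settle back toward A but instead approaches (2A/π) Si π ≈ 1.179A.

import numpy as np

T1, A = 1.0, 1.0

def partial_sum(t, N):
    # the first N terms of the delayed square-wave series
    j = np.arange(N)[:, None]
    return (8*A/(2*np.pi))*np.sum(np.sin((2*j + 1)*2*np.pi*t/T1)/(2*j + 1), axis=0)

for N in (10, 100, 1000):
    t = np.linspace(0.0, T1/(2*N), 2001)     # a fine grid just right of the jump
    print(N, partial_sum(t, N).max())        # hovers near 1.18, never returning to A

print(2*A/np.pi*1.851937)                    # (2A/pi) Si(pi) = 1.17898..., for comparison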
nately, by rolling the Fourier series off gradually rather than truncating it
exactly at N terms. Engineers may do one or the other, or both, explicitly or
implicitly, which is why the full Gibbs is not always observed in engineered
systems. Nature may do likewise. Neither however is the point. The point
is that sharp discontinuities do not behave in the manner one might naı̈vely
have expected, yet that one can still analyze them profitably, adapting this
section’s subtle technique as the circumstance might demand. A good en-
gineer or other applied mathematician will make himself aware of Gibbs’
phenomenon and of the mathematics behind it for this reason.
Chapter 18

The Fourier and Laplace transforms
The Fourier series of Ch. 17 though useful applies solely to waveforms that
repeat. An effort to generalize the Fourier series to the broader domain of
nonrepeating waveforms leads to the Fourier transform, this chapter’s chief
subject.
[Chapter to be written.]
Plan
19. Probability2
27. Remarks
Chapters are likely yet to be inserted, removed, divided, combined and shuf-
fled, but that’s the planned outline at the moment.
The book means to stop short of hypergeometric functions, parabolic
cylinder functions, selective-dimensional (Weyl and Sommerfeld) Fourier
transforms, wavelets, and iterative techniques more advanced than the con-
jugate-gradient (the advanced iterative techniques being too active an area
of research for such a book as this yet to treat). The book may however add
a new appendix “Additional derivations” before the existing Appendix D to
prove a few obscure results.9
7
Chapter 24 would be pretty useless if it did not treat Legendre polynomials, so pre-
sumably it will do at least this.
8
The author has not yet decided how to apportion the treatment of the wave equation
in spherical geometries between Chs. 22, 23 and 24.
9
Additional derivations in the new appendix might include those of distributed resis-
tance, of resistance to ground and of the central angle in a methane molecule or other
tetrahedron.
Appendices
Appendix A

Hexadecimal notation, et al.
    e² + π² = O²

rather than

    a² + b² = c²
for the Pythagorean theorem, you would not find that person’s version so easy to read,
would you? Mathematically, maybe it doesn’t matter, but the choice of letters is not a
matter of arbitrary utility only but also of convention, tradition and style: one of the early
Appendix B

The Greek alphabet
ROMAN
    Aa  Bb  Cc  Dd  Ee  Ff  Gg  Hh  Ii  Jj  Kk  Lℓ  Mm
    Nn  Oo  Pp  Qq  Rr  Ss  Tt  Uu  Vv  Ww  Xx  Yy  Zz

GREEK
    Aα alpha      Hη eta        Nν nu         Tτ tau
    Bβ beta       Θθ theta      Ξξ xi         Υυ upsilon
    Γγ gamma      Iι iota       Oo omicron    Φφ phi
    ∆δ delta      Kκ kappa      Ππ pi         Xχ chi
    Eǫ epsilon    Λλ lambda     Pρ rho        Ψψ psi
    Zζ zeta       Mµ mu         Σσ sigma      Ωω omega
    α² + β² = γ²
    a² + b² = c²,
the Greek: αβγ; δǫ; κλµνξ; ρστ ; φψωθχ (the letters ζηθξ can also form a
set, but these oftener serve individually). Greek letters are frequently paired
with their Roman congeners as appropriate: aα; bβ; cgγ; dδ; eǫ; f φ; kκ; ℓλ;
mµ; nν; pπ; rρ; sσ; tτ ; xχ; zζ.2
Some applications (even in this book) group the letters slightly differ-
ently, but most tend to group them approximately as shown. Even in West-
ern languages other than English, mathematical convention seems to group
the letters in about the same way.
Naturally, you can use whatever symbols you like in your own private
papers, whether the symbols are letters or not; but other people will find
your work easier to read if you respect the convention when you can. It is a
special stylistic error to let the Roman letters ijkℓmn, which typically rep-
resent integers, follow the Roman letters abcdef gh directly when identifying
mathematical quantities. The Roman letter i follows the Roman letter h
only in the alphabet, almost never in mathematics. If necessary, p can fol-
low h (curiously, p can alternatively follow n, but never did convention claim
to be logical).
Mathematicians usually avoid letters like the Greek capital H (eta),
which looks just like the Roman capital H, even though H (eta) is an entirely
proper member of the Greek alphabet. The Greek minuscule υ (upsilon) is
avoided for like reason, for mathematical symbols are useful only insofar as
we can visually tell them apart.3
2
The capital pair Y Υ is occasionally seen but is awkward both because the Greek
minuscule υ is visually almost indistinguishable from the unrelated (or distantly related)
Roman minuscule v; and because the ancient Romans regarded the letter Y not as a
congener but as the Greek letter itself, seldom used but to spell Greek words in the
Roman alphabet. To use Y and Υ as separate symbols is to display an indifference to,
easily misinterpreted as an ignorance of, the Graeco-Roman sense of the thing—which is
silly, really, if you think about it, since no one objects when you differentiate j from i, or u
and w from v—but, anyway, one is probably the wiser to tend to limit the mathematical
use of the symbol Υ to the very few instances in which established convention decrees it.
(In English particularly, there is also an old typographical ambiguity between Y and a
Germanic, non-Roman letter named “thorn” that has practically vanished from English
today, to the point that the typeface in which you are reading these words lacks a glyph
for it—but which sufficiently literate writers are still expected to recognize on sight. This
is one more reason to tend to avoid Υ when you can, a Greek letter that makes you look
ignorant when you use it wrong and pretentious when you use it right. You can’t win.)
The history of the alphabets is extremely interesting. Unfortunately, a footnote in an
appendix to a book on derivations of applied mathematics is probably not the right place
for an essay on the topic, so we’ll let the matter rest there.
3
No citation supports this appendix, whose contents (besides the Roman and Greek
alphabets themselves, which are what they are) are inferred from the author’s subjective
observation of seemingly consistent practice in English-language applied mathematical
publishing, plus some German and a little French, dating back to the 1930s. From the
thousands of readers of drafts of the book, the author has yet to receive a single seri-
ous suggestion that the appendix were wrong—a lack that constitutes a sort of negative
(though admittedly unverifiable) citation if you like. If a mathematical style guide exists
that formally studies the letter convention’s origins, the author is unaware of it (and if a
graduate student in history or the languages who, for some reason, happens to be reading
these words seeks an esoteric, maybe untouched topic on which to write a master’s thesis,
why, there is one).
Appendix C

A sketch of pure complex theory
At least three of the various disciplines of pure mathematics stand out for
their pedagogical intricacy and the theoretical depth of their core results.
The first of the three is number theory which, except for the simple results
of § 6.1, scientists and engineers tend to get by largely without. The second
is matrix theory (Chs. 11 through 14), a bruiser of a discipline the applied
mathematician of the computer age—try though he might—can hardly es-
cape. The third is the pure theory of the complex variable.
The introduction’s § 1.3 admires the beauty of the pure theory of the
complex variable even while admitting that “its arc takes off too late and
flies too far from applications for such a book as this.” To develop the
pure theory properly is a worthy book-length endeavor of its own requiring
relatively advanced preparation on its reader’s part which, however, the
reader who has reached the end of the present book’s Ch. 9 possesses. If the
writer doubts the strictly applied necessity of the pure theory, still, he does
not doubt its health to one’s overall mathematical formation. It provides
another way to think about complex numbers. Scientists and engineers
with advanced credentials occasionally expect one to be acquainted with it
for technical-social reasons, regardless of its practical use. Besides, the pure
theory is interesting. This alone recommends some attention to it.
The pivotal result of pure complex-variable theory is the Taylor series
by Cauchy’s impressed residue theorem. If we will let these few pages of
appendix replace an entire book on the pure theory, then Cauchy’s and
Taylor’s are the results we shall sketch. (The bibliography lists presentations
far more complete; this presentation, as advertised, is just a sketch. The
reader who has reached the end of the Ch. 9 will understand already why the
presentation is strictly optional, interesting maybe but deemed unnecessary
to the book’s applied mathematical development.)
Cauchy’s impressed residue theorem 1 is that
    f(z) = (1/i2π) ∮ f(w)/(w − z) dw                        (C.1)
if z lies within the closed complex contour about which the integral is taken
and if f (z) is everywhere analytic (§ 8.4) within and along the contour. More
than one proof of the theorem is known, depending on the assumptions from
which the mathematician prefers to start, but this writer is partial to an
instructively clever proof he has learned from D.N. Arnold2 which goes as
follows. Consider the function
1 f [z + (t)(w − z)]
I
g(z, t) ≡ dw,
i2π w−z
    ∂g/∂t = 0,
meaning that g(z, t) does not vary with t. Observing per (8.26) that
    (1/i2π) ∮ dw/(w − z) = 1,
i2π w−z
we have that
    f(z) = (f(z)/i2π) ∮ dw/(w − z) = g(z, 0) = g(z, 1) = (1/i2π) ∮ f(w)/(w − z) dw
4
The professionals minimalistically actually require only that the function be once
differentiable under certain conditions, from which they prove infinite differentiability,
but this is a fine point which will not concern us here.
5
[31, § 10.7]
that
    f(z) = (1/i2π) ∮ f(w)/[(w − zo) − (z − zo)] dw
         = (1/i2π) ∮ f(w)/{(w − zo)[1 − (z − zo)/(w − zo)]} dw
         = (1/i2π) ∮ [f(w)/(w − zo)] Σ_{k=0}^{∞} [(z − zo)/(w − zo)]^k dw
         = Σ_{k=0}^{∞} { (1/i2π) ∮ f(w)/(w − zo)^{k+1} dw } (z − zo)^k,
    d^k f/dz^k |_{z=zo} = (k!/i2π) ∮ f(w)/(w − zo)^{k+1} dw,    (C.3)
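Equation (C.3) invites a direct numerical test (a sketch, assuming NumPy; the function name is illustrative): integrate around a circle centered on zo and compare against the known derivatives. For f(z) = e^z at zo = 0, every derivative equals 1.

import numpy as np
from math import factorial

def contour_coefficient(f, k, zo=0.0, radius=1.0, n=4096):
    # (k!/(i 2 pi)) times the contour integral of f(w)/(w - zo)**(k+1) dw
    theta = np.linspace(0.0, 2*np.pi, n, endpoint=False)
    w = zo + radius*np.exp(1j*theta)
    dw = 1j*radius*np.exp(1j*theta)*(2*np.pi/n)
    return factorial(k)*np.sum(f(w)/(w - zo)**(k + 1)*dw)/(2j*np.pi)

for k in range(5):
    print(k, np.round(contour_coefficient(np.exp, k), 10))   # each is 1, per (C.3)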
Appendix D

Manuscript history
The book in its present form is based on various unpublished drafts and notes
of mine, plus some of my wife Kristie’s (née Hancock), going back to 1983
when I was fifteen years of age. What prompted the contest I can no longer
remember, but the notes began one day when I challenged a high-school
classmate to prove the quadratic formula. The classmate responded that
he didn’t need to prove the quadratic formula because the proof was in the
class math textbook, then counterchallenged me to prove the Pythagorean
theorem. Admittedly obnoxious (I was fifteen, after all) but not to be out-
done, I whipped out a pencil and paper on the spot and started working.
But I found that I could not prove the theorem that day.
The next day I did find a proof in the school library,1 writing it down,
adding to it the proof of the quadratic formula plus a rather inefficient proof
of my own invention to the law of cosines. Soon thereafter the school’s chem-
istry instructor happened to mention that the angle between the tetrahe-
drally arranged four carbon-hydrogen bonds in a methane molecule was 109◦ ,
so from a symmetry argument I proved that result to myself, too, adding it
to my little collection of proofs. That is how it started.2
The book actually has earlier roots than these. In 1979, when I was
twelve years old, my father bought our family’s first eight-bit computer.
The computer’s built-in BASIC programming-language interpreter exposed
1. A better proof is found in § 2.10.
2. Fellow gear-heads who lived through that era at about the same age might want to date me against the disappearance of the slide rule. Answer: in my country, or at least at my high school, I was three years too young to use a slide rule. The kids born in 1964 learned the slide rule; those born in 1965 did not. I wasn’t born till 1967, so for better or for worse I always had a pocket calculator in high school. My family had an eight-bit computer at home, too, as we shall see.
functions for calculating sines and cosines of angles. The interpreter’s man-
ual included a diagram much like Fig. 3.1 showing what sines and cosines
were, but it never explained how the computer went about calculating such
quantities. This bothered me at the time. Many hours with a pencil I spent
trying to figure it out, yet the computer’s trigonometric functions remained
mysterious to me. When later in high school I learned of the use of the Tay-
lor series to calculate trigonometrics, into my growing collection of proofs
the series went.
Five years after the Pythagorean incident I was serving the U.S. Army as
an enlisted troop in the former West Germany. Although those were the last
days of the Cold War, there was no shooting war at the time, so the duty
was peacetime duty. My duty was in military signal intelligence, frequently
in the middle of the German night when there often wasn’t much to do.
The platoon sergeant wisely condoned neither novels nor cards on duty, but
he did let the troops read the newspaper after midnight when things were
quiet enough. Sometimes I used the time to study my German—the platoon
sergeant allowed this, too—but I owned a copy of Richard P. Feynman’s
Lectures on Physics [19] which I would sometimes read instead.
Late one night the battalion commander, a lieutenant colonel and West
Point graduate, inspected my platoon’s duty post by surprise. A lieutenant
colonel was a highly uncommon apparition at that hour of a quiet night, so
when that old man appeared suddenly with the sergeant major, the company
commander and the first sergeant in tow—the last two just routed from
their sleep, perhaps—surprise indeed it was. The colonel may possibly have
caught some of my unlucky fellows playing cards that night—I am not sure—
but me, he caught with my boots unpolished, reading the Lectures.
I snapped to attention. The colonel took a long look at my boots without
saying anything, as stormclouds gathered on the first sergeant’s brow at his
left shoulder, then asked me what I had been reading.
“Feynman’s Lectures on Physics, sir.”
“Why?”
“I am going to attend the university when my three-year enlistment is
up, sir.”
“I see.” Maybe the old man was thinking that I would do better as a
scientist than as a soldier? Maybe he was remembering when he had had
to read some of the Lectures himself at West Point. Or maybe it was just
the singularity of the sight in the man’s eyes, as though he were a medieval
knight at bivouac who had caught one of the peasant levies, thought to be
illiterate, reading Cicero in the original Latin. The truth of this, we shall
never know. What the old man actually said was, “Good work, son. Keep
it up.”
The stormclouds dissipated from the first sergeant’s face. No one ever
said anything to me about my boots (in fact as far as I remember, the first
sergeant—who saw me seldom in any case—never spoke to me again). The
platoon sergeant thereafter explicitly permitted me to read the Lectures on
duty after midnight on nights when there was nothing else to do, so in the
last several months of my military service I did read a number of them. It
is fair to say that I also kept my boots better polished.
In Volume I, Chapter 6, of the Lectures there is a lovely introduction to
probability theory. It discusses the classic problem of the “random walk” in
some detail, then states without proof that the generalization of the random
walk leads to the Gaussian distribution
\[
 p(x) = \frac{\exp(-x^{2}/2\sigma^{2})}{\sigma\sqrt{2\pi}}.
\]
For the derivation of this remarkable theorem, I scanned the book in vain.
One had no Internet access in those days, but besides a well equipped gym
the Army post also had a tiny library, and in one yellowed volume in the
library—who knows how such a book got there?—I did find a derivation
of the 1/σ√2π factor.3 The exponential factor, the volume did not derive.
Several days later, I chanced to find myself in Munich with an hour or two
to spare, which I spent in the university library seeking the missing part
of the proof, but lack of time and unfamiliarity with such a German site
defeated me. Back at the Army post, I had to sweat the proof out on my
own over the ensuing weeks. Nevertheless, eventually I did obtain a proof
which made sense to me. Writing the proof down carefully, I pulled the old
high-school math notes out of my military footlocker (for some reason I had
kept the notes and even brought them to Germany), dusted them off, and
added to them the new Gaussian proof.
That is how it has gone. To the old notes, I have added new proofs
from time to time, and although somehow I have misplaced the original
high-school leaves I took to Germany with me, the notes have nevertheless
grown with the passing years. These years have brought me the good things
years can bring: marriage, family and career; a good life gratefully lived,
details of which interest me and mine but are mostly unremarkable as seen
from the outside. A life, however, can take strange turns, reprising earlier
themes. I had become an industrial building construction engineer for a
living (and, appropriately enough, had most lately added to the notes a
3. The citation is now unfortunately long lost.
4. Weisstein lists results encyclopedically, alphabetically by name. I organize results
more traditionally by topic, leaving alphabetization to the book’s index, that readers who
wish to do so can coherently read the book from front to back.
There is an ironic personal story in this. As children in the 1970s, my brother and
I had a 1959 World Book encyclopedia in our bedroom, about twenty volumes. It was
then a bit outdated (in fact the world had changed tremendously in the fifteen or twenty
years following 1959, so the encyclopedia was more than a bit outdated) but the two of
us still used it sometimes. Only years later did I learn that my father, who in 1959 was
fourteen years old, had bought the encyclopedia with money he had earned delivering
newspapers daily before dawn, and then had read the entire encyclopedia, front to back.
My father played linebacker on the football team and worked a job after school, too, so
where he found the time or the inclination to read an entire encyclopedia, I’ll never know.
Nonetheless, it does prove that even an encyclopedia can be read from front to back.
spot, why, whip this book out and turn to § 2.10. That should confound
’em.
THB
Bibliography
[2] George B. Arfken and Hans J. Weber. Mathematical Methods for Physi-
cists. Academic Press, Burlington, Mass., 6th edition, 2005.
[5] Christopher Beattie, John Rossi, and Robert C. Rogers. Notes on Ma-
trix Theory. Unpublished, Department of Mathematics, Virginia Poly-
technic Institute and State University, Blacksburg, Va., 6 Dec. 2001.
[16] The Debian Project. Debian Free Software Guidelines, version 1.1.
http://www.debian.org/social_contract#guidelines.
[22] The Free Software Foundation. GNU General Public License, version 2.
/usr/share/common-licenses/GPL-2 on a Debian system. The De-
bian Project: http://www.debian.org/. The Free Software Founda-
tion: 51 Franklin St., Fifth Floor, Boston, Mass. 02110-1301, USA.
[24] Edward Gibbon. The History of the Decline and Fall of the Roman
Empire. 1788.
[25] J.W. Gibbs. Fourier series. Nature, 59:606, 1899. Referenced indirectly
by way of [67, “Gibbs Phenomenon,” 06:12, 13 Dec. 2008], this letter
of Gibbs completes the idea of the same author’s paper on p. 200 of the
same volume.
[26] William Goldman. The Princess Bride. Ballantine, New York, 1973.
[36] Eric M. Jones and Paul Fjeld. Gimbal angles, gimbal lock, and
a fourth gimbal for Christmas. http://www.hq.nasa.gov/alsj/
gimbals.html. As retrieved 23 May 2008.
[48] Charles L. Phillips and John M. Parr. Signals, Systems and Transforms.
Prentice-Hall, Englewood Cliffs, N.J., 1995.
[49] W.J. Pierson, Jr., and L. Moskowitz. A proposed spectral form for fully
developed wind seas based on the similarity theory of S.A. Kitaigorod-
skii. J. Geophys. Res., 69:5181–90, 1964.
[55] Al Shenk. Calculus and Analytic Geometry. Scott, Foresman & Co.,
Glenview, Ill., 3rd edition, 1984.
[56] William L. Shirer. The Rise and Fall of the Third Reich. Simon &
Schuster, New York, 1960.
[58] Susan Stepney. Euclid’s proof that there are an infinite num-
ber of primes. http://www-users.cs.york.ac.uk/susan/cyc/p/
primeprf.htm. As retrieved 28 April 2006.
[59] James Stewart, Lothar Redlin, and Saleem Watson. Precalculus: Math-
ematics for Calculus. Brooks/Cole, Pacific Grove, Calif., 3rd edition,
1993.
[62] Eric de Sturler. Lecture, Virginia Polytechnic Institute and State Uni-
versity, Blacksburg, Va., 2007.
[63] Henk A. van der Vorst. Iterative Krylov Methods for Large Linear
Systems. Number 13 in Monographs on Applied and Computational
Mathematics. Cambridge University Press, Cambridge, 2003.
Index
altitude, 40
AMD, 295
amortization, 202
amplitude, 49, 401
analytic continuation, 163
analytic function, 46, 163
angle, 37, 47, 57, 407, 409, 457
  double, 57
  half, 57
  hour, 57
  interior, 37
  of a polygon, 38
  of a triangle, 37
  of rotation, 55
  right, 47
  square, 47
  sum of, 37
angular frequency, 483
antelope, 438
antenna
  parabolic, 429
  satellite dish, 429
antiderivative, 133, 197
  and the natural logarithm, 198
  guessing, 201
  of a product of exponentials, powers and logarithms, 221
antiquity, 223
applied mathematics, 1, 148, 158
  foundations of, 239
approximation to first order, 177
arc, 47
arccosine, 47
  derivative of, 111
arcsine, 47
  derivative of, 111
arctangent, 47
  derivative of, 111
area, 9, 38, 141, 458, 490
  enclosed by a contour, 446
  surface, 141
arg, 43
Argand domain and range planes, 164
Argand plane, 42
Argand, Jean-Robert (1768–1822), 42
arithmetic, 9
  exact, 295, 310, 320, 338, 374, 382
  of matrices, 243
arithmetic mean, 124
arithmetic series, 17
arm, 100
articles “a” and “the”, 337
artillerist, 334
assignment, 12
associativity, 9
  nonassociativity of the cross product, 409
  of matrix multiplication, 243
  of unary linear operators, 442
automobile, 410
average, 123
axes, 53
  changing, 53
  invariance under the reorientation of, 407, 408, 439, 454, 455
  reorientation of, 404
  rotation of, 53, 401
axiom, 2
axis, 64
  of a cylinder or circle, 412
axle, 414
azimuth, 334, 412, 413
barometric pressure, 444
baroquity, 233
baseball, 484
basis, 383
  complete, 384
  constant, 411, 463
  converting to and from, 384
  cylindrical, 412
  secondary cylindrical, 413
  spherical, 413
  variable, 411
basis, orthogonal, 410
battle, 334
battlefield, 334
belonging, 17
binomial theorem, 74
bisection, 429
wave
  complex, 111
  propagating, 111
  square, 479, 488
waveform
  approximation of, 479
  discontinuous, 502
  real, 494
  repeating, 479
wavy sea, 412
Wednesday, 439
week
  days of, 439
Weierstrass, Karl Wilhelm Theodor (1815–1897), 157
weighted sum, 286
west, 47, 334
Wilbraham, Henry (1825–1883), 502
wind, 410, 437
winding road, 410
wooden block, 72
worker, 339
world, physical, 401
yaw, 404
zero, 9, 40
  dividing by, 70
  matrix, 250
  padding a matrix with, 248