An Introduction To Proofs and The Mathematical Vernacular

Martin V. Day
Department of Mathematics
Virginia Tech
Blacksburg, Virginia 24061
http://www.math.vt.edu/people/day/ProofsBook

Dedicated to the memory of my mother:


Coralyn S. Day, November 6, 1922 – May 13, 2008.

December 7, 2016

This work is licensed under the Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United
States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/us/ or
send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
Contents

Preface: To the Student  iii

1 Some Specimen Proofs  1
  A Inequalities and Square Roots  1
    A.1 Absolute Value and the Triangle Inequality  1
    A.2 Square Roots  5
    A.3 Another Inequality  7
  B Some Spoofs  8
  C Proofs from Geometry  10
    C.1 Pythagoras  10
    C.2 The Curry Triangle  11
  D From Calculus  12
  E Irrational Numbers  15
  F Induction  17
    F.1 Simple Summation Formulas  17
    F.2 Properties of Factorial  20

2 Mathematical Language and Some Basic Proof Structures  24
  A Basic Logical Propositions  24
    A.1 Compounding Propositions: Not, And, Or  25
    A.2 Implications  27
    A.3 Negations of Or and And  29
  B Variables and Quantifiers  30
    B.1 The Scope of Variables  31
    B.2 Quantifiers  31
    B.3 Subtleties  33
    B.4 Negating Quantified Propositions  35
  C Some Basic Types of Proofs  37
    C.1 Elementary Propositions and “And”  37
    C.2 “Or” Propositions  38
    C.3 Implications and “For all ...” Propositions  39
    C.4 Equivalence  39
    C.5 Existence and Uniqueness  41
    C.6 Contradiction  42
    C.7 Induction  43
  D Some Advice for Writing Proofs  49
  E Perspective: Proofs and Discovery  51

3 Sets and Functions  54
  A Notation and Basic Concepts  54
  B Basic Operations and Properties  55
  C Product Sets  59
  D The Power Set of a Set  61
  E Relations  61
  F Functions  63
  G Cardinality of Sets  68
    G.1 Finite Sets  68
    G.2 Countable and Uncountable Sets  69
    G.3 The Schroeder-Bernstein Theorem  70
  H Perspective: The Strange World at the Foundations of Mathematics  71
    H.1 The Continuum Hypothesis  71
    H.2 Russell’s Paradox  71

4 The Integers  73
  A Properties of the Integers  73
    A.1 Algebraic Properties  73
    A.2 Properties of Order  74
    A.3 Comparison with Other Number Systems  75
    A.4 Further Properties of the Integers  76
    A.5 A Closer Look at the Well-Ordering Principle  78
  B Greatest Common Divisors  81
    B.1 The Euclidean Algorithm  83
  C Primes and the Fundamental Theorem  86
  D The Integers Mod m  87
  E Axioms and Beyond: Gödel Crashes the Party  90

5 Polynomials  93
  A Preliminaries  93
  B Q[x] and the Rational Root Theorem  98
  C R[x] and Descartes’ Rule of Signs  99
  D C[z] and The Fundamental Theorem of Algebra  103
    D.1 Some Properties of the Complex Numbers  103
    D.2 The Fundamental Theorem  105

6 Determinants and Linear Algebra in Rⁿ  110
  A Permutations: σ ∈ Sₙ  111
  B The Sign of a Permutation: sgn(σ)  115
  C Definition and Basic Properties  117
  D Cofactors and Cramer’s Rule  123
  E Linear Independence and Bases  126
  F The Cayley-Hamilton Theorem  129

Appendix: Mathematical Words  136

Appendix: The Greek Alphabet and Other Notation  138

Bibliography  139

Index  141

Preface: To the Student

In the standard year (or two) of university calculus and differential equations courses you have learned a lot
of mathematical techniques for solving various types of problems. Along the way you were offered “proofs”
of many of the fundamental relationships and formulas (stated as “theorems”). Perhaps occasionally you
were asked to “show” or “prove” something yourself as a homework problem. For the most part, however,
you probably viewed the proofs as something to be endured in the lectures and skimmed over in the book.
The main emphasis of those courses was on learning how to use the techniques of calculus, and the proofs
may not have seemed very helpful for that.
Historically, techniques of calculation were the principal concern of mathematics. But as those techniques
became more complex, the concepts behind them became increasingly important. You are now at the stage
of your mathematical education where the focus of your studies shifts from techniques to ideas. The goal of
this book is to help you make the transition from being a mere user of mathematics to becoming conversant
in the language of mathematical discussion. This means learning to critically read and evaluate mathematical
statements and being able to write mathematical explanations in clear, logically precise language.
We will focus especially on mathematical proofs, which are nothing but carefully prepared expressions of
mathematical reasoning.
By focusing on how proofs work and how they are expressed we will be learning to think about mathematics
as mathematicians do. This means learning the language and notation (symbols) which we use to
express our reasoning precisely. But writing a proof is always preceded by finding the logical argument that
the proof expresses, and that may involve some exploration and experimentation, trying various ideas, and
being creative. We will do some practicing with mathematics that is familiar to you, but it is important to
practice with material that you don’t already know as well, so that you can really have a try at the creative
exploration part of writing a proof. For that purpose I have tried to include some topics that you haven’t
seen before (and may not see elsewhere in the usual undergraduate curriculum). On the other hand I don’t
want this course to be divorced from what you have learned in your previous courses, so I also have tried
to include some problems that ask you to use your calculus background. Of course the book includes many
proofs which are meant to serve as examples as you learn to write your own proofs. But there are also some
which are more involved than anything you will be asked to do. Don’t tune these out — learning to read a
more complicated argument written by someone else is also a goal of this course.
Some consider mathematics to be a dry, cold (even painful) subject. It certainly is very difficult in places.
But it can also be exciting when we see ideas come together in unexpected ways, and see the creative ways
that great minds have exploited unseen connections between topics. To that end I have included a few
examples of really clever proofs of famous theorems. It is somewhat remarkable that a subject with such
high and objective standards of logical correctness should at the same time provide such opportunity for the
expression of playful genius. This is what attracts many people to the study of mathematics, particularly
those of us who have made it our life’s work. I hope this book communicates at least a little of that excitement
to you.

Observe that theorems, lemmas, and propositions are numbered consecutively within chapters. For instance,
in Chapter 4 we find Theorem 4.6 followed by Lemma 4.9. Many theorems also have names (e.g. “The
Division Theorem”). You can locate them using the index. Definitions are not numbered, but are listed in
the index by topic. Problems are numbered within chapters similarly; for instance you will find Problem 2.8
in Chapter 2. The end of each problem statement is marked by a dotted line followed by a cryptic word in
a box, like this:

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . whatever

The word in the box is of no significance to you; it is just a reminder for me of the computer file which
contains the text of the problem.
References are marked by a number in brackets like this: [9]. You can find the full reference for [9] in the
Bibliography at the back of the book. (This is the customary format for citing references in mathematics.)
Sometimes a page number or other information is included after the number to help you find the relevant
place in the reference, like [9, page 99].
It is unlikely that every problem in the book will be assigned. If a problem you are assigned refers to
another which was not assigned, you are not expected to work the unassigned problem, but can just take its
statement for granted and use it in your solution of the assigned problem. For instance, if you were assigned
Problem 4.16 (which asks you to use Problem 4.15), you are free to use the fact stated in Problem 4.15
whether or not it was assigned.
I want to thank the students of Math 3034, Spring 2008, who had to deal with the rather rough first
version of this book. A special thanks to those who took the time to give me constructive criticism and
suggestions: Aron Chun, Chelsey Cooper, Justin Cruz, Anna Darby, Victoria Dean, Mark McKinley, Drew
Nagy, David Richardson, Ryan Ritch, Diana Schoof, Samantha Simcik, Jennifer Soldan, John Stevens, Dean
Stevenson, and Kendra Valencia. Thanks also to Dawson Allen, Michele Block, Todd Gasparello, Micah
Hafich, Michael Overson, Kelsey Pope, Kehong Shi, Melissa Tilashalski, and Michael Wolf from Fall 2008 for
their comments and suggestions. Thanks also to William Gunther for correcting some misstatements about
the continuum hypothesis.
Martin Day, Blacksburg, VA

Chapter 1

Some Specimen Proofs

This chapter begins our study of proofs by looking at numerous examples. In the next chapter we will try
to summarize the logic which underlies typical proofs and the special ways the English language is used
in precise mathematical discussion. This is the way most people learn a new language — learn to say a
few simple things first and after that start to learn the rules of grammar. But not everyone is comfortable
with this jumping-right-in approach. So if you want more up-front explanation, feel free to skip ahead to
Chapter 2 and read it now. In particular you might look at the chart on page 37 which catalogues some
basic types of proofs, and the advice for writing proofs on page 49. Consulting those as we work through
this chapter may be helpful.
Along with the proof specimens in this chapter we include a couple of spoofs, by which we mean arguments
that seem like proofs on their surface, but which in fact come to false conclusions. The point of these is
that the style or language of an argument does not make it a proof; what matters is that its logic stands up
to close scrutiny. Learning to look past the language and critically examine the content of an argument is
important for both reading and writing proofs.
The mathematical topics in this chapter don’t fit together in any particular way, so don’t look for some
mathematical theme which connects them. Instead, you should view this chapter as a sampler of different
types of proofs. In fact, much of the material of this chapter will be familiar to you. As we discuss each
topic, ask yourself not whether you already know it, but whether you know why it is true. Could you write
a convincing justification of it to someone who is skeptical?

A Inequalities and Square Roots


A.1 Absolute Value and the Triangle Inequality
Our first examples are inequalities involving the absolute value. Before anything else, we need to be sure
everyone understands what the absolute value refers to. Here is its definition.
Definition. For a real number x, the absolute value of x is defined to be

    |x| =  x   if x ≥ 0,
    |x| = −x   if x < 0.

This definition is doing two things. It is stating precisely what we mean by the words “the absolute
value of x” and it is announcing the notation “|x|” which is used to refer to it. (Don’t let the minus sign in
the second line of the definition confuse you. In the case of x < 0 observe that |x| = −x is a positive value. For
instance, if x = −2 then −x = +2.)
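
The two lines of the definition correspond to the two branches of a short function. Here is a minimal sketch in Python; the name abs_value is our own choice for illustration, and nothing in it goes beyond the definition above.

```python
def abs_value(x):
    """Absolute value of a real number, following the two-case definition."""
    if x >= 0:
        return x       # first line of the definition: |x| = x when x >= 0
    return -x          # second line: |x| = -x when x < 0, a positive value


# The minus sign in the second case produces a positive result.
print(abs_value(-2))   # 2
print(abs_value(3.5))  # 3.5
print(abs_value(0))    # 0
```

Notice that every real number falls into exactly one of the two branches, which is the same observation that makes a proof by cases possible.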
Here is an elementary property of the absolute value.
Lemma 1.1. For every real number x, x ≤ |x|.

We have called this a “lemma.” The words “lemma,” “theorem,” “proposition,” and “corollary” all refer
to a statement or announcement of a mathematical fact. You have seen theorems before. Later on¹ we will
talk about how “lemma” is different from “theorem.” For now, just think of it as a sort of mini-theorem.
This particular lemma is saying that no matter what value of x you pick, if you compare the values of x and
|x| you will always find that |x| is at least as big as x. For instance, considering x = −3.227 the lemma is
telling us that −3.227 ≤ |−3.227|. Or considering x = 1/√2, the lemma is telling us that 1/√2 ≤ |1/√2|. You
probably have no doubt that this lemma is true. The question for us is how we can write a proof of it. We
can’t check all possible values of x one at a time. We need to give reasons that apply simultaneously to all
x. Since the definition of |x| falls into two cases, it is natural to write a proof that provides reasons in two
cases; one line of reasoning that works if x ≥ 0 and a second line of reasoning that works if x < 0. Let’s talk
through these two lines of reasoning.
The first case we want to consider accounts for those x for which x ≥ 0. According to the definition
above, for these x we know |x| = x. So what we are trying to prove is just that x ≤ x, which is certainly true.
The second case considers those x for which x < 0. In this case the definition says that |x| = −x. So
what we are trying to prove is that x < −x. What reasons can we give for this? For this case we know that
x < 0, and that implies (multiplying both sides by −1) that 0 < −x. So we can string the two inequalities

x < 0 and 0 < −x

together to conclude that x < −x, just as we wanted.


Now we will write out the above reasoning as a proof of Lemma 1.1 in a typical style. Observe that the
beginning of our proof is announced with the italicized word “Proof” and the end is marked with a small
box², so you know exactly where the proof starts and stops.
Proof. We give the proof in two cases.
Case 1: Suppose x ≥ 0. In this case, the definition of absolute value says that |x| = x. So the inequality
x ≤ |x| is equivalent to x ≤ x, which is true.

Case 2: Suppose x < 0. In this case, the definition of absolute value says that |x| = −x. Since x < 0, it
follows that 0 < −x, and therefore
x ≤ −x = |x|,
proving the lemma in this case as well.
Since the two cases exhaust all possibilities, this completes the proof. □

Here is another property of the absolute value which everyone knows as the Triangle Inequality, stated
here as a theorem.

Theorem 1.2 (Triangle Inequality). For all real numbers a and b,

|a + b| ≤ |a| + |b|.

We can test what the theorem is saying by just picking some values for a and b.
Example 1.1.
• For a = 2, b = 5: |a + b| = 7, |a| + |b| = 7, and 7 ≤ 7 is true.
• For a = −6, b = 17/3: |a + b| = |−1/3| = 1/3, |a| + |b| = 35/3, and 1/3 ≤ 35/3 is true.
• For a = −π, b = 1.4: |a + b| = 1.74159..., |a| + |b| = 4.54159..., and 1.74159... ≤ 4.54159... is
true.
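
Spot checks like these are easy to automate. The sketch below, which assumes only Python's standard fractions and math modules, recomputes both sides of the inequality for the three pairs of Example 1.1. The value of π is a floating point approximation, so the last line of output is only approximate. Passing these checks is not a proof; it only confirms the arithmetic for these particular choices.

```python
from fractions import Fraction
import math

# The three (a, b) pairs from Example 1.1.
pairs = [
    (2, 5),
    (-6, Fraction(17, 3)),   # exact rational arithmetic for 17/3
    (-math.pi, 1.4),         # floating point, so only approximate
]

for a, b in pairs:
    lhs = abs(a + b)         # |a + b|
    rhs = abs(a) + abs(b)    # |a| + |b|
    print(f"|a+b| = {lhs},  |a|+|b| = {rhs},  holds: {lhs <= rhs}")
```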
¹ See the footnote on page 25 and the description in Appendix: Mathematical Words.
² Called a halmos, after Paul Halmos who initiated that convention.

In each of these examples the inequality works out to be true, as the theorem promised it would. But
we have only checked three of the infinitely many possible choices of a and b. We want to write a proof that
covers all possible choices for a and b. There are several approaches we might follow in developing a proof.
One would be to consider all the possible ways the three absolute values in the triangle inequality could work
out. For instance if a ≥ 0, b < 0 and a + b < 0, then we could make the replacements

|a + b| = −(a + b), |a| = a, |b| = −b.

Then we would need to show that −(a + b) ≤ a − b, using the assumptions that a ≥ 0, b < 0, and a + b < 0.
We can develop a valid proof this way, but it would have many cases.
1: a ≥ 0, b ≥ 0.
2: a ≥ 0, b < 0, and a + b ≥ 0.
3: a ≥ 0, b < 0, and a + b < 0.
4: a < 0, b ≥ 0, and a + b ≥ 0.
5: a < 0, b ≥ 0, and a + b < 0.
6: a < 0, b < 0.
This would lead to a tedious proof. There is nothing wrong with that, but a more efficient argument will be
more pleasant to both read and write. We will produce a shorter proof using just two cases: Case 1 will be
for a + b ≥ 0 and Case 2 will account for a + b < 0.
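
To see that a list of cases is exhaustive and has no overlaps, it can help to write the case analysis as a function that assigns each pair (a, b) to exactly one case. The following Python sketch is for illustration only; the function names and the sample grid are our own assumptions, not part of the text.

```python
from fractions import Fraction
from itertools import product


def six_case_label(a, b):
    """Return which of the six cases (1-6) the pair (a, b) falls into."""
    if a >= 0 and b >= 0:
        return 1
    if a >= 0 and b < 0:
        return 2 if a + b >= 0 else 3
    if a < 0 and b >= 0:
        return 4 if a + b >= 0 else 5
    return 6  # the only remaining possibility: a < 0 and b < 0


def two_case_label(a, b):
    """Return the case (1 or 2) used by the shorter proof."""
    return 1 if a + b >= 0 else 2


# Hypothetical sample grid: the halves from -3 to 3.
values = [Fraction(k, 2) for k in range(-6, 7)]
seen = {six_case_label(a, b) for a, b in product(values, repeat=2)}
print(sorted(seen))  # [1, 2, 3, 4, 5, 6]
```

Because each function returns a single label for every input, the cases cannot overlap or leave anything out. The two-case version needs only one comparison, which is why the proof built on it is shorter.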
As we think through how to write this (or any) proof, it is important to think about what our
reader will need in order to be convinced. Picture a reader who is another student in the class and
has followed everything so far, but is skeptical about the triangle inequality. They know the definition of
absolute value and have read our proof of Lemma 1.1 and so will accept its use as part of the proof. They
have seen our examples above but are thinking, “Well, how do I know there aren’t some other clever choices
of a and b for which |a + b| > |a| + |b|?” We expect the reader to look for gaps or flaws in our proof but
can be confident they will agree with our reasoning when there is no logical alternative³. Our proof needs
to be written to convince this reader.
With this reader in mind, let’s think about Case 1. In that case |a + b| = a + b, so we need to provide
reasons to justify a + b ≤ |a| + |b|. But we can deduce that by adding the two inequalities

a ≤ |a|, b ≤ |b|,

both of which our reader will accept once we point out that these are just restatements of Lemma 1.1. What
about Case 2, in which |a + b| = −(a + b)? Since −(a + b) = (−a) + (−b), we can do something similar to
what we did above: add the two inequalities

−a ≤ |a|, −b ≤ |b|.

Now, what should we write to convince our skeptical reader of these? Using the fact that |−a| = |a| we
can point out to our reader that by Lemma 1.1 we have −a ≤ |−a| and therefore −a ≤ |−a| = |a|. Likewise for
−b ≤ |b|. We should ask ourselves whether our reader knows that |−a| = |a|. Well, they probably do, but
we can’t be absolutely sure. There are always judgments like this to make about how much explanation to
include. (Ask yourself how much you would want written out if you were the skeptical reader.) We could
give them a proof that |−x| = |x| for all real numbers x. Or, as a compromise, we can just put in a reminder
of this fact; that’s what we will do. Here then is our proof, written with these considerations in mind.
Proof. Consider any pair of real numbers a and b. We present the proof in two cases.
³ We must assume that our reader is rational and honest. If they are adamant in doubting the triangle inequality, even when
faced with irrefutable logic, there is nothing we can do to convince them.

Case 1: Suppose a + b ≥ 0. In this case |a + b| = a + b. Note that Lemma 1.1 implies that a ≤ |a| and
b ≤ |b|. Adding these two inequalities yields a + b ≤ |a| + |b|. Therefore we have

|a + b| = a + b ≤ |a| + |b|,

which proves the triangle inequality in this case.

Case 2: Suppose a + b < 0. In this case |a + b| = −a − b. Applying Lemma 1.1 with x = −a we know that
−a ≤ |−a|. But because |−a| = |a|, this implies that −a ≤ |a|. By the same reasoning, −b ≤ |b|.
Adding these two inequalities we deduce that −a − b ≤ |a| + |b| and therefore

|a + b| = −a − b ≤ |a| + |b|.

This proves the triangle inequality in the second case.

Since these cases exhaust all possibilities, this completes the proof. □

The proofs above are simple examples of proofs by cases, described in the “For all x . . . ” row
of the table on page 37. We gave different arguments that worked under different circumstances (cases)
and every possibility fell under one of the cases. Our written proofs say the same things as the discussions
preceding them, but are more concise and easier to read.

Problem 1.1 Write out proofs of the following properties of the absolute value, using whatever cases seem
most natural to you.
a) |x| = |−x| for all real numbers x.
b) 0 ≤ |x| for all real numbers x.
c) |x|² = x² for all real numbers x.
d) |xy| = |x| |y| for all real numbers x and y.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . absv

Problem 1.2 At the top of page 3 a proof of the triangle inequality in six cases was contemplated. Write
out the proofs for cases 4, 5, and 6.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . trialt

Problem 1.3 Prove that for any two real numbers a and b,

||a| − |b|| ≤ |a − b|.

(There are four absolute values here; if you split each of them into two cases you would have 16 cases to
consider! You can do it with just two cases: whether |a| − |b| is positive or not. Then take advantage of
properties like the triangle inequality which we have already proven.)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . absv3

Problem 1.4 For two real numbers a, b, the notation max(a, b) refers to the maximum of a and b, i.e. the
larger of the two:

    max(a, b) = a   if a ≥ b,
    max(a, b) = b   if a < b.

The smaller of the two is denoted min(a, b).

Prove that for any two real numbers a and b, max(a, b) = (1/2)(a + b + |a − b|). Find a similar formula for
the minimum. (You don’t need to prove the minimum formula. Just find it.)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . max
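
Before attempting a proof it can be reassuring to test a formula on some values. The Python sketch below is a numerical sanity check of the maximum formula, not a proof; the helper name max_by_formula and the sample values are hypothetical choices of ours. Exact fractions are used so that no rounding can hide a discrepancy.

```python
from fractions import Fraction
from itertools import product


def max_by_formula(a, b):
    """Maximum of a and b computed as (a + b + |a - b|) / 2."""
    return (a + b + abs(a - b)) / 2


# Hypothetical test values: the thirds from -3 to 3, in exact arithmetic.
values = [Fraction(n, 3) for n in range(-9, 10)]
agree = all(max_by_formula(a, b) == max(a, b)
            for a, b in product(values, repeat=2))
print(agree)  # True
```

A check like this covers only finitely many pairs. The proof still has to explain why the formula works for every a and b, and the two cases a ≥ b and a < b from the definition are a natural place to start.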

Problem 1.5 Define the sign or signum function of a real number x to be

    sgn(x) = +1   if x > 0,
    sgn(x) =  0   if x = 0,
    sgn(x) = −1   if x < 0.

Prove the following facts, for all real numbers a and b.


a) sgn(ab) = sgn(a) sgn(b).
b) sgn(a) = sgn(1/a), provided a ≠ 0.
c) sgn(−a) = −sgn(a).
d) sgn(a) = a/|a|, provided a ≠ 0.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . sgn
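
The three-line definition of sgn translates directly into code, and the four properties can then be tested on sample values. The following Python sketch is only an illustration under our own assumptions (the sample values are hypothetical); it does not replace the proofs the problem asks for.

```python
from fractions import Fraction
from itertools import product


def sgn(x):
    """Sign function: +1 for positive x, 0 for zero, -1 for negative x."""
    if x > 0:
        return 1
    if x == 0:
        return 0
    return -1


values = [Fraction(n, 4) for n in range(-8, 9)]   # hypothetical sample values
nonzero = [v for v in values if v != 0]           # parts b) and d) require a != 0

checks = {
    "a": all(sgn(a * b) == sgn(a) * sgn(b) for a, b in product(values, repeat=2)),
    "b": all(sgn(a) == sgn(1 / a) for a in nonzero),
    "c": all(sgn(-a) == -sgn(a) for a in values),
    "d": all(sgn(a) == a / abs(a) for a in nonzero),
}
print(checks)
```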

A.2 Square Roots



Next we consider some basic properties of the square root function, √s. Although these facts are well-known
to you, we want to make ourselves stop long enough to ask why they are true.
First, what exactly do we mean by the square root √s of a real number s? For instance, is √4 = 2 or is
√4 = −2? You might be inclined to say “both,” but that is incorrect. The notation √s refers only to the
nonnegative⁴ number r which solves r² = s. Thus √4 = 2, not −2.

Definition. Suppose s is a nonnegative real number. We say r is the square root of s, and write r = √s,
when r is a nonnegative real number for which r² = s.
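
The difference between solving the equation and being the square root can be seen in a few lines of Python. This is a small sketch for the single value s = 4, with variable names of our own choosing: two numbers solve the equation, but only one of them satisfies the nonnegativity requirement of the definition.

```python
import math

s = 4
candidates = [r for r in (-2, 2) if r * r == s]    # both numbers solve r*r == s
square_root = [r for r in candidates if r >= 0]     # the definition keeps only r >= 0

print(candidates)      # [-2, 2]
print(square_root)     # [2]
print(math.sqrt(s))    # 2.0, the nonnegative solution
```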

We all know that for s ≥ 0, √s does exist and there is only one value of r that qualifies to be called
r = √s, but for s < 0 there is no (real⁵) value that qualifies to be called √s. For nonnegative s these facts
are stated in the following proposition.

Proposition 1.3. If s is a nonnegative real number, then √s exists and is unique.

You will prove the existence as Problem 1.6; we will just write a proof of the uniqueness assertion here.
When we say a mathematical object is unique we mean that there is only one of them (or none at all). This
proof is typical of uniqueness proofs. We assume there are two values (r and r̃ here) which both satisfy the
requirements of the definition, and then based on that show that the two values must be the same.

Proof. To prove uniqueness, suppose both r and r̃ satisfy the definition of √s. In other words, r ≥ 0, r̃ ≥ 0,
and r² = s = r̃². We need to show that r = r̃. Since r² = r̃² we know that

0 = r² − r̃² = (r − r̃)(r + r̃).

Thus either r − r̃ = 0 or r + r̃ = 0. If r − r̃ = 0 then r = r̃. If r + r̃ = 0 then r = −r̃. Since r̃ ≥ 0 this means
that r ≤ 0. But we also know that r ≥ 0, so we conclude that r = 0, and that r̃ = −r = 0 as well. Thus in
either case r = r̃. □
⁴ “nonnegative” means ≥ 0; “positive” means > 0.
⁵ We are considering only real square roots here. If we allow complex numbers for r then r² = s can always be solved for r
(even if s is complex). But it may be that neither of the solutions is nonnegative, so that the definition of “square root” above is no
longer appropriate. Complex square roots are important in many subjects, differential equations for instance.

This is a typical proof of uniqueness; see the last row of the table on page 37. Observe that embedded
in it is another argument by cases. The proof does not start by saying anything about cases. Only when we
get to (r − r̃)(r + r̃) = 0 does the argument split into the two cases: r − r̃ = 0 or r + r̃ = 0. Since those are
the only ways to have r² − r̃² = 0, the cases are exhaustive.

Problem 1.6 In this problem you will use some of what you know about calculus to prove the existence
of square roots.
a) Look up the statement of the Intermediate Value Theorem in a calculus book. Write it out carefully
and turn it in. (Use an actual book, not just a web page. Include the author, title, edition number,
publisher and date of publication for the specific book you used.) The statement of a theorem can be
divided into hypotheses and conclusions. The hypotheses are the part of the theorem that must be true
in order for the theorem to be applicable. (For example, in Proposition 1.3 the hypotheses are that s
is a real number and that s is nonnegative.) The conclusions are things that the theorem guarantees
to be true when it is applicable. (In Proposition 1.3 the conclusions are that √s exists and that it
is unique.) What are the hypotheses of the Intermediate Value Theorem? What are its conclusions?
Does the book you consulted provide a proof of the Intermediate Value Theorem? (You don’t need
to copy the proof, just say whether or not it included one.) If it doesn’t provide a proof, does it say
anything about how or where you can find a proof?
b) Suppose that s > 0 and consider the function f(x) = x² on the interval [a, b] where a = 0 and b = s + 1/2.
By applying the Intermediate Value Theorem, prove that √s does exist. (To “apply” the theorem,
you need to make a particular choice of function f(x), interval [a, b] and any other quantities involved
in the theorem. We have already said what we want to use for f(x) and [a, b]; you need to specify
choices for any other quantities in the theorem’s statement. Next explain why your choices satisfy all
the hypotheses of the theorem. Then explain why the conclusion of the Intermediate Value Theorem
means that an r exists which satisfies the definition of r = √s. You are not writing a proof of
the Intermediate Value Theorem. You are taking the validity of the Intermediate Value Theorem for
granted, and explaining how it, considered for the specific f(x) that you selected, tells us that an
r exists which satisfies the definition of √s.)

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . IVTsqrt
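
The function and interval in part b) also have a computational counterpart: repeatedly halving the interval while keeping the target value s trapped between the values of f at the endpoints. The Python sketch below is our own illustration and rests on floating point arithmetic, so it only approximates √s; the name sqrt_by_bisection and the default step count are hypothetical choices. It is not the proof the problem asks for, since it takes for granted that a solution exists to be approximated.

```python
def sqrt_by_bisection(s, steps=100):
    """Approximate the r >= 0 with r*r == s by bisection on [0, s + 1/2]."""
    if s < 0:
        raise ValueError("s must be nonnegative")
    a, b = 0.0, s + 0.5          # the interval [a, b] from part b)
    for _ in range(steps):
        mid = (a + b) / 2
        if mid * mid <= s:
            a = mid              # keeps f(a) <= s at the left endpoint
        else:
            b = mid              # keeps f(b) > s at the right endpoint
    return a


print(sqrt_by_bisection(2.0))
```

A fixed number of halvings is used rather than a tolerance test so that the loop always terminates, whatever the size of s.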

In general it is not true that the inequality a² ≤ b² implies a ≤ b. You can easily think of an example
with b negative for which this doesn’t work. So under what circumstances can we legitimately say that
a ≤ b is a valid deduction from a² ≤ b²? If we rule out negative values by insisting that both a and b be
nonnegative, that will do it. We state this property as a lemma. Note that because square roots are by
definition nonnegative, the lemma does not need to say something like “assume a and b are nonnegative”;
that is implicit in a = √x and b = √y.
Lemma 1.4 (Square Root Lemma). If x and y are nonnegative real numbers with $x \le y$, then
$$\sqrt{x} \le \sqrt{y}. \tag{1.1}$$
The reasoning for our proof will go like this. Let $a = \sqrt{x}$ and $b = \sqrt{y}$. We know these exist, are
nonnegative, and that $a^2 = x \le y = b^2$. Working from these facts we want to give reasons leading to the
conclusion that a ≤ b. Now for any pair of real numbers either a ≤ b or b < a must be true. What our proof
will do is show that b < a is impossible under our hypotheses, leaving a ≤ b as the only possibility. In other
words, instead of a sequence of logical steps leading to a ≤ b, we provide reasoning which eliminates the only
alternative.
Proof. Suppose $0 \le x \le y$. Let $a = \sqrt{x}$ and $b = \sqrt{y}$. We know both a and b are nonnegative, and by the
hypothesis that $x \le y$,
$$a^2 \le b^2.$$
Suppose it were true that b < a. Since b ≥ 0 it is valid to multiply both sides of b < a by b to deduce that
$$b^2 \le ab.$$

We can also multiply both sides of b < a by a. Since a > b ≥ 0 we know that a > 0, and so we deduce that
$$ab < a^2.$$
Putting these inequalities together it follows that
$$b^2 < a^2.$$
But that is impossible since it is contrary to our hypothesis that $a^2 \le b^2$. Thus b < a cannot be true under
our hypotheses. We conclude that a ≤ b.
This may seem like a strange sort of proof. We assumed the opposite of what we wanted to prove (i.e. that
b < a), showed that it led to a logically impossible situation, and called that a proof! As strange as it
might seem, this is a perfectly logical and valid proof technique. This is called proof by contradiction;
see the second row of the table on page 37. We will see some more examples in Section E.

Problem 1.7 Prove that $|x| = \sqrt{x^2}$, where x is any real number. (You are free to use any of the properties
from Problem 1.1.)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . altfor

Problem 1.8
a) Using Problem 1.6 as a guide, prove that for any real number x there exists a real number y with
$y^3 = x$. (Observe that there is no assumption that x or y are positive.)
b) Show that y is unique, in other words that if $y^3 = z^3 = x$ then y = z. (Although there are a number
of ways to do this, one of the fastest is to use calculus. Can you think of a way to write $z^3 - y^3$ using
calculus, and use it to show that $y^3 = z^3$ implies that y = z?)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . CubeRt


Problem 1.9 Prove (by contradiction) that if $1 \le x$ then $\sqrt{x} \le x$.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ri

A.3 Another Inequality


Here is another inequality, whose proof will be instructive for us.
Proposition 1.5. For any real numbers x and y, the following inequality holds.
$$\sqrt{x^2 + y^2} \le |x| + |y|.$$
Any proof must exhibit a connection between things we already know and the thing we are trying to
prove. Sometimes we can discover such a connection by starting with what we want to prove and then
manipulating it into something we know. Let’s try that with the inequality of the proposition.

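$$\sqrt{x^2 + y^2} \overset{?}{\le} |x| + |y| \tag{1.2}$$
$$x^2 + y^2 \overset{?}{\le} (|x| + |y|)^2 \tag{1.3}$$
$$x^2 + y^2 \overset{?}{\le} |x|^2 + 2|x|\,|y| + |y|^2$$
$$x^2 + y^2 \overset{?}{\le} x^2 + 2|x|\,|y| + y^2$$
$$0 \overset{\text{Yes!}}{\le} 2|x|\,|y|.$$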

The question marks above the inequalities are to remind us that at each stage we are really asking ourselves,
“Is this true?” The last line is certainly true, so we seem to have succeeded! So is this a proof? No —
the logic is backwards. As presented above, we started from what we don’t know (but want to prove) and
deduced from it something we do know. To make this into a proof we need to reverse the logical direction,
so that we start from what we do know (the last line) and argue that each line follows from the one below
it. In other words we need to check that our reasoning still works when the order of steps is reversed. For
most of the steps that just involves simple algebraic manipulations. But there is one place where we
should be careful: in deducing (1.2) from (1.3) we need to appeal to the Square Root Lemma. Here then is
what the proof looks like when written out in the correct sequence.
Proof. Suppose x and y are real numbers. We know from Problem 1.1 that 0 ≤ |x| and 0 ≤ |y|, so that

0 ≤ 2|x| |y|.

Adding $x^2 + y^2$ to both sides, and using the fact that $x^2 + y^2 = |x|^2 + |y|^2$ we find that

$$x^2 + y^2 \le x^2 + 2|x|\,|y| + y^2 = |x|^2 + 2|x|\,|y| + |y|^2 = (|x| + |y|)^2.$$

By the Square Root Lemma, it follows from this that

$$\sqrt{x^2 + y^2} \le \sqrt{(|x| + |y|)^2}.$$

Since $0 \le |x| + |y|$ we know that $\sqrt{(|x| + |y|)^2} = |x| + |y|$, and so we conclude that

$$\sqrt{x^2 + y^2} \le |x| + |y|.$$

The lesson of this proof is that we need to be sure our reasoning starts with our assumptions and leads
to the desired conclusion. When we discover a proof by working backwards from the conclusion,
we will need to be sure the logic still works when presented in the other order. (See the spoof
of the next section for an example of an argument that works in one direction but not the other.) This is an
example of a direct proof; see the top row of the table on page 37.

Problem 1.10 Suppose that a and b are both positive real numbers. Prove the following inequalities:
$$\frac{2ab}{a+b} \le \sqrt{ab} \le \frac{a+b}{2} \le \sqrt{(a^2 + b^2)/2}.$$
The first of these quantities is called the harmonic mean of a and b. The second is the geometric mean. The
third is the arithmetic mean. The last is the root mean square. (You should prove this as three separate
inequalities, $\frac{2ab}{a+b} \le \sqrt{ab}$, $\sqrt{ab} \le \frac{a+b}{2}$, and $\frac{a+b}{2} \le \sqrt{(a^2 + b^2)/2}$. Write a separate proof for each of them.
Follow the way we found a proof for Proposition 1.5 above: work backwards to discover a connection and
then reverse the order and carefully justify the steps to form a proof.)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . means
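Before attempting the proofs in Problem 1.10 it can be reassuring to spot-check the chain of inequalities numerically. The following is a minimal sketch, not part of the original text; the function names `means` and `chain_holds` and the tolerance `tol` are illustrative choices. A check of this kind can only reveal a counterexample; it proves nothing.

```python
import math
import random


def means(a, b):
    """Return the harmonic, geometric, arithmetic and root-mean-square means of a, b > 0."""
    harmonic = 2 * a * b / (a + b)
    geometric = math.sqrt(a * b)
    arithmetic = (a + b) / 2
    rms = math.sqrt((a * a + b * b) / 2)
    return harmonic, geometric, arithmetic, rms


def chain_holds(a, b, tol=1e-12):
    """Check harmonic <= geometric <= arithmetic <= rms, allowing for rounding error."""
    h, g, m, r = means(a, b)
    return h <= g * (1 + tol) and g <= m * (1 + tol) and m <= r * (1 + tol)


# Try the chain on many random pairs of positive numbers (hypothetical test range).
random.seed(0)
samples = [(random.uniform(0.01, 100.0), random.uniform(0.01, 100.0)) for _ in range(1000)]
all_ok = all(chain_holds(a, b) for a, b in samples)
print(all_ok)
```

The small tolerance is there only because floating point arithmetic is inexact; when a = b all four means coincide and rounding could otherwise produce a spurious failure.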

B Some Spoofs
To read a proof you need to think about the steps, examining each critically to be sure it is logically sound,
and convince yourself that the overall argument leaves nothing out. To help you practice that critical reading
skill, we offer a couple of “spoofs,” arguments which seem valid on careless reading, but which come to a false
conclusion (and so must have at least one logically flawed step).
Here is an old chestnut, probably the best known spoof, which “proves” that 1 = 2.

Spoof! Let a = b. It follows that
$$ab = a^2,$$
$$a^2 + ab = a^2 + a^2,$$
$$a^2 + ab = 2a^2,$$
$$a^2 + ab - 2ab = 2a^2 - 2ab, \text{ and}$$
$$a^2 - ab = 2a^2 - 2ab.$$
This can be written as
$$1(a^2 - ab) = 2(a^2 - ab),$$
and canceling $a^2 - ab$ from both sides gives 1 = 2. Ha Ha!

See if you can spot the flaw in the reasoning before we talk about it in class. This shows that it is
important to examine each step of a sequence of deductions carefully. All it takes is one false step
to reach an erroneous conclusion.
This spoof also reinforces the point of page 8: the order of the logic in an argument matters. Here is the
argument of the spoof but in the opposite order. Let $a = b \ne 0$. Then we check the truth of 1 = 2 by the
following argument.
$$1 = 2$$
$$1(a^2 - ab) = 2(a^2 - ab)$$
$$a^2 - ab = 2a^2 - 2ab$$
$$a^2 + ab - 2ab = 2a^2 - 2ab$$
$$a^2 + ab = 2a^2,$$
$$ab = a^2,$$
$$a = b.$$
This time every line really does follow from the line before it! (The division by a in the last line is valid
because $a \ne 0$.) So does this prove that 1 = 2? Of course not. We can derive true statements from false
ones, as we just did. That doesn’t make the false statement we started from true. A valid proof must start
from what we do know is true and lead to what we want to prove. To start from what we want to prove
and deduce something true from it is not a proof. To make it into a proof you must reverse the sequence of
the logic so that it leads from what we know to what we are trying to prove. In the case of the above, that
reversal of logic is not possible.
Here are a couple more spoofs for you to diagnose.

Problem 1.11 Find the flawed steps in each of the following spoofs, and explain why the flawed steps are
invalid.
a)
−2 = −2
4−6=1−3
4 − 6 + 9/4 = 1 − 3 + 9/4
$(2 - 3/2)^2 = (1 - 3/2)^2$
2 − 3/2 = 1 − 3/2
2=1

Notice once again that if we started with the last line and worked upwards, each line does follow
logically from the one below it! But this would certainly not be a valid proof that 2 = 1. This again

makes our point that starting from what you want to prove and logically deriving something true from
it does not constitute a proof !
b) Assume a < b. Then it follows that
$$a^2 < b^2$$
$$0 < b^2 - a^2$$
$$0 < (b - a)(b + a).$$
Since b − a > 0 it must be that b + a > 0 as well. (Otherwise $(b - a)(b + a) \le 0$.) Thus
$$-a < b.$$
But now consider a = −2, b = 1. Since a < b is true, it must be that −a < b, by the above reasoning.
Therefore
$$2 < 1.$$

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . more

Problem 1.12 We claim that 1 is the largest positive integer. You laugh? Well here’s the spoof (taken
from [5]). Let n be the largest positive integer. Since n ≥ 1, multiplying both sides by n implies that $n^2 \ge n$.
But since n is the biggest positive integer, we must also have $n^2 \le n$. It follows that $n^2 = n$. Dividing both
sides by n implies that n = 1. Explain what’s wrong!
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . largest1

C Proofs from Geometry


C.1 Pythagoras
The Pythagorean Theorem is perhaps the best known theorem in mathematics. You can find some of its
history in Velijan [26] — there are roughly 400 known proofs! The one we consider below is attributed to
Chou-pei Saun-ching around 250 B.C.
Theorem 1.6 (Pythagorean Theorem). Suppose a right triangle has hypotenuse of length c and sides of
length a and b adjacent to the right angle. Then
$$a^2 + b^2 = c^2.$$

Proof. Start with a square with sides of length c (blue in the figure)
and draw four copies of the right triangle (green in the figure),
each with its hypotenuse along one of the sides of the square, as
illustrated. This produces a larger square with sides of length a + b,
with the corners of the original square touching each of the sides.
[Figure: the square of side c (blue) inside the larger square of side a + b, with the four right triangles (green) filling the corners.]
We can calculate the area of the larger square in two ways. On one
hand the area must be $(a + b)^2$ because it is a square with sides of
length a + b. On the other hand it must be the sum of the areas of
the smaller square and the four right triangles. It follows that

$$(a + b)^2 = c^2 + 4\left(\tfrac{1}{2}ab\right)$$
$$a^2 + 2ab + b^2 = c^2 + 2ab$$
$$a^2 + b^2 = c^2.$$

The last of these proves the theorem.

This is rather different than the other proofs we have looked at so far. The proof starts by instructing the
reader to draw a particular figure. Properties of that figure are then taken to be facts that can be used in
the proof without further justification. The proof depends on the formulas for areas of squares and triangles.
Especially, it depends on the fact that if a plane figure is partitioned into the (disjoint) union of several other
figures, then the areas of the figures in the partition must add up to the area of the original figure. That is
the basis of the equation on which the whole proof rests. The fundamental idea behind the proof is the idea
of finding the Pythagorean relationship as a consequence of this equality of area calculations. This took a
spark of creative insight on the part of Chou-pei Saun-ching. That is the hardest part of writing proofs for
many students, finding an idea around which the proof can be built. There is no way someone can teach
you that. You have to be willing to try things and explore until you find a connection. In time
you will develop more insight and it will get easier.

Problem 1.13
Consider a right triangle with sides a, b and hypotenuse c. Rotate
the triangle so the hypotenuse is its base, as illustrated.
Draw a line perpendicular to the base up through the corner
at the right angle. This forms two right triangles similar to
the original. Calculate the lengths of the sides of these two
triangles (h, l, and r in the figure). When you substitute these
values in the equation l + r = c, you will have the outline of
another proof of the Pythagorean Theorem. Write out this
proof.
[Figure: the right triangle with sides a and b resting on its hypotenuse c; the perpendicular has length h and divides the base into segments l and r.]
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . pyth2

Problem 1.14
There is a proof of the Pythagorean Theorem attributed to James A. Garfield (the
20th president of the United States); see [22]. It is based on calculating the area of
the figure at right in two different ways. Write out a proof based on this figure.
[Figure: Garfield's diagram, with segments labeled a, b, and c.]

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Garfield

C.2 The Curry Triangle


Our proof of the Pythagorean Theorem takes for granted properties of the figure which it describes. Look
at the picture below, a famous example called the Curry Triangle (see footnote 6). The triangle on the left is decomposed
into 6 smaller regions, which are then rearranged on the right to cover only part of the original triangle!
This would seem to cast doubt on the principle of adding areas on which our proof of the Pythagorean Theorem
is based. It is in fact a fraud; see Problem 1.15 below. But it serves to make the point that your eyes
can deceive you; what seems apparent to you visually need not be true. There are such things as
optical illusions. For that reason, proofs that are based on pictures are always a little suspect. In the case of
the Pythagorean Theorem above the figure is honest. But the Curry Triangle tricks you into thinking you
see something that isn’t really there.
6 See [3] for more on this dissection fallacy.

Problem 1.15 Explain the apparent inconsistency between the sums of the areas in the Curry Triangle.
Based on your understanding of it, what features of the figure for the Pythagorean Theorem would you want to
check to be sure our proof of the Pythagorean Theorem is valid?
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Curry

D From Calculus
This section considers some proofs which involve ideas from calculus. In particular we will consider some
aspects of ordinary differential equations. As our primary example we consider the following equation:
$$y''(x) + y(x) = 0. \tag{1.4}$$
Some other equations will be considered in the problems.
For readers not familiar with differential equations, we provide a brief introduction. The equation (1.4)
specifies that the function y(x) and its second derivative $y''(x)$ be connected to each other in a specific way,
namely that $y''(x) = -y(x)$ for all x. Some functions y(x) will satisfy this, while most will not. To solve the
differential equation means to find the function (or functions) y(x) which have this property.
We can try guessing a solution to (1.4). For instance if we try $y(x) = x^2$, we have $y''(x) = 2$ so
$y''(x) + y(x) = 2 + x^2$. That is not 0 for all x, so $y(x) = x^2$ is not a solution. If we keep trying things, we will
eventually discover that y(x) = sin(x) works. To see this observe that $y'(x) = \cos(x)$ and $y''(x) = -\sin(x)$,
so we have
$$y''(x) + y(x) = -\sin(x) + \sin(x) = 0 \text{ for all } x.$$
We have now guessed one solution: y(x) = sin(x). The next obvious guess is y(x) = cos(x), which we find
also works. In fact we can put sin(x) and cos(x) together in various ways to get even more solutions, like

y(x) = sin(x) + cos(x) and y(x) = 3 sin(x) − 2 cos(x), . . .
In fact if you choose any two constants a and b the function
y(x) = a cos(x) + b sin(x) (1.5)
is a solution. We now know that there are many solutions of (1.4). Could there be even more, perhaps
involving something other than sin(x) or cos(x)? What we are going to prove is that there are no others;
every solution of (1.4) is of the form (1.5) for some choice of values for a and b.
Proposition 1.7. The functions
y(x) = a cos(x) + b sin(x),
where a and b are constants, are the only twice differentiable functions which solve the differential equation
$y''(x) + y(x) = 0$ for all x.

The next example illustrates how this proposition is used.
Example 1.2. Find a solution of (1.4) for which $y(\pi/4) = 3$ and $y'(\pi/4) = -1$. According to the proposition
y(x) = a cos(x) + b sin(x) are the only solutions; we just need to find values for a and b so that
$$a\cos(\pi/4) + b\sin(\pi/4) = 3$$
$$-a\sin(\pi/4) + b\cos(\pi/4) = -1$$
A bit of algebra leads to $a = 2\sqrt{2}$ and $b = \sqrt{2}$, so that
$$y(x) = 2\sqrt{2}\cos(x) + \sqrt{2}\sin(x)$$
solves the problem.
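The "bit of algebra" in Example 1.2 is the solution of a 2-by-2 linear system, and it can be checked mechanically. Here is a minimal sketch, assuming nothing beyond the example itself; the function name `solve_initial_conditions` and its parameter names are illustrative and do not come from the text.

```python
import math


def solve_initial_conditions(x0, y0, dy0):
    """Find (a, b) so that y(x) = a cos(x) + b sin(x) has y(x0) = y0 and y'(x0) = dy0."""
    c, s = math.cos(x0), math.sin(x0)
    # The system   a*c + b*s = y0,   -a*s + b*c = dy0
    # has determinant c^2 + s^2 = 1, so it can be inverted directly.
    a = y0 * c - dy0 * s
    b = y0 * s + dy0 * c
    return a, b


a, b = solve_initial_conditions(math.pi / 4, 3.0, -1.0)

# Compare with the values stated in Example 1.2: a = 2*sqrt(2), b = sqrt(2).
print(abs(a - 2 * math.sqrt(2)) < 1e-12, abs(b - math.sqrt(2)) < 1e-12)
```

Because the determinant of the system is always 1, the same two formulas work for any choice of the point x0 and the prescribed values, not just those of the example.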
We turn our attention now to proving the proposition. Here is the plan. Suppose y(x) is a solution. The
proposition implies that if we use y(x) to determine the values a = y(0) and $b = y'(0)$ and use these values
to define the function
$$\tilde{y}(x) = a\cos(x) + b\sin(x),$$
then $y(x) = \tilde{y}(x)$ for all x. Our approach to proving this consists of two parts.
1) First verify that $\tilde{y}(x)$ is a solution of (1.4), and in fact so is $\phi(x) = y(x) - \tilde{y}(x)$. Moreover, $\phi(0) = 0$
and $\phi'(0) = 0$.
2) Apply the following lemma to conclude that $\phi(x) = 0$ and therefore $y(x) = \tilde{y}(x)$ for all x.
Lemma 1.8. Suppose $\phi(x)$ is a twice differentiable function of a real variable x satisfying $\phi''(x) + \phi(x) = 0$
for all x and $\phi(0) = \phi'(0) = 0$. Then $\phi(x) = 0$ for all x.
Let’s pause and put this in perspective. A fair bit of what we have said above is specific to the equation
(1.4), especially the form (1.5) of solutions. We just arrived at (1.5) by guessing. Don’t worry about whether
you might be able to guess the solutions to differential equations other than (1.4); you are not going to be
asked to do that for this course. (If you take a course in differential equations you will learn techniques
for finding formulas like (1.5) for differential equations other than (1.4).) What is more important here is
how we are approaching the task of proving that there are no solutions other than those of (1.5): we find
the solution $\tilde{y}(x)$ from (1.5) so that $y(0) = \tilde{y}(0)$ and $y'(0) = \tilde{y}'(0)$, and then use Lemma 1.8 to show that
y(x) = ỹ(x). What we have done is reduce all the situations covered by Proposition 1.7 to the special case
considered in the lemma. Everything depends on our ability to prove the lemma.
Proof of the Lemma. Define the function f(x) by $f(x) = (\phi(x))^2 + (\phi'(x))^2$. By hypothesis this is
differentiable. Using the chain rule and the differential equation we find
$$f'(x) = 2\phi(x)\phi'(x) + 2\phi'(x)\phi''(x) = 2\phi'(x)[\phi(x) + \phi''(x)] = 0 \text{ for all } x.$$
Thus f(x) is a constant function. Therefore its value for any x is the same as its value at x = 0:
$f(x) = f(0) = 0^2 + 0^2 = 0$. Thus $\phi(x)^2 + \phi'(x)^2 = 0$ for all x. Now since $\phi(x)^2 \ge 0$ and $\phi'(x)^2 \ge 0$, for their sum
to be 0 both terms must be 0 individually: $\phi(x)^2 = 0$ and $\phi'(x)^2 = 0$, both for all x. But $\phi(x)^2 = 0$ implies
that $\phi(x) = 0$. Since this holds for all x, we have proven the lemma.
This proof is based on introducing a new object, the function f (x). To read the proof you don’t need
to know where the idea came from, you just have to be able to follow the reasoning leading to f (x) = 0.
However to write such a proof in the first place you would need to come up with the idea, which requires some
creativity. The proof we have given is specific to the equation (1.4); it won’t work for different differential
equations. But sometimes you can modify the idea of a proof you have seen before to get it to
work in new circumstances. That can be a way you come up with a new proof. The problems will
give you examples of different differential equations for which different sorts of modifications of the proof of
the lemma above will work. We should also say that there are ways to prove a general version of the lemma
that is not tied to one specific differential equation. Such a proof would be too much of a diversion for us;
you will learn about that if you take a more advanced course on differential equations.
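The key step in the proof of the lemma is that the sum of the squares of a solution and of its derivative has zero derivative, so it is constant. The calculation of f'(x) uses only the differential equation, so the same is true for every solution of (1.4). For the solutions (1.5) this can be observed numerically: the quantity should equal a^2 + b^2 at every x. The following is a small sketch for illustration only; the name `f` mirrors the proof, while the sample values of a, b and x are arbitrary choices.

```python
import math


def f(a, b, x):
    """Evaluate y(x)^2 + y'(x)^2 for the solution y(x) = a cos(x) + b sin(x) of (1.4)."""
    y = a * math.cos(x) + b * math.sin(x)
    dy = -a * math.sin(x) + b * math.cos(x)
    return y * y + dy * dy


a, b = 3.0, -2.0                      # arbitrary sample constants
xs = [-10.0, -1.5, 0.0, 0.7, 2.0, 25.0]  # arbitrary sample points
values = [f(a, b, x) for x in xs]

# Each value should agree with a^2 + b^2 up to rounding error.
is_constant = all(abs(v - (a * a + b * b)) < 1e-9 for v in values)
print(is_constant)
```

In the lemma the initial conditions force this constant to be 0, which is what pins the solution down to the zero function.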
With the lemma established we can now write out the proof of our original proposition as we outlined it
above.

Proof of Proposition 1.7. Suppose y(x) solves $y''(x) + y(x) = 0$ for all x. Let

$$\tilde{y}(x) = a\cos(x) + b\sin(x), \text{ where } a = y(0),\ b = y'(0).$$

We first verify that $\tilde{y}(x)$ is a solution of (1.4).

$$\tilde{y}''(x) + \tilde{y}(x) = [-a\cos(x) - b\sin(x)] + [a\cos(x) + b\sin(x)] = 0, \text{ for all } x.$$

Next, define
φ(x) = y(x) − ỹ(x).
Observe that

$$\begin{aligned}
\phi''(x) + \phi(x) &= y''(x) - \tilde{y}''(x) + y(x) - \tilde{y}(x) \\
&= [y''(x) + y(x)] - [\tilde{y}''(x) + \tilde{y}(x)] \\
&= 0.
\end{aligned}$$

In addition,

$$\phi(0) = y(0) - \tilde{y}(0) = a - [a\cos(0) + b\sin(0)] = 0$$
$$\phi'(0) = y'(0) - \tilde{y}'(0) = b - [-a\sin(0) + b\cos(0)] = 0.$$

Thus φ satisfies the hypotheses of Lemma 1.8. We conclude that φ(x) = 0 for all x, which means that
y(x) = ỹ(x).

Problem 1.16 Here is a different differential equation:

$$y'(x) + y(x) = 0 \text{ for all } x. \tag{1.6}$$

a) Check that $y(x) = ce^{-x}$ is a solution, for any constant c.

b) Suppose $\phi'(x) + \phi(x) = 0$ with $\phi(0) = 0$. Write a proof that $\phi(x) = 0$ for all x by considering the
function $f(x) = e^{x}\phi(x)$.
c) State and prove a version of Proposition 1.7 for the equation (1.6), using part b) in place of the lemma.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . exp

Problem 1.17 If we change the equation in Lemma 1.8 to $\phi''(x) - \phi(x) = 0$, the conclusion is still true,
but the same proof no longer works.
a) Write the statement of a revision of the lemma based on this new differential equation.

b) Write a proof of your new lemma based on the following idea. Show that both of the following must
be equal to 0 for all x:
$$f(x) = e^{-x}(\phi(x) + \phi'(x)), \qquad g(x) = e^{x}(\phi(x) - \phi'(x)).$$
Now view f(x) = 0 and g(x) = 0 as two equations for $\phi(x)$ and $\phi'(x)$. By solving those equations
deduce that $\phi(x) = 0$ for all x.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . htrig

Problem 1.18 Consider this differential equation:

$$\phi''(x) + \phi(x)^3 = 0; \text{ with } \phi(0) = \phi'(0) = 0.$$

Prove that $\phi(x) = 0$ for all x in a manner similar to Lemma 1.8, but using $f(x) = c\phi(x)^n + (\phi'(x))^2$ for
appropriate choices of c and n.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . nonlin

Problem 1.19 An interesting application of Proposition 1.7 is to prove trigonometric identities. For
example, suppose α is a constant and consider the following two functions.

$$y_1(x) = \sin(\alpha)\cos(x) + \cos(\alpha)\sin(x), \text{ and } y_2(x) = \sin(\alpha + x).$$

Explain how Proposition 1.7 and Lemma 1.8 can be used to prove that $y_1(x) = y_2(x)$ for all x. What
trigonometric identity does this prove? Write a proof of

cos(α + β) = cos(α) cos(β) − sin(β) sin(α)

in a similar way.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . TrigId

Problem 1.20 Prove that if $\phi'(x) = \phi(x)$ and $\phi(0) = 0$ then $\phi(x) \equiv 0$. [Hint: Consider $f(x) = e^{-x}\phi(x)$.]
Use this to prove the identity
$$e^{a+b} = e^{a}e^{b}.$$
[Hint: Do something like Problem 1.19.]
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . expsol

E Irrational Numbers
The study of properties of the integers, especially related to prime numbers, is called number theory. Because
we are very familiar with integer arithmetic, elementary number theory is a nice source of topics to illustrate
and practice our proof skills on. In this section we will look at two proofs which are often cited as favorite
examples (see footnote 7).
We begin with the definition of divisibility for integers. All the interesting structure of the integers is
based on the fact that we cannot always divide one integer by another. For instance, we cannot divide 3
by 2. Now you may say, “yes we can; we get a fraction 3/2.” What we mean by divisibility of integers is that
the result of the division is another integer, i.e. we can perform the division without going outside the set of
integers for the result. Here is the formal definition.
Definition. If m and n are integers, we say that m is divisible by n when there exists an integer k for which
m = kn. This is denoted n|m.
We often use the alternate phrasings “n divides m” and “n is a divisor of m”. Observe that the definition
does not refer to an actual operation of division (÷). Instead it refers to the solvability of the equation
m = kn for an integer k. We see that 1 divides all integers, and 0 divides only itself.
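The definition of divisibility translates directly into a short test, including the edge case of 0. This is a minimal sketch for experimenting; the function name `divides` is an illustrative choice, not notation from the text.

```python
def divides(n, m):
    """Return True when n | m, that is, when some integer k satisfies m == k * n."""
    if n == 0:
        # m == k * 0 can hold only when m is 0, so 0 divides only itself.
        return m == 0
    # For nonzero n, an integer k with m == k * n exists exactly when the remainder is 0.
    return m % n == 0


print(divides(2, 3))   # 2 does not divide 3
print(divides(1, -7))  # 1 divides every integer
print(divides(0, 0))   # 0 divides itself
print(divides(0, 5))   # 0 divides nothing else
```

Note that the special case for n = 0 is needed only because the remainder operation is undefined there; the definition itself, phrased in terms of m = kn, handles 0 with no special treatment.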
Definition. An integer p > 1 is called prime if p has no positive divisors other than 1 and itself. An integer
n > 1 is called composite if it has a positive divisor other than 1 and itself.
As an example, 6 is composite since in addition to 1 and 6 it has positive divisors 2 and 3. On the other
hand the only positive divisors of 7 are 7 and 1, so 7 is prime.
Observe that the terms “prime” and “composite” are only defined for integers larger than 1. In particular
1 is neither prime nor composite. (It’s what we call a unit.) For integers n > 1 prime and composite are
complementary properties; for n to be composite is the same as being not prime. If n > 1 is composite then
it can be factored (in a unique way) as a product of primes: $n = p_1 p_2 \cdots p_k$ where each $p_i$ is a prime. This
fact is the Fundamental Theorem of Arithmetic; we will talk about it more in Chapter 4. In particular a
composite number is always divisible by some prime number. We will take these things for granted as we
prove the theorems of this section.
7 See for instance G. H. Hardy’s famous little book [13].
Theorem 1.9. There are infinitely many prime numbers.

How are we going to prove this? We can’t hope to write out all infinitely many primes for someone to
examine. Since “infinite” means “not finite,” we will show that any finite list of prime numbers does not
include all the primes. Here is the proof.
Proof. Consider a finite list of prime numbers,

$$p_1, p_2, \ldots, p_n. \tag{1.7}$$

We are going to show that there must be a prime number not in this list. To prove this by contradiction,
assume that our list includes all the primes. Now consider the positive integer

$$q = 1 + p_1 p_2 \cdots p_n.$$

Observe that because all $p_k > 1$, it follows that q is bigger than all the primes $p_k$ and so is not in our list
(1.7) and is therefore not prime. So q is composite and must be divisible by some prime number. By our
hypothesis every prime number appears in our list. So $q = m p_k$ for some integer m and one of the primes
$p_k$ from our list. But this means that we can write 1 as

$$1 = q - p_1 p_2 \cdots p_n = (m - p_1 \cdots p_{k-1} p_{k+1} \cdots p_n) p_k.$$

I.e. 1 is divisible by $p_k$. But this is certainly false, since 1 is not divisible by any positive integer except itself.
Thus our assumption that all primes belong to our list must be false, since it leads to a contradiction with
a known fact. There is therefore a prime number not in the list. Thus no finite collection of prime numbers
contains all primes, which means that there are infinitely many prime numbers.
Some mathematicians try to avoid proofs by contradiction. The next problem brings out one reason some
people feel that way.

Problem 1.21 Let $p_1 = 2, p_2 = 3, p_3 = 5, p_4 = 7, \ldots, p_n$ be the first n prime numbers. Is it always true
that $q = 1 + p_1 p_2 \cdots p_n$ is also a prime number? (For instance, for n = 4 we get $q = 1 + 2 \cdot 3 \cdot 5 \cdot 7 = 211$,
which is a prime. The question is whether this is true for all n. You might want to experiment with other n.
Mathematica will quickly check (see footnote 8) whether a given number is prime or not for you.) The proof of the theorem
above seems to conclude that q has to be a prime (since it is not divisible by any of the primes $p_1, \ldots, p_n$).
Does your answer to this problem cast doubt on the proof of the theorem?
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . FP
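If Mathematica is not at hand, the experiment suggested in Problem 1.21 can be carried out with a few lines of Python. The sketch below is only one possible way to do it; the helper names `is_prime`, `first_primes` and `euclid_number` are made up for this illustration, and the primality test is plain trial division, which is adequate only for small n.

```python
def is_prime(n):
    """Trial-division primality test, suitable for small integers only."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def first_primes(n):
    """Return a list of the first n prime numbers."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if is_prime(candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def euclid_number(n):
    """Return q = 1 + p_1 p_2 ... p_n for the first n primes."""
    q = 1
    for p in first_primes(n):
        q *= p
    return q + 1


# Experiment: print n, q, and whether q is prime, for the first few n.
for n in range(1, 9):
    q = euclid_number(n)
    print(n, q, is_prime(q))
```

Running the loop and reading the last column is the experiment; deciding what the result says about the proof of Theorem 1.9 is still up to you.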

Now we move beyond the integers to real numbers. We said above that 3 is not divisible by 2, because
the division cannot be carried out within the integers. But we can do the division if we allow real numbers
as the result, getting 3/2. Such real numbers, those which are expressible as the ratio of two integers, are what
we call rational numbers.
Definition. A real number r is called rational if it can be expressed as the ratio of two integers: r = m/n
where m and n are integers with n ≠ 0. A real number which is not rational is called irrational.
Are there any irrational numbers? Yes, and the next theorem identifies one. As we will see, the proof
again uses the fact that every integer bigger than 1 is either prime or has a unique factorization into a
product of primes.

Theorem 1.10. $\sqrt{2}$ is an irrational number.
8 The web site http://www.onlineconversion.com/prime.htm will do it also.

Proof. Suppose the contrary, namely that $\sqrt{2}$ is rational. Then $\sqrt{2} = \frac{m}{n}$, where m and n are positive integers.
By canceling any common factors, we can assume that m and n share no common factors. Then $2 = \frac{m^2}{n^2}$,
which implies
$$2n^2 = m^2.$$
Therefore 2 divides $m^2$ and so 2 must be one of the prime factors of m. But that means $m^2$ is divisible by $2^2$,
which implies that $n^2$ is divisible by 2. Therefore n is also divisible by 2. Thus both n and m are divisible
by 2, contrary to the fact that they share no common factors. This contradiction proves the theorem.
This, and the proof of Theorem 1.9, are examples of proof by contradiction. (See the chart
on page 37.) It may seem strange that the proof is more concerned with what is not true than what is. But
given our understanding of infinite as meaning “not finite,” the proof has to establish that “there are a finite
number of primes” is a false statement. So the proof temporarily assumes that the collection of all primes is
finite, for the purpose of logically shooting that possibility down. That’s the way a proof by contradiction
works: temporarily assume that what you want to prove is false and explain how that leads to a logical
impossibility. This shows that what you want to prove is not false, which means it is true!

Problem 1.22 If you think about it, the same proof works if we replace 2 by any prime number p. In fact
it’s even more general than that. As an example, revise the proof to show that $\sqrt{60}$ is not rational. So how
far does it go? Can you find an easy way to describe those positive integers n for which $\sqrt{n}$ is irrational?
(In other words you are being asked to complete this sentence, “The positive integers n for which $\sqrt{n}$ is
irrational are precisely those n for which . . . ” You are not being asked to write a proof of your conclusion
for this last part, but just to try to figure out what the correct answer is.)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . sqir

The familiar constants π and e are also irrational, but the proofs are harder. We need a precise way
to identify these numbers in order to prove anything about them. That was easy for $\sqrt{2}$ because it is the
number x > 0 for which $x^2 = 2$. There are no such polynomial equations (with integer coefficients) that π or e are solutions to, which
is one reason the proofs are harder. There is however an infinite series⁹ representation for e:
\[ e = \sum_{n=0}^{\infty} \frac{1}{n!}. \]

A short proof of the irrationality of e can be based on this. The irrationality of π is harder, but you can find
proofs of both in Nagell [20], as well as Spivak [24].
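To get a feeling for the series for e, the following Python sketch computes a few partial sums exactly as fractions and compares them with the floating point value of e. The function name `partial_sum` is our own; nothing here is part of an irrationality proof.

```python
from fractions import Fraction
from math import e, factorial

def partial_sum(terms):
    """Exact value of 1/0! + 1/1! + ... + 1/(terms-1)! as a fraction."""
    return sum(Fraction(1, factorial(n)) for n in range(terms))

for terms in (1, 2, 5, 10, 15):
    s = partial_sum(terms)
    # Each partial sum is a rational number; only the limit is irrational.
    print(terms, s, abs(float(s) - e))
```

Every partial sum is rational, so the irrationality of e is a statement about the limit and cannot be read off from any finite number of terms.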

F Induction
This section will consider some algebraic formulas whose proofs illustrate the technique of
proof by induction. See the “For all positive integers n . . . ” row of the table on page 37. Like proof by
contradiction, this can be hard to understand at first, so we offer several examples.

F.1 Simple Summation Formulas


You have probably encountered the following formulas for the sums of the first n integers, squares of integers,
and cubes of integers.
Proposition 1.11. For each positive integer n the following formulas hold:
a) $\sum_{1}^{n} k = \frac{n(n+1)}{2}$,
b) $\sum_{1}^{n} k^2 = \frac{n(n+1)(2n+1)}{6}$,
c) $\sum_{1}^{n} k^3 = \frac{n^2(n+1)^2}{4}$.

9 You may recall the power series representation of the exponential function: $\exp(x) = \sum_{n=0}^{\infty} \frac{x^n}{n!}$, converging for all x. The
value x = 1 gives the formula for e.

These formulas are often used in calculus books for working out examples of the definite integral. Similar
formulas exist for every integer power m: $\sum_{k=1}^{n} k^m = (\text{formula})$.
What we want to do here is consider how such formulas can be proven. Let’s focus on the first formula:
$\sum_{1}^{n} k = \frac{n(n+1)}{2}$. We can check it for various values of n. For n = 1 it says that
\[ 1 = \frac{1(1+1)}{2}, \]
which is certainly true. For n = 2 it says
\[ 1 + 2 = \frac{2(2+1)}{2}, \]
which is correct since both sides are = 3. For n = 3,
\[ 1 + 2 + 3 = 6 \quad\text{and}\quad \frac{3(3+1)}{2} = 6, \]
so it works in that case too. We could go on for some time this way. Maybe we could use computer software
to help speed things up. But there are an infinite number of n values to check, and we will never be able to
check them all individually.
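Here is what such a computer check might look like, as a small Python sketch. The function name and the upper limit are our own choices. It confirms the three formulas of Proposition 1.11 for n up to the limit and nothing beyond that.

```python
def check_formulas(limit):
    """Check the three formulas of Proposition 1.11 for n = 1, ..., limit."""
    s1 = s2 = s3 = 0   # running sums of k, k^2 and k^3
    for n in range(1, limit + 1):
        s1 += n
        s2 += n ** 2
        s3 += n ** 3
        # Multiply through by the denominators so only integers are compared.
        if 2 * s1 != n * (n + 1):
            return False
        if 6 * s2 != n * (n + 1) * (2 * n + 1):
            return False
        if 4 * s3 != n ** 2 * (n + 1) ** 2:
            return False
    return True

print(check_formulas(1000))   # prints True
```

However large we make the limit, a check like this leaves infinitely many values of n untested.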
What we want is a way to prove that the formula is true for all n at once, without needing to check each
value of n individually. There are a couple ways to do that. We want to focus on one which illustrates the
technique of mathematical induction. The idea is to link the truth of the formula for one value of n to the
truth of the formula for the next value of n. Suppose, for instance, you knew that if the formula worked for
n = 50, then it was guaranteed to work for n = 51 as well. Based on that, if at some point you confirmed
that it was true for n = 50, then you could skip checking n = 51 separately. Now how could we connect the
n = 50 case with the n = 51 case without actually knowing if either were true yet? Well, consider this:
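\[ \sum_{1}^{51} k = 1 + 2 + \cdots + 50 + 51 = \left( \sum_{1}^{50} k \right) + 51. \]
Now if it does turn out that $\sum_{1}^{50} k = \frac{50 \cdot (50+1)}{2}$, then we will automatically know that
\[ \sum_{1}^{51} k = \frac{50 \cdot (50+1)}{2} + 51 = \frac{50 \cdot 51 + 2 \cdot 51}{2} = \frac{51 \cdot 52}{2} = \frac{51 \cdot (51+1)}{2}, \]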

which is what the formula for n = 51 claims! Now be careful to understand what we have said here. We have
not yet established that the formula is true for either of n = 50 or n = 51. But what we have established is
that if at some point in the future we are able to establish that the n = 50 formula is true, then at the same
moment we will know that the n = 51 formula is also true, without needing to check it separately. We have
connected the truth of the n = 50 case to the truth of the n = 51 case, but have not established the truth
of either by itself. We have established a logical implication, specifically the statement
if the formula holds for n = 50 then the formula holds for n = 51.
We will talk more about logical implications in the next chapter.
What we did for n = 50 and n = 51 we can do for any pair of successive integers. We can show that the
truth of the formula for any n will automatically imply the truth of the formula for n + 1. The reasoning is
essentially the same as what we said above.
\[ \sum_{1}^{n+1} k = \left( \sum_{1}^{n} k \right) + (n+1). \]
So if it turns out that $\sum_{1}^{n} k = \frac{n(n+1)}{2}$ is true for this particular value of n, then it will also be true that
\[ \sum_{1}^{n+1} k = \frac{n(n+1)}{2} + (n+1) = \frac{n(n+1) + 2(n+1)}{2} = \frac{(n+1)((n+1)+1)}{2}. \]

In other words once the formula is true for one n it will automatically be true for all the values of n that
come after it. If it’s true for n = 1 then it is true for n = 2, and being true for n = 2 automatically makes it
true for n = 3, which in turn automatically makes it true for n = 4 and so on. This argument doesn’t show
the formula is true for any specific n, but it does show that if we can check it for n = 1 then the formula is
automatically true for all the other values of n without needing to check them separately. Well, we did check it for n = 1,
so it must be true for all n. That’s proof by induction: we check just the first case and link all the other
cases to it logically. Here is a finished version of the proof we just described.
Proof (of first formula). We prove the formula by induction. For n = 1 we have
\[ \sum_{1}^{1} k = 1 \quad\text{and}\quad \frac{1(1+1)}{2} = 1, \]
verifying the case of n = 1. Suppose the formula holds for n and consider n + 1. Then we have
\begin{align*}
\sum_{1}^{n+1} k &= \left( \sum_{1}^{n} k \right) + n + 1 \\
&= \frac{n(n+1)}{2} + n + 1 \\
&= \frac{n(n+1) + 2(n+1)}{2} \\
&= \frac{(n+1)((n+1)+1)}{2},
\end{align*}
verifying the formula for n + 1. By induction this proves the formula for all positive integers n.
You will prove the other two formulas from the proposition as a homework problem. Here is another
example.
Proposition 1.12. For every real number a ≠ 1 and positive integer n,
\[ \sum_{k=0}^{n} a^k = \frac{a^{n+1} - 1}{a - 1}. \]

Proof. To prove the formula by induction, first consider the case of n = 1:
\[ \sum_{k=0}^{1} a^k = a^0 + a^1 = 1 + a = \frac{(a+1)(a-1)}{a-1} = \frac{a^2 - 1}{a - 1}. \]
Next, suppose the formula is true for n and consider n + 1. We have
\begin{align*}
\sum_{k=0}^{n+1} a^k &= \left( \sum_{k=0}^{n} a^k \right) + a^{n+1} \\
&= \frac{a^{n+1} - 1}{a - 1} + a^{n+1} \\
&= \frac{a^{n+1} - 1 + a^{n+1}(a - 1)}{a - 1} \\
&= \frac{a^{(n+1)+1} - 1}{a - 1}.
\end{align*}
Thus the formula for n + 1 follows. By induction this completes the proof.
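As with the summation formulas, a quick numerical check of Proposition 1.12 is a useful sanity test before or after writing the proof. The Python sketch below uses exact fractions for a so that no rounding is involved. The function names and the sample values of a are our own choices.

```python
from fractions import Fraction

def geometric_sum(a, n):
    """Left side: a^0 + a^1 + ... + a^n, added up term by term."""
    return sum(a ** k for k in range(n + 1))

def closed_form(a, n):
    """Right side of Proposition 1.12; requires a != 1."""
    return (a ** (n + 1) - 1) / (a - 1)

samples = [Fraction(1, 2), Fraction(-3, 4), Fraction(2), Fraction(7, 3)]
print(all(geometric_sum(a, n) == closed_form(a, n)
          for a in samples for n in range(1, 21)))   # prints True
```

Note that `closed_form` would divide by zero for a = 1, which matches the hypothesis a ≠ 1 in the proposition.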

Problem 1.23 Prove that for every positive integer n
\[ \sum_{k=1}^{n} k^2 = \frac{n(n+1)(2n+1)}{6} \quad\text{and}\quad \sum_{k=1}^{n} k^3 = \frac{n^2(n+1)^2}{4}. \]

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ind1

Problem 1.24 Prove that for every positive integer n, $n^3 + 5n$ is divisible by 3. (One way to do this
is by induction. For the induction step, assume $n^3 + 5n$ is divisible by 3, which means $n^3 + 5n = 3k$ for
some integer k. To show $(n+1)^3 + 5(n+1)$ is divisible by 3 you need to show that it is possible to write
$(n+1)^3 + 5(n+1) = 3(\cdots)$ where the $(\cdots)$ is some other integer. Avoid using division; i.e. don’t write
$\frac{n^3 + 5n}{3} = \cdots$. You are proving a property of the integers and so should try to give a proof that does not
depend on operations like division that require a larger number system than the integers.)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . div3

Problem 1.25 Prove that for every positive integer n, $n^3 + (n+1)^3 + (n+2)^3$ is divisible by 9.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . div9

Problem 1.26 Use the fact that $\frac{d}{dx} x = 1$ and the usual product rule to prove (by induction) the usual
power rule: $\frac{d}{dx} x^n = n x^{n-1}$, for all n ≥ 1.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . pr

F.2 Properties of Factorial


Definition. If n ≥ 0 is an integer, we define n! (called n factorial ) by

0! = 1, and n! = n · (n − 1) · · · 3 · 2 · 1 for n ≥ 1.

You may wonder why 0! is defined to be 1. The reason is that many formulas involving factorials work for
n = 0 only if we take 0! = 1. One example is the formula (n + 1)! = (n + 1) · n! of the next problem.

Problem 1.27 The formula (n + 1)! = (n + 1) · n! for n ≥ 1 is pretty obvious, but write a proof by
induction for it. Explain why the convention 0! = 1 is necessary if we want this formula to hold for n = 0.
Is there a way to define (−1)! so that the formula holds for n = −1?
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . facrecur

We want to look at some bounds on n!. An upper bound is pretty easy.

\[ n! = n \cdot (n-1) \cdots 3 \cdot 2 \cdot 1 \le n \cdot n \cdots n \cdot n \cdot 1 \le n^{n-1}. \]

Here is a lower bound.

Proposition 1.13. For every positive integer n,

\[ n^n e^{1-n} \le n!. \]

The proof of this will be another example of an induction argument. But before we start the proof we
need to recall some facts about the exponential function. First,

\[ e^x \ge 1 \quad\text{for all } x \ge 0. \]
Secondly, $\frac{d}{dx} e^x = e^x$, so integrating both sides of the above inequality we obtain
\[ e^x - 1 = \int_0^x e^t \, dt \ge \int_0^x 1 \, dt = x. \]
Therefore, for all x ≥ 0 we have
\[ e^x \ge x + 1. \]
(Notice that $x + 1$ is the tangent line to $e^x$ at x = 0, so the above inequality simply says that the exponential
function is never below that tangent line.) Now we can prove the proposition.
Proof. We prove the inequality by induction. First consider n = 1.
\[ n! = 1 \quad\text{and}\quad n^n e^{1-n} = 1 e^0 = 1, \]
so the inequality holds for n = 1. Next, suppose the inequality holds for n and consider n + 1.
\[ (n+1)! = (n+1) \cdot n! \ge (n+1) n^n e^{1-n}. \]
Since (using the inequalities above) $e^{1/n} \ge 1 + \frac{1}{n} = \frac{n+1}{n}$, we know that $e \ge \frac{(n+1)^n}{n^n}$. Using this we deduce
that
\[ (n+1)! \ge (n+1) n^n \frac{(n+1)^n}{n^n} e^{-n} = (n+1)^{n+1} e^{1-(n+1)}, \]
which proves the inequality for the case of n + 1, and completes the induction proof.
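The two bounds on n! can be compared numerically. The following Python sketch (the function name and the range of n are our own choices) evaluates the lower bound of Proposition 1.13 in floating point and the upper bound $n^{n-1}$ exactly, and checks that n! lies between them.

```python
from math import exp, factorial

def bounds_hold(n):
    """Check n^n e^(1-n) <= n! <= n^(n-1) for a positive integer n."""
    lower = n ** n * exp(1 - n)   # floating point approximation
    upper = n ** (n - 1)          # exact integer
    return lower <= factorial(n) <= upper

print(all(bounds_hold(n) for n in range(1, 51)))   # prints True
```

Because the lower bound is computed in floating point, this is only a plausibility check for moderate n, not a substitute for the induction argument.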
You may be wondering where all the inequalities in this proof came from. To read the proof you just
need to check that they are all correct and lead to the desired conclusion. But to write the proof you have
to think of them. How did we do that? The answer is that there was some scratch work that we did first,
but did not record as part of the proof. To help you learn how to do this on your own, here is our scratch
work. The hard part is to find a way to get from the induction hypothesis $n! \ge n^n e^{1-n}$ to the inequality
for n + 1: $(n+1)! \ge (n+1)^{n+1} e^{1-(n+1)}$. Using $(n+1)! = (n+1)\,n!$ and the induction hypothesis gets us
as far as $(n+1)! \ge (n+1) n^n e^{1-n}$, as in the fourth line of the proof. So what we hope to do is show that
$(n+1) n^n e^{1-n} \ge (n+1)^{n+1} e^{1-(n+1)}$. If we simplify this, it reduces to
\[ e^1 \ge \left( \frac{n+1}{n} \right)^n. \tag{1.8} \]
So that is the additional fact we need to complete our proof. The inequality $e^x \ge 1 + x$ that we recalled
before the proof was exactly what we would need to justify $e^1 \ge \left( \frac{n+1}{n} \right)^n$. Thus we had an idea of how the
proof should work (combine $(n+1)! = (n+1)\,n!$ and the induction hypothesis) and asked ourselves what
other fact we would need to be able to reach the desired conclusion. Inequality (1.8) was what we decided
we would need, and so the proof included inequalities leading to (1.8). This sort of strategizing is a
natural part of the process of writing a proof. Try to formulate an overall plan for your proof: how
it will be organized and what the main steps or stages will be. Then focus on those parts individually and
see if you can find a valid way to do each of them.

Additional Problems

Problem 1.28 Observe that the successive squares differ by the successive odd integers. This can be
expressed succinctly as
\[ n^2 = \sum_{i=1}^{n} (2i - 1). \]
Show that this is a consequence of the formula for $\sum_{1}^{n} k$ above. Now observe that there is a somewhat
different pattern for the cubes:
\begin{align*}
1^3 &= 1 \\
2^3 &= 3 + 5 \\
3^3 &= 7 + 9 + 11 \\
4^3 &= 13 + 15 + 17 + 19 \\
&\;\;\vdots
\end{align*}
Explain why this pattern is equivalent to the formula
\[ \sum_{1}^{n} k^3 = \sum_{i=1}^{n(n+1)/2} (2i - 1), \]
and use the formulas that we gave previously for $\sum_{1}^{n} k^3$ and $\sum_{1}^{n} k$ to verify this new formula.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . sqodd

Problem 1.29 Observe that


1+2=3
4+5+6=7+8
9 + 10 + 11 + 12 = 13 + 14 + 15
16 + 17 + 18 + 19 + 20 = 21 + 22 + 23 + 24.
Find a formula that expresses the pattern we see here, and prove it. You may use the summation formulas
above if you would like.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SumoSum

Problem 1.30 The Gamma function, Γ(x), is defined for x > 0 by
\[ \Gamma(x) = \int_0^\infty e^{-t} t^{x-1} \, dt. \]

a) Verify that Γ(1) = Γ(2) = 1.


b) Verify that Γ(x) = (x − 1)Γ(x − 1) for all x > 1. (Hint: Integrate by parts.)
c) Explain why n! = Γ(n + 1) for all integers n ≥ 0.
This is another example of a formula involving n! for which 0! = 1 is the only way to define n! for n = 0 that
is consistent with the formula.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . gamma
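Before working Problem 1.30 it can be reassuring to see the relationship n! = Γ(n + 1) numerically. Python's standard `math` module provides both `gamma` and `factorial`; the helper name and the tolerance below are our own choices, and the comparison is approximate because `gamma` returns a floating point number.

```python
from math import factorial, gamma, isclose

def gamma_matches_factorial(limit):
    """Compare gamma(n + 1) with n! for n = 0, 1, ..., limit."""
    return all(isclose(gamma(n + 1), factorial(n), rel_tol=1e-9)
               for n in range(limit + 1))

for n in range(6):
    print(n, factorial(n), gamma(n + 1))

print(gamma_matches_factorial(20))
```

The case n = 0 is included: the computed value of Γ(1) is 1, in agreement with the convention 0! = 1.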

Problem 1.31 You are probably familiar with Pascal’s Triangle:

                    1
                  1   1
                1   2   1
              1   3   3   1
            1   4   6   4   1
          1   5  10  10   5   1
        1   6  15  20  15   6   1
                   ...

Except for the 1s on the “sides” of the triangle, each number is obtained by adding the numbers to its left
and right in the row above. The significance of these numbers is that each row gives the coefficients of the
corresponding power of (x + y).

\begin{align*}
(x + y)^1 &= 1x + 1y \\
(x + y)^2 &= x^2 + 2xy + y^2 \\
(x + y)^3 &= x^3 + 3x^2 y + 3xy^2 + y^3 \\
&\;\;\vdots \tag{1.9} \\
(x + y)^6 &= x^6 + 6x^5 y + 15x^4 y^2 + 20x^3 y^3 + 15x^2 y^4 + 6xy^5 + y^6 \\
&\;\;\vdots
\end{align*}

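The numbers in Pascal’s Triangle are the binomial coefficients defined by
\[ \binom{n}{k} = \frac{n!}{k!(n-k)!}, \]
pronounced “n choose k”. The nth row in Pascal’s Triangle,
\[ 1 \quad n \quad \cdots \quad n \quad 1, \]
are the values of $\binom{n}{k}$ for k = 0, . . . , n:
\[ \binom{n}{0} \quad \binom{n}{1} \quad \cdots \quad \binom{n}{n-1} \quad \binom{n}{n}. \]
For instance
\[ \binom{4}{0} = 1, \quad \binom{4}{1} = 4, \quad \binom{4}{2} = 6, \quad \binom{4}{3} = 4, \quad \binom{4}{4} = 1. \]
The first (and last) entry of each row is always 1 because
\[ \binom{n}{0} = \frac{n!}{0! \, n!} = 1. \]
(Notice that this would be incorrect without 0! = 1.)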


The fact that each entry in Pascal’s Triangle is obtained by adding the two entries in the row above it is
the following formula involving binomial coefficients: for integers k and n with 1 ≤ k ≤ n,
\[ \binom{n}{k-1} + \binom{n}{k} = \binom{n+1}{k}. \]
Prove the preceding formula using the definition of binomial coefficients. (No induction is needed; it just
boils down to manipulating formulas.)
The connection of Pascal’s Triangle with (x+y)n and the binomial coefficients is expressed as this famous
theorem. (You are not being asked to prove it; it’s just here for your information.)
Theorem 1.14 (The Binomial Theorem). For every positive integer n
\[ (x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^{n-k} y^k. \]

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . BinomT
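The two descriptions of Pascal’s Triangle in Problem 1.31 (add the two neighbors in the row above, or use the factorial formula) can be compared by computer. In the Python sketch below the function `pascal_rows` is our own illustration, and `math.comb` is the standard library’s binomial coefficient.

```python
from math import comb

def pascal_rows(count):
    """Build rows 0, 1, ..., count-1 of Pascal's Triangle by addition only."""
    rows = [[1]]
    while len(rows) < count:
        prev = rows[-1]
        # Each interior entry is the sum of its two neighbors in the row above.
        middle = [prev[i - 1] + prev[i] for i in range(1, len(prev))]
        rows.append([1] + middle + [1])
    return rows

rows = pascal_rows(7)
for row in rows:
    print(row)

# Row n, entry k should agree with n! / (k! (n-k)!).
print(all(row[k] == comb(n, k)
          for n, row in enumerate(rows) for k in range(n + 1)))   # prints True
```

Agreement on the first few rows is only evidence. The formula you are asked to prove in the problem is what guarantees agreement in every row.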

Chapter 2

Mathematical Language and Some Basic Proof Structures

Now that we have looked at several examples, we want to consider the language and basic logical ideas used
in proofs. A good example of the kind of complicated statement we need to work with is the definition of the
limit of a function (from calculus). Suppose f(x) is a function and a and L are two numbers. The statement
$\lim_{x \to a} f(x) = L$ might be true or false, depending on the specifics. The precise definition of this statement
is the following¹.
For every ε > 0 there exists a δ > 0 so that |f(x) − L| < ε holds for all 0 < |x − a| < δ.

In order to write proofs about limits we would need to work with the above definition in a precise way.
Simply understanding what this definition says takes some care — what do all the phrases like “for every”
and “there exists” mean? If you think you understand the statement above, then do you think you can write
down exactly what it means for this statement to be false²? This is an example of the kind of thing we will
talk about in this chapter.
You will have noticed that proofs are typically written in a somewhat formal literary style. They are full
of “therefore,” “it follows that,” and other words or phrases that you probably don’t use much in everyday
speech. In part this is a stylistic tradition, but it also serves an important purpose: the language is designed
to tell the reader how they are supposed to logically connect the facts and deductions that make up the
proof. Section A starts by summarizing different kinds of logical statements and how we work with them.
The appendix on page 136 will provide a sort of brief glossary of some of the English words and phrases
that are common in this kind of writing. Section B will discuss the use of variables in logical statements.
Section C will describe some basic types of proofs. We have already encountered many of these in Chapter 1.
Be forewarned that you cannot learn to write proofs by learning a couple basic patterns and then following
them over and over. There is no algorithm for writing a proof. Writing a proof is really distilling the result
of a completed thinking process into a written form, so that it can be understood by someone else. You need
first to think carefully to formulate your reasoning, and after that write it out clearly. Some general advice
is collected in Section D.

A Basic Logical Propositions


Written mathematical discussion is made up of several different kinds of statements. The most basic are
assertions of fact, statements that certain things are true (or false). Here are some examples.

• $\sqrt{2}$ is an irrational number.
• sin(x) is a differentiable function.
• 49 is a prime number.

1 You will study this more in advanced calculus.
2 See Example 2.9 below.

You might object to the third of these because it is false: 49 can be factored as 7 · 7. It is important to make
a distinction between statements which are clear in their mathematical meaning (but false), and statements
whose meaning itself is not clear (so that we can’t even tell if they are true or false). Here are some examples
of statements with no clear mathematical meaning (even though they refer to mathematical topics).
• π is an interesting number.
• Algebra is more fun than differential equations.
• The proof of the Four Color Problem³ by computer is not a proper proof.
While you may have strong opinions about statements such as these, they are not statements that have a precise
mathematical meaning. Such statements of opinion and preference are fine in talking about mathematics
and editorials in scientific publications, but they don’t belong in an actual proof.
A precise mathematical statement, with a definite true or false “value,” is what we call a proposition, with
a small “p”. The first three bullets above are examples of propositions. The second three bullets are not.
(“Proposition” with a capital “P” refers to a particular named statement⁴, like Proposition 1.3 in Chapter 1.
In this usage it is a proper name, and so is capitalized.) To say something is a proposition does not mean
that we know whether it is true or false, only that its mathematical meaning is crystal clear.
Propositions which are false do have legitimate roles in some proofs. For instance in our proof
that $\sqrt{2}$ is irrational, we considered the consequences of the following proposition.

$\sqrt{2}$ is rational.

Based on that we deduced that the following proposition would also be true.

There exist positive integers n and m for which $2n^2 = m^2$.

These propositions are both false, but we did state them (in a sort of hypothetical way) as part of our proof.
We found that “$\sqrt{2}$ is rational” could not be true because its logical consequences were impossible. In brief
the proof consisted of showing that “$\sqrt{2}$ is rational” is a false proposition. For purposes of making this
argument we needed to be able to state this (false) proposition in order to examine it.
We also state propositions which are not known to be true when we make a conjecture. A conjecture is
a proposition that we hope or expect to be true, but by calling it a conjecture we are acknowledging that
we don’t really know for sure (yet). See for instance the Twin Primes Conjecture on page 51 below.

A.1 Compounding Propositions: Not, And, Or


Starting with one or more propositions we can form new compound propositions which express relationships
between the original component propositions. The simplest of these is the logical negation of a proposition.
If S is a proposition its negation is “not S.” The truth-value (true or false) of “not S” is exactly opposite
that of S. For instance, if S is the proposition “49 is prime” then its negation can be expressed several ways:
• not (49 is prime)
• 49 is not prime
• 49 is composite
These all mean the same thing, but the last two are less awkward. The original proposition was false so the
negation is true.
When we have two assertions of fact to make, we can combine them in one compound proposition joined
by “and.” Consider this example.
3 The Four Color Problem is described on page 51.
4 Whether a named result is a Theorem, Lemma, or Proposition is really a matter of the author’s preference. Usually the
main results of a book or paper are Theorems. Lemmas are results which are tools for proving the theorems. The significance of
a Proposition is more ambiguous. For some a Proposition is a result of some interest of its own, not as important as a Theorem,
but not just a stepping stone to something bigger either. In this book, Proposition is used for things that are interesting
examples of things for us to prove, but not as important as Theorems.

$\sqrt{2}$ is irrational and 47 is prime.
Such an “and”-joined compound proposition is true when both of the component propositions are true
individually; otherwise the compound proposition is false. The above example is a true proposition. The
following one is false, even though one piece of it would be true in isolation.

49 is prime and 47 is prime.


We can form a different compound proposition by joining component propositions with “or.” Consider
for instance
49 is prime or 47 is prime.

Such an “or”-joined compound proposition is true when at least one of the component propositions is true.
So the above example is a true proposition, as is the following.

$\sqrt{2}$ is irrational or 47 is prime.
Here both component propositions are true⁵.
The English language usually gives us several alternate ways to express the same logical proposition. For
instance, “both . . . and . . . ” expresses the same thing as “. . . and . . . ” Including the word “both” doesn’t
change the meaning, it just adds emphasis. Whether to use it or not is simply a stylistic choice. Instead of
“. . . or . . . ” we might say “either . . . or . . . .”

Problem 2.1 Determine whether the following propositions are true or false.

a) 127 is prime and 128 is prime.
b) Either $\sqrt{27}$ is rational or $\sqrt{28}$ is rational.
c) $\int_0^5 (1 - x)^5 \, dx \ge -1$ or $\int_0^5 (1 - x)^5 \, dx \le 1$.
d) $\int_0^5 (1 + x)^5 \, dx \ge 0$ and $\int_0^5 (1 + x)^5 \, dx \le 10000$.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . sprop

The true/false value of a compound proposition such as

\[ \underbrace{\cdots}_{S1} \ \text{and} \ \underbrace{\cdots}_{S2} \]

is determined by the true/false values of the two component propositions “. . . ”, which we will label as S1
and S2 to refer to them more easily. For instance a compound proposition of the form
( true proposition ) and ( true proposition )

is true. In other words, “S1 and S2” is true when both S1 and S2 are true individually. There are four
possible combinations of true/false values for the pair S1, S2. We can summarize the meaning of “S1 and
S2” by listing all the possible combinations in a truth table.
S1, S2        T,T   T,F   F,T   F,F
S1 and S2      T     F     F     F

The top row lists the four possible true/false combinations of S1, S2: “T,T” indicates the case in which S1 is
true and S2 is true; “T,F” indicates the case in which S1=true and S2=false; and so forth. The second row
indicates the true/false value of the compound proposition “S1 and S2” in each case. The table is a concise
way to describe the meaning of the logical connective “and.”
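Truth tables are mechanical enough to generate by computer. The following Python sketch lists the four cases in the same order as the table above and evaluates “and” and “or” in each. The names `CASES` and `truth_row` are our own; Python’s `and` and `or` on the values `True` and `False` behave like the logical connectives described here.

```python
from itertools import product

# The four combinations in the order (T,T), (T,F), (F,T), (F,F).
CASES = list(product([True, False], repeat=2))

def truth_row(connective):
    """Truth values of a two-place connective in each of the four cases."""
    return [connective(s1, s2) for s1, s2 in CASES]

print(truth_row(lambda s1, s2: s1 and s2))   # [True, False, False, False]
print(truth_row(lambda s1, s2: s1 or s2))    # [True, True, True, False]
```

The second row of output shows that “or” is true when both components are true, so it is the inclusive “or” of the text and not the exclusive one.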
5 Don’t confuse this usage of “or” with what is called the exclusive or. That is what we would mean by a proposition like,
“Either . . . or . . . but not both.”

Problem 2.2 Determine whether each of the following propositions is true or false and explain why.
(Assume that n refers to an unspecified positive integer, and x and α refer to unspecified real numbers. To
say one of these statements is true means that it is true regardless of the specific such value assigned to n,
x, or α.)
a) Either n = 2 or (n is prime and n is odd).
b) $\int_1^\infty x^\alpha \, dx < \infty$ if and only if α < 0.
c) A real number x is rational if and only if both $(x + 1)^2$ and $(x - 1)^2$ are rational.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . compound

A.2 Implications
The compound proposition “S1 implies S2” is called an implication. Here is its truth table.
S1, S2            T,T   T,F   F,T   F,F
S1 implies S2      T     F     T     T
We call S1 the antecedent of the implication and S2 the consequent. We should emphasize that the truth
of “S1 implies S2” does not mean that S1 is necessarily true. What it does mean is that the truth
of S1 and S2 are linked in such a way that truthfulness of S1 guarantees truthfulness of S2. You might say
“S1 implies S2” means that there is a logical pipeline⁶ through which truthfulness will flow; truthfulness of
S1 will automatically flow through the implication-pipeline and become truthfulness of S2. The implication
is the pipeline itself, even if the pipeline is empty of any truthfulness. (Truthfulness is not guaranteed to
flow in the other direction, from S2 to S1. It’s only a one-way pipeline.)
If it seems strange to consider “S1 implies S2” to be true when S1 is false, consider once again our proof
that $\sqrt{2}$ is irrational. We can view the proof as establishing the proposition,

$\sqrt{2} = \frac{n}{m}$ implies that both n and m are even.

This is a true implication. But the truth of this implication does not mean that $\sqrt{2} = \frac{n}{m}$ is true. What we
actually did in the proof was show that the above implication is true in a situation in which its consequent,
“both n and m are even,” is false. Inspecting the truth table we see that that can only happen when the
antecedent is false. That is why we can conclude that “$\sqrt{2} = \frac{n}{m}$” must be false. The point is that a true
implication with a false antecedent was important for our argument!
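The inspection of the truth table described here can be carried out mechanically. In the Python sketch below, the function `implies` encodes the truth table of “S1 implies S2” as “(not S1) or S2”. That encoding is an assumption of ours, chosen because it reproduces the four table entries; it is not notation used in the text.

```python
def implies(s1, s2):
    """Truth value of 'S1 implies S2': false only when S1 is true and S2 is false."""
    return (not s1) or s2

for s1 in (True, False):
    for s2 in (True, False):
        print(s1, s2, implies(s1, s2))

# Cases in which the implication is true but its consequent S2 is false:
print([(s1, s2) for s1 in (True, False) for s2 in (True, False)
       if implies(s1, s2) and not s2])   # [(False, False)]
```

The only surviving case has S1 false, which is the step used above: a true implication with a false consequent forces the antecedent to be false.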
Implications are particularly important in mathematics. One equivalent wording is “If S1 then S2.” In
that form we have seen several examples in Chapter 1. For instance Proposition 1.3 is of this form, using

S1 = “s is a nonnegative real number” and S2 = “$\sqrt{s}$ exists.”
(Lemma 1.4 is another example.) Notice in this example that both S1 and S2 are statements involving a
variable s. In asserting that S1 implies S2 we don’t have just one specific value of s in mind. We are saying
more than just that $\sqrt{2}$ exists, for instance; that would be the meaning of the implication if we understood
s to refer only to the specific value 2. Rather we mean s to refer to an arbitrary but unspecified nonnegative
value, so that the proposition that S1 implies S2 is intended to be an assertion about all possible s ≥ 0
simultaneously. We might word it a little more explicitly as “whenever s is a real number with s ≥ 0 then
$\sqrt{s}$ does exist.” Virtually all implications of any significance in mathematics involve variables. We will
look at the use of variables more carefully in Section B. For the time being we can take the view that an
implication “S1(x) implies S2(x)” involving a variable x means that knowing the truth of S1(x) is enough
for us to be confident that S2(x) is also true, without needing to know the exact value of x.
Lemma 1.4 is another example of an implication. The antecedent (S1) is “x and y are nonnegative
real numbers with x ≤ y” and the consequent (S2) is “$\sqrt{x} \le \sqrt{y}$.” Now look back at how we proved the
implication of that lemma. You should see that it followed this pattern:

Suppose 0 ≤ x ≤ y. · · · (argument using x ≤ y) · · · Therefore $\sqrt{x} \le \sqrt{y}$.

6 It is common to use the symbolic shorthand “S1⇒S2” for “S1 implies S2,” which is very suggestive of this pipeline
interpretation. There are symbolic shorthands for all the basic compound propositions we are discussing. I do not encourage
you to use them for the work you turn in, and so I am not even mentioning the others. If you want to use them for your own
notes to yourself, that is up to you.

We assumed S1 and only gave an argument for S2 under the presumption that S1 is true. If we look at
the truth table above we can see why; if S1 is false then the implication is true regardless of S2 — there is
nothing to prove in that case! In general, to prove “S1 implies S2” you assume S1 and then use
that to prove S2. Such a proof will typically take the form

Suppose S1 ... Therefore S2.


When we say “Suppose S1” in such a proof we are not making the assertion that S1 is in fact true; we are
saying that for the sake of establishing “S1 implies S2” we are going to consider the case in which S1 is
presumed to be true, so that we can write out the reasons which then lead to S2. The other case is that
S1 is false, but there is nothing to prove in that case since “S1 implies S2” is always true when S1 is false.
Thus we are focusing our attention on the only set of circumstances in which the implication might be false,
namely the case in which S1 is true, and explaining why S2 is then necessarily true. You might think of it as
testing the logical pipeline by artificially applying truth to the input (antecedent) end to see if truth comes
out the output (consequent) end.
There are many alternate English phrases which can be used to express “S1 implies S2”:

• If S1 then S2.
• S2 if S1.
• S1 only if S2.

• S2 is necessary for S1.


• For S1 it is necessary that S2.
• S1 is sufficient for S2.
• S2 whenever S1.

Remember that the “if” or “is sufficient” go with the antecedent S1. The “only if” or “is necessary” go with
the consequent S2.

Converse and Contrapositive


The converse of “S1 implies S2” is the implication “S2 implies S1.” An implication and its converse are
not interchangeable. The truth table below lists the implication and its converse in the second and third
lines; we see that they have different truth values in some cases. The contrapositive of “S1 implies S2” is the
implication “(not S2) implies (not S1).” The last line of the truth table below records its true/false values.
(The next-to-last line is included just as an intermediate step in working out the contrapositive.) We see
that the contrapositive has the same true/false value as the original implication in every case. This shows
that an implication and its contrapositive are logically equivalent. To prove one is the same as proving the
other.
S1, S2                                       T,T   T,F   F,T   F,F
original: S1 implies S2                       T     F     T     T
converse: S2 implies S1                       T     T     F     T
not S2, not S1                               F,F   T,F   F,T   T,T
contrapositive: (not S2) implies (not S1)     T     F     T     T
You might also observe that when the original implication is false then its converse is true. But this
connection does not hold when the implication involves variables; see Section B below. Since virtually all
useful implications do involve variables, this is an unhelpful observation!
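The comparison made in the truth table can also be checked mechanically. The following is a minimal Python sketch, not part of the original development; the helper name implies is our own choice and simply encodes the convention that "S1 implies S2" is false only when S1 is true and S2 is false.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """Material implication: false only when p is true and q is false."""
    return (not p) or q

# The four truth assignments, in the same order as the table: T,T  T,F  F,T  F,F.
rows = list(product([True, False], repeat=2))

original = [implies(s1, s2) for s1, s2 in rows]
converse = [implies(s2, s1) for s1, s2 in rows]
contrapositive = [implies(not s2, not s1) for s1, s2 in rows]

print(original)        # [True, False, True, True]
print(converse)        # [True, True, False, True]
print(contrapositive)  # [True, False, True, True]
```

The lists for the original and the contrapositive agree in every position, while the list for the converse does not.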

Equivalence
A two-way logical pipeline is what we mean by saying “S1 is equivalent to S2;” the truth of either guarantees
the truth of the other. In other words the equivalence means S1 and S2 have the same truth value; they are
either both true or both false. Some other ways to express equivalence are
• S1 if and only if S2 (sometimes written “S1 iff S2”);
• (S1 implies S2) and (S2 implies S1).
• S1 implies S2, and conversely.
• S1 is necessary and sufficient for S2.
The second of these is the most common way to approach the proof of an equivalence. Here is an example.
Observe that the proof consists of proving two implications, each the converse of the other.
Proposition 2.1. Suppose x and c are real numbers.
|x| < c if and only if (−c < x and x < c).
Proof. To prove the “if” implication, suppose that −c < x and x < c. We consider two cases.
Case 1: x ≥ 0. In this case |x| = x and, since x < c, it follows that |x| < c.
Case 2: x < 0. In this case |x| = −x. Since −c < x it follows that −x < c and therefore |x| < c.

To prove the “only if” implication, suppose |x| < c. By Lemma 1.1, x ≤ |x|. Putting these inequalities
together we conclude that x < c. By Problem 1.1 part a), we know |−x| = |x|. Using Lemma 1.1 again, we
find that −x ≤ |−x| = |x| < c, so that −x < c. This implies that −c < x. Thus we have shown that both
−c < x and x < c are true.
The definition of a term establishes an equivalence between a word or phrase and its mathematical
meaning. Look back at the definition we gave of rational numbers on page 16. The wording of our definition
was that if r = m/n for some integers m, n then we say r is rational. But what we really meant was
r is called rational if and only if r = m/n for some integers m, n.

A literal reading of the definition as we stated it on page 16, using “if,” would allow √2 to also be called a
rational number. It would not say that a rational number has to be expressible as m/n. We actually meant
the definition to establish an equivalence, not just an implication. Unfortunately this inconsistent usage is
quite common. You simply need to remember that when reading a definition the author’s “if” might
(and probably does) really mean “if and only if.”

Problem 2.3 We said that “(P implies Q) and (Q implies P)” has the same meaning as “P iff Q.” Verify
that with a truth table, like we did on page 34. What about “(P or Q) implies (P and Q);” does that also
have the same meaning? What about “(P or not Q) implies (Q or not P)”?
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . L1

A.3 Negations of Or and And


In the introduction to this chapter we talked about the importance of false propositions in proofs. If we are
writing a proof by contradiction for instance, we need to be able to state the negation of what we want to
prove, in order to see that it leads to an impossible situation.
Sometimes our vocabulary allows us to conveniently state the negation of a proposition. For instance,
consider “459811 is prime.” The negation of this could be expressed as “459811 is not prime,” but we have
a word meaning not prime: “composite.” So the negation of our proposition can be stated concisely as
“459811 is composite.”

For compound propositions built up from multiple uses of the above connectives we need to know how to
handle the connectives when we form the negation. A basic rule is that “not” interchanges “and” with
“or” when it is distributed over them.⁷
“Not (S1 and S2)” is equivalent to “(not S1) or (not S2)”.

For example the negation of our (false) proposition “49 is prime and 47 is prime” is the (true) proposition
either 49 is not prime or 47 is not prime.
You will work out the negation of an implication in Problem 2.5.
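As a small illustration of this rule, here is a hedged Python sketch that evaluates the example about 49 and 47. The function is_prime is our own helper (simple trial division), introduced only for this illustration.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for small illustrative inputs."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

s1 = is_prime(49)  # False, since 49 = 7 * 7
s2 = is_prime(47)  # True

proposition = s1 and s2           # "49 is prime and 47 is prime"
negation = (not s1) or (not s2)   # "not" distributed over "and"

print(proposition, negation)      # False True
```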

Problem 2.4 Verify using truth tables that each of the following pairs of propositions are equivalent.

a) “Not (S1 and S2)” and “(not S1) or (not S2).”


b) “Not (S1 or S2)” and “(not S1) and (not S2).”
c) “S1 or S2” and “(not S1) implies S2.”

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . EqCon

Problem 2.5 Find a proposition which is equivalent to “not (S1 implies S2)” which is not formulated as
an implication.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . negimp

Problem 2.6 Suppose ℓ is a positive integer, and γ is a real number. For each of the following, formulate
a wording of its negation.
a) Both ℓ and ℓ + 2 are prime numbers.

b) Either γ is positive or −γ is positive.


c) Either γ ≤ 0 or both γ and γ³ are positive.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . negation

B Variables and Quantifiers


Theorem 1.10 is unusual in that it is a proposition about a single specific quantity. All the other propositions
we proved in Chapter 1 referred to many different situations at once: Lemma 1.1 makes an assertion about
every real number, the Triangle Inequality is a proposition about all pairs of real numbers, the Pythagorean
Theorem refers to all right triangles, Proposition 1.13 asserts certain inequalities or equations for all positive
integers, and Proposition 1.3 says that for every s ≥ 0 there exists an r ≥ 0 for which r² = s is true. All of
these are propositions that involve variables which can take a range of different specific values. Statements
involving variables are sometimes called open statements. In general whether they are true or false will
depend on the actual values of the variables. For example consider the statement “n is a prime number.”
We can refer to this statement as P(n) to emphasize that it is a different statement for each value of the
variable n: P(1) is false; P(2) is true, P(3) is true, P(4) is false . . . . The true/false value of P(n) depends on
the value of the variable n. In this section we will look more carefully at the handling of propositions with
variables.
⁷ We have used parentheses to clarify. English can sometimes be ambiguous. For instance “Not [(I am a thief) or (I am a
liar)]” is different than “(Not I am a thief) or (I am a liar)” although the word order is the same. The first means “I am neither
a thief nor a liar.” The second means “Either I am not a thief or I am a liar.”

B.1 The Scope of Variables
When faced with a proposition involving variables we need to understand what values the variables are
allowed to take. For instance in our statement P(n) above what values of n are we considering? Should
n = 0 or n = −13 be considered? What about n = 1/√2? We will call this the scope of the variable.
A proposition always occurs within a certain context, meaning assumptions and hypotheses within which
the proposition is intended to be understood. The scope of a variable is part of the context. Sometimes we
simply infer the scope of a variable from the context. For example in a book on number theory (the study
of integers) if we encountered a variable α with no explicit statement of scope we would presume that the
scope of α is the integers. But in a book about complex numbers we would presume the scope to be all
complex numbers. In Section A of Chapter 1 “number” referred to real number, and we said so consistently.
However in Chapter 4 below we will be discussing the integers exclusively, so whenever we say “number”
there it refers to numbers which are integers, even though we may not say so for every instance.
Sometimes the propositions being considered don’t even make sense unless the variables are limited to
a certain scope. For instance the notion of prime number makes no sense for nonintegers. So when we talk
about n being a prime number, as in Example 2.1, we presume that only integer values are intended for n.
Even the choice of notation can indicate the intended scope of a variable. For example the letters
i, j, k, ℓ, m, n are most often used to refer to integer-valued variables. Although this is not universal, when
we encounter variables with these names it is often a clue that the author intends them to be integer-valued.
Often the scope of variables is explicitly stated in hypotheses that immediately precede the proposition
itself. The proposition is only meant to be asserted within the limitations imposed by those hypotheses. In
Problem 2.2 for instance the hypotheses that x and y are real and n is a positive integer are stated at the
beginning, and are intended as the context for all the individual parts a)–c).

B.2 Quantifiers
To make a valid proposition out of an open statement we must include quantifiers which explain how to
combine the various statements which result from the different possible values of the variables (within their
scope). The next example illustrates what we mean by a quantifier.
Example 2.1. Again let P(n) stand for the statement, “n is a prime number.” P(n) is true for some values
of n (like n = 2, 3, 5, 7), but is false for others (like n = 4, 6, 8, 9). So P(n) by itself, without any information
about what value of n to consider, is ambiguous. But the statement
P(n) for all positive integers n
is a valid proposition, a logically clear statement (but one which is false). The phrase “positive integers n”
specifies the scope, i.e. what values of n we want to consider, and the quantifier “for all” tells us how to
logically combine all the different statements P(n). The meaning of the “for all” proposition is to make an
infinite number of statements all at once, as if we joined them all with “and”:
P(1) and P(2) and P(3) and . . .
As always, there are many alternate ways to express the same proposition.
• For every positive integer n, P(n) holds.
• P(n) holds for every positive integer n.
• P(n) is true whenever n is a positive integer.
• If n is a positive integer, then P(n) holds.
Notice that the last two of these are worded as implications (the next-to-last using the “whenever” phrasing
of an implication). When working with the negation or converse of an implication involving variables it can
be important to understand the implication as a formulation of a proposition involving a quantifier. We will
discuss that more below; see page 33.
Example 2.2. Continuing with Example 2.1, we can combine the statements P(n) in a different way by
saying,

for some positive integer n, n is a prime number.
The quantifier here is the “for some.” This means the same thing as combining all the statements P(n) with
“or”:
P(1) or P(2) or P(3) or . . .
This is a true proposition. Its meaning is simply that there is at least one positive integer which is prime.
Some alternate expressions of the “for some” quantifier are
• There exists an n so that . . .
• There exists an n such that . . .
• There exists an n for which . . .
• . . . for some n.
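The reading of “for all” as a long “and” and of “for some” as a long “or” in Examples 2.1 and 2.2 can be imitated on a finite range. The sketch below is only an illustration under an assumption: the cutoff N and the helper is_prime are our own, and a computer can examine only finitely many values of n. A result of False from the “for all” check does disprove the proposition, and a result of True from the “for some” check does prove it, but a True “for all” on a finite range would prove nothing about all positive integers.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for small illustrative inputs."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

N = 30                   # hypothetical cutoff; the text's scope is all positive integers
scope = range(1, N + 1)

for_all = all(is_prime(n) for n in scope)    # P(1) and P(2) and ... and P(N)
for_some = any(is_prime(n) for n in scope)   # P(1) or P(2) or ... or P(N)

print(for_all)   # False, because P(1) already fails
print(for_some)  # True, because P(2) holds
```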
Propositions involving two or more variables are more complicated, because different quantifiers may be
applied to the different variables. Consider the following statement, which we will denote by R(r, s).
r ≥ 0 and r² = s.
This is an open statement involving two (real) variables. For some r, s combinations R(r, s) is true while for
others it is false. For R(r, s) to be true is the definition of r = √s. Now consider the following proposition
formed using two different quantifiers for r and s.
For every s ≥ 0 there exists an r so that R(r, s) holds.

This is just our Proposition 1.3 asserting that √s exists for all s ≥ 0. It is important to understand that
the order of the quantifiers matters — we get a false proposition if we reverse their order:
there exists an r so that for every s ≥ 0 the statement R(r, s) holds.

This proposition claims that there is a “universal square root,” a single value r which is r = √s for all s ≥ 0
simultaneously — certainly a false claim.
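The effect of reversing the quantifiers can be seen on a small finite sample. In the following sketch the lists s_values and r_values are hypothetical scopes chosen by us so that every sampled s has a square root among the sampled r; nested all and any then play the roles of the two quantifiers.

```python
def R(r: int, s: int) -> bool:
    """The open statement R(r, s): r >= 0 and r^2 = s."""
    return r >= 0 and r * r == s

# Hypothetical finite scopes, chosen so each s has a square root in r_values.
s_values = [0, 1, 4, 9, 16]
r_values = [0, 1, 2, 3, 4]

# "For every s there exists an r so that R(r, s) holds."
forall_exists = all(any(R(r, s) for r in r_values) for s in s_values)

# "There exists an r so that for every s, R(r, s) holds."
exists_forall = any(all(R(r, s) for s in s_values) for r in r_values)

print(forall_exists)  # True
print(exists_forall)  # False
```

In the first check the choice of r is allowed to depend on s; in the second a single r must serve every s at once, which is why it fails.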
Example 2.3.
• The proposition “there exists an integer n so that for all integers m we have m < n” is false. It claims
there is one integer that is strictly larger than all integers (and so strictly larger than itself).
• The proposition “for all integers m there exists an integer n for which m < n” is true. It says that
given an integer m you can always find another integer n which is larger (n = m + 1 for instance).

Problem 2.7 Assume that x and y are real numbers and n is a positive integer. Determine whether each
of the following is true or false, and explain your conclusion.
a) For all x there exists a y with x + y = 0.
b) There exists y so that for all x, x + y = 0.
c) For all x there exists y for which xy = 0.
d) There exists y so that for all x, xy = 0.
e) For all x there exists y such that xy = 1.
f) There exists y so that for all x, xy = 1.
g) For all n, n is even or n is odd.
h) Either n is even for all n, or n is odd for all n.
(From [9].)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . tf

B.3 Subtleties
Stay within the scope
When we form a contrapositive, converse, or negation of a proposition we do not change the context
within which the implication is made. We form the new propositions within the same context
as the original. The scope of variables is part of the context. The intended scope of a variable in a
contrapositive, converse, or negation remains the same as for the original.
So for instance in Problem 2.7 if we were writing the negation of part h) we would not change the
hypothesis that n is limited to positive integers. The statement of the negation would still be understood
subject to the hypothesis that n is a positive integer.
Example 2.4. Suppose the variable x is allowed to take real values. Consider the proposition “x² ≥ 0 for all
x.” Its negation is the following:
x² < 0 for some x.
We read this within the same context: the variable x is still understood to be limited to real values. With
this understanding the original proposition is true. The negation would not consider non-real values for x
because the scope of x is still that specified by the context.
Example 2.5. Suppose n is an integer. Consider the following proposition.
If n is even then n² is even.
According to the hypothesis the scope of n is limited to integer values. However we could rephrase this by
bundling the hypothesis into the antecedent of the implication to get the following, which we will consider
within the context of all real values for x:
If x is an even integer then x² is an even integer.
For the first version, the contrapositive would be
If n² is odd then n is odd,
still understood within the context of n being an integer. For the second version the contrapositive
would be
If x² is not an even integer, then either x is an odd integer or x is not an integer,
considered with the scope of x being all real numbers. This statement of the contrapositive is rather different
than before because the context is different. There is no presumption that x is an integer, so that “not even”
could include fractions and irrational numbers. In both cases the contrapositive is equivalent to the original,
but the contexts are different in the two cases and that affects the statements of the contrapositive.

Subtleties of the “For All” Quantifier


Some propositions with variables are lacking the words “for all” even though they are part of the intended
meaning. So you sometimes need to read between the lines to recognize the presence of a “for all” quantifier.
Consider for instance the statement of Problem 1.10.
Suppose that a and b are positive real numbers. Prove that
\[ \frac{2ab}{a+b} \le \cdots \le \sqrt{(a^2 + b^2)/2}. \]
There are variables a and b here, but you don’t see the words “for all” in the statement of the proposition,
even though the intent is to claim that the inequalities of the problem are true for all possible choices of a
and b. Thus the intent was to say
\[ \frac{2ab}{a+b} \le \cdots \le \sqrt{(a^2 + b^2)/2} \quad \text{for all positive } a \text{ and } b. \]

We encountered other propositions like this in Proposition 2.1, and in Examples 2.2 and 2.6. In all of these
the intent was to make assertions that applied to all possible values of the variables. We would say there
is an unwritten but implicit “for all” quantifier. In general, when you encounter a proposition with a
variable having no quantifier, the intended meaning is most likely that the statement is meant
for all values of the variables within their scope. On the other hand, the quantifier “there exists” or
“for some” is virtually always stated explicitly.
A “for all” quantifier is sometimes phrased as an implication; we indicated this as the last of the alternate
phrasings on page 31. If an implication has variables in the antecedent then a “for all” proposition is usually
intended. Consider Proposition 1.3 for example.

If s is a nonnegative real number, then √s exists and is unique.
Although worded as an implication, clearly the intent was

For all nonnegative real numbers s, √s exists and is unique.
The meaning of either phrasing is the same: knowing that s ≥ 0 (regardless of the specific value of s) is all
that we need in order to be sure that √s exists. A different proposition but with the same ultimate content
would be this:

For all real numbers s, if s ≥ 0 then √s exists and is unique.
Here we are making a proposition in the form of an implication within the enlarged scope of all real values
for s. The correctness of this proposition depends on our understanding that “S1 implies S2” is true when
S1 is false. We understand “if s ≥ 0 then √s exists” to be true for any real number s; in particular it is true
for negative values of s, in which case the antecedent (“if s ≥ 0”) is false.
Example 2.6. Here are some examples of implications involving a variable x (whose scope we take to be all
real numbers), along with their converses.
• The implication “If x > 3 then x² > 9” is true, but its converse, “If x² > 9 then x > 3,” is false. (Just
think about x = −4.) However its contrapositive, “If x² ≤ 9 then x ≤ 3,” is true.
• The implication “For sin(x) = 0 it is sufficient that x be an integral multiple of π” is true. So is its
converse, which we can express as “For sin(x) = 0 it is necessary that x be an integral multiple of π.”
• “If x < 3 then x² > 9” is false and so is its converse: “If x² > 9 then x < 3.”
In the first example the converse “If x² > 9 then x > 3” would be true for some specific values of x but it is
not true for all such values. That is why the converse is a false proposition.
The third of these examples illustrates that when variables are involved it is possible that both the
original and converse can be false, contrary to what we saw in the table from page 28 when no variables
were involved. If we replace x by a value for which the first implication is false (say x = 2) then the converse
would take the form of false implies true (e.g. “if 4 > 9 then 2 < 3”) which is true. But with the variable the
converse means that “x² > 9 implies x < 3” is true for every value x, not just those that make the original
implication false but also those (like x = −4) which make it true. In general, for implications involving
variables (which are really “for all” propositions) you can’t say the converse is true or false based on whether
the original is true or false.
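The first implication of Example 2.6 can be probed numerically. The sketch below tests the implication, its converse, and its contrapositive on a hypothetical list of sample values of our own choosing. Sampling can expose a counterexample, and so disprove a “for all” proposition, but agreement on a sample is of course not a proof.

```python
def implies(p: bool, q: bool) -> bool:
    """Material implication: false only when p is true and q is false."""
    return (not p) or q

def original(x: float) -> bool:
    return implies(x > 3, x**2 > 9)

def converse(x: float) -> bool:
    return implies(x**2 > 9, x > 3)

def contrapositive(x: float) -> bool:
    return implies(x**2 <= 9, x <= 3)

# Hypothetical sample of real values; the true scope is all real numbers.
samples = [-5, -4, -3, -1, 0, 2, 3, 3.5, 4, 10]

holds_original = all(original(x) for x in samples)
holds_contrapositive = all(contrapositive(x) for x in samples)
counterexamples_to_converse = [x for x in samples if not converse(x)]

print(holds_original)               # True
print(holds_contrapositive)         # True
print(counterexamples_to_converse)  # [-5, -4]
```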

Problem 2.8 Determine whether each of the implications below is true or false. For each, write a statement
of its converse and its contrapositive and explain whether they are true or false. (Notice that there are
variables here: n refers to an arbitrary integer; x and a refer to arbitrary real numbers.)
a) An integer n is divisible by 6 whenever 4n is divisible by 6.
b) For
\[ f(x) = \begin{cases} \dfrac{\sin(x)}{x} & \text{for } x \neq 0 \\ a & \text{for } x = 0 \end{cases} \]
to be a continuous function it is necessary that a = 0.

c) If x² is irrational then x is irrational.


. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . mprops

Vacuous Propositions
What about a “for all” proposition in which there are no possible values for the variable? For example
consider the following.
For all real numbers x with x² < 0 the equation x = x + 1 holds.

The scope of x consists of all real values for which x² < 0. But there are no such values of x. So the
above proposition is an empty statement; there are no values of x about which it has anything to say. Such a
proposition is considered true; we often say that it is vacuous or vacuously true. Reworded as an implication,
the statement becomes “if x² < 0 then x = x + 1.” In that form it becomes another instance of an implication
“S1 implies S2” being true when S1 is false. While the above example may seem silly, vacuous propositions
do sometimes come up in legitimate proofs. For instance you might formulate a proof by cases and find
that one of the cases can never hold. The “if (description of vacuous case) then (conclusion)” would still be
considered a true implication as part of the proof.
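Programming languages commonly follow the same convention. As a hedged illustration, Python's built-in all returns True on an empty collection, which matches the idea of a vacuously true “for all” proposition. The list candidates below is a hypothetical finite stand-in for the real numbers.

```python
# Hypothetical finite sample standing in for "all real numbers x".
candidates = [k / 10 for k in range(-50, 51)]

# The scope of the proposition: sampled x with x^2 < 0. No value qualifies.
in_scope = [x for x in candidates if x * x < 0]

# "For all x in scope, x = x + 1": all() over an empty collection is True.
vacuous = all(x == x + 1 for x in in_scope)

# The same statement reworded as an implication over every sampled x.
as_implication = all((not (x * x < 0)) or (x == x + 1) for x in candidates)

print(in_scope)        # []
print(vacuous)         # True
print(as_implication)  # True
```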

B.4 Negating Quantified Propositions


For compound propositions, we need to know how to handle logical connectives when we form their negations.
On page 30 we observed that negating a proposition exchanges “and” with “or.” The negation of an
implication “S1 implies S2” can be expressed “S1 and not S2.” This can be confirmed with a truth table.
S1, S2            T,T   T,F   F,T   F,F
S1 implies S2      T     F     T     T
S1, not S2        T,F   T,T   F,F   F,T
S1 and not S2      F     T     F     F

Since the first and last lines have the opposite true/false value in all cases, we do have a correct formulation
of the negation.
We can think of a “for all” quantifier as producing a multiple “and” proposition. Since negating an
“and” proposition produces an “or” proposition, we expect the negation of a “for all” proposition to be a
“for some” proposition. Similarly, the negation of a “for some” proposition produces a “for all” proposition.
In brief, negation interchanges the two types of quantifiers.
“Not (for all x, P(x))” is equivalent to “for some x, not P(x).”
“Not (for some x, P(x))” is equivalent to “for all x, not P(x).”
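On a finite scope these two rules can be checked directly. The sketch below is an illustration only: the domain and the three sample predicates are hypothetical choices of ours, picked so that one predicate holds everywhere, one holds somewhere, and one holds nowhere.

```python
domain = range(-3, 4)  # hypothetical finite scope: -3, -2, ..., 3

predicates = {
    "x*x >= 0": lambda x: x * x >= 0,  # true for every x in the domain
    "x > 0": lambda x: x > 0,          # true for some x only
    "x*x < 0": lambda x: x * x < 0,    # true for no x
}

checks = []
for name, P in predicates.items():
    not_forall = not all(P(x) for x in domain)   # not (for all x, P(x))
    exists_not = any(not P(x) for x in domain)   # for some x, not P(x)
    not_exists = not any(P(x) for x in domain)   # not (for some x, P(x))
    forall_not = all(not P(x) for x in domain)   # for all x, not P(x)
    checks.append((name, not_forall == exists_not, not_exists == forall_not))

for name, first_rule, second_rule in checks:
    print(name, first_rule, second_rule)
```

Each printed row should show True twice, once for each rule.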
Example 2.7. The negation of

there exists a real number y with y² = −1


is
y² ≠ −1 for all real numbers y.
The negation of

all prime numbers n are odd,


is
there exists a prime number n which is even.

It’s simple enough when there is just one quantifier. When there are two or more, form the negation by
working from the outside in, as illustrated in the following examples.
Example 2.8. Suppose x and y denote real numbers. The following proposition is false.
For every x there exists a y with x = y².

Its negation is therefore true. We can develop a natural wording of the negation by first putting a “not”
in front of the whole thing, and then progressively moving the “not” from the outside in, producing the
following sequence of equivalent propositions.
Not (for every x there exists a y with x = y²).
For some x, not (there exists a y with x = y²).
For some x (for all y, not x = y²).
There exists an x so that x ≠ y² for all y.
To see that this negated proposition is true, just consider x = −1; in the context of the real numbers all y
have y² ≥ 0, so x < y² for all y.
Example 2.9. In the same manner we can now work out the negation of the definition of lim_{x→a} f(x) = L,
which we contemplated at the beginning of the chapter.
Not (for every ε > 0 (there exists a δ > 0 so that (for all 0 < |x − a| < δ, (|f(x) − L| < ε)))).
For some ε > 0 not (there exists a δ > 0 so that (for all 0 < |x − a| < δ, (|f(x) − L| < ε))).
For some ε > 0 (for all δ > 0 not (for all 0 < |x − a| < δ, (|f(x) − L| < ε))).
For some ε > 0 (for all δ > 0 (there exists an 0 < |x − a| < δ so that not (|f(x) − L| < ε))).
For some ε > 0 (for all δ > 0 (there exists an 0 < |x − a| < δ so that |f(x) − L| ≥ ε)).
There exists an ε > 0 so that for every δ > 0 there is an x with 0 < |x − a| < δ and |f(x) − L| ≥ ε.
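To see what the final negated statement asks for, it may help to exhibit the required ε and x for a concrete function. The sketch below uses a hypothetical step function f, with a = 0 and L = 0, all chosen by us for illustration. We fix one ε and, for each δ in a sample, produce an x that is within δ of a while f(x) stays at least ε away from L. The sample of δ values is finite, so this only illustrates the pattern; the actual claim concerns every δ > 0.

```python
def f(x: float) -> float:
    """Hypothetical step function: 0 for x <= 0, 1 for x > 0."""
    return 1.0 if x > 0 else 0.0

a, L = 0.0, 0.0
epsilon = 0.5  # the single epsilon that the negation asks us to exhibit

def witness(delta: float) -> float:
    """For a given delta > 0, return an x with 0 < |x - a| < delta."""
    return a + delta / 2

deltas = [1.0, 0.1, 1e-3, 1e-6, 1e-9]  # a finite sample of delta values

results = []
for delta in deltas:
    x = witness(delta)
    in_range = 0 < abs(x - a) < delta
    far_from_L = abs(f(x) - L) >= epsilon
    results.append(in_range and far_from_L)

print(all(results))  # True
```

Note the order of choices: ε is fixed first, and the witness x is allowed to depend on δ, exactly as the order of quantifiers in the negation dictates.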
The next two examples are to make the following point. When forming negations, it is important
to recognize quantifiers which are implicit or phrased with “if.”
Example 2.10. Consider the proposition
if n is prime then n is odd.
This is false, because of n = 2. So its negation should be true. But if we were careless and ignored the
implied “for all” quantifier (taking the negation of “S1 implies S2” to be “S1 and not S2”) we might write
the following for the negation
n is prime and n is even.
Seeing the variable n and references to “prime” and “odd” we would probably presume n was intended to
be a positive integer. But seeing no quantifier for n we might read an implicit “for all” into this and interpret
it as
all positive integers n are both prime and even.
That is a false proposition and not the correct negation of our original. If however we are careful to include
the implicit quantifier, our original proposition is
for all prime numbers n, n is odd.
Now we get the correct negation by properly handling the quantifier:
for some prime number n, n is even.
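The three statements of Example 2.10 can be compared on a finite range. In this sketch the cutoff N and the helper is_prime are our own assumptions. Because the counterexample n = 2 lies inside the range, the truth values found here agree with those of the statements over all positive integers.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for small illustrative inputs."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

N = 50  # hypothetical cutoff
scope = range(1, N + 1)

# "For all prime numbers n, n is odd."
original = all(n % 2 == 1 for n in scope if is_prime(n))

# Correct negation: "for some prime number n, n is even."
correct_negation = any(n % 2 == 0 for n in scope if is_prime(n))

# Careless reading: "all positive integers n are both prime and even."
careless_reading = all(is_prime(n) and n % 2 == 0 for n in scope)

print(original, correct_negation, careless_reading)  # False True False
```

The original and its correct negation have opposite truth values, as they must. The careless reading is false along with the original, so it cannot be the negation.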
Example 2.11. Suppose x is a positive real number. Consider the proposition

ln(x) = 0.

Although there is a value of x > 0 for which this is true, it is not true for all positive x. Since we would
understand this with an implicit “for all x > 0” quantifier, we would properly understand the proposition
to be false. We want to form the negation, which should be true. But if we neglected the implicit quantifier
we might say the negation is simply
ln(x) ≠ 0.
That would be the correct negation if there were only a single x under consideration. But with the implicit
quantifier in the original proposition the correct negation is

There exists a positive x for which ln(x) ≠ 0.
This is indeed true; just consider x = 2.

Problem 2.9 Consider the proposition


for all real numbers a, if f (a) = 0 then a > 0.
Write a version of the negation of this proposition.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . negpos

Problem 2.10 Suppose A is a set of real numbers. We say x is an upper bound of A when
For all a in A, a ≤ x.
We say α is the supremum of A when α is an upper bound of A and for every upper bound x of A we have
α ≤ x.
a) Write a statement of what it means to say x is not an upper bound of A.
b) Write a statement of what it means to say α is not the supremum of A.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . supremum

C Some Basic Types of Proofs


There is no general prescription for how to write a proof. However, for various standard types of propositions
there are some natural ways to arrange a proof. The chart below lists some of the most common. The
following subsections will elaborate and cite examples.
Statement                           Typical Methods
S (elementary proposition)          • Direct: Assume hypotheses. · · · Therefore S.
                                    • Contradiction
S1 and S2                           • Prove S1. Prove S2. (Two separate proofs.)
S1 or S2                            • Cases, each leading to one of the possible conclusions.
                                    • Prove that (not S1) implies S2.
S1 implies S2                       • Assume S1. · · · Therefore S2
S1 iff S2                           • S1 implies S2. S2 implies S1. (Prove two implications.)
Multiple equivalences               • A connecting cycle of implications.
For all x satisfying P(x), S(x)     • Cases.
                                    • Assume P(x). · · · Therefore S(x). (Treat as an implication.)
For all positive integers n, S(n)   • Induction.
There exists x for which . . .      • Either construct x or connect to some other existence result.
Uniqueness of x such that . . .     • Assume x and x̃ both qualify. Show x = x̃.

C.1 Elementary Propositions and “And”


By a direct proof, we mean an argument which simply starts with the hypotheses and proceeds through a
sequence of logical deductions arriving in the end at the desired conclusion. Our proof of Proposition 1.5
falls into this category. Even though the proposition was worded as a “for all” proposition, we proved it
directly: we assumed the hypotheses (x and y are real numbers) and gave a sequence of deductions (taking
advantage of other facts about absolute value and square roots that were already known to us) which led to
the conclusion. There were no cases, no contradiction, no induction, no auspicious new objects, just a direct
line of reasoning leading from the hypotheses to the conclusion.
Proving a proposition “S1 and S2” is nothing other than proving S1 and then proving S2. It is not really
any different than preparing proofs of S1 and S2 separately and then writing them down one after the other.

C.2 “Or” Propositions
Things can be a little trickier when proving “S1 or S2.” One approach is to prove the equivalent proposition,
“not S1 implies S2;” in other words suppose that S1 is false and show that S2 must be true. Here is an
example of such a proof.
Example 2.12. Suppose a and b are real numbers. Either the system of two linear equations
x + y + z = a
x + 2y + 3z = b
has no solution (x, y, z) or it has infinitely many solutions.
Proof. Suppose the system has a solution (x0 , y0 , z0 ). Consider
x = x0 + t, y = y0 − 2t, z = z0 + t.
Observe that for every real number t
x + y + z = (x0 + t) + (y0 − 2t) + (z0 + t) = x0 + y0 + z0 + (t − 2t + t) = x0 + y0 + z0 = a
x + 2y + 3z = (x0 + t) + 2(y0 − 2t) + 3(z0 + t) = x0 + 2y0 + 3z0 + (t − 4t + 3t) = x0 + 2y0 + 3z0 = b,
so that (x, y, z) is a solution. Thus there are infinitely many solutions, one for each t.
(If you are wondering where the +t, −2t, +t in this proof came from, remember your linear algebra: (1, −2, 1)
is a solution of the associated homogeneous linear system.)
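The computation in this proof can be spot-checked with exact arithmetic. In the sketch below the right-hand sides a, b and the particular solution (x0, y0, z0) are hypothetical values chosen by us; the check confirms that, for a sample of values of t, each shifted triple solves both equations and that different values of t give different triples.

```python
from fractions import Fraction

def satisfies(x, y, z, a, b) -> bool:
    """True when (x, y, z) solves x + y + z = a and x + 2y + 3z = b."""
    return x + y + z == a and x + 2 * y + 3 * z == b

# Hypothetical right-hand sides and one particular solution.
a, b = Fraction(6), Fraction(14)
x0, y0, z0 = Fraction(1), Fraction(2), Fraction(3)  # 1+2+3 = 6, 1+4+9 = 14
starts_as_solution = satisfies(x0, y0, z0, a, b)

# A finite sample of t values; the proof covers every real t.
ts = [Fraction(k, 4) for k in range(-8, 9)]
family = [(x0 + t, y0 - 2 * t, z0 + t) for t in ts]

all_solve = all(satisfies(x, y, z, a, b) for x, y, z in family)
all_distinct = len(set(family)) == len(ts)

print(starts_as_solution, all_solve, all_distinct)  # True True True
```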
Another approach is to consider an exhaustive set of cases in some of which you can conclude S1 while
in others you can conclude S2. We didn’t see any examples of this in Chapter 1, but here is one with a
four-way “or.”
Example 2.13. Suppose q(x) = ax² + bx + c is a quadratic polynomial with real coefficients a, b, c. The
number of solutions to the equation q(x) = 0 is 0, 1, 2, or infinite.
Proof. We consider six cases.
Case 1: a = b = c = 0. In this case q(x) = 0 for all x, so there are infinitely many solutions.
Case 2: a = b = 0 and c ≠ 0. In this case q(x) = c ≠ 0 for all x, so there are no solutions.
Case 3: a = 0 and b ≠ 0. In this case q(x) = bx + c. There is exactly one solution, x = −c/b.
In the remaining cases a ≠ 0. Assuming that, we can rewrite q(x) by completing the square.
\[
\begin{aligned}
q(x) &= a\left(x^2 + \frac{b}{a}x + \frac{c}{a}\right) \\
&= a\left(x^2 + 2\frac{b}{2a}x + \frac{b^2}{4a^2} - \frac{b^2}{4a^2} + \frac{4ac}{4a^2}\right) \\
&= a\left[\left(x + \frac{b}{2a}\right)^2 - \frac{b^2 - 4ac}{4a^2}\right]
\end{aligned}
\tag{2.1}
\]
We now proceed to the remaining cases using this formulation.
Case 4: a ≠ 0 and b² − 4ac < 0. Then (b² − 4ac)/(4a²) < 0. For x to be a solution to q(x) = 0 would require
(x + b/(2a))² = (b² − 4ac)/(4a²) < 0, which is impossible. Thus there are no solutions in this case.
Case 5: a ≠ 0 and b² − 4ac = 0. Now (2.1) implies that q(x) = 0 is equivalent to (x + b/(2a))² = 0, for which
x = −b/(2a) is the one and only solution.
Case 6: a ≠ 0 and b² − 4ac > 0. In this case (b² − 4ac)/(4a²) > 0, so q(x) = 0 is equivalent to
(x + b/(2a))² = (b² − 4ac)/(4a²), for which there are exactly two solutions: x = −b/(2a) ± √((b² − 4ac)/(4a²)).
Since the cases exhaust all the possibilities, and in each case there were either 0, 1, 2, or infinitely many
solutions, we have proven the result.
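The case analysis in the proof of Example 2.13 translates directly into a decision procedure. The following is a minimal Python sketch; the function name count_real_solutions is our own, and exact comparisons with zero are only trustworthy when the coefficients are integers or other exact values.

```python
import math


def count_real_solutions(a, b, c):
    """Return the number of real solutions of a*x**2 + b*x + c = 0.

    The result is 0, 1, 2, or math.inf, following the six cases in the
    proof of Example 2.13.
    """
    if a == 0:
        if b == 0:
            # Cases 1 and 2: q is the constant c.
            return math.inf if c == 0 else 0
        # Case 3: q is linear, with the single root -c/b.
        return 1
    # Cases 4, 5, 6: a != 0, so the sign of b^2 - 4ac decides.
    disc = b * b - 4 * a * c
    if disc < 0:
        return 0
    if disc == 0:
        return 1
    return 2


print(count_real_solutions(1, 2, 1))   # 1, since x^2 + 2x + 1 = (x + 1)^2
print(count_real_solutions(1, 0, -1))  # 2, the roots are 1 and -1
```

Each branch corresponds to one case of the proof, which is also why the branches together cover every possible triple (a, b, c).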

C.3 Implications and “For all . . . ” Propositions
We explained above that to prove “S1 implies S2” we typically assume S1 and use that to deduce S2. Since
an implication is equivalent to its contrapositive, another way to prove “S1 implies S2” is to suppose that
S2 is false and show S1 is false.
We have observed that when variables are involved a “for all . . . ” proposition can often be expressed
as an implication. It is often proven as an implication as well. In effect we prove it directly as a single
proposition about the quantities the variables refer to, where what we know about those quantities is not
their precise values, but just the properties given in the hypotheses. For instance consider the proof of the
Pythagorean Theorem in Chapter 1. The theorem is really making an assertion for all a, b, c which occur as
the sides of a right triangle. But the proof just gives one argument which applies simultaneously to all right
triangles, so it is worded as if there is a single (generic) right triangle under consideration. Proposition 1.5
is another example. Sometimes you need to break into separate cases for the proof. Lemma 1.1 and the
Triangle Inequality are simple examples of that kind of proof.
To disprove a “for all x, S(x)” proposition we need to show that there exists an x (within the intended
scope) for which S(x) is false. In other words you need to exhibit an x which is a counterexample. Even if
the “for all” is expressed as an implication, that is still what you need to do to disprove it.
Example 2.14. Consider the following proposition.
If A and B are 2 × 2 matrices then AB = BA.
The implication is really a “for all” assertion. It is the same as saying “for all 2 × 2 matrices A and B, the
equation AB = BA is true.” To disprove this all we need to do is exhibit one counterexample, i.e. we want
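to produce a specific pair of matrices for which AB ≠ BA. For instance let
\[
A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}.
\]
We find that
\[
AB = \begin{pmatrix} -2 & 1 \\ -4 & 3 \end{pmatrix} \neq \begin{pmatrix} 3 & 4 \\ -1 & -2 \end{pmatrix} = BA.
\]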
This one counterexample is enough to disprove the proposition above.
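A counterexample like the one in Example 2.14 is easy to check by machine. Here is a small Python sketch that multiplies the two matrices in both orders; the helper matmul2 is our own and handles only 2 × 2 matrices stored as nested lists.

```python
def matmul2(X, Y):
    """Multiply two 2x2 matrices given as nested lists of rows."""
    return [
        [sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


A = [[1, 2], [3, 4]]
B = [[0, 1], [-1, 0]]

AB = matmul2(A, B)
BA = matmul2(B, A)

print(AB)        # [[-2, 1], [-4, 3]]
print(BA)        # [[3, 4], [-1, -2]]
print(AB != BA)  # True
```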

Problem 2.11 Assume that a, b, and c are positive integers. Prove that if a divides b or c then a divides
bc.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . div

Problem 2.12 Prove that if m and n are odd integers, then $n^2 - m^2$ is divisible by 8.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D8

Problem 2.13 Assume a and b are real numbers. Prove or disprove the following statement.
If $4ab < (a + b)^2$ then a < b.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ineQ

C.4 Equivalence
To prove “S1 if and only if S2” we simply prove both implications, “S1 implies S2” and “S2 implies S1.”
Proposition 2.1 was an example. Here is another.
Proposition 2.2. Suppose a < b < c are three consecutive positive integers. There is a right triangle with
sides of length a, b, and c if and only if a = 3.

Proof. Suppose a = 3. Then b = 4 and c = 5. The triangle with sides 3, 4, 5 is a right triangle. This proves
the “if” assertion.
Suppose a < b < c are consecutive positive integers which form the sides of a right triangle. Then
b = a + 1, c = a + 2, and $a^2 + b^2 = c^2$ by the Pythagorean Theorem. Therefore we have
\[
\begin{aligned}
a^2 + (a + 1)^2 &= (a + 2)^2 \\
a^2 + a^2 + 2a + 1 &= a^2 + 4a + 4 \\
a^2 - 2a - 3 &= 0 \\
(a - 3)(a + 1) &= 0.
\end{aligned}
\]
Therefore either a = −1 or a = 3. Since a is a positive integer, a ≠ −1. Thus a = 3.
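As a sanity check on Proposition 2.2 (not a substitute for the proof, since a search can only cover finitely many values), the following Python sketch looks for every a up to a bound for which a, a + 1, a + 2 satisfy the Pythagorean relation. The function name and the bound are our own choices.

```python
def consecutive_right_triangles(limit):
    """Return every a in 1..limit with a^2 + (a+1)^2 == (a+2)^2."""
    return [a for a in range(1, limit + 1)
            if a * a + (a + 1) ** 2 == (a + 2) ** 2]


print(consecutive_right_triangles(10_000))  # [3]
```

Only the relation with a + 2 as the hypotenuse needs testing, because the hypotenuse is always the longest side.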

Multiple Equivalences
Sometimes we encounter propositions asserting that three or more statements are equivalent. Here is an example.
Proposition 2.3. Suppose f(x) is a twice differentiable function of a real variable x. The following are
equivalent.
a) $f''(x) \geq 0$ for all x.

b) For any three values a < b < c the following inequality holds:
\[
\frac{f(b) - f(a)}{b - a} \leq \frac{f(c) - f(b)}{c - b}.
\]

c) For any two values s and t and any 0 ≤ λ ≤ 1 the following inequality holds:

f (λs + (1 − λ)t) ≤ λf (s) + (1 − λ)f (t).

The phrase “the following are equivalent” means that any two of the parts are equivalent to each other. With
3 parts that comes to 3 individual equivalences. We could prove the proposition by proving each equivalence
as a pair of implications, which would require 6 different implications. But a proof does not need to include
arguments for all 6 implications. If we prove that a) implies b) and that b) implies c) then the truth of a)
implies c) follows immediately and does not need to be proven separately. The most common proof of a
multiple equivalence like this is to establish a collection of the implications which comprise a cycle through
all of the parts. In this case we will prove that a) implies b), b) implies c), and that c) implies a). Nothing
more is needed.
Proof. Assume a). That means that $f'$ is an increasing function. By the Mean Value Theorem of calculus
there exists a value x between a and b and a value y between b and c for which
\[
f'(x) = \frac{f(b) - f(a)}{b - a} \quad \text{and} \quad f'(y) = \frac{f(c) - f(b)}{c - b}.
\]
Because of the assumed ordering a < b < c this implies that x ≤ y. Since $f'$ is increasing we know that
$f'(x) \leq f'(y)$. Therefore
\[
\frac{f(b) - f(a)}{b - a} \leq \frac{f(c) - f(b)}{c - b}.
\]
Thus a) implies b).
Now assume b). Consider any s, t and 0 ≤ λ ≤ 1. The cases of s = t, λ = 0 or λ = 1 are all trivial.
Without loss of generality we can assume that s < t and 0 < λ < 1. Let a = s, c = t and b = λs + (1 − λ)t.
Observe that a < b < c. So from b) we know that

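\[
\frac{f(b) - f(a)}{b - a} \leq \frac{f(c) - f(b)}{c - b}.
\]
This can be rearranged as
\[
\begin{aligned}
\left(\frac{1}{b-a} + \frac{1}{c-b}\right) f(b) &\leq \frac{1}{b-a} f(a) + \frac{1}{c-b} f(c) \\
\frac{c-a}{(b-a)(c-b)} f(b) &\leq \frac{1}{b-a} f(a) + \frac{1}{c-b} f(c) \\
f(b) &\leq \frac{c-b}{c-a} f(a) + \frac{b-a}{c-a} f(c).
\end{aligned}
\]
But since $\frac{c-b}{c-a} = \lambda$ and $\frac{b-a}{c-a} = 1 - \lambda$ we see that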

f (λs + (1 − λ)t) ≤ λf (s) + (1 − λ)f (t).

Thus b) implies c).


Finally assume c) and consider any x. For any h > 0 take s = x − h, t = x + h and $\lambda = \frac{1}{2}$. Then
$\lambda s + (1 - \lambda)t = x$ and so we know that
\[
f(x) \leq \frac{1}{2} f(x + h) + \frac{1}{2} f(x - h). \qquad (2.2)
\]
L’Hôpital’s Rule tells us that
\[
\begin{aligned}
\lim_{h \to 0^+} \frac{\frac{1}{2} f(x + h) - f(x) + \frac{1}{2} f(x - h)}{\frac{1}{2} h^2}
&= \lim_{h \to 0^+} \frac{\frac{1}{2} f'(x + h) - \frac{1}{2} f'(x - h)}{h} \\
&= \lim_{h \to 0^+} \left[ \frac{f'(x + h) - f'(x)}{2h} + \frac{f'(x) - f'(x - h)}{2h} \right] \\
&= f''(x).
\end{aligned}
\]
From (2.2) we know that the limit on the left is nonnegative. Therefore $f''(x) \geq 0$, proving that c) implies
a).
Other examples of multiple equivalences occur in the fourth bullet on page 81 and in Theorem 6.16 below.
You will be asked for a proof of a multiple equivalence in Problem 3.15 in the next chapter.

C.5 Existence and Uniqueness


To prove a proposition of the form “there exists x such that P(x)” often requires particular creativity. Instead
of showing that something is always true or always false, you must somehow find a way to pick or identify
one special x out of all the possibilities for which P(x) holds. There is no general pattern to follow for this
kind of proof; it depends very much on the specifics of what you are proving the existence of.
Problem 1.6 in Chapter 1 is an example of one approach to this kind of problem. The proof you wrote
for that worked by connecting the existence of what we want (x ≥ 0 with $x^2 = s$) to a more general-purpose
existence theorem (the Intermediate Value Theorem).
Sometimes you can prove the existence of the thing you want by just finding a way to construct or
produce it, apart from using some fancy theorem. Here is an example that can be done that way.
Example 2.15. Prove that for every real number y there exists a real number x for which $y = \frac{1}{2}(e^x - e^{-x})$.
We could do this by appealing to the Intermediate Value Theorem again, but this time we can find the x we
want by some algebra (this will be our scratch work) and then just verify that our expression for x works as
the proof. Here is the scratch work: given y we are trying to solve for x.
\[
\begin{aligned}
y &= \frac{1}{2}(e^x - e^{-x}) \\
0 &= e^x - 2y - e^{-x} \\
0 &= e^{2x} - 2y e^x - 1 \\
0 &= (e^x)^2 - 2y e^x - 1.
\end{aligned}
\]
So, using the quadratic formula,
\[
e^x = \frac{2y \pm \sqrt{4y^2 + 4}}{2} = y \pm \sqrt{y^2 + 1}.
\]
Since $e^x > 0$, we must use the positive of the two possible roots, and so taking the logarithm we get
\[
x = \ln\left(y + \sqrt{y^2 + 1}\right).
\]

That’s the formula we wanted. Now we prove that it is indeed the x we were looking for. Observe that the
proof itself gives the reader no clue about how we came up with the formula for x.
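Proof. Given a real number y, consider $x = \ln\left(y + \sqrt{y^2 + 1}\right)$. We will confirm that this x works as claimed.
\[
\begin{aligned}
e^x - e^{-x} &= y + \sqrt{y^2 + 1} - \frac{1}{y + \sqrt{y^2 + 1}} \\
&= \frac{\left(y + \sqrt{y^2 + 1}\right)^2 - 1}{y + \sqrt{y^2 + 1}} \\
&= \frac{y^2 + 2y\sqrt{y^2 + 1} + y^2 + 1 - 1}{y + \sqrt{y^2 + 1}} \\
&= \frac{2y\left(y + \sqrt{y^2 + 1}\right)}{y + \sqrt{y^2 + 1}} \\
&= 2y.
\end{aligned}
\]
Therefore we find that $\frac{1}{2}(e^x - e^{-x}) = y$, proving the existence of x as claimed.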
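The formula found in the scratch work for Example 2.15 can also be spot-checked numerically. The sketch below is only an illustration in floating point arithmetic, so agreement holds up to rounding error; the function name solve_for_x and the sample values of y are our own.

```python
import math


def solve_for_x(y):
    """Return x = ln(y + sqrt(y^2 + 1)), the candidate from the scratch work."""
    return math.log(y + math.sqrt(y * y + 1))


for y in [-10.0, -1.5, 0.0, 0.25, 3.0, 100.0]:
    x = solve_for_x(y)
    recovered = 0.5 * (math.exp(x) - math.exp(-x))
    # recovered should agree with y up to floating point rounding
    print(y, recovered)
```

Note that the argument of the logarithm is positive for every real y, because $\sqrt{y^2 + 1} > |y|$; that is what makes x well defined in the proof.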


An assertion of uniqueness says that there is not more than one object satisfying some description or
collection of properties. Proposition 1.3 is a good example. We assumed that two numbers both satisfied the
definition of $\sqrt{s}$ and then argued from there that those two numbers must be the same. That is the typical
pattern of a uniqueness proof.

C.6 Contradiction
A proof by contradiction is essentially the proof that the conclusion cannot be false. We simply take all the
hypotheses and add to them the assumption that the conclusion is false, and show that from all those together
we can deduce something impossible.⁸ We saw several examples in Chapter 1: Lemma 1.4, Theorems 1.9
and 1.10.
If you are proving a “for all” proposition by contradiction, the negation is a “there exists” proposition.
To show that the negation is false you must show that in fact no values of the variables do what the negation
would say.
Example 2.16. Prove that for all x > 0, the inequality $\frac{x}{x+1} < \frac{x+1}{x+2}$ holds. Before proceeding, what is the
intended scope for x here? Since x > 0 only makes sense for real numbers we surmise that the scope does
not extend beyond that. If the variables i, j, n, k, ℓ, or m had been used we might suspect that just integers
were intended. However the choice of x suggests that real numbers are intended, so that is how we will
understand the scope of x.
Proof. Suppose the proposition is false. Then there exists an x > 0 with $\frac{x}{x+1} \geq \frac{x+1}{x+2}$. Since both x + 1 and
x + 2 are positive we can multiply both sides of this inequality by them, obtaining $x(x + 2) \geq (x + 1)^2$.
Multiplying this out we find that
\[
x^2 + 2x \geq x^2 + 2x + 1, \quad \text{and therefore} \quad 0 \geq 1.
\]
This is clearly false, showing that the existence of such an x is not possible. This completes the proof by
contradiction.

⁸ The technique is called “reductio ad absurdum,” which is Latin for “reduction to the absurd.” In brief, the negation of the
conclusion leads to an absurdity. I like to say that by assuming the conclusion is false we can produce a mathematical train
wreck.

Problem 2.14 Prove that if a, b, c are positive integers with $a^2 + b^2 = c^2$ then either a or b is even.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . sqeven

Problem 2.15 Prove (by contradiction) that $\sqrt{3} - \sqrt{2}$ is irrational.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . r3r2

Problem 2.16 Prove that there does not exist a smallest positive real number.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . minR

Problem 2.17 Suppose x, y, z are positive real numbers. Prove that x > z and $y^2 = xz$ together imply
x > y > z. (You can do this by contradiction; negating x > y > z results in an “or” statement, so the proof
by contradiction will be in two cases.)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ineq3

Problem 2.18 Prove that if f and g are differentiable functions with x = f(x)g(x), then either f(0) ≠ 0
or g(0) ≠ 0. Hint: take a derivative. (Adapted from [24].)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . noxprd

C.7 Induction
Induction is a special technique for proving a proposition of the form
P (n) for all positive integers n.
A proof by induction of P (n) for all n = 1, 2, 3, · · · consists of two parts.

1. A proof that P (1) is true. (The base case.)


2. A proof that P (n) implies P (n + 1), for all positive integers n. (The induction step.)
The base case is often simple, because it boils down to just checking one case (and usually an easy case at
that). The proof of the induction step typically takes the form

suppose P(n). . . . therefore P(n + 1).


Newcomers to inductive proofs sometimes object to this, because it looks like we are assuming what we are
trying to prove. But as we discussed in Chapter 1, the “suppose P(n)” doesn’t mean that we are prematurely
claiming that P(n) is true, but showing how the truth of P(n + 1) will be a consequence of the truth of P(n).
We saw several examples of such induction proofs in the last chapter: Propositions 1.11, 1.12, and 1.13.
This proof technique only applies for propositions in which n ranges over integer values. It doesn’t work
for propositions of the form “. . . for all real numbers x.” Its validity boils down to a basic property of the
integers, as we will discuss in Chapter 4. But there are a couple variants of it, which we will describe in this
section.
Example 2.17. First, here is a spoof by induction to show that all Canadians were born on the same day.
For each positive integer n let S(n) be the statement that every group of n Canadians all were born on the
same day. We will show by induction that S(n) is true for each n.

In any group that consists of just one Canadian, everybody in the group has the same age, because after
all there is only one person! Thus S(1) is true.
Next suppose S(n) is true and consider any group G of n + 1 Canadians. Suppose p and q are two of
the Canadians in G. We want to show that p and q have the same birthday. Let F be the group of all the
people in G excluding p. Then F has n people in it, so since S(n) is true all the people in F have a common
birthday. Likewise, let H be the group of people in G excluding q. Then H has n people and so by S(n)
all the people in H share a common birthday. Now let r be someone in G other than p or q. Both r and q
are in F , so they have the same birthday. Similarly, both r and p are in H so they have the same birthday.
But then p and q must have the same birthday, since they both have the same birthday as r. Thus any two
people in G have the same birthday, so everyone in G has the same birthday. Since G was any group of n + 1
Canadians, it follows that S(n + 1) is true. This completes the proof by induction that S(n) is true for all n.
To finish, let n be the current population of Canada. Since we know S(n) is true, we see that all Canadians
have the same birthday.
Can you spot the flaw here?⁹ Hint: find the first n for which S(n + 1) is false; the argument for the
induction step must be invalid for that n.

Problem 2.19 Let P(n) be the proposition that if i, j are positive integers with max(i, j) = n, then i = j.
We spoof by induction that P(n) is true for all n. First consider n = 1. If i and j are positive integers
with max(i, j) = 1, then i = 1 = j. This shows that P(1) is true. For the induction step suppose P(n) is
true and i, j are positive integers with max(i, j) = n + 1. Then max(i − 1, j − 1) = n, so by the induction
hypothesis we know i − 1 = j − 1. Adding 1 to both sides we deduce that i = j. This shows that P(n) implies
P(n + 1), and completes the proof by induction. As a corollary, if i and j are any two positive integers, take
n = max(i, j). Then since P(n) is true we conclude that i = j. Thus all positive integers are equal!
What’s wrong here? (Taken from [5].)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . fi

Generalized Induction
The first variant of induction is sometimes called generalized induction. Its only difference from ordinary
induction is that it proves something for all n ≥ n0 where n0 can be any integer starting value. Ordinary
induction is just the case of n0 = 1. For a generalized induction proof you prove the base case of n = n0 ,
and then (assuming n ≥ n0 ) prove the usual induction step. Here is an example.
Example 2.18. Prove that $\left(1 + \frac{1}{n}\right)^n < n$ for all integers n ≥ 3. Observe that this inequality is false for
n = 1, 2. The fact that the proof starts with n = 3 (instead of n = 1) is what makes this generalized
induction.
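Proof. We use induction on n. First consider n = 3.
\[
\left(1 + \frac{1}{n}\right)^n = \left(\frac{4}{3}\right)^3 = \frac{64}{27} = 2 + \frac{10}{27} < 3,
\]
verifying the assertion for n = 3.
Next, suppose that $\left(1 + \frac{1}{n}\right)^n < n$ and n ≥ 3. We want to prove that $\left(1 + \frac{1}{n+1}\right)^{n+1} < n + 1$. We begin by
writing
\[
\left(1 + \frac{1}{n+1}\right)^{n+1} = \left(1 + \frac{1}{n+1}\right)^n \left(1 + \frac{1}{n+1}\right) = \left(1 + \frac{1}{n+1}\right)^n \frac{n+2}{n+1}.
\]
Since $\frac{1}{n+1} < \frac{1}{n}$, it follows that $\left(1 + \frac{1}{n+1}\right)^n < \left(1 + \frac{1}{n}\right)^n$. Therefore
\[
\begin{aligned}
\left(1 + \frac{1}{n+1}\right)^{n+1} &< \left(1 + \frac{1}{n}\right)^n \frac{n+2}{n+1} \\
&< n\,\frac{n+2}{n+1}, \quad \text{using the induction hypothesis.}
\end{aligned}
\]
To finish we claim that $n\,\frac{n+2}{n+1} < n + 1$. We can deduce this from the following inequalities.
\[
\begin{aligned}
0 &< 1 \\
n^2 + 2n &< n^2 + 2n + 1 \\
n(n + 2) &< (n + 1)^2 \\
n\,\frac{n+2}{n+1} &< n + 1,
\end{aligned}
\]
the last line following from the preceding since n + 1 is positive. So putting the pieces together we have
shown that
\[
\left(1 + \frac{1}{n+1}\right)^{n+1} < n\,\frac{n+2}{n+1} < n + 1.
\]
This proves the assertion for n + 1 and completes the proof by induction.

⁹ If not then on what day do you think all those Canadians were born? April 1, perhaps?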
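The inequality of Example 2.18 can be tested for small n with exact rational arithmetic, which shows both why the base case is n = 3 and that no counterexample turns up in the tested range. This Python sketch is only a finite check, not a proof; the function name holds and the upper bound 200 are our own choices.

```python
from fractions import Fraction


def holds(n):
    """Return True when (1 + 1/n)^n < n, computed exactly with fractions."""
    return (1 + Fraction(1, n)) ** n < n


print([n for n in range(1, 6) if not holds(n)])  # [1, 2]
print(all(holds(n) for n in range(3, 200)))      # True
```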

Problem 2.20 Prove that for all integers n ≥ 2, we have $n^3 + 1 > n^2 + n$.


. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . cubeineq

Problem 2.21 Prove that $n^2 < 2^n$ for all n ≥ 5.


. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i3

Problem 2.22 Prove that $n! > 2^n$ for all integers n ≥ 4.


. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . facineq

Problem 2.23 Prove that for all x > 0 and all nonnegative integers n,

$(1 + x)^n \geq 1 + nx$.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . pineq

Strong Induction
A more significant variation is strong induction. In the induction step for ordinary (or generalized) induction
we assume P(n) for just one value n and use that to prove the next case, P(n + 1). But for strong induction
we assume all the preceding cases P(n0 ), P(n0 + 1), . . . , P(n) and use them to prove P(n + 1). Sometimes
we also prove multiple base cases P(n0 ), P(n0 + 1), . . . , P(k), and then just apply the induction argument
for n ≥ k. Here is an elementary example.
Example 2.19. Prove that every integer n ≥ 12 can be written in the form n = 3m + 7ℓ where m and ℓ are
nonnegative integers.
Let’s discuss the proof before writing out the finished version. The basic issue for any induction proof is
connecting the next case, n + 1, to the previous case(s). Here we want to produce nonnegative integers m
and ℓ for which
n + 1 = 3m + 7ℓ. (2.3)
If we were using ordinary induction, we would assume n = 3m′ + 7ℓ′, for some nonnegative m′ and ℓ′, and
try to use this to produce m and ℓ in (2.3). This doesn’t work very well;

n + 1 = 3m′ + 7ℓ′ + 1, but how does this become 3m + 7ℓ?

But we can make a connection between n + 1 and n − 2 because they are 3 steps apart: n + 1 = (n − 2) + 3.
If we assume that the n − 2 case is true, that is that n − 2 = 3m′ + 7ℓ′, then
n + 1 = (n − 2) + 3 = 3m′ + 7ℓ′ + 3 = 3(m′ + 1) + 7ℓ′ = 3m + 7ℓ where m = m′ + 1 and ℓ = ℓ′.
So the induction step works out neatly if we are not limited to assuming only the n case but assume all the
cases up to n, including the case of n − 2 in particular. Here is this strong induction proof written out.
Proof. We use strong induction. First consider the cases of n = 12, n = 13, and n = 14. Observe that
12 = 3 · 4 + 7 · 0, 13 = 3 · 2 + 7 · 1, 14 = 3 · 0 + 7 · 2.
Next, consider 14 ≤ n and suppose that the assertion is true for each 12 ≤ k ≤ n. Consider n + 1 and let
k = n − 2. Then 12 ≤ k ≤ n so by our induction hypothesis there exist nonnegative integers m′, ℓ′ with
n − 2 = 3m′ + 7ℓ′ and consequently, n + 1 = n − 2 + 3 = 3(m′ + 1) + 7ℓ′. Since m = m′ + 1 and ℓ = ℓ′ are
nonnegative integers, this proves the assertion for n + 1. By induction, this completes the proof.
Notice that we proved multiple base cases: n = 12, 13, and 14. The reason we had to do that is that the
argument to prove the case of n + 1 uses the case of n − 2. Since our proposition is only true for integers
at least as large as 12, the induction step is only valid for n − 2 ≥ 12, which means n ≥ 14. So we have to
check the values 12 ≤ n ≤ 14 individually as base cases before we can start using the induction argument.
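The strong induction argument of Example 2.19 is really a recipe for computing m and ℓ: the three base cases are looked up directly, and every larger n is reduced to the case three steps back. A minimal Python sketch of that recipe follows; the function name three_seven is our own, and the recursion is written for clarity rather than for very large n.

```python
def three_seven(n):
    """Return (m, l) with n == 3*m + 7*l and m, l >= 0, for an integer n >= 12."""
    if n < 12:
        raise ValueError("n must be at least 12")
    # The three base cases checked individually in the proof.
    base = {12: (4, 0), 13: (2, 1), 14: (0, 2)}
    if n in base:
        return base[n]
    # Induction step: reuse the representation of n - 3 and add one more 3.
    m, l = three_seven(n - 3)
    return (m + 1, l)


print(three_seven(12))  # (4, 0)
print(three_seven(20))  # (2, 2)
```

The recursion always lands on one of 12, 13, 14, which mirrors the reason the proof needs exactly those three base cases.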
We will see more examples of strong induction in Proposition 2.4 and Theorem 2.5 below. Additional
examples will be found in Chapter 4.

Problem 2.24 Here’s a spoof to show that in fact $\frac{d}{dx} x^n = 0$ for all n ≥ 0. Clearly $\frac{d}{dx} x^0 = 0$. If we make
the (strong) induction hypothesis that $\frac{d}{dx} x^k = 0$ for all k ≤ n, then the product rule implies
\[
\begin{aligned}
\frac{d}{dx} x^{n+1} &= \frac{d}{dx}\left(x \cdot x^n\right) \\
&= \left(\frac{d}{dx} x^1\right) x^n + x^1 \frac{d}{dx} x^n \\
&= 0 \cdot x^n + x^1 \cdot 0 \\
&= 0.
\end{aligned}
\]
Thus $\frac{d}{dx} x^n = 0$. What’s wrong? (See [5] for the source of this one.)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ezd

Recursive Definitions
Induction proves a proposition P(n) for all n ≥ n0 by proving it for a first case n0 and then showing why
the truth of each successive case P(n + 1) follows from the preceding one P(n). The same idea can be used
to define some function f (n) for n ≥ n0 , by specifying a starting value f (n0 ) and then giving a formula for
f (n + 1) which is based on the preceding f (n). Together these two pieces of information determine or define
the values f (n) for all n ≥ n0 . This is what we call a recursive definition. For instance, a recursive definition
of n! could consist of specifying that
• 0! = 1, and
• (n + 1)! = (n + 1) · n! for each n ≥ 0.
This determines the same values for n! as our original definition on page 20. For instance, applying the
definition repeatedly (recursively), we find that
4! = 4 · 3! = 4 · 3 · 2! = 4 · 3 · 2 · 1! = 4 · 3 · 2 · 1 · 0! = 4 · 3 · 2 · 1 · 1 = 24.
We will see another recursive definition in our proof of Theorem 3.8 below.
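A recursive definition of this kind maps directly onto a recursive function: one branch for the starting value and one for the step. Here is a minimal Python sketch of the factorial definition above (the function name is ours; Python's standard library also provides math.factorial).

```python
def factorial(n):
    """Compute n! from the recursive definition 0! = 1, (n+1)! = (n+1) * n!."""
    if n < 0:
        raise ValueError("n must be a nonnegative integer")
    if n == 0:
        return 1                     # starting value: 0! = 1
    return n * factorial(n - 1)      # step: n! = n * (n-1)!


print(factorial(4))  # 24
```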
Here is a famous example that uses a strong induction sort of recursive definition.

Definition. The Fibonacci numbers, $F_n$ for n ≥ 1, are defined recursively by
$F_1 = 1$, $F_2 = 1$, and $F_{n+1} = F_n + F_{n-1}$ for n ≥ 2.
The first few Fibonacci numbers are
$F_1 = 1$, $F_2 = 1$, $F_3 = 2$, $F_4 = 3$, $F_5 = 5$, $F_6 = 8$, $F_7 = 13$, $F_8 = 21$, . . .
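The two starting values and the recurrence can be transcribed into code almost word for word. The sketch below is one possible Python rendering; the name fib is ours, and the cache is an implementation convenience (not part of the definition) that keeps the double recursion from recomputing earlier terms.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n):
    """Return the Fibonacci number F_n for n >= 1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if n <= 2:
        return 1                       # F_1 = F_2 = 1
    return fib(n - 1) + fib(n - 2)     # the recurrence, with the index shifted by one


print([fib(n) for n in range(1, 9)])  # [1, 1, 2, 3, 5, 8, 13, 21]
```

Because each value depends on the two preceding ones, two starting values are needed before the recurrence can take over.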
The Fibonacci numbers have many unusual properties. Since $F_{n+1}$ depends on the previous two terms, it is
not surprising that strong induction is a natural tool for proving those properties. As an example we offer
the following.
Proposition 2.4 (Binet’s Formula). The Fibonacci numbers are given by the following formula:
\[
F_n = \frac{1}{\sqrt{5}} \left[ \left(\frac{1 + \sqrt{5}}{2}\right)^n - \left(\frac{1 - \sqrt{5}}{2}\right)^n \right].
\]
It is surprising that this formula always produces integers, but it does!
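Proof. We prove Binet’s formula by (strong) induction. First consider the cases of n = 1, 2. For n = 1 we
have
\[
\frac{1}{\sqrt{5}} \left[ \left(\frac{1 + \sqrt{5}}{2}\right)^1 - \left(\frac{1 - \sqrt{5}}{2}\right)^1 \right] = \frac{2\sqrt{5}}{2\sqrt{5}} = 1 = F_1.
\]
For n = 2 we have
\[
\frac{1}{\sqrt{5}} \left[ \left(\frac{1 + \sqrt{5}}{2}\right)^2 - \left(\frac{1 - \sqrt{5}}{2}\right)^2 \right]
= \frac{1}{\sqrt{5}} \left[ \frac{1 + 2\sqrt{5} + 5}{4} - \frac{1 - 2\sqrt{5} + 5}{4} \right] = \frac{4\sqrt{5}}{4\sqrt{5}} = 1 = F_2.
\]
Now suppose the formula is correct for all 1 ≤ k ≤ n, where n ≥ 2. We will show that it must also be true
for n + 1. By definition of the Fibonacci sequence and the induction hypothesis we have
\[
\begin{aligned}
F_{n+1} &= F_{n-1} + F_n \\
&= \frac{1}{\sqrt{5}} \left[ \left(\frac{1 + \sqrt{5}}{2}\right)^{n-1} - \left(\frac{1 - \sqrt{5}}{2}\right)^{n-1} \right]
 + \frac{1}{\sqrt{5}} \left[ \left(\frac{1 + \sqrt{5}}{2}\right)^{n} - \left(\frac{1 - \sqrt{5}}{2}\right)^{n} \right] \\
&= \frac{1}{\sqrt{5}} \left[ \left(\frac{1 + \sqrt{5}}{2}\right)^{n-1} + \left(\frac{1 + \sqrt{5}}{2}\right)^{n} \right]
 - \frac{1}{\sqrt{5}} \left[ \left(\frac{1 - \sqrt{5}}{2}\right)^{n-1} + \left(\frac{1 - \sqrt{5}}{2}\right)^{n} \right].
\end{aligned}
\]
Now observe that
\[
\begin{aligned}
\left(\frac{1 \pm \sqrt{5}}{2}\right)^{n+1} &= \left[\frac{1 \pm 2\sqrt{5} + 5}{4}\right] \left(\frac{1 \pm \sqrt{5}}{2}\right)^{n-1} \\
&= \left[1 + \frac{2 \pm 2\sqrt{5}}{4}\right] \left(\frac{1 \pm \sqrt{5}}{2}\right)^{n-1} \\
&= \left[1 + \left(\frac{1 \pm \sqrt{5}}{2}\right)\right] \left(\frac{1 \pm \sqrt{5}}{2}\right)^{n-1} \\
&= \left(\frac{1 \pm \sqrt{5}}{2}\right)^{n-1} + \left(\frac{1 \pm \sqrt{5}}{2}\right)^{n}.
\end{aligned}
\]
Using this we find that
\[
F_{n+1} = \frac{1}{\sqrt{5}} \left[ \left(\frac{1 + \sqrt{5}}{2}\right)^{n+1} - \left(\frac{1 - \sqrt{5}}{2}\right)^{n+1} \right],
\]
which verifies Binet’s formula for n + 1 and completes the proof by induction.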

Why is this strong induction? Because the proof for $F_{n+1}$ needs to use the assumption that the formula
is true for both $F_{n-1}$ and $F_n$, not just $F_n$. Why are there two base cases? Because we need both n ≥ 1 and
n − 1 ≥ 1 for the induction step to work, i.e. the induction argument is only valid once n ≥ 2.
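Binet's formula can be checked exactly on a computer, without floating point, by doing arithmetic on numbers of the form p + q√5 with rational p and q. The Python sketch below stores such a number as the pair (p, q); the helper names mul, binet and fib are our own. It relies on the observation that the two bases in the formula are conjugates, so if the n-th power of (1 + √5)/2 is p + q√5 then the n-th power of (1 − √5)/2 is p − q√5, and the bracket divided by √5 equals 2q.

```python
from fractions import Fraction


def mul(u, v):
    """Multiply u = p1 + q1*sqrt(5) and v = p2 + q2*sqrt(5), stored as pairs (p, q)."""
    (p1, q1), (p2, q2) = u, v
    return (p1 * p2 + 5 * q1 * q2, p1 * q2 + q1 * p2)


def binet(n):
    """Evaluate the right side of Binet's formula exactly, for n >= 1."""
    phi = (Fraction(1, 2), Fraction(1, 2))   # (1 + sqrt(5)) / 2
    power = (Fraction(1), Fraction(0))       # phi to the power 0
    for _ in range(n):
        power = mul(power, phi)
    # phi^n = p + q*sqrt(5) and its conjugate is p - q*sqrt(5),
    # so (phi^n - conjugate) / sqrt(5) = 2*q.
    return 2 * power[1]


def fib(n):
    """Return F_n computed from the recursive definition, iteratively."""
    a, b = 1, 1                              # F_1, F_2
    for _ in range(n - 1):
        a, b = b, a + b
    return a


print(all(binet(n) == fib(n) for n in range(1, 60)))  # True
```

This is a finite check for the first few dozen values only; the induction proof is what covers every n.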
Here is another interesting property of the Fibonacci numbers; see [2].
Theorem 2.5 (Zeckendorf’s Theorem). Every positive integer n can be expressed as a sum of distinct
Fibonacci numbers:
\[
n = F_{i_1} + F_{i_2} + \cdots + F_{i_k},
\]
where $1 < i_1 < i_2 < \cdots < i_k$.
The proof is another example of strong induction.
Proof. We prove this by strong induction. First consider n = 1. Since $1 = F_2$ the theorem holds for n = 1.
Now we make the strong induction hypothesis that the theorem holds for every 1 ≤ m ≤ n and consider
n + 1. If n + 1 is itself a Fibonacci number, $n + 1 = F_i$, then again the theorem’s assertion is true. If n + 1
is not a Fibonacci number then it falls between two Fibonacci numbers:
\[
F_\ell < n + 1 < F_{\ell+1}, \quad \text{for some } \ell \geq 4.
\]
($F_4$, $F_5$ is the first pair of Fibonacci numbers between which there is a gap.) Let $m = n + 1 - F_\ell$. It follows
that
\[
0 < m \leq F_{\ell+1} - F_\ell = F_{\ell-1} < F_\ell < n + 1,
\]
using the basic recurrence relation for the Fibonacci numbers. By our induction hypothesis, we know that
m can be written as a sum of Fibonacci numbers,
\[
m = F_{i_1} + F_{i_2} + \cdots + F_{i_{k'}},
\]
where $1 < i_1 < i_2 < \cdots < i_{k'}$. Since $m < F_\ell$ we know $i_{k'} < \ell$. Now define $k = k' + 1$ and $i_k = \ell$. We have
\[
\begin{aligned}
n + 1 &= m + F_\ell \\
&= F_{i_1} + F_{i_2} + \cdots + F_{i_{k'}} + F_\ell \\
&= F_{i_1} + F_{i_2} + \cdots + F_{i_k}.
\end{aligned}
\]
This proves the assertion of the theorem for n + 1, and completes the proof.
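The proof of Zeckendorf's Theorem is constructive: take the largest Fibonacci number that does not exceed the target, subtract it, and repeat on the remainder. The following Python sketch carries that out; the function name zeckendorf is ours, and the list of candidates starts at $F_2 = 1$, $F_3 = 2$ so that the value 1 appears only once, matching the condition that all indices exceed 1.

```python
def zeckendorf(n):
    """Return distinct Fibonacci numbers, largest first, whose sum is n >= 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    fibs = [1, 2]                        # F_2 and F_3
    while fibs[-1] + fibs[-2] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    parts = []
    for f in reversed(fibs):
        if f <= n:                       # largest Fibonacci number still available
            parts.append(f)
            n -= f                       # continue with the remainder, as in the proof
    return parts


print(zeckendorf(100))  # [89, 8, 3]
print(zeckendorf(4))    # [3, 1]
```

Each candidate is considered once, so no Fibonacci number can be used twice, and the inequality $m < F_\ell$ from the proof is what guarantees that the remainder is always handled by strictly smaller terms.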

Problem 2.25 Define a sequence $a_n$ recursively by $a_1 = 1$ and $a_{n+1} = \frac{6a_n + 5}{a_n + 2}$ for n ∈ N. Prove by
induction that $0 < a_n < 5$. [9, #17, p.55]
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . recseq

Problem 2.26 Here are some more equations involving the Fibonacci numbers. Prove them. (Do not use
Binet’s formula. Ordinary induction will suffice for all of these.)
a) $\sum_{i=1}^{n} F_i = F_{n+2} - 1$.
b) $\sum_{i=1}^{n} F_{2i-1} = F_{2n}$.
c) $\sum_{i=1}^{n} F_{2i} = F_{2n+1} - 1$.
d) $\sum_{i=1}^{n} F_i^2 = F_n F_{n+1}$.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Fib2

Problem 2.27 Prove that $F_{5n}$ is divisible by 5 for all positive integers n. (Apply the recursive definition
several times to find a formula for $F_{5n+5}$ in terms of $F_{5n+1}$ and $F_{5n}$. Once you have that an easy induction
argument will finish the problem.)

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Fib5

Problem 2.28 Write a proof that $\lim_{n \to \infty} F_n = \infty$ without using Binet’s Formula. (One way to do this
is prove a simple lower bound, $F_n \geq f(n)$ for some expression f(n) with f(n) → ∞ as n → ∞.)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . FibInfty

Problem 2.29 We can define the Fibonacci numbers $F_n$ for n ≤ 0 by using the basic recursion formula
\[
F_{n+1} = F_n + F_{n-1}
\]
and working backwards from $F_1 = F_2 = 1$. For instance, from $F_2 = F_1 + F_0$ we deduce that $F_0 = 0$; and
from $F_1 = F_0 + F_{-1}$ we then deduce that $F_{-1} = 1$.

a) Find the values of $F_{-2}, \ldots, F_{-8}$.

b) Prove that for all n ≥ 1,
\[
F_{-n} = (-1)^{n+1} F_n.
\]

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . S1

D Some Advice for Writing Proofs


Creating a proof consists of two tasks. The first is to find the mathematical ideas, relationships and arguments
that form the proof. The second is writing it down for someone else to read. Don’t try to do both at once.
Start with a piece of scratch paper and just explore the hypotheses and what you are supposed to prove
to see if you can discover how they are connected, or some strategy for getting from the hypotheses to the
conclusion. There is no prescription or procedure to help you. This is fundamentally a creative process. Play
with what you want to prove, try some examples, draw pictures, see if you can work out any possibilities
that occur to you — just try things until you find an idea that seems to work. Once you come up with the
mathematical idea(s) of your proof, only then comes the job of writing it out in careful, logical language.
(Don’t expect someone else to read your scratch paper. It’s for your eyes only — don’t turn it in.)
When you read someone else’s proof you may wonder how they ever thought of it. For instance in the
proof of the Pythagorean Theorem above, how did someone first think of the idea of using a partition of
one square into smaller figures and adding up the areas? There is almost always some sort of a basic idea
or strategy behind a proof. Sometimes it is obvious. Sometimes it takes some effort to stand back and look
past the details to figure out what the guiding idea was, but it is usually there to be found. When studying
a proof, try to identify the main idea(s) behind it. Once you’ve recognized that, the rest of the details will
be easier to understand. Learning to think that way will also help you to design proofs of your own.
The writing part is the process of putting your argument on paper, carefully arranged and explained for
someone else to read. This might involve deciding what symbols and notation to use, what order to say
things in. After logical correctness the most important thing is to design the written proof with the reader
in mind. Picture one of your classmates who has already read the proposition of what is to be proven, but
has not thought about how to do the proof, and is even a bit skeptical that it is really true. Your written
proof has to lead them through the reasoning, so that by the end they can’t help but concede the conclusion.
Expect them to be skeptical, looking for holes in your argument. Try to anticipate where in your proof they
might raise objections and put in explanation to dispel their doubts. Your written proof is their tour guide
through the labyrinth of your argument; you don’t want them to wander away from your logical path.
We often mingle mathematical notation with English grammar. For instance, instead of writing “The
square root of two is an irrational number,” we write “√2 is an irrational number.” The latter is briefer and
easier to read. We also use mathematical notation to squeeze several things we want to say into a shorter
sentence. Consider this example:

√2 is the real number x ≥ 0 for which x² = 2.
This is briefer than “. . . the real number x which is greater than or equal to 0 and . . . ” The goal is clarity
for your reader; decide how much notation to use by thinking about what will make it easiest for them to
understand.
Even if we are mixing mathematical notation with conventional English, we are still writing in English;
there is no substitute for fluency. Proper punctuation is still important. For instance, sentences should
end with periods, even when a mathematical symbol is the last “word.” In other words we view the use of
mathematical notation as an enhancement of our natural English language usage. The language we use is
precise, but it still leaves plenty of room for stylistic nuances. Good writing is as important in mathematics
as in any other subject. As with any language, fluency is attained only by experience in actually using the
language.
Here are a few more specific suggestions that may be helpful as you write out your proofs.
• Be sure you are clear in your own mind about what you are assuming as opposed to what you need
to prove. Don’t use language that claims things are true before you have actually proven they are
true. For instance, you could start a proof of the Pythagorean Theorem with “We want to show that
a² + b² = c²,” but don’t start with “Therefore the square of the hypotenuse equals the sum of the squares
of the other two sides.” (It seems silly to warn you about that, but you would be surprised at how
often students write such things.)
• Be sure your argument accounts for all possibilities that the hypotheses allow. Don’t add extra assump-
tions beyond the hypotheses, unless you are considering cases which, when taken together, exhaust all
the possibilities that the hypotheses allow.
• Don’t confuse an example, which illustrates the assertion in a specific instance, with a proof that
accounts for the full scope of the hypotheses.
• Be wary of calculations that start at the conclusion and work backwards. These are useful in discovering
a line of reasoning connecting hypotheses and conclusion, but often are logically the converse of what
you want to show. Such an argument needs to be rewritten in the correct logical order, so that it
proceeds from hypotheses to conclusion, not the other way around. We saw this as we developed our
proof of Proposition 1.5.
• Don’t just string formulas together. Your proof is not a mathematical diary, in which you write a
record of what you did for your own sake. It is an explanation that will guide another reader through
a logical argument. Explain and narrate. Use words, not just formulas. (And don’t include things you
thought about but ended up not using.)
• Don’t use or talk about objects or variables you haven’t introduced to the reader. Introduce them
with “let” or “define” so the reader knows what they are when they are first encountered. On the
other hand, don’t define things long before the reader needs to know about them (or worse, things that
aren’t needed at all).
• Keep notation consistent, and no more complicated than necessary.
• When you think you are done, reread your proof skeptically, looking for possibilities that your argument
overlooked. Perhaps get someone else (who is comfortable with proofs) to read it. Make it your
responsibility to find and fix flaws in your reasoning and writing before you turn it in.
• One thing that some find awkward at first is the traditional use of first person plural “we . . . ” instead
of “I” in mathematical writing. The use of “I” is fine for personal communications, talking about
yourself. But when you are describing a proof it is not a personal statement or testimony. The “we”
refers to you and all your readers — you are leading all of them with you through your reasoning.
Write so as to take them along with you through your arguments. By writing “we” you are saying that
this is what everyone in your logical tour group should be thinking, not just your personal perspective.
• Separate symbols by words. For instance, “Consider q², where q ∈ Q” is good. “Consider q², q ∈ Q”
is vague.
• Don’t start a sentence with a symbol.
• A sentence starting with “if” needs to include “then.”
• Use words accurately. An inequality is not an equation, for instance.
If you want more discussion of good mathematical writing, you could try Tips on Writing Mathematics
at the beginning of [17], or the classic little brochure [12].

E Perspective: Proofs and Discovery


Since proofs are often not emphasized in freshman-sophomore calculus courses, students can get the impres-
sion that the proofs are merely a sort of formal ritual by which a fact receives the official title of “Theorem.”
In reality the development of a proof is often part of the process of discovery, i.e. the way we separate what
we know to be true from what we do not. If we believe something is true, but can’t prove it, then there
remains some uncertainty. Perhaps what we are thinking of is only true in some more limited context; until
there is a proof we don’t know for certain. Here are two famous examples.

Theorem 2.6 (Fermat’s Last Theorem). If m > 2 is an integer, there are no triples (a, b, c) of positive
integers with
aᵐ + bᵐ = cᵐ.
Conjecture (Twin Primes). There are an infinite number of positive integers n for which both n and n + 2
are prime numbers.

Finding a proof of Fermat’s Last Theorem had been one of the most famous unsolved problems in mathematics
ever since Fermat claimed it was true in 1637. As the years went by many mathematicians¹⁰ tried to prove
or disprove it. Even without a valid proof, mathematicians developed an increasing understanding of the
problem and its connections to other mathematical subjects. There was a growing consensus that it was
probably true, but it remained a strongly held belief or opinion until it was finally proven¹¹ in 1993 by
Andrew Wiles. The point is that Wiles’ successful proof is the way we discovered that it really is true.
The Twin Primes conjecture is still on the other side of the fence. So far no one has been able to prove it,
so it’s not yet a theorem. We don’t yet know for sure if it is true or not, even though most number theorists
have a pretty strong opinion.
It is important to realize that what makes a proof valid is not that it follows some prescribed pattern, or
uses authoritative language or some magic phrases like “therefore” or “it follows that”. The final criterion is
whether the proof presents an argument that is logically complete and exhaustive. There are many examples
of proofs that seem convincing at first reading, but which contain flaws on closer inspection. In the history
of both Fermat’s Last Theorem and the Twin Primes Conjecture there have been people who thought they
had found a proof, only to find later that there were logical flaws. It is entirely possible to give a false
proof of a true conclusion. Often however something is learned from the flawed argument which helps future
efforts. A couple of interesting examples of things that were thought to have been proven, but were later found to
be false, are given in Ch. 11 of [1].
¹⁰ A prize was offered in 1908 for the first person to prove it, the Wolfskehl prize. It has been estimated that over the years
more than 5,000 “solutions” have been submitted to the Göttingen Royal Society of Science, which was responsible for awarding
the prize. All of them had flaws; many were submitted by crackpots. The prize of DM 25,000 was awarded to A. Wiles on June
27, 1997. The interesting story of this prize (and the courageous soul who read and replied to each submitted “solution” until
his death in 1943) is told in [4].
¹¹ Wiles announced that he had proved it on June 23, 1993. But shortly after that a flaw in the proof was found. Wiles
enlisted the help of Richard Taylor and together they fixed the problem in September 1994. The proof was published as two
papers in the May 1995 issue of Annals of Mathematics (vol. 141 no. 3): Andrew Wiles, Modular elliptic curves and Fermat’s
last theorem, pp. 443–551; Richard Taylor and Andrew Wiles, Ring-theoretic properties of certain Hecke algebras, pp. 553–572.
A chapter in [8] provides an overview of Fermat’s Last Theorem, its history and final solution by A. Wiles.

Another very interesting case is the Four Color Problem. Picture a map with borders between various
countries (or states, or counties, . . . ) drawn in. Each country is to be shaded with a color, but two countries
that share a common border cannot use the same color. The problem is to prove that you can always do this
with no more than four colors¹². This problem was posed to the London Mathematical Society in 1878. It
resisted a proof until 1976, when Kenneth Appel and Wolfgang Haken of the University of Illinois achieved
a proof. But their proof was controversial because it depended in an essential way on a computer program.
They were able to reduce the problem to a finite number of cases, but thousands of them. The computer was
instrumental both in identifying an exhaustive list of cases and then in checking each of them. It took four
years to develop the program and 1200 hours to run. So was this a proof? It wasn’t a proof like the ones
we have been talking about, which when written down can be read and evaluated by another person. It was
a proof based on an algorithm, which (the proof claims) terminates successfully only if the assertion of the
four color problem is true, and a computer program which (it is claimed) faithfully executes the algorithm,
and the claim that the algorithm did run and successfully terminate on the authors’ computer. By now most
of the controversy has died down. Various efforts to test the algorithm and modifications of it, program
it differently, run it on different machines, etc., have led most people to accept it, even though some may
remain unhappy about it. You can read more about the Four Color Problem in [27].
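The footnote's argument that three colors are not enough for a map of the western United States involves only six states, so it is small enough to check by trying every possible coloring. The sketch below does that; it is only an illustration of what "checking cases by computer" means, and is nothing like the Appel and Haken program. The list of borders is transcribed from the footnote's argument, while the state abbreviations and the function name proper_colorings are choices made here.

```python
from itertools import product

# Borders used in the footnote's argument about the western United States.
BORDERS = [
    ("CA", "OR"), ("CA", "NV"), ("OR", "NV"),
    ("AZ", "CA"), ("AZ", "NV"),
    ("UT", "AZ"), ("UT", "NV"),
    ("ID", "OR"), ("ID", "NV"), ("ID", "UT"),
]
STATES = sorted({s for pair in BORDERS for s in pair})

def proper_colorings(num_colors):
    """Yield every assignment of colors 0..num_colors-1 to the states in
    which no two bordering states receive the same color."""
    for colors in product(range(num_colors), repeat=len(STATES)):
        coloring = dict(zip(STATES, colors))
        if all(coloring[a] != coloring[b] for a, b in BORDERS):
            yield coloring

three = list(proper_colorings(3))
four = list(proper_colorings(4))
print(len(three))      # how many proper colorings use at most 3 colors
print(len(four) > 0)   # whether some proper coloring uses at most 4 colors
```

The search finds no proper coloring with three colors and does find one with four, in agreement with the footnote. Notice that believing this output requires trusting the program, which is the same issue, on a tiny scale, that made the 1976 proof controversial.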

Additional Problems

Problem 2.30 Prove that for all real numbers a, b, c, the inequality ab + bc + ca ≤ a² + b² + c² holds.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . trineq

Problem 2.31 If A is a square matrix, the notation Aⁿ refers to the result of multiplying A times itself
n times: A³ = A · A · A for instance. Prove that for every positive integer n the following formula holds:

\[
\begin{pmatrix} a & 1 \\ 0 & a \end{pmatrix}^{n} = \begin{pmatrix} a^{n} & n a^{n-1} \\ 0 & a^{n} \end{pmatrix}.
\]

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i2

Problem 2.32 Suppose that x is a real number and i refers to the complex number described by i² = −1.
Prove that for all positive integers n the following formula holds.

(cos(x) + i sin(x))ⁿ = cos(nx) + i sin(nx).

This is elementary if you use Euler’s formula (eⁱˣ = cos(x) + i sin(x)) and assume that the laws of exponents
hold for complex numbers. But do not use those things to prove it here. Instead use induction and
conventional trigonometric identities (which you may need to look up if you have forgotten them).
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i5

Problem 2.33 Prove (by induction) that if we divide the plane into regions by drawing a number of
straight lines, then it is possible to “color” the regions with just 2 colors so that no two regions that share
an edge have the same color. (From [9].)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . color2

Problem 2.34 Explain Binet’s formula in terms of the eigenvalues of the matrix \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Binet

¹² It’s not hard to see that you need at least four colors. Look at a map of the United States for instance. California, Oregon,
and Nevada all share borders with each other, so we will need at least three colors just for them. Suppose we color Oregon
green, California blue and Nevada red. If we had only these three colors, then Arizona would have to be different than both
California and Nevada, so Arizona would have to be green. Now Utah has to be different from both Arizona and Nevada, so it
would have to be blue. Similarly Idaho has to be different from both Oregon and Nevada, so it has to be blue too. But Idaho
and Utah share a border and we have colored them both blue. We see that 3 colors alone are not enough.

Problem 2.35 Observe that

5² = 3 · 8 + 1
8² = 5 · 13 − 1
13² = 8 · 21 + 1
21² = 13 · 34 − 1
⋮

Express the pattern as a formula involving the Fibonacci numbers Fₙ, and prove it. (Taken from [17].)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . FibForm

Chapter 3

Sets and Functions

This chapter is devoted to the subject of sets, perhaps the most basic of mathematical objects. The theory
of sets is a topic of study in its own right, part of the “foundations of mathematics” which involves deep
questions of logic and philosophy. (See Section H.2 below for a hint of this.) For our purposes sets and
functions are just another part of the language of mathematics. Sections A–F are intended mainly to
familiarize you with this part of the vernacular. We will encounter a lot of notation, of which some may be
new to you. This notation too is part of the vocabulary. You need to learn to use this language, not be
dependent on an example or picture as a way to bypass literacy. Section G introduces some ideas regarding
infinite sets. Here we will be going beyond mere terminology and will prove some interesting theorems.

A Notation and Basic Concepts


A set is simply a collection of things. The things in the set are the elements of the set. For instance, the set
of multiples of 13 less than 100 is
S = {13, 26, 39, 52, 65, 78, 91}.
This set has exactly 7 elements. Order does not matter; S = {52, 13, 26, 91, 39, 78, 65} describes the same
set. The mathematical notation for “is an element of” is ∈. In our example, 65 ∈ S but 66 ∉ S (by which
we mean 66 is not an element of S). A set is viewed as a single object, distinct from and of a different type
than its elements. Our S is not a number; it is the new object formed by putting those specific numbers
together as a “package.” Even when a set contains only one element, we distinguish between the element
and the set containing the element. Thus
0 and {0}
are two different objects. The first is a number; the second is a set containing a number.
Simple sets are indicated by listing the elements inside braces {· · · } as we have been doing above. When
a pattern is clear we might abbreviate the listing by writing “. . . ”. For instance we might identify the set of
prime numbers by writing
P = {2, 3, 5, 7, 11, 13, . . .}.
But to be more precise it is better to use a descriptive specification, such as

P = {n : n is a prime number}.

This would be read “the set of n such that n is a prime number.” The general form for this kind of set
specification is¹
{x : criteria(x)},
where “criteria(x)” is an open sentence specifying the qualifications for membership in the set. For example,

T = {x : x is a nonnegative real number and x² − 2 = 0}

is just a cumbersome way of specifying the set T = {√2}. In a descriptive set specification we sometimes
limit the scope of the variable(s) by indicating something about it before the colon. For instance we might
write our example T above as
T = {x ∈ R : 0 ≤ x and x² − 2 = 0},
or even
T = {x ≥ 0 : x² − 2 = 0},
if we understand x ∈ R to be implicit in x ≥ 0.
¹ Many authors prefer to write “{n | . . .},” using a vertical bar instead of a colon. That’s just a different notation for the same
thing.
There are special symbols² for many of the most common sets of numbers:

the natural numbers: N = {1, 2, 3, . . .} = {n : n is a positive integer},
the integers: Z = {. . . , −3, −2, −1, 0, 1, 2, 3, . . .} = {n : n is an integer},
the rational numbers: Q = {x : x is a real number expressible as x = n/m, for some n, m ∈ Z with m ≠ 0},
the real numbers: R = {x : x is a real number}, and
the complex numbers: C = {z : z = x + iy, x, y ∈ R}.

Intervals are just special types of sets of real numbers, for which we have a special notation.

[a, b) = {x ∈ R : a ≤ x < b}
(b, +∞) = {x ∈ R : b < x}.

In particular (−∞, +∞) is just another notation for R.


Another special set is the empty set
∅ = { },
the set with no elements at all. Don’t confuse it with the number 0, or the set containing 0. All of the
following are different mathematical objects.

0, ∅, {0}, {∅}.
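The notation of this section has close analogues in many programming languages, which can be a convenient way to experiment with it. The following is a small sketch in Python, not part of the mathematical development. One caveat specific to Python is an assumption of the sketch: a set that is itself to be an element of another set has to be written as a frozenset.

```python
# Descriptive specification: the multiples of 13 less than 100.
S = {n for n in range(1, 100) if n % 13 == 0}

print(S == {52, 13, 26, 91, 39, 78, 65})  # the order of listing is irrelevant
print(len(S))                              # the number of elements of S
print(65 in S, 66 in S)                    # membership tests

# The four objects 0, ∅, {0}, {∅}.  Sets used as elements must be frozensets.
empty = frozenset()
objects = [0, empty, frozenset({0}), frozenset({empty})]

# Compare every pair of the four objects; no two of them are equal.
all_different = all(
    objects[i] != objects[j]
    for i in range(len(objects))
    for j in range(i + 1, len(objects))
)
print(all_different)

# ∅ has no elements, while {0} and {∅} each have exactly one.
print(len(empty), len(frozenset({0})), len(frozenset({empty})))
```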

B Basic Operations and Properties


Definition. Suppose A and B are sets. We say A is a subset of B, and write A ⊆ B, when every element
of A is also an element of B. In other words, A ⊆ B means that

x ∈ A implies x ∈ B.

For instance, N ⊆ Z, but Z ⊆ N is a false statement. (You will often find “A ⊂ B” written instead of
“A ⊆ B”. They mean the same thing; it’s just a matter of the author’s preference.) No matter what the set
A is, it is always true that
∅ ⊆ A.
(This is in keeping with our understanding that vacuous statements are true.) To say A = B means that A
and B are the same set, that is they contain precisely the same elements:

x ∈ A if and only if x ∈ B,

which means the same as


A ⊆ B and B ⊆ A.
This provides the typical way of proving A = B: prove containment both ways. See the proof of Lemma 3.1
part d) below for an example.
Starting with sets A and B there are several operations which form new sets from them.
² The choice of Q for the rational numbers is suggested by Q for “quotient.” The choice of Z for the integers comes from the
German term “Zahlen” for numbers.
Definition. Suppose A and B are sets. The intersection of A and B is the set

A ∩ B = {x : x ∈ A and x ∈ B}.

The union of A and B is the set


A ∪ B = {x : x ∈ A or x ∈ B}.
The set difference, A remove B, is the set

A \ B = {x : x ∈ A and x ∉ B}.

We can illustrate these definitions by imagining that A and B are sets of points inside two circles in the
plane, and shade the appropriate regions to indicate the newly constructed sets. Such illustrations are called
Venn diagrams. Here are Venn diagrams for the definitions above.

[Figure: three Venn diagrams of two circles A and B, with the regions A ∩ B, A ∪ B, and A \ B shaded.]

Don’t let yourself become dependent on pictures to work with sets. For one thing, not all sets are geometrical
regions in the plane. Instead you should try to work in terms of the definitions, using precise logical language.
For instance x ∈ A ∩ B means that x ∈ A and x ∈ B.
Example 3.1. For A = {1, 2, 3} and B = {2, 4, 6} we have

A ∩ B = {2}, A ∪ B = {1, 2, 3, 4, 6}, A \ B = {1, 3}, and B \ A = {4, 6}.
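Example 3.1 can be reproduced with Python's built-in sets, whose operators &, | and - play the roles of ∩, ∪ and \. The sketch below also restates each definition as a descriptive specification; the name universe is introduced here only so that the comprehensions have something to range over.

```python
A = {1, 2, 3}
B = {2, 4, 6}

print(A & B)   # intersection A ∩ B
print(A | B)   # union A ∪ B
print(A - B)   # set difference A \ B
print(B - A)   # set difference B \ A

# The definitions themselves, written as descriptive specifications.
universe = A | B
assert A & B == {x for x in universe if x in A and x in B}
assert A | B == {x for x in universe if x in A or x in B}
assert A - B == {x for x in universe if x in A and x not in B}
```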

When two sets have no elements in common, A ∩ B = ∅ and we say the two sets are disjoint. When we
have three or more sets we say they are disjoint if every pair of them is disjoint. That is not the same as
saying their combined intersection is empty.
Example 3.2. Consider A = {1, 2, 3}, B = {2, 4, 6}, and C = {7, 8, 9}. These are not disjoint (because
A ∩ B ≠ ∅), even though A ∩ B ∩ C = ∅.
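The difference between "every pair is disjoint" and "the combined intersection is empty" is easy to see in a short computation. In the sketch below the helper pairwise_disjoint is a name made up for this illustration; it simply tests every pair of the given sets.

```python
from itertools import combinations

def pairwise_disjoint(*sets):
    """True when every pair of the given sets has empty intersection."""
    return all(not (s & t) for s, t in combinations(sets, 2))

A, B, C = {1, 2, 3}, {2, 4, 6}, {7, 8, 9}

print(A & B & C == set())                  # the combined intersection is empty
print(pairwise_disjoint(A, B, C))          # but A and B share the element 2
print(pairwise_disjoint(A - B, B - A, C))  # removing the overlap gives disjoint sets
```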
We want to say that the complement of a set A, to be denoted³ Ã, is the set of all things which are not
elements of A. But for this to be meaningful we have to know the allowed scope of all possible elements
under consideration. For instance, if A = {1, 2, 3}, is Ã to contain all natural numbers that are not in A,
or all integers that are not in A, or all real numbers that are not in A, or. . . ? It depends on the context
in which we are working. If the context is all natural numbers, then Ã = {4, 5, 6, . . .}. If the context is all
integers, then Ã = {. . . , −3, −2, −1, 0, 4, 5, 6, . . .}. If the context is all real numbers R, then

Ã = (−∞, 1) ∪ (1, 2) ∪ (2, 3) ∪ (3, +∞).

The point is that the complement of a set is always determined relative to an understood context of what
the scope of all possible elements is.
Definition. Suppose X is the set of all elements which are allowed as elements of sets. The complement of
a set A ⊆ X is
Ã = X \ A.
Here is a Venn diagram. The full box illustrates X. The shaded region is Ã.
[Figure: Venn diagram showing the set A inside the box X, with the region outside A shaded.]
³ Other common notations are Aᶜ and A′.

In the context of the real numbers for instance,

(a, b)∼ = (−∞, a] ∪ [b, +∞).

Lemma 3.1. Suppose X is the set of all elements under consideration, and that A, B, and C are subsets
of X. Then the following hold.
a) A ∪ B = B ∪ A and A ∩ B = B ∩ A.
b) A ∪ (B ∪ C) = (A ∪ B) ∪ C and A ∩ (B ∩ C) = (A ∩ B) ∩ C
c) A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C) and A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).
d) (A ∪ B)∼ = Ã ∩ B̃ and (A ∩ B)∼ = Ã ∪ B̃.
e) A ∪ Ã = X and A ∩ Ã = ∅.

f) (Ã)∼ = A.
Proofs of these statements consist of just translating the symbols into logical language, working out the
implications, and then translating back. We will prove the first part of d) as an example.
Proof. To prove the first part of d), we prove containment both ways.
Suppose x ∈ (A ∪ B)∼. This means x ∈ X and x ∉ A ∪ B. Since (x ∈ A or x ∈ B) is false, we know
x ∉ A and x ∉ B. So we know that x ∈ Ã and x ∈ B̃, which means that x ∈ Ã ∩ B̃. This shows that
(A ∪ B)∼ ⊆ Ã ∩ B̃.
Now assume that x ∈ Ã ∩ B̃. This means that x ∈ Ã and x ∈ B̃, and so x ∈ X, x ∉ A, and x ∉ B. Since x
is in neither A nor B, x ∉ A ∪ B. We find therefore that x ∈ (A ∪ B)∼. This shows that Ã ∩ B̃ ⊆ (A ∪ B)∼.
Having proven containment both ways we have established that (A ∪ B)∼ = Ã ∩ B̃.
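For a small finite X, part d) of the lemma can also be checked exhaustively by machine. This is an example-based check, not a proof: it confirms the identities only for the one universe chosen below, whereas the proof above covers every X. The universe X = {0, 1, 2, 3} and the helper names subsets and complement are assumptions made for this sketch.

```python
from itertools import chain, combinations

def subsets(X):
    """All subsets of the finite set X, as frozensets."""
    items = list(X)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

X = frozenset(range(4))   # a small universe, chosen arbitrarily

def complement(A):
    """The complement of A relative to the universe X."""
    return X - A

checked = 0
for A in subsets(X):
    for B in subsets(X):
        # Both halves of part d) of Lemma 3.1.
        assert complement(A | B) == complement(A) & complement(B)
        assert complement(A & B) == complement(A) | complement(B)
        checked += 1
print(checked)   # the number of pairs (A, B) examined
```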

Problem 3.1 If x and y are real numbers, min(x, y) refers to the smaller of the two numbers. If a, b ∈ R,
prove that
{x ∈ R | x ≤ a} ∩ {x ∈ R | min(x, a) ≤ b} = {x ∈ R | x ≤ min(a, b)}.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . sets1

Problem 3.2 Prove part c) of the above lemma.


. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . L21c

Problem 3.3 Prove that for any two sets, A and B, the three sets

A ∩ B, A \ B, and B \ A

are disjoint.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . djex

Problem 3.4 The symmetric difference of two sets A, B is defined to be

A △ B = (A \ B) ∪ (B \ A).

a) Draw a Venn diagram to illustrate A △ B.
b) Prove that (A △ B)∼ = (A ∩ B) ∪ (Ã ∩ B̃).
c) Is it true that (A △ B)∼ = Ã △ B̃? Either prove or give a counterexample.
d) Draw a Venn diagram for (A △ B) △ C.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . symdif

Indexed Families of Sets


Sometimes we need to work with many sets whose descriptions are similar to each other. Instead of giving
them all distinct names, “A, B, C, . . . ” it may be more convenient to give them the same name but with a
subscript which takes different values.
Example 3.3. Let
A₁ = (−1, 1), A₂ = (−1/2, 1/2), A₃ = (−1/3, 1/3), . . . ;
in general for k ∈ N,
Aₖ = (−1/k, 1/k).
In the above example, the “k” in “Aₖ” is the index. The set of allowed values for the index is called the
index set. In this example the index set is N. The collection Aₖ, k ∈ N is an example of an indexed family
of sets. We can form the union of an indexed family, but instead of writing

A₁ ∪ A₂ ∪ A₃ ∪ . . .

we write
⋃_{k∈N} Aₖ or, for this particular index set, ⋃_{k=1}^{∞} Aₖ.

Similarly,
⋂_{k∈N} Aₖ = A₁ ∩ A₂ ∩ A₃ ∩ . . . .
We will see some indexed families of sets in the proof of the Schroeder-Bernstein Theorem below. Here
are some simpler examples.
Example 3.4. Continuing with Example 3.3, we have ⋃_{k=1}^{∞} Aₖ = (−1, 1), and ⋂_{k∈N} Aₖ = {0}.
Example 3.5. R \ Z = ⋃_{k∈Z} Iₖ, where Iₖ = (k, k + 1).
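To get a feeling for Example 3.4, we can test membership in the sets Aₖ of Example 3.3 for finitely many indices. A computer can only examine a finite part of the family, so the sketch below works with the partial unions and intersations up to an index K; it suggests, but does not prove, what the full union and intersection are. The function names are made up for this illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def in_A(k, x):
    """Membership test for A_k = (-1/k, 1/k)."""
    return -Fraction(1, k) < x < Fraction(1, k)

def in_partial_union(x, K):
    """Is x in A_1 ∪ ... ∪ A_K ?"""
    return any(in_A(k, x) for k in range(1, K + 1))

def in_partial_intersection(x, K):
    """Is x in A_1 ∩ ... ∩ A_K ?"""
    return all(in_A(k, x) for k in range(1, K + 1))

x = Fraction(1, 100)
print(in_partial_union(x, 1000))          # x lies in A_1 already
print(in_partial_intersection(x, 99))     # x is in each of A_1, ..., A_99
print(in_partial_intersection(x, 100))    # but x is not in A_100
print(in_partial_intersection(0, 1000))   # 0 is in every A_k that was tested
```

Any nonzero x eventually drops out of the intersection in the same way, once k is at least 1/|x|, which is the idea behind ⋂_{k∈N} Aₖ = {0}.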

There is no restriction on what can be used for an index set.


Example 3.6. Let U = (0, ∞), the set of positive real numbers. For each r ∈ U let

Sᵣ = {(x, y) : x, y are real numbers with |x|ʳ + |y|ʳ = 1}.

[Figure: the curves Sᵣ for r = 10, 4, 2, 1, .6, and .4.]

The figure illustrates some of the sets Sᵣ. The set ⋃_{r∈U} Sᵣ is somewhat tricky to describe:

⋃_{r∈U} Sᵣ = (B \ A) ∪ C,

where

B = {(x, y) : |x| < 1 and |y| < 1}
A = {(x, y) : x = 0 or y = 0}
C = {(0, 1), (0, −1), (1, 0), (−1, 0)}.

Problem 3.5
a) For each x ∈ R let Cₓ = {y : x² + y² ≤ 1}. What is ⋃_{x∈R} Cₓ? What is ⋂_{x∈R} Cₓ?
b) Let Mₙ = {k ∈ N : k = nm for some integer m > 1}. What is ⋃_{n∈N} Mₙ?
c) Suppose A ⊆ R and for each ε > 0 let I_ε = {a ∈ R : (a − ε, a + ε) ⊆ A}. Is it true that A = ⋃_{ε>0} I_ε?
Explain.
d) Let S_ε = {n ∈ N : sin(n) > 1 − ε}. For ε > 0 is S_ε finite or infinite? What is ⋂_{ε>0} S_ε?

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Idx

Problem 3.6 For each x ∈ R, let Pₓ be the set

Pₓ = {y : y = xⁿ for some n ∈ N}.

a) There are exactly three values of x for which Pₓ is a finite set. What are they?
b) Find ⋂_{0<x<1} Pₓ and ⋃_{0<x<1} Pₓ.
c) For a positive integer N, find ⋂_{k=1}^{N} P2k . Find ⋂_{k=1} P2k .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sx

C Product Sets
Sets don’t care about the order of their elements. For instance,

{1, 2, 3} and {3, 1, 2}

are the same set. You might think of a set as an unordered list of elements. An ordered list of numbers is
a different kind of thing. For instance if we are thinking of (1, 2) and (2, 1) as the coordinates of points in
the plane, then the order matters. When we write (x, y) we mean the ordered pair of numbers. The use of
parentheses indicates that we mean ordered pair, not set.
If A and B are sets, we can consider ordered pairs (a, b) where a ∈ A and b ∈ B.
Definition. Suppose A and B are sets. The set of all such ordered pairs (a, b) with a ∈ A and b ∈ B is
called the Cartesian product of A and B:

A × B = {(a, b) : a ∈ A, b ∈ B}.

Don’t let the use of the word “product” and the notation “×” mislead you here — there is no multi-
plication involved. The elements of A × B are a different kind of object than the elements of A and B.
For instance the elements of R and Z are individual numbers (the second more limited than the first), but
an element of R × Z is an ordered pair of two numbers (not the result of multiplication). For instance
(π, −3) ∈ R × Z.
Example 3.7. Let X = {−1, 0, 1} and Y = {π, e}. Then

X × Y = {(−1, π), (0, π), (1, π), (−1, e), (0, e), (1, e)} .
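Example 3.7 can be reproduced with itertools.product from the Python standard library, which generates ordered pairs as tuples. In this sketch the floating point constants math.pi and math.e stand in for π and e; that substitution is an assumption of the illustration, not something the mathematics requires.

```python
from itertools import product
from math import pi, e

X = {-1, 0, 1}
Y = {pi, e}

XY = set(product(X, Y))          # the set of ordered pairs (x, y)

print(len(XY))                   # the number of ordered pairs in X × Y
print((0, pi) in XY)             # an ordered pair from X × Y
print((pi, 0) in XY)             # order matters: this pair belongs to Y × X
print(XY == set(product(Y, X)))  # so X × Y and Y × X are different sets
```

The elements of XY are tuples, not numbers, which matches the remark above that the elements of a product set are a different kind of object than the elements of the original sets.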

We can do the same thing with more than two sets: for a set of ordered triples we would write

A × B × C = {(a, b, c) : a ∈ A, b ∈ B, c ∈ C}.

Example 3.8. If Γ = {a, b, c, d, e, . . . , z} is the English alphabet (thought of as a set) then Γ × Γ × Γ × Γ is
the (notorious) set of four-letter words (including ones of no known meaning).
When we form the Cartesian product of the same set with itself, we often write

A² as an alternate notation for A × A.

So the coordinates of points in the plane make up the set R². The set of all possible coordinates for points
in three dimensional space is R³ = R × R × R. The set of four-letter words in the example above is Γ⁴. The
next lemma lists some basic properties of Cartesian products.
Lemma 3.2. Suppose A, B, C, D are sets. The following hold.

a) A × (B ∪ C) = (A × B) ∪ (A × C).
b) A × (B ∩ C) = (A × B) ∩ (A × C).
c) (A × B) ∩ (C × D) = (A ∩ C) × (B ∩ D).

d) (A × B) ∪ (C × D) ⊆ (A ∪ C) × (B ∪ D).
As an example of how this sort of thing is proven, here is a proof of part b).
Proof. Suppose (x, y) ∈ A × (B ∩ C). This means x ∈ A and y ∈ B ∩ C. Since y ∈ B it follows that
(x, y) ∈ A × B. Similarly, since y ∈ C it follows that (x, y) ∈ A × C as well. Since (x, y) is in both A × B
and A × C, we conclude that (x, y) ∈ (A × B) ∩ (A × C). This proves that A × (B ∩ C) ⊆ (A × B) ∩ (A × C).
Suppose (x, y) ∈ (A × B) ∩ (A × C). Since (x, y) ∈ A × B we know that x ∈ A and y ∈ B. Since
(x, y) ∈ A × C we know that x ∈ A and y ∈ C. Therefore x ∈ A and y ∈ B ∩ C, and so (x, y) ∈ A × (B ∩ C).
This proves that (A × B) ∩ (A × C) ⊆ A × (B ∩ C).
Having proven containment both ways, we have established part b) of the lemma.
The next example shows why part d) is only a subset relation, not equality, in general.
Example 3.9. Consider A = {a}, B = {b}, C = {c}, D = {d} where a, b, c, d are distinct. Observe that (a, d)
is an element of (A ∪ C) × (B ∪ D), but not of (A × B) ∪ (C × D).
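Example 3.9 is small enough to compute directly. The sketch below represents a, b, c, d by the strings "a", "b", "c", "d" and uses a helper named cartesian, both of which are choices made for this illustration. It also spot-checks part c) of the lemma on one further choice of sets (P, Q, R, S below, picked arbitrarily), which of course is an example and not a proof.

```python
from itertools import product

def cartesian(A, B):
    """The Cartesian product A × B as a set of ordered pairs."""
    return set(product(A, B))

A, B, C, D = {"a"}, {"b"}, {"c"}, {"d"}

left = cartesian(A, B) | cartesian(C, D)
right = cartesian(A | C, B | D)

print(left <= right)          # part d): the left side is a subset of the right
print(("a", "d") in right)    # the pair from Example 3.9
print(("a", "d") in left)     # which is missing from the left side
print(sorted(right - left))   # every pair that the left side lacks

# One instance of part c), where equality does hold.
P, Q, R, S = {1, 2}, {3, 4}, {2, 5}, {4, 6}
assert cartesian(P, Q) & cartesian(R, S) == cartesian(P & R, Q & S)
```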

Problem 3.7 Prove part c) of the above lemma.


. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . L12c

D The Power Set of a Set
We can also form sets of sets. For instance if A = {1}, B = {1, 2} and C = {1, 2, 3}, we can put these
together into a new set
F = {A, B, C}.
This F is also a set, but of a different type than A, B, or C. The elements of C are integers; the elements of
F are sets of integers, which are quite different. We have used a script letter “F” to emphasize the fact that
it is a different kind of set than A, B, and C. While it is true that A = {1} ∈ F, it is not true that 1 ∈ F.
Starting with any set A, we can form its power set, namely the set of all subsets of A.
Definition. Suppose A is a set. The power set of A is

P(A) = {B : B ⊆ A}.

As an example
P({1, 2, 3}) = {∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}.
Observe that B ∈ P(A) means that B ⊆ A. We will have something interesting to say about the power set
of an infinite set in Section G.2.
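For a finite set the power set can be listed mechanically. The following sketch builds it from the subsets of each size; the function name power_set is our own, and subsets are stored as frozensets because Python requires the elements of a set to be immutable.

```python
from itertools import chain, combinations

def power_set(A):
    """The set of all subsets of the finite set A.
    Subsets are frozensets so that they can be elements of a set."""
    items = list(A)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))}

P = power_set({1, 2, 3})

print(len(P))                          # the number of subsets of {1, 2, 3}
print(frozenset({1, 3}) in P)          # {1, 3} is a subset, so it belongs to P
print(1 in P)                          # 1 is an element of A, not of P(A)
print(all(B <= {1, 2, 3} for B in P))  # B ∈ P(A) means B ⊆ A
```

The third line of output illustrates the distinction made above between the elements of a set and the elements of a set of sets.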
We could keep going, and form the power set of the power set, and so forth. But if you carelessly wander
too far down that path you will find yourself in a logical quagmire! See Section H.2.

E Relations
There are many ways we can compare two mathematical objects of the same type. Here are some examples.
• inequality for real numbers: x ≤ y,
• containment for sets: A ⊆ B,
• division for integers: n|m (defined on page 15 above),
• equality of integers mod k: n ≡ₖ m (defined in Section D of the next chapter).
Each of these is an example of what we call a relation between two objects of the same type. If X is a
set, a relation on X is really an open sentence R(x, y) taking two variables x, y ∈ X. For the frequently
used relations we have special symbols (like the examples above) that we write between the two arguments
instead of in front of them: “x ≤ y” instead of “≤ (x, y).” We will follow this pattern and write “x R y”
instead of “R(x, y)” for our discussion below.
A statement of relation “x R y” does not refer to some calculation we are to carry out using x and y. It
is simply a statement which has a well-defined truth value for each pair (x, y) ∈ X × X. Whether it is true
or false depends on the specific choices for x, y ∈ X. For instance consider again the inequality relation, ≤.
The statement x ≤ y is true for x = 2 and y = 3, but false for x = 3 and y = 2. (This also shows that order
matters; in general x R y is not the same statement as y R x.)
We can make up all kinds of strange relations; you will find several in the problems below. The important
ones are those which describe a property which is significant for some purpose, like the examples above. Most
useful relations have one or more of the following properties.
Definition. Suppose R is a relation on a set X and x, y, z ∈ X.
• R is called reflexive when x R x is true for all x ∈ X.
• R is called symmetric when x R y is equivalent to y R x.
• R is called transitive when x R y and y R z together imply x R z.
Example 3.10.
• Inequality (≤) on R is transitive and reflexive, but not symmetric.
• Strict inequality (<) on R is transitive, but not reflexive or symmetric.
• Define the coprime relation C on N so that n C m means that n and m share no common positive
factors other than 1. Then C is symmetric, but not reflexive or transitive.
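The three properties can be tested by brute force on a finite sample of a set. The sketch below is only an illustration: the helper names and the sample {1, . . . , 12} are our own assumptions. A failure found on the sample is a genuine counterexample, but a success on the sample proves nothing about the whole set.

```python
from math import gcd

def is_reflexive(rel, xs):
    return all(rel(x, x) for x in xs)

def is_symmetric(rel, xs):
    return all(rel(x, y) == rel(y, x) for x in xs for y in xs)

def is_transitive(rel, xs):
    # Whenever x R y and y R z both hold, x R z must hold too.
    return all(rel(x, z)
               for x in xs for y in xs for z in xs
               if rel(x, y) and rel(y, z))

def le(x, y):
    return x <= y

def lt(x, y):
    return x < y

def coprime(n, m):
    return gcd(n, m) == 1

X = range(1, 13)  # finite sample, an assumption made for illustration
for rel in (le, lt, coprime):
    print(rel.__name__,
          is_reflexive(rel, X), is_symmetric(rel, X), is_transitive(rel, X))
```

On this sample the coprime relation fails transitivity because 2 C 3 and 3 C 4 hold while 2 C 4 does not.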
Example 3.11. Define the lexicographic order relation on R^2 by (x, y) L (u, v) when x < u or (x = u and
y ≤ v). Then L is transitive and reflexive, but not symmetric. Let’s write out the proof that L is transitive.
Suppose (x, y) L (u, v) and (u, v) L (w, z), where x, y, u, v, w, z ∈ R. Our goal is to show that (x, y) L (w, z).
We know that x ≤ u and u ≤ w. If either of these is a strict inequality, then x < w and therefore
(x, y) L (w, z). Otherwise x = u = w, y ≤ v, and v ≤ z. But then x = w and y ≤ z, which imply
(x, y) L (w, z). So in either case we come to the desired conclusion.

Problem 3.8 For each of the following relations on R, determine whether or not it is reflexive, symmetric,
and transitive and justify your answers.
a) x♦y means xy = 0.
b) x g y means xy ≠ 0.
c) x  y means |x − y| < 5.
d) x y means x^2 + y^2 = 1.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . relations

Definition. A relation R is called an equivalence relation when it is symmetric, reflexive, and transitive.
Equivalence relations are especially important because they describe some notion of “sameness.” For
instance, suppose we are thinking about angles θ in the plane. We might start by saying an angle is any
real number x ∈ R. But that is not quite what we mean: π/3 and 7π/3 are different as numbers, but they
are the same as angles, because they differ by a multiple of 2π. The next example defines an equivalence
relation on R that expresses this idea of sameness as angles.
Example 3.12. Define a relation on R as follows:

x is related to y when there exists k ∈ Z so that x = 2kπ + y.

For instance π/3 is related to 7π/3, because π/3 = 2(−1)π + 7π/3 so that the definition holds with k = −1.
This is an equivalence relation, as we will now check. Since x = 2 · 0 · π + x, the relation is reflexive.
If x = 2kπ + y, then y = 2(−k)π + x and −k ∈ Z if k is. This proves symmetry. If x = 2kπ + y and
y = 2mπ + z, then x = 2(k + m)π + z, showing that the relation is transitive.
Definition. Suppose R is an equivalence relation on X and x ∈ X. The equivalence class of x is the set

[x]R = {y ∈ X : xRy}.

The y ∈ [x]R are called representatives of the equivalence class.


When it is clear what equivalence relation is intended, often we leave it out of the notation and just write
[x] for the equivalence class. The equivalence classes “partition” X into the sets of mutually equivalent
elements.
Example 3.13. Continuing with Example 3.12, the equivalence classes of that relation are sets of real numbers which
differ from each other by multiples of 2π.

[x] = {. . . , x − 4π, x − 2π, x, x + 2π, x + 4π, . . .}.

For instance π/3 and 7π/3 belong to the same equivalence class, which we can refer to as either [π/3] or
[7π/3] — both refer to the same set of numbers. But [π/3] and [π] are different.
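The way equivalence classes partition a set can be seen concretely on a finite set. Since the angle relation lives on all of R, the sketch below uses an integer stand-in of our own choosing: n is related to m when n − m is a multiple of 3. The function name equivalence_classes is hypothetical, and the code assumes the relation supplied really is an equivalence relation.

```python
def equivalence_classes(xs, related):
    """Group the elements of xs into classes of mutually related elements.

    Assumes `related` is an equivalence relation on xs, so comparing with a
    single representative of each class is enough.
    """
    classes = []
    for x in xs:
        for cls in classes:
            if related(cls[0], x):   # cls[0] serves as the representative
                cls.append(x)
                break
        else:
            classes.append([x])      # x starts a new class
    return classes

def same_mod_3(n, m):
    return (n - m) % 3 == 0

classes = equivalence_classes(range(9), same_mod_3)
print(classes)   # [[0, 3, 6], [1, 4, 7], [2, 5, 8]]
```

Every element lands in exactly one class, and any element of a class could have served as its representative.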

One of the uses of equivalence relations is to define new mathematical objects by disregarding irrelevant
properties or information. The equivalence relation of the above examples defines exactly what we
mean by saying x and y represent the “same angle.” If we have a particular angle in mind, then the set of
all the x values corresponding to that angle forms one of the equivalence classes of the relation. If we want a precise
definition of what an “angle” actually is (as distinct from a real number), the standard way to do it is to say
that an angle θ is an equivalence class of the relation: θ = [x]. The x ∈ θ are the representatives of the angle θ. The
angle θ is the set of all x which represent the same angle. In this way we have defined a new kind of object
(angle) by basically gluing together all the equivalent representatives of the same object and considering that
glued-together lump (the equivalence class) as the new object itself. This will be the basis of our discussion
of the integers mod m in Section D of the next chapter.

Problem 3.9 Show that each of the following is an equivalence relation.


a) On N × N define (j, k) k (m, n) to mean jn = km. (From [10].)
b) On R × R define (x, y) d (u, v) to mean x^2 − y = u^2 − v.
c) On (0, ∞) define x w y to mean x/y ∈ Q.
For a) and b), give a geometrical description of the equivalence classes.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . er

Problem 3.10 Suppose ` is an equivalence relation on X. For x, y ∈ X let [x] and [y] be their equivalence
classes with respect to `.
a) Show that x ∈ [x].
b) Show that either [x] = [y], or [x] and [y] are disjoint.
c) Show that x ` y iff [x] = [y].
d) Consider the relation ♦ on R from Problem 3.8. (Recall that this is not an equivalence relation.) Define

⟨x⟩ = {y ∈ R : x♦y}.

Which of a), b) and c) above are true if [x] and [y] are replaced by ⟨x⟩ and ⟨y⟩ and ` is replaced by ♦?
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ec

F Functions
The basic notion of a function f is familiar to any calculus student. It is a “rule” which assigns to each
allowed “input” value x an “output” value f(x). In calculus the allowed values for x are usually
all real numbers, or some interval of real numbers. For functions of several variables the x = (x_1, x_2) or
x = (x_1, x_2, x_3) take values in R^2 or R^3 respectively. In general we can talk about a function whose domain
A is a set and whose codomain B is another. We write

f : A → B. (3.1)

This notation means that f is a function⁴ for which the allowed input values are the a ∈ A, and for each
such a the associated output value is b = f(a) ∈ B. Don’t confuse “function” with “formula.” We can describe
a function with a table for instance, even if we can’t think of a formula for it. For our purposes, it is helpful
to visualize a function with a diagram like the following. The circle A indicates the points in the domain;
the circle B indicates the points in the codomain; and the function provides an arrow from each x ∈ A to
its associated value f (x) ∈ B.
4 We sometimes use the words “map” or “mapping” instead of “function.” They mean the same thing.

[Figure: two circles labeled A and B, with an arrow from a point x in A to its value f(x) in B.]

Example 3.14. Here are some examples to make the point that the elements of the domain and codomain
can be more complicated than just numbers.
a) Let R^{2×2} denote the set of 2 × 2 matrices with real entries, M = [m_{11} m_{12}; m_{21} m_{22}] (rows separated by the semicolon), and det(M) be the usual
determinant:
det(M) = m_{11} m_{22} − m_{12} m_{21}.
Then det : R^{2×2} → R.
b) Let C([0, 1]) be the set of all continuous functions f : [0, 1] → R. We can view the integral I(f) = ∫_0^1 f(x) dx
as a function I : C([0, 1]) → R.
c) For any c = (c_1, c_2, c_3) ∈ R^3 we can form a quadratic polynomial P(c) = c_3 x^2 + c_2 x + c_1. We can view
P as a function P : R^3 → C([0, 1]) from ordered triples to continuous functions.

Definition. Suppose f : A → B is a function. The range of f is the set

Ran(f ) = {b : there exists a ∈ A such that b = f (a)}.

Don’t confuse the codomain of a function with its range. The range of f is the set of values b ∈ B for
which there actually does exist an a ∈ A with f (a) = b. The codomain B can be any set containing the
range as a subset. In general there is no presumption that Ran(f ) is all of B. What we understand the
codomain to be does affect how we answer some questions, as we will see shortly. When Ran(f ) = B we say
f is onto, or surjective; see the definition below. In particular whether f is onto or not depends on what we
understand the codomain to be.
Definition. Suppose f : A → B is a function.
a) f is said to be surjective (or a surjection, or onto) when for every b ∈ B there exists an a ∈ A for
which b = f (a).
b) f is said to be injective (or an injection, or one-to-one) when a = a′ is necessary for f(a) = f(a′); that is, when f(a) = f(a′) implies a = a′.
c) When f is both surjective and injective, we say f is bijective (or a bijection).
Example 3.15. Suppose we consider the function f(x) = x^2 for the following choices of domain A and
codomain B: f : A → B.
a) A = [0, ∞) and B = R. This makes f injective but not surjective. To prove that it is injective, suppose
a, a′ ∈ A and f(a) = f(a′). This means a, a′ ≥ 0 and a^2 = (a′)^2. It follows that a = a′ just as in the
proof of Proposition 1.3. This proves that f is injective. To see that f is not surjective, simply observe
that −1 ∈ B but there is no a ∈ A with f(a) = −1.
b) Now take A = R and B = R. For these choices f is still not surjective, but is not injective either,
because f(1) = f(−1).
c) Consider A = R and B = [0, ∞). Compared to b) all we have done is change what we understand the
codomain to be. As in b) f is not injective, but now is surjective, because every b ∈ B does have a
square root: a = √b is in A and f(a) = b.
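The dependence on domain and codomain can be imitated with finite sets of integers. This is only a sketch under our own assumptions: the finite sets below stand in for the intervals of the example, and the helper names are hypothetical.

```python
def is_injective(f, domain):
    values = [f(a) for a in domain]
    return len(values) == len(set(values))    # no value is hit twice

def is_surjective(f, domain, codomain):
    return {f(a) for a in domain} == set(codomain)

def square(x):
    return x * x

nonneg = range(0, 4)         # finite stand-in for [0, ∞)
both_signs = range(-3, 4)    # finite stand-in for R as a domain
wide = range(-9, 10)         # codomain containing negative numbers
squares = {0, 1, 4, 9}       # codomain equal to the set of values taken

print(is_injective(square, nonneg), is_surjective(square, nonneg, wide))
print(is_injective(square, both_signs), is_surjective(square, both_signs, wide))
print(is_injective(square, both_signs), is_surjective(square, both_signs, squares))
```

The same rule x ↦ x^2 gives three different answers, mirroring cases a), b) and c): changing only the codomain changes whether the function is surjective.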
When we have two functions, f and g, we can sometimes follow one by the other to obtain the composition
of f with g.

Definition. Suppose f : A → B and g : C → D are functions, and that B ⊆ C. Their composition is the
function g ◦ f : A → D defined by

g ◦ f (a) = g (f (a)) for all a ∈ A.

Observe that we must have B ⊆ C in order for g (f (a)) to be defined.


Example 3.16. Let f : [0, ∞) → R be defined by f(x) = √x and g : R → R defined by g(x) = x^2/(x^2 + 1). Then
g ◦ f(x) = x/(x + 1), for x ≥ 0.
Proposition 3.3. Suppose f : A → B and g : C → D are functions with B ⊆ C.
a) If g ◦ f is injective then f is injective.
b) If g ◦ f is surjective then g is surjective.
c) If B = C and both f and g are bijective, then g ◦ f is bijective.
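Parts a) and b) can be checked exhaustively for small finite sets, which is a useful sanity test but not a proof. In this sketch functions are stored as dictionaries, we take B = C, and the particular sets A, B, D and all helper names are our own assumptions.

```python
from itertools import product

def all_functions(domain, codomain):
    """Yield every function domain -> codomain as a dict."""
    domain = list(domain)
    for values in product(codomain, repeat=len(domain)):
        yield dict(zip(domain, values))

def injective(f):
    return len(set(f.values())) == len(f)

def surjective(f, codomain):
    return set(f.values()) == set(codomain)

A, B, D = [0, 1], [0, 1, 2], [0, 1]
violations = 0
for f in all_functions(A, B):
    for g in all_functions(B, D):
        gf = {a: g[f[a]] for a in A}               # the composition g ◦ f
        if injective(gf) and not injective(f):     # would contradict a)
            violations += 1
        if surjective(gf, D) and not surjective(g, D):   # would contradict b)
            violations += 1
print(violations)   # 0
```

All 72 pairs (f, g) are examined and none violates a) or b), as the Proposition predicts.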

Problem 3.11 Prove the Proposition.


. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . prpr

Problem 3.12
a) For part a) of the Proposition, give an example to show that g can fail to be injective.
b) For part b) of the Proposition, give an example to show that f can fail to be surjective.
c) In part c) of the Proposition, show that it is possible for f and g to fail to be bijections even if g ◦ f is.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . comp

Suppose f : A → B is a function. We think of f as “sending” each a ∈ A to a b = f (a) ∈ B: a → b.


What happens if we try to reverse all these arrows: a ← b; does that correspond to a function g : B → A?
The answer is “yes” precisely when f is a bijection, and the resulting function g is called its inverse.
Definition. Suppose f : A → B is a function. A function g : B → A is called an inverse function to f when
a = g(f(a)) for all a ∈ A and b = f(g(b)) for all b ∈ B. When such a g exists we say that f is invertible
and write g = f^{-1}.
Example 3.17. The usual exponential function exp : R → (0, ∞) is invertible. Its inverse is the natural
logarithm: exp^{-1}(x) = ln(x).
Example 3.18. Consider X = (−2, ∞), Y = (−∞, 1) and f : X → Y defined by f(x) = x/(x + 2). We claim that
f is invertible and its inverse g : Y → X is given by g(y) = 2y/(1 − y). To verify this we need to examine both
g ◦ f and f ◦ g. First consider g ◦ f. For any x ∈ X we can write

f(x) = x/(x + 2) = 1 − 2/(x + 2).

Since x + 2 > 0 it follows that 1 − 2/(x + 2) < 1. Therefore f(x) ∈ Y for every x ∈ X, and so g(f(x)) is defined.
Now we can calculate that for every x ∈ X,

g(f(x)) = (2 · x/(x + 2)) / (1 − x/(x + 2)) = 2x/(x + 2 − x) = 2x/2 = x.

Similar considerations apply to f(g(y)). For any y ∈ Y, 1 − y > 0. Since 2y > 2y − 2 = −2(1 − y), we see
that g(y) = 2y/(1 − y) > −2, so that g(y) ∈ X. For each y ∈ Y we can now check that

f(g(y)) = (2y/(1 − y)) / (2y/(1 − y) + 2) = 2y/(2y + 2(1 − y)) = y/1 = y.
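The two identities can be spot-checked with exact rational arithmetic. This is a sketch only: the sample points xs and ys are our own choices, and checking finitely many points does not replace the algebra.

```python
from fractions import Fraction

def f(x):
    return x / (x + 2)

def g(y):
    return 2 * y / (1 - y)

xs = [Fraction(n, 4) for n in range(-7, 40)]   # sample points in X = (-2, ∞)
ys = [Fraction(n, 4) for n in range(-40, 4)]   # sample points in Y = (-∞, 1)

# f maps the sample into Y, and g undoes f.
print(all(f(x) < 1 and g(f(x)) == x for x in xs))    # True
# g maps the sample into X, and f undoes g.
print(all(g(y) > -2 and f(g(y)) == y for y in ys))   # True
```

Using Fraction avoids rounding error, so the equality tests are exact.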

Problem 3.13 Suppose a, b, c, d are positive real numbers. What is the largest subset X ⊆ R on which
f(x) = (ax + b)/(cx + d) is defined? Show that f : X → R is injective provided ad ≠ bc. What is its range Y? Find a
formula for f^{-1} : Y → X.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . domex

Problem 3.14 Prove that if an inverse function exists, then it is unique.


. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . invu

Problem 3.15 Suppose that f : A → B and g : B → A are such that g(f(x)) = x for all x ∈ A.

a) Show by example that f need not be surjective and g need not be injective. Show that f (g(y)) = y for
all y ∈ B fails for your example.
b) Show that the following are equivalent.
1. f is surjective.
2. g is injective.
3. f (g(y)) = y for all y ∈ B.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . invcex

Theorem 3.4. A function f : X → Y is a bijection if and only if it has an inverse function.


Proof. Suppose f is a bijection. For any y ∈ Y , since f is a surjection there exists x ∈ X with f (x) = y.
Since f is injective, this x is unique. Thus each y ∈ Y determines a unique x ∈ X for which f (x) = y. We
can therefore define a function g : Y → X by

g(y) = x for that x ∈ X with f (x) = y.

We claim that g is an inverse function to f . To see that, consider any x ∈ X and let y = f (x). Then y ∈ Y
and by definition of g we have x = g(y). In other words x = g(f (x)) for all x ∈ X. Next consider any y ∈ Y
and let x = g(y). Then x ∈ X and by definition of g we know f (x) = y. Thus y = f (g(y)) for all y ∈ Y . We
see that g is an inverse function to f .
Now assume that there does exist a function g : Y → X which is an inverse to f . We need to show that f
is a bijection. To see that it is surjective, consider any y ∈ Y and take x = g(y). Then f (x) = f (g(y)) = y.
Thus all of Y is in the range of f. To see that f is injective, consider x, x′ ∈ X with f(x) = f(x′). Then
x = g(f(x)) = g(f(x′)) = x′. Hence f is indeed injective.
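For a function on a finite set, stored as a dictionary, the construction of g in the proof amounts to reversing every arrow. The sketch below follows that idea; the function name inverse and the sample data are hypothetical.

```python
def inverse(f, codomain):
    """Return the inverse of f (a dict) when f is a bijection onto codomain, else None."""
    g = {}
    for x, y in f.items():
        if y in g:                  # two inputs share an output: f is not injective
            return None
        g[y] = x                    # reverse the arrow x -> y
    if set(g) != set(codomain):     # some element is never hit: f is not surjective
        return None
    return g

f = {"a": 1, "b": 2, "c": 3}
g = inverse(f, {1, 2, 3})
print(g)                                    # {1: 'a', 2: 'b', 3: 'c'}
print(inverse({"a": 1, "b": 1}, {1}))       # None (not injective)
print(inverse({"a": 1}, {1, 2}))            # None (not surjective)
```

Reversing the arrows yields a function exactly when each element of the codomain receives one and only one arrow, which is the content of the theorem.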

Problem 3.16 Let f : (−1, 1) → R be the function given by f(x) = x/(1 − x^2). Prove that f is a bijection,
and that its inverse g : R → (−1, 1) is given by

g(y) = (−1 + √(1 + 4y^2))/(2y) for y ≠ 0,
g(y) = 0 for y = 0.

One way to do this is to show f is injective, surjective, and solve y = f (x) for x and see that you get the
formula x = g(y). An alternate way is to verify that g(f (x)) = x for all x ∈ (−1, 1), that f (g(y)) = y for all
y ∈ R, and then appeal to Theorem 3.4.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . invex2

Images and Preimages of Sets
We usually think of a function f : A → B as sending elements a of A to elements b = f (a) of B. But
sometimes we want to talk about what happens to all the elements of a subset at once. We can think of f
as sending a set E ⊆ A to the set f (E) ⊆ B defined by

f (E) = {b ∈ B : b = f (e) for some e ∈ E}.

We call f (E) the image of E under f . We can do the same thing in the backward direction. If G ⊆ B, the
preimage of G under f is the set
f^{-1}(G) = {a ∈ A : f(a) ∈ G}.
We do not need f to have an inverse function to be able to do this! In other words f^{-1}(G) is
defined for sets G ⊆ B even if f^{-1}(b) is not defined for elements b ∈ B. The meaning of “f^{-1}(·)” depends
on whether what is inside the parentheses is a subset or an element of B.
Example 3.19. Consider f : R → R defined by f(x) = x^2.
Then
• f((−2, 3)) = [0, 9). In other words the values 0 ≤ y < 9 are the only real numbers which arise as
y = f(x) for −2 < x < 3.
• f^{-1}([1, 2]) = [−√2, −1] ∪ [1, √2]. In other words the values of x for which 1 ≤ f(x) ≤ 2 are those for
which 1 ≤ |x| ≤ √2.
• f^{-1}([−2, −1)) = ∅. In other words there are no values of x for which −2 ≤ f(x) < −1.
Warning: This function is neither injective nor surjective, so f^{-1} does not exist as a function. However f^{-1}
of sets is still defined. f^{-1}(2) is not defined, but f^{-1}({2}) is (= {−√2, √2}).
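Images and preimages are easy to compute when the domain is finite. The sketch below restricts the squaring function to a finite window of integers, which is our own simplification of the example; the names image, preimage and window are hypothetical.

```python
def image(f, E):
    """f(E): the set of values f(e) for e in E."""
    return {f(e) for e in E}

def preimage(f, domain, G):
    """f^{-1}(G): the points of the domain that f sends into G."""
    return {a for a in domain if f(a) in G}

def square(x):
    return x * x

window = range(-5, 6)   # finite integer domain, an assumption for illustration

print(image(square, {-1, 0, 1, 2}))          # the set {0, 1, 4}
print(preimage(square, window, {1, 4}))      # the set {-2, -1, 1, 2}
print(preimage(square, window, {-2, -1}))    # set(): nothing maps to a negative number
print(preimage(square, window, {2}))         # set(): 2 has no integer square root
```

Note that preimage never calls an inverse function. It only evaluates f and tests membership in G, which is why it makes sense even though squaring is not invertible.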

Problem 3.17 Consider f : Z → Z defined by f(n) = n^2 − 7. What is f^{-1}({k ∈ Z : k ≤ 0})?


. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . invex

Problem 3.18 Suppose f : X → Y and A, B ⊆ Y . Prove that


a) A ⊆ B implies f^{-1}(A) ⊆ f^{-1}(B).
b) f^{-1}(A ∪ B) = f^{-1}(A) ∪ f^{-1}(B).
c) f^{-1}(A ∩ B) = f^{-1}(A) ∩ f^{-1}(B).

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . setinv

Problem 3.19 Suppose f : X → Y , A, B ⊆ Y and C, D ⊆ X.

a) Show that it is not necessarily true that f^{-1}(A) ⊆ f^{-1}(B) implies A ⊆ B.


b) Show that it is not necessarily true that f (C) ∩ f (D) = f (C ∩ D).
c) If f is injective, show that it is true that f (C) ∩ f (D) = f (C ∩ D).

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . nsetinv

G Cardinality of Sets
Now that we have summarized all the standard ideas about sets and functions, we are going to use them to
discuss the basic idea of the size of a set. Most of us have no trouble with what it would mean to say that
a set “has 12 elements” for instance, or even that a set “is finite.”
Example 3.20. Let A = {a, b, c, . . . , z} be our usual alphabet, and B = {1, 2, 3, . . . , 26}. There is a simple
bijection f (a) = 1, f (b) = 2, f (c) = 3, . . . in which f assigns to each letter its position in the usual ordering
of the alphabet. This f provides a one-to-one pairing of letters with numbers 1 ≤ n ≤ 26:

a goes with 1
b goes with 2
c goes with 3
..
.
z goes with 26.

Of course this works because both sets have exactly 26 elements.


In general when there is a bijection between two sets A and B, that means that A and B have exactly
the same number of elements. The bijection provides a way to pair the elements of A with the elements of
B, a “one-to-one correspondence” between the two sets. Here is the formal definition for this idea.
Definition. Two sets, A and B, are called equipotent⁵ if there exists a bijection f : A → B. We will write
A ≃ B to indicate that A and B are equipotent.
Two equipotent sets are often said to have the same cardinality. The idea of equipotence gives a precise
meaning to “have the same number of elements” even when the number of elements is infinite. This agrees
with our intuitive idea of “same size” for finite sets, as the example above illustrated. Things become more
interesting for infinite sets.
Example 3.21. Let A = (0, 1) and B = (0, 2). The function f : A → B defined by f (x) = 2x is a bijection,
as is simple to check. So f provides a one-to-one correspondence between A and B, showing that these two
intervals are equipotent.
The same reasoning applied to Problem 3.16 says that (−1, 1) is equipotent to R. In both these examples
we have two sets A and B which are “of the same size” in terms of equipotence, even though in terms of
geometrical size (length) one is larger than the other.

G.1 Finite Sets


What exactly do we mean by saying a set A is a finite set? One way to express that idea is to say we
can count its elements, “this is element #1, this is element #2, . . . ” and get done at some point, “. . . this
is element #n, and that accounts for all of them.” That counting process consists of picking a function
f : {1, 2, 3, . . . , n} → A; f (1) is the element of A we called #1, f (2) is the element we called #2, and so on.
The fact that we accounted for all of them means that f is surjective. The fact that we didn’t count the
same element more than once means that f is injective. We can turn our counting idea of finiteness into the
following definition.
Definition. A set A is said to be finite if it is either empty or equipotent to {1, 2, 3, . . . , n} for some n ∈ N.
A set which is not finite is called infinite.
Could it happen that a set is equipotent to {1, 2, 3, . . . , n} as well as to {1, 2, 3, . . . , m} for two different
integers n and m? We all know that this is not possible.
Lemma 3.5 (Pigeon Hole Principle⁶). Suppose n, m ∈ N and f : {1, 2, 3, . . . , n} → {1, 2, 3, . . . , m} is a
function.
a) If f is injective then n ≤ m.
b) If f is surjective then n ≥ m.

5 There are many different words people use for this: equipollent, equinumerous, congruent, equivalent. Many use the symbol
“≈,” but since that suggests approximately in some sense, we have chosen “≃” instead.
6 Although this name sounds like something out of Winnie the Pooh, it’s what everyone calls it.

We could write a proof of this (using the Well-Ordering Principle of the integers from the next chapter),
but it is tricky because what we are proving seems so obvious. Most of the effort would go toward sorting
out exactly what we can and cannot assume about subsets of Z. Instead we will take it for granted.
Corollary 3.6. If f : {1, 2, 3, . . . , n} → {1, 2, 3, . . . , m} is a bijection (n, m ∈ N), then n = m.
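Although we take the lemma for granted, it can at least be confirmed by exhaustive search for small n and m. The following sketch lists every function {1, . . . , n} → {1, . . . , m} as a tuple of values; the helper names and the bound 4 are our own choices, and a finite search of this kind is evidence, not a proof.

```python
from itertools import product

def functions(n, m):
    """Every function {1..n} -> {1..m}, as a tuple (f(1), ..., f(n))."""
    return product(range(1, m + 1), repeat=n)

def exists_injection(n, m):
    return any(len(set(f)) == n for f in functions(n, m))   # n distinct values

def exists_surjection(n, m):
    return any(len(set(f)) == m for f in functions(n, m))   # all m values hit

ok = all(exists_injection(n, m) == (n <= m) and
         exists_surjection(n, m) == (n >= m)
         for n in range(1, 5) for m in range(1, 5))
print(ok)   # True
```

For these small cases an injection exists exactly when n ≤ m and a surjection exists exactly when n ≥ m, so a bijection exists only when n = m.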

Problem 3.20 Prove that if there is an injection f : A → A which is not surjective, then A is not a finite
set.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . mapfin

G.2 Countable and Uncountable Sets


To say that “infinite” means “not finite” is simple enough. But now the big question: are all infinite sets
equipotent to each other? This is where things get interesting: the answer is “no”! In fact there are infinitely
many nonfinite cardinalities, as we will see. But a point to make first is that we are beyond our intuition
here. We depend on proofs and counterexamples to know what is true and what is not. Few people can
guess their way through this material.
Example 3.22. We will exhibit bijections for the following in class.
• {2k : k ∈ N} ≃ {2, 3, 4, . . .} ≃ N ≃ Z.
• R ≃ (0, 1) ≃ (0, 1].
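As one illustration, here is a possible bijection between N = {1, 2, 3, . . .} and Z, written out with its inverse. This particular pairing is our own choice (the text does not specify one), and the names n_to_z and z_to_n are hypothetical.

```python
def n_to_z(n):
    """Send 1, 2, 3, 4, 5, ... to 0, 1, -1, 2, -2, ..."""
    return n // 2 if n % 2 == 0 else -(n // 2)

def z_to_n(z):
    """The inverse pairing: positive z go to even n, the rest to odd n."""
    return 2 * z if z > 0 else 1 - 2 * z

print([n_to_z(n) for n in range(1, 8)])   # [0, 1, -1, 2, -2, 3, -3]
```

Checking that the two functions undo each other on a range of inputs is consistent with Theorem 3.4: a function with an inverse is a bijection.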

Problem 3.21 Find an injective function f : N → [0, 1) (the half-open unit interval).
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . injint

The next example shows that it is possible for two infinite sets to fail to be equipotent.
Example 3.23. N and [0, 1) are not equipotent! To see this, consider any function f : N → [0, 1). We
will show that f is not surjective, by identifying a y ∈ [0, 1) which is not in Ran(f ). We will do this by
specifying the decimal representation y = .d_1 d_2 d_3 . . . where each digit d_i ∈ {0, 1, 2, . . . , 9}. Consider the
decimal representation of f(1). We want to choose d_1 to be different from the first digit in the decimal
representation of f(1). For instance, if f(1) = .352011 . . . we could choose d_1 = 7, since 7 ≠ 3. To be
systematic about it, if f(1) = .3 . . . take d_1 = 7, and otherwise take d_1 = 3. If f(2) = . ∗ 3 . . . take d_2 = 7,
and otherwise take d_2 = 3. If f(k) = . ∗ · · · ∗ 3 . . . (a 3 in the kth position) take d_k = 7, and d_k = 3 otherwise.
This identifies a specific sequence of digits d_i which we use to form the decimal representation of y. Then
y ∈ [0, 1), and for every k we know y ≠ f(k) because their decimal expansions differ in the kth position.
Thus y is not in the range of f . Hence f is not surjective.
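The digit rule can be tried out on a finite list of decimal expansions. In this sketch each expansion is a string of digits after the decimal point, truncated to six places; the rows are made-up sample data and the function name is hypothetical.

```python
def diagonal_digits(expansions):
    """Apply the rule d_k = 7 if the kth digit of f(k) is 3, and d_k = 3 otherwise.

    expansions[k - 1] holds the digits of f(k) after the decimal point.
    """
    return "".join("7" if row[k] == "3" else "3"
                   for k, row in enumerate(expansions))

rows = ["352011", "141592", "333333", "718281", "000000", "999993"]
y = diagonal_digits(rows)
print("." + y)   # .737337
```

The constructed y differs from the kth row in its kth digit, so it cannot equal any row of the list. The same reasoning applied to an infinite list is the argument of the example.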

Definition. A set A is called countably infinite when A ≃ N. We say A is countable when A is either finite
or countably infinite. A set which is not countable is called uncountable.
Theorem 3.7 (Cantor’s Theorem). If A is a set, there is no surjection F : A → P(A).
(Note that we are using an upper case “F ” for this function to help us remember that F (a) is a subset, not
an element, of A.)
It’s easy to write down an injection: F (a) = {a}. Essentially Cantor’s Theorem says that P(A) always
has greater cardinality than A itself. So the sets N, P(N), P(P(N)), . . . will be an unending list of sets, each
with greater cardinality than the one before it! This is why we said above there are infinitely many nonfinite
cardinalities. Here is the proof of Cantor’s Theorem.

Proof. Suppose F : A → P(A) is a function. In other words, for each a ∈ A, F (a) is a subset of A. Consider

C = {a ∈ A : a ∉ F(a)}.

Clearly C ⊆ A, so C ∈ P(A). We claim that there is no b ∈ A for which F(b) = C. If such a b existed then
either b ∈ C or b ∈ C̃. But b ∈ C means that b ∉ F(b) = C, which is contradictory. And b ∈ C̃ would mean
that b ∈ F(b) = C, again a contradiction. Thus C = F(b) leads to a contradiction either way. Hence no
such b can exist.
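For a small finite set A we can watch the proof at work by running through every function F : A → P(A) and confirming that the set C is never among the values of F. This is a sketch under our own assumptions: A = {0, 1, 2}, functions are stored as dictionaries, and the helper names are hypothetical.

```python
from itertools import combinations, product

def subsets(a):
    """All subsets of the finite set a, as frozensets."""
    elems = sorted(a)
    return [frozenset(c)
            for r in range(len(elems) + 1)
            for c in combinations(elems, r)]

def cantor_set(F):
    """C = {a in A : a not in F(a)} for F given as a dict a -> subset of A."""
    return frozenset(a for a in F if a not in F[a])

A = [0, 1, 2]
missed_every_time = all(
    cantor_set(dict(zip(A, values))) not in values   # C is not F(b) for any b
    for values in product(subsets(A), repeat=len(A)) # all 8^3 = 512 functions F
)
print(missed_every_time)   # True
```

Every one of the 512 functions misses its own set C, so none of them is surjective.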

Problem 3.22 Let F : R → P(R) be defined by

F(x) = ∅ if −1 ≤ x ≤ 0,
F(x) = (−x, x^2) otherwise.

Find the set C described in the proof of Theorem 3.7.


. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cset

G.3 The Schroeder-Bernstein Theorem


The definition of equipotence gives a precise meaning for saying two sets have the same size. It is pretty easy
to express what we mean by a set A being at least as large as B: there exists an injection f : B → A.
This implies that B is equipotent to a subset of A, which seems to agree with our idea of A being at least as
large as B. Now, if A is at least as large as B and B is at least as large as A, we might naturally expect that
A and B must be equipotent. That seems natural, but is it true? Yes, it is – this is the Schroeder-Bernstein
Theorem. But to prove it is not a simple task.
Theorem 3.8 (Schroeder-Bernstein Theorem). Suppose X and Y are sets and there exist functions f : X →
Y and g : Y → X which are both injective. Then X and Y are equipotent.
Example 3.24. To illustrate how useful this theorem can be, let’s use it to show that Z ≃ Q. It’s easy to
exhibit an injection f : Z → Q; just use f(n) = n. It’s also not too hard to find an injection g : Q → Z.
Given q ∈ Q, start by writing it as q = ±n/m, where n, m are nonnegative integers with no common factors.
Using this representation, define g(q) = ±2^n 3^m. It is clear that this is also an injection. Theorem 3.8 now
implies that Z ≃ Q. By Example 3.22, it follows that Q is countable!
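The map g can be written out and tested on a sample of rationals. The sketch below relies on Python's Fraction, which always stores a rational in lowest terms with a positive denominator. Giving q = 0 the + sign is an assumption of ours, since the text does not say which sign to use there.

```python
from fractions import Fraction

def g(q):
    """Send q = ±n/m (lowest terms, m > 0) to ±2^n 3^m; zero gets the + sign (our assumption)."""
    q = Fraction(q)
    n, m = abs(q.numerator), q.denominator
    sign = -1 if q < 0 else 1
    return sign * 2**n * 3**m

print(g(Fraction(3, 4)))    # 648, which is 2^3 * 3^4
print(g(Fraction(-1, 2)))   # -18, which is -(2^1 * 3^2)

# Distinct rationals in the sample receive distinct integers.
sample = {Fraction(a, b) for a in range(-20, 21) for b in range(1, 21)}
print(len({g(q) for q in sample}) == len(sample))   # True
```

Injectivity rests on unique factorization: the integer ±2^n 3^m determines the sign, n and m, and hence q.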

Problem 3.23 Use Theorem 3.8 to show [0, 1] ≃ (0, 1).
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SBappl

Proof. Suppose f : X → Y and g : Y → X are both injective. Let Y_0 = f(X), the range of f. Then f is a
bijection from X to Y_0, so X ≃ Y_0. Although Y_0 is only a subset of Y we will show that there is a bijection
Y_0 ≃ Y.
Let B_0 = Y \ Y_0, the part of Y that f misses. Then recursively define

B_1 = f(g(B_0)), B_2 = f(g(B_1)), . . . , B_{n+1} = f(g(B_n)), . . .

For n ≥ 1 note that B_n is a subset of Y_0 and therefore disjoint from B_0. Define h : Y → Y_0 by

h(y) = f(g(y)) if y ∈ ∪_{n=0}^∞ B_n,
h(y) = y otherwise.

To see that h is surjective consider any y_0 ∈ Y_0. If y_0 does not belong to ∪_{n=0}^∞ B_n then y_0 = h(y_0). Suppose
y_0 does belong to ∪_{n=0}^∞ B_n. Since y_0 ∈ Y_0, which is disjoint from B_0, it must be that y_0 ∈ B_n for some n ≥ 1.
By definition of B_n this means y_0 = f(g(y)) for some y ∈ B_{n−1}. Thus y_0 = h(y) for some y ∈ Y.

To see that h is injective suppose that h(y) = h(y′) for some y, y′ ∈ Y. Observe that h(∪_{n=0}^∞ B_n) = ∪_{n=1}^∞ B_n.
Suppose y ∉ ∪_{n=0}^∞ B_n. Then h(y′) = h(y) = y, and therefore y′ ∉ ∪_{n=0}^∞ B_n. That implies that y′ = h(y′) = h(y) =
y. Next suppose y ∈ ∪_{n=0}^∞ B_n. Then it must be that y′ ∈ ∪_{n=0}^∞ B_n as well, else y′ = h(y′) = h(y) ∈ ∪_{n=1}^∞ B_n would
be a contradiction. But then f(g(y)) = h(y) = h(y′) = f(g(y′)). Since f ◦ g is injective it follows that y = y′.

Having shown that h is a bijection, we know it has an inverse h^{-1} : Y_0 → Y. The composition h^{-1} ◦ f is
a bijection from X to Y, proving the theorem.
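The construction in the proof can be traced on a concrete pair of injections. In this sketch we take X = Y = {0, 1, 2, . . .} with f(x) = x + 1 and g(y) = y + 1, all of which are our own choices for illustration. Then Y_0 = {1, 2, 3, . . .}, B_0 = {0}, f(g(y)) = y + 2, and the sets B_n are {0}, {2}, {4}, . . . The helper names are hypothetical.

```python
def f(x):                  # injection X -> Y that misses 0
    return x + 1

def g(y):                  # injection Y -> X
    return y + 1

def fg_inverse(y):
    """Return the y' with f(g(y')) == y, or None when there is none."""
    return y - 2 if y >= 2 else None

def in_union(y):
    """Decide whether y lies in B_0 ∪ B_1 ∪ B_2 ∪ ... by tracing y backwards."""
    while y is not None:
        if y == 0:         # y has reached B_0 = Y \ f(X) = {0}
            return True
        y = fg_inverse(y)
    return False

def h(y):                  # the map h : Y -> Y_0 from the proof
    return f(g(y)) if in_union(y) else y

def h_inverse(t):          # defined for t in Y_0, i.e. t >= 1
    return fg_inverse(t) if in_union(t) else t

def bijection(x):          # the composition h^{-1} ∘ f : X -> Y
    return h_inverse(f(x))

print([h(y) for y in range(6)])           # [2, 1, 4, 3, 6, 5]
print([bijection(x) for x in range(6)])   # [1, 0, 3, 2, 5, 4]
```

Neither f nor g is surjective, yet the resulting map pairs X with all of Y by swapping 0 with 1, 2 with 3, and so on.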

H Perspective: The Strange World at the Foundations of Mathematics
We close this chapter with a brief look at a couple of issues from the foundations of mathematics.

H.1 The Continuum Hypothesis


The collection of possible cardinalities of sets forms what are called the cardinal numbers, a number system
that includes the nonnegative integers as well as the infinite cardinals. The possible cardinalities of finite sets
are just the nonnegative integers 0, 1, 2, 3, . . .. After that come the cardinalities of infinite sets — we have seen
that they are not all the same. We know that both P(N) and R have strictly larger cardinality than N; both
are uncountable. (In fact it can be shown that P(N) and R are equipotent.) A few infinite cardinal numbers
have symbols associated with them. For instance the cardinality of N (or any other countably infinite set)
is usually denoted ℵ_0, and the cardinality of R is sometimes denoted c. We can say with confidence that
ℵ_0 < c, because there is an injection from N into R but we know that R is uncountable. During the latter
nineteenth and early twentieth centuries mathematicians pondered the question of whether there could be
a set K with cardinality strictly between these two: ℵ_0 < cardinality of K < c. The Continuum Hypothesis
is the assertion that there is no such K, that c is the next largest cardinality after ℵ_0. Is the Continuum
Hypothesis true? The answer (if you can call it that) is more amazing than you would ever guess. We will
leave it hanging over a cliff until the end of the next chapter!

H.2 Russell’s Paradox


We have mentioned that a logical swamp awaits those who wander too far along the path of forming sets of
sets of sets of . . . This is a real mind-bender — take a deep breath, and read on.

Let S be the set of all sets. It contains sets, and sets of sets, and sets of sets of sets and so
on. If A and B are sets then A, B ∈ S, and it is possible that A ∈ B, e.g. B = {A, . . .}. In
particular we can ask whether A ∈ A. Consider then the set

R = {A ∈ S : A ∉ A}.

Now we ask the “killer” question: is R ∈ R? If the answer is “yes,” then the definition of R says
that R ∉ R, meaning the answer is “no.” And if the answer is “no,” the definition of R says
that R ∈ R, meaning the answer is “yes.” Either answer to our question is self-contradictory.

What is going on here? We seem to have tied ourselves in a logical knot. This conundrum is called Russell’s
Paradox. A paradox is not quite the same as a logical contradiction or impossibility. Rather it is an apparent
contradiction, which typically indicates something wrong or inappropriate about our reasoning. Sometimes a
paradox is based on a subtle/clever misuse of words. Here we are allowing ourselves to mingle the ideas of set
and element too freely, opening the door to the above paradoxical discussion that Bertrand Russell brewed
up. Historically, Russell’s Paradox showed the mathematical world that they had not yet fully worked out
what set theory actually consisted of, and sent them back to the task of trying to decide which statements
about sets are legitimate and which are not. This led to the development of axioms which govern formal set
theory (the Zermelo-Fraenkel axioms) which prevent the misuse of language that Russell’s paradox illustrates.
(In the next chapter we will see what we mean by “axioms” in the context of the integers.) This is a difficult
topic in mathematical logic which we cannot pursue further here. Our point here is only that careless or
sloppy use of words can tie us in logical knots, and a full description of what constitutes appropriate usage
is a major task.
These things do not mean that there is something wrong with set theory, and that we are just choosing
to go ahead in denial. Rather they mean that we have to be careful not to be too cavalier with the language
of sets. For virtually all purposes, if we limit ourselves to two or three levels of elements, sets, sets of sets,
and sets of sets of sets (but stop at some point) we will be able to sleep safely, without fear of these dragons
from the logical abyss. See Chapter 4, Section E for a bit more on these topics.

Chapter 4

The Integers

The number systems we use most frequently are the natural numbers (N), the integers (Z), the real numbers
(R), and the complex numbers (C). Each of these has properties that the others don’t. In this chapter we
are going to focus on the essential properties of the integers. But as we work through that we will observe
the contrasts with other number systems. In the final section we will introduce the integers mod m (Zm ), a
finite number system that can be remarkably useful.
A proof always depends on some prior facts which are taken to be true without question. For instance
in the proof of irrationality of √2 we used properties of prime numbers. Often those prior facts are things
that were proven previously. But how far back can we go? Does every mathematical fact have a proof? In
this chapter we will explore this for the integers Z. We will be looking backwards to the logical foundations
of what we know about the integers. This requires some mental discipline. If we are going to use Fact #1
to prove Fact #2, then we must not use Fact #2 to prove Fact #1. We have to restrict the facts used in a
proof to those which come earlier in this development from the most primitive foundations. We can’t just
use anything we know about the integers; we need to constantly ask ourselves “can I use this fact in a proof
yet?” We have to suspend our instinctive knowledge of the integers temporarily while we build the logical
framework on which that instinctive knowledge rests.

A Properties of the Integers


We begin in this section by stating some of the most basic properties of the integers. These properties are
of two types, algebraic properties and order properties.

A.1 Algebraic Properties


All of the number systems mentioned above have two basic algebraic operations: addition + and multipli-
cation · . The algebraic properties of the integers are shared by the other number systems we mentioned.
We describe them below for Z but you should observe that all of these hold if Z is replaced by R or C as
well. The standard terminology describing each property is given in parentheses following the statement of
the property itself.

(A1) a + b ∈ Z for all a, b ∈ Z (closure under addition).


(A2) a + b = b + a for all a, b ∈ Z (commutative law of addition).
(A3) a + (b + c) = (a + b) + c for all a, b, c ∈ Z (associative law of addition).

(A4) There exists an element 0 ∈ Z with the property that 0 + a = a for all a ∈ Z (existence of additive
identity).
(A5) For each a ∈ Z there exists an element −a with the property that a + (−a) = 0 (existence of additive
inverses).

(M1) a · b ∈ Z for all a, b ∈ Z (closure under multiplication).
(M2) a · b = b · a for all a, b ∈ Z (commutative law of multiplication).
(M3) a · (b · c) = (a · b) · c for all a, b, c ∈ Z (associative law of multiplication).
(M4) There exists an element 1 ∈ Z with the property that 1 · a = a for all a ∈ Z (existence of multiplicative
identity).
(D) a · (b + c) = (a · b) + (a · c) for all a, b, c ∈ Z (distributive law).

Notice first of all that the properties make no reference to an operation of subtraction of two integers:
b − a. The negation −a of a single integer is described in (A5). We understand subtraction as shorthand for
the combined operation of first taking the element −a, and then adding it to b:
“b − a” = b + (−a).

In that way the properties of subtraction can be deduced from the above properties.
Notice also that the properties make no mention of division: a/b. This time the reason is different: there
is no fully-defined operation of division for the integers. We can always take two integers a, b and add them
to get another integer a + b; that’s what closure under addition, property (A1), is about; the operation
of addition is always defined and always produces another integer. The same holds for multiplication, as
property (M1) says. But we cannot always divide any pair of integers. For instance if a = 3 and b = 2,
there is no integer 3/2. You may say “but wait, 3/2 is defined; it’s the number we call 1.5.” Well yes, but
that is not another integer. You can only make sense of 3/2 by moving into a different number system like
Q or R which does have operations of division. You can’t form 3/2 and stay within the integers. The word
“closure” in (A1) and (M1) means that those operations always produce results within the integers. You
can’t do that with division; it would force you outside the integers in general. So the integers are not closed
under division. We can talk about doing something like division within the integers if we allow a remainder;
we will come to that in Theorem 4.5 below. But the usual operation of division is undefined within the
integers proper.
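
To make the word “closure” concrete, here is a minimal Python sketch. It is only an illustration under our own assumptions: the helper name `is_integer_quotient` and the sample range are invented for this example, and `fractions.Fraction` stands in for the rational numbers Q.

```python
from fractions import Fraction

def is_integer_quotient(a, b):
    """Return True when a/b, computed in Q, happens to be an integer (b must be nonzero)."""
    return Fraction(a, b).denominator == 1

sample = range(-5, 6)

# (A1) and (M1): sums and products of integers are again integers.
closed_add = all(isinstance(a + b, int) for a in sample for b in sample)
closed_mul = all(isinstance(a * b, int) for a in sample for b in sample)

# Division leaves Z in general: 3/2 is a rational number, not an integer.
three_halves = Fraction(3, 2)
```

On this sample every sum and product is again an integer, while 3/2 exists only as an element of Q. Some quotients, such as 6/3, do land back in Z, which is why we say division is not fully defined rather than never defined.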
Let’s consider the properties above with the natural numbers N instead of Z. We see that they do not
satisfy property (A4); the additive identity 0 is not a natural number, and there is no other natural number
with the property of (A4). Since 0 does not exist in N we can’t even consider property (A5). We are starting
to see how different number systems have different properties. The natural numbers do not have properties
(A4) and (A5), while the integers do. The integers have no operation of division, while R and C do. The
real numbers and complex numbers (as well as the integers mod m) do share all the properties of Z we have
listed so far. But there is a property of the integers that they do not share, and that has to do with the order
relation <.

A.2 Properties of Order


You know from the last chapter what we mean by a relation on a set Z. Here are the fundamental properties
of the order relation <.

(O1) For any a, b ∈ Z one and only one of the following is true: a < b, a = b, b < a (total ordering property).
(O2) If a < b and b < c then a < c (order is transitive).
(O3) If a < b and c ∈ Z then a + c < b + c (translation invariance).
(O4) If a < b and 0 < c then c · a < c · b (multiplicative invariance).

Again, these are not unique to Z. They also hold for R. But now we come to the special property of Z
which distinguishes it from R and other “higher” number systems: the Well-Ordering Principle.
(W) If S is a nonempty subset of {a ∈ Z : 0 < a}, then there exists s∗ ∈ S with the property that s∗ < s
for all s ∈ S other than s = s∗ .

First, notice that {a ∈ Z : 0 < a} is just a description of the natural numbers, so we could view (W) as a
property of N which is inherited by Z. What (W) says is that any set S of positive integers that has at least
one element always has a smallest element s∗ . For instance there is a smallest prime number (2), a smallest
perfect square (1), a smallest prime p such that the next prime is p + 10 (p = 139). We will see some of
the consequences of the Well-Ordering Principle in Section A.5, and in the next section we will observe that
neither R nor C satisfy this property.
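
The three smallest elements just mentioned can be found by a direct search. The sketch below is illustrative only: the function names and the search limit of 10,000 are our own assumptions, and a finite search says nothing about a set whose least element lies beyond the limit. It is (W) that guarantees a least element exists in the first place.

```python
import math

def is_prime(n):
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def next_prime(p):
    """Smallest prime strictly greater than p."""
    q = p + 1
    while not is_prime(q):
        q += 1
    return q

def smallest_with(property_holds, limit=10_000):
    """Least positive integer below `limit` having the property, or None."""
    for n in range(1, limit):
        if property_holds(n):
            return n
    return None

smallest_prime = smallest_with(is_prime)
smallest_square = smallest_with(lambda n: math.isqrt(n) ** 2 == n)
smallest_gap_10 = smallest_with(lambda p: is_prime(p) and next_prime(p) == p + 10)
```

The search returns 2, 1 and 139 respectively, in agreement with the examples in the text.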

A.3 Comparison with Other Number Systems


We have noted that N does not satisfy properties (A4) or (A5), but does satisfy all the other properties
of the integers. There are other settings in which both multiplication and addition are defined but fail to
satisfy all the algebraic properties above. For example, consider the set M2×2 of all 2 × 2 matrices. We can
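add, multiply and negate such matrices: if
\[
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\quad\text{and}\quad
B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix},
\]
then
\[
A + B = \begin{bmatrix} a_{11} + b_{11} & a_{12} + b_{12} \\ a_{21} + b_{21} & a_{22} + b_{22} \end{bmatrix}, \quad
AB = \begin{bmatrix} a_{11} b_{11} + a_{12} b_{21} & a_{11} b_{12} + a_{12} b_{22} \\ a_{21} b_{11} + a_{22} b_{21} & a_{21} b_{12} + a_{22} b_{22} \end{bmatrix}, \quad
-A = \begin{bmatrix} -a_{11} & -a_{12} \\ -a_{21} & -a_{22} \end{bmatrix}.
\]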

The role of 0 is played by the matrix of all zeros $\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$, and the role of 1 is played by the identity matrix
$I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$. All the algebraic properties listed above hold for matrices, except (M2); matrix multiplication is
not commutative. Problem 4.1 will point out another familiar setting in which our algebraic properties are
not all satisfied. So as seemingly obvious as all those properties are, they are not universal!
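
A single pair of matrices is enough to see (M2) fail. The following Python sketch is only an illustration; the particular matrices A and B and the helper names `mat_add` and `mat_mul` are our own choices, with a 2 × 2 matrix stored as a list of two rows.

```python
def mat_add(A, B):
    """Entrywise sum of two 2x2 matrices."""
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

def mat_mul(A, B):
    """Product of two 2x2 matrices, using the row-times-column formula."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 1], [0, 1]]
B = [[1, 0], [1, 1]]
I = [[1, 0], [0, 1]]

AB = mat_mul(A, B)   # [[2, 1], [1, 1]]
BA = mat_mul(B, A)   # [[1, 1], [1, 2]]
```

Here AB and BA differ, so multiplication does not commute, while `mat_add(A, B)` and `mat_add(B, A)` agree and `mat_mul(I, A)` returns A, as (A2) and (M4) require.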
Now let’s consider what the Well-Ordering Principle would say in the context of the real numbers R.
Consider S = (1, +∞) = {x ∈ R : 1 < x}. This is certainly a nonempty subset of {a ∈ R : 0 < a}. So if
the Well-Ordering Principle were true for R, there would exist a smallest number in S = (1, +∞). But that
is not true; for any number s ∈ S we can always find an even smaller one, like (1 + s)/2 which is halfway
between 1 and s. There are numbers like s∗ = 1/2 or s∗ = 1 with the property that s∗ < x for all x ∈ S,
but none of them belong to S themselves, as the Well-Ordering Principle would require. So we see that (W)
is not true for R. If you take a course in advanced calculus or analysis, you will learn that R has a different
property in its place, called “completeness,” which is the foundation for limits and the other fundamental
constructions of calculus.
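
The halving step in this argument can be repeated indefinitely. Here is a minimal Python sketch, under the assumption that exact rational arithmetic (`fractions.Fraction`) is an acceptable stand-in for the real numbers involved; the name `smaller_element` and the starting value 3 are our own.

```python
from fractions import Fraction

def smaller_element(s):
    """Given s > 1, return the midpoint of 1 and s, which is again > 1 but < s."""
    return (1 + s) / 2

s = Fraction(3)
chain = [s]
for _ in range(5):
    s = smaller_element(s)
    chain.append(s)
# chain is 3, 2, 3/2, 5/4, 9/8, 17/16: strictly decreasing, yet always above 1
```

Every term of the chain lies in S and is smaller than the one before, so no term can be a least element of S.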
What about the complex numbers? Problem 4.3 explains that there does not exist an order relation <
for the complex numbers satisfying (O1)—(O4), so it’s pointless to even consider (W) in the case of C!

Problem 4.1 You are familiar with adding and subtracting vectors: if $\vec{x} = (x_1, \ldots, x_n)$ and $\vec{y} = (y_1, \ldots, y_n)$,
then
\[ \vec{x} \pm \vec{y} = (x_1 \pm y_1, \ldots, x_n \pm y_n). \]
In general there is no way to multiply vectors, except in the special case of n = 3. In that case we have the
cross product:
\[ \vec{x} \times \vec{y} = (x_2 y_3 - x_3 y_2,\; x_3 y_1 - x_1 y_3,\; x_1 y_2 - x_2 y_1). \]
(It is usually defined in terms of $\vec{i} = (1, 0, 0)$, $\vec{j} = (0, 1, 0)$ and $\vec{k} = (0, 0, 1)$, but if you write out whatever
description of the cross product you are familiar with you will see that it is equivalent to what we wrote
above.) Several of the axioms fail for the cross product.
a) Show (by example) that axioms (M2) and (M3) are false for ×.
b) Show that (M4) is false for ×. I.e. there is no vector $\vec{e}$ that has the property of the number 1, namely
that $\vec{e} \times \vec{x} = \vec{x}$ for all $\vec{x}$.
c) Show (by example) that it is possible for $\vec{x} \times \vec{y} = (0, 0, 0)$ even if $\vec{x} \neq (0, 0, 0)$ and $\vec{y} \neq (0, 0, 0)$.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . cross

A.4 Further Properties of the Integers
We have not offered proofs for the properties of the integers we have listed so far. We have just been describing
the integers as we know them from our past experience, not trying to explain why these properties must be
true. There are many additional properties that we have not mentioned yet, like the following.

• 0 < 1.
• 0 · 1 = 0.
• (−1) · (−1) = 1.
• 1 is the smallest positive integer.

At what point can we start giving reasons (proofs!) for additional properties based on those we have already
described? Could you prove the algebraic or order properties we listed in the preceding sections? Can you
give reasons for the properties listed as bullets above? When are we done describing the integers and able
to start proving things about them? To answer the questions we need to make a distinction between those
properties which define the integers and so are assumed without proof, and those which are consequences
of the defining properties and so ought to be proven. Those properties which comprise the definition of the
integers, and so are assumed, not proven, are called the axioms of the integers. In other words axioms for the
integers consist of a collection of basic properties from which all other properties can be derived (proven)
logically. There is more than one way to identify a set of such basic properties for the integers. One such set
of axiomatic properties consists of those we have listed in the preceding sections: (A1)—(A5), (M1)—(M4),
(D), (O1)—(O4), (W), and one other: the axiom of nontriviality:

(N) 0 ≠ 1
This seems like a rather silly, obvious observation. But in fact it cannot be proven from the other axioms,
and without it things like 0 < 1 are unprovable. (It will be used in the proof of part 6) of Proposition 4.1
below.) So it needs to be assumed! All other properties of the integers (like our bullets just above) should,
in principle¹, be provable from the axioms. We don’t intend to go through the (long) endeavor of writing
proofs of all the well-known properties of the integers. But we will look at a few, collected as the following
proposition, just to get an idea of how this kind of proof would go. Observe that parts 6), 2), 4), and 8)
(respectively) provide the justification for our bullets above.
Proposition 4.1.

1) 0 and 1 are unique. (In other words, 0 is the only integer satisfying property (A4) and 1 is the only
integer satisfying (M4).)
2) For all a ∈ Z, 0 · a = 0.

3) For all a ∈ Z, −(−a) = a.


4) For all a, b ∈ Z, (−a) · b = −(a · b).
5) a < 0 if and only if 0 < −a.
6) 0 < 1.

7) If c · a = c · b and c ≠ 0, then a = b.
8) If 0 < a then either 1 < a or 1 = a.
¹ Actually this is not quite true; see the discussion in Section E for a glimpse of what goes wrong. But for all the properties
that we will use, it’s true.

We might call 7) the “law of cancellation.” The reason it is true is not because we can divide both sides by
c; there is no fully defined notion of division in Z, as we have already pointed out. Rather the reason is that
it is a consequence of 2), as we will see in the proof.
As we begin this proof we have nothing available to us except what the axioms say. We have to be careful
to not slip into using some facts about the integers that are familiar to us but not stated in the axioms.
Once 1) is proven then we can use it in the proofs of the subsequent parts. Once 2) is proven we can use it
to prove the parts that come after it. And so forth.
Proof. For 1), to prove that 0 is unique, we suppose z ∈ Z also has the property that z + a = a for all a ∈ Z.
Using this we need to show why 0 = z must be true. Consider 0 + z. By (A4) with a = z we find that
0 + z = z. On the other hand using (A2) we know that 0 + z = z + 0. Our hypothesis regarding z, with
a = 0, implies that z + 0 = 0. Thus 0 + z must equal both 0 and z, and therefore 0 = z. Thus there is only
one 0 satisfying (A4). The uniqueness of 1 in (M4) is proved the same way.
For 2), consider any a ∈ Z. We want to prove that 0 · a = 0.

0 = a + (−a) by (A5)
= 1 · a + (−a) by (M4)
= a · 1 + (−a) by (M2)
= a · (0 + 1) + (−a) by (A4)
= [(a · 0) + (a · 1)] + (−a) by (D)
= [(0 · a) + (1 · a)] + (−a) by (M2)
= [(0 · a) + a] + (−a) by (M4)
= (0 · a) + [a + (−a)] by (A3)
= (0 · a) + 0 by (A5)
= 0 + (0 · a) by (A2)
= 0 · a by (A4)

Thus 0 · a = 0 as claimed.

Proofs of 3)–5) are left as problems. We continue with proofs of 6)–8) under the assumption that 1)—5)
have been established and so can be used in the proofs for the subsequent parts.

For 6), by virtue of (O1) we need to show that neither 1 < 0 nor 0 = 1 is possible. We know by the
nontriviality axiom that 0 ≠ 1. So suppose 1 < 0. By adding −1 to both sides using (O3) we find that
0 < −1. Using this in (O4) we have that

(−1) · 0 < (−1) · (−1). (4.1)

By (M2) and 2), (−1) · 0 = 0 · (−1) = 0. We also know that

(−1) · (−1) = −[1 · (−1)] by 4)
= −[−1] by (M4)
= 1 by 3).

Making these substitutions in (4.1) we find that 0 < 1. But according to (O1) both 1 < 0 and 0 < 1 cannot
be true simultaneously. This shows that 1 < 0 is not possible.
For 7), since c ≠ 0 we know from (O1) that either 0 < c or c < 0. Suppose first that 0 < c. From (O1)
again we know that one of the following must be true: a < b, a = b, or b < a. If a < b, then multiplying
both sides by c and using (O4), we deduce that c · a < c · b, contrary to our hypothesis that c · a = c · b. If
b < a, then we deduce in the same way that c · b < c · a, again contrary to our hypotheses. This proves 7) in
the case that 0 < c. If c < 0, then we know by 5) that 0 < −c. By 4) it follows that

(−c) · a = −(c · a) = −(c · b) = (−c) · b.

Since −c > 0 the same argument we just gave implies that a = b. This completes the proof of 7).
For 8), consider
S = {a ∈ Z : 0 < a}
and let s∗ be the smallest element of S, as guaranteed by (W). Since 1 ∈ S by 6), we just need to show that
s∗ < 1 is not possible. To prove this by contradiction, suppose s∗ < 1. Since we also know 0 < s∗ , we can
use (O4) and 2) to deduce that 0 < s∗ · s∗ and s∗ · s∗ < s∗ . Therefore s∗ · s∗ ∈ S. But the property of s∗
from (W) does not allow s∗ · s∗ < s∗ . This contradiction shows that s∗ < 1 is not possible, and therefore
s∗ = 1. This completes the proof.

Problem 4.2 Write proofs for parts 3)–5) of the proposition. (You may use any of the earlier parts in
your proofs of later parts.)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . basics

Problem 4.3 Consider C and suppose that there was a relation ≺ satisfying (O1)–(O4). Since i ≠ 0
(else i² = 0, not −1) property (O1) would require that either 0 ≺ i or i ≺ 0. Show how using either of these,
(O4) leads to a contradiction to −1 ≺ 0 ≺ 1. The same argument as in Proposition 4.1 for < on the integers
shows that −1 ≺ 0 ≺ 1. So even if there were some bizarre order relation ≺ on C that is incompatible with
the usual order on Z, it still would have to be true that −1 ≺ 0 ≺ 1. You can therefore take it for granted
that −1 ≺ 0 ≺ 1, and anything that violates that is indeed a contradiction.
Do you think there is an order relation on C which satisfies (O1)—(O3) but not necessarily (O4)?
Explain.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . nonord

Problem 4.4

a) Use the fact that 1 is the smallest positive integer to prove that if m, n ∈ N and n|m then n ≤ m.
b) Suppose n, m ∈ Z and that both n|m and m|n. Prove that either m = n or m = −n.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . divpm

A.5 A Closer Look at the Well-Ordering Principle


The Well-Ordering Principle is probably the least familiar of the axioms for Z. In this subsection we are
going to develop two of its important consequences, neither of which will be big surprises to you. However
they are properties that do not follow from the other axioms alone, but really depend on well-ordering. That
makes them peculiar to the integers. In particular neither the real numbers nor the complex numbers have
properties like those we prove below.

The Principle of Mathematical Induction


Recall the technique of proof by induction described in earlier chapters. We said that in order to prove a
statement of the form
P(n) for all n ∈ N,
we could a) prove that P(1) is true, and b) prove that whenever P(n) is true then P(n + 1) is true. The
validity of this proof technique rests on the Well-Ordering Principle. To see the connection consider the set

T = {n ∈ N : P(n) is true}.

The method of proof by induction consists of showing that T has these properties: a) 1 ∈ T , and b) if n ∈ T
then n + 1 ∈ T . From these we are supposed to be able to conclude that T = N. So at the heart of proof by
induction is the following fact.
Theorem 4.2 (Induction Principle). Suppose T ⊆ N contains 1 and has the property that whenever n ∈ T ,
then n + 1 ∈ T as well. Then T = N.

The proof is by contradiction; if T ≠ N then S = N \ T would be a nonempty subset of N. The Well-Ordering
Principle would say that S has a smallest element, s∗ . Let n = s∗ − 1. Since n < s∗ , n ∉ S and
therefore n is in T . The picture is like this:

1 ∈ T, 2 ∈ T, · · · , n ∈ T, n + 1 = s∗ ∉ T, · · ·

But this is contrary to the hypothesis that n ∈ T implies n + 1 ∈ T . Here is the proof written out with the
rest of the details.
Proof. Consider S = N \ T . Our goal is to show that S = ∅. Suppose on the contrary that S is not empty.
Since S ⊆ N we know that 0 < n for all n ∈ S. Thus the Well-Ordering Principle applies to S, and so there
is a smallest element s∗ of S. In particular, s∗ ∉ T . Since S ⊆ N we know 1 ≤ s∗ . But since 1 ∈ T , we
know that 1 ∉ S and therefore 2 ≤ s∗ . Consider n = s∗ − 1. Then 1 ≤ n and since n < s∗ , we know n ∉ S.
Therefore n ∈ T . By the hypotheses on T , it follows that n + 1 ∈ T as well. But n + 1 = s∗ which we know is
in S, which is contrary to n + 1 ∈ T . This contradiction proves that S must in fact be empty, which means
that T = N.
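
The role of the least element s∗ can be seen by running the argument on a statement that is false. The Python sketch below is only an illustration: the statement P(n), “n² + n + 41 is prime,” is our own choice of example, and the helper names and the search limit are assumptions. For a false P the set S = N \ T is nonempty, and its least element pinpoints exactly where the induction step breaks.

```python
def is_prime(n):
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def least_counterexample(P, limit):
    """Smallest n in 1..limit with P(n) false, or None if there is none."""
    for n in range(1, limit + 1):
        if not P(n):
            return n
    return None

def P(n):
    """The (false) statement: n^2 + n + 41 is prime."""
    return is_prime(n * n + n + 41)

s_star = least_counterexample(P, 100)   # least element of S within the search range
n = s_star - 1                          # n is in T, but n + 1 = s_star is not
```

Here s∗ = 40, since 40² + 40 + 41 = 41², while P(39) is true. So the implication “P(n) implies P(n + 1)” fails at n = 39, which is exactly the situation the proof rules out when both hypotheses of the theorem hold.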
The Well-Ordering Principle can be used to establish a more general version in which the set S can
include negative integers, provided it has a lower bound. A second generalization considers sets with an
upper bound, and guarantees the existence of a largest element.
Theorem 4.3 (Generalized Well-Ordering Principle). Suppose S is a nonempty subset of Z.
a) If S has a lower bound (i.e. there exists k ∈ Z with k ≤ s for all s ∈ S) then S has a least element s∗ .
b) If S has an upper bound (i.e. there exists m ∈ Z with s ≤ m for all s ∈ S) then S has a largest element.

We will write a proof of a) and leave the proof of b) as a problem. The idea for a) is to reduce the situation
to one for which standard well-ordering applies. Our S is a subset of {a ∈ Z : k ≤ a}, but to apply standard
well-ordering it needs to be a subset of N. So what we will do is construct a new set S′ by adding 1 − k to
every element of S, shifting everything up so that S′ is a subset of {a ∈ Z : 1 ≤ a} = {a ∈ Z : 0 < a}.
Standard well-ordering will then apply to S′ . We take the least element s′∗ of S′ and subtract 1 − k back off.
The result should be the least element of S.
Proof of a). Let S be nonempty with a lower bound k. Define

S′ = {s + 1 − k : s ∈ S}.

Since S is nonempty, S′ is also nonempty. For any s′ ∈ S′ there is some s ∈ S such that s′ = s + 1 − k.
Since s ≥ k it follows that s′ ≥ 1 > 0. Therefore S′ ⊆ {a ∈ Z : 0 < a}. The Well-Ordering Principle applies
to S′ , so there exists s′∗ ∈ S′ with s′∗ ≤ s′ for all s′ ∈ S′ . Since s′∗ ∈ S′ it follows that there is s∗ ∈ S with
s′∗ = s∗ + 1 − k. For any s ∈ S, we know that s + 1 − k ∈ S′ and therefore s′∗ ≤ s + 1 − k. This implies that

s∗ = s′∗ + k − 1 ≤ s.

Thus s∗ is the least element of S.
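
The shifting construction in this proof can be traced on a small finite set. The following Python sketch is only an illustration: the function name `least_element_via_shift` and the sample set are our own, and the built-in `min` stands in for the appeal to (W), which is what actually guarantees a least element when the set is infinite.

```python
def least_element_via_shift(S, k):
    """Least element of a finite nonempty set S of integers with lower bound k,
    found by shifting S into the positive integers as in the proof of a)."""
    assert S and all(k <= s for s in S)
    shifted = {s + 1 - k for s in S}      # the set S' of the proof
    assert all(0 < t for t in shifted)    # S' consists of positive integers
    least_shifted = min(shifted)          # stands in for (W) applied to S'
    return least_shifted + k - 1          # shift back to get the least element of S

S = {-7, -3, 0, 4}
s_star = least_element_via_shift(S, -10)
```

With the lower bound k = −10 the shifted set is {4, 8, 11, 15}, its least element is 4, and shifting back gives 4 + (−10) − 1 = −7, the least element of S.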

Problem 4.5 Write a proof of part b) of Theorem 4.3. You will need to do something similar to the
proof of a), building a new set S′ so that the upper bound of S turns into a lower bound of S′ . Apply the
Well-Ordering Principle to S′ and convert back to a statement about S, like what we did for part a).
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ub

Theorem 4.4 (Generalized Induction Principle). Suppose T ⊆ Z contains n0 , and that n ∈ T implies
n + 1 ∈ T . Then T contains all n ∈ Z with n ≥ n0 .

Problem 4.6 Write a proof of the Generalized Induction Principle.


. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . GIPf

Problem 4.7 Formulate and prove a version of the induction principle for strong induction.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . strongind

The Division Theorem


Although we can’t always divide one integer by another, we can do something like division if we also allow a remainder.
Theorem 4.5. Suppose m, n ∈ Z and 0 < n. There exist unique q, r ∈ Z so that

m = q · n + r and 0 ≤ r < n.

Everyone knows this; n “goes into” m some maximum number of times q (the quotient), leaving a remainder
r. The theorem says you can always do this, and that q and r are unique, provided we insist that 0 ≤ r < n.
For instance if m = 17 and n = 5, then q = 3 and r = 2:

17 = 3 · 5 + 2 and 0 ≤ 2 < 5.

The theorem is sometimes called the “division algorithm,” but that is really a misnomer. An algorithm is a
procedure or sequence of steps that accomplishes a specific task. The theorem above offers no algorithm for
finding q and r (although you can probably think of a procedure without too much trouble).
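
One such procedure is repeated subtraction. The Python sketch below is one possible procedure under our own naming (`divide_with_remainder`), not something supplied by the theorem; it is slow when |m| is much larger than n, but it uses nothing beyond addition, subtraction and comparison.

```python
def divide_with_remainder(m, n):
    """Return (q, r) with m == q*n + r and 0 <= r < n, for integers m, n with n > 0."""
    if n <= 0:
        raise ValueError("n must be positive")
    q, r = 0, m                # invariant throughout: m == q*n + r
    while r >= n:              # remainder too large: take away one more copy of n
        q, r = q + 1, r - n
    while r < 0:               # remainder negative (m < 0): add copies of n back
        q, r = q - 1, r + n
    return q, r

example = divide_with_remainder(17, 5)      # (3, 2)
negative = divide_with_remainder(-17, 5)    # (-4, 3), since -17 = (-4)*5 + 3
```

For n > 0 Python's built-in `divmod(m, n)` returns the same pair, so it can serve as an independent check.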
You don’t need to see a proof of the division theorem to know it is true. But we are going to write out a
proof, because we want to exhibit its connection to the Well-Ordering Principle. (It can also be proven by
strong induction. But, as we just saw, the Induction Principle is a manifestation of well-ordering, so however
you prove it it comes down to well-ordering in the end.)
Proof. Define the set

R = {b ∈ Z : b ≥ 0 and there exists an integer k for which m = k · n + b}.

Our goal is to apply the Generalized Well-Ordering Principle to show that R has a smallest element, which
will be the r that we want.
We need to verify that R satisfies the hypotheses of the Generalized Well-Ordering Principle. First, by
definition every b ∈ R satisfies 0 ≤ b. So 0 is a lower bound for R. Second, we need to show that R is
nonempty. We do that by considering two cases: 0 ≤ m or m < 0. If 0 ≤ m, then m = 0 · n + m and so
m ∈ R. Now consider the case of m < 0. Since m = mn + (n − 1)(−m) and (n − 1)(−m) ≥ 0, we see that
(n − 1)(−m) ∈ R. Thus R is nonempty in either case.
We can now appeal to the Generalized Well-Ordering Principle to deduce that there is an r ∈ R which
is smallest possible. By definition of R we know that 0 ≤ r and there is q ∈ Z for which m = q · n + r.
To finish we need to show that r < n. We do this by contradiction. Suppose to the contrary that n ≤ r.
Then consider b = r − n. Since n ≤ r it follows that 0 ≤ b. Since m = (q + 1) · n + (r − n), we see that
b = r − n ∈ R. But r − n < r, contradicting the fact that r is the smallest element of R. This contradiction
proves that r < n, finishing the proof of existence.

Now we need to prove uniqueness. Assume that there is a second pair of integers q′ , r′ with m = q′ · n + r′
and 0 ≤ r′ < n. We want to show that q = q′ and r = r′ . Since qn + r = q′ n + r′ , it follows that
(q − q′ )n = r′ − r. But 0 ≤ r, r′ < n implies that −n < r′ − r < n, and the only multiple of n in that interval
is 0 · n = 0. Thus r = r′ , and therefore (q − q′ )n = 0. Since 0 < n we are forced to conclude that q = q′ .
As elementary as the division theorem seems it is an important tool in writing proofs about Z. We will
see that in several places in the rest of this chapter.
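
The existence part of the proof can be traced on a concrete case. The Python sketch below is only an illustration under our own assumptions: the function name `remainder_set` is invented, and because the set R of the proof is infinite we look only at the finite slice obtained from k between −10 and 10, which for this choice of m and n happens to contain the least element.

```python
def remainder_set(m, n, k_range):
    """Finite slice of R = {b >= 0 : m = k*n + b for some integer k}."""
    return {m - k * n for k in k_range if m - k * n >= 0}

m, n = -17, 5
R = remainder_set(m, n, range(-10, 11))   # {3, 8, 13, 18, 23, 28, 33}
r = min(R)                                # least element of R
q = (m - r) // n                          # the k with m = k*n + r

# The element used in the proof to show R is nonempty when m < 0:
witness = (n - 1) * (-m)                  # 68, coming from k = m
```

The least element is r = 3 with q = −4, and indeed −17 = (−4) · 5 + 3 with 0 ≤ 3 < 5. The witness 68 from the proof belongs to R but is far from the smallest element, which is why the appeal to well-ordering is needed.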

Problem 4.8 Find the quotient and remainder of the division theorem for the following:
a) m = 2297, n = 172.

b) m = 44633, n = 211.
c) m = 64016, n = 253.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . divex

B Greatest Common Divisors


If in the Division Theorem the remainder turns out to be r = 0, then m = qn for some integer q, which means
that n divides m as defined on page 15: n|m. Some elementary properties of divisibility are the following.
• 1|n for all n ∈ Z.
• n|0 for all n ∈ Z.

• 0|n if and only if n = 0.


• The following are equivalent:
i) n|m,
ii) n|(−m),
iii) (−n)|m.
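
These properties can be spot-checked by machine. The Python sketch below is only an illustration: the function name `divides` and the test range are our own, and the case n = 0 is handled separately because the remainder operator `%` is not defined for a zero divisor.

```python
def divides(n, m):
    """True when n | m, i.e. m = q*n for some integer q."""
    if n == 0:
        return m == 0          # m = q*0 forces m = 0
    return m % n == 0

values = range(-12, 13)

one_divides_all = all(divides(1, n) for n in values)
all_divide_zero = all(divides(n, 0) for n in values)
zero_divides_only_zero = all(divides(0, n) == (n == 0) for n in values)
signs_do_not_matter = all(
    divides(n, m) == divides(n, -m) == divides(-n, m)
    for n in values for m in values
)
```

All four checks come out true on this range. That is evidence rather than proof, but each property follows quickly from the definition of n|m.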

Problem 4.9 Suppose k, m, n ∈ Z. Prove that if k divides m and m divides n, then k divides n.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . divtrans

Problem 4.10 Prove that if m divides a and b, then m divides a ± b and ac (for any integer c).
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . division

Many of the most interesting properties of the integers, such as those involving prime numbers, are due
to the fact that division (with zero remainder) is not always possible. The notion of the greatest common
divisor of two integers m, n is an important tool for discussing various divisibility related issues. Because
divisibility does not care about ± signs, we will state the definition just for natural numbers.
Definition. Suppose a, b ∈ N. We say g ∈ N is the greatest common divisor of a and b, and write
g = gcd(a, b), if both the following hold:
a) g divides both a and b,
b) whenever d ∈ N divides both a and b then d also divides g.

This definition is unexpected in a couple of ways. First, it does not refer to factorizations of a and b, even
though that is probably how you are used to thinking of greatest common divisors. (See Example 4.2
below.) Second, part b) of the definition doesn’t interpret “greatest” as d ≤ g but rather as d|g. Perhaps
that seems strange to you. However in other algebraic systems we won’t always have an order relation. This
division interpretation of “greatest” turns out to be the way to generalize the definition in those settings.
For instance, when we replace a and b by polynomials p(x), q(x) in the next chapter, the interpretation of
“greater” as p(x) ≤ q(x) would be inadequate, because for polynomials it is possible that neither p(x) ≤ q(x)
nor q(x) ≤ p(x) is true — not all polynomials are comparable in the sense of inequality. But the definition
based on divisors dividing other divisors will work just fine. We are using the more general definition here
for integers as well. Just remember that when you are asked to prove something about gcd(a, b), don’t revert
to a naive interpretation of “greatest,” but use the definition above.
Notice that the definition said “the” rather than “a” greatest common divisor. The next theorem justifies
this presumption of uniqueness. The proof in fact also yields an unexpected fact!
Theorem 4.6. For a, b ∈ N the greatest common divisor of a and b exists and is unique. Moreover, there
exist α, β ∈ Z for which gcd(a, b) = αa + βb.
Example 4.1. Looking ahead to Example 4.2, gcd(8613, 2178) = 99. Observe that

99 = (−1) · 8613 + 4 · 2178,

showing that in this case the theorem does hold, with α = −1 and β = 4.
In this example we have given no hint as to how we found the values of α and β. The proof of the
theorem establishes theoretically that they exist, but that is not a very practical way to actually find them.
Problem 4.13 below suggests an algorithm for calculating the α and β.
Proof. We first prove uniqueness. Suppose both g and g′ satisfy the definition of gcd(a, b). Then g divides
g′ and g′ divides g. Since they are both positive, Problem 4.4 implies that g = g′ or g = −g′. But g, g′ ∈ N
implies that g ≠ −g′. Therefore g = g′, proving uniqueness.
The existence will be a consequence of the proof of the last part. For that, define

D = {k ∈ N : k = αa + βb for some α, β ∈ Z}.

Since a, b ∈ D we know that D is nonempty, so the Well-Ordering Principle implies that there is a smallest
element d of D. In particular there are integers α0 , β0 for which

d = α0 a + β0 b. (4.2)

We claim that d divides both a and b. Suppose d did not divide a. Then there would exist a quotient q and
remainder 0 < r < d with
a = qd + r.
We can rearrange this as
r = a − qd = (1 − qα0 )a + (−qβ0 )b.
But this implies that r ∈ D and is smaller than d, contrary to the minimality of d. Thus it must be that d
divides a. A similar argument shows that d also divides b.
It follows from (4.2) that every common divisor of a and b must divide d. Since d ∈ N and is itself a
common divisor of a and b, we see that d satisfies the definition of d = gcd(a, b) proving existence of the
greatest common divisor. Now (4.2) establishes the final claim of the theorem.
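
For small inputs the set D in the proof can be explored directly. The following is only a brute-force sketch, not the method that Problem 4.13 asks you to develop: the function name smallest_positive_combination and the search limit bound are our own assumptions, and the search only sees coefficients α, β between −bound and bound.

```python
from math import gcd

def smallest_positive_combination(a, b, bound):
    """Search alpha, beta in [-bound, bound] for the least positive alpha*a + beta*b.

    Returns (d, alpha, beta). Brute force, for illustration only.
    """
    best = None
    for alpha in range(-bound, bound + 1):
        for beta in range(-bound, bound + 1):
            k = alpha * a + beta * b
            # keep k only if it is a positive element of D smaller than any seen so far
            if k > 0 and (best is None or k < best[0]):
                best = (k, alpha, beta)
    return best

d, alpha, beta = smallest_positive_combination(8613, 2178, 25)
print(d, alpha, beta)            # 99 -1 4
print(d == gcd(8613, 2178))      # True
```

With this bound the search recovers the values α = −1 and β = 4 of Example 4.1. Other pairs (α, β) also work, but they lie outside the range searched here.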

Problem 4.11 Suppose n, m ∈ N and n|m. Prove that n = gcd(m, n).


. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . dgcd

B.1 The Euclidean Algorithm
Suppose someone asks us to determine a particular greatest common divisor gcd(a, b). How might we
proceed? A natural approach is to find the prime factorizations of both a and b, and then take all the
common factors. Although this is based only on an intuitive understanding of “greatest common divisor,”
rather than the definition above, it does produce the correct answer.
Example 4.2. Find gcd(8613, 2178).

8613 = 3³ · 11 · 29
2178 = 2 · 3² · 11²
gcd(8613, 2178) = 3² · 11 = 99.

This approach depends on being able to work out the prime factorization of an integer. For large integers
that can be a very difficult task². It turns out there is a much better way, known as the Euclidean Algorithm³.
It doesn’t require you to know anything about primes. You only need to be able to work out quotients and
remainders as in the Division Theorem.
Here is how the Euclidean Algorithm works. Suppose we want to calculate gcd(a, b) (a, b ∈ N). The
algorithm will recursively produce a sequence a0 , a1 , a2 , . . . of integers, starting with a0 = a and a1 = b. If
the algorithm has produced a0 , . . . , ak so far then we get the next term by dividing ak−1 by ak and forming
the remainder rk as in the Division Theorem,

ak−1 = qk ak + rk , 0 ≤ rk < ak .

If the remainder is positive rk > 0, we make the remainder the next term in our sequence, ak+1 = rk , and
keep going. But the first time the remainder is rk = 0, we stop and take ak as the final value of the algorithm.
Example 4.3. Here is the algorithm applied to Example 4.2.

a0 = 8613
a1 = 2178
8613 = 2178 · 3 + 2079 a2 = 2079
2178 = 2079 · 1 + 99 a3 = 99
2079 = 99 · 21 + 0 gcd = 99.

Example 4.4. Here is a longer example: gcd(1953, 2982).

a0 = 2982
a1 = 1953
2982 = 1 · 1953 + 1029 a2 = 1029
1953 = 1 · 1029 + 924 a3 = 924
1029 = 1 · 924 + 105 a4 = 105
924 = 8 · 105 + 84 a5 = 84
105 = 1 · 84 + 21 a6 = 21
84 = 4 · 21 + 0 gcd = 21.
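
The recursive description above translates almost line for line into a short program. This is a minimal sketch; the name euclid_sequence is our own, and we assume the inputs are natural numbers as in the text.

```python
def euclid_sequence(a, b):
    """Return the list [a0, a1, ..., a_{k*}] produced by the Euclidean Algorithm."""
    seq = [a, b]                  # a0 = a, a1 = b
    while True:
        r = seq[-2] % seq[-1]     # remainder r_k in a_{k-1} = q_k * a_k + r_k
        if r == 0:
            return seq            # stop: the last term a_{k*} is the gcd
        seq.append(r)             # otherwise a_{k+1} = r_k

print(euclid_sequence(8613, 2178))   # [8613, 2178, 2079, 99]
print(euclid_sequence(2982, 1953))   # [2982, 1953, 1029, 924, 105, 84, 21]
```

The two printed lists are the a-columns of Examples 4.3 and 4.4; the last entry of each list is the gcd.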

To prove that the algorithm really works we will need a little more notation. We will let k be the
“counter” which keeps track of the different stages of the algorithm; the k th stage will be the one in which
we divide by ak . We will let k ∗ denote the value of k at which we stop: rk∗ = 0, gcd = ak∗ . (In Example 4.4,
k ∗ = 6.)
² Many schemes for computer security and data encryption depend on the difficulty of finding prime factors of very large integers.
³ It is one of the oldest mathematical algorithms in the world. It was known to Euclid, and may have been known as early as 500 BC.

Theorem 4.7. Suppose a, b ∈ N. The Euclidean algorithm with a0 = a and a1 = b terminates after a finite
number k ∗ of steps with rk∗ = 0 and ak∗ = gcd(a, b).
We have to prove two things, 1) that the algorithm does terminate, k ∗ < ∞, and 2) that the resulting
value ak∗ is actually gcd(a, b). The key to the latter is the following.

Lemma 4.8. Suppose m, n ∈ N. Let q and r be the quotient and remainder as in the division theorem:
m = qn + r, 0 ≤ r < n. If g = gcd(n, r) then g = gcd(m, n).
Proof of Lemma. From m = qn + r it follows that any common divisor of n and r is also a common divisor
of m and n. From r = m − qn it follows that any common divisor of n and m is also a common divisor of n
and r. Suppose g = gcd(n, r), then g is also a common divisor of m and n. If d is any other common divisor
of m and n, then it is also a common divisor of n and r. By definition of g = gcd(n, r) we know d divides g.
Thus g satisfies the definition of g = gcd(m, n).
Informally, the reason the algorithm must terminate is that at each stage ak+1 = rk < ak = rk−1 , so the
sequence of remainders is strictly decreasing and must reach rk∗ = 0 in finitely many steps. The lemma
says that
gcd(a, b) = gcd(a0 , a1 ) = gcd(a1 , a2 ) = · · · = gcd(ak∗ −1 , ak∗ ) = ak∗ ,
the last equality by Problem 4.11 because ak∗ divides ak∗ −1 .
The “· · · ” in the above indicates that there really is an induction argument here. But the things we are
claiming are only for k ≤ k ∗ . To write this as an induction proof of something which holds for all k we
can make the statement being proved an implication with k ≤ k ∗ as the antecedent. This makes the proof
more cumbersome to read than the informal explanation above, but does make the logical structure of the
argument explicit.
Proof of Theorem. We will prove by induction that the following implication is true for every positive integer
k:
if k ≤ k ∗ then both rk ≤ b − k and gcd(a, b) = gcd(ak−1 , ak ).
First consider k = 1. Since a0 = a and a1 = b the first step of the algorithm is to find q1 and 0 ≤ r1 < b
with a = q1 b + r1 . In particular, r1 ≤ b − 1. And since a0 = a and a1 = b we certainly know that
gcd(a, b) = gcd(ak−1 , ak ). This verifies the base case of k = 1. (Since k ∗ ≥ 1 the antecedent k ≤ k ∗ is true
for the base case, but since the consequent of the implication is true we don’t need to say anything about
the antecedent to verify the implication.)
Now for the induction step we assume it is true that k ≤ k ∗ implies both rk ≤ b − k and gcd(a, b) =
gcd(ak−1 , ak ). We want to prove that this implication is also true for k + 1. So we assume the antecedent,
namely that k + 1 ≤ k ∗ . Then k ≤ k ∗ follows and so the consequent of the assumed implication is true:
rk ≤ b − k and gcd(a, b) = gcd(ak−1 , ak ). Since k < k ∗ the algorithm continues at least one more step:

ak = qk+1 ak+1 + rk+1 , 0 ≤ rk+1 < ak+1 .

Since ak+1 = rk it follows that rk+1 ≤ ak+1 − 1 = rk − 1 ≤ b − k − 1 = b − (k + 1). By the induction
hypothesis gcd(a, b) = gcd(ak−1 , ak ), and by the lemma (applied to ak−1 = qk ak + rk ) it follows that
gcd(ak−1 , ak ) = gcd(ak , rk ) = gcd(ak , ak+1 ). Thus both rk+1 ≤ b − (k + 1) and gcd(a, b) = gcd(ak , ak+1 )
hold. This completes the induction proof of the claimed implication for all k ∈ N.
Now k ≤ k ∗ implies that 0 ≤ b − k and consequently k ≤ b. This means that for k = b + 1 it cannot be
that k ≤ k ∗ , so k ∗ < k = b + 1, proving termination of the algorithm in no more than b steps. And for k ∗
itself we have
gcd(a, b) = gcd(ak∗ −1 , ak∗ ) = ak∗ ,
the last equality by Problem 4.11 because ak∗ divides ak∗ −1 .
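
The two claims carried through the induction can be spot-checked numerically. The sketch below is a check, not a proof; the function name check_euclid_invariants is our own, and Python's built-in math.gcd is used only as an independent reference value.

```python
from math import gcd

def check_euclid_invariants(a, b):
    """Run the Euclidean Algorithm on (a, b), asserting both claims at each stage k.

    Returns (k_star, a_k_star).
    """
    prev, cur = a, b      # a_{k-1} and a_k, starting with k = 1
    k = 1
    while True:
        r = prev % cur                        # r_k
        assert r <= b - k                     # first claim: r_k <= b - k
        assert gcd(prev, cur) == gcd(a, b)    # second claim: gcd(a, b) = gcd(a_{k-1}, a_k)
        if r == 0:
            return k, cur                     # k* and a_{k*}
        prev, cur = cur, r                    # move on to stage k + 1
        k += 1

print(check_euclid_invariants(2982, 1953))    # (6, 21)
```

For Example 4.4 this reports k∗ = 6 and gcd 21, well within the bound k∗ ≤ b = 1953 given by the proof.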
In the next section we are going to prove the Fundamental Theorem of Arithmetic, the most basic result
about prime factorizations of integers. The notion of relatively prime integers is a key concept for the proofs
to come.

Definition. Two integers a, b ∈ N are called relatively prime if gcd(a, b) = 1.
Example 4.5. Observe that 6 divides 4 · 15 = 60, because 60 = 10 · 6. But 6 does not divide 4, and 6 does
not divide 15.
In general if a divides a product bk, we cannot conclude that a divides one of the two factors, b or k. If,
however, a and b are relatively prime, the story is different.
Lemma 4.9. Suppose a and b are relatively prime and a divides bk. Then a divides k.
Proof. Since gcd(a, b) = 1 there exist integers α and β for which

1 = αa + βb.

Therefore
k = αak + βbk.
By hypothesis, a divides both terms on the right, so it must divide k.
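
A small search shows the role of the hypothesis that a and b are relatively prime. This is an illustrative sketch only; the name lemma_4_9_holds and the search range are our own choices.

```python
from math import gcd

def lemma_4_9_holds(a, b, k):
    """Return False only if (a, b, k) is a counterexample to Lemma 4.9."""
    if gcd(a, b) == 1 and (b * k) % a == 0:
        return k % a == 0      # hypotheses hold, so a must divide k
    return True                # hypotheses fail, nothing to check

# Example 4.5: 6 divides 4 * 15, but gcd(6, 4) = 2, so the lemma does not apply
# and indeed 6 does not divide 15.
print(gcd(6, 4), (4 * 15) % 6, 15 % 6)     # 2 0 3

# No counterexample among small values.
print(all(lemma_4_9_holds(a, b, k)
          for a in range(1, 40)
          for b in range(1, 40)
          for k in range(1, 40)))          # True
```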
Lemma 4.10. Suppose a, b ∈ N are relatively prime, and that both of them divide k. Then ab divides k.

Problem 4.12 Prove Lemma 4.10.


. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . prodpf

Problem 4.13 In this problem you are going to work out a method to calculate the α and β of Theorem 4.6.
Suppose a0 , a1 , . . . is the sequence of values produced by the Euclidean Algorithm. At each stage we have

ak−1 = qk ak + rk , and ak+1 = rk provided rk ≠ 0. (4.3)

We want to find integers α1 , . . . , αk , . . . and β1 , . . . , βk , . . . so that at each stage

ak = αk · a + βk · b.

We are going to keep track of our work in a table, like this.


k      ak             qk      αk           βk
0      a                      α0 (=?)      β0 (=?)
1      b              q1      α1 (=?)      β1 (=?)
⋮      ⋮              ⋮       ⋮            ⋮
k−1    ak−1           qk−1    αk−1         βk−1
k      ak             qk      αk           βk
k+1    ak+1 (= rk)    qk+1    αk+1 (=?)    βk+1 (=?)
⋮      ⋮              ⋮       ⋮            ⋮
k∗     ak∗            qk∗     αk∗          βk∗
Suppose we have filled in the correct values for rows 0 through k. We need to work out a procedure for filling
in row k +1. The ak+1 and qk+1 values are just what the Euclidean Algorithm prescribes: set ak+1 = rk from
the previous row (k), calculate the quotient and remainder, ak = qk+1 ak+1 + rk+1 and then fill in the values
in the q column. What we need are formulas to tell us what to fill in for αk+1 and βk+1 . Since ak+1 = rk
we know from (4.3) that
ak+1 = ak−1 − qk ak .
Since we know how to write ak−1 and ak in terms of a and b we can substitute the appropriate expressions
into this to find values αk+1 , βk+1 for which

ak+1 = αk+1 · a + βk+1 · b.

Work this out to find formulae for αk+1 and βk+1 in terms of values from rows k − 1 and k of the table. All
we need now are values for α0 , β0 , α1 , β1 to get the calculations started, and that is trivial. When you get
to the bottom row of the table, you will have the desired values as α = αk∗ and β = βk∗ .
Using this method, find the following gcds and the corresponding α and β values. Turn in your explanation
of the formulae for αk+1 and βk+1 , and what values to use in the first two rows, and your filled-in table
for each of the examples.
a) gcd(357, 290).
b) gcd(2047, 1633).
c) gcd(912, 345).
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . EAplus

Problem 4.14 Suppose a, b, c ∈ N.


a) Write a definition of gcd(a, b, c) which is analogous to the definition of gcd(a, b) we gave above.
b) Prove that gcd(a, b, c) = gcd(gcd(a, b), c).
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . gcd3

Problem 4.15 Show that a, b ∈ N are relatively prime if and only if there exist integers α, β such that
1 = α · a + β · b.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . lc

Problem 4.16 Suppose that a, b ∈ N are relatively prime. Use the preceding problem to prove that, for
any k ∈ N, aᵏ and b are relatively prime. (Hint: use induction.)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . power

Problem 4.17 Suppose that a and b are relatively prime, and that c and d are relatively prime. Prove
that ac = bd implies a = d and b = c. ([9, #17.6 page 215])
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . coprime

C Primes and the Fundamental Theorem


We gave the definition of prime number on page 15. In this section we want to prove some further properties
of prime numbers. Here is the theorem which guarantees that all natural numbers can be factored (uniquely)
into primes.
Theorem 4.11 (Fundamental Theorem of Arithmetic). Every natural number n > 1 has a unique factor-
ization into a product of prime numbers, n = p1 p2 · · · ps .
For this to be true when n is prime, we need to consider an individual prime number to be a “product of
primes;” i.e. one prime all by itself will be considered to be a “product” of one prime. We all know this
theorem is true. The proof of existence is a strong induction argument. We leave it to Problem 4.18 below
and concentrate here on proving uniqueness. First we must be clear about what we mean in saying the
factorization is unique. For instance, we do not consider 12 = 2 · 2 · 3 = 2 · 3 · 2 = 3 · 2 · 2 to be different
factorizations. To eliminate the possible reorderings of a factorization we can insist on using a standard
ordering, p1 ≤ p2 ≤ . . . ≤ ps .
The following lemma is the key to the proof.

Lemma 4.12. Suppose p is prime and divides a product of positive integers a1 · · · am . Then p divides ai for
some 1 ≤ i ≤ m.
Proof. We will prove the lemma by induction on m. First consider m = 1. Then by hypothesis p divides a1 ,
which is what we needed to show.
Next suppose the lemma is true for m and suppose p divides a1 · · · am+1 . If p divides am+1 then we are
done. So suppose p does not divide am+1 . Then, since the only (positive) divisors of p are 1 and p, it must
be that gcd(p, am+1 ) = 1. By applying Lemma 4.9 we conclude that p divides a1 · · · am . By the induction
hypothesis it follows that p divides ai for some 1 ≤ i ≤ m. Thus the lemma holds for m + 1. This completes
the proof.
Now we can write a proof of uniqueness of prime factorizations.
Proof (Uniqueness in Theorem 4.11). Suppose there exists a natural number n with two different prime
factorizations
p1 · · · ps = n = q1 · · · qr , (4.4)
where p1 ≤ · · · ≤ ps are primes numbered in order and q1 ≤ · · · ≤ qr are also primes numbered in order.
Since these two factorizations are different, either s ≠ r or pi ≠ qi for some i. Starting with (4.4) we can
cancel all common factors and renumber the primes to obtain an equality
p1 · · · pk = q1 · · · qm , (4.5)
in which none of the pi appear among the qi . Since the two factorizations in (4.4) were assumed different,
there is at least one prime on each side of (4.5). In particular p1 is a prime which divides q1 · · · qm . By the
lemma above, p1 must divide one of the qi . Since p1 ≠ 1 and p1 ≠ qi this contradicts the primality of qi .
This contradiction proves that different factorizations do not exist.
Notice that we have used an “expository shortcut” by referring to a process of cancellation and renum-
bering but without writing it out explicitly. We are trusting that the reader can understand what we are
referring to without needing to see it all in explicit notation. Just describing this in words is clearer than
what we would get if we worked out notation to describe the cancellation and renumbering process explicitly.
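
To see the standard ordering p1 ≤ p2 ≤ · · · in concrete cases, a factorization can be computed by trial division. Trial division is our own choice for this sketch (the text does not prescribe a method), the name prime_factors is hypothetical, and the approach is practical only for small n.

```python
def prime_factors(n):
    """Return the prime factors of n > 1 in nondecreasing order, by trial division."""
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:      # divide out p as many times as it occurs
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)      # whatever remains has no divisor up to its square root, so it is prime
    return factors

print(prime_factors(12))       # [2, 2, 3]
print(prime_factors(8613))     # [3, 3, 3, 11, 29]
print(prime_factors(2178))     # [2, 3, 3, 11, 11]
```

Because the factors are produced in nondecreasing order, the three reorderings of 12 = 2 · 2 · 3 mentioned above all correspond to the single list [2, 2, 3]. The last two lists agree with the factorizations used in Example 4.2.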

Problem 4.18 Write a proof of the existence part of Theorem 4.11, namely that a prime factorization
exists for each n > 1. [Hint: use strong induction, starting with n = 2. For the induction step, observe that
either n + 1 is prime or n + 1 = mk where both 2 ≤ m, k ≤ n. ]
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . FAexist

D The Integers Mod m


All our usual number systems are infinite, but there are finite number systems too! The most basic are the
integers mod m, which we introduce in this section. We said “are” because for different choices of m ∈ N
we will get different number systems. So bear in mind throughout this section that m is allowed to be any
given positive integer.
Definition. We say a, b ∈ Z are congruent modulo m, and write a ≡m b (or a ≡ b mod m) when b − a is
divisible by m.
Example 4.6. 3 ≡ 27 mod 8, because 27 − 3 = 3 · 8. But 3 ≢ 27 mod 10, because 27 − 3 = 24 is not divisible
by 10.
Lemma 4.13. Congruence modulo m is an equivalence relation on Z.
Proof. For any a ∈ Z, since a − a = 0 = 0 · m we see that a ≡m a, showing that ≡m is reflexive. If a ≡m b,
then b − a is divisible by m. But then a − b = −(b − a) is also divisible by m, so that b ≡m a. This shows
that ≡m is symmetric. For transitivity, suppose a ≡m b and b ≡m c. Then a − b and b − c are both divisible
by m. It follows that a − c = (a − b) + (b − c) is also divisible by m, implying a ≡m c.

Since ≡m is an equivalence relation, we can define its equivalence classes according to Definition 3.8 on
page 62. We abbreviate the notation for an equivalence class, writing [n]m rather than [n]≡m , and will refer
to [n]m as a congruence class mod m.
Definition. Suppose m is a positive integer. The integers modulo m is the set Zm of equivalence classes
modulo m:
Zm = {[n]m : n ∈ Z}.
Back on page 63 we talked about the idea of defining new mathematical objects to be equivalence classes
with respect to some equivalence relation. There we talked about considering an angle to be the set of all
real numbers which were “equivalent to each other as angles,” i.e. an equivalence class of the relation of
Example 3.12. We are doing the same thing here using the relation ≡m : we take the set of all integers which
are congruent to each other mod m and put them together as a set (congruence class); that set is a single
element of Zm .
Example 4.7. A typical element of Z8 is

[27]8 = {. . . , −13, −5, 3, 11, 19, 27, . . .}.

We can indicate the same equivalence class several ways, for instance [27]8 = [3]8 . (We have several different
ways of referring to the same real number as well, for instance 1/2 = .5.) We would say that 27 and 3 are
both representatives of the equivalence class [27]8 . We can choose any representative of an equivalence class
to identify it. But we often use the smallest nonnegative representative, which would be 3 in this example.
Whether we refer to it as [27]8 or [3]8 it is just one element of Z8 . There are a grand total of 8 elements
in Z8 :
Z8 = {[0]8 , [1]8 , [2]8 , [3]8 , [4]8 , [5]8 , [6]8 , [7]8 }.
Every congruence class mod 8 is the same as one of these.
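
The definition of congruence and the choice of the smallest nonnegative representative can both be written in a line or two. This is a sketch with function names of our own choosing (congruent, representative); it relies on the fact that Python's % operator returns a value in 0, . . . , m − 1 when m is positive, even for negative n.

```python
def congruent(a, b, m):
    """a is congruent to b modulo m when m divides b - a."""
    return (b - a) % m == 0

def representative(n, m):
    """Smallest nonnegative representative of the congruence class [n]_m."""
    return n % m

print(congruent(3, 27, 8), congruent(3, 27, 10))          # True False
print(representative(27, 8), representative(-13, 8))      # 3 3

# Every integer lands in one of exactly 8 classes mod 8.
print(sorted({representative(n, 8) for n in range(-100, 100)}))
# [0, 1, 2, 3, 4, 5, 6, 7]
```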

We have been saying that Zm is a number system. That must mean there is a way to define addition
and multiplication on the elements of Zm , i.e. there is a way to add and multiply congruence classes. The
next example begins to explain.
Example 4.8. 3 ≡8 27 and 5 ≡8 45. Observe that
3 · 5 ≡8 27 · 45 and 3 + 5 ≡8 27 + 45.
This example illustrates the fact that ≡m “respects” the operations of multiplication and addition. The next
lemma states this precisely.
Lemma 4.14. Suppose a ≡m a′ and b ≡m b′. Then a + b ≡m a′ + b′, a · b ≡m a′ · b′, and a − b ≡m a′ − b′.
Proof. By hypothesis there exist k, ℓ ∈ Z for which a′ = a + km and b′ = b + ℓm. Then

a′b′ = (a + km)(b + ℓm) = ab + (aℓ + bk + kℓm)m,

which implies that ab ≡m a′b′. The proofs for addition and subtraction are similar.
Here is how you should understand this. Suppose A and B are any two elements of Zm . (For example,
A = [3]8 and B = [5]8 .) We can add A and B in the following way: pick any element a of A and any element
b of B. (For instance a = 3 and b = 5.) Form a + b using ordinary arithmetic, and then take C to be the
equivalence class of the result: C = [a + b]m . (In our example, C = [3 + 5]8 = [0]8 .) Then C is what we
mean by A + B. What the lemma says is that the a and b that you picked don’t matter; you will arrive
at the same result C regardless. (For instance if we picked a′ = 27 and b′ = 45 instead, we would still get
C = [27 + 45]8 = [72]8 = [0]8 .) The same procedure works for multiplication: D = A · B is D = [a · b]m .
Definition. Addition, multiplication, and negation are defined on Zm by

[a]m + [b]m = [a + b]m ,


[a]m · [b]m = [a · b]m ,
−[a]m = [−a]m .

With this definition Zm is a finite number system, and satisfies all the algebraic properties we listed in
Section A.1:(A1)–(A5), (M1)–(M4), and (D). (There is no order relation, however.) This is called arithmetic
mod m or simply modular arithmetic.
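
The point of Lemma 4.14, that the result does not depend on which representatives we pick, can be tested on the running example A = [3]8 and B = [5]8. This is a sketch under our own naming (add_classes, mul_classes); each class is reported by its smallest nonnegative representative.

```python
def add_classes(a, b, m):
    """[a]_m + [b]_m, reported by its smallest nonnegative representative."""
    return (a + b) % m

def mul_classes(a, b, m):
    """[a]_m * [b]_m, reported by its smallest nonnegative representative."""
    return (a * b) % m

# Several representatives of [3]_8 and of [5]_8 (27 and 45 are among them).
reps_A = [3 + 8 * k for k in range(-3, 6)]
reps_B = [5 + 8 * l for l in range(-3, 6)]

sums = {add_classes(a, b, 8) for a in reps_A for b in reps_B}
prods = {mul_classes(a, b, 8) for a in reps_A for b in reps_B}
print(sums, prods)     # {0} {7}
```

Each set contains a single value, so every choice of representatives gives the same sum [0]8 and the same product [7]8.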
Example 4.9. Here are the addition and multiplication tables for Z6 . (All the entries should really be
surrounded by “[·]6 ” but we have left all these brackets out to spare our eyes from the strain.)
+ | 0 1 2 3 4 5          · | 0 1 2 3 4 5
--+------------          --+------------
0 | 0 1 2 3 4 5          0 | 0 0 0 0 0 0
1 | 1 2 3 4 5 0          1 | 0 1 2 3 4 5
2 | 2 3 4 5 0 1          2 | 0 2 4 0 2 4
3 | 3 4 5 0 1 2          3 | 0 3 0 3 0 3
4 | 4 5 0 1 2 3          4 | 0 4 2 0 4 2
5 | 5 0 1 2 3 4          5 | 0 5 4 3 2 1

Notice that [2]6 ≠ [0]6 and [3]6 ≠ [0]6 , but [2]6 · [3]6 = [0]6 . In other words in Z6 two nonzero numbers can
have zero as their product! (We have seen this happen before; see Problem 4.1 and the 2 × 2 matrices of
Section A.3.)
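
Tables like these are easy to generate for any m. The sketch below uses names of our own (tables, zero_divisors) and, like the tables above, writes each class by its representative in 0, . . . , m − 1.

```python
def tables(m):
    """Return (addition table, multiplication table) for Z_m as lists of rows."""
    add = [[(i + j) % m for j in range(m)] for i in range(m)]
    mul = [[(i * j) % m for j in range(m)] for i in range(m)]
    return add, mul

def zero_divisors(m):
    """Nonzero classes [i] for which [i] * [j] = [0] for some nonzero [j]."""
    return [i for i in range(1, m)
            if any((i * j) % m == 0 for j in range(1, m))]

add6, mul6 = tables(6)
for row in mul6:
    print(*row)              # the rows of the multiplication table for Z_6
print(zero_divisors(6))      # [2, 3, 4]
```

In Z6 the classes of 2, 3 and 4 each have a nonzero partner whose product with them is zero, matching the zeros that appear inside the multiplication table.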
There are many clever and creative things we can use modular arithmetic for.
Example 4.10. There do not exist positive integers a, b for which a² + b² = 1234567. A long, tedious
approach would be to examine all possible pairs a, b with 1 ≤ a, b < 1234567. A faster way is to consider
the implications modulo 4. If a² + b² = 1234567 were true then (mod 4),

[a]² + [b]² = [a² + b²] = [1234567] = [3].

(For the last equality, observe that 1234567 = 1234500 + 64 + 3, which makes it clear that 1234567 ≡4 3.)
Now in Z4 , [n]² is always either [0] or [1]. So there are four cases: [a]² = [0] or [1] and [b]² = [0] or [1].
Checking the four cases, we find

[a]²    [b]²    [a]² + [b]²
[0]     [0]     [0]
[0]     [1]     [1]
[1]     [0]     [1]
[1]     [1]     [2]

In no case do we find [a]² + [b]² = [3]. Thus a² + b² = 1234567 is not possible, no matter what the values of
a and b.
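
The case check in this example is small enough to do by machine. The following sketch works in Z4 only; the variable names are our own.

```python
M = 4

# All values of [n]^2 in Z_4: it is enough to square the representatives 0, 1, 2, 3.
squares = {(n * n) % M for n in range(M)}

# All values of [a]^2 + [b]^2 in Z_4.
sums_of_two_squares = {(x + y) % M for x in squares for y in squares}

print(sorted(squares))                # [0, 1]
print(sorted(sums_of_two_squares))    # [0, 1, 2]
print(1234567 % M)                    # 3
```

Since 3 never occurs as a sum of two squares in Z4, no integers a, b can satisfy the equation.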
In fact, we can turn this idea into a proposition. The proof is essentially the solution of the above
example, so we won’t write it out again.
Proposition 4.15. If c ≡4 3, there do not exist integers a, b for which a² + b² = c.

Problem 4.19 A natural question to ask about Example 4.10 is why we choose to use mod 4; why not
some other m?

a) Show that in Z6 every [n] occurs as [a]² + [b]² for some a and b. What happens if we try to repeat the
argument of Example 4.10 in Z6? Can we conclude that a² + b² = 1234567 is impossible in that way?
b) For the argument of Example 4.10 to work in Zm , we need to use an m for which

{([a]m)² + ([b]m)² : a, b ∈ Z} ≠ Zm .

This happens for m = 4 but not for m = 6. Can you find some values of m other than 4 for which this
happens? [Hint: there are two values m < 10 other than m = 4 for which it works.]

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . expyth

Problem 4.20 Find values of a, b, m ∈ N so that a² ≡m b² but a ≢m b.


. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . powne

Problem 4.21 Suppose n ∈ N is expressed in the usual decimal notation n = dk dk−1 · · · d1 d0 , where each di
is one of the digits 0, . . . , 9. You probably know that n is divisible by 3 if and only if dk + dk−1 + · · · + d1 + d0
is divisible by 3. Use Z3 to prove why this is correct. [Hint: The notation we use for the number one
hundred twenty three, “n=123,” does not mean n = 1 · 2 · 3. What does it mean? More generally what does
“n = dk dk−1 · · · d1 d0 ” mean?] Explain why the same thing works for divisibility by 9.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . div39

Problem 4.22 Along the same lines as the preceding problem, show that n is divisible by 11 if and only
if the alternating sum of its digits d0 − d1 + d2 − · · · + (−1)ᵏ dk is divisible by 11.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . div11

Problem 4.23 What is the remainder when 1⁹⁹ + 2⁹⁹ + 3⁹⁹ + 4⁹⁹ + 5⁹⁹ is divided by 5? (From [17].)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . pow99

Problem 4.24 What is the last digit of 2¹⁰⁰⁰⁰⁰⁰ ? (Based on [9, #7 page 272])
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . TwoK

E Axioms and Beyond: Gödel Crashes the Party


We introduced a set of axioms for the integers in Section A.4. Axioms have been developed for many of
the most basic mathematical systems, such as the natural numbers, the real numbers, set theory. (Russell’s
paradox showed that we need to be careful about what kinds of statements about sets we allow. Resolving
this requires developing a system of axioms for set theory.) If you take a modern algebra class you will
see definitions of other types of algebraic systems (such as groups, rings and fields) in terms of axioms. In
any of these settings, a set of axioms is a collection of basic properties from which all other properties can
be derived and proven logically.
In the 1920s David Hilbert proposed that all of mathematics might be reduced to an appropriate list
of axioms, from which everything mathematical could then be derived in an orderly, logical way. This
system of axioms should be complete, i.e. all true statements should be provable from it. It should also be
consistent, i.e. there should be no contradictions that follow logically from the axioms. This would put all
of mathematics on a neat and tidy foundation. By developing formal rules that govern logical arguments
and deductions, so that proofs could be carried out mechanically, we would in principle be able to turn over
all of mathematics to computers which would then determine all mathematical truth for us. In 1931 Kurt
Gödel pulled the plug on that possibility. He showed that in any axiomatic system (provided it is at least
elaborate enough to include N) there are statements that can be neither proven nor disproven, i.e. whose
truth or falsity cannot be logically established based on the axioms. (A good discussion of Gödel’s brilliant
proof is given in [21].) Gödel’s result tells us that we cannot pin our hopes on some ultimate set of axioms.
There will always be questions which the axioms are not adequate to answer.
For instance suppose we consider the axioms for the integers as listed in Section A.4, but leave out the
well-ordering principle. Now we ask if the well-ordering principle is true or false based on the other axioms.
We know that Z satisfies the axioms and the well-ordering principle is true for Z. That means it is impossible

to prove that the well-ordering property is false from the other axioms. But the axioms also are true for
R, and the well-ordering property is false in the context of R. That means there is no way to prove the
well-ordering property is true from the other axioms. So whether the well-ordering property is true or false
cannot be decided based on the other axioms alone; it is undecidable from the other axioms. For more
difficult propositions it can take years before we can tell if the proposition is undecidable as opposed to just
really hard. For many years people wondered if Euclid’s fifth postulate (axiom) was provable from his other
four. Eventually (2000 years after Euclid) other “geometries” were discovered which satisfied Euclid’s other
axioms but not the fifth, while standard plane geometry does satisfy the fifth. That made it clear; the fifth
axiom is undecidable based on the first four. You could assume it (and get standard plane geometry) or
replace it by something different (and get various non-Euclidean geometries). It took more than 350 years before we
knew that Fermat’s Last Theorem was provable from the axioms of the integers. We still don’t know if the
Twin Primes Conjecture is provable. Maybe it is unprovable — we just don’t know (yet). Another example
is the Continuum Hypothesis, long considered one of the leading unsolved problems of mathematics. In 1938
Gödel showed that it was consistent with set theory, i.e. could not be disproved. Then in 1963 Paul Cohen
showed that its negation was also consistent. Thus it cannot be proven true and it cannot be proven false!
It is undecidable based on “standard” set theory. (This of course requires a set of axioms to specify exactly
what “set theory” consists of.)
Please don’t leave this discussion thinking that Gödel’s result makes the study of axioms useless. Identi-
fying a set of axioms remains one of the best ways to delineate exactly what a specific mathematical system
consists of and what we do and don’t know about it. However, the axioms of a system are something that
distills what we have learned about a system after years of study. It is not typically where we begin the
study of a new mathematical system.

Additional Problems

Problem 4.25 Prove that each row of the multiplication table for Zm contains 1 if and only if it contains
0 only once. (See [9, #13 page 272].)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . row1

Problem 4.26 The following quotation appeared in Barry A. Cipra’s article Sublinear Computing in SIAM
News 37 (no. 3) April 2004: “Suppose that someone removes a card from a deck of 52 and proceeds to show
you the rest, one at a time, and then asks you to name the missing card. In principle, you could mentally
put check marks in a 4 × 13 array and then scan the array for the unchecked entry, but very few people can
do that, even with lots of practice. It’s a lot easier to use some simple arithmetic: convert each card into
a three-digit number, the hundreds digit specifies the suit — say 1 for clubs, 2 for diamonds, 3 for hearts,
and 4 for spades — and the other two digits specify the value, from 1(ace) to 13 (king); then simply keep
a running sum, subtracting 13 whenever the two digit part exceeds 13 and dropping any thousands digit
(e.g. adding the jack of hearts — 311 — to the running sum 807 gives 1118, which reduces to 105). The
missing card is simply what would have to be added to the final sum to get 1013 (so that for a final sum of
807, the missing card would be the 6 of diamonds).” Explain why this works!
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . trick

Problem 4.27 Prove the following.


Theorem 4.16 (Fermat’s Little Theorem). If p is prime and a is an integer, then $a^p = a \bmod p$.
(Hint: You can do this by induction on a. For the induction step use the Binomial Theorem to work out
$(a + 1)^p$.)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . FLT
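Before attempting the proof it may help to test the statement numerically. The sketch below (function names are hypothetical) checks the congruence for a few small primes, and also checks the fact behind the hint: for prime p, the binomial coefficients of $(a+1)^p$ other than the first and last are divisible by p.

```python
from math import comb

def fermat_holds(p, a):
    """True when a**p and a leave the same remainder on division by p."""
    return pow(a, p, p) == a % p

def middle_binomials_divisible(p):
    """True when p divides C(p, k) for every 0 < k < p."""
    return all(comb(p, k) % p == 0 for k in range(1, p))

for p in (2, 3, 5, 7, 11, 13):
    assert all(fermat_holds(p, a) for a in range(-20, 21))
    assert middle_binomials_divisible(p)

# Primality matters: both checks fail for the composite modulus 4.
print(fermat_holds(4, 2), middle_binomials_divisible(4))
```

A finite check like this is evidence, not a proof; the induction in the hint is what covers every integer a.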

Problem 4.28 Suppose that p is prime. Then Zp has additional properties. Prove the following.
a) If [a] · [b] = [0], then either [a] = [0] or [b] = [0].
b) If $[a] \neq [0]$ then the function $f_a : \mathbb{Z}_p \to \mathbb{Z}_p$ defined by $f_a([b]) = [a] \cdot [b]$ is injective. (Hint: suppose not
and use a).)
c) If $[a] \neq [0]$ the function $f_a$ above is surjective. (Hint: use the Pigeon Hole Principle.)
d) If $[a] \neq [0]$ there exists $[b] \in \mathbb{Z}_p$ for which $[a] \cdot [b] = [1]$.
e) The [b] of d) is unique.

In other words, in Zp we can divide by nonzero numbers!


. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Zp

Problem 4.29 Prove that infinitely many prime numbers are congruent to 3 mod 4.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . inf3primes

Problem 4.30 Not every equivalence relation will produce equivalence classes that respect addition and
multiplication, as ≡m did. For instance define an equivalence relation t on Z so that n t m means that one
of the following holds:
i) both −10 ≤ n ≤ 10 and −10 ≤ m ≤ 10,
ii) both n < −10 and m < −10,

iii) both 10 < n and 10 < m.


It is not hard to check that t is indeed an equivalence relation (but you don’t need to do it), and it has
three equivalence classes:

\[ [-11]_t = \{\ldots, -13, -12, -11\}, \]
\[ [0]_t = \{-10, -9, \ldots, 9, 10\}, \]
\[ [11]_t = \{11, 12, 13, \ldots\}. \]

Now, the point we want to make in this problem is that we cannot define addition (or multiplication) on
this set of equivalence classes in the same way that we did for Zm . Show (by example) that Lemma 4.14 is
false for t and its equivalence classes.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . notnum

Problem 4.31 A 3-tuple of positive integers (a, b, c) is called a Pythagorean triple if $a^2 + b^2 = c^2$. For
instance (3, 4, 5) is the most familiar example. The next best known is (5, 12, 13). Prove that Euclid’s formula

\[ a = m^2 - n^2, \qquad b = 2mn, \qquad c = m^2 + n^2 \]

produces a Pythagorean triple for any two integers 0 < n < m. Show that if m and n are relatively prime
and one of them is even, then a, b, c have no common factor (other than 1). Use this to prove that there are
infinitely many distinct Pythagorean triples, no two of which can be obtained as integer multiples of each
other.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . PyTrip
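Euclid's formula is easy to experiment with. The following sketch (the name `euclid_triple` is hypothetical) generates triples and checks both claims of the problem on a small range of m and n; it illustrates the statements and is no substitute for the proofs.

```python
from math import gcd

def euclid_triple(m, n):
    """Return (a, b, c) from Euclid's formula for integers 0 < n < m."""
    if not 0 < n < m:
        raise ValueError("need 0 < n < m")
    return m * m - n * n, 2 * m * n, m * m + n * n

for m in range(2, 30):
    for n in range(1, m):
        a, b, c = euclid_triple(m, n)
        assert a * a + b * b == c * c                 # always a Pythagorean triple
        if gcd(m, n) == 1 and (m % 2 == 0 or n % 2 == 0):
            assert gcd(gcd(a, b), c) == 1             # no common factor other than 1

print(euclid_triple(2, 1), euclid_triple(3, 2))
```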

Chapter 5

Polynomials

A polynomial is a function of the form

\[ p(x) = c_n x^n + c_{n-1} x^{n-1} + \cdots + c_1 x + c_0 = \sum_{k=0}^{n} c_k x^k. \tag{5.1} \]

In other words it is a function that can be expressed as a (finite) sum of coefficients $c_k$ times nonnegative
integer powers of the variable x. (We understand $x^0 = 1$ for all x, and we customarily write the higher
powers first and lower powers last.) Polynomials arise in almost all branches of mathematics. Usually the
roots of a polynomial are particularly important. There are many important theorems about the roots of
polynomials. In this chapter we will look at proofs of three such theorems. The first is the Rational Root
Theorem, whose proof is not too hard using what we know about prime numbers. The other two, Descartes’
Rule of Signs and the Fundamental Theorem of Algebra, are difficult. Their proofs are more substantial
than any we have looked at previously, and are good examples of how a proof is built around a creative
idea, not just methodically applying definitions. The purpose for studying these more difficult proofs is not
that you will be expected to produce equally difficult proofs on your own. Rather the purpose is for you
to gain experience reading and understanding difficult proofs written by someone else. But before we can
study those proofs we need first to go through some preliminary material about polynomials.

A Preliminaries
To start, we need to make the distinction between a function and the expression we use to describe it.
Example 5.1. Consider the following descriptions of a function f : R → R:

\[ f(x) = x^2 + 1 \]
\[ f(x) = \sqrt{x^4 + 2x^2 + 1} \]
\[ f(x) = (x + 3)^2 - 6(x + 3) + 10 \]
\[ f(x) = 2 + \int_1^x 2t \, dt. \]

All four of these expressions describe the same function.


By an “expression” we mean a combination of symbols and notation, essentially what we write or type,
like the right-hand sides above. Clearly each of the four expressions above is different in terms of what is
printed on the page. But when we substitute a real value in for x, all four expressions produce the same
real value for f (x). For instance, x = −1/3 produces the value f (−1/3) = 10/9 in all four expressions for
f above, and you will likewise find agreement for every possible value of x ∈ R that you try. So the four
different expressions above all describe the same function.
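The agreement of the four expressions can be spot-checked numerically. In the sketch below the names `f1` to `f4` are hypothetical, and the integral in the fourth expression is evaluated through the antiderivative $t^2$ rather than by numerical integration; since floating point is used, values are compared with a tolerance.

```python
import math

def f1(x):
    return x**2 + 1

def f2(x):
    return math.sqrt(x**4 + 2 * x**2 + 1)

def f3(x):
    return (x + 3)**2 - 6 * (x + 3) + 10

def f4(x):
    # 2 + integral of 2t from 1 to x, using the antiderivative t^2
    return 2 + (x**2 - 1**2)

for x in (-1 / 3, -2.5, 0.0, 1.0, 7.25):
    values = [f(x) for f in (f1, f2, f3, f4)]
    assert all(math.isclose(v, values[0]) for v in values)

print(f1(-1 / 3), 10 / 9)
```

Agreement at a handful of points does not prove that the expressions describe the same function; for that one simplifies each expression algebraically.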

When we say p(x) is a polynomial, we mean that it is a function which can be described using an
expression of the particular form (5.1). I.e. it is a polynomial function if there exists a polynomial expression
which describes p(x) for all x. Our example f (x) above is a polynomial because the first description¹ we
gave for it is the required polynomial expression.
Example 5.2. The following functions are not polynomials.
\[ g(x) = \frac{x}{x^2 + 1}, \]
\[ h(x) = e^x, \]
\[ s(x) = \sin(x), \]
\[ r(x) = \sqrt[3]{x}. \]

We are claiming that for each of these no polynomial expression exists².


We will be talking about polynomials using three different number fields: Q, R, and C. The number
field specifies the domain and codomain of the polynomial function, as well as what values are allowed for
the coefficients. When we say that p(x) is a polynomial over R, we mean that it is a function p : R → R
expressible in the form (5.1) using coefficients ck ∈ R. The set of all polynomials over R is denoted R[x].
The set of polynomials over Q is denoted Q[x]; when we say p(x) ∈ Q[x] that means the coefficients are all
rational numbers and we view Q as its domain. For functions of complex numbers it is customary to use z
for the variable, so C[z] will denote the set of all polynomials over C. A p(z) ∈ C[z] may have any complex
numbers as its coefficients, and all complex z are in its domain. To discuss all three cases at once we will
write F to stand for any of the number fields Q, R, or C with the understanding that everything we say
about F[x] is meant to refer to all three cases of Q[x], R[x], and C[z] simultaneously.
Definition. Let F be either Q, R, or C. A polynomial over F is a function p : F → F such that there exist
$c_0, c_1, \ldots, c_n \in \mathbb{F}$ (for some integer 0 ≤ n) for which p(x) is given by
\[ p(x) = \sum_{k=0}^{n} c_k x^k. \]

Suppose p(x) ∈ F[x]. This means that it can be represented in the form (5.1) for some n and choice of
coefficients ck ∈ F, k = 0, . . . , n. Now we come to the first important question about polynomials: might it
be possible for there to be a second such representation of the same function but using different coefficients?
In other words, could there be a different choice of coefficients $a_k \in \mathbb{F}$ so that
\[ \sum_{k=0}^{n} c_k x^k = \sum_{k=0}^{n} a_k x^k \quad \text{for all } x \in \mathbb{F}\,? \]

This is an important question — if two different representations are possible, then it will be meaningless
to talk about “the” coefficients of p(x) unless we also specify which particular expression for it we have in
mind. Fortunately, the answer is no.
Lemma 5.1. Suppose two polynomial expressions, with coefficients in F, agree for all x ∈ F:
\[ \sum_{k=0}^{n} c_k x^k = \sum_{k=0}^{n} a_k x^k. \]

Then $c_k = a_k$ for all 0 ≤ k ≤ n.


1 The third description would be called a polynomial “in (x + 3),” as opposed to a polynomial “in x” as we intend here.
2 The reasons, in brief, are as follows. There are infinitely many roots of s(x), but a nonzero polynomial has only finitely many roots
(see Lemma 5.5). For g(x) observe that $\lim_{x\to\infty} g(x) = 0$. But the only polynomial with this property is the zero polynomial.
For the others observe that for any nonzero polynomial there is a nonnegative integer n (its degree) with the property that
$\lim_{x\to\infty} p(x)/x^n$ exists but is not 0. For h(x) and r(x) that does not hold, so they can’t be polynomials.

Proof. By subtracting the right side from the left, the hypothesis says that
\[ \sum_{k=0}^{n} (c_k - a_k) x^k = 0 \quad \text{for all } x \in \mathbb{F}. \]

We want to show that this implies $c_k - a_k = 0$ for all 0 ≤ k ≤ n. Thus what we need to show is that if
$\alpha_k \in \mathbb{F}$ and
\[ \sum_{k=0}^{n} \alpha_k x^k = 0 \quad \text{for all } x \in \mathbb{F}, \tag{5.2} \]

then the coefficients are $\alpha_k = 0$ for k = 0, . . . , n. We will prove this by induction on n. For n = 0, (5.2)
simply says that $\alpha_0 = 0$, which is what we want to show.
Now we make the induction hypothesis that (5.2) for n implies $\alpha_k = 0$ for all 0 ≤ k ≤ n, and suppose
\[ \sum_{k=0}^{n+1} \alpha_k x^k = 0 \quad \text{for all } x \in \mathbb{F}. \tag{5.3} \]

Define a new polynomial by


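\begin{align*}
p(x) &= 2^{n+1} \sum_{k=0}^{n+1} \alpha_k x^k - \sum_{k=0}^{n+1} \alpha_k (2x)^k \\
&= \sum_{k=0}^{n+1} (2^{n+1} - 2^k) \alpha_k x^k \\
&= \sum_{k=0}^{n} (2^{n+1} - 2^k) \alpha_k x^k.
\end{align*}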

It follows from our hypothesis (5.3) that p(x) = 0 for all x. But observe that the coefficients of $x^{n+1}$ cancel so
that p(x) is in fact an expression of the form (5.2) with nonzero terms only for 0 ≤ k ≤ n. By our induction
hypothesis, we know that $(2^{n+1} - 2^k)\alpha_k = 0$ for all 0 ≤ k ≤ n. But for these k we know $2^{n+1} - 2^k \neq 0$ and
so $\alpha_k = 0$ for all 0 ≤ k ≤ n. Only $\alpha_{n+1}$ remains. The hypothesis (5.3) now reduces to

\[ \alpha_{n+1} x^{n+1} = 0 \quad \text{for all } x \in \mathbb{F}. \]

Plugging in x = 1 we conclude that $\alpha_{n+1} = 0$ as well. This completes the induction step.
Notice that in this proof we used the induction hypothesis twice! It might occur to you that an easier
way to do the induction step would be to first plug in x = 0 to deduce that α0 = 0. Then we could say that
for all x ∈ F
\[ 0 = \sum_{k=1}^{N+1} \alpha_k x^k = x \sum_{k=0}^{N} \alpha_{k+1} x^k. \]

Now we might divide out the extra x, and then appeal to the induction hypothesis to conclude that $\alpha_{k+1} = 0$
for k = 0, . . . , N. But there is one problem with that argument: after dividing by x we can only say that the
resulting equation ($\sum_{k=0}^{N} \alpha_{k+1} x^k = 0$) holds for x ≠ 0. We can’t say it is true for all x, and so can’t appeal to
the induction hypothesis. One way to remedy this would be to take $\lim_{x\to 0}$ to see that in fact the equation
must hold for x = 0 as well. But that would require appealing to properties of limits, which (especially in the
case of F = C) would take us beyond what you know about limits from calculus. Another approach would
be to use the derivative to reduce a degree N + 1 polynomial to one of degree N — an induction proof can
be based on that, but again we would need to justify the calculus operations when F is Q or C. Although
the proof we gave above is somewhat more complicated, it is purely algebraic and does not rely on limits.
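The degree-lowering step in the proof of Lemma 5.1 can be watched in action on a concrete coefficient list. In this sketch the coefficient of $x^k$ is stored at index k; the function names and the sample polynomial are hypothetical.

```python
def evaluate(alpha, x):
    """Value of the sum of alpha[k] * x**k."""
    return sum(a * x**k for k, a in enumerate(alpha))

def lower_degree(alpha):
    """Coefficients of 2^(n+1) p(x) - p(2x), where n + 1 = len(alpha) - 1."""
    top = len(alpha) - 1
    return [(2**top - 2**k) * a for k, a in enumerate(alpha)]

alpha = [5, -3, 0, 7]          # hypothetical example: 7x^3 - 3x + 5
beta = lower_degree(alpha)     # the top coefficient is multiplied by 2^3 - 2^3 = 0

print(beta)
for x in range(-5, 6):
    # the new coefficients really do describe 2^3 p(x) - p(2x)
    assert evaluate(beta, x) == 2**3 * evaluate(alpha, x) - evaluate(alpha, 2 * x)
```

Every coefficient below the top one is multiplied by a nonzero factor $2^{n+1} - 2^k$, which is why the induction hypothesis applied to the new polynomial says something about the original coefficients.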
The significance of Lemma 5.1 is that for a given polynomial p(x) there is only one way to express it
in the form (5.1). In brief, two polynomial expressions are equal (for all x) if and only if they have the same
coefficients. Knowing that, we can now make the following definitions, which depend on this uniqueness of
coefficients.

Definition. Suppose $p(x) = \sum_{k=0}^{n} c_k x^k$ is a polynomial in F[x]. The degree of p(x), denoted deg(p), is the
largest k for which $c_k \neq 0$, and that $c_k$ is the leading coefficient of p(x). The zero polynomial is the constant
function p(x) = 0 for all x, and is considered³ to have degree 0 (even though it has no nonzero coefficient).
A root (or zero) of p(x) is a value r ∈ F for which p(r) = 0.
In the expression $p(x) = \sum_{k=0}^{n} c_k x^k$, the degree of p(x) is n provided $c_n \neq 0$. A polynomial of degree 0 is
just a constant function, $p(x) = c_0$ for all x. We only consider values in the appropriate F as possible roots,
so for p(x) ∈ Q[x], a root is a rational number r with p(r) = 0; for p(x) ∈ R[x] a root is a real number r
with p(r) = 0; for p(z) ∈ C[z] a root is a complex number r with p(r) = 0.
We can add and multiply two polynomials and get polynomials as the results. The coefficients of p(x) +
q(x) are just the sums of the coefficients of p(x) and q(x), but the coefficients of p(x)q(x) are related to the
coefficients of p(x) and q(x) in a more complicated way. Part c) of the next lemma says that when working
with polynomial equations, it is valid to cancel common factors from both sides. (While we might call this
“dividing out s(x)” that is not quite right; s(x) may have roots so for those values of x we cannot consider
cancellation to be the same as division. We encountered the same thing in 7) of Proposition 4.1.)
Lemma 5.2. Suppose p(x), q(x), s(x) ∈ F[x].

a) If neither p(x) nor q(x) is the zero polynomial, then deg(p(x)q(x)) = deg(p(x)) + deg(q(x)).
b) If p(x)q(x) is the zero polynomial, then either p(x) is the zero polynomial or q(x) is the zero polynomial.
c) If s(x) is not the zero polynomial and s(x)p(x) = s(x)q(x) (for all x), then p(x) = q(x).
Proof. We first prove a). Suppose deg(p) = n and deg(q) = m. The hypotheses imply that for some choice
of coefficients $a_i$, $b_j$, with $a_n \neq 0$ and $b_m \neq 0$,
\[ p(x) = \sum_{i=0}^{n} a_i x^i, \qquad q(x) = \sum_{j=0}^{m} b_j x^j. \]

When we multiply p(x)q(x) out, the highest power of x appearing will be $x^{n+m}$ and its coefficient will be
$a_n b_m$. Since $a_n b_m \neq 0$, we see that deg(pq) = n + m. This proves a).
We prove b) by contradiction. If neither p(x) nor q(x) were the zero polynomial, then, as in a), p(x)q(x)
would have a nonzero leading coefficient $a_n b_m$. But that would mean that p(x)q(x) is not the zero polynomial,
contrary to the hypothesis.
For c), the hypothesis implies that s(x) [p(x) − q(x)] = 0 for all x. By b), since s(x) is not the zero
polynomial it follows that p(x) − q(x) is the zero polynomial and therefore p(x) = q(x), proving c).

Problem 5.1 Find a formula for the coefficients of p(x)q(x) in terms of the coefficients of p(x) and q(x).
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . conv

Sometimes we can divide one polynomial by another, but not always. So, like the integers, we have a
meaningful notion of divisibility for polynomials.
Definition. Suppose p(x), d(x) are polynomials in F[x]. We say d(x) divides p(x) (or p(x) is divisible by
d(x)) when there exists q(x) ∈ F[x] so that

p(x) = q(x) · d(x).

Theorem 5.3 (Division Theorem for Polynomials). Suppose p(x), d(x) are polynomials in F[x] and that
d(x) has degree at least 1. There exist unique polynomials q(x) (the quotient) and s(x) (the remainder) in
F[x] such that s(x) has lower degree than d(x) and

p(x) = q(x) · d(x) + s(x).


3 Some authors consider the zero polynomial to have degree −∞. With that definition, part a) of Lemma 5.2 below holds
without the restriction to nonzero polynomials.

Example 5.3. Let $p(x) = x^5 + 2x^4 + 6x^2 + 18x - 1$ and $d(x) = x^3 - 2x + 9$. We want to work out the quotient
and remainder as in the Division Theorem. We essentially carry out a long division process, finding q(x) one
term at a time, working from the highest power to the lowest.

\[ (x^5 + 2x^4 + 6x^2 + 18x - 1) - x^2 (x^3 - 2x + 9) = 2x^4 + 2x^3 - 3x^2 + 18x - 1. \]

The $x^2$ is just right to make the $x^5$ terms cancel. Now we do the same to get rid of the $2x^4$ by subtracting
just the right multiple of d(x).

\[ (2x^4 + 2x^3 - 3x^2 + 18x - 1) - 2x (x^3 - 2x + 9) = 2x^3 + x^2 - 1. \]

Next we eliminate the $2x^3$.

\[ (2x^3 + x^2 - 1) - 2 (x^3 - 2x + 9) = x^2 + 4x - 19. \]

We can’t reduce that any further by subtracting multiples of d(x), so that must be the remainder.

\[ q(x) = x^2 + 2x + 2, \qquad s(x) = x^2 + 4x - 19. \]
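The long division of Example 5.3 can be mechanized. The following is a minimal sketch, assuming polynomials are stored as coefficient lists from the highest power down to the constant term; `poly_divmod` is a hypothetical name, and exact fractions are used so that division by the leading coefficient of d(x) introduces no rounding.

```python
from fractions import Fraction

def poly_divmod(p, d):
    """Quotient and remainder of p by d; coefficient lists, highest power first."""
    if len(d) < 2 or d[0] == 0:
        raise ValueError("d must have degree at least 1 and a nonzero leading coefficient")
    remainder = [Fraction(c) for c in p]
    divisor = [Fraction(c) for c in d]
    quotient = []
    while len(remainder) >= len(divisor):
        factor = remainder[0] / divisor[0]       # next term of the quotient
        quotient.append(factor)
        for i, c in enumerate(divisor):          # subtract factor * x^j * d(x)
            remainder[i] -= factor * c
        remainder.pop(0)                         # the leading term has cancelled
    while len(remainder) > 1 and remainder[0] == 0:
        remainder.pop(0)                         # drop leading zeros of the remainder
    return quotient or [Fraction(0)], remainder

q, s = poly_divmod([1, 2, 0, 6, 18, -1], [1, 0, -2, 9])
print(q, s)
```

Each pass through the loop is one step of the example: the leading term of what is left is cancelled by subtracting a suitable multiple of d(x).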

We want to prove the Division Theorem. We proved the integer version using the well-ordering principle.
But there is no well-ordering principle for polynomials⁴. So we will need a different proof. The simplest
approach is to use induction on the degree of p(x). The induction step is essentially the idea used in the
example above.
Proof. Let k ≥ 1 be the degree of d(x) and n be the degree of p(x). If n < k, we can just take q(x) ≡ 0 and
s(x) = p(x). This proves the theorem for 0 ≤ n ≤ k − 1. So we need to prove the theorem for n ≥ k. We
use (strong) induction on n. Suppose n ≥ k − 1 and that the theorem holds whenever the degree of p(x) is
at most n. We want to prove it when the degree of p(x) is n + 1. Let $a_{n+1} \neq 0$ be the leading coefficient of
p(x) and $b_k \neq 0$ the leading coefficient of d(x). Consider
\[ \tilde{p}(x) = p(x) - \frac{a_{n+1}}{b_k} x^{n+1-k} d(x). \]

The right side is a polynomial of degree at most n + 1, but the coefficients of $x^{n+1}$ cancel, so in fact the
degree of $\tilde{p}(x)$ is at most n. By the induction hypothesis, there exist $\tilde{q}(x)$, $\tilde{s}(x)$ with degree of $\tilde{s}(x)$ less than
k so that
\[ \tilde{p}(x) = \tilde{q}(x) \cdot d(x) + \tilde{s}(x). \]
Therefore
\begin{align*}
p(x) &= \frac{a_{n+1}}{b_k} x^{n+1-k} d(x) + \tilde{p}(x) \\
&= \frac{a_{n+1}}{b_k} x^{n+1-k} d(x) + \tilde{q}(x) \cdot d(x) + \tilde{s}(x) \\
&= \left[ \frac{a_{n+1}}{b_k} x^{n+1-k} + \tilde{q}(x) \right] \cdot d(x) + \tilde{s}(x) \\
&= q(x) \cdot d(x) + s(x),
\end{align*}

where $q(x) = \frac{a_{n+1}}{b_k} x^{n+1-k} + \tilde{q}(x)$ and $s(x) = \tilde{s}(x)$. This proves the existence of q(x) and s(x) for p(x) by
induction. You will verify uniqueness in the following homework problem.

Problem 5.2 Show that the q(x) and s(x) of the above theorem are unique.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . divuniq

4 There is no reasonable sense of order for polynomials. For instance neither x < 1 − x nor 1 − x < x is correct; it depends
on which x you consider!

Corollary 5.4. Suppose p(x) ∈ F[x] and r ∈ F. Then p(x) is divisible by x − r if and only if r is a root of
p(x).
Proof. By the division theorem, there is a constant c (polynomial of degree less than 1) and a polynomial
q(x) so that
p(x) = q(x) · (x − r) + c.
Evaluating both sides at x = r, we see that p(r) = c. So (x − r) divides p(x) iff c = 0, and c = 0 iff
p(r) = 0.
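Corollary 5.4 can be illustrated with the usual shortcut for dividing by x − r (often called synthetic division, a name not used in the text above). In this sketch, with the hypothetical name `divide_by_linear` and coefficients listed from the highest power down, the remainder comes out as the constant c of the proof, which equals p(r).

```python
def divide_by_linear(coeffs, r):
    """Divide by (x - r); returns (quotient coefficients, remainder)."""
    partial = []
    acc = 0
    for c in coeffs:
        acc = acc * r + c      # Horner's scheme: the last value is p(r)
        partial.append(acc)
    return partial[:-1], partial[-1]

p = [1, -6, 11, -6]            # hypothetical example: x^3 - 6x^2 + 11x - 6
print(divide_by_linear(p, 2))  # remainder 0, so x - 2 divides p(x) and 2 is a root
print(divide_by_linear(p, 4))  # nonzero remainder, so 4 is not a root
```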

Lemma 5.5. A nonzero polynomial of degree n has at most n distinct roots.


Proof. We use induction. Consider n = 0. In that case $p(x) = c_0$ for some $c_0 \neq 0$. For such a polynomial
there are no roots at all; zero roots. Since zero is indeed at most 0 = deg(p), this proves the n = 0 case.
Suppose the lemma is true for n and consider a polynomial p(x) with deg(p) = n + 1. If p(x) has no
roots, then we are done because 0 ≤ n + 1. So suppose p(x) has a root r. Then, by Corollary 5.4, p(x) = (x − r)q(x) where
deg(q) = n by Lemma 5.2 a). By the induction hypothesis q(x) has at most n roots. Any root of p(x) is either r or one of the
roots of q(x), so there are at most 1 + n roots of p(x). This completes the induction step.
Lemma 5.6. Suppose p(x) and q(x) are polynomials with deg(p), deg(q) ≤ n and p(x) = q(x) for n + 1
distinct x values. Then p(x) = q(x) for all x.

Problem 5.3 Prove the lemma.


. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . lprove

We could continue to develop the notions of greatest common divisor, the Euclidean algorithm, and
unique factorization results, all analogous to what we discussed for the integers. But instead we will proceed
to the theorems about roots of polynomials that we mentioned in the introduction.

B Q[x] and the Rational Root Theorem


Suppose p(x) is a polynomial with rational coefficients, and we are trying to find its roots. We can multiply
p(x) by the least common multiple N of the denominators of the coefficients to obtain a new polynomial
p̃(x) = N p(x) all of whose coefficients will be integers. The roots of p(x) and p̃(x) are the same, so we will
proceed assuming that all the coefficients of p(x) are integers.
Theorem 5.7 (Rational Root Theorem). Suppose

\[ p(x) = c_n x^n + \cdots + c_1 x + c_0 \]
is a polynomial with coefficients $c_k \in \mathbb{Z}$ and that $r = \ell/m$ is a rational root, expressed in lowest terms. Then
m divides $c_n$ and $\ell$ divides $c_0$.
Example 5.4. Consider the polynomial $p(x) = 0.6x^4 + 1.3x^3 + 0.1x^2 + 1.3x - 0.5$. We want to find the roots.
All the coefficients are rational numbers. Multiplying by 10 produces a polynomial with integer coefficients:

\[ \tilde{p}(x) = 10p(x) = 6x^4 + 13x^3 + x^2 + 13x - 5. \]

The divisors of 6 are 1, 2, 3, 6 and their negatives. The divisors of −5 are 1, 5, and their negatives. So the
possible rational roots are
\[ \pm 1, \ \pm\frac{1}{2}, \ \pm\frac{1}{3}, \ \pm\frac{1}{6}, \ \pm\frac{5}{1}, \ \pm\frac{5}{2}, \ \pm\frac{5}{3}, \ \pm\frac{5}{6}. \]
Checking them all, we find that $\frac{1}{3}$ and $-\frac{5}{2}$ are the only ones which are actually roots. By Theorem 5.7 we
conclude that these are the only rational roots.
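The search in Example 5.4 is mechanical enough to automate. Here is a minimal sketch, assuming integer coefficients listed from $c_n$ down to $c_0$ with both $c_n$ and $c_0$ nonzero; the function names are hypothetical, and exact fractions avoid rounding when testing candidates.

```python
from fractions import Fraction

def divisors(n):
    """Positive divisors of a nonzero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]

def rational_roots(coeffs):
    """All rational roots; coeffs = [c_n, ..., c_0], integers, c_n and c_0 nonzero."""
    def value(x):
        acc = Fraction(0)
        for c in coeffs:
            acc = acc * x + c
        return acc
    # Candidates l/m with l dividing c_0 and m dividing c_n (Theorem 5.7).
    candidates = {sign * Fraction(l, m)
                  for l in divisors(coeffs[-1])
                  for m in divisors(coeffs[0])
                  for sign in (1, -1)}
    return sorted(x for x in candidates if value(x) == 0)

print(rational_roots([6, 13, 1, 13, -5]))
```

Because Theorem 5.7 guarantees that every rational root is among the candidates, a candidate list with no roots in it shows that there are no rational roots at all.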

Proof. Assume that $r = \ell/m$ is a root expressed in lowest terms (i.e. m and $\ell$ are relatively prime). We know
p(r) = 0. Multiplying this by $m^n$ we find that

\[ 0 = c_n \ell^n + c_{n-1} \ell^{n-1} m + \cdots + c_1 \ell m^{n-1} + c_0 m^n. \]

Since all the terms here are integers, it follows that m divides $c_n \ell^n$. By hypothesis m and $\ell$ are relatively
prime, and so m and $\ell^n$ are relatively prime, by Problem 4.16. By Lemma 4.9 it follows that m divides $c_n$.
Similarly, $\ell$ divides $c_0 m^n$. Since $\ell$ and $m^n$ are relatively prime, we conclude that $\ell$ divides $c_0$.


Problem 5.4 Use Theorem 5.7 to prove that $\sqrt{2}$ is irrational, by considering $p(x) = x^2 - 2$.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . root2

Problem 5.5 Show that for each n ≥ 2 there is a p(x) ∈ Q[x] of degree n with no (rational) roots. (Do
this by exhibiting such a p(x). You may want to consider cases depending on the value of n.)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . norQ

C R[x] and Descartes’ Rule of Signs


Next we consider a result about the roots of polynomials over R,

\[ p(x) = c_n x^n + c_{n-1} x^{n-1} + \cdots + c_1 x + c_0, \qquad c_k \in \mathbb{R}. \]

Such a polynomial is a function p : R → R. The standard results from calculus apply in this setting, and
we will use them in our proofs. The theorem of this section does not tell us what the roots are, but it
will tell us how many there can be (at most). Here is how it works: convert each coefficient $c_k$ to a ± sign
according to whether $c_k > 0$ or $c_k < 0$ (you can just skip any $c_k = 0$) and write the signs out in order.
Example 5.5. If
\[ p(x) = 2x^6 + x^5 - 4.7x^4 + \frac{1}{2} x^2 + 2x - 3 \]
we would get (skipping $c_3 = 0$)
+ + − + + − .
Now count the number of sign changes in this sequence, i.e. occurrences of −+ or +−. The rule of signs
says that the number of positive roots of p(x) is no greater than the number of sign changes. In our example,
this says there can’t be more than 3 positive roots. (In fact there is just one.)
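Counting sign changes is a one-line computation once the zero coefficients are skipped. A small sketch follows (the name `sign_changes` is hypothetical); the coefficient list is taken from Example 5.5, written from the highest power down, and only the signs of the entries matter for the count.

```python
def sign_changes(coeffs):
    """Number of sign changes in a coefficient sequence, ignoring zeros."""
    signs = [c > 0 for c in coeffs if c != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

example = [2, 1, -4.7, 0, 0.5, 2, -3]   # sign pattern + + - + + -, with c_3 = 0 skipped
print(sign_changes(example))
```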
Rene Descartes stated his “rule” in 1637, but it was almost 100 years before the first proofs were given.
A number of different proofs have been developed since then. We are going to consider a very recent proof
given by Komornik [15]. This is an interesting example of a proof which works by proving a more general
statement than the original — sometimes working in a more general setting makes possible techniques that
would not have worked in the original setting. Here is Komornik’s generalization of Descartes’ rule.
Theorem 5.8 (Rule of Signs). Suppose $p(x) = a_m x^{b_m} + a_{m-1} x^{b_{m-1}} + \cdots + a_1 x^{b_1} + a_0 x^{b_0}$ is a function with
nonzero real coefficients $a_m, \ldots, a_0$ and real exponents $b_m > b_{m-1} > \cdots > b_0$. Then p(x) cannot have more
positive roots (counted with multiplicity) than the number of sign changes in the sequence $a_m, \ldots, a_0$.
We need to carefully consider the statement of this theorem before we can prove it. There are several
things to observe.
The coefficients $a_k$ are assumed to be nonzero. This means any terms $0x^b$ are simply not included
in the expression for p(x). This also means that the exponents are not simply m, m − 1, . . . but a more
general sequence $b_m > b_{m-1} > \cdots$. In our example, $b_5 = 6$, $b_4 = 5$, $b_3 = 4$, $b_2 = 2$, $b_1 = 1$, $b_0 = 0$.
The exponents are not required to be nonnegative integers. They can be any real numbers,
including fractional and negative ones. Thus the p(x) of the theorem are more general functions than
polynomials. We will call such functions generalized polynomials.

Definition. A generalized polynomial is a function p(x) : (0, ∞) → R of the form

\[ p(x) = a_m x^{b_m} + a_{m-1} x^{b_{m-1}} + \cdots + a_1 x^{b_1} + a_0 x^{b_0}, \]

for some sequence of exponents $b_m > b_{m-1} > \cdots > b_1 > b_0$ and coefficients $a_k \in \mathbb{R}$.
Example 5.6. The function
\[ p(x) = x^{3/2} - \sqrt{2}\, x^{\sqrt{2}} + 4 + 2x^{-1/5} - \pi x^{-2} \]
is a generalized polynomial, but not a polynomial.
The domain of p(x) is only (0, ∞). Since the exponents of a generalized polynomial can be fractional
we can’t allow negative x. For instance $x^{1/2}$ is undefined if x < 0. And since exponents are allowed to be
negative we can’t allow x = 0; $x^{-1}$ is undefined for x = 0. The theorem only talks about positive roots of
p(x); it says nothing about the possibility of negative roots!
At this point you may be wondering why we have allowed such troublesome functions to be included
in the theorem. Why haven’t we just stayed with nice polynomials? The reason is that Komornik’s proof
doesn’t work if we confine ourselves to conventional polynomials.
But there is still one more thing we need to explain about the theorem statement. The phrase “counted
with multiplicity” needs to be defined. Suppose p(x) is a polynomial in the usual sense (nonnegative
integral exponents). If r is a root of p(x), that means (x − r) is a factor of p(x). It might be a repeated
factor $p(x) = (x - r)^2 (\cdots)$ or $p(x) = (x - r)^3 (\cdots)$, or maybe $p(x) = (x - r)^k (\cdots)$ for some higher power k.
The highest power k for which $(x - r)^k$ divides p(x) is called the multiplicity of the root r. The number of
roots “counted with multiplicity” means that a root of multiplicity 2 is counted twice, a root of multiplicity
3 is counted three times, and so on. To count the roots with multiplicity we add up the multiplicities of the
different roots.
Example 5.7. Consider $p(x) = (x - 1)(x + 1)^3 (x - 2)^5$. This has 3 roots, but 9 counted with multiplicity. It
has 2 positive roots, 6 counted with multiplicity. If we multiply this polynomial out, we get

\[ p(x) = x^9 - 8x^8 + 20x^7 - 2x^6 - 61x^5 + 58x^4 + 56x^3 - 80x^2 - 16x + 32. \]

This has exactly 6 sign changes in the coefficients, confirming Descartes’ rule for this example.
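The expansion in Example 5.7 can be double-checked without multiplying anything out by hand. Both forms have degree at most 9, so by Lemma 5.6 it suffices that they agree at 10 distinct points. The sketch below (names are hypothetical) does that check with exact integer arithmetic and then counts the sign changes.

```python
def sign_changes(coeffs):
    """Number of sign changes in a coefficient sequence, ignoring zeros."""
    signs = [c > 0 for c in coeffs if c != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

def factored(x):
    return (x - 1) * (x + 1)**3 * (x - 2)**5

COEFFS = [1, -8, 20, -2, -61, 58, 56, -80, -16, 32]   # from x^9 down to the constant

def expanded(x):
    acc = 0
    for c in COEFFS:
        acc = acc * x + c
    return acc

# Ten distinct points are enough for two polynomials of degree at most 9.
agree = all(factored(x) == expanded(x) for x in range(10))
print(agree, sign_changes(COEFFS))
```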
However this definition of multiplicity does not work for generalized polynomials.
Example 5.8. Consider⁵ $p(x) = x^{\pi} - 1$. Clearly r = 1 is a root. But that does not mean we can write
p(x) = (x − 1)q(x) for some other generalized polynomial q(x). In fact if you try to work out what q(x)
would need to be you get an infinite series:

\[ q(x) = x^{\pi - 1} + x^{\pi - 2} + x^{\pi - 3} + \cdots \]

So for generalized polynomials we need a different definition of multiplicity, one that does not appeal to
$(x - r)^k$ as a factor of p(x), and yet which reduces to the usual definition when p(x) is a conventional
polynomial. The key to this generalized definition is to observe that if p(x) is a generalized polynomial, then
the derivative $p'(x)$ exists and is also a generalized polynomial. That is simply because $\frac{d}{dx} x^b = b x^{b-1}$ is valid
for any b ∈ R, for x > 0. Now suppose p(x) is a conventional polynomial and that r is a root of multiplicity
1: p(x) = (x − r)q(x) where q(r) ≠ 0. Then observe that

\[ p'(x) = q(x) + (x - r) q'(x); \qquad p'(r) = q(r) + 0 \neq 0. \]

If r is a root of multiplicity 2, $p(x) = (x - r)^2 q(x)$ with q(r) ≠ 0. Then looking at the derivatives we have

\[ p'(x) = 2(x - r) q(x) + (x - r)^2 q'(x); \qquad p'(r) = 0 \]
\[ p''(x) = 2q(x) + 4(x - r) q'(x) + (x - r)^2 q''(x); \qquad p''(r) \neq 0. \]

If we continue in this way we find that if r is a root of multiplicity k then k is the smallest integer such that
$p^{(k)}(r) \neq 0$. (We use the customary notation $p^{(k)}(x)$ for the kth derivative of p(x).) If we make that the
definition of multiplicity, then we have a definition that makes sense for generalized polynomials as well!
5 Thanks to Randy Cone for this example.

Definition. Suppose p(x) is a generalized polynomial with root r > 0 and k is a positive integer. We say
the root r has multiplicity k when

\[ 0 = p(r) = p'(r) = \cdots = p^{(k-1)}(r), \quad \text{and} \quad p^{(k)}(r) \neq 0. \]

So when the theorem says “roots counted with multiplicity” it means multiplicity as defined in terms of
derivatives.
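The derivative-based definition translates directly into a procedure: differentiate term by term until the value at r is no longer zero. The following is a minimal sketch, assuming a generalized polynomial is stored as a list of (coefficient, exponent) pairs; the names are hypothetical, and the sketch is restricted to integer exponents (possibly negative) so that the arithmetic stays exact.

```python
from fractions import Fraction

def derivative(terms):
    """Termwise derivative: a*x^b becomes (a*b)*x^(b-1); vanishing terms are dropped."""
    return [(a * b, b - 1) for a, b in terms if a * b != 0]

def evaluate(terms, r):
    r = Fraction(r)
    return sum((a * r**b for a, b in terms), Fraction(0))

def multiplicity(terms, r):
    """Smallest k with k-th derivative nonzero at r (0 means r is not a root)."""
    k = 0
    while terms and evaluate(terms, r) == 0:
        terms = derivative(terms)
        k += 1
    return k

# The polynomial of Example 5.7 in expanded form.
p = [(1, 9), (-8, 8), (20, 7), (-2, 6), (-61, 5),
     (58, 4), (56, 3), (-80, 2), (-16, 1), (32, 0)]
# x^(-2) p(x): a generalized polynomial with negative exponents.
p_shifted = [(a, b - 2) for a, b in p]

print(multiplicity(p, 1), multiplicity(p, 2), multiplicity(p, 3))
print(multiplicity(p_shifted, 2))
```

For the conventional polynomial the answers match the exponents in the factored form, and multiplying by a power of x leaves the multiplicity of a positive root unchanged.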
We are ready now to turn our attention to the proof of Theorem 5.8. Some notation will help. For
a generalized polynomial p(x) we will use ζ(p) to denote the number of positive roots of p(x), counted
according to multiplicity, and σ(p) to denote the number of sign changes in the coefficients. The assertion
of Theorem 5.8 is simply
ζ(p) ≤ σ(p).
The proof will be by induction, but not on m which counts the number of terms in p(x). Rather the induction
will be on the number n = σ(p) of sign changes in the coefficients. That’s Komornik’s first clever idea. In
other words we will prove that P(n) is true for all n = 0, 1, 2, . . . where P(n) is the statement
For every generalized polynomial p(x) with n sign changes in the coefficients,
there are at most n positive real roots (counted according to multiplicity).
The induction step for such a proof needs a way of taking a p(x) with n + 1 sign changes and connecting to
some other generalized polynomial q(x) with n sign changes. The induction hypothesis will tell us that q(x)
has at most n roots. Then we will need to use that fact to get back to our desired conclusion that p(x) has
at most n + 1 roots. Komornik’s second clever idea was a way to do this: find one of the sign changes in the
coefficients, say $a_{j+1} < 0 < a_j$, marked by the bar | in the ± pattern below:

+ + − | + + − .

Let $b_{j+1}$ and $b_j$ be the exponents on either side of the sign change and pick a value β between them:
$b_{j+1} > \beta > b_j$. Form a new generalized polynomial $\tilde{p}(x) = x^{-\beta} p(x)$. This will have the same number of
sign changes and the same number of roots (with the same multiplicities) as p(x), but will have the property
that all the exponents to the left of the sign change position will be positive: $b_k - \beta > 0$ for k ≥ j + 1; and
all the exponents to the right of the sign change position will be negative: $b_k - \beta < 0$ for k ≤ j. Now let
$q(x) = \tilde{p}'(x)$, the derivative of $\tilde{p}(x)$. The negative exponents to the right of the sign change position will
reverse the signs of those coefficients, while the positive exponents on the left will leave the signs of those
coefficients unchanged. The ± pattern for the coefficients of q(x) will be

+ + − | − − + ,

which has exactly one fewer sign change than the original p(x). So the induction hypothesis will tell us that
q has at most n roots (counted with multiplicity), ζ(q) ≤ σ(q) = n in our notation. Now we need to make a
connection between ζ(q) and $\zeta(\tilde{p}) = \zeta(p)$. That, it turns out, is not so hard. All the facts we need to make
the proof work are collected in the next lemma.
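The construction just described can be carried out explicitly on a list of (coefficient, exponent) pairs. In the sketch below the names are hypothetical, β is taken to be the midpoint of the two exponents, and the first sign change is used rather than a particular one; any sign change works the same way. The sample coefficients have the sign pattern + + − + + − discussed above.

```python
from fractions import Fraction

def sign_changes(coeffs):
    signs = [c > 0 for c in coeffs if c != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

def reduce_sign_changes(terms):
    """Return the terms of q = (x^(-beta) p)' for beta chosen at a sign change.

    terms: (coefficient, exponent) pairs, nonzero coefficients,
    exponents strictly decreasing.
    """
    for (a_hi, b_hi), (a_lo, b_lo) in zip(terms, terms[1:]):
        if (a_hi > 0) != (a_lo > 0):
            beta = Fraction(b_hi + b_lo) / 2      # strictly between the two exponents
            break
    else:
        raise ValueError("no sign change to work with")
    shifted = [(a, Fraction(b) - beta) for a, b in terms]   # x^(-beta) p(x)
    return [(a * b, b - 1) for a, b in shifted]             # termwise derivative

p = [(2, 6), (1, 5), (Fraction(-47, 10), 4), (Fraction(1, 2), 2), (2, 1), (-3, 0)]
q = reduce_sign_changes(p)

print(sign_changes([a for a, _ in p]), sign_changes([a for a, _ in q]))
```

No shifted exponent is zero, because β lies strictly between two consecutive exponents, so the derivative keeps every term and only the signs to the right of the chosen position flip.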
Lemma 5.9. Suppose p(x) is a generalized polynomial.
a) If r > 0 is a root of p(x) with multiplicity k > 1, then r > 0 is a root of $p'(x)$ with multiplicity k − 1.
b) $\zeta(p) \le 1 + \zeta(p')$.
c) If β ∈ R and $\tilde{p}(x) = x^{\beta} p(x)$, then $\sigma(\tilde{p}) = \sigma(p)$ and $\zeta(\tilde{p}) = \zeta(p)$.
Proof. Part a) is elementary, because the definition of multiplicity k for p(x) says that

\[ 0 = p'(r) = p''(r) = \cdots = (p')^{(k-2)}(r), \quad \text{and} \quad (p')^{(k-1)}(r) \neq 0, \]

which is the definition of multiplicity k − 1 for $p'(x)$.


To prove b), suppose there are $\ell$ roots of p(x), numbered in order: $r_1 < r_2 < \ldots < r_\ell$. Let their
multiplicities be $m_1, m_2, \ldots, m_\ell$. For each successive pair of roots $p(r_k) = 0 = p(r_{k+1})$. (Here $k = 1, \ldots, \ell - 1$.)
Now Rolle’s Theorem (from calculus) implies that there exists a $t_k$ in $(r_k, r_{k+1})$ for which $p'(t_k) = 0$.
These $t_k$ give us $\ell - 1$ new roots of $p'(x)$ in addition to the $r_k$. As roots of $p'$ the $r_k$ have multiplicities $m_k - 1$
(by part a)), and the $t_k$ have multiplicities at least 1. Thus, counting multiplicities, we have that $\zeta(p')$ is at
least
\[ (m_1 - 1) + (m_2 - 1) + \cdots + (m_\ell - 1) + \ell - 1 = (m_1 + \cdots + m_\ell) - 1. \]
(If $m_k = 1$ then $r_k$ is not a root of $p'(x)$ at all. But then $m_k - 1 = 0$ so including it in the sum above is not
incorrect.) This shows that
\[ \zeta(p') \ge \zeta(p) - 1. \]
Rearranging, this is b).
For c), the coefficients of $p(x)$ and $\tilde p(x) = x^{\beta} p(x)$ are the same, so $\sigma(p) = \sigma(\tilde p)$. Since $r^{\beta} \neq 0$, it is clear
from $\tilde p(r) = r^{\beta} p(r)$ that $r$ is a root of $p(x)$ if and only if it is a root of $\tilde p(x)$. We need to show that if $r$ has
multiplicity $m$ for $p(x)$ then it has multiplicity $m$ for $\tilde p(x)$. We do this by induction on $m$. First consider
$m = 1$. This means $p(r) = 0$ and $p'(r) \neq 0$. We already know $\tilde p(r) = 0$. The product rule tells us

\[ \tilde p'(r) = \beta r^{\beta-1} p(r) + r^{\beta} p'(r) = 0 + r^{\beta} p'(r) \neq 0. \]

So the multiplicity of $r$ for $\tilde p(x)$ is indeed $m = 1$. Next we make the induction hypothesis:
If $r$ is a root of $p(x)$ of multiplicity $m$ then it is a root of $\tilde p(x) = x^{\beta} p(x)$ of multiplicity $m$.
Suppose that $r$ has multiplicity $m + 1$ with respect to $p(x)$. The product rule tells us that

\[ \tilde p'(x) = x^{\beta-1} t(x), \quad \text{where } t(x) = \beta p(x) + x p'(x). \]

Now $t(x)$ is another generalized polynomial. We know from a) that $r$ is a root of multiplicity $m$ for $p'(x)$. By
our induction hypothesis $r$ is also a root of multiplicity $m$ for $x p'(x)$: all its derivatives through the $(m-1)$st
are $= 0$ at $x = r$, but its $m$th derivative is $\neq 0$ there. The other term, $\beta p(x)$, has all derivatives through
and including the $m$th derivative $= 0$ at $r$. It follows then that $r$ is a root of $t(x)$ of multiplicity $m$. Our
induction hypothesis implies that it also has multiplicity $m$ as a root of $\tilde p'(x)$. This means that as a root of
$\tilde p(x)$ it has multiplicity $m + 1$. This completes the induction argument, proving that roots of $p(x)$ and $\tilde p(x)$
are the same with the same multiplicities. Now the conclusion of c) is clear.
We are ready now to write the proof of Theorem 5.8.
Proof. The proof is by induction on the value of σ(p). First consider σ(p) = 0. This means all the coefficients
ak of p(x) are positive (or all negative), and therefore p(x) > 0 for all x > 0 (or p(x) < 0 for all x > 0).
Therefore p(x) has no positive roots, so ζ(p) = 0, confirming ζ(p) ≤ σ(p).
Next, we assume the theorem is true for any generalized polynomial with σ(p) = n. Suppose σ(p) = n+1.
Among the sequence of coefficients ak there is at least one sign change; choose a specific j where one of the sign
changes occurs: $a_{j+1} a_j < 0$. Choose $b_{j+1} > \beta > b_j$, and define a new generalized polynomial $\tilde p(x) = x^{-\beta} p(x)$.
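Then

\begin{align*}
\tilde p(x) &= a_m x^{b_m - \beta} + a_{m-1} x^{b_{m-1} - \beta} + \cdots + a_1 x^{b_1 - \beta} + a_0 x^{b_0 - \beta} \\
&= a_m x^{\bar b_m} + a_{m-1} x^{\bar b_{m-1}} + \cdots + a_1 x^{\bar b_1} + a_0 x^{\bar b_0},
\end{align*}

where $\bar b_k = b_k - \beta$. The coefficients of $\tilde p$ are identical with those of $p$. By the lemma,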

ζ(p̃) = ζ(p), σ(p̃) = σ(p) = n + 1. (5.4)

Now consider $q(x) = \tilde p'(x)$:

\[ q(x) = \alpha_m x^{\bar b_m - 1} + \cdots + \alpha_1 x^{\bar b_1 - 1} + \alpha_0 x^{\bar b_0 - 1}, \]

where $\alpha_k = a_k \bar b_k$. Now for $k \ge j + 1$ we have $\bar b_k = b_k - \beta > 0$, so that the sign of $\alpha_k$ is the same as the sign
of $a_k$. But for $k \le j$, we have $\bar b_k = b_k - \beta < 0$ so that the sign of $\alpha_k$ is opposite the sign of $a_k$. It follows
that $q$ has exactly one fewer sign change than $\tilde p$:

σ(q) = σ(p̃) − 1 = n. (5.5)

By our induction hypothesis,
ζ(q) ≤ σ(q). (5.6)
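We can conclude that

\begin{align*}
\zeta(p) &= \zeta(\tilde p) && \text{by (5.4),} \\
&\le \zeta(\tilde p') + 1 && \text{by Lemma 5.9,} \\
&= \zeta(q) + 1 && \text{since } q = \tilde p', \\
&\le \sigma(q) + 1 && \text{by (5.6),} \\
&= \sigma(\tilde p) && \text{by (5.5),} \\
&= \sigma(p) && \text{by (5.4).}
\end{align*}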

This completes the induction step, and the proof of the theorem.

Problem 5.6 Explain why it is important for the above proof that the exponents bk are not required to
be nonnegative integers.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . needgp

Problem 5.7 Explain how the rule of signs can be applied to p(−x) to say something about the number
of negative roots (counted according to multiplicity) of a polynomial p(x). Apply this to the polynomial of
Example 5.5, and compare the result to the actual number of negative roots counted according to multiplic-
ity.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . negroot

D C[z] and The Fundamental Theorem of Algebra


D.1 Some Properties of the Complex Numbers
Next we turn to properties of polynomials with complex coefficients. Since complex numbers may be unfa-
miliar to some readers, this section will give a quick presentation of some of their basic properties.
The complex numbers C consist of all numbers z which can be written in the form

z = x + iy,

where x, y ∈ R and i is a new number with the property that $i^2 = -1$. We refer to x as the real part of z
and y as the imaginary part of z. The usual notation is

x = Re(z), y = Im(z).

Arithmetic (+ and ·) on complex numbers is carried out by applying the usual properties (axioms (A1)–(M1)
for the integers), remembering that $i^2 = -1$. So if z = x + iy and w = u + iv (x, y, u, v ∈ R)
then

\begin{align*}
zw &= (x + iy) \cdot (u + iv) \\
&= xu + x\,iv + iy\,u + iy\,iv \\
&= xu + ixv + iyu + i^2 yv \\
&= xu + ixv + iyu - yv \\
&= (xu - yv) + i(xv + yu). \tag{5.7}
\end{align*}

In C the zero element (additive identity) is 0 = 0 + i 0, and the multiplicative identity is 1 = 1 + 0i.
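
As a quick sanity check of the product formula (5.7), here is a small sketch that multiplies two complex numbers stored as pairs of real numbers and compares the result with Python's built-in complex arithmetic. The function name `multiply` and the sample values are chosen only for illustration.

```python
def multiply(z, w):
    """Multiply z = (x, y) and w = (u, v), read as x + iy and u + iv, using formula (5.7)."""
    x, y = z
    u, v = w
    return (x * u - y * v, x * v + y * u)

z, w = (2, 3), (-1, 4)                 # 2 + 3i and -1 + 4i
product = multiply(z, w)               # real and imaginary parts from (5.7)
builtin = complex(*z) * complex(*w)    # the same product using Python's complex type

print(product, builtin)
```

Multiplying by (1, 0) returns the other factor unchanged and multiplying by (0, 0) gives (0, 0), in line with the identities described above.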

The conjugate of z = x + iy is
\[ \bar z = x - iy, \]
and the modulus of z is
\[ |z| = \sqrt{x^2 + y^2}. \]
Observe that the modulus is an extension of absolute value from R to C, and continues to use the same
notation “| · |.” Also note that |z| is always a nonnegative real number, even though z is complex. The next
problem brings out some important properties.

Problem 5.8 Prove the following for all z, w ∈ C.

a) $z = 0$ if and only if $|z| = 0$.
b) $z \bar z = |z|^2$.
c) $\overline{zw} = \bar z\, \bar w$.
d) $|zw| = |z|\,|w|$.
e) $|z^n| = |z|^n$ for all $n \in \mathbb{N}$.
f) $|z + w| \le |z| + |w|$.
g) $|z + w| \ge |z| - |w|$.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . props

Problem 5.9 Suppose z = x + iy ∈ C and z ≠ 0. (That means x and y are not both 0.) Find formulas
for u, v ∈ R in terms of x and y so that w = u + iv has the property that wz = 1. I.e. find the formula for
$w = z^{-1}$.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . prd

Problem 5.10 Show that in C if wz = 0 then either z or w must be 0. (Hint: the properties in the
preceding problems might be useful.)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . nozd

A complex number z = x + iy corresponds to the pair (x, y) ∈ R2 of its real and imaginary parts, which
we can plot as a point in the plane. Viewed this way the plane is often called the complex plane. It is
important to recognize that there is no order relation in C. We can compare real numbers with ≤, > and so
forth, but inequalities are meaningless for complex numbers in general. See Problem 4.3.

Problem 5.11 Show that if z = x + iy is not 0, then by using the standard polar coordinates for the point
(x, y) in the plane z can be written as

z = |z|(cos(α) + i sin(α))

for some real number α. This is called the polar form of z. Write both −1 and i in this form. Show that if
w = |w|(cos(β) + i sin(β)) is a second complex number written in polar form, then

zw = |zw|(cos(α + β) + i sin(α + β)).

(This is just an exercise in trig identities.) The complex number cos(α) + i sin(α) is usually denoted $e^{i\alpha}$, for
several very good reasons. One is that the above multiplication formula is just the usual law of exponents:
$e^{i\alpha} e^{i\beta} = e^{i(\alpha+\beta)}$.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . polar
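
The multiplication rule for polar forms can be checked numerically before proving it. This is a sketch only, using Python's standard `cmath.polar` to obtain the modulus and an angle; the two sample numbers are arbitrary choices, not taken from the text.

```python
import cmath
import math

z = complex(1, 1)
w = complex(0, 2)

# cmath.polar returns (modulus, angle) with the angle in (-pi, pi].
r_z, alpha = cmath.polar(z)
r_w, beta = cmath.polar(w)

lhs = z * w
rhs = abs(z * w) * complex(math.cos(alpha + beta), math.sin(alpha + beta))

print(lhs, rhs)
```

A numerical agreement like this is of course not a proof; the trig identities are still needed for the problem itself.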

Problem 5.12 If z = r(cos(θ) + i sin(θ)), what are the polar forms of z̄ and 1/z?
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . polars

Problem 5.13 Suppose n is a positive integer. Find n different solutions to $z^n = 1$ and explain why they
are all different. If w ≠ 0, find n different solutions to $z^n = w$. (Hint: Find the polar form of
z.) In particular let ±ξ be the roots of $z^2 = 2 + i\sqrt{3}$. Determine |ξ| and Re(ξ) exactly. (The trigonometric
identity connecting cos(2θ) and cos(θ) will be useful.)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . cplxroots

D.2 The Fundamental Theorem


Now we turn to a very important result about C: every polynomial in C[z] can be factored all the way down
to a product of linear factors!

Theorem 5.10 (Fundamental Theorem of Algebra). Suppose

\[ p(z) = a_n z^n + a_{n-1} z^{n-1} + \cdots + a_1 z + a_0 \]

is a polynomial of degree $n \ge 1$ with complex coefficients $a_k \in \mathbb{C}$. Then $p(z)$ can be completely factored over
$\mathbb{C}$. In other words, there exist $\zeta_k \in \mathbb{C}$ so that

\[ p(z) = a_n (z - \zeta_1)(z - \zeta_2) \cdots (z - \zeta_n). \]

Example 5.9. Consider the polynomial (see footnote 6)

\[ p(z) = z^4 - 4z^3 + 2z^2 + 4z + 4. \]

In an effort to see the locations of the roots of p(z), here is the graph of |p(z)| viewed from its underside.
(To plot the graph of p(z) would require a four-dimensional picture. Since |p(z)| ∈ R its graph is only
three-dimensional, which we are much better at visualizing.)

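[Figure: surface plot of |p(z)| viewed from below, with horizontal axes Re(z) (from −1 to 3) and Im(z) (from −1 to 1) and vertical axis |p(z)| (from 0 to 30).]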

The locations of the roots are visible as the four “dimples” where |p(z)| comes down to touch 0.
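
To make the four "dimples" concrete, one can refine rough guesses of their positions numerically. The sketch below uses Newton's method, which is not part of the text; the starting guesses are assumptions read off the axis ranges of the plot.

```python
def p(z):
    return z**4 - 4*z**3 + 2*z**2 + 4*z + 4

def dp(z):
    # derivative of p
    return 4*z**3 - 12*z**2 + 4*z + 4

def newton(z, steps=30):
    """Refine an approximate root by repeated Newton steps z -> z - p(z)/p'(z)."""
    for _ in range(steps):
        z = z - p(z) / dp(z)
    return z

# Assumed starting points, one near each dimple in the picture.
guesses = [2.5 + 0.5j, 2.5 - 0.5j, -0.5 + 0.5j, -0.5 - 0.5j]
roots = [newton(g) for g in guesses]

for r in roots:
    print(r, abs(p(r)))
```

Each refined point has |p(z)| equal to 0 up to rounding error, and none of them is real, which is consistent with the graph never touching 0 along the real axis.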
6 This example is significant in the history of the fundamental theorem. Nikolaus Bernoulli believed it could not be factored
at all, but in 1742 L. Euler wrote to him with a factorization into a pair of quadratics.

The first correct proof of the Fundamental Theorem was due to Argand in 1806, although there had been
numerous flawed attempts previously. There are now many different proofs. We will discuss the elementary
proof given in E. Landau’s classic calculus book [16]. The essence of the proof is to show that given p(z)
there always exists a root ζ, that is, p(ζ) = 0. By Corollary 5.4 that means that p(z) = (z − ζ)q(z). Now apply the
same reasoning to q(z) and keep going. So the proof focuses on proving that one root of p(z) exists. This
is done in two parts. The first is to argue that |p(z)| has a minimum point: a $z_*$ such that $|p(z_*)| \le |p(z)|$ for all
z ∈ C. To do that we are going to ask you to accept on faith an extension of the Extreme Value Theorem
from calculus. The Extreme Value Theorem says that if f(x) is a continuous function of x ∈ [a, b], then
there is a minimum point c ∈ [a, b]: f(c) ≤ f(x) for all x ∈ [a, b]. Here is the fact we will need.
Proposition 5.11. If p(z) is a polynomial with complex coefficients, and B ⊆ C is a closed rectangle

B = {z = x + iy : a ≤ x ≤ b, c ≤ y ≤ d}

for some real numbers a ≤ b, c ≤ d, then |p(z)| has a minimum point z∗ over B: |p(z∗ )| ≤ |p(z)| for all
z ∈ B.
This proposition is true if |p(z)| is replaced by any continuous real-valued function f (z) defined on B.
But we don’t want to make the diversion to explain continuity for functions of a complex variable (or of two
real variables: z = x + iy). That is something you are likely to discuss in an advanced calculus course. We
have simply stated it for the f (z) = |p(z)| that we care about in our proof. If you want to see a proof of this
Proposition, you can find one in [16].
The second part of the proof is to show that $p(z_*) \neq 0$ leads to a contradiction. This part of the argument
will illustrate a technique that is sometimes used to simplify a complicated proof: reducing the argument
to a special case. This is usually introduced with the phrase “without loss of generality, we can assume. . . ”
Although we didn’t use exactly those words, we used this device back in the proof of Theorem 1.10 of
Chapter 1. We will use it more than once below.
Proof. Suppose p(z) is a complex polynomial of degree at least 1,

\[ p(z) = a_n z^n + a_{n-1} z^{n-1} + \cdots + a_1 z + a_0 \]

with $n \ge 1$ and $a_n \neq 0$. It is sufficient to prove that there exists a root ζ, that is, p(ζ) = 0. Indeed, this implies
p(z) = (z − ζ)q(z) for some polynomial q(z) of one lower degree. The theorem follows by induction from this.
So, by way of contradiction, suppose p(z) has no roots. Then |p(z)| > 0 for all z ∈ C. We will show that
this leads to a contradiction. The argument is in several steps. First observe that the case of n = 1 is trivial
because any polynomial of degree 1 does have a root. So we will assume that 1 < n for the rest of the proof.
Step 1. We claim that there exists a square

B = {z = x + iy : |x| ≤ b, |y| ≤ b}

with the property that |p(z)| ≥ |p(0)| for all z ∈ C \ B. First observe that all such z have |z| ≥ b. We want
to estimate |p(z)| from below, and show that if b is big enough then $|p(z)| > |a_0|$. Using the various parts of
Problem 5.8,

\begin{align*}
|p(z)| &\ge |a_n||z|^n - \sum_{k=0}^{n-1} |a_k||z|^k \\
&= |a_n||z|^n \left[ 1 - \sum_{k=0}^{n-1} \frac{|a_k|}{|a_n|} |z|^{k-n} \right] \\
&\ge |a_n||z|^n \left[ 1 - \sum_{k=0}^{n-1} \frac{|a_k|}{|a_n|} b^{k-n} \right]. \tag{5.8}
\end{align*}

The last inequality follows from our assumption that |z| ≥ b > 0. Now suppose that $b \ge \left( \frac{2n|a_k|}{|a_n|} \right)^{\frac{1}{n-k}}$ for
each k in this sum. That implies that $\frac{|a_k|}{|a_n|} b^{k-n} \le \frac{1}{2n}$, and therefore

\[ 1 - \sum_{k=0}^{n-1} \frac{|a_k|}{|a_n|} b^{k-n} \ge \frac{1}{2}. \tag{5.9} \]

This gives the inequality

\[ |p(z)| \ge \frac{1}{2} |a_n||z|^n \ge \frac{1}{2} |a_n| b^n. \]

From here, if $b \ge 1 + \left( \frac{2|a_0|}{|a_n|} \right)^{\frac{1}{n}}$ it will follow that $b^n > \frac{2|a_0|}{|a_n|}$, and therefore

\[ |p(z)| > |a_0| = |p(0)|. \tag{5.10} \]

In summary, if we choose b to be

\[ b = \max\left( \left( \frac{2n|a_0|}{|a_n|} \right)^{\frac{1}{n-0}}, \ldots, \left( \frac{2n|a_{n-1}|}{|a_n|} \right)^{\frac{1}{1}}, \; 1 + \left( \frac{2|a_0|}{|a_n|} \right)^{\frac{1}{n}} \right) \]

then for all z ∈ C \ B, we have

\[ |p(z)| > |p(0)|, \]

as desired. The significance of this is that the minimum value of |p(z)| over all z ∈ C will be found in B; we
don’t need to look outside of B for the minimum.
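The choice of b in Step 1 can be tried out on the polynomial of Example 5.9. The sketch below is illustrative only: the helper names are made up, and the last candidate in the maximum is taken to be 1 + (2|a_0|/|a_n|)^(1/n), which is an assumption about how that term should be read.

```python
import math

def bounding_half_width(coeffs):
    """coeffs = [a_0, ..., a_n] with a_n != 0; return the half-width b chosen in Step 1."""
    n = len(coeffs) - 1
    an = abs(coeffs[n])
    candidates = [(2 * n * abs(coeffs[k]) / an) ** (1 / (n - k)) for k in range(n)]
    candidates.append(1 + (2 * abs(coeffs[0]) / an) ** (1 / n))  # assumed reading of the last term
    return max(candidates)

def evaluate(coeffs, z):
    return sum(a * z**k for k, a in enumerate(coeffs))

coeffs = [4, 4, 2, -4, 1]   # z^4 - 4z^3 + 2z^2 + 4z + 4, listed from a_0 up to a_4
b = bounding_half_width(coeffs)

# Sample points on the boundaries of larger squares, all lying outside B.
outside = []
for scale in (1.001, 2.0, 10.0):
    for degrees in range(360):
        c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
        m = max(abs(c), abs(s))
        outside.append(complex(scale * b * c / m, scale * b * s / m))

all_bigger = all(abs(evaluate(coeffs, z)) > abs(coeffs[0]) for z in outside)
print(b, all_bigger)
```

The value of b produced this way is far from optimal (here it is 32, while the roots all lie within distance 3 of the origin); the proof only needs some square that works, not the smallest one.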
Step 2. Let B be the square of Step 1. By Proposition 5.11 there exists a minimizing point $z_*$
for |p(z)| over B. By Step 1, this implies that $z_*$ is a minimum point of |p(z)| over all of C:

|p(z∗ )| ≤ |p(z)|, for all z ∈ C. (5.11)

By our hypothesis that p(z) has no roots, we must have that |p(z∗ )| > 0. The rest of the proof is devoted
to deriving a contradiction from this.
Step 3. Without loss of generality, we can assume $z_* = 0$ and $p(0) = 1$. To see this consider $\tilde p(z) =
p(z + z_*)/p(z_*)$. This is again a polynomial, and from (5.11) has the property that

\[ |\tilde p(z)| = \frac{|p(z + z_*)|}{|p(z_*)|} \ge \frac{|p(z_*)|}{|p(z_*)|} = 1 = \tilde p(0). \]

In other words |p̃(z)| has its minimum at z = 0, and |p̃(0)| = 1. Considering p̃(z) in place of p(z) is the same
as assuming z∗ = 0 and p(0) = 1. We proceed under this assumption.
Step 4. Since p(0) = 1, we can write

\[ p(z) = 1 + a_1 z + \cdots + a_n z^n. \]

Let $1 \le m \le n$ be the index of the first nonzero coefficient after the constant term (it exists since deg(p) > 1):

\[ p(z) = 1 + a_m z^m + \cdots + a_n z^n, \]

with $a_m \neq 0$. We know from Problem 5.13 that there exists a w ∈ C with $w^m = -a_m$. Consider $\tilde p(z) =
p(z/w)$. This is another complex polynomial with the property that $\tilde p(0) = 1 \le |\tilde p(z)|$ for all z ∈ C. Moreover
$\tilde p(z)$ has the form $\tilde p(z) = 1 - z^m + \tilde a_{m+1} z^{m+1} + \cdots + \tilde a_n z^n$. In other words, without loss of generality we can
assume $a_m = -1$ and write

\[ p(z) = 1 - z^m + (a_{m+1} z^{m+1} + \cdots + a_n z^n) = 1 - z^m + \sum_{k=m+1}^{n} a_k z^k. \]

Note that it is not possible that m = n, else we would have p(1) = 0, contrary to the properties of p(z).
Step 5. Now suppose z = x is a real number with 0 < x ≤ 1. Then, using the properties from Problem 5.8
again, we have

\begin{align*}
|p(x)| &\le |1 - x^m| + \sum_{k=m+1}^{n} |a_k x^k| \\
&= 1 - x^m + \sum_{k=m+1}^{n} |a_k| x^k \\
&\le 1 - x^m + x^{m+1} \sum_{k=m+1}^{n} |a_k| \\
&= 1 - x^m + x^{m+1} c \\
&= 1 - x^m (1 - cx),
\end{align*}

where c is the nonnegative real number

\[ c = \sum_{k=m+1}^{n} |a_k|. \]

Consider the specific value

\[ x = \frac{1}{1 + c}. \]

Since c ≥ 0 we see that 0 < x ≤ 1, so the above inequalities do apply. Moreover,

\[ 1 - cx = 1 - \frac{c}{1 + c} = \frac{1}{1 + c} > 0, \]

and therefore $x^m (1 - cx) > 0$. We find that

\[ |p(x)| \le 1 - x^m (1 - cx) < 1. \]

This is a contradiction to our hypothesis (Step 3) that 1 ≤ |p(z)| for all z. The contradiction shows that in
fact our original p(z) must have a root, completing the proof.
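The estimate of Step 5 can be watched in action on a made-up polynomial of the normalized form 1 − z^m + (higher terms). The coefficients below are hypothetical and chosen only to illustrate the inequality; this polynomial is not claimed to have its minimum modulus at 0.

```python
# Hypothetical normalized polynomial p(z) = 1 - z^2 + (2 + i) z^3 - 3 z^4, so m = 2 and n = 4.
m = 2
coeffs = {2: -1, 3: 2 + 1j, 4: -3}   # exponent -> coefficient, constant term 1 handled separately

def p(z):
    return 1 + sum(a * z**k for k, a in coeffs.items())

c = sum(abs(a) for k, a in coeffs.items() if k > m)   # c = |a_3| + |a_4|
x = 1 / (1 + c)                                       # the specific value chosen in Step 5
bound = 1 - x**m * (1 - c * x)

print(abs(p(x)), bound)
```

The printed modulus is below the bound, and the bound is below 1, so for a polynomial of this shape the value 1 at z = 0 cannot be the minimum of |p(z)|.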
Notice that there is really an induction argument in this proof, but the reader is trusted to be able to fill
that in for themselves so that the proof can focus on the argument for the induction step: the existence of a
root ζ. When you read a difficult proof like this, you often find that the author has left a number of things
for you to check for yourself.

Problem 5.14 Write out in detail the justifications of (5.8) and (5.9).
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . details

Problem 5.15 There is only one place in the proof where we used a property of C that is not true for R
— where is that?
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . diagnose

Problem 5.16 Suppose p(z) is a polynomial all of whose coefficients are real numbers, and ζ ∈ C is a
complex root. Show that the conjugate ζ̄ is also a root.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . conjroot

Corollary 5.12. Every polynomial p(x) ∈ R[x] of degree n ≥ 3 can be factored into a product of lower degree
polynomials with real coefficients.

Proof. By keeping the same real coefficients but replacing the real variable x by a complex variable z we can
consider $p(z) = \sum_{k=0}^{n} a_k z^k$ as a polynomial in C[z] whose coefficients are all real numbers. We know that in
that setting it factors as
\[ p(z) = a_n (z - \zeta_1) \cdots (z - \zeta_n), \]
where $\zeta_k \in \mathbb{C}$ are the roots.
According to Problem 5.16, for any root $\zeta_i$ that is not a real number, one of the other roots $\zeta_j$ must be
its conjugate: $\zeta_j = \bar\zeta_i$. If we pair up these two factors we get
\[ (z - \zeta_i)(z - \zeta_j) = (z - \zeta_i)(z - \bar\zeta_i) = z^2 - (\zeta_i + \bar\zeta_i) z + \zeta_i \bar\zeta_i = z^2 - 2\,\mathrm{Re}(\zeta_i)\, z + |\zeta_i|^2, \]
which is a quadratic polynomial with real coefficients. By pairing each factor with a nonreal root with its
conjugate counterpart, we obtain a factorization of p(z) into linear factors (the (z − ζi ) with ζi ∈ R) and
quadratic factors, all with real coefficients. In this factorization, simply replace z by the original real variable
x to obtain a factorization of p(x) into a product of linear and quadratic factors, all with real coefficients.
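
The pairing step in this proof is easy to test numerically. In the sketch below, the function name `real_quadratic` and the sample values of ζ and z are illustrative choices, not taken from the text.

```python
def real_quadratic(zeta):
    """Return (b1, b0) with (z - zeta)(z - conj(zeta)) = z^2 + b1*z + b0, both real."""
    return (-2 * zeta.real, abs(zeta) ** 2)

zeta = complex(3, 4)
b1, b0 = real_quadratic(zeta)          # -2 Re(zeta) and |zeta|^2

z = complex(0.5, -2)                   # arbitrary test point
lhs = (z - zeta) * (z - zeta.conjugate())
rhs = z**2 + b1 * z + b0

print((b1, b0), lhs, rhs)
```

Both coefficients come out as ordinary real numbers even though ζ itself is not real, which is the point of pairing each nonreal root with its conjugate.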

Problem 5.17 Consider the polynomial

\[ p(x) = x^4 - 4x^3 + 2x^2 + 4x + 4 \]

of Example 5.9.
a) Find all rational roots.
b) What does Descartes’ Rule of Signs say about the number of positive roots? What can you deduce
from the rule of signs about negative roots?
c) Verify that
\[ p(x) = \left( (x - 1)^2 - 2 \right)^2 + 3. \]
From here find the full factorization of p(z) into the product of first order terms and identify all the
complex roots. (By writing $3 = -(i\sqrt{3})^2$ this is the difference of two squares: $A^2 - B^2 = (A + B)(A - B)$.
Using the values of $\xi$ and $\bar\xi$ from Problem 5.13 each of the two factors is itself a difference of
squares!)
d) Find a way to write p(x) as a product of two quadratic polynomials each with real coefficients, as in
Corollary 5.12.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ber

Problem 5.18 As above, let $P_m(n)$ denote the summation formula

\[ P_m(n) = \sum_{k=1}^{n} k^m. \]

Back in Chapter 1 (Proposition 1.11 specifically) we saw that $P_1(n) = \frac{n(n+1)}{2}$, $P_2(n) = \frac{n(n+1)(2n+1)}{6}$, and
$P_3(n) = \frac{n^2 (n+1)^2}{4}$. We observe that each of these is a polynomial in n. Does the following constitute a proof
that $\sum_{k=1}^{n} k^m$ is always a polynomial in n?
Since
\[ \sum_{k=1}^{n} k^m = 1 + \cdots + n^m, \]
and every term on the right is a polynomial, the sum is a polynomial.
(See [5] for the source of this one.)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . poly

Chapter 6

Determinants and Linear Algebra in R^n

A matrix is a rectangular array of numbers. We will assume that all the entries are real numbers. We
typically use upper case letters to denote matrices, and the same letter (lower case) with subscripts to
denote its entries. In other words, if A is a matrix, we use $a_{ij}$ to denote the entry in row i and column j.
We sometimes write $A = [a_{ij}]$ to indicate this notation. We will only be working with square matrices in
this chapter, meaning there are the same number of rows as columns. We say A is an n × n matrix if there
are n rows and n columns. So when we say A is the 5 × 5 matrix $[a_{ij}]$ we mean
\[ A = \begin{bmatrix}
a_{11} & a_{12} & a_{13} & a_{14} & a_{15} \\
a_{21} & a_{22} & a_{23} & a_{24} & a_{25} \\
a_{31} & a_{32} & a_{33} & a_{34} & a_{35} \\
a_{41} & a_{42} & a_{43} & a_{44} & a_{45} \\
a_{51} & a_{52} & a_{53} & a_{54} & a_{55}
\end{bmatrix}. \]
If we have two (square) matrices A = [ai j ] and B = [bi j ] of the same size (n × n) then we can add them,
multiply them by each other, and multiply them by numbers α ∈ R: A + B = [ai j + bi j ], αA = [αai j ], and
$AB = [c_{ij}]$ where the entries of the product are given by
\[ c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}. \tag{6.1} \]
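
Formula (6.1) translates directly into a short program. The sketch below stores a matrix as a list of rows and uses 0-based indices, so `A[i][k]` plays the role of the entry in row i + 1 and column k + 1; the function name and the sample matrices are illustrative only.

```python
def mat_mul(A, B):
    """Product of two n x n matrices (lists of rows), entry by entry as in (6.1)."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1, 2],
     [3, 4]]
B = [[0, 1],
     [5, -2]]

C = mat_mul(A, B)
print(C)
```

Computing `mat_mul(B, A)` for the same two matrices gives a different result, a reminder that matrix multiplication is not commutative.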

The subject of this chapter is determinants of square matrices. We are going to define the determinant of
an n × n square matrix $A = [a_{ij}]$ directly using the following formula.
\[ \det(A) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} a_{i\,\sigma(i)}. \tag{6.2} \]

But before formally stating this as the definition we need to understand what all the pieces of this formula
are. The σ ∈ Sn refer to permutations of {1, 2, 3, . . . n}, which we will describe in §A. In §B we will discuss
the sign of a permutation, sgn(σ). Then in §C we will state the above as the definition and begin using it to
prove various properties of determinants in the rest of the chapter.
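Although the formula (6.2) may look formidable, it will reduce to the familiar formula for 2 × 2 matrices:
\[ \det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = a_{11} a_{22} - a_{21} a_{12}. \]
Maybe you have also seen the formula for 3 × 3 matrices; the determinant of
\[ A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \]
is given by

\[ \det(A) = a_{11} a_{22} a_{33} - a_{11} a_{23} a_{32} + a_{12} a_{23} a_{31} - a_{12} a_{21} a_{33} + a_{13} a_{21} a_{32} - a_{13} a_{22} a_{31}. \tag{6.3} \]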

This likewise will just be formula (6.2) in the special case of n = 3. For 4 × 4 matrices the formula has 24
terms, more than we really want to write out (see footnote 1). For larger matrices you were probably never told exactly
how the determinant is actually defined, but just given some rules or procedures for calculating it (without
being told why they are valid). Our formula (6.2) above is the general definition that you were never told.
The various rules and procedures that you may have learned are valid because they are provable from the
formula, as we will see below. This chapter is about working all this out. Many of the proofs we encounter
here will be different than those we have considered previously; they will be proofs that are based on careful
manipulation of a complicated formula. When faced with a complicated formula you may be used to relying
on an example or two to show you what the formula means. That won’t be enough here. You really will need
to read the formula itself. (Look for instance at the proof of Theorem 6.10 below.) This is part of becoming
mathematically literate. Often we will be able to “see” why what we want to prove is true by working out
examples, and being convinced that what happened in the examples will always work. The challenge in
writing a proof is to find ways to express in explicit and precise notation what we could see in an example.
This again requires mathematical literacy.
A different approach to the study of determinants, typical of advanced algebra texts, is to start by
prescribing a list of properties for det(A), and then show that these properties completely determine what
det(A) must be, leading to our formula above. We leave that axiomatic approach to the algebra texts.
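
Although (6.2) is impractical for large matrices, for small ones it can be evaluated literally. The sketch below is an illustration under one assumption: the sign of a permutation is computed by counting inversions, a standard convention that may be phrased differently from the definition of sgn(σ) given later in the chapter. Indices are 0-based, and the names `sgn` and `det` are illustrative.

```python
from itertools import permutations

def sgn(perm):
    """Sign of a permutation given as a tuple, via the parity of its inversion count (assumed convention)."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det(A):
    """Determinant of a square matrix (list of rows), summing over all permutations as in (6.2)."""
    n = len(A)
    total = 0
    for perm in permutations(range(n)):
        term = sgn(perm)
        for i in range(n):
            term *= A[i][perm[i]]   # one entry from each row and each column
        total += term
    return total

A = [[2, 0, 1],
     [1, 3, -1],
     [0, 5, 4]]
print(det(A))
```

For this matrix the six terms of (6.3) are 24 + 10 + 0 − 0 + 5 − 0, which matches the value the program prints.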

Problem 6.1 Use the formula (6.1) to show that for any two n × n matrices A and B and any scalar α,
(αA)B = A(αB). [Hint: to do this let S be the matrix αA. In other words $s_{ij} = \alpha a_{ij}$ for every i and j.
The definition of matrix multiplication says the i j entry of (αA)B is
\[ \sum_{k=1}^{n} s_{ik} b_{kj} = \sum_{k=1}^{n} (\alpha a_{ik}) b_{kj}. \]

In a similar manner work out the i j entry of A(αB), and then explain why it is the same thing.]
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . sm

Problem 6.2 Suppose that A, B, and C are n × n matrices. Show that matrix multiplication is associative,
i.e. that (AB)C = A(BC). [Hint: follow the same approach as Problem 6.1.]
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . assoc

A Permutations: σ ∈ Sn
Each σ in our formula (6.2) stands for a permutation of {1, 2, 3, . . . n}.
Definition. A permutation of {1, . . . , n} is a bijection from {1, 2, 3, . . . n} to itself. The set of all permuta-
tions of {1, . . . , n} is denoted Sn .

1 In general there are n! terms in (6.2). It was once pointed out to me by a famous mathematician (S. Ulam) that 52! is
greater than the number of molecules in the known universe. That was in 1973; today’s estimates might come out different.
But in any case 52! is an extremely large number, larger than $10^{67}$. It is the number of different ways to shuffle a deck of
playing cards. It is also the number of terms in the formula (6.2) for a 52 × 52 matrix. So it is laughable to think about using
(6.2) to actually evaluate a determinant of any significant size. As a mathematical expression however it is still perfectly valid.
Fortunately the properties that follow from the formula lead to much better ways to compute determinants of big matrices (it
only takes on the order of $n^3$ operations if you do it sensibly). That’s fortunate, since in many applications matrices with n in
the thousands are common.

It is traditional to use lower case Greek letters (σ, π, τ, ε, . . . ) for the names of permutations. There
are several different notations for identifying a specific permutation. One is to simply list the values in the
form of a table:
\[ \sigma = \begin{pmatrix} 1 & 2 & \cdots & n \\ \sigma(1) & \sigma(2) & \cdots & \sigma(n) \end{pmatrix}. \]
For instance, the permutation of {1, 2, 3, 4, 5} defined by σ(1) = 5, σ(2) = 2, σ(3) = 1, σ(4) = 4, σ(5) = 3
would be written
\[ \sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 5 & 2 & 1 & 4 & 3 \end{pmatrix}. \]
Another notation is the cycle notation, which we will talk about below.
The set of all permutations of {1, 2, . . . , n} is denoted $S_n$. Thus the outside sum $\sum_{\sigma \in S_n}$ in (6.2) refers to what
we get by taking each different permutation σ of {1, 2, 3, . . . , n}, computing the expression $\operatorname{sgn}(\sigma) \prod_{i=1}^{n} a_{i\,\sigma(i)}$, and
then adding up the results. For a given permutation, the inside product $\prod_{i=1}^{n} a_{i\,\sigma(i)}$ refers to

\[ \prod_{i=1}^{n} a_{i\,\sigma(i)} = a_{1\,\sigma(1)}\, a_{2\,\sigma(2)} \cdots a_{n\,\sigma(n)}. \]

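The values $a_{i\,\sigma(i)}$ are a selection of n entries from the matrix with exactly one from each row and exactly
one from each column. For instance, if n = 5 and σ is our example permutation above, then the product
involves the boxed entries below:
\[ \begin{bmatrix}
a_{11} & a_{12} & a_{13} & a_{14} & \boxed{a_{15}} \\
a_{21} & \boxed{a_{22}} & a_{23} & a_{24} & a_{25} \\
\boxed{a_{31}} & a_{32} & a_{33} & a_{34} & a_{35} \\
a_{41} & a_{42} & a_{43} & \boxed{a_{44}} & a_{45} \\
a_{51} & a_{52} & \boxed{a_{53}} & a_{54} & a_{55}
\end{bmatrix}. \]

So for this σ,
\[ \prod_{i=1}^{n} a_{i\,\sigma(i)} = a_{15}\, a_{22}\, a_{31}\, a_{44}\, a_{53}. \]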

In the next section we will talk about sgn(σ). For the moment let’s just point out that sgn(σ) will always
be ±1. So (6.2) stands for an expression consisting of the products of all possible combinations of n terms
from A, one per row and one per column, either added or subtracted depending on sgn(σ). At this point
you can already see that (6.2) refers to a formula of the same general type as the determinant expressions
for 2 × 2 and 3 × 3 matrices that we wrote down above. When n = 2 there are only two permutations in $S_2$,
the inversion τ and the identity map ε:
\[ \tau = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}; \qquad \varepsilon = \begin{pmatrix} 1 & 2 \\ 1 & 2 \end{pmatrix}. \]

If you will take for granted that sgn(τ) = −1 and sgn(ε) = 1, then we see that (6.2) reduces to the
familiar formula for 2 × 2 determinants.
\begin{align*}
\prod_{i=1}^{n} a_{i\,\varepsilon(i)} &= a_{11} a_{22}, && \text{using } \sigma = \varepsilon \\
\prod_{i=1}^{n} a_{i\,\tau(i)} &= a_{12} a_{21}, && \text{using } \sigma = \tau \\
\det(A) &= 1 \cdot (a_{11} a_{22}) + (-1) \cdot (a_{12} a_{21}).
\end{align*}

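A more concise notation for permutations is the cycle notation. Our example
\[ \sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 5 & 2 & 1 & 4 & 3 \end{pmatrix} \]
would be written (see footnote 2) in cycle notation as
\[ \sigma = \langle 1, 5, 3 \rangle. \]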
The meaning is that the first value gets sent to the second, the second gets sent to the third, . . . , and the
last gets sent back to the first. In our example we might indicate this by 1 → 5 → 3 → 1. Since 2 and 4
don’t appear, the understanding is that σ leaves them unchanged: σ(2) = 2 and σ(4) = 4.
Definition. A permutation $\sigma \in S_n$ is called a k-cycle if there is a set of k elements, $\{i_1, i_2, \ldots, i_k\} \subseteq
\{1, \ldots, n\}$ so that $\sigma(i_1) = i_2$, $\sigma(i_2) = i_3$, . . . , $\sigma(i_k) = i_1$, and $\sigma(j) = j$ for all other j. We use
\[ \sigma = \langle i_1, i_2, \ldots, i_k \rangle \]
to indicate such a permutation. A 2-cycle is also called a transposition.
The σ of our example is a 3-cycle. ⟨1, 3, 5, 2⟩ would be a 4-cycle. The cycle notation is more concise since
we only need to write one row, and don’t need to write anything for the σ(i) = i terms. However not every
permutation is a single cycle. For instance
\[ \pi = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 3 & 5 & 1 & 2 & 4 \end{pmatrix} \]
consists of two “disjoint” cycles. We write
\[ \pi = \langle 1, 3 \rangle \langle 2, 5, 4 \rangle. \]
In general when two or more permutations are written next to each other, as if they were being multiplied,
we mean that they are to be composed as functions.
\[ \alpha\beta = \alpha \circ \beta. \]
Thus αβ applied to i is (αβ)(i) = α(β(i)). With that understanding you can check that π = ⟨1, 3⟩⟨2, 5, 4⟩ is
indeed correct in our example.
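
The check suggested here can also be done mechanically. In the sketch below a permutation of {1, . . . , 5} is stored as a dictionary sending i to its image; the helper names `cycle` and `compose` are illustrative and not part of the text.

```python
def cycle(*elems, n=5):
    """The cycle through elems, as a dict i -> image of i on {1, ..., n}; other values stay fixed."""
    perm = {i: i for i in range(1, n + 1)}
    for a, b in zip(elems, elems[1:] + elems[:1]):
        perm[a] = b          # each listed value is sent to the next, the last back to the first
    return perm

def compose(alpha, beta):
    """The composition alpha after beta: i -> alpha(beta(i)), so beta is applied first."""
    return {i: alpha[beta[i]] for i in beta}

pi = compose(cycle(1, 3), cycle(2, 5, 4))
print(pi)
```

The result sends 1, 2, 3, 4, 5 to 3, 5, 1, 2, 4 respectively, which is the bottom row of the table for π above. Because these two cycles are disjoint, composing them in the other order gives the same permutation.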
A useful way to illustrate a permutation σ is to write out 1, 2, . . . , n in two rows and then draw arrows from
i in the top row to σ(i) in the bottom row. For instance our example π above would be illustrated as follows.

[Diagram: two rows 1 2 3 4 5, with arrows from each i in the top row to π(i) in the bottom row: 1 → 3, 2 → 5, 3 → 1, 4 → 2, 5 → 4.]

We will refer to this as the diagram of the permutation. Such diagrams are convenient for composing two
permutations; we can just stack the two diagrams on top of each other and follow the arrows through both
layers to find the composition. Here is the illustration for our example of π = ⟨1, 3⟩⟨2, 5, 4⟩.

[Diagram: three rows 1 2 3 4 5; the arrows from the top row to the middle row show ⟨2, 5, 4⟩, and the arrows from the middle row to the bottom row show ⟨1, 3⟩.]

2 It is common to simply use parentheses to denote a cycle, “(1, 5, 3)” instead of “⟨1, 5, 3⟩.” We however are choosing to use
angle brackets to more easily distinguish the notation for the permutation from the parentheses used to demarcate its argument.
Thus ⟨1, 5, 3⟩(5) = 3 seems a little clearer than (1, 5, 3)(5) = 3.

Note that the rightmost permutation is applied first, in keeping with our usual function notation σπ(i) =
σ(π(i)).
If σ, π, and α are three permutations, then (σπ)α = σ(πα) since both refer to the permutation that takes
i to σ(π(α(i))). Thus the parentheses are unnecessary; we can just write σπα with no ambiguity.

Problem 6.3
a) Let $I : S_n \to S_n$ be defined by $I(\sigma) = \sigma^{-1}$. Explain why I is a bijection.
b) Suppose $\pi \in S_n$, and for this π define $C_\pi : S_n \to S_n$ by $C_\pi(\sigma) = \pi\sigma$. Explain why $C_\pi$ is a
bijection.
c) Show that $(\sigma\tau)^{-1} = \tau^{-1}\sigma^{-1}$ for all $\sigma, \tau \in S_n$. (Note that the order on the right is reversed!)
d) Show that if τ is a transposition, then $\tau^{-1} = \tau$. Is the converse true?
e) Suppose $\sigma \in S_n$ is any permutation and $i_1, \ldots, i_k$ are k distinct elements of {1, . . . , n}. Show that

\[ \sigma \langle i_1, i_2, \ldots, i_k \rangle \sigma^{-1} = \langle \sigma(i_1), \sigma(i_2), \ldots, \sigma(i_k) \rangle. \]

[Hint: Consider any j ∈ {1, . . . , n}. You want to show that both sides of the above send j to the same
thing. Consider cases: (1) j is one of the values $\sigma(i_1), \ldots, \sigma(i_{k-1})$, (2) $j = \sigma(i_k)$, (3) j is not one of the
values $\sigma(i_1), \ldots, \sigma(i_k)$.]

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . d1

Lemma 6.1. Every permutation σ ∈ Sn which is not the identity can be written as the composition of some
number k ≤ n − 1 of transpositions τi :
σ = τ1 τ2 . . . τk .
Using parts c) and d) of Problem 6.3, we see that the conclusion of the lemma is equivalent to

τk . . . τ2 τ1 σ = ι (the identity permutation).

So what we want to do is show that σ can be followed by a sequence of transpositions so as to “untangle”
the diagram and get every i back to its starting position. For instance consider our example from page 111.
\[
\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 5 & 2 & 1 & 4 & 3 \end{pmatrix}.
\]
We start by drawing the diagram for σ. We can first get 5 back to its starting position using τ1 = ⟨5, 3⟩.
Then using τ2 = ⟨1, 3⟩ will finish restoring all i to their initial positions. Here is the combined picture.

[Diagram: four rows 1 2 3 4 5, stacking the diagrams of σ, τ1, and τ2.]

This shows that τ2 τ1 σ = ι (the identity permutation), which means that σ = τ1 τ2. Our proof of the lemma consists simply of describing
a process for choosing the transpositions which will always accomplish this.
Proof. If σ is not the identity there is some value of i which is not fixed: σ(i) ≠ i. Let ℓ be the largest
value which is not fixed by σ: σ(ℓ) ≠ ℓ. This means that for ℓ < m ≤ n we have σ(m) = m. It must be
that σ(ℓ) < ℓ, because if σ(ℓ) > ℓ, then σ(σ(ℓ)) = σ(ℓ), contrary to the fact that σ is injective. Now let
τ1 = ⟨σ(ℓ), ℓ⟩. Then
σ′ = τ1 σ
not only leaves ℓ + 1, . . . , n fixed, it leaves ℓ fixed as well, since σ′(ℓ) = τ1(σ(ℓ)) = ℓ.
If σ′ = ι (the identity permutation) we are done. Otherwise repeat the process above for σ′: let ℓ′ be the largest value not fixed by
σ′. We know that ℓ′ ≤ ℓ − 1. Let τ2 = ⟨σ′(ℓ′), ℓ′⟩. Then

σ″ = τ2 σ′ = τ2 τ1 σ

must leave all the values from ℓ′ up to n fixed. If σ″ = ι we are done. Otherwise we repeat the process
again, and continue until we reach the identity permutation ι.
Since the value of ℓ goes down by at least one at each step, and can never be less than 2, the process
must terminate after some number k ≤ n − 1 of steps, resulting in

ι = τk . . . τ2 τ1 σ.

This implies that

σ = τ1 τ2 . . . τk,
completing the proof.
This is a somewhat informal proof, because of its reliance on “repeat the process.” It really is the
description of an algorithm to produce the τ1 , . . . , τk . We could present it more formally as a strong
induction argument on the value of `, or a proof by contradiction, but those would probably be less clear
than the informal description above.
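Since the proof is really an algorithm, it may help to see it written as one. The following is a minimal sketch, assuming permutations are stored as dictionaries i ↦ σ(i); the function names `compose`, `transposition`, and `untangle` are hypothetical, introduced only for this illustration.

```python
def compose(alpha, beta):
    """Return alpha∘beta: beta is applied first, then alpha."""
    return {i: alpha[beta[i]] for i in beta}


def transposition(n, a, b):
    """The transposition <a, b> on {1, ..., n} as a dict."""
    tau = {i: i for i in range(1, n + 1)}
    tau[a], tau[b] = b, a
    return tau


def untangle(sigma):
    """Return pairs (a1, b1), ..., (ak, bk) with sigma = <a1,b1><a2,b2>...<ak,bk>."""
    n = len(sigma)
    current = dict(sigma)
    pairs = []
    while True:
        moved = [i for i in range(1, n + 1) if current[i] != i]
        if not moved:
            return pairs                      # reached the identity
        ell = max(moved)                      # largest value not fixed
        pair = (current[ell], ell)            # tau = <current(ell), ell>
        pairs.append(pair)
        current = compose(transposition(n, *pair), current)


sigma = {1: 5, 2: 2, 3: 1, 4: 4, 5: 3}
pairs = untangle(sigma)
print(pairs)  # [(3, 5), (1, 3)]

# Rebuild sigma as tau_1 tau_2 ... tau_k to confirm the factorization.
rebuilt = {i: i for i in sigma}
for a, b in pairs:
    rebuilt = compose(rebuilt, transposition(len(sigma), a, b))
print(rebuilt == sigma)  # True
```

For the example σ this reproduces the two transpositions found by hand, and the number of transpositions never exceeds n − 1 because the largest unfixed value drops at every pass through the loop.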

Problem 6.4
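a) Write the permutation
\[
\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 2 & 4 & 5 & 6 & 3 & 1 \end{pmatrix}
\]
as the composition of cycles.
b) Write π = ⟨3, 2, 5⟩⟨2, 5, 4⟩ in “table” notation:
\[
\pi = \begin{pmatrix} 1 & 2 & \cdots & 5 \\ \cdot & \cdot & \cdots & \cdot \end{pmatrix}.
\]
c) Write γ = ⟨3, 1, 2, 5⟩ as the composition of transpositions.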

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . permrep

B The Sign of a Permutation: sgn(σ)


Next we need to define the sign of a permutation. As we have already said, sgn(σ) will always be ±1, but
we need to explain how the choice of ± is determined.
As an example consider the following permutation.
\[
\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 5 & 1 & 2 & 4 & 3 \end{pmatrix}.
\]
Here is its diagram.

[Diagram: two rows 1 2 3 4 5, with an arrow from each i in the top row to σ(i) in the bottom row.]

Now consider a pair of input values from the top row, 2 and 5 for example. The order of this particular pair
is preserved by σ: 2 < 5 and σ(2) < σ(5). But for other input pairs the order of the corresponding outputs
is the opposite of the inputs: 1 < 3 while σ(1) > σ(3). We see this in the picture because the lines from 2
and 5 do not cross each other, while the lines from 1 and 3 do cross each other. We say that such pairs are
inverted by σ. In general we want to count the number of pairs³ {i, j} (of distinct inputs: i ≠ j) which are
inverted by σ. In a picture like the one above, this is the number of line crossings. In notation, an inversion
occurs when i < j and σ(j) < σ(i).
Definition. For σ ∈ Sn, we define the inversion count of σ to be
r(σ) = |{{i, j} : 1 ≤ i < j ≤ n and σ(j) < σ(i)}|.
The sign of the permutation σ is defined to be
sgn(σ) = (−1)^{r(σ)}.
If sgn(σ) = +1 we say σ is an even permutation; if sgn(σ) = −1 we say σ is an odd permutation.

(“|A|” is the notation from Chapter 3 for the number of elements in a finite set A.) For the σ above, the pairs
that are reversed are {1, 2}, {1, 3}, {1, 4}, {1, 5}, {4, 5} and no others. So we find r(σ) = 5, sgn(σ) = −1, and
thus in our example σ is odd.
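The inversion count is easy to compute by brute force, which gives a way to double-check small examples like this one. Here is a minimal sketch, assuming σ is stored as its bottom row (a list whose entry in position i − 1 is σ(i)); the names `inversion_count` and `sgn` are hypothetical helpers for this illustration.

```python
from itertools import combinations


def inversion_count(bottom_row):
    """r(sigma): number of pairs i < j with sigma(j) < sigma(i)."""
    n = len(bottom_row)
    return sum(1 for i, j in combinations(range(n), 2)
               if bottom_row[j] < bottom_row[i])


def sgn(bottom_row):
    """(-1) raised to the inversion count."""
    return -1 if inversion_count(bottom_row) % 2 else 1


sigma = [5, 1, 2, 4, 3]
# List the inverted pairs using the 1-based labels of the text.
inverted = [(i + 1, j + 1) for i, j in combinations(range(5), 2)
            if sigma[j] < sigma[i]]
print(inverted)                 # [(1, 2), (1, 3), (1, 4), (1, 5), (4, 5)]
print(inversion_count(sigma))   # 5
print(sgn(sigma))               # -1
```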
Lemma 6.2. Every transposition is an odd permutation.
This follows from the next problem.

Problem 6.5 Show that r(⟨i, j⟩) = 2|j − i| − 1. (Although looking at examples may help you understand
the formula, don’t just offer an example as a proof! To write a proof, suppose i < j. Now describe exactly
which pairs {k, m} (with k < m) are reversed. It will depend on whether k and m are less than i, equal to i,
between i and j, equal to j, or greater than j. With an accurate description of this type, you can now count
exactly how many pairs there are which will be reversed.)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . d5

Problem 6.6 Let ρ ∈ Sn be
\[
\rho = \begin{pmatrix} 1 & 2 & \cdots & n \\ n & n-1 & \cdots & 1 \end{pmatrix}.
\]
What is r(ρ)?
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . d6

Here is the essential fact about sgn(σ).


Theorem 6.3. For all σ, π ∈ Sn , sgn(σπ) = sgn(σ) sgn(π).
Proof. We divide the distinct pairs {i, j} into four different types.
A = {{i, j} : π reverses {i, j} but σ does not reverse {π(i), π(j)}}
B = {{i, j} : π does not reverse {i, j} but σ does reverse {π(i), π(j)}}
C = {{i, j} : π reverses {i, j} and σ reverses {π(i), π(j)}}
D = {{i, j} : π does not reverse {i, j} and σ does not reverse {π(i), π(j)}}.

3 We have used “{i, j}” instead of “(i, j)” to refer to a pair because we don’t want to distinguish between (i, j) and (j, i), as
the use of parentheses would imply. It’s just the set of two (distinct) values i ≠ j that we are trying to refer to.

Let a = |A|, b = |B|, c = |C|. Clearly
r(π) = a + c.
Also,
r(σ) = b + c.
This is because in counting the pairs {k, l} that σ reverses, we can identify them as k = π(i), l = π(j).
Thirdly
r(σπ) = a + b,
because the pairs in C are reversed by π but then reversed back to their original order by σ, so that the
combined effect of σπ is to preserve their order. We now have

sgn(σ) sgn(π) = (−1)^{b+c} (−1)^{a+c} = (−1)^{a+b} (−1)^{2c} = (−1)^{a+b} = sgn(σπ).
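A brute-force check is no substitute for the proof, but it is a quick way to catch a misreading of the statement. The following sketch tests the identity sgn(σπ) = sgn(σ) sgn(π) for all 576 pairs in S4, assuming permutations are stored as bottom rows (tuples of the values 1, . . . , n); `sgn` and `compose` are hypothetical helper names.

```python
from itertools import combinations, permutations


def sgn(row):
    """Sign of the permutation whose bottom row is `row`."""
    r = sum(1 for i, j in combinations(range(len(row)), 2) if row[j] < row[i])
    return -1 if r % 2 else 1


def compose(sigma, pi):
    """Bottom row of sigma∘pi (pi applied first); entries are 1-based values."""
    return tuple(sigma[pi[i] - 1] for i in range(len(pi)))


S4 = list(permutations(range(1, 5)))
all_multiplicative = all(
    sgn(compose(s, p)) == sgn(s) * sgn(p) for s in S4 for p in S4
)
print(len(S4), all_multiplicative)  # 24 True
```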

Corollary 6.4. For any permutation σ, sgn(σ) = sgn(σ⁻¹).

Problem 6.7 Prove Corollary 6.4.


. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . sgninv

Problem 6.8 Show that σ is an odd permutation if and only if it can be written as the composition of an
odd number of transpositions. (Don’t confuse the term “transposition” with the term “reversal.”)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . d8

C Definition and Basic Properties


Now that we have explained its components, we can state the definition of the determinant of a matrix.
Definition. If A = [aij] is an n × n matrix, its determinant is defined to be
\[
\det(A) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} a_{i\,\sigma(i)}. \tag{6.4}
\]
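Definition (6.4) can be transcribed almost word for word into a short program. This is a minimal sketch for experimenting with small matrices, not an efficient method (it sums n! terms); it assumes a matrix is a list of rows with 0-based indices, and `sgn` and `det` are hypothetical names for this illustration.

```python
from itertools import combinations, permutations
from math import prod


def sgn(row):
    """Sign of a permutation given by its bottom row, via the inversion count."""
    r = sum(1 for i, j in combinations(range(len(row)), 2) if row[j] < row[i])
    return -1 if r % 2 else 1


def det(A):
    """Sum over all permutations s of sgn(s) * prod_i A[i][s(i)], as in (6.4)."""
    n = len(A)
    return sum(sgn(s) * prod(A[i][s[i]] for i in range(n))
               for s in permutations(range(n)))


print(det([[1, 2], [3, 4]]))                   # -2
print(det([[2, 0, 0], [7, 3, 0], [1, 4, 5]]))  # 30
```

The second matrix is lower triangular, and the result is the product 2 · 3 · 5 of its diagonal entries.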

Problem 6.9 Show that for n = 3 the definition agrees with (6.3).
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . dim3

Our goal in the rest of this section is to use the definition to prove basic properties of determinants.
We begin with triangular matrices, for which the determinant is easy to calculate. A matrix T = [tij] is
called lower triangular if tij = 0 whenever i < j. T is called upper triangular if tij = 0 whenever i > j.
Lemma 6.5. If T = [tij] is either lower triangular or upper triangular, then
\[
\det(T) = \prod_{i=1}^{n} t_{ii}.
\]

In other words the determinant of a triangular matrix is simply the product of its diagonal entries. The
proof of this lemma is our first example of a proof based on the definition (6.4). We start by writing down
what the definition of det(T ) is, and then use the assumptions about tij to manipulate it.

Proof. Suppose T is lower triangular. By definition,
\[
\det(T) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} t_{i\,\sigma(i)}.
\]
Observe that if σ ∈ Sn is any permutation other than the identity permutation, then there must exist some
i0 with i0 < σ(i0). (Indeed let i0 be the smallest value such that σ(i0) ≠ i0. It cannot be that σ(i0) < i0
since that would imply that σ(σ(i0)) = σ(i0), contrary to the injectivity of σ. Therefore i0 < σ(i0).) But that
means $t_{i_0\,\sigma(i_0)} = 0$, so that
\[
\prod_{i=1}^{n} t_{i\,\sigma(i)} = 0.
\]
Thus in the formula for det(T) there is only one σ ∈ Sn for which the product can be nonzero, namely the identity
permutation σ = ι. Since sgn(ι) = 1, the expression defining the determinant collapses to
\[
\det(T) = 1 \cdot \prod_{i=1}^{n} t_{i\,\iota(i)} = \prod_{i=1}^{n} t_{ii}.
\]
The upper triangular case is proven in a similar way.


Definition. The n × n identity matrix is the matrix I = [δij] where δij = 1 if i = j and 0 if i ≠ j:
\[
I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & 1 \end{pmatrix}.
\]

Corollary 6.6. If I is the n × n identity matrix, then det(I) = 1.


The transpose of a matrix A = [aij] is the matrix A^T = [a′ij] where a′ij = aji. The rows of A become the
columns of A^T.
Lemma 6.7. det(A^T) = det(A).
Proof. Using the notation just introduced, $a'_{i\,\sigma(i)} = a_{\sigma(i)\,i} = a_{j\,\sigma^{-1}(j)}$, where j = σ(i). As the values of i run
through 1, 2, . . . , n the corresponding values of j will also run through 1, 2, . . . , n, just in a different order
(since σ is a bijection). It follows that
\[
\prod_{i=1}^{n} a'_{i\,\sigma(i)} = \prod_{j=1}^{n} a_{j\,\sigma^{-1}(j)}.
\]
Also observe that as a consequence of Corollary 6.4 we know that sgn(σ⁻¹) = sgn(σ). So we have
\begin{align*}
\det(A^T) &= \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} a'_{i\,\sigma(i)} \\
&= \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma^{-1}) \prod_{j=1}^{n} a_{j\,\sigma^{-1}(j)} \\
&= \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{j=1}^{n} a_{j\,\sigma(j)} \\
&= \det(A).
\end{align*}
The equality between the second and third lines is due to the fact that the set of all possible σ⁻¹ is the same
as the set of all possible σ (part a) of Problem 6.3).

You have probably learned techniques for computing determinants based on manipulating the rows or
columns of a matrix. To justify those techniques is our next goal. We will work in terms of columns because
that is most natural for the discussion of Cramer’s Rule in the next section. It will be helpful to have a
standard notation for the columns of a matrix A = [aij]. The jth column of A is the element of Rⁿ which
we will denote by
\[
\hat a_j = \begin{pmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{nj} \end{pmatrix}.
\]
(Think of the “hat” ˆ as a little pointer reminding us to write it in an up-down orientation.) To save space
we will often type a column vector horizontally with a transpose symbol to remind us to stand it back up as
a column. For instance we will write v̂ = (−1, 5, 7)^T instead of
\[
\hat v = \begin{pmatrix} -1 \\ 5 \\ 7 \end{pmatrix}.
\]
We want to view the determinant as a function D of the n columns of the matrix.

det(A) = D(â1 , â2 , . . . ân ).

Here are the properties of the determinant as a function of the columns.


Theorem 6.8. Consider any collection of column vectors âi , b̂i ∈ Rn .
a) D(â1 , · · · , γâk , · · · ân ) = γD(â1 , · · · , âk , · · · ân ), for any scalar γ ∈ R.

b) D(â1 , · · · , âk + b̂k , · · · ân ) = D(â1 , · · · , âk , · · · ân ) + D(â1 , · · · , b̂k , · · · ân ).
c) sgn(π)D(âπ(1) , · · · , âπ(i) , · · · âπ(n) ) = D(â1 , · · · , âi , · · · ân ), for any permutation π ∈ Sn .
Note that in part c) we could have put the sgn(π) on the other side, because sgn(π) = 1/ sgn(π).

Corollary 6.9. Suppose A is an n × n matrix with columns âi and α is any real number.
a) If two columns duplicate each other, in other words âi = âj for some i ≠ j, then D(â1, · · · , ân) = 0.
b) For any i ≠ j, if we add α times âi to âj the determinant does not change:

D(â1, · · · , âi, · · · , âj + αâi, · · · , ân) = D(â1, · · · , âi, · · · , âj, · · · , ân).

c) det(αA) = αⁿ det(A).
Example 6.1. Before writing a proof of the theorem, here is an example of how these column operations can
be used to evaluate a determinant. Suppose we want to calculate the following determinant.
\[
\det \begin{pmatrix} 0 & 3 & 2 \\ -1 & -6 & 6 \\ 5 & 9 & 1 \end{pmatrix}.
\]
The strategy is to apply the properties of the theorem and corollary to manipulate the matrix into one which
is triangular, so that the determinant will be easy to evaluate.
\begin{align*}
\det \begin{pmatrix} 0 & 3 & 2 \\ -1 & -6 & 6 \\ 5 & 9 & 1 \end{pmatrix}
&= -\det \begin{pmatrix} 3 & 0 & 2 \\ -6 & -1 & 6 \\ 9 & 5 & 1 \end{pmatrix}
 && \text{by permuting the columns using } \pi = \langle 1, 2 \rangle \\
&= -3 \det \begin{pmatrix} 1 & 0 & 2 \\ -2 & -1 & 6 \\ 3 & 5 & 1 \end{pmatrix}
 && \text{by taking a factor of 3 out of the first column} \\
&= -3 \det \begin{pmatrix} 1 & 0 & 0 \\ -2 & -1 & 10 \\ 3 & 5 & -5 \end{pmatrix}
 && \text{by adding } {-2} \text{ times the first column to the third column} \\
&= -3 \det \begin{pmatrix} 1 & 0 & 0 \\ -2 & -1 & 0 \\ 3 & 5 & 45 \end{pmatrix}
 && \text{by adding 10 times the second column to the third column} \\
&= (-3) \cdot 1 \cdot (-1) \cdot 45 \\
&= 135.
\end{align*}
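Each line of this calculation can be checked independently by evaluating the determinant directly from definition (6.4). The sketch below does that; it assumes matrices are lists of rows, and `sgn`, `det`, and `steps` are hypothetical names used only here.

```python
from itertools import combinations, permutations
from math import prod


def sgn(row):
    r = sum(1 for i, j in combinations(range(len(row)), 2) if row[j] < row[i])
    return -1 if r % 2 else 1


def det(A):
    """Determinant from definition (6.4); fine for small matrices."""
    n = len(A)
    return sum(sgn(s) * prod(A[i][s[i]] for i in range(n))
               for s in permutations(range(n)))


# (scalar factor in front, matrix) for each line of the example.
steps = [
    (1,  [[0, 3, 2], [-1, -6, 6], [5, 9, 1]]),
    (-1, [[3, 0, 2], [-6, -1, 6], [9, 5, 1]]),     # columns 1 and 2 swapped
    (-3, [[1, 0, 2], [-2, -1, 6], [3, 5, 1]]),     # factor 3 taken out of column 1
    (-3, [[1, 0, 0], [-2, -1, 10], [3, 5, -5]]),   # -2 * column 1 added to column 3
    (-3, [[1, 0, 0], [-2, -1, 0], [3, 5, 45]]),    # 10 * column 2 added to column 3
]
values = [factor * det(M) for factor, M in steps]
print(values)  # [135, 135, 135, 135, 135]
```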

We turn now to the proof. This proof is a good example of what we said in the introduction. We could
convince ourselves of each property by looking at examples, but to write a proof we need to find a way to
demonstrate the property in general, based on the definition of determinant. To do that it can be
helpful to introduce new notation for some of the other matrices involved in the calculation, like you did in
Problem 6.1.

Proof of Theorem. For part a), let G = [gij] be the matrix with
\[
g_{ij} = \begin{cases} a_{ij} & \text{if } j \neq k \\ \gamma a_{ij} & \text{if } j = k. \end{cases}
\]
Part a) claims that det(G) = γ det(A). To see this just observe that for any permutation σ,
\[
\prod_{i=1}^{n} g_{i\,\sigma(i)} = \gamma \prod_{i=1}^{n} a_{i\,\sigma(i)}.
\]
This is because σ(i) = k occurs exactly once in the product. Multiplying this by sgn(σ) and summing over
all σ proves a).
You will write the proof of b) as Problem 6.10 below.
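Turning to c), let P = [pij] with $p_{ij} = a_{i\,\pi(j)}$. The left side in c) is sgn(π) det(P) because $\hat p_j = \hat a_{\pi(j)}$. For any σ ∈ Sn
we have
\[
\prod_{i=1}^{n} p_{i\,\sigma(i)} = \prod_{i=1}^{n} a_{i\,\pi(\sigma(i))}.
\]
So, using Theorem 6.3, we have
\[
\operatorname{sgn}(\pi) \operatorname{sgn}(\sigma) \prod_{i=1}^{n} p_{i\,\sigma(i)} = \operatorname{sgn}(\pi\sigma) \prod_{i=1}^{n} a_{i\,\pi\sigma(i)}.
\]
Next observe that σ ↦ πσ is a bijection of Sn, so that summing over σ ∈ Sn is equivalent to summing over
πσ ∈ Sn. Therefore
\begin{align*}
\operatorname{sgn}(\pi) D(\hat a_{\pi(1)}, \cdots, \hat a_{\pi(i)}, \cdots, \hat a_{\pi(n)})
&= \operatorname{sgn}(\pi) \det(P) \\
&= \sum_{\sigma \in S_n} \operatorname{sgn}(\pi) \operatorname{sgn}(\sigma) \prod_{i=1}^{n} p_{i\,\sigma(i)} \\
&= \sum_{\sigma \in S_n} \operatorname{sgn}(\pi\sigma) \prod_{i=1}^{n} a_{i\,\pi\sigma(i)} \\
&= \sum_{\beta \in S_n} \operatorname{sgn}(\beta) \prod_{i=1}^{n} a_{i\,\beta(i)} \\
&= D(\hat a_1, \cdots, \hat a_i, \cdots, \hat a_n).
\end{align*}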

(The change from πσ to β between the third and fourth lines is justified by part b) of Problem 6.3.) This
proof depends on the notation a lot! You can’t just gloss over the mathematical symbols and only read the
words. You really need to scrutinize the notation to understand exactly what it is saying, and be sure that
you agree.

Proof of Corollary, part a). Suppose âi = âj with i ≠ j and let π = ⟨i, j⟩. Then the two determinants in c) of the
theorem are the same. Since sgn(π) = −1 (Lemma 6.2), we have

det(A) = − det(A)

which implies that det(A) = 0.

Problem 6.10 Write a proof for part b) of Theorem 6.8. To do this define the matrices C = [cij] and
H = [hij] by
\[
c_{ij} = \begin{cases} a_{ij} & \text{if } j \neq k \\ a_{ij} + b_{ij} & \text{if } j = k \end{cases},
\qquad
h_{ij} = \begin{cases} a_{ij} & \text{if } j \neq k \\ b_{ij} & \text{if } j = k. \end{cases}
\]
The task is to show det(C) = det(A) + det(H). Start by writing down the definition of det(C). The essential
step is to explain why
\[
\prod_{i=1}^{n} c_{i\,\sigma(i)} = \prod_{i=1}^{n} a_{i\,\sigma(i)} + \prod_{i=1}^{n} h_{i\,\sigma(i)},
\]
and then use that to finish showing that det(C) = det(A) + det(H).
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . colsum

Problem 6.11 Prove parts b) and c) of Corollary 6.9.


. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . corbc

Problem 6.12 Suppose C = AB where A, B and C are square matrices of the same size. Explain why
the jth column of C is given by the following formula involving the columns âk of A and the entries of the
jth column of B.
\[
\hat c_j = \sum_{k} b_{kj} \hat a_k.
\]
[Hint: What is the ith component of each side? Use (6.1).]
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . colprod

Problem 6.13 Find the determinants of the following matrices using the manipulations discussed
above.
 
1 −2 0 0 0
3 −7 0 0 0
 
1 0
A= 1 2 0
5 −4 0 1 1
3 2 1 1 1
 
1 0 0 0 0
3 0
 −1 0 0

5 2
B= 0 0 1

−3 0 1 4 0
6 0 4 2 1

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . d10

The next theorem is very important, but it is just a consequence of Theorem 6.8 above.

Theorem 6.10. det(AB) = det(A) det(B).

Proof. Let C = AB. Observe that the multiplication formula $c_{ij} = \sum_k a_{ik} b_{kj}$ can be interpreted as saying
that the columns ĉj of C are obtained from the columns of A as
\[
\hat c_j = \sum_{k} \hat a_k b_{kj}.
\]
(This is Problem 6.12 above.) We want to substitute this for each column in

det(AB) = D(ĉ1, . . . , ĉn)

and simplify using parts a) and b) of Theorem 6.8. Starting with the first column we have
\[
D(\hat c_1, \ldots, \hat c_n) = D\Big( \sum_{k_1} \hat a_{k_1} b_{k_1 1},\ \hat c_2, \ldots, \hat c_n \Big) = \sum_{k_1} b_{k_1 1}\, D(\hat a_{k_1}, \hat c_2, \ldots, \hat c_n).
\]
Next make the replacement $\hat c_2 = \sum_{k_2} \hat a_{k_2} b_{k_2 2}$. This gives
\begin{align*}
D(\hat c_1, \ldots, \hat c_n) &= \sum_{k_1} b_{k_1 1}\, D(\hat a_{k_1}, \hat c_2, \ldots, \hat c_n) \\
&= \sum_{k_1} b_{k_1 1}\, D\Big( \hat a_{k_1},\ \sum_{k_2} \hat a_{k_2} b_{k_2 2},\ \hat c_3, \ldots, \hat c_n \Big) \\
&= \sum_{k_1} \sum_{k_2} b_{k_1 1} b_{k_2 2}\, D(\hat a_{k_1}, \hat a_{k_2}, \hat c_3, \ldots, \hat c_n).
\end{align*}
Continuing this all the way through to ĉn we arrive at the expression
\[
\det(AB) = \sum_{k_1 = 1}^{n} \cdots \sum_{k_n = 1}^{n} \Big[ \prod_{i=1}^{n} b_{k_i i} \Big] D(\hat a_{k_1}, \ldots, \hat a_{k_n}).
\]
Now if any two of the ki are the same, then $D(\hat a_{k_1}, \ldots, \hat a_{k_n}) = 0$ by the corollary. So the only (k1, . . . , kn)
that we need to consider are those which correspond to a permutation: ki = σ(i) for some σ ∈ Sn. Thus by
part c) of Theorem 6.8 and Lemma 6.7 we have
\begin{align*}
\det(AB) &= \sum_{\sigma \in S_n} D(\hat a_{\sigma(1)}, \ldots, \hat a_{\sigma(n)}) \Big[ \prod_{i=1}^{n} b_{\sigma(i)\, i} \Big] \\
&= \sum_{\sigma \in S_n} D(\hat a_1, \ldots, \hat a_n) \operatorname{sgn}(\sigma) \prod_{i=1}^{n} b_{\sigma(i)\, i} \\
&= \det(A) \Big[ \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} b_{\sigma(i)\, i} \Big] \\
&= \det(A) \det(B^T) \\
&= \det(A) \det(B),
\end{align*}
which completes the proof.
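A numerical spot check of the product rule is easy to set up. The following sketch reuses the matrix of Example 6.1 as A and pairs it with an arbitrary matrix B chosen only for illustration; `sgn`, `det`, and `matmul` are hypothetical helper names, and the determinant is computed straight from definition (6.4).

```python
from itertools import combinations, permutations
from math import prod


def sgn(row):
    r = sum(1 for i, j in combinations(range(len(row)), 2) if row[j] < row[i])
    return -1 if r % 2 else 1


def det(A):
    """Determinant from definition (6.4)."""
    n = len(A)
    return sum(sgn(s) * prod(A[i][s[i]] for i in range(n))
               for s in permutations(range(n)))


def matmul(A, B):
    """Product of two n x n matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


A = [[0, 3, 2], [-1, -6, 6], [5, 9, 1]]   # matrix of Example 6.1
B = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]     # arbitrary illustrative matrix
lhs = det(matmul(A, B))
rhs = det(A) * det(B)
print(lhs == rhs)  # True
```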

D Cofactors and Cramer’s Rule


In this section we will develop some of the interesting properties of determinants related to cofactors. You
may have learned how to calculate a determinant using the “cofactor expansion” along some row or column.
You also may have learned a method called “Cramer’s Rule” for solving a system of linear equations using
determinants. This too is related to cofactors.
If we were to write out all the terms of det(A) and collect all of them which include a factor of aij, the
ij-cofactor of A is the quantity Cij which multiplies aij:

det(A) = aij Cij + (· · · terms without aij · · · ).

To find a more explicit expression, let êk denote the column vector with a 1 in the kth position and 0 in all
other positions:
\[
\hat e_k = (0 \cdots 0\ \overset{k}{1}\ 0 \cdots 0)^T. \tag{6.5}
\]
The jth column of A is
\[
\hat a_j = \sum_{i} a_{ij} \hat e_i.
\]
So
\begin{align*}
\det(A) &= D(\hat a_1, \ldots, \hat a_j, \ldots, \hat a_n) \\
&= D\Big( \hat a_1, \ldots, \sum_{i} a_{ij} \hat e_i, \ldots, \hat a_n \Big) \tag{6.6} \\
&= \sum_{i} a_{ij}\, D(\hat a_1, \ldots, \overset{j}{\hat e_i}, \ldots, \hat a_n).
\end{align*}
(The j written above êi indicates that êi occupies the jth position.) Using this, we see that
\[
C_{ij} = D(\hat a_1, \ldots, \overset{j}{\hat e_i}, \ldots, \hat a_n),
\]
the determinant of the matrix that agrees with A except that its jth column has been replaced by êi.
Moreover, the calculation (6.6) proves the following theorem. This is called the cofactor expansion of det(A)
along the jth column.
Theorem 6.11 (Cofactor Expansion). If A is an n × n matrix then, for any choice of column j = 1, . . . , n,
\[
\det(A) = \sum_{i} a_{ij} C_{ij},
\]
where Cij are the cofactors of A.

Problem 6.14
a) Let C′ji be the j, i cofactor of A^T. Explain why C′ji = Cij.
b) Show that the cofactor expansion along any row is valid also: for any i,
\[
\det(A) = \sum_{j} a_{ij} C_{ij}.
\]

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . cofalt

You may have learned to calculate cofactors using the minors of A. If A is n × n, form an (n − 1) × (n − 1)
matrix by deleting the ith row and jth column from A, and take its determinant. The result is called the ij
minor of A, denoted Mij. The proof of the following lemma is Problem 6.33 below.
Lemma 6.12. Cij = (−1)^{i+j} Mij.
Example 6.2. We illustrate cofactor expansions with the same determinant as Example 6.1.
\[
A = \begin{pmatrix} 0 & 3 & 2 \\ -1 & -6 & 6 \\ 5 & 9 & 1 \end{pmatrix}.
\]
We can choose which column to use; we choose to use the first column. The cofactors from the first column
(computed using minors) are

C11 = +[−6 − 54] = −60,
C21 = −[3 − 18] = 15,
C31 = +[18 − (−12)] = 30.

Now we use the formula from Theorem 6.11 for j = 1.
\[
\det(A) = \sum_{i=1}^{3} a_{i1} C_{i1} = 0\, C_{11} + (-1) C_{21} + 5\, C_{31} = 0 \cdot (-60) - 1 \cdot 15 + 5 \cdot 30 = 135.
\]
We could have used any column. The first column is easiest because we really didn’t need to work out C11.
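The same expansion can be reproduced in a few lines of code. This is a minimal sketch that computes cofactors from minors, so it relies on Lemma 6.12; indices are 0-based (which does not change the parity of i + j), and `sgn`, `det`, `minor`, and `cofactor` are hypothetical names for this illustration.

```python
from itertools import combinations, permutations
from math import prod


def sgn(row):
    r = sum(1 for i, j in combinations(range(len(row)), 2) if row[j] < row[i])
    return -1 if r % 2 else 1


def det(A):
    """Determinant from definition (6.4)."""
    n = len(A)
    return sum(sgn(s) * prod(A[i][s[i]] for i in range(n))
               for s in permutations(range(n)))


def minor(A, i, j):
    """Determinant of A with row i and column j deleted (0-based indices)."""
    n = len(A)
    sub = [[A[r][c] for c in range(n) if c != j] for r in range(n) if r != i]
    return det(sub)


def cofactor(A, i, j):
    """C_ij = (-1)^(i+j) M_ij, as in Lemma 6.12."""
    return (-1) ** (i + j) * minor(A, i, j)


A = [[0, 3, 2], [-1, -6, 6], [5, 9, 1]]
first_column = [cofactor(A, i, 0) for i in range(3)]          # C11, C21, C31
expansion = sum(A[i][0] * first_column[i] for i in range(3))
print(first_column)  # [-60, 15, 30]
print(expansion)     # 135
```

Changing the column index 0 to 1 or 2 in the last two assignments gives the expansions along the other columns, which produce the same value.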

Problem 6.15 Use cofactor expansions to calculate the determinants from Problem 6.13.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . cofcalcs

Definition. Suppose A is an n × n matrix. The adjoint of A is the matrix obtained by forming the transpose
of the matrix of cofactors:
Adj(A) = [Cij]^T.
In other words Adj(A) = [ãij] where ãij = Cji.
Example 6.3. To illustrate we return again to the matrix A of Examples 6.1 and 6.2. We already worked
out the cofactors from the first column. The others are (you can check)

C12 = 31     C13 = 21
C22 = −10    C23 = 15
C32 = −2     C33 = 3.

So the matrix of cofactors is
\[
[C_{ij}] = \begin{pmatrix} -60 & 31 & 21 \\ 15 & -10 & 15 \\ 30 & -2 & 3 \end{pmatrix}.
\]
The adjoint is the transpose of this:
\[
\operatorname{Adj}(A) = \begin{pmatrix} -60 & 15 & 30 \\ 31 & -10 & -2 \\ 21 & 15 & 3 \end{pmatrix}.
\]

The adjoint has a remarkable property involving the identity matrix.


Theorem 6.13. For any square matrix A,

Adj(A)A = det(A)I,

where I is the n × n identity matrix.


Proof. Consider any pair i, j; we will calculate the i, j entry of Adj(A)A. Using the notation Adj(A) = [ãij],
the i, j entry of the product is
\[
\sum_{k=1}^{n} \tilde a_{ik} a_{kj} = \sum_{k=1}^{n} a_{kj} C_{ki}. \tag{6.7}
\]
If i = j this is the cofactor expansion of det(A) along the jth column, so yields det(A). Suppose that i ≠ j.
Let B denote the matrix that agrees with A except that its ith column is a copy of the jth column. Observe
that the k, i cofactors of B are the same as for A because the differing column is replaced by êk in calculating
the cofactor. Therefore the cofactor expansion of det(B) along the ith column is
\[
\det(B) = \sum_{k=1}^{n} a_{kj} C_{ki}.
\]
But, according to part a) of Corollary 6.9, det(B) = 0. This means that for i ≠ j the calculation in
(6.7) produces 0. Thus in all cases the i, j entries of Adj(A)A and det(A)I agree, proving the theorem.
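The theorem can be checked on the running example. The sketch below computes each cofactor directly from its definition (replace the jth column by êi and take the determinant), assembles the adjoint, and multiplies; the names `sgn`, `det`, `cofactor`, `adjoint`, and `matmul` are hypothetical, and indices are 0-based.

```python
from itertools import combinations, permutations
from math import prod


def sgn(row):
    r = sum(1 for i, j in combinations(range(len(row)), 2) if row[j] < row[i])
    return -1 if r % 2 else 1


def det(A):
    """Determinant from definition (6.4)."""
    n = len(A)
    return sum(sgn(s) * prod(A[i][s[i]] for i in range(n))
               for s in permutations(range(n)))


def cofactor(A, i, j):
    """C_ij: determinant of A with column j replaced by the unit vector e_i."""
    n = len(A)
    replaced = [[(1 if r == i else 0) if c == j else A[r][c] for c in range(n)]
                for r in range(n)]
    return det(replaced)


def adjoint(A):
    """Transpose of the matrix of cofactors: entry (i, j) is C_ji."""
    n = len(A)
    return [[cofactor(A, j, i) for j in range(n)] for i in range(n)]


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


A = [[0, 3, 2], [-1, -6, 6], [5, 9, 1]]
adjA = adjoint(A)
print(adjA)             # [[-60, 15, 30], [31, -10, -2], [21, 15, 3]]
print(matmul(adjA, A))  # [[135, 0, 0], [0, 135, 0], [0, 0, 135]]
```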
Definition. A square matrix A is called invertible if there exists a matrix B so that BA = I (the identity
matrix). The matrix B is called the inverse of A, and denoted B = A⁻¹.
The determinant of A tells us whether A is invertible or not.
Theorem 6.14. A square matrix A is invertible if and only if det(A) ≠ 0, in which case
\[
A^{-1} = \frac{1}{\det(A)} \operatorname{Adj}(A).
\]
Proof. Suppose A is invertible. There exists B with BA = I. It follows from the product rule (Theorem 6.10) and Corollary 6.6 that
det(B) det(A) = det(I) = 1, so that det(A) ≠ 0. Conversely, suppose det(A) ≠ 0. Let $B = \frac{1}{\det(A)} \operatorname{Adj}(A)$. Theorem 6.13
tells us that BA = I, so that A is indeed invertible.
Notice that our definition of A⁻¹ only says that A⁻¹A = I. The definition does not say that AA⁻¹ = I.
But that is true, as the following problem shows.

Problem 6.16 If BA = I, prove that AB = I also. (Hint: det(B) ≠ 0 implies that there is C with
CB = I. But then explain why CBA must equal both A and C.)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . d13

Cramer’s Rule concerns the solution of systems of linear equations,
\begin{align*}
a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n &= b_1 \\
a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n &= b_2 \\
&\;\;\vdots \\
a_{n1} x_1 + a_{n2} x_2 + \cdots + a_{nn} x_n &= b_n.
\end{align*}
As a single matrix equation, this takes the form

Ax̂ = b̂,

where x̂ and b̂ are the column vectors with xi and bi as entries.

Lemma 6.15 (Cramer’s Rule). Suppose A is an invertible n × n matrix and b̂ ∈ Rⁿ. The equation Ax̂ = b̂
has a unique solution given by x̂ = (x1, . . . , xn)^T, where
\[
x_i = \frac{D(\hat a_1, \ldots, \overset{i}{\hat b}, \ldots, \hat a_n)}{\det(A)}.
\]

Proof. Since det(A) ≠ 0 we know A⁻¹ exists. The equation Ax̂ = b̂ is equivalent to

x̂ = A⁻¹ b̂,

proving the existence of one and only one solution. Using the formula for A⁻¹ from Theorem 6.14 we have
\[
\hat x = \frac{1}{\det(A)} \operatorname{Adj}(A)\, \hat b.
\]
Using our notation Adj(A) = [ãij] again, we have
\begin{align*}
x_i &= \frac{1}{\det(A)} \sum_{k=1}^{n} \tilde a_{ik} b_k \\
&= \frac{1}{\det(A)} \sum_{k=1}^{n} b_k C_{ki} \\
&= \frac{1}{\det(A)}\, D(\hat a_1, \ldots, \overset{i}{\hat b}, \ldots, \hat a_n),
\end{align*}
a cofactor expansion along the ith column justifying the last equality.
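Cramer’s Rule translates directly into a short routine. The following is a minimal sketch using exact rational arithmetic from the standard library; the right-hand side b̂ is an illustrative choice (it equals A applied to (1, 1, 1)^T), and `sgn`, `det`, and `cramer` are hypothetical names. For anything beyond small systems this is far slower than elimination, so treat it as a check of the formula rather than a practical solver.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import prod


def sgn(row):
    r = sum(1 for i, j in combinations(range(len(row)), 2) if row[j] < row[i])
    return -1 if r % 2 else 1


def det(A):
    """Determinant from definition (6.4)."""
    n = len(A)
    return sum(sgn(s) * prod(A[i][s[i]] for i in range(n))
               for s in permutations(range(n)))


def cramer(A, b):
    """Solve A x = b for integer data via x_i = D(a_1, ..., b, ..., a_n) / det(A)."""
    d = det(A)
    if d == 0:
        raise ValueError("Cramer's Rule requires det(A) != 0")
    n = len(A)
    solution = []
    for i in range(n):
        # Replace column i of A by b, leaving the other columns alone.
        Ai = [[b[r] if c == i else A[r][c] for c in range(n)] for r in range(n)]
        solution.append(Fraction(det(Ai), d))
    return solution


A = [[0, 3, 2], [-1, -6, 6], [5, 9, 1]]
b = [5, -1, 15]          # illustrative right-hand side: A applied to (1, 1, 1)
x = cramer(A, b)
print(x)  # [Fraction(1, 1), Fraction(1, 1), Fraction(1, 1)]
```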

Problem 6.17
a) Show that for any invertible n × n matrix A, det(Adj(A)) = det(A)^{n−1}.
b) Show that the formula of part a) is correct even if A is not invertible. [Hint: if det(Adj(A)) ≠ 0, then
Adj(A) would be invertible. From that you can conclude that A is the matrix [0] of all 0s. Write this
out as a proof by contradiction.]

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . d14

Problem 6.18 If A is invertible, find a formula for Adj(Adj(A)) in terms of A, A⁻¹, and det(A).
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . d15

E Linear Independence and Bases


Any vector û ∈ Rⁿ can be written as a linear combination of the standard unit vectors of (6.5):

û = u1 ê1 + · · · + un ên.

In this section we want to talk about doing the same thing with some other set V = {v̂1, . . . , v̂m} of vectors
in Rⁿ. In other words we want v̂1, . . . , v̂m to have the property that for any vector û ∈ Rⁿ it is possible to
obtain û as a linear combination of our v̂j:

û = c1 v̂1 + · · · + cm v̂m

by using some choice of scalars ci ∈ R. Given the set of vectors V there are two basic issues to consider.
The first is whether we really can get all û ∈ Rⁿ in this way. The second is whether we actually need all the
v̂j we started with, or if we might be able to discard some of the v̂j but still be able to recover all û using a
reduced set of v̂j. In other words we want a set V of v̂j which is as small as possible, but still adequate to
reconstruct all other vectors û. The next example will illustrate this.
Example 6.4. Consider R⁴, using

v̂1 = (4, 0, 0, −1)^T, v̂2 = (−3, 0, 1, 0)^T, v̂3 = (1, 1, −1, 0)^T, v̂4 = (1, −1, −1, 1)^T.

Observe that it is impossible to write

(−1, 0, 1, 0)^T = c1 v̂1 + c2 v̂2 + c3 v̂3 + c4 v̂4.

We can see that using the standard “dot” product ⟨·, ·⟩:

⟨(1, 2, 3, 4)^T, (−1, 0, 1, 0)^T⟩ = 2,

while all the v̂i have

⟨(1, 2, 3, 4)^T, v̂i⟩ = 0,

so the dot product of (1, 2, 3, 4)^T with any linear combination of the v̂i is 0, not 2.
On the other hand, if we add an additional vector,

v̂5 = (−1, 0, 1, 0)^T,

then we will be able to reconstruct every vector, and we can always do it without using v̂4. That follows
from Cramer’s Rule, since
D(v̂1, v̂2, v̂3, v̂5) = −2 ≠ 0.
Thus {v̂1, v̂2, v̂3, v̂4, v̂5} is bigger than needed.
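The arithmetic in this example is easy to confirm by machine. The following sketch checks the four dot products, the dot product with the target vector, and that the determinant with columns v̂1, v̂2, v̂3, v̂5 is nonzero; the names `dot`, `sgn`, `det`, and the variables are hypothetical, and the determinant is computed from definition (6.4).

```python
from itertools import combinations, permutations
from math import prod


def sgn(row):
    r = sum(1 for i, j in combinations(range(len(row)), 2) if row[j] < row[i])
    return -1 if r % 2 else 1


def det(A):
    """Determinant from definition (6.4)."""
    n = len(A)
    return sum(sgn(s) * prod(A[i][s[i]] for i in range(n))
               for s in permutations(range(n)))


def dot(x, y):
    """Standard dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(x, y))


v1, v2, v3, v4 = (4, 0, 0, -1), (-3, 0, 1, 0), (1, 1, -1, 0), (1, -1, -1, 1)
v5 = (-1, 0, 1, 0)
w = (1, 2, 3, 4)

dots = [dot(w, v) for v in (v1, v2, v3, v4)]
target = dot(w, v5)

# Build the matrix whose columns are v1, v2, v3, v5 and take its determinant.
columns = [v1, v2, v3, v5]
V = [[col[r] for col in columns] for r in range(4)]
d = det(V)

print(dots, target)  # [0, 0, 0, 0] 2
print(d != 0)        # True
```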
Here are definitions of the concepts we are talking about.
Definition. Suppose v̂1, . . . , v̂m ∈ Rⁿ.
• We say v̂1, . . . , v̂m span Rⁿ if for every û ∈ Rⁿ there exist scalars cj ∈ R so that

û = c1 v̂1 + · · · + cm v̂m.

• We say v̂1, . . . , v̂m are linearly independent if whenever cj ∈ R are such that

0̂ = c1 v̂1 + · · · + cm v̂m

then c1 = . . . = cm = 0.

Problem 6.19 By negating the definition of linear independence, write what it means to say that v̂1, . . . , v̂m
is not linearly independent. [Hint: You will need to write in the implicit “for all” quantifier before forming
the negation to get it right.]
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . lindep

Notice that the definition does not assume that the number m of vectors being considered is the same as
the number of components n in a vector. Secondly, although it may not be apparent, the idea of linear
independence is equivalent to our idea of not being able to do without any of the v̂i. To see this let’s return
to our example.

Example 6.5. With the same v̂1 , . . . , v̂5 as above observe that

(0, 0, 0, 0)T = 1v̂1 + 2v̂2 + 1v̂3 + 1v̂4 + 0v̂5 .

This means that {v̂1 , v̂2 , v̂3 , v̂4 , v̂5 } is not linearly independent, in accord with the definition. In particular
we see that v̂4 can be written in terms of the other v̂i :

v̂4 = −1v̂1 − 2v̂2 − 1v̂3 + 0v̂5 .

We can replace v̂4 by this expression, converting any linear combination of all five v̂i to a linear combination
of just v̂1 , v̂2 , v̂3 , v̂5 :

c1 v̂1 + c2 v̂2 + c3 v̂3 + c4 v̂4 + c5 v̂5 = c1 v̂1 + c2 v̂2 + c3 v̂3 + c4 (−v̂1 − 2v̂2 − v̂3 + 0v̂5 ) + c5 v̂5
= (c1 − c4 )v̂1 + (c2 − 2c4 )v̂2 + (c3 − c4 )v̂3 + c5 v̂5 .

Problem 6.20 Show that if {û, v̂, ŵ} is linearly independent, then {û, û + v̂, û + v̂ + ŵ} is linearly inde-
pendent. (From [14].)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H1

Problem 6.21 Show that {û − v̂, v̂ − ŵ, ŵ − û} is never linearly independent. (From [14].)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H2

The main theorem of this section says that we can test for both of these properties using determinants,
but only when the number m of vectors is the same as the number n of coordinates in each vector in Rⁿ.
Thus when m = n, spanning Rⁿ is equivalent to being linearly independent. (But when m ≠ n they are not
equivalent.)
Theorem 6.16. Suppose v̂1 , . . . , v̂n are n vectors in Rn . The following are equivalent.
a) v̂1 , . . . , v̂n are linearly independent,
b) D(v̂1 , · · · , v̂n ) 6= 0
c) v̂1 , . . . , v̂n span Rn
Proof. We first prove that a) and b) are equivalent. Suppose the v̂i are not linearly independent. That
means there exist scalars ci , not all 0, for which

0̂ = c1 v̂1 + · · · + cn v̂n .

If we let V be the matrix with v̂i as its columns, and ĉ = (c1 , . . . , cn )T , we can rephrase this as saying ĉ
solves
V ĉ = 0̂.
But a second (different) solution is the vector 0̂ of all 0s. According to Cramer’s Rule, this can only be if
det(V ) = D(v̂1 , · · · , v̂n ) = 0.
Conversely assume the determinant in b) is zero: det(V ) = 0 where V is the matrix with v̂i as its columns.
From this we want to produce some ci not all 0 for which ∑_{i=1}^{n} ci v̂i = 0̂. To do this start by taking W = V^T
and ŵi the columns of W (i.e. the rows of V ). By Lemma 6.7

D(ŵ1 , . . . , ŵn ) = det(W ) = det(V ) = 0.

Pick a maximal subset of the ŵi such that they can be complemented with some additional vectors ûj to
produce a nonzero determinant: after renumbering and rearranging,

D(ŵ1 , . . . , ŵk , ûk+1 , . . . , ûn ) ≠ 0.

By hypothesis, k < n. Let ci = C_{in} be the cofactors of this determinant along the last column. The above
determinant tells us that 0 ≠ ∑_i ci u_{in}. Therefore the cofactors ci cannot all be 0. These will be the ci that we
seek, but we still need to explain why they do what we want.
If we replace ûn (last column) in the above determinant by any of the ŵi the determinant is 0; for
i = 1, . . . , k this is because of a repeated column; for i = k + 1, . . . , n this is because of the maximality of
{ŵ1 , . . . ŵk }. Thus, for each j we have
\[ \sum_{i=1}^{n} c_i w_{ij} = 0. \]

This is the same as saying


\[ \sum_{i=1}^{n} c_i v_{ji} = 0 \quad \text{for all } j, \]

which in turn is equivalent to


\[ \sum_{i=1}^{n} c_i \hat{v}_i = \hat{0}. \]

Since the ci are not all 0, this means that the v̂i are not independent. This completes the proof of the
equivalence of a) and b).
To prove the equivalence of b) and c), observe that Cramer’s Rule says that b) implies that for any û
there exist ci for which
\[ \sum_{i=1}^{n} c_i \hat{v}_i = \hat{u}. \]

In other words, b) implies c).


Finally assume c), namely that the v̂i span Rn . In particular, for each j there are some coefficients cij
so that
êj = c1j v̂1 + c2j v̂2 + · · · + cnj v̂n . (6.8)
The left side is the j th column of the identity matrix I, and the right side is the j th column of the product
V C, where C = [cij ] is the matrix assembled from all the cij values. In other words (6.8) can be restated as

I = V C.

According to Theorem 6.14, this implies that det(V ) ≠ 0. Since det(V ) = D(v̂1 , · · · , v̂n ) this proves that c)
implies b).
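
The determinant test of Theorem 6.16 is easy to try out by machine. The following is a minimal sketch, not part of the original text: the helper names det and independent are our own, and the determinant is computed by Gaussian elimination over exact fractions rather than by the permutation formula used in this chapter. It checks one particular instance of each of Problems 6.20 and 6.21, taking û, v̂, ŵ to be the standard basis of R3; a single instance illustrates those problems but of course does not prove them.

```python
from fractions import Fraction

def det(rows):
    """Determinant by Gaussian elimination over exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result  # swapping two rows flips the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            # Subtracting a multiple of one row from another leaves det unchanged.
            m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return result

def independent(vectors):
    """Test of Theorem 6.16 for n vectors in R^n: nonzero determinant."""
    n = len(vectors)
    assert all(len(v) == n for v in vectors), "need n vectors in R^n"
    # The vectors are used as rows here; det(V) = det(V^T), so this is harmless.
    return det(vectors) != 0

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

u, v, w = (1, 0, 0), (0, 1, 0), (0, 0, 1)
print(independent([u, add(u, v), add(add(u, v), w)]))  # True
print(independent([sub(u, v), sub(v, w), sub(w, u)]))  # False
```

Exact fractions are used so that the test "determinant is nonzero" is not disturbed by rounding error.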

Problem 6.22 Use the above theorem to prove the following.


a) If m > n then v1 , . . . , vm ∈ Rn are not linearly independent.
b) If m < n then v1 , . . . , vm ∈ Rn do not span Rn .

[Hint: add some extra 0s so that the theorem can be applied, but be careful to explain why adding the 0s
does not change the spanning or linear independence that you are trying to prove.]
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . LI

F The Cayley-Hamilton Theorem


In this last section we will use Theorem 6.13 one more time to prove another famous theorem, the Cayley-
Hamilton Theorem, which involves determinants, matrices, and polynomials.
The most familiar way to associate a polynomial with a matrix A is to form its characteristic polynomial.
(You have probably encountered the characteristic polynomial of a matrix before; it is used to compute the
eigenvalues of A for instance.)

Definition. Suppose A is an n × n matrix. Its characteristic polynomial is

p(x) = det(xI − A).

In other words, we form the matrix xI − A where x is a variable, and then compute its determinant,
which will be an expression involving that variable.
Example 6.6. For
\[ A = \begin{bmatrix} 3 & 5 \\ 2 & 4 \end{bmatrix}, \]
the characteristic polynomial is
\[ p(x) = \det \begin{bmatrix} x - 3 & -5 \\ -2 & x - 4 \end{bmatrix} = (x - 3)(x - 4) - (-2)(-5) = x^2 - 7x + 2. \]

Problem 6.23 Explain why p(x) is always a polynomial of degree n (same size as A) with leading coefficient
of 1:
\[ p(x) = x^n + \text{lower order powers}. \]

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . CPdeg

The Cayley-Hamilton Theorem involves substituting x = A in the characteristic polynomial. It is pretty
clear what we should mean by powers of a matrix: A^2 = A · A, A^3 = A · A · A, and so forth. What we
want to do is substitute the powers of A for the corresponding powers of x in the characteristic polynomial.
We interpret the constant term as a constant times x^0 = 1, and replace the x^0 by the matrix A^0 = I (the
identity matrix).
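Example 6.7. Continuing with Example 6.6,
\[ p(x) = x^2 - 7x^1 + 2x^0 \]
\begin{align*}
p(A) &= A^2 - 7A + 2I \\
     &= \begin{bmatrix} 19 & 35 \\ 14 & 26 \end{bmatrix} - 7 \begin{bmatrix} 3 & 5 \\ 2 & 4 \end{bmatrix} + 2 \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \\
     &= \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.
\end{align*}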
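
The arithmetic of Example 6.7 can be replayed in a few lines of code. This is only an illustrative sketch; the function names matmul and poly_at_matrix are our own and do not appear elsewhere in the text. The coefficient list is ordered from the constant term upward, and the constant term multiplies A^0 = I exactly as described above.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def poly_at_matrix(coeffs, A):
    """Evaluate b_0 I + b_1 A + ... + b_n A^n, where coeffs = [b_0, ..., b_n]."""
    n = len(A)
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # A^0 = I
    total = [[0] * n for _ in range(n)]
    for b in coeffs:
        total = [[total[i][j] + b * power[i][j] for j in range(n)]
                 for i in range(n)]
        power = matmul(power, A)  # move on to the next power of A
    return total

A = [[3, 5], [2, 4]]
p = [2, -7, 1]  # p(x) = 2 - 7x + x^2, the characteristic polynomial from Example 6.6

print(matmul(A, A))          # [[19, 35], [14, 26]]
print(poly_at_matrix(p, A))  # [[0, 0], [0, 0]]
```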

Our theorem says that what happened in this example is not a coincidence!

Theorem 6.17 (Cayley-Hamilton). If A is an n × n matrix and its characteristic polynomial is
\[ p(x) = \det(xI - A) = \sum_{i=0}^{n} b_i x^i, \]
then
\[ p(A) = \sum_{i=0}^{n} b_i A^i = [0], \text{ the zero matrix.} \]

Problem 6.24 Let A be the matrix
\[ A = \begin{bmatrix} 1 & 3 & 5 \\ 1 & 1 & 2 \\ 1 & 0 & -1 \end{bmatrix}. \]

a) Calculate Adj(A) and check that the identity of Theorem 6.13 does hold by calculating both sides.

b) Calculate the characteristic polynomial p(x) for A, and verify the conclusion of the Cayley-Hamilton
Theorem by calculating p(A).

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . adjex

The Cayley-Hamilton Theorem is often proved using what is called the Jordan canonical form of a matrix,
which you probably have not seen before (and will not see here either). However there is a really nice proof
based on Theorem 6.13. Here is the idea. Let p(x) = det(xI − A) be the characteristic polynomial of A. We
know from Theorem 6.13 that
Adj(xI − A) · (xI − A) = p(x)I, (6.8)
holding for each value of x ∈ R. If we just formally⁴ plug A in for x on both sides, the right side is
p(A)I = p(A) and the left side is [0], because of the term (xI − A). So this formal manipulation seems to
tell us right away that [0] = p(A), which is exactly what we want to prove! But we need to be careful. We
know what we mean by substituting A for x in p(x). But the Adj(xI − A) on the left is something more
complicated — it is a matrix with the variable x and its powers appearing in the various entries. Does it
make any sense to plug x = A into that? Our job is to see if we can find a careful way to get from (6.8) to
our desired conclusion that p(A) = [0].
There are two ways we can think of Adj(xI − A). The most natural is to think of it as a matrix with
polynomials as its entries. But there is a second point of view: if we collect up the common powers of x we
can think of it as a polynomial with matrix coefficients.
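Example 6.8. Suppose
\[ A = \begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & -1 \\ 1 & 0 & 1 \end{bmatrix}. \]
With a little work we find that
\[ \operatorname{Adj}(xI - A) = \begin{bmatrix} x^2 - 5x + 4 & 3x - 3 & 5x - 23 \\ 2x - 3 & x^2 - 2x - 4 & -x + 11 \\ x - 4 & 3 & x^2 - 5x - 2 \end{bmatrix}. \]

This is what we mean by a matrix with polynomial entries. But by collecting powers of x we can write it as
\[ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} x^2 + \begin{bmatrix} -5 & 3 & 5 \\ 2 & -2 & -1 \\ 1 & 0 & -5 \end{bmatrix} x + \begin{bmatrix} 4 & -3 & -23 \\ -3 & -4 & 11 \\ -4 & 3 & -2 \end{bmatrix}. \]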

This is what we mean by a polynomial with matrix coefficients. (We have written the matrix in front of the
powers of x, as is customary for coefficients.)
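
As a sanity check on Example 6.8, we can evaluate the polynomial with matrix coefficients at a few integer values of x and confirm that multiplying by xI − A gives det(xI − A) times the identity. The sketch below is our own illustration: the names C2, C1, C0, det3 and check are not from the text, and testing finitely many values of x is only a spot check, not a proof.

```python
A = [[1, 3, 5], [2, 4, -1], [1, 0, 1]]

# Matrix coefficients of Adj(xI - A) = C2 x^2 + C1 x + C0, copied from Example 6.8.
C2 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
C1 = [[-5, 3, 5], [2, -2, -1], [1, 0, -5]]
C0 = [[4, -3, -23], [-3, -4, 11], [-4, 3, -2]]

def matmul(X, Y):
    """Product of two 3 x 3 matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def det3(M):
    """Determinant of a 3 x 3 matrix by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def check(x):
    """Does Adj(xI - A) (xI - A) equal det(xI - A) I for this value of x?"""
    adj = [[C2[i][j] * x * x + C1[i][j] * x + C0[i][j] for j in range(3)]
           for i in range(3)]
    xI_minus_A = [[(x if i == j else 0) - A[i][j] for j in range(3)]
                  for i in range(3)]
    p = det3(xI_minus_A)
    expected = [[p if i == j else 0 for j in range(3)] for i in range(3)]
    return matmul(adj, xI_minus_A) == expected

print(all(check(x) for x in range(-5, 6)))  # True
```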
If we consider both sides of (6.8) as polynomials with matrix coefficients then at least we know how we
would go about plugging A in for x. But if (6.8) is true for each real number x why should it be true when
x is replaced by something other than a real number, like A? That is the issue the proof ultimately must
answer.
Our proof of the Cayley-Hamilton Theorem is based on a generalization of Lemma 5.1: if two such
polynomials produce the same (matrix) value for each value of the variable x (scalar), then in fact the
coefficient matrices must agree. In case we are uncertain about that, we present it as a lemma.
Lemma 6.18. Suppose Bk , Ck are n × n matrices for each k = 0, 1, 2, . . . , m, and that for every x ∈ R
\[ \sum_{k=0}^{m} C_k x^k = \sum_{k=0}^{m} B_k x^k \quad \text{(as matrices).} \]

Then Ck = Bk for each k = 0, . . . , m.


⁴ By “formally” we mean just manipulating the symbols without thinking about whether what we are doing really makes any sense.

Proof. Express both sides as matrices of polynomials:
\[ \sum_{k=0}^{m} C_k x^k = [c_{ij}(x)], \qquad \sum_{k=0}^{m} B_k x^k = [b_{ij}(x)], \]
where each cij (x) and bij (x) is a polynomial in x (of degree at most m). The hypothesis is that for each
value of x, the matrices [cij (x)] and [bij (x)] are equal, and therefore

cij (x) = bij (x) for all pairs i, j.

By Lemma 5.1 this means the (scalar) coefficients of cij (x) and bij (x) agree. The coefficients of x^k in these two polynomials
are the i, j entries of Ck and Bk respectively. It follows then that for each k, all entries of Ck and Bk agree.
Thus Ck = Bk .

We are ready now for our proof, taken from [19].


Proof of Cayley-Hamilton Theorem. Let the characteristic polynomial be
\[ p(x) = \det(xI - A) = \sum_{k=0}^{n} b_k x^k. \]

So the matrix coefficients of p(x)I from the right side of (6.8) are bk I. Let Ci be the matrix coefficients of
Adj(xI − A):
\[ \operatorname{Adj}(xI - A) = \sum_{i=0}^{n-1} C_i x^i. \]

(Do you see why it has degree at most n − 1 ?) We can now work out the coefficients of the full left side of
(6.8).

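\begin{align*}
\operatorname{Adj}(xI - A) \cdot (xI - A) &= (C_0 x^0 + \dots + C_{n-2} x^{n-2} + C_{n-1} x^{n-1})(xI - A) \\
  &= C_0 x^1 + \dots + C_{n-2} x^{n-1} + C_{n-1} x^n - (C_0 + C_1 x + \dots + C_{n-1} x^{n-1}) A \\
  &= -C_0 A + (C_0 - C_1 A) x + \dots + (C_{n-2} - C_{n-1} A) x^{n-1} + C_{n-1} x^n.
\end{align*}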

Now that we have worked out the matrix coefficients of both sides we can apply Lemma 6.18 to (6.8), to
deduce that

\begin{align*}
b_0 I &= -C_0 A \\
b_1 I &= C_0 - C_1 A \\
      &\;\;\vdots \\
b_{n-1} I &= C_{n-2} - C_{n-1} A \\
b_n I &= C_{n-1}.
\end{align*}

We now evaluate p(A), writing b_i A^i = (b_i I)A^i and using the preceding formulas.
\begin{align*}
p(A) &= b_0 I + b_1 A^1 + \dots + b_{n-1} A^{n-1} + b_n A^n \\
     &= b_0 I + b_1 I A^1 + \dots + b_{n-1} I A^{n-1} + b_n I A^n \\
     &= -C_0 A + (C_0 - C_1 A) A + \dots + (C_{n-2} - C_{n-1} A) A^{n-1} + C_{n-1} A^n \\
     &= [0],
\end{align*}

because all terms on the right cancel.
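
To see the cancellation concretely, here is a small sketch that runs through the steps of the proof for the 2 × 2 matrix A of Example 6.6. The coefficients C0 and C1 of Adj(xI − A) = C0 + C1 x were worked out by hand for this illustration and are not given in the text; the helper names are likewise our own. The code first confirms the identities obtained from Lemma 6.18 and then substitutes their right sides into p(A).

```python
A = [[3, 5], [2, 4]]
I = [[1, 0], [0, 1]]
b = [2, -7, 1]  # p(x) = 2 - 7x + x^2, from Example 6.6

# Assumed for this illustration: Adj(xI - A) = [[x - 4, 5], [2, x - 3]] = C[0] + C[1] x.
C = [[[-4, 5], [2, -3]], I]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(2)] for i in range(2)]

def scale(c, X):
    return [[c * X[i][j] for j in range(2)] for i in range(2)]

# Right sides of the identities b0 I = -C0 A, b1 I = C0 - C1 A, b2 I = C1.
rhs = [
    scale(-1, matmul(C[0], A)),
    add(C[0], scale(-1, matmul(C[1], A))),
    C[1],
]
print(all(rhs[k] == scale(b[k], I) for k in range(3)))  # True

# p(A) = sum of (b_k I) A^k, with each b_k I replaced by the matching right side.
total = [[0, 0], [0, 0]]
power = I
for k in range(3):
    total = add(total, matmul(rhs[k], power))
    power = matmul(power, A)
print(total)  # [[0, 0], [0, 0]]
```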

Problem 6.25 Suppose that the characteristic polynomial factors as p(x) = ∏_{i=1}^{n} (x − λi ). (By Theorem 5.10
it always does if we allow complex λi . The λi are called the eigenvalues of the matrix A.) Prove
that det(A) = ∏_{i=1}^{n} λi .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ev

Problem 6.26 We can define what we mean for a collection of n × n matrices A, B, . . . to be linearly
independent by analogy with the definition for vectors on page 127: the collection is linearly independent if
cA A + cB B + · · · = [0] (the matrix of all 0s)
is only true for cA = cB = · · · = 0. Use the Cayley-Hamilton Theorem to explain why I, A, A^2 , . . . , A^n will
never be linearly independent.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . pow

Additional Problems

Problem 6.27 Our definition of sgn(σ) used the natural order relation “<” on the integers. Suppose we
used a different ordering of the integers. Would that lead to a different notion of sgn(σ)? If ◁ is a different
order relation there is a permutation γ connecting it to the natural order in the sense that
i ◁ j if and only if γ(i) < γ(j).
(Take that for granted.) Suppose we defined the inversion count of σ using ◁ instead of <:
r_◁(σ) = |{{i, j} : 1 ≤ i ◁ j ≤ n and σ(j) ◁ σ(i)}| .
Prove that we would still get the same result for sgn(σ) = (−1)^{r_◁(σ)} as before. Thus the definition of sgn(σ)
is actually independent of the choice of order relation. (Hint: Relate this to γσγ^{−1} and use Theorem 6.3.)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ordind

Problem 6.28 For a permutation π ∈ Sn define the permutation matrix to be Pπ = [pij ] where
\[ p_{ij} = \begin{cases} 1 & \text{if } i = \pi(j) \\ 0 & \text{otherwise.} \end{cases} \]

Show that permutation matrices have the following properties.

a) Pπ Pσ = Pπσ , for any two permutations π and σ.
b) (Pπ)^{−1} = P_{π^{−1}} = (Pπ)^T .
c) det(Pπ ) = sgn(π).
d) If A = [â1 , . . . , âj , . . . , ân ], then APπ = [âπ(1) , . . . , âπ(j) , . . . , âπ(n) ]. In other words, right multiplication
by Pπ has the effect of reordering the columns of A according to the permutation π.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . permmat

Problem 6.29 Suppose A♯ denotes the matrix resulting from rotating A one quarter turn in the counterclockwise
direction. For instance, if
\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}, \quad \text{then} \quad A^{\sharp} = \begin{bmatrix} 3 & 6 & 9 \\ 2 & 5 & 8 \\ 1 & 4 & 7 \end{bmatrix}. \]
What is the relationship between det(A) and det(A♯) (in the general n × n case)? [Hint: Problem 6.6 might
be useful.]
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . d11

Problem 6.30 Let x1 , x2 , . . . , xn be real numbers. The following determinant is called the Vandermonde
determinant:
\[ V_n = \det \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\ 1 & x_2 & x_2^2 & \cdots & x_2^{n-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^{n-1} \end{bmatrix}. \]
Show that its value is given by the following formula:
\[ V_n = \prod_{1 \le i < j \le n} (x_j - x_i). \]

First check the formula directly for n = 2 and n = 3, then prove the general case by induction. Hint: For
the induction, multiply each column by x1 and subtract it from the column to its right, starting from the
right side and working your way to the left. Taking the determinant that way should give you something
like
Vn = (xn − x1 ) . . . (x2 − x1 )Vn−1 .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vdm
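
Before attempting the induction in Problem 6.30 it can be reassuring to test the formula on some numbers. The following sketch compares the two sides for one choice of x1 , . . . , xn . The function names are our own, and the determinant is computed by cofactor expansion along the first row, which is fine for small n but far too slow for large matrices. A numerical check like this is evidence, not a proof.

```python
from itertools import combinations

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        # Delete row 0 and column j to form the minor.
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def vandermonde(xs):
    """Row i is 1, x_i, x_i^2, ..., x_i^(n-1)."""
    n = len(xs)
    return [[x ** k for k in range(n)] for x in xs]

def product_formula(xs):
    """Product of (x_j - x_i) over all pairs of positions i < j."""
    result = 1
    for i, j in combinations(range(len(xs)), 2):
        result *= xs[j] - xs[i]
    return result

xs = [2, 3, 5, 7]
print(det(vandermonde(xs)), product_formula(xs))  # 240 240
```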

Problem 6.31 Explain what Problem 6.30 has to do with the fact that if we are given n + 1 points (xi , yi )
in the plane with distinct xi then there is only one polynomial p(x) of degree at most n with yi = p(xi ).
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vdm2

Problem 6.32 Find and prove a formula (of the same type as in Problem 6.30 above) for this determinant:
\[ \det \begin{bmatrix} x_1 & x_1 & x_1 & \cdots & x_1 \\ x_1 & x_2 & x_2 & \cdots & x_2 \\ x_1 & x_2 & x_3 & \cdots & x_3 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_1 & x_2 & x_3 & \cdots & x_n \end{bmatrix}. \]
Under what circumstances is the determinant nonzero?
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . d12

Problem 6.33 Prove Lemma 6.12. Like many proofs in this chapter, this mostly requires finding a clear
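way to write down the steps involved. To get you started, C_{kℓ} = det(B), where B = [bij ] with
\[ b_{ij} = \begin{cases} 0 & \text{if } i \ne k,\ j = \ell \\ 1 & \text{if } i = k,\ j = \ell \\ a_{ij} & \text{if } j \ne \ell \end{cases} \]
and M_{kℓ} = det F where F = [fij ] is the (n − 1) × (n − 1) matrix with
\[ f_{ij} = \begin{cases} a_{i\,j} & \text{if } i < k \text{ and } j < \ell \\ a_{i\,j+1} & \text{if } i < k \text{ and } \ell \le j \le n - 1 \\ a_{i+1\,j} & \text{if } k \le i \le n - 1 \text{ and } j < \ell \\ a_{i+1\,j+1} & \text{if } k \le i \le n - 1 \text{ and } \ell \le j \le n - 1. \end{cases} \]
Finally, let H = [hij ] where hij = b_{α(i) β(j)}, where α and β are the permutations
\[ \alpha = \langle k, k + 1, \dots, n \rangle, \qquad \beta = \langle \ell, \ell + 1, \dots, n \rangle. \]
F and A are related to each other by means of the same permutations α and β. Put these pieces together,
showing that
\[ C_{k\ell} = \det(B) = \operatorname{sgn}(\alpha) \operatorname{sgn}(\beta) \det(H) = \operatorname{sgn}(\alpha) \operatorname{sgn}(\beta) \det(F) = \operatorname{sgn}(\alpha) \operatorname{sgn}(\beta) M_{k\ell}. \]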

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . minors

Problem 6.34 The Cayley-Hamilton Theorem tells us that the characteristic polynomial of an n × n
matrix A, p(x) = det(xI − A), has the property that p(A) = 0. But there are other polynomials with this
property. Prove the following.

1. If q(x) is another polynomial and p(x) divides q(x), then q(A) = 0.


2. Any two nonzero polynomials q(x) of smallest possible degree with q(A) = [0] are constant multiples of each other.
So there exists a unique polynomial m(x) of smallest possible degree with the property that m(A) = 0
and with leading coefficient 1. This m(x) is called the minimal polynomial of A.

3. Find both the characteristic and minimal polynomials for each of the following matrices.
(a) A = I, the n × n identity matrix.
(b) \[ A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \]
(c) \[ A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}. \]

4. Let m(x) be the minimal polynomial of A and q(x) any other polynomial. Prove that q(A) = 0 iff
m(x) divides q(x). [Hint: Use the division theorem.]

5. The minimal polynomial must divide the characteristic polynomial.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . minpoly

Appendix: Mathematical Words

Here is a list of some words and phrases that are common in mathematical writing but which have a somewhat
different meaning than in everyday usage. They are ordered so that those with similar or related meanings
are close to each other.
theorem — A theorem is a statement of fact that the author wants to highlight as one of the most important
conclusions of his whole discussion, something he intends his readers will remember and refer back to
even after they have forgotten the details of the proof. It’s the way we record major landmarks
of mathematical learning. Examples are the Pythagorean Theorem and the Fundamental Theorem of
Algebra. Sometimes they end up with people’s names attached to them, like Cantor’s Theorem,
Rolle’s Theorem or Fermat’s Last Theorem.
lemma — A lemma is like a little theorem, which is important for the paper in which it appears, but
probably won’t have much importance beyond that. Often it is just a tool developed in preparation
for proving the big theorem the author has in mind.
proposition — We have used “proposition” to stand for any statement with an unambiguous mathematical
meaning. A stated Proposition (with a capital P, like Proposition 1.3) is about the same as a lemma,
perhaps even of slightly less lasting importance.

corollary — A corollary is an easily obtained consequence of a theorem or lemma that precedes it. See for
example the corollary to Lemma 1.8.
claim — In a proof we sometimes announce an assertion that we are going to prove before we prove it.
This can help the reader know where we are headed. We might write “We claim that P . To see this
. . . ” It means the same as if we said “We are about to prove P. The proof of P is as follows.” See for
example the proof of Theorem 3.7.
conjecture — A conjecture is a statement of something the author thinks is probably true, but doesn’t know
how to prove. A published conjecture is often taken as a challenge to the mathematical community
to come up with a proof (or disprove it). An example is the Twin Primes Conjecture, described in
Section E.

thus, therefore, hence — These words mean that what came before provides the reason for what comes
next. For instance, “. . . x = (. . .)² , therefore x ≥ 0.”
it follows that — Sort of like “thus” and “therefore” except that the reasons for what comes next may not
be what immediately precedes “it follows that.” The way most people use it, it means “Somewhere in
what we know and have assumed are reasons for what comes next, but we are not going to spell them
all out. You can find them for yourself.”
we deduce — This is about the same as “it follows that.”
we have — This means “we know” or “we have established the truth of.” It sometimes serves as a reminder
of things from earlier in an argument.

obviously — This means the author thinks that what is said next is so clear that no explanation is needed.
Sometimes it means that the next assertion is tedious to prove and the author just doesn’t want to
be bothered with it. “Clearly,” and “it is easy to see” are used the same way. The phrase “as the
reader can check” is pretty close as well, but with that phrase the author is acknowledging that there
is something to be worked out, but which is deliberately left out. For veteran mathematicians writing
for each other, these are often used to shorten a proof by leaving out parts with the confidence that
the reader can fill them in on his/her own. As a student, be very careful about resorting to these. Are you
positive you could prove what you are leaving out? If not, put it in!
since — This typically announces the reason for what comes next: “Since x = (. . .)² we know x ≥ 0.”
Sometimes we reverse the order: “We know x ≥ 0, since x = (. . .)² .”

let — This is used several ways. Sometimes it announces notation, as in “let A be a 3×3 matrix.” Sometimes
it announces a hypothesis for the next phase of the argument. For instance “let n be even . . . ;” and
then later “let n be odd.” It might specify the value of a variable to be used in the next part of a
calculation, “let x = 3, then . . . ”
given — This is sometimes used to refer to something we know from hypotheses or an assumption. We used
it this way in the paragraph just after the proof of Theorem 1.9, “given our understanding of infinite
...”
suppose — To suppose something is to assume it. We often say suppose as a way of announcing that we
are invoking the hypotheses. See the first sentence of the proof of Theorem 1.2 for instance.
provided — This is a way of identifying some additional assumption that is needed for what was just said,
or is about to be. For instance, “x² is always nonnegative, provided x is a real number.”
prove, show — These mean roughly the same thing. Many would say that to show something, means to
describe the main idea of a proof without being quite as complete as a full-dress proof, a sort of
informal or lazy-man’s proof.

w.l.o.g — An abbreviation for “without loss of generality.” Sometimes something we wish to prove can
be reduced to a special case. We might say “without loss of generality” to announce the special case,
and the reader is meant to understand that the general case can be reduced to the special case actually
being considered. Our proof of the Fundamental Theorem of Algebra uses this technique.
such that — This means “with the property that;” it announces a property that is required of the object
being described. For example, “√2 refers to the positive real number y such that y² = 2.”
a, the — When referring to some mathematical object we use “the” when there is only one such object,
and “a” when there could be several to choose from. If you wrote “let r be the root of x² − 4 = 0,”
the reader would take that as a claim that there is only one such root (and complain because in fact
there is more than one). If you wrote “let r be a root of x² − 4 = 0,” then the reader would understand that
either of the two possibilities r = 2, −2 may be chosen, and would proceed with either one. (What you
say afterward had better work for both of them!)
moreover — This is used to continue with some additional conclusions in a new sentence. See Step 4 of
the proof of the Fundamental Theorem of Algebra in Chapter 5.
indeed — This is often used to give reasons for an assertion that was just made with no justification. See
the beginning of the proof of the Fundamental Theorem of Algebra.

Appendix: The Greek Alphabet and Other Notation

We use symbols and special notation a lot in mathematics. There are many special symbols, like ∫, ∂, ≤, ÷, ∞, . . .
that have no meaning outside mathematics. But we also use conventional letters. Since the standard Latin
alphabet (a, b, . . . , z) is not enough, we also use capitals, and sometimes other typefaces (A, A, A). We use a
lot of Greek characters, and a few from other alphabets (the symbol ℵ (aleph) is Hebrew, for instance). Below
is a table of those Greek letters that are used in mathematics. (Those in the table are those supported by the
standard mathematical typesetting language LaTeX, with which this book is written.) Some Greek letters are
not used because they look too much like Latin characters. For instance, an uppercase alpha is indistinguishable
from A, the lower case upsilon is very close to the italic Latin v, and both cases of omicron look just like our
o (or O). To further extend the list of available symbols we add accents, like a′, α̃, x̄, f̂.
Some symbols are customary for certain purposes. For instance the letters i, j, k, l, m, n are often used when
only integer values are intended. Similarly, z is sometimes used for complex numbers, as opposed to real
numbers. Such conventions are another way we limit the possible scope of what we say, so that the meaning is
unambiguous. However these conventions are never universal.

spelling    l. case    u. case
alpha       α
beta        β
gamma       γ          Γ
delta       δ          ∆
epsilon     ε
zeta        ζ
eta         η
theta       θ          Θ
iota        ι
kappa       κ
lambda      λ          Λ
mu          µ
nu          ν
xi          ξ          Ξ
omicron
pi          π          Π
rho         ρ
sigma       σ          Σ
tau         τ
upsilon                Υ
phi         φ          Φ
psi         ψ          Ψ
chi         χ
omega       ω          Ω

Bibliography

[1] D. Acheson, 1089 and All That: A Journey into Mathematics, Oxford Univ. Press, NY, 2002.
[2] J. L. Brown Jr., Zeckendorf ’s Theorem and Some Applications, Fibonacci Quarterly vol. 2 (1964),
pp. 163–168.
[3] Margherita Barile, Curry Triangle, from MathWorld–A Wolfram Web Resource, created by Eric W.
Weisstein, http://mathworld.wolfram.com/CurryTriangle.html.
[4] Klaus Barner, Paul Wolfskehl and the Wolfskehl prize, Notices of the AMS vol. 44 no. 10 (1997),
pp. 1294–1303.
[5] Edward J. Barbeau, Mathematical Fallacies, Flaws and Flimflam, The Mathematical Associ-
ation of America, 2000.
[6] A. T. Benjamin and J. J. Quinn, Proofs That Really Count: The Art of Combinatorial
Proof, The Mathematical Association of America, 2003.
[7] R. H. Cox, A proof of the Schroeder-Bernstein Theorem, The American Mathematical Monthly v. 75
(1968), p. 508.
[8] Keith Devlin, Mathematics, The New Golden Age, Columbia Univ. Press, NY, 1999.
[9] P. J. Eccles, An Introduction to Mathematical Reasoning: Numbers, Sets and Functions,
Cambridge Univ. Press, Cambridge, UK, 1997.
[10] P. Fletcher and C. W. Patty, Foundations of Higher Mathematics, Brooks/Cole Publishing,
Pacific Grove, CA, 1996.
[11] P. R. Halmos, Naive Set Theory, Springer-Verlag, New York, 1974.
[12] Leonard Gillman, Writing Mathematics Well: a Manual for Authors, Mathematical Associ-
ation of America, 1987.
[13] G. H. Hardy, A Mathematician’s Apology, Cambridge University Press, London, 1969.
[14] Jim Hefferon, Linear Algebra, http://joshua.smcvt.edu/linearalgebra/linalg.html.
[15] Vilmos Komornik, Another Short Proof of Descartes’s Rule of Signs, The American Mathematical
Monthly v. 113 (2006), pp. 829–830.
[16] E. Landau, Differential and Integral Calculus, Chelsea, NY, 1951.
[17] A. Levine, Discovering Higher Mathematics, Four Habits of Highly Effective Mathe-
maticians, Academic Press, San Diego, 2000.
[18] P. D. Magnus, Forall x: An Introduction to Formal Logic, version 1.22, 2005,
http://www.fecundity.com/logic/
[19] M. Marcus and H. Minc, A Survey of Matrix Theory and Matrix Inequalities, Dover Publi-
cations, NY, 1992.

[20] T. Nagell, Introduction to Number Theory, Wiley, New York, 1951.
[21] E. Nagel and J. R. Newman, Gödel’s Proof, New York University Press, 1958.
[22] Roger B. Nelsen, Proofs Without Words, MAA, Washington, DC, 1993.

[23] J. Nunemacher and R. M. Young, On the sum of consecutive Kth Powers, Mathematics Magazine v. 60
(1987), pp. 237–238.
[24] Michael Spivak, Calculus (third edition), Publish or Perish, Inc., Houston, TX, 1994.
[25] J. Stewart, Calculus: Early Transcendentals (fourth edition), Brooks/Cole, Pacific Grove, CA,
1999.
[26] D. Veljan, The 2500-year-old Pythagorean Theorem, Mathematics Magazine v. 73 (no. 4, Oct. 2000),
pp. 259–272.
[27] Robin J. Wilson, Four Colors Suffice: How the Map Problem was Solved, Princeton Univer-
sity Press, Princeton, NJ, 2002.

Index

absolute value, 1
additive identity, 73
additive inverse, 73
adjoint (of a matrix), 124
and (logical connective), 25
antecedent, 27
associative law, 73
axioms (for integers), 76
bijective, 64
Binet’s Formula, 47
binomial coefficient, 23
Binomial Theorem, 23
Cantor’s Theorem, 69
cardinality, 68
cardinal numbers, 71
Cartesian product (×), 60
cases, 38
Cayley-Hamilton Theorem, 130
characteristic polynomial, 130
codomain (of function), 63
coefficient, 93
cofactor, 123
cofactor expansion, 123
column operations (for determinants), 119
commutative law, 73
complement (of a set), 56
complex numbers (C), 55, 103
composite number, 15
composition (of functions), 64
congruence modulo m, 87
conjugate, 103
conjecture, 25
consequent, 27
context, 31
Continuum Hypothesis, 71
contradiction (proof by), 42
contrapositive, 28
converse, 28
countable set, 69
countably infinite, 69
Cramer’s Rule, 126
Curry Triangle, 11
cycle (permutation), 113
degree (of polynomial), 95
Descartes’ Rule of Signs, 99
determinant (of a matrix), 110, 117
difference (of sets), 55
disjoint, 56
divisible, 15; for polynomials, 96
Division Theorem, 80; for polynomials, 96, 98
domain (of function), 63
element (of a set: ∈), 54
empty set (∅), 55
equipotent sets (≃), 68
equivalence (logical), 29, 39
equivalence class, 62
equivalence relation, 62
Euclidean Algorithm, 83
existence proofs, 41
factorial, 20
Fermat’s Last Theorem, 51, 90
Fibonacci numbers, 46
finite set, 68
for all (logical quantifier), 31, 39
for some (logical quantifier), 31
Four Color Problem, 51
function, 63
Fundamental Theorem of Arithmetic, 15, 86
Fundamental Theorem of Algebra, 105
generalized induction, 44
Generalized Induction Principle, 80
generalized polynomial, 99
Generalized Well-Ordering Principle, 79
graph (of function), 63
greatest common divisor, 81
hypotheses, 31
identity matrix, 118
image of a set, 67
imaginary part (of complex number), 103
implication, 27, 39
implicit quantifier, 35
indexed families, 58
induction, 43, 44, 45
Induction Principle, 79
infinite set, 69
injective, 64
integers (Z), 55
integers modulo m, 87
intersection (of sets), 55
inverse matrix, 125

inverse function, 65
inverse image of a set, 67
inversion, inversion count (for a permutation), 116
invertible matrix, 125
irrational number, 16
leading coefficient, 95
linear combination, 126
linear independence, 127
matrix, 110
maximum (of two real numbers), 4
minimal polynomial, 135
minor (of a matrix), 125
modulus (of complex number), 103
multiplicative identity, 73
multiplicity (of root), 100
natural numbers (N), 55
negation (logical), 25, 35
nontriviality (axiom of Z), 76
number field, 94
open statements, 30
or (logical connective), 26
permutation, 111
Pigeon Hole Principle, 68
polynomial, 93; matrix coefficients, 131
power set, 61
prime number, 15, 86
proposition, 25
Pythagorean Theorem, 10
Pythagorean triple, 92
quotient, 80
rational numbers (Q), 55
real numbers (R), 55
real part (of complex number), 103
range (of function), 64
Rational Root Theorem, 98
relation, 61
rational number, 16
recursive definitions, 46
reflexive (relation), 61
remainder, 80
relatively prime, 85
root (of polynomial), 95, 98
Russell’s Paradox, 71, 90
Schroeder-Bernstein Theorem, 70
set, 54
sign or signum function, 5; of a permutation, 116
span (vectors of Rn ), 127
square root, 5
Square Root Lemma, 6
strong induction, 45
subset, 55
surjective, 64
symmetric (relation), 61
transitive (relation), 61
transpose (of a matrix), 118
transposition, 113
Triangle Inequality, 2
triangular matrix, 117
truth table, 26
Twin Primes Conjecture, 51, 90
uncountable set, 69
undecidable, 90
union (of sets), 55
uniqueness proofs, 41
vacuous statements, 35
Well-Ordering Principle, 74, 78
Zeckendorf’s Theorem, 48
zero (of polynomial), 95
zero divisor, 89
zero polynomial, 95

Please let me know of any additional items you think ought to be included in the index.
