Introduction to Real Analysis

Fall 2014 Lecture Notes


Vern I. Paulsen
November 6, 2014

Contents

1 Metric Spaces
  1.1 Definition and Examples
    1.1.1 Vector Spaces, Norms and Metrics
  1.2 Open Sets
    1.2.1 Uniformly Equivalent Metrics
  1.3 Closed Sets
  1.4 Convergent Sequences
    1.4.1 Further results on convergence in Rᵏ
    1.4.2 Subsequences
  1.5 Interiors, Closures, Boundaries of Sets
  1.6 Completeness
  1.7 Compact Sets

2 Finite and Infinite Sets, Countability

3 Continuous Functions
  3.1 Functions into Euclidean space
  3.2 Continuity of Some Basic Functions
  3.3 Continuity and Limits
  3.4 Continuous Functions and Compact Sets
  3.5 Connected Sets and the Intermediate Value Theorem

4 The Contraction Mapping Principle
  4.1 Application: Newton's Method
  4.2 Application: Solution of ODEs

5 Riemann and Riemann-Stieltjes Integration
  5.1 The Riemann-Stieltjes Integral
  5.2 Properties of the Riemann-Stieltjes Integral
  5.3 The Fundamental Theorem of Calculus
Chapter 1

Metric Spaces
These notes accompany the Fall 2014 Introduction to Real Analysis course.

1.1 Definition and Examples

Definition 1.1. Given a set X, a metric on X is a function d : X × X → R satisfying:
1. for every x, y ∈ X, d(x, y) ≥ 0,
2. d(x, y) = 0 if and only if x = y,
3. d(x, y) = d(y, x),
4. (triangle inequality) for every x, y, z ∈ X, d(x, y) ≤ d(x, z) + d(z, y).
The pair (X, d) is called a metric space.
Example 1.2. Let X = R and set d(x, y) = |x − y|; then d is a metric on R. We call this the usual metric on R.
To prove it is a metric we verify (1)–(4). For (1): d(x, y) = |x − y| ≥ 0 by the definition of the absolute value function, so (1) holds. Since d(x, y) = 0 if and only if |x − y| = 0 if and only if x = y, (2) follows. (3) follows since d(x, y) = |x − y| = |y − x| = d(y, x). Finally, for (4), d(x, y) = |x − y| = |x − z + z − y| ≤ |x − z| + |z − y| = d(x, z) + d(z, y).
Example 1.3 (The taxi cab metric). Let X = R². Given x = (x1, x2), y = (y1, y2), set d(x, y) = |x1 − y1| + |x2 − y2|; then d is a metric on R².

We verify (1)–(4). (1) and (3) are obvious. For (2): d(x, y) = 0 iff |x1 − y1| + |x2 − y2| = 0. But since both terms in the sum are non-negative, for the sum to be 0 each one must be 0. So d(x, y) = 0 iff |x1 − y1| = 0 AND |x2 − y2| = 0 iff x1 = y1 and x2 = y2 iff x = (x1, x2) = (y1, y2) = y.
Finally, to see (4):
d(x, y) = |x1 − y1| + |x2 − y2| = |x1 − z1 + z1 − y1| + |x2 − z2 + z2 − y2| ≤ |x1 − z1| + |z1 − y1| + |x2 − z2| + |z2 − y2| = d(x, z) + d(z, y).
We often denote the taxi cab metric by d1 (x, y).
Example 1.4. A different metric on R². For x = (x1, x2), y = (y1, y2), set d∞(x, y) = max{|x1 − y1|, |x2 − y2|}. So the distance between two points is the larger of these two numbers.
We only check the triangle inequality. Let z = (z1, z2) be another point. We have two cases to check: either d∞(x, y) = |x1 − y1| or d∞(x, y) = |x2 − y2|.
Case 1: d∞(x, y) = |x1 − y1|.
Now notice that |x1 − z1| ≤ max{|x1 − z1|, |x2 − z2|} = d∞(x, z). Similarly, |z1 − y1| ≤ max{|z1 − y1|, |z2 − y2|}. Hence,
d∞(x, y) = |x1 − y1| = |x1 − z1 + z1 − y1| ≤ |x1 − z1| + |z1 − y1| ≤ d∞(x, z) + d∞(z, y).
Case 2: d∞(x, y) = |x2 − y2|. Now use |x2 − z2| ≤ max{|x1 − z1|, |x2 − z2|} = d∞(x, z). Similarly, |z2 − y2| ≤ max{|z1 − y1|, |z2 − y2|}. Hence,
d∞(x, y) = |x2 − y2| = |x2 − z2 + z2 − y2| ≤ |x2 − z2| + |z2 − y2| ≤ d∞(x, z) + d∞(z, y).
So in each case the triangle inequality holds.
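As an informal sanity check (not a proof), the triangle inequality for both of these metrics can be tested numerically on random points; the function names d1 and dinf below are our own, chosen to echo the notation d1 and d∞:

```python
import itertools
import random

def d1(x, y):
    # taxi cab metric on R^2: |x1 - y1| + |x2 - y2|
    return abs(x[0] - y[0]) + abs(x[1] - y[1])

def dinf(x, y):
    # max metric on R^2: max(|x1 - y1|, |x2 - y2|)
    return max(abs(x[0] - y[0]), abs(x[1] - y[1]))

def satisfies_triangle(d, points):
    # check d(x, y) <= d(x, z) + d(z, y) for every triple of sample points,
    # with a tiny tolerance for floating point round-off
    return all(d(x, y) <= d(x, z) + d(z, y) + 1e-12
               for x, y, z in itertools.product(points, repeat=3))

random.seed(0)
pts = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(30)]
print(satisfies_triangle(d1, pts), satisfies_triangle(dinf, pts))
```

Of course, passing on a finite sample proves nothing; the case analysis above is what proves the inequality for all points.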
Example 1.5. In this case we let X be the set of all continuous real-valued functions on [0, 1]. We use three facts from Math 3333:
1. if f and g are continuous on [0,1], then f − g is continuous on [0,1],
2. if f is continuous on [0,1], then |f| is continuous on [0,1],
3. if h is continuous on [0,1], then there is a point 0 ≤ t0 ≤ 1 so that h(t) ≤ h(t0) for every 0 ≤ t ≤ 1. That is, h(t0) = max{h(t) : 0 ≤ t ≤ 1}.


Now given f, g ∈ X, we set d(f, g) = max{|f(t) − g(t)| : 0 ≤ t ≤ 1}.
Note that by (1) and (2), |f − g| is continuous, and so by (3) there is a point where it achieves its maximum.
We now show that d is a metric on X. Clearly, (1) holds. Next, if d(f, g) = 0, then the maximum of |f(t) − g(t)| is 0, so we must have that |f(t) − g(t)| = 0 for every t. But then this means that f(t) = g(t) for every t, and so f = g. So d(f, g) = 0 implies f = g. Also f = g implies d(f, g) = 0, so (2) holds. Clearly (3) holds. Finally, to see the triangle inequality, we let f, g, h be three continuous functions on [0,1], that is, f, g, h ∈ X. We must show that d(f, g) ≤ d(f, h) + d(h, g).
We know that there is a point t0, 0 ≤ t0 ≤ 1, so that
d(f, g) = max{|f(t) − g(t)| : 0 ≤ t ≤ 1} = |f(t0) − g(t0)|.
Hence,
d(f, g) = |f(t0) − g(t0)| = |f(t0) − h(t0) + h(t0) − g(t0)|
≤ |f(t0) − h(t0)| + |h(t0) − g(t0)|
≤ max{|f(t) − h(t)| : 0 ≤ t ≤ 1} + max{|h(t) − g(t)| : 0 ≤ t ≤ 1}
= d(f, h) + d(h, g).
Example 1.6 (Euclidean space, Euclidean metric). Let X = Rⁿ, the set of real n-tuples. For x = (a1, ..., an) and y = (b1, ..., bn) we set
d(x, y) = √((a1 − b1)² + ⋯ + (an − bn)²).
This defines a metric on Rⁿ, which we will prove shortly. This metric is called the Euclidean metric and (Rⁿ, d) is called Euclidean space. Sometimes, we will write d2 for the Euclidean metric.
It is easy to see that the Euclidean metric satisfies (1)–(3) of a metric. It is harder to prove the triangle inequality for the Euclidean metric than for some of the others that we have looked at. This requires some preliminary results.
Lemma 1.7. Let p(t) = at² + bt + c with a ≥ 0. If p(t) ≥ 0 for every t ∈ R, then b² ≤ 4ac.
Proposition 1.8 (Cauchy-Schwarz Inequality). Let a1, ..., an, b1, ..., bn be real numbers. Then
|a1b1 + ⋯ + anbn| ≤ √(a1² + ⋯ + an²) · √(b1² + ⋯ + bn²).


Proof. Look at p(t) = (ta1 + b1)² + ⋯ + (tan + bn)² ≥ 0 for all t, and apply Lemma 1.7.
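The proof is terse, so it is worth filling in the step (a standard completion of the argument). Expanding p(t) gives

```latex
p(t) = \Big(\sum_{i=1}^{n} a_i^2\Big) t^2
     + 2\Big(\sum_{i=1}^{n} a_i b_i\Big) t
     + \sum_{i=1}^{n} b_i^2 \;\ge\; 0
\quad \text{for all } t \in \mathbb{R}.
```

Lemma 1.7 applied with a = Σaᵢ², b = 2Σaᵢbᵢ, c = Σbᵢ² gives 4(Σaᵢbᵢ)² ≤ 4(Σaᵢ²)(Σbᵢ²); dividing by 4 and taking square roots yields the Cauchy-Schwarz Inequality.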


Corollary 1.9 (Minkowski's Inequality).
√((a1 + b1)² + ⋯ + (an + bn)²) ≤ √(a1² + ⋯ + an²) + √(b1² + ⋯ + bn²).
Proof. Let LHS denote the left hand side and RHS the right hand side of the inequality. Then
(LHS)² = (a1 + b1)² + ⋯ + (an + bn)² = a1² + ⋯ + an² + 2(a1b1 + ⋯ + anbn) + b1² + ⋯ + bn²
≤ a1² + ⋯ + an² + 2√(a1² + ⋯ + an²) · √(b1² + ⋯ + bn²) + b1² + ⋯ + bn²
= (√(a1² + ⋯ + an²) + √(b1² + ⋯ + bn²))² = (RHS)².

Now use Minkowski's Inequality to prove the triangle inequality for the Euclidean metric.


Example 1.10 (The discrete metric). Let X be any non-empty set and define
d(x, y) = 1 if x ≠ y, and d(x, y) = 0 if x = y.
Then this is a metric on X, called the discrete metric, and we call (X, d) a discrete metric space.
Example 1.11. When (X, d) is a metric space and Y ⊆ X is a subset, then restricting the metric on X to Y gives a metric on Y; we call (Y, d) a subspace of (X, d).
Problem 1.12. Let X = R, define d(x, y) = |x − y| + 1. Show that this is NOT a metric.
Problem 1.13. Let X = R, define d(x, y) = |x² − y²|. Show that this is NOT a metric.
Problem 1.14. Let X = R, define d(x, y) = |x − y| + |x² − y²|. Prove that this is a metric on R.
Problem 1.15. Let X = Rⁿ; for x = (a1, ..., an), y = (b1, ..., bn) define
d1(x, y) = |a1 − b1| + ⋯ + |an − bn|.
Prove that this is a metric.
Problem 1.16. Let X = Rⁿ; for x and y as before, define
d∞(x, y) = max{|a1 − b1|, ..., |an − bn|}.
Prove that this is a metric.

1.1.1 Vector Spaces, Norms and Metrics

Definition 1.17. Let X be a real vector space. A function ‖·‖ : X → R is called a norm provided that
1. ‖x‖ ≥ 0 for all x,
2. ‖x‖ = 0 if and only if x = 0,
3. ‖rx‖ = |r| ‖x‖ for every r ∈ R and x ∈ X,
4. (triangle inequality) ‖x + y‖ ≤ ‖x‖ + ‖y‖.
The next result summarizes the relation between norms and metrics.
Proposition 1.18. Let X be a real vector space and let ‖·‖ be a norm on X. Then setting d(x, y) = ‖x − y‖ defines a metric on X.
Proof. We have that d(x, y) = ‖x − y‖ ≥ 0, so property (1) of a metric holds.
We have that d(x, y) = 0 iff ‖x − y‖ = 0 iff x − y = 0 iff x = y, so property (2) of a metric holds.
Also, d(x, y) = ‖x − y‖ = ‖(−1)(y − x)‖ = |−1| ‖y − x‖ = ‖y − x‖ = d(y, x), so property (3) of a metric holds.
Finally, d(x, y) = ‖x − y‖ = ‖x − z + z − y‖ ≤ ‖x − z‖ + ‖z − y‖ = d(x, z) + d(z, y), so property (4) of a metric holds.
The metric that one gets from a norm by using this result is called the metric induced by the norm.
Example 1.19. For x = (a1, ..., an) ∈ Rⁿ, if we set ‖x‖2 = √(a1² + ⋯ + an²), then this defines a norm called the Euclidean norm.

To see that this is a norm, it is easy to check the first 3 properties needed for a norm. To see the fourth property, use Corollary 1.9. Thus, the Euclidean metric is the metric induced by the Euclidean norm via Proposition 1.18.
Example 1.20. For x = (a1, a2) ∈ R², if we set ‖x‖1 = |a1| + |a2| and ‖x‖∞ = max{|a1|, |a2|}, then these both define norms on R², and the metrics that they induce are, respectively, d1, the taxi cab metric, and d∞.

1.2 Open Sets

Definition 1.21. Let (X, d) be a metric space, fix x ∈ X and r > 0. The open ball of radius r centered at x is the set
B(x; r) = {y ∈ X : d(x, y) < r}.
Example 1.22. In R with the usual metric, B(x; r) = {y : |x − y| < r} = {y : x − r < y < x + r} = (x − r, x + r).
Example 1.23. In R² with the Euclidean metric, for x = (x1, x2) we have B(x; r) = {(y1, y2) : (x1 − y1)² + (x2 − y2)² < r²}, which is a disk of radius r centered at x.
Example 1.24. In R3 with the Euclidean metric, B(x; r) really is an open
ball of radius r. This example is where the name comes from.
Example 1.25. In (R², d∞) we have
B(x; r) = {(y1, y2) : |x1 − y1| < r and |x2 − y2| < r} = {(y1, y2) : x1 − r < y1 < x1 + r and x2 − r < y2 < x2 + r}.
So now an open ball is actually an open square, centered at x, with sides of length 2r.
Example 1.26. In (R², d1) we have B((0, 0); 1) = {(x, y) : |x − 0| + |y − 0| < 1}, which can be seen to be the diamond with corners at (1,0), (0,1), (−1,0), (0,−1).
Example 1.27. When X = {f : [0, 1] → R | f is continuous} and d(f, g) = max{|f(t) − g(t)| : 0 ≤ t ≤ 1}, then B(f; r) = {g : g is continuous and f(t) − r < g(t) < f(t) + r for all t}. This can be pictured as all continuous functions g whose graphs lie in a band of width r about the graph of f.
Example 1.28. If we let R have the usual metric and let Y = [0, 1] ⊆ R be the subspace, then when we look at the metric space Y we have that B(0; 1/2) = [0, 1/2) = (−1/2, 1/2) ∩ Y.
Example 1.29. When we let X be a set with the discrete metric and x ∈ X, then B(x; r) = {x} when r ≤ 1. When r > 1, then B(x; r) = X.
Definition 1.30. Given a metric space (X, d), a subset O ⊆ X is called open provided that whenever x ∈ O, there is an r > 0 such that B(x; r) ⊆ O.


Showing that sets are open really requires proof, so we do a few examples.
Example 1.31. In R with the usual metric, an interval of the form (a, b) = {x : a < x < b} is an open set.
Proof. Given x ∈ (a, b), let r = min{x − a, b − x}. If y ∈ B(x; r), then x − r < y < x + r. Since y < x + r ≤ x + (b − x) = b and y > x − r ≥ x − (x − a) = a, we have a < y < b. So B(x; r) ⊆ (a, b). Thus, (a, b) is open.
So open intervals really are open sets.
Example 1.32. In R² with the Euclidean metric, a rectangle of the form R = {(y1, y2) : a < y1 < b, c < y2 < d} is an open set.
Proof. Given x = (x1, x2) ∈ R, let r = min{x1 − a, b − x1, x2 − c, d − x2}. We claim that B(x; r) ⊆ R. To see this, let y = (y1, y2) ∈ B(x; r); then
(y1 − x1)² + (y2 − x2)² < r².
Looking at just one term in the sum, we see that (y1 − x1)² < r² and so |y1 − x1| < r. This implies that x1 − r < y1 < x1 + r. As before we see that x1 + r ≤ b and x1 − r ≥ a, so a < y1 < b.
Similarly, we get that c < y2 < d, so y ∈ R.
Example 1.33. In R with the usual metric, an interval of the form [a, b) is not open.
Proof. Consider the point a ∈ [a, b). Any ball about this point is of the form B(a; r) = (a − r, a + r), and this contains points that are less than a and so not in the set [a, b).
CAREFUL! If we let Y = [0, 1) ⊆ R be the subspace, then in Y the set O = [0, 1/2) is open! (Explain why.)
The next result justifies us calling B(x; r) an open ball.
Proposition 1.34. Let (X, d) be a metric space, fix x ∈ X and r > 0. Then B(x; r) is an open set.
Theorem 1.35. Let (X,d) be a metric space. Then
1. the empty set is open,
2. X is open,
3. the union of any collection of open sets is open,
4. the intersection of finitely many open sets is open.


Proposition 1.36. In a discrete metric space, every set is open.


Problem 1.37. Prove that the set O = {(y1, y2) : y1 + y2 > 0} is an open subset of R² in the Euclidean metric.
Problem 1.38. Prove that the set O = {(y1, y2) : y1 > 0} is an open subset of R² in the Euclidean metric.
Problem 1.39. Prove that the disk D = {(y1, y2) : y1² + y2² < 1} is an open subset of (R², d∞).
Problem 1.40. Let X be the set of continuous real-valued functions on [0,1] with the metric that we introduced. Prove that O = {g ∈ X : g(t) > 0 for all t} is an open subset.

1.2.1 Uniformly Equivalent Metrics

The definition of open set really depends on the metric. For example, on
R if instead of the usual metric we used the discrete metric, then every set
would be open. But we have seen that when R has the usual metric, then
not every set is open. For example, [a, b) is not an open subset of R in the
usual metric. Thus, whether a set is open or not really can depend on the
metric that we are using.
For this reason, if a given set X has two metrics, d and ρ, and we say that a set is open, we generally need to specify which metric we mean. Consequently, we will say that a set is "open with respect to d" or "open in (X, d)" when we want to specify that it is open when we use the metric d. In this case it may or may not be open with respect to ρ.
In the case of R², we already have three metrics: the Euclidean metric d, the taxi cab metric d1 and the metric d∞. So when we say that a set is
open in R2 , we could potentially mean three different things. On the other
hand it could be the case that all three of these metrics give rise to the same
collection of open sets.
In fact, these three metrics do give rise to the same collections of open
sets and the following definition and result explains why.
Definition 1.41. Let X be a set and let d and ρ be two metrics on X. We say that these metrics are uniformly equivalent provided that there are constants A and B such that for every x, y ∈ X,
ρ(x, y) ≤ A d(x, y) and d(x, y) ≤ B ρ(x, y).


Example 1.42. On R², the Euclidean metric d and the metric d∞ are uniformly equivalent. In fact,
d∞(x, y) ≤ d(x, y) and d(x, y) ≤ √2 d∞(x, y).
To see this, let x = (a1, a2), y = (b1, b2). Since |a1 − b1| ≤ √((a1 − b1)² + (a2 − b2)²) and |a2 − b2| ≤ √((a1 − b1)² + (a2 − b2)²), we have that
d∞(x, y) = max{|a1 − b1|, |a2 − b2|} ≤ √((a1 − b1)² + (a2 − b2)²) = d(x, y).
On the other hand, since |a1 − b1| ≤ d∞(x, y) and |a2 − b2| ≤ d∞(x, y), we have that (a1 − b1)² + (a2 − b2)² ≤ 2(d∞(x, y))². Taking square roots of both sides yields d(x, y) ≤ √2 d∞(x, y).
Example 1.43. On R², the Euclidean metric d and d1 are uniformly equivalent. In fact,
d(x, y) ≤ d1(x, y) and d1(x, y) ≤ √2 d(x, y).
We have that
d1(x, y)² = (|a1 − b1| + |a2 − b2|)² = |a1 − b1|² + 2|a1 − b1||a2 − b2| + |a2 − b2|² ≥ (a1 − b1)² + (a2 − b2)² = (d(x, y))².
Hence, d(x, y) ≤ d1(x, y).
To see the other inequality, we use the Schwarz inequality:
d1(x, y) = |a1 − b1| · 1 + |a2 − b2| · 1 ≤ √(|a1 − b1|² + |a2 − b2|²) · √(1² + 1²) = √2 d(x, y).
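The two pairs of inequalities in Examples 1.42 and 1.43 can likewise be spot-checked numerically. This is only an illustration on random samples, with d2 standing for the Euclidean metric (our own naming):

```python
import math
import random

def d2(x, y):
    # Euclidean metric on R^2
    return math.hypot(x[0] - y[0], x[1] - y[1])

def d1(x, y):
    # taxi cab metric
    return abs(x[0] - y[0]) + abs(x[1] - y[1])

def dinf(x, y):
    # max metric
    return max(abs(x[0] - y[0]), abs(x[1] - y[1]))

random.seed(1)
eps = 1e-9  # tolerance for floating point round-off
for _ in range(1000):
    x = (random.uniform(-10, 10), random.uniform(-10, 10))
    y = (random.uniform(-10, 10), random.uniform(-10, 10))
    # Example 1.42: dinf <= d2 <= sqrt(2) * dinf
    assert dinf(x, y) <= d2(x, y) + eps
    assert d2(x, y) <= math.sqrt(2) * dinf(x, y) + eps
    # Example 1.43: d2 <= d1 <= sqrt(2) * d2
    assert d2(x, y) <= d1(x, y) + eps
    assert d1(x, y) <= math.sqrt(2) * d2(x, y) + eps
print("all four inequalities hold on the samples")
```

The constants √2 are sharp in both cases; try x = (0, 0), y = (1, 1) to see each second inequality become an equality.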


Given a set X with metrics d and ρ, a point x ∈ X and r > 0, we shall write Bd(x; r) = {y ∈ X : d(x, y) < r} and Bρ(x; r) = {y ∈ X : ρ(x, y) < r}.
Lemma 1.44. Let X be a set, let d and ρ be two metrics on X that are uniformly equivalent, and let A and B denote the constants that appear in Definition 1.41. Then Bρ(x; r) ⊆ Bd(x; Br) and Bd(x; r) ⊆ Bρ(x; Ar).
Proof. If y ∈ Bρ(x; r), then ρ(x, y) < r, which implies that d(x, y) ≤ B ρ(x, y) < Br. Hence, y ∈ Bd(x; Br) and so Bρ(x; r) ⊆ Bd(x; Br). The other case is proven similarly, using the other inequality.


Theorem 1.45. Let X be a set and let d and ρ be metrics on X that are uniformly equivalent. Then a set is open with respect to d if and only if it is open with respect to ρ.
Proof. Let E ⊆ X be a set that is open with respect to d. We must prove that E is open with respect to ρ.
Since E is open with respect to d, given x ∈ E there is an r > 0 so that Bd(x; r) ⊆ E. By the lemma, Bρ(x; r/B) ⊆ Bd(x; r). Thus, Bρ(x; r/B) ⊆ E, and we have shown that given an arbitrary point in E there is an open ball in the metric ρ about that point that is contained in E. Hence, E is open with respect to ρ.
The proof that a set that is open with respect to ρ is open with respect to d is similar and uses the other containment given in the lemma.
Problem 1.46. Let X be a set with three metrics, d, ρ, and σ. Prove that if d and ρ are uniformly equivalent and ρ and σ are uniformly equivalent, then d and σ are uniformly equivalent.
Problem 1.47. Prove that on Rⁿ the Euclidean metric, the metric d1 and the metric d∞ are all uniformly equivalent.

1.3 Closed Sets

Definition 1.48. Given a set X and E ⊆ X, the complement of E, denoted Eᶜ, is the set of all elements of X that are not in E, i.e.,
Eᶜ = {x ∈ X : x ∉ E}.
Other notations that are used for the complement are Eᶜ = CE = X \ E. Note that (Eᶜ)ᶜ = E.
Definition 1.49. Let (X, d) be a metric space. Then a set E ⊆ X is closed if and only if Eᶜ is open.
The following gives a useful way to re-state this definition.
Proposition 1.50. Let (X, d) be a metric space. Then a set E is closed if and only if there is an open set O such that E = Oᶜ.
Proof. If E is closed, then Eᶜ is open. So let O = Eᶜ; then E = Oᶜ.
Conversely, if E = Oᶜ for some open set O, then Eᶜ = (Oᶜ)ᶜ = O, which is open, so by the definition, E is closed.


Example 1.51. In R with the usual metric, we have that (b, ∞) is open and (−∞, a) is open. So when a < b we have that O = (−∞, a) ∪ (b, +∞) is open. Hence, Oᶜ = [a, b] is closed.
This shows that our old calculus definition of a "closed" interval really is a closed set in this sense.
Definition 1.52. Let (X, d) be a metric space, let x ∈ X and let r > 0. The closed ball with center x and radius r is the set
B̄(x; r) = {y ∈ X : d(x, y) ≤ r}.
The following result explains this notation.
Proposition 1.53. Let (X, d) be a metric space, let x ∈ X and let r > 0. Then B̄(x; r) is a closed set.
Proof. Let O = (B̄(x; r))ᶜ denote the complement. We must prove that O is open. Note that O = {y ∈ X : d(x, y) > r}.
For p ∈ O, set r1 = d(x, p) − r > 0. We claim that B(p; r1) ⊆ O. If we can prove this claim, then we will have shown that O is open. So let q ∈ B(p; r1); then d(x, p) ≤ d(x, q) + d(q, p) < d(x, q) + r1. Subtracting r1 from both sides of this last inequality yields d(x, p) − r1 < d(x, q). But d(x, p) − r1 = r. Hence, r < d(x, q) and so q ∈ O. Since q was an arbitrary element of B(p; r1), we have that B(p; r1) ⊆ O and our proof is complete.
Because the definition of closed sets involves complements, it is useful to recall DeMorgan's Laws. Given subsets Ei ⊆ X, where i belongs to some set I, we have
⋃_{i∈I} Ei = {x ∈ X : there exists i ∈ I with x ∈ Ei}
and
⋂_{i∈I} Ei = {x ∈ X : x ∈ Ei for every i ∈ I}.

Proposition 1.54 (DeMorgan). Let Ei ⊆ X for i ∈ I. Then
(⋃_{i∈I} Ei)ᶜ = ⋂_{i∈I} Eiᶜ and (⋂_{i∈I} Ei)ᶜ = ⋃_{i∈I} Eiᶜ.
The following theorem about closed sets follows from DeMorgan's Laws and Theorem 1.35.


Theorem 1.55. Let (X, d) be a metric space. Then:
1. the empty set is closed,
2. X is closed,
3. the intersection of any collection of closed sets is closed,
4. the union of finitely many closed sets is closed.
Proof. By Theorem 1.35, the set X is open, so Xᶜ is closed, but Xᶜ is the empty set. Similarly, the empty set is open, so its complement, which is X, is closed.
Given a collection of closed sets Ei, i ∈ I, we have that each Eiᶜ is open. Since
(⋂_{i∈I} Ei)ᶜ = ⋃_{i∈I} Eiᶜ,
we see that the complement of their intersection is the union of a collection of open sets. By Theorem 1.35(3), this union of open sets is an open set. Hence, the complement of the intersection is open and so the intersection is closed.
Similarly, if we have only a finite collection of closed sets, E1, ..., En, then (E1 ∪ ⋯ ∪ En)ᶜ = E1ᶜ ∩ ⋯ ∩ Enᶜ, which is a finite intersection of open sets. By Theorem 1.35(4), this finite intersection of open sets is open. Hence, the complement of the union is open and so the union is closed.
Proposition 1.56. In a discrete metric space, every set is closed.
As with open sets, when there is more than one metric on the set X, we need to specify which metric we are referring to when saying that a set is closed. The following is the analogue of Theorem 1.45 for closed sets.
Proposition 1.57. Let X be a set and let d and ρ be two metrics on X that are uniformly equivalent. Then a set is closed with respect to d if and only if it is closed with respect to ρ.
Problem 1.58. Prove that {(a1, a2) ∈ R² : 0 ≤ a1 ≤ 2, 0 ≤ a2 ≤ 4} is a closed set in the Euclidean metric.

1.4 Convergent Sequences

Our general idea from calculus of a sequence {pn} converging to a point p is that as n grows larger, the points pn grow closer and closer to p. When we say "grow closer" we really have in our minds that some distance is growing smaller. This leads naturally to the following definition.
Definition 1.59. Let (X, d) be a metric space, {pn} ⊆ X a sequence in X and p ∈ X. We say that the sequence {pn} converges to p, and write
lim_{n→+∞} pn = p,
provided that for every ε > 0 there is a real number N so that when n > N, then d(p, pn) < ε.
Often, out of sheer laziness, I will write limn pn = p for lim_{n→+∞} pn = p.
Example 1.60. When X = R and d is the usual metric, then d(p, pn) = |p − pn| and this definition is identical to the definition used when we studied convergent sequences of real numbers in Math 3333.
For a quick review, we will look at a couple of examples of sequences and recall how we would prove that they converge.
For a first example, consider the sequence given by pn = (3n + 1)/(5n − 2). For any natural number n, the denominator of this fraction is non-zero, so this formula defines a sequence of points. For large n, the +1 in the numerator and the −2 in the denominator are small in relation to the 3n and 5n, so we expect that this sequence has limit p = 3/5.
Now for some scrap work. To prove this we would need
d(p, pn) = |p − pn| = |3(5n − 2) − (3n + 1)5| / (5(5n − 2)) < ε.
Simplifying this fraction leads to the condition 11/(25n − 10) < ε. Solving for n, we see that this is true provided 11/ε < 25n − 10, i.e., 11/(25ε) + 10/25 < n. So it looks like we should choose N = 11/(25ε) + 10/25. This is not a proof, just our scrap work.
Now for the proof: Given ε > 0, define N = 11/(25ε) + 10/25. For any n > N, we have that 25n − 10 > 25N − 10 = 11/ε, and so d(p, pn) = |3/5 − (3n + 1)/(5n − 2)| = 11/(25n − 10) < 11/(11/ε) = ε. Hence, lim_{n→+∞} pn = p.
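The scrap work can be checked computationally: for a given ε, the cutoff N we derived really does force every later term within ε of 3/5. This is only a numerical illustration of the ε-N argument, with function names of our own choosing:

```python
def p_n(n):
    # the sequence from the example: (3n + 1) / (5n - 2)
    return (3 * n + 1) / (5 * n - 2)

def N_for(eps):
    # the cutoff from the scrap work: N = 11/(25*eps) + 10/25
    return 11 / (25 * eps) + 10 / 25

p = 3 / 5
for eps in (0.1, 0.01, 0.001):
    N = N_for(eps)
    n = int(N) + 1  # the first integer beyond N
    # d(p, p_n) = 11/(25n - 10) should now be below eps
    assert abs(p - p_n(n)) < eps, (eps, n)
print("each tail term is within eps of 3/5")
```

Checking finitely many values of ε is not a proof, of course; the point of the proof above is that the same N works for every ε > 0.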

For the next example, we look at pn = √(n² + 8n) − n and prove that this sequence has limit p = 4. (Scrap work: multiplying and dividing by the conjugate gives pn = 8n/(√(n² + 8n) + n), which makes the limit plausible.)


Now that we have recalled what this definition means in R and have seen how to prove a few examples, we want to look at what this concept means in some of our other favorite metric spaces.
Theorem 1.61. Let Rᵏ be endowed with the Euclidean metric d, let {pn} ⊆ Rᵏ be a sequence with pn = (a1,n, a2,n, ..., ak,n) and let p = (a1, a2, ..., ak) ∈ Rᵏ. Then limn pn = p in (Rᵏ, d) if and only if for each j, 1 ≤ j ≤ k, we have limn aj,n = aj in R with the usual metric.
Proof. First we assume that limn pn = p. Given ε > 0, there is an N so that for n > N we have that
d(p, pn) = √((a1 − a1,n)² + ⋯ + (ak − ak,n)²) < ε.
Now fix j; we have that |aj − aj,n| = √((aj − aj,n)²) is smaller than the above square root, since it is just one of the terms. Thus, for the same N, we have that n > N implies that |aj − aj,n| ≤ d(p, pn) < ε. Hence, for each ε we have found an N, and so limn aj,n = aj. This proves the first implication.
Now conversely, suppose that for each j we have that limn aj,n = aj. Given an ε > 0, we must produce an N so that when n > N we have d(p, pn) < ε.
For each j, taking ε1 = ε/√k, we can find an Nj so that for n > Nj we have that |aj − aj,n| < ε1. (Why did I call it Nj?) Now let N = max{N1, ..., Nk}; then for n > N we have that (aj − aj,n)² < ε1² for every j, 1 ≤ j ≤ k.
Hence, d(p, pn) = √((a1 − a1,n)² + ⋯ + (ak − ak,n)²) < √(ε1² + ⋯ + ε1²) = √(k ε1²) = ε. This proves that limn pn = p in the Euclidean metric.
So the crux of the above theorem is that a sequence of points in Rᵏ converges if and only if each of its components converges.
If we combine our first two examples with the above theorem, we see that if we define a sequence of points in R² by setting pn = ((3n + 1)/(5n − 2), √(n² + 8n) − n), then in the Euclidean metric these points converge to the point p = (3/5, 4).
What if we had used the taxi cab metric or the d∞ metric on R² instead of the Euclidean metric: would these points still converge to the same point? The answer is yes, and the following result explains why and saves us having to prove separate theorems for each of these metrics.
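The combined example can be watched converging numerically: the Euclidean distance from pn to (3/5, 4) shrinks as n grows, just as componentwise convergence predicts. An illustration only, with names of our own choosing:

```python
import math

def p_n(n):
    # the R^2 sequence built from our two examples
    return ((3 * n + 1) / (5 * n - 2), math.sqrt(n * n + 8 * n) - n)

def d(x, y):
    # Euclidean metric on R^2
    return math.hypot(x[0] - y[0], x[1] - y[1])

p = (3 / 5, 4.0)
dists = [d(p, p_n(n)) for n in (10, 100, 1000, 10000)]
# the distances decrease toward 0 along this sample of indices
assert all(a > b for a, b in zip(dists, dists[1:]))
print(dists)
```

Note that each component converges at its own rate; Theorem 1.61 handles this by taking the maximum of the component cutoffs Nj.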
Proposition 1.62. Let X be a set with metrics d and ρ that are uniformly equivalent, let {pn} ⊆ X be a sequence and let p ∈ X be a point. Then {pn} converges to p in the metric d if and only if {pn} converges to p in the metric ρ.


Proof. Let ρ(x, y) ≤ A d(x, y) and d(x, y) ≤ B ρ(x, y) for every x, y ∈ X.
First assume that {pn} converges to p in the metric d. Given ε > 0, we can pick an N so that when n > N, we have that d(p, pn) < ε/A. Then when n > N, we have that ρ(p, pn) ≤ A d(p, pn) < ε. Hence, {pn} converges to p in the metric ρ.
The proof of the converse statement is similar, using the constant B.
We now look at the discrete metric.
Proposition 1.63. Let (X, d) be a discrete metric space, let {pn} ⊆ X be a sequence, and let p ∈ X. Then the sequence {pn} converges to p if and only if there exists N so that for n > N, pn = p.
Proof. First assume that {pn} converges to p. Taking ε = 1, there is N so that n > N implies that d(p, pn) < ε = 1. But since this is the discrete metric, d(x, y) < 1 implies that d(x, y) = 0, i.e., that x = y. Hence, pn = p for every n > N.
Conversely, if there is an N1 so that when n > N1 we have pn = p, then given any ε > 0, pick N = N1. Then n > N implies that d(p, pn) = d(p, p) = 0 < ε.
A sequence {pn} such that pn = p for every n > N is often called eventually constant.
We now look at some general theorems about convergence in metric spaces.
Proposition 1.64. Let (X, d) be a metric space. A sequence {pn} ⊆ X can have at most one limit.
Proof. Suppose that limn pn = p and limn pn = q; we need to prove that this implies that p = q. We will do a proof by contradiction. Suppose that p ≠ q; then d(p, q) > 0.
Set ε = d(p, q)/2. Since limn pn = p, there is N1 so that n > N1 implies that d(p, pn) < ε. Since limn pn = q, there is N2 so that n > N2 implies that d(q, pn) < ε. So if we let N = max{N1, N2}, then for n > N both of these inequalities will be true.
Thus, for any n > N, we have that d(p, q) ≤ d(p, pn) + d(pn, q) < ε + ε = d(p, q). Thus, d(p, q) < d(p, q), a contradiction.
The above result is most often used to prove that two points are really the same point, since if limn pn = p and limn pn = q, then p = q.
Convergence is an important way to characterize closed sets.


Theorem 1.65. Let (X, d) be a metric space and let S ⊆ X be a subset. Then S is a closed subset if and only if whenever {pn} ⊆ S is a convergent sequence, we have limn pn ∈ S.
That is, a set is closed if and only if limits of convergent sequences stay in the set.
Proof. First assume that S is closed, let {pn} ⊆ S be a convergent sequence and let p = limn pn. We must show that p ∈ S. We do a proof by contradiction.
Suppose that p ∉ S; then p ∈ Sᶜ, which is open. Hence, there is r > 0 so that B(p; r) ⊆ Sᶜ. Thus, for any q ∈ S, d(p, q) ≥ r. Now set ε = r; then for every n, d(p, pn) ≥ ε, and hence we can find no N for this ε.
To prove the converse, we consider the logically equivalent contrapositive statement: if S is not closed, then there exists at least one convergent sequence {pn} ⊆ S with limn pn ∉ S.
So we want to prove that this statement is true. To this end, look at Sᶜ: since S is not closed, Sᶜ is not open. Hence, there exists p ∈ Sᶜ such that for every r > 0, the set B(p; r) is not a subset of Sᶜ. That is, for every r > 0, B(p; r) ∩ S is not empty. Now for each n, if we set r = 1/n, then we can pick a point pn ∈ B(p; 1/n) ∩ S. Thus, the sequence of points that we get this way satisfies {pn} ⊆ S. Also, since d(p, pn) < 1/n, we have that limn pn = p, by Problem 1.69.
This completes the proof of the theorem.
One of our other main results from Math 3333 about convergent sequences in R is that every convergent sequence of real numbers is bounded. This plays an important role in metric spaces too. But first we need to say what it means for a set to be bounded in a metric space.
Proposition 1.66. Let (X, d) be a metric space and let E ⊆ X be a subset. Then the following are equivalent:
1. there exists a point p ∈ X and r1 > 0 such that E ⊆ B̄(p; r1),
2. there exists a point q ∈ X and r2 > 0 such that E ⊆ B(q; r2),
3. there exists M > 0 so that every x, y ∈ E satisfies d(x, y) ≤ M.
Proof. If 1) holds, then let q = p and let r2 be any number satisfying r2 > r1. Then E ⊆ B̄(p; r1) ⊆ B(p; r2), and so 2) holds.
If 2) holds, let M = 2r2; then for any x, y ∈ E, we have that d(x, y) ≤ d(x, q) + d(q, y) < r2 + r2 = M, and so 3) holds.


If 3) holds, fix any point p ∈ E and let r1 = M; then any y ∈ E satisfies d(p, y) ≤ M, and so E ⊆ B̄(p; M).
Definition 1.67. Let (X, d) be a metric space and let E ⊆ X. We say that E is a bounded set provided that it satisfies any of the three equivalent conditions of the above proposition.
Proposition 1.68. Let (X, d) be a metric space. If {pn} is a convergent sequence, then it is bounded.
Proof. Let limn pn = p. For ε = 1, there is N so that when n > N, then d(p, pn) < 1.
Let r1 = max{1, d(p, pn) : n ≤ N}. Since there are only finitely many numbers in this set, the maximum is well-defined and a finite number.
Now when n ≤ N, d(p, pn) ≤ r1, since it is one of the numbers used in finding the maximum. While if n > N, then d(p, pn) < 1 ≤ r1. Thus, for every n, d(p, pn) ≤ r1 and so {pn} ⊆ B̄(p; r1).
Problem 1.69. Let (X, d) be a metric space, {pn} ⊆ X and p ∈ X. Prove that limn pn = p if and only if the sequence of real numbers {d(p, pn)} satisfies limn d(p, pn) = 0.
Problem 1.70. Let S ⊆ R² be the set defined by S = {(a1, a2) : 0 ≤ a2 ≤ a1}. Prove that S is a closed subset of R² in the Euclidean metric.
Problem 1.71. Give an example of a bounded subset of R that is not closed, of a closed subset that is not bounded, and of a bounded sequence that is not convergent.

1.4.1 Further results on convergence in Rᵏ

Recall that Rᵏ is a vector space: for p = (a1, ..., ak), q = (b1, ..., bk) ∈ Rᵏ and r ∈ R we have
p + q = (a1 + b1, ..., ak + bk) and rp = (ra1, ..., rak).
There is also the dot product,
p · q = a1b1 + ⋯ + akbk.
Note that in terms of the dot product and the Euclidean metric, the Schwarz inequality says that
|p · q| ≤ √(a1² + ⋯ + ak²) · √(b1² + ⋯ + bk²) = d(0, p) d(0, q).


Two other useful connections to notice between the Euclidean metric and the vector space operations are that
d(p, q) = √((a1 − b1)² + ⋯ + (ak − bk)²) = d(0, p − q)
and
d(rp, rq) = √((ra1 − rb1)² + ⋯ + (rak − rbk)²) = |r| d(p, q).
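These identities, together with the Schwarz inequality in dot-product form, are easy to verify numerically on random vectors (an illustration, not a proof; the helper names are ours):

```python
import math
import random

def d(p, q):
    # Euclidean metric on R^k
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def dot(p, q):
    return sum(a * b for a, b in zip(p, q))

random.seed(2)
k = 4
zero = (0.0,) * k
eps = 1e-9  # floating point tolerance
for _ in range(200):
    p = tuple(random.uniform(-3, 3) for _ in range(k))
    q = tuple(random.uniform(-3, 3) for _ in range(k))
    r = random.uniform(-3, 3)
    pq = tuple(a - b for a, b in zip(p, q))
    assert abs(d(p, q) - d(zero, pq)) < eps               # d(p, q) = d(0, p - q)
    rp = tuple(r * a for a in p)
    rq = tuple(r * a for a in q)
    assert abs(d(rp, rq) - abs(r) * d(p, q)) < eps        # d(rp, rq) = |r| d(p, q)
    assert abs(dot(p, q)) <= d(zero, p) * d(zero, q) + eps  # Schwarz inequality
print("identities verified on the samples")
```

Both identities are used repeatedly in the proof of Theorem 1.73 below.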

Lemma 1.72. Let (Rᵏ, d) be Euclidean space. If E ⊆ Rᵏ is a bounded set, then there is A so that E ⊆ B̄(0; A).
Proof. Since E is bounded, there is p ∈ Rᵏ and r1 so that d(p, x) ≤ r1 for every x ∈ E. Let A = r1 + d(0, p); then for any x ∈ E, d(0, x) ≤ d(0, p) + d(p, x) ≤ A, and so E ⊆ B̄(0; A).
The following result can be proved using Theorem 1.57 and results from
Math 3333 about convergence of sums and products of real numbers. We give
a proof that mimics the proofs for real numbers.
Theorem 1.73. Let (Rᵏ, d) be Euclidean space, let pn = (a1,n, ..., ak,n) and
qn = (b1,n, ..., bk,n) be sequences in Rᵏ with limn pn = p = (a1, ..., ak) and
limn qn = q = (b1, ..., bk), and let {rn} be a sequence in R with limn rn = r.
Then we have the following:
1. limn pn + qn = p + q,
2. limn rn pn = rp,
3. limn pn · qn = p · q.
Proof. To see 1), given ε > 0, choose N1 so that for n > N1, d(p, pn) < ε/2,
and choose N2 so that for n > N2, d(q, qn) < ε/2.
Let N = max{N1, N2}; then for n > N, we have that
d(p + q, pn + qn) = √((a1 + b1 − a1,n − b1,n)² + ⋯ + (ak + bk − ak,n − bk,n)²)
≤ √((a1 − a1,n)² + ⋯ + (ak − ak,n)²) + √((b1 − b1,n)² + ⋯ + (bk − bk,n)²)
= d(p, pn) + d(q, qn) < ε.
To prove 2), since {rn} is a bounded sequence of real numbers, there is
M > 0 so that |rn| ≤ M for every n. Note that also |r| ≤ M. Since {pn} is
a bounded set of vectors, by the lemma there is A > 0 so that d(0, pn) ≤ A
for every n.

Now choose N1 so that for n > N1, |r − rn| < ε/(2A), and choose N2 so
that for n > N2, d(p, pn) < ε/(2M).
Let N = max{N1, N2}; then for n > N, we have
d(rp, rn pn) ≤ d(rp, r pn) + d(r pn, rn pn) = |r| d(p, pn) + |r − rn| d(0, pn)
< |r| · ε/(2M) + (ε/(2A)) · A ≤ ε/2 + ε/2 = ε.
Finally, to prove 3), let d(0, pn) ≤ A, as before, and let d(0, qn) ≤ B, so
that d(0, p) ≤ A and d(0, q) ≤ B. We may choose N1 so that for n > N1 we
have that d(p, pn) < ε/(2B), and N2 so that for n > N2, we have d(q, qn) < ε/(2A).
Thus, if we let N = max{N1, N2}, then for n > N,
|p · q − pn · qn| = |(p − pn) · q + pn · (q − qn)|
≤ |(p − pn) · q| + |pn · (q − qn)| ≤ d(0, p − pn) d(0, q) + d(0, pn) d(0, q − qn)
≤ d(p, pn) B + A d(q, qn) < ε/2 + ε/2 = ε.
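The estimate used in the proof of 1) can be checked numerically. The following Python sketch is an illustration only; the particular sequences pn, qn and the helper names d and add are choices made here, not part of the notes.

```python
import math

def d(p, q):
    """Euclidean metric on R^k."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def add(u, v):
    """Vector sum in R^k."""
    return tuple(a + b for a, b in zip(u, v))

# Illustrative sequences p_n -> p = (1, 2) and q_n -> q = (3, -1).
p, q = (1.0, 2.0), (3.0, -1.0)
p_n = lambda n: (1.0 + 1 / n, 2.0 - 1 / n)
q_n = lambda n: (3.0 + 2 / n, -1.0 + 1 / n)

# The key estimate in the proof of 1):
# d(p + q, p_n + q_n) <= d(p, p_n) + d(q, q_n).
for n in (1, 10, 100, 1000):
    lhs = d(add(p, q), add(p_n(n), q_n(n)))
    rhs = d(p, p_n(n)) + d(q, q_n(n))
    assert lhs <= rhs + 1e-12  # tolerance for floating point
```

The same triangle-inequality style estimate drives all three parts of the theorem.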

1.4.2 Subsequences

We recall the definition of a subsequence.
Definition 1.74. Given a set X, a sequence {pn} in X and numbers 1 ≤
n1 < n2 < ⋯, the new sequence that we get by setting qk = pnk is called a
subsequence of {pn}.
For example, if nk = 2k, k = 1, 2, ..., then the subsequence that we get is
just the even numbered terms of the old sequence. If nk = 2k − 1, k = 1, 2, ...,
then we get the subsequence of odd terms.
Often we simply denote the subsequence by {pnk}.
Proposition 1.75. Let (X, d) be a metric space, {pn} a sequence in X and
p ∈ X. If the sequence {pn} converges to p, then every subsequence of {pn}
also converges to p.
Proof. Let 1 ≤ n1 < n2 < ⋯ be the numbers for our subsequence. Note
that since these are integers and nj < nj+1, they must be at least one apart.
Since 1 ≤ n1, we have that 2 ≤ n2, 3 ≤ n3, and in general, k ≤ nk.
Now let qk = pnk denote our subsequence and let ε > 0 be given. We
must show that there is a K so that k > K implies that d(p, qk) < ε.


Since limn pn = p, there is N so that n > N implies that d(p, pn) < ε.
Let K = N; then for k > K, nk ≥ k > N. Hence, for k > K, d(p, qk) =
d(p, pnk) < ε.
This proves that limk qk = p.

1.5 Interiors, Closures, Boundaries of Sets

In this section we look at some other important concepts related to open
and closed sets.
Definition 1.76. Let (X, d) be a metric space and S ⊆ X. A point p is called
an interior point of S provided that there exists r > 0 so that B(p; r) ⊆ S.
The set of all interior points of S is called the interior of S and is denoted
as int(S).
Note that since p ∈ B(p; r), every interior point of S is a point in S.
Thus, int(S) ⊆ S.
Example 1.77. Let X = R with the usual metric. Then int([a, b]) = (a, b),
while int(Q) is the empty set.
Example 1.78. Let X = R² with the Euclidean metric. Then int({(x1, x2) :
a ≤ x1 ≤ b, c ≤ x2 ≤ d}) = {(x1, x2) : a < x1 < b, c < x2 < d} and
int({(x1, 0) : a ≤ x1 ≤ b}) is the empty set.
Proposition 1.79. Let (X, d) be a metric space and let S ⊆ X. Then:
1. int(S) is an open set,
2. if O ⊆ S is an open set, then O ⊆ int(S),
3. int(S) is the largest open set contained in S.
Proof. If int(S) is empty, then it is open. Otherwise, let p ∈ int(S), so that
there is r > 0 with B(p; r) ⊆ S. If we show that B(p; r) ⊆ int(S), then that
will prove that int(S) is open. Let q ∈ B(p; r) and set r1 = r − d(p, q). Using
the triangle inequality, we have seen before that B(q; r1) ⊆ B(p; r) ⊆ S. This
shows that q ∈ int(S). Hence, B(p; r) ⊆ int(S).
To see the second statement, if p ∈ O, then since O is open, there is
r > 0 so that B(p; r) ⊆ O ⊆ S. This shows that p ∈ int(S) and so O ⊆ int(S).
The third statement is a summary of the first two statements.
Definition 1.80. Let (X, d) be a metric space and let S ⊆ X. Then the
closure of S, denoted S̄, is the intersection of all closed sets containing S.


Before giving any examples, it will be easiest to have some other descriptions of S̄.
Proposition 1.81. S̄ is closed and if C is any closed set with S ⊆ C, then
S̄ ⊆ C.
Proof. S̄ is closed because intersections of closed sets are closed. If C is
closed and S ⊆ C, then C is one of the sets in the intersection
defining S̄ and so S̄ ⊆ C.
Thus, S̄ is the smallest closed set containing S.
Theorem 1.82. Let p ∈ X, then p ∈ S̄ if and only if for every r > 0,
B(p; r) ∩ S is non-empty.
Proof. Suppose that there was an r > 0 so that B(p; r) ∩ S was empty. This
would imply that S ⊆ B(p; r)ᶜ, which is a closed set. Thus, we have a closed
set containing S with p ∉ B(p; r)ᶜ. This implies that p ∉ S̄. (This is the
contrapositive.)
Conversely, if B(p; r) ∩ S is non-empty for every r > 0, then for each n,
we may choose a point pn ∈ B(p; 1/n) ∩ S. This gives a sequence {pn} ⊆ S,
and it is easy to see that limn pn = p. Now if C is any closed set with S ⊆ C,
then {pn} ⊆ C, and so by Theorem 1.61, p ∈ C. Hence, p is in every closed
set containing S and so p ∈ S̄.
Corollary 1.83. Let p ∈ X, then p ∈ S̄ if and only if there is a sequence
{pn} ⊆ S with limn pn = p.
Example 1.84. Let X = R with the usual metric. Then the closure of (a, b)
is [a, b] and Q̄ = R.
Example 1.85. If X = {x ∈ R : x > 0} with the usual metric and S = {x :
0 < x < 1}, then S̄ = (0, 1].
Definition 1.86. Let (X, d) be a metric space and let S ⊆ X. Then the
boundary of S, denoted ∂S, is the set ∂S = S̄ ∩ (Sᶜ)‾, the intersection of
the closure of S with the closure of its complement.
Proposition 1.87. Let p ∈ X, then p ∈ ∂S if and only if for every r > 0,
B(p; r) ∩ S is non-empty and B(p; r) ∩ Sᶜ is non-empty.
Proof. Apply the characterization of the closure of sets to S and Sᶜ.
Corollary 1.88. Let p ∈ X. p ∈ ∂S if and only if there is a sequence
{pn} ⊆ S and a sequence {qn} ⊆ Sᶜ with limn pn = limn qn = p.


Example 1.89. Let X = R with the usual metric and let S = (0, 1]. Then
∂S = {0, 1} and ∂Q = R.
Problem 1.90. Prove Corollary 1.83.
Problem 1.91. Prove Corollary 1.88.
Problem 1.92. Let X = R² with the Euclidean metric and let S = {(x1, x2) :
x1² + x2² < 1}. Prove that S̄ = {(x1, x2) : x1² + x2² ≤ 1} and that ∂S =
{(x1, x2) : x1² + x2² = 1}.
Definition 1.93. Let (X, d) be a metric space, S ⊆ X. A point p ∈ X is
called a cluster point or accumulation point of S, provided that for every
r > 0, B(p; r) ∩ S has infinitely many points.
Problem 1.94. Prove that p is a cluster point of S if and only if there is
a sequence {pn} ⊆ S of distinct points, i.e., pi ≠ pj for i ≠ j, such that
limn pn = p.
Problem 1.95. Let (X, d) be a discrete metric space and let S ⊆ X be any
non-empty set. Find int(S), S̄, ∂S and give the cluster points of S.

1.6 Completeness

One weakness of convergence is that when we want to prove that a sequence {pn} converges, we need the point p that it converges to before
we can prove that it converges. But often in math, one doesn't yet know
that a problem has a solution; we can only produce a sequence {pn}
that somehow is a better and better approximate solution, and we want to
claim that necessarily a point exists that is the limit of this sequence. It
is for these reasons that mathematicians introduced the concepts of Cauchy
sequences and complete metric spaces.
Definition 1.96. Let (X, d) be a metric space. A sequence {pn} ⊆ X is
called Cauchy provided that for each ε > 0 there exists N so that whenever
m, n > N, then d(pn, pm) < ε.
We look at a few properties of Cauchy sequences.
Proposition 1.97. Let (X, d) be a metric space and let {pn} ⊆ X be a
sequence. If {pn} is a convergent sequence, then {pn} is a Cauchy sequence.


Proof. Let p = limn pn. Given ε > 0 there is N so that n > N implies that
d(p, pn) < ε/2. Now let m, n > N; then d(pn, pm) ≤ d(pn, p) + d(p, pm) <
ε/2 + ε/2 = ε.
Proposition 1.98. Let (X, d) be a metric space, {pn} ⊆ X a sequence and
{pnk} a subsequence. If {pn} is Cauchy, then {pnk} is Cauchy, i.e., every
subsequence of a Cauchy sequence is also Cauchy.
Proof. Given ε > 0, there is N so that if n, m > N then d(pn, pm) < ε. Let
K = N and recall that nk ≥ k. Thus, if k, j > K then nk ≥ k > N and
nj ≥ j > N, and so d(pnk, pnj) < ε.
Proposition 1.99. Let (X, d) be a metric space and let {pn} ⊆ X be a
Cauchy sequence. Then {pn} is bounded.
Proof. Set ε = 1 and choose N so that for n, m > N, d(pn, pm) < 1. Fix any
n1 > N; we will show that there is a radius r so that {pn} ⊆ B̄(pn1; r).
To this end, set r = max{1, d(pn1, pn) : 1 ≤ n ≤ N}. This is a finite set of
numbers so it does have a maximum. Now given any n, either 1 ≤ n ≤ N,
in which case d(pn1, pn) ≤ r, since it is one of the numbers occurring in the
max, or n > N, and then d(pn1, pn) < 1 ≤ r.
Definition 1.100. Let (X, d) be a metric space. If for each Cauchy sequence
in (X, d) there is a point in X that the sequence converges to, then (X, d) is
called a complete metric space.
Example 1.101. Let X = (0, 1] ⊆ R be endowed with the usual metric
d(x, y) = |x − y|. Then X is not complete, since {1/n} ⊆ X is a Cauchy
sequence with no point in X that it can converge to.
Example 1.102. Let Q denote the rational numbers with metric d(x, y) =
|x − y|. We can take a sequence of rational numbers converging to √2, which
we know is irrational. Then that sequence will be Cauchy, but not have a
limit in Q. Thus, (Q, d) is not complete.
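One concrete way to produce such a sequence is the Babylonian (Newton) iteration, which stays inside Q when started at a rational point. The Python sketch below, using exact rational arithmetic, is an illustration added here, not part of the notes; the function name is an invention for this example.

```python
from fractions import Fraction

def sqrt2_approx(n):
    """n steps of the Babylonian iteration x -> (x + 2/x)/2, starting at 1.
    Every iterate is an exact rational number."""
    x = Fraction(1)
    for _ in range(n):
        x = (x + 2 / x) / 2
    return x

# The iterates form a Cauchy sequence in (Q, d) whose squares approach 2,
# yet the would-be limit, sqrt(2), is irrational, so no limit exists in Q.
x = sqrt2_approx(5)
assert abs(float(x) ** 2 - 2) < 1e-10
```

The first few iterates are 1, 3/2, 17/12, 577/408, ..., the classical rational approximations to √2.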
In Math 3333, we proved that R with the usual metric has the property
that every Cauchy sequence converges, that is, (R, d) is a complete metric
space. This fact is so important that we repeat the proof here. First, we
need to recall a few important facts and definitions.
Recall that a set S ⊆ R is called bounded above if there is a number
b ∈ R such that s ∈ S implies that s ≤ b. Such a number b is called an upper
bound for S. An upper bound for S that is smaller than every other upper
bound of S is called a least upper bound for S and denoted lub(S) in
Rosenlicht's book. In the book that we used for Math 3333, a least upper
bound for S was called a supremum for S and denoted sup(S).
Similarly, a set S ⊆ R is called bounded below if there is a number
a ∈ R such that s ∈ S implies that a ≤ s. Such a number a is called a
lower bound for S. A lower bound for S that is larger than every other
lower bound is called a greatest lower bound for S, and denoted glb(S)
in Rosenlicht's book. In the book that we used for Math 3333, a greatest
lower bound was called an infimum for S and denoted inf(S).
A key property of R is that every set that is bounded above has a least
upper bound and that every set that is bounded below has a greatest lower
bound.
Proposition 1.103. Let {an} be a sequence of real numbers such that the
set S = {an : n ≥ 1} is bounded above and such that an ≤ an+1 for every
n. Then {an} converges and limn an = lub(S). If {bn} is a sequence of real
numbers such that S = {bn : n ≥ 1} is bounded below and bn+1 ≤ bn, then
{bn} converges and limn bn = glb(S).
Proof. Let a = lub(S); then an ≤ a for every n. Given ε > 0, since a − ε < a,
we have that a − ε is not an upper bound for S and so there is an N so
that a − ε < aN. If n > N, then a − ε < aN ≤ an ≤ a < a + ε. Thus,
a − ε < an < a + ε and so |an − a| < ε for every n > N. This proves that
limn an = a.
The proof of the other case is similar.
Theorem 1.104. Let (R, d) denote the real numbers with the usual metric. Then (R, d) is complete, i.e., every Cauchy sequence of real numbers
converges.
Proof. Let {pn} be a Cauchy sequence of real numbers. By the above results,
S1 = {pj : j ≥ 1} is a bounded set of real numbers, so it is bounded above
and bounded below. For each n ≥ 1, let Sn = {pk : k ≥ n}. Note that
S1 ⊇ S2 ⊇ ⋯, so each set Sn is a bounded set of real numbers.
Let an = glb(Sn) and bn = lub(Sn). Since Sn+1 ⊆ Sn we have that an ≤
an+1 and bn+1 ≤ bn. Since pn ∈ Sn, an ≤ pn ≤ bn. In particular, an ≤ b1
for every n, and so {an} converges and a = limn an = lub({an : n ≥ 1}).
Similarly, a1 ≤ bn for every n and so {bn} converges and b = limn bn =
glb({bn : n ≥ 1}). Also, since an ≤ bn we have that a ≤ b.
Now given any ε > 0, there is N so that when j, n > N, |pn − pj| < ε.
Hence, for any j ≥ n > N, we have that pn − ε < pj < pn + ε. This shows that
pn − ε is a lower bound for Sn and pn + ε is an upper bound for Sn. Thus,
pn − ε ≤ an ≤ a ≤ b ≤ bn ≤ pn + ε.


This shows that a and b are trapped in an interval of length 2ε. Since ε
was arbitrary, by the vanishing ε principle, |a − b| = 0, so a = b. Thus,
limn an = limn bn, but an ≤ pn ≤ bn, so by the Squeeze Theorem for
limits, limn an = limn pn = limn bn.
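The tail bounds an = glb(Sn) and bn = lub(Sn) in this proof can be illustrated numerically. In the Python sketch below the tails are truncated at a finite index (an assumption forced by computation, since a computer cannot take an infimum over an infinite set), and the concrete Cauchy sequence p(k) = 1 + (−1)^k/k is an illustrative choice, not from the notes.

```python
def p(k):
    """A Cauchy sequence of reals converging to 1: p_k = 1 + (-1)^k / k."""
    return 1 + (-1) ** k / k

K_MAX = 5000  # truncation point for the "infinite" tails

def a(n):
    """glb of the (truncated) tail S_n = {p_k : k >= n}."""
    return min(p(k) for k in range(n, K_MAX))

def b(n):
    """lub of the (truncated) tail S_n."""
    return max(p(k) for k in range(n, K_MAX))

# As in the proof: a_n is increasing, b_n is decreasing,
# and the sequence is squeezed, a_n <= p_n <= b_n.
for n in (1, 10, 100):
    assert a(n) <= a(n + 1) and b(n + 1) <= b(n)
    assert a(n) <= p(n) <= b(n)
```

Both a(n) and b(n) close in on the common limit 1 as n grows, which is exactly how the proof traps the limit of {pn}.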
Another set of important examples of complete metric spaces are the
Euclidean spaces.
Theorem 1.105. Let (Rᵏ, d) denote k-dimensional Euclidean space. Then
(Rᵏ, d) is complete.
Proof. To simplify notation, we will only do the case k = 2. So let pn =
(an, bn) be a Cauchy sequence of points in R² with the Euclidean metric
d. Given ε > 0, there is N so that n, m > N implies that d(pn, pm) < ε.
But |an − am| ≤ √((an − am)² + (bn − bm)²) = d(pn, pm), so for n, m > N,
|an − am| < ε, and so {an} is a Cauchy sequence of real numbers. Hence
{an} converges; let a = limn an.
Similarly, |bn − bm| < ε for n, m > N and so {bn} converges; let
b = limn bn.
Since limn an = a and limn bn = b, by Theorem 1.57, limn pn = (a, b).
Hence, this arbitrary Cauchy sequence has a limit and the proof is done.
Problem 1.106. Let X be a set and let d and ρ be two uniformly equivalent
metrics on X. Prove that (X, d) is a complete metric space if and only if
(X, ρ) is complete.
By the above theorem and problem, (Rᵏ, d1) and (Rᵏ, d∞) are also complete metric spaces.
Example 1.107. Let X = (0, 1]. Recall that X is not complete in the usual
metric d(x, y) = |x − y|. Given x, y ∈ X we set ρ(x, y) = |1/x − 1/y|. It is easy
to check that ρ is a metric on X. We claim that (X, ρ) is complete! Thus,
d and ρ are not uniformly equivalent.
We sketch the proof. To prove this we must show that if {xn} ⊆ X is
Cauchy in the ρ metric, then it converges to a point in X. Given ε > 0,
suppose that for n, m > N, ρ(xn, xm) < ε. Since ρ(xn, xm) = |1/xn − 1/xm| =
|xn − xm|/(xn xm), and 0 < xn xm ≤ 1, we have that |xn − xm| =
xn xm ρ(xn, xm) ≤ ρ(xn, xm) < ε. Thus, {xn} is Cauchy
in the usual metric. Let x = limn xn. Since 0 < xn ≤ 1 we have that
0 ≤ x ≤ 1. We claim that x ≠ 0. For if x = 0, then for any N, when
n, m > N, if we fix m and let n → +∞, then xn → 0 and 1/xn → +∞. Thus,
ρ(xn, xm) = |1/xn − 1/xm| → +∞. This prevents us from making ρ(xn, xm) < ε and
so violates the Cauchy condition.


Thus, x ≠ 0. But also xn ≠ 0 for every n. By one of our basic results
from 3333, when x ≠ 0, xn ≠ 0 and limn xn = x, then limn 1/xn = 1/x. But this
last limit being true means that for any ε > 0, we can pick N so that when
n > N, |1/x − 1/xn| < ε. But this implies that for n > N, ρ(x, xn) < ε and so
{xn} converges to x in the ρ metric! We are done.
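The contrast between d and ρ on (0, 1] can be seen numerically. The Python sketch below, an illustration added here rather than part of the notes, checks that the sequence xn = 1/n has d-distances shrinking while the ρ-distances blow up, so it is d-Cauchy but not ρ-Cauchy.

```python
def d(x, y):
    """The usual metric on (0, 1]."""
    return abs(x - y)

def rho(x, y):
    """The metric rho(x, y) = |1/x - 1/y| from Example 1.107."""
    return abs(1 / x - 1 / y)

x = lambda n: 1 / n  # the sequence x_n = 1/n in (0, 1]

# d-distances between far-out terms shrink ...
assert d(x(1000), x(2000)) < 1e-3
# ... but the rho-distances do not: rho(1/n, 1/m) = |n - m|.
assert abs(rho(x(1000), x(2000)) - 1000.0) < 1e-6
```

This is the concrete witness that d and ρ cannot be uniformly equivalent.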
Now that we have a few examples of complete metric spaces, the following
result gives us many more examples.
Proposition 1.108. Let (X, d) be a complete metric space. If Y ⊆ X is a
closed subset, then (Y, d) is a complete metric space.
Proof. Let {yn} ⊆ Y be a Cauchy sequence. Since X is complete, there is a
point x ∈ X so that limn yn = x. But since Y is closed, x ∈ Y. Thus, each
Cauchy sequence in Y has a limit in Y.

1.7 Compact Sets

Definition 1.109. Let (X, d) be a metric space, S ⊆ X. A collection {Uα}α∈A
of subsets of X is called a cover of S provided that S ⊆ ⋃α∈A Uα, and an
open cover of S provided that it is a cover of S and every set Uα is open.
A subset S ⊆ X is called compact provided that whenever {Uα}α∈A is an
open cover of S, then there is a finite subset F ⊆ A such that S ⊆ ⋃α∈F Uα.
The collection {Uα}α∈F is called a finite subcover.
Example 1.110. Let R have the usual metric. Let Un = B(0; n) = (−n, +n),
n ∈ N. Then these sets are open and R = ⋃n∈N Un. Suppose that there was
a finite subset F = {n1, ..., nL} ⊆ N so that R ⊆ ⋃n∈F Un = Un1 ∪ ⋯ ∪ UnL.
If we let N = max{n1, ..., nL}, then since n < m implies that Un ⊆ Um, we
would have that R ⊆ ⋃n∈F Un = UN. But this implies that every real number
is in B(0; N), a contradiction. Hence, no finite subcollection of {Un}n∈N covers
R and so R is not compact.
Proposition 1.111. Let (X, d) be a discrete metric space and let K ⊆ X.
Then K is compact if and only if K is a finite set.
Proof. Assume that K is compact. For each x ∈ K, let Ux = {x}. Because
X is discrete, each Ux is an open set. Also K = ⋃x∈K Ux, so this is an open
cover of K. But if you leave out any of the sets Ux then you no longer cover
K. So this cover must itself be finite, in which case K has only finitely
many points.
Conversely, assume that K = {x1, ..., xn} is a finite set and let {Uα}α∈A
be any open cover of K. For each j there must be some αj ∈ A so that
xj ∈ Uαj. But then {Uα1, ..., Uαn} is a finite subcover for K.


Before we can give many more examples of compact sets, we need some
theorems.
Proposition 1.112. Let (X, d) be a metric space. If K ⊆ X is compact,
then K is closed.
Proof. We will prove that if K is not closed, then K is not compact. Since
K is not closed, K̄ ≠ K. Hence, there is p ∈ K̄ with p ∉ K. Since p ∈ K̄,
for every r > 0, B(p; r) ∩ K is non-empty.
Let Un = B̄(p; 1/n)ᶜ = {q ∈ X : d(p, q) > 1/n}. Then each of these
sets is open and ⋃n∈N Un = {q ∈ X : d(p, q) > 0} = {p}ᶜ. Since p ∉ K,
we have that this is an open cover of K. Note that n < m implies that
Un ⊆ Um, so as in the proof of Example 1.110, Un1 ∪ ⋯ ∪ UnL = UN
where N = max{n1, ..., nL}. So if K had a finite subcover, then
there would be N with K ⊆ UN. But this would imply that for q ∈ K,
d(p, q) > 1/N, and so K ∩ B(p; 1/N) would be empty, a contradiction.
Proposition 1.113. Let (X, d) be a metric space, K ⊆ X a compact set
and let S ⊆ K be a closed subset. Then S is compact.
Proof. Let {Uα}α∈A be an open cover of S and let V = Sᶜ, so that V is open.
Check that {V, Uα : α ∈ A} is an open cover of K. So finitely many of these
sets cover K, and since V contains no point of S, the sets Uα in this finite
collection already cover S, since S is a subset of K.
The next few results give a better understanding of the structure of
compact sets.
Proposition 1.114. Let (X, d) be a metric space, K ⊆ X a compact set
and S ⊆ K an infinite subset. Then S has a cluster point in K.
Proof. We prove the contrapositive. Suppose S ⊆ K is a set with no cluster
point in K. Then for each point p ∈ K, since it is not a cluster point of
S, there exists an r > 0 such that B(p; r) contains only finitely many
points from S. The set of all such balls is an open cover of K. Now because
K is compact, some finite collection of these balls covers K. But each of
these balls has only finitely many points from S, so their union contains only
finitely many points from S. Since their union contains K, it contains all of
S and so S has only finitely many points.
Definition 1.115. Let (X, d) be a metric space, K ⊆ X. Then K is called
sequentially compact if every sequence in K has a subsequence that converges to a point in K.
In some texts a set is said to have the Bolzano-Weierstrass property
if and only if it is sequentially compact.
Proposition 1.116. Let (X, d) be a metric space, K ⊆ X. If K is compact,
then K is sequentially compact.
Proof. Let {pn} ⊆ K be a sequence. We do two cases.
Case 1. The sequence has only finitely many values. In this case there
must be some p so that pn = p for infinitely many n. Let n1 be the first such
integer, n2 the second, etc.; this defines a subsequence {pnk} with pnk = p
for all k. Clearly this subsequence converges to p.
Case 2. The sequence has infinitely many values. In this case if we let
S be the set of points in the sequence, then S is an infinite set and so has a
cluster point p in K. We will construct a subsequence that converges to p.
First, let n1 be the smallest integer so that pn1 ∈ B(p; 1) and pn1 ≠ p. Now
let n2 be the smallest integer strictly larger than n1 so that pn2 ∈ B(p; 1/2)
and pn2 ≠ p. Continuing in this way we obtain a subsequence {pnk} such
that d(p, pnk) < 1/k, and hence this subsequence converges to p.
Definition 1.117. Let (X, d) be a metric space, K ⊆ X and let ε > 0. A
subset E ⊆ K is called an ε-net for K provided that given any p ∈ K there
is q ∈ E such that d(p, q) < ε. The subset K is called totally bounded if
for each ε > 0 there is an ε-net for K consisting of finitely many points.
Example 1.118. Let K = [0, 1]. For each ε > 0, let N be the largest integer
so that Nε ≤ 1. Then {0, ε, 2ε, ..., Nε} is an ε-net for K.
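The ε-net from this example is easy to construct and check by machine. The Python sketch below is an illustration added here (the function names and the sampling check are inventions for this example, not from the notes):

```python
import math

def eps_net(eps):
    """The net {0, eps, 2*eps, ..., N*eps} from Example 1.118,
    with N the largest integer satisfying N*eps <= 1."""
    n = math.floor(1 / eps)
    return [k * eps for k in range(n + 1)]

def is_eps_net(net, eps, samples=10000):
    """Check (on a fine sample of [0, 1]) that every point
    is within eps of some net point."""
    return all(
        min(abs(p - q) for q in net) < eps
        for p in (j / samples for j in range(samples + 1))
    )

assert is_eps_net(eps_net(0.1), 0.1)
```

Every point of [0, 1] lies within ε/2 of the nearest multiple of ε, so the strict inequality d(p, q) < ε in the definition is comfortably satisfied.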
Proposition 1.119. Let (X, d) be a metric space and K ⊆ X. If K is
sequentially compact, then K is totally bounded and complete.
Proof. First we show that K is complete. To see this let {pn} ⊆ K be a
Cauchy sequence. Since K is sequentially compact, there is a p ∈ K and
a subsequence so that limk pnk = p. We now show that limn pn = p. Given
ε > 0, there is N so that n, m > N implies that d(pn, pm) < ε/2. But we
can pick a large k so that nk > N and d(p, pnk) < ε/2. Hence for n > N,
d(p, pn) ≤ d(p, pnk) + d(pnk, pn) < ε.
Now to show that K sequentially compact implies K totally bounded,
we prove the contrapositive. So assume that K is not totally bounded. Then
there exists some ε > 0 for which we can find no finite ε-net. Pick any p1 ∈ K;
since this point is not an ε-net, there must be p2 ∈ K such that d(p2, p1) > ε.
Since {p1, p2} cannot be an ε-net, there is p3 ∈ K such that d(p3, p2) > ε
and d(p3, p1) > ε. Continuing in this way, we obtain a sequence {pn} in K


with d(pn, pm) > ε for all n ≠ m. But for a sequence to converge to a point,
it must be Cauchy. Since no subsequence of this sequence can be Cauchy,
no subsequence converges. Hence, K is not sequentially compact.
Now we come to the main theorem.
Theorem 1.120. Let (X, d) be a metric space and K ⊆ X. Then the following are equivalent:
1. K is compact,
2. K is sequentially compact,
3. K is totally bounded and complete.
Proof. We have already shown that 1) implies 2) and that 2) implies 3). So
it will be enough to show that 3) implies 1). To this end assume that 3)
holds but that K is not compact. Then there is some open cover {Uα} of K
with no finite subcover.
For ε = 1/2, take a finite 1/2-net {p1, ..., pL}, so that K ⊆ ⋃j=1..L B(pj; 1/2).
Hence, K = (K ∩ B̄(p1; 1/2)) ∪ ⋯ ∪ (K ∩ B̄(pL; 1/2)). This writes K as
the union of L closed sets. Now if each of these sets could be covered by a
finite collection of sets in {Uα}, then K would have a finite subcover. So one
of these sets cannot have a finite subcover. Pick one and call it K1. Then
K1 ⊆ K is closed and, since it is the intersection with a closed ball of radius
1/2, no two points in K1 can be distance greater than 1 apart.
Now take a finite 1/4-net, use it to write K1 as a finite union of sets of
radius 1/4, and arguing as before pick one which cannot be covered by a
finite collection of the Uα's. This defines a closed subset K2 ⊆ K1 with no
two points greater than 1/2 apart. Proceed inductively to define a sequence
of closed subsets, K ⊇ K1 ⊇ K2 ⊇ ⋯, so that no two points in Kn are
distance greater than 1/2^(n−1) apart and each Kn cannot be covered by a
finite collection of the Uα's.
If we pick a point pn ∈ Kn, then the sequence is Cauchy since for n, m >
N, d(pn, pm) ≤ 1/2^(N−1). Since K is complete, there is p ∈ K with limn pn =
p. Also, since pn ∈ Km for all n ≥ m, this implies that the limit is in Km,
i.e., p ∈ Km for all m.
Now since {Uα} is an open cover of K, there is some α0 with p ∈ Uα0,
and since this set is open, there is an r > 0 so that B(p; r) ⊆ Uα0. But
since no two points in Kn are more than distance 1/2^(n−1) apart, if we pick
n so that 1/2^(n−1) < r, then Kn ⊆ B(p; r) ⊆ Uα0. This contradicts the fact
that Kn cannot be covered by a finite collection of the Uα's. This contradiction
completes the proof of the theorem.


Theorem 1.121 (Heine-Borel). Let (Rᵏ, d) denote k-dimensional Euclidean
space and let K ⊆ Rᵏ. Then K is compact if and only if K is closed and
bounded.
Proof. We have already shown that compact sets are closed. Every compact
set is bounded by Problem 1.125.
If K is closed and bounded, then there is some M > 0 so that K ⊆
[−M, +M] × [−M, +M] × ⋯ × [−M, +M]. If we can show that this k-dimensional cube is totally bounded, then it will be compact and, since K is
a closed subset of this compact set, it will follow that K is compact.
To see that the cube is totally bounded, divide each interval into 2N
equal length subintervals of length M/N. This divides the large cube up into
a finite number of smaller cubes. Each of these smaller cubes is contained
in a Euclidean ball of radius √k · M/(2N). Given any ε > 0, pick N large enough
that √k · M/(2N) < ε; then these balls define a finite ε-net. Thus, the big cube is
totally bounded.
Theorem 1.122 (Bolzano-Weierstrass). Let (Rᵏ, d) be k-dimensional Euclidean space.
If {pn} ⊆ Rᵏ is a bounded sequence, then it has a convergent subsequence.
Proof. Since the sequence is bounded, it is contained in a closed ball of
some finite radius. This set is compact by Heine-Borel, hence sequentially
compact.
Historically, the Bolzano-Weierstrass theorem was proved before Heine-Borel. The original proof used a divide and conquer strategy.
Definition 1.123. Let (X, d) be a metric space and let Y ⊆ X. A subset S ⊆
Y is called dense in Y provided that S̄ ∩ Y = Y. The set Y is called separable
if there is a sequence S = {pn} ⊆ Y that is dense.
Proposition 1.124. Let (X, d) be a metric space. If K ⊆ X is compact,
then K is separable.
Proof. For each n ∈ N, let Sn be a finite 1/n-net for K. Now we write the
elements of S = ⋃n∈N Sn as a sequence by numbering them beginning with
S1, then S2, etc.
Now we claim that the sequence S = {pn} is dense. To see this, given
any p ∈ K and r > 0, if we take 1/n < r, then since Sn is a 1/n-net for K,
we have that B(p; r) ∩ Sn ≠ ∅. Hence, B(p; r) ∩ S ≠ ∅. This shows that
p ∈ S̄. Thus, K ⊆ S̄ ∩ K ⊆ K and so S is dense.


Problem 1.125. Let (X, d) be a metric space and K ⊆ X a compact subset.
Prove that K is bounded.
Problem 1.126. Let (X, d) be a metric space. Let Ki ⊆ X, i = 1, ..., n, be a
finite collection of compact subsets. Prove that K1 ∪ ⋯ ∪ Kn is compact.
We now look at a few applications of compact sets. First we need an
inequality sometimes called the reverse triangle inequality.
Proposition 1.127. Let (X, d) be a metric space and let x, y, z ∈ X. Then
|d(x, y) − d(x, z)| ≤ d(y, z).
Proof. We have that d(x, y) − d(x, z) ≤ (d(x, z) + d(z, y)) − d(x, z) = d(y, z).
On the other hand, −(d(x, y) − d(x, z)) = d(x, z) − d(x, y) ≤ (d(x, y) +
d(y, z)) − d(x, y) = d(y, z). Since this number and its negative are both at most
d(y, z), we have that |d(x, y) − d(x, z)| ≤ d(y, z).
Proposition 1.128. Let (X, d) be a metric space, let {qn} ⊆ X be a convergent sequence with limit q and let p ∈ X. Then d(p, q) = limn d(p, qn).
Proof. Given ε > 0, there is an N so that n > N implies that d(q, qn) < ε.
Hence, by the reverse triangle inequality, for n > N we have |d(p, q) −
d(p, qn)| ≤ d(q, qn) < ε. This shows that d(p, q) = limn d(p, qn).
Definition 1.129. Let (X, d) be a metric space, let S ⊆ X and let p ∈ X.
Then the distance from p to S is the number
dist(p, S) = inf{d(p, q) : q ∈ S}.
In general, the distance from a point to a set is really an infimum and
not a minimum. That is, there is not generally a point q0 ∈ S so that
dist(p, S) = d(p, q0). For example, in R with the usual metric, if we take
S = [0, 1) and p = 3, then dist(p, S) = 2, but there is no point q0 ∈ S with
d(p, q0) = |p − q0| = 2.
However, for compact sets the distance is always attained, as the following
result shows.
Proposition 1.130. Let (X, d) be a metric space, let K ⊆ X and let p ∈ X.
If K is compact, then there is a point q0 ∈ K such that dist(p, K) = d(p, q0).
Proof. Let {qn} be a sequence in K so that dist(p, K) = limn d(p, qn). Since
K is compact, this sequence has a subsequence that converges to a point in
K. Let {qnk} be a convergent subsequence with limit q0 ∈ K. Then by the
above proposition, we have that d(p, q0) = limk d(p, qnk). Since {d(p, qn)} is
a convergent sequence of reals, the subsequence has the same limit. Hence,
d(p, q0) = limn d(p, qn) = dist(p, K).


Proposition 1.131. Let (Rᵏ, d) be k-dimensional Euclidean space, let C ⊆
Rᵏ, and let p ∈ Rᵏ. If C is closed, then there is a point q0 ∈ C with
dist(p, C) = d(p, q0).
Proof. Pick any point q1 ∈ C and let r = d(p, q1). Then dist(p, C) ≤ d(p, q1)
and so dist(p, C) = inf{d(p, q) : q ∈ C, d(p, q) ≤ r}. This shows that if we let
K = C ∩ B̄(p; r), then dist(p, C) = dist(p, K). Now since K is closed and
bounded, it is a compact set by Heine-Borel. Thus, by the last result, there
is q0 ∈ K (and hence q0 ∈ C) with dist(p, C) = dist(p, K) = d(p, q0).

Chapter 2

Finite and Infinite Sets, Countability
Later we will need a little familiarity with infinite sets and the fact that they
can have different sizes. So we discuss these ideas now.
One formal way to think of the statement that a set S has n elements
is that there is a function f : {1, 2, ..., n} → S that is one-to-one and onto.
Usually, we write f(j) = sj and write S = {s1, s2, ..., sn} to show that it has
n elements.
A set S is finite when there is some natural number n and a one-to-one
onto function f : {1, 2, ..., n} → S.
A set S is called countably infinite when there is a one-to-one onto
function f : N → S. This can be thought of as saying that a set is countably
infinite precisely when the elements of S can be listed in a sequence S =
{s1, s2, ...} with no repetitions, i.e., i ≠ j implies that si ≠ sj.
A set is countable when it is either finite or countably infinite. It is
not hard to see that a set S is countable if and only if there is a function
f : N → S that is onto.
There are many sets that are countable; some are a little surprising.
For example, Z is countable, since we could list the elements as
{0, 1, −1, 2, −2, ...}.
Formally, f : N → Z is the function f(1) = 0, f(2k) = k and f(2k + 1) = −k,
for k = 1, 2, ....
Also, even though N × N seems to be much larger than N, it is also
countable. To see this, first we write N × N as a union of sets, each of which
is finite. For each natural number n ≥ 2, let Sn = {(i, j) ∈ N × N : i + j = n}.
Thus, S2 = {(1, 1)}, S3 = {(1, 2), (2, 1)}, S4 = {(1, 3), (2, 2), (3, 1)}. It is not
hard to see that Sn has n − 1 elements. Now to list these points, we will first
list the points in S2, then S3, etc. Within each set, we will list the elements in
order of their first coordinate, so, for example, (1, 2) comes before (2, 1). This
time the formula for the one-to-one onto function f : N → N × N is difficult
to write down. But equivalently, we could define its inverse, i.e., a one-to-one
onto function g : N × N → N. Now if i + j = n, then all the elements of the sets
S2, ..., Sn−1 come before (i, j). These sets have 1 + 2 + ⋯ + (n − 2) = (n − 2)(n − 1)/2
elements, and the order after that is determined by whether i = 1, 2, ....
Thus, the function g is defined by
g((i, j)) = (i + j − 2)(i + j − 1)/2 + i
and we have shown that N × N is countable.
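The formula for g is easy to check mechanically. The Python sketch below is an illustration added here; it verifies that g lists the sets S2, S3, ... in the stated order and is a bijection on an initial block of N × N.

```python
def g(i, j):
    """The enumeration g : N x N -> N from the text (indices start at 1):
    g((i, j)) = (i + j - 2)(i + j - 1)/2 + i."""
    return (i + j - 2) * (i + j - 1) // 2 + i

# Sorting a finite block of N x N by g recovers the listing order:
# S_2, then S_3, etc., each set ordered by first coordinate.
order = sorted(((i, j) for i in range(1, 20) for j in range(1, 20)
                if i + j <= 20), key=lambda p: g(*p))
assert order[:6] == [(1, 1), (1, 2), (2, 1), (1, 3), (2, 2), (3, 1)]

# On this block (all of S_2, ..., S_20), g is a bijection onto {1, ..., 190}.
assert sorted(g(*p) for p in order) == list(range(1, len(order) + 1))
```

The values 1, 2, 3, 4, 5, 6 for the first six pairs match the hand listing (1,1), (1,2), (2,1), (1,3), (2,2), (3,1).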


Now since N N is countable N N N is also countable! To see how to
do this, first define h : NNN NN by h((i, j, k)) = (g((i, j)), k). Since
g is one-to-one and onto N, it is easy to see that h is one-to-one and onto
N N. Now if we compose with g then we will have g h : N N N N
is one-to-one and onto!
In a similar way one sees that doing the Cartesian product of N with
itself any finite number of times yields a countable set.
These facts make it easy to see that some other sets are countable. For example, to see that Q = {p/q : p, q ∈ Z, q ≠ 0} is countable: first, the map from Z × (Z\{0}) → Q defined by (p, q) ↦ p/q is onto (but not one-to-one). We know that Z is countable and Z\{0} is countable, so there is a one-to-one onto map from N × N → Z × (Z\{0}), and hence a one-to-one onto map from N to Z × (Z\{0}). Composing with the above map gives a map from N onto Q. Hence, Q is countable.
With so many countable sets, it is a little surprising that there are sets so big that their elements cannot be listed, i.e., that are not countable. Such sets are called uncountable.
One example of an uncountable set is R. To prove that R is uncountable, it will be enough to prove that the set of real numbers in the interval (0, 1) is uncountable.
Suppose that the set of real numbers in (0, 1) were countable. Then we could list them all: let rn denote the n-th real number, so that {rn : n ∈ N} = (0, 1). Now each real number has a decimal expansion. Recall that 0.4000... = 0.3999.... We shall only allow the first type of decimal expansion. When we disallow infinite tails of nines, the decimal expansion is unique. Now let

r1 = 0.a_{1,1} a_{1,2} a_{1,3} ...
r2 = 0.a_{2,1} a_{2,2} a_{2,3} ...

etc., where each of the a_{i,j} is an integer between 0 and 9. Note that a_{i,j} is the j-th decimal entry of the i-th number.
Now define a sequence of integers bn as follows: if 0 ≤ a_{n,n} ≤ 4, let bn = a_{n,n} + 1, and if 5 ≤ a_{n,n} ≤ 9, let bn = a_{n,n} − 1.
Note that 1 ≤ bn ≤ 8. Hence, there is a real number r, 0 < r < 1, with decimal expansion

r = 0.b1 b2 ....

Note that, by the way we have defined the bn's, there are no nines. Since b1 ≠ a_{1,1}, r ≠ r1, and since b2 ≠ a_{2,2}, r ≠ r2, etc. Thus, r was not included in the list! Contradiction.
This method of proof, in which one lists things, obtains a doubly indexed array, and then alters the (n, n)-entries, is often referred to as Cantor's diagonalization method.
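On a finite list the diagonal construction looks as follows; a sketch with made-up digit rows:

```python
def diagonal_digits(rows):
    # rows[n] holds the decimal digits of the n-th listed number.
    # Alter the (n, n) digit: add 1 if it is 0..4, subtract 1 if it is 5..9.
    out = []
    for n, row in enumerate(rows):
        a = row[n]
        out.append(a + 1 if a <= 4 else a - 1)
    return out

rows = [[4, 1, 5], [9, 9, 9], [0, 5, 2]]
b = diagonal_digits(rows)
assert b == [5, 8, 3]      # every digit lies in 1..8, so no tail of nines
assert all(b[n] != rows[n][n] for n in range(len(rows)))   # differs from each r_n
```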

Chapter 3

Continuous Functions
Continuity plays an important role for functions on the real line. Intuitively, a function is continuous provided that it sends close points to close points. When we say "close" we naturally have a notion of distance between points, both in the domain and the range. Thus, we see that continuity is really a property of functions between metric spaces. In this chapter, we define what we mean by continuous functions between metric spaces, then study the properties of continuous functions. We will see that our notions of open, closed and compact sets all play an important role.
Definition 3.1. Let (X, d) and (Y, ρ) be metric spaces and let p0 ∈ X. We say that a function f : X → Y is continuous at p0 provided that for every ε > 0 there is δ > 0 such that whenever p ∈ X and d(p0, p) < δ, then ρ(f(p0), f(p)) < ε.
Note that in this definition the value of δ really depends on ε; for this reason some authors write δ(ε).
Here are two quick examples.
Example 3.2. Let X = Y = R, both endowed with the usual metric, and let f : R → R be defined by

f(x) = 0 if x ≤ 0, and f(x) = 1 if x > 0.

Then f is not continuous at p0 = 0.
To see that f is not continuous at 0, take ε = 1/2. For any δ > 0, the point p = δ/2 satisfies d(p0, p) = |p0 − p| = |0 − δ/2| = δ/2 < δ, but ρ(f(p0), f(p)) = |f(p0) − f(p)| = |0 − 1| = 1 > ε. Thus, every possible value of δ fails to meet the criterion.
Note that the domain of the function is really important when trying to decide continuity. For this same formula, if we made the domain instead just X = [−1, 0], then f would be continuous at 0, since now the function would be the function f(x) = 0 for every x ∈ X.
Example 3.3. Let X = Y = R with the usual metric, let f(x) = x² and let p0 = 3. Then f is continuous at p0.
To see this, given ε > 0, we take δ = min{1, ε/7}. Then when d(p0, p) = |3 − p| < δ we know that |3 − p| < 1, and so 2 < p < 4. Hence, d(f(p0), f(p)) = |3² − p²| = |3 − p||3 + p| < (3 + 4)|3 − p| < 7δ ≤ ε.
If we wanted to prove that this function was continuous at p0 = 5, then we could take δ = min{1, ε/11}, and the same argument would work.
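The choice δ = min{1, ε/7} can be spot-checked numerically. This is a finite sampling of points near 3, so a sanity check rather than a proof:

```python
# delta as in Example 3.3 for f(x) = x^2 at p0 = 3
def delta(eps):
    return min(1.0, eps / 7.0)

for eps in [2.0, 0.5, 0.01]:
    d = delta(eps)
    # sample points p with |3 - p| < delta and confirm |3^2 - p^2| < eps
    for k in range(1, 1000):
        p = 3.0 + d * (2 * k / 1000.0 - 1.0) * 0.999   # p ranges over (3 - d, 3 + d)
        assert abs(9.0 - p * p) < eps
```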
Definition 3.4. Let (X, d) and (Y, ρ) be metric spaces. We say that a function f : X → Y is continuous provided that f is continuous at every point in X.
Thus, f is continuous provided that for each p0 ∈ X and ε > 0 there is δ > 0 such that p ∈ X and d(p0, p) < δ implies that ρ(f(p0), f(p)) < ε. In this case δ depends on both ε and the point p0. For this reason some authors write δ(p0, ε).
Example 3.5. Let X = Y = R with the usual metric. Then f(p) = p² is continuous.
Given p0 ∈ R and ε > 0, let δ = min{1, ε/(2|p0| + 1)}. Then d(p0, p) < δ implies that |p| < |p0| + 1, and hence ρ(f(p0), f(p)) = |p0² − p²| = |p0 + p||p0 − p| ≤ (|p0| + |p|)δ < (2|p0| + 1)δ ≤ ε.
In this example, the δ that we picked depended on both p0 and ε. When δ can be chosen independent of p0, the function is called uniformly continuous. That is:
Definition 3.6. Let (X, d) and (Y, ρ) be metric spaces. We say that a function f : X → Y is uniformly continuous provided that for every ε > 0 there is δ > 0 such that whenever p, q ∈ X and d(p, q) < δ, then ρ(f(p), f(q)) < ε.
The following is immediate:
Proposition 3.7. Let (X, d) and (Y, ρ) be metric spaces and let f : X → Y. If f is uniformly continuous, then f is continuous.

Example 3.8. Let X = Y = R with the usual metric and let f(p) = 5p + 7. Then f is uniformly continuous.
Given ε > 0, let δ = ε/5. Then when d(p, q) = |p − q| < δ, we have that ρ(f(p), f(q)) = |f(p) − f(q)| = 5|p − q| < 5δ = ε.
Example 3.9. Let X = Y = R with the usual metric and let f(p) = p². Then f is not uniformly continuous.
To prove this it will be enough to take ε = 1 and show that for this value of ε we can find no corresponding δ > 0. By way of contradiction, suppose that there were a δ such that |p − q| < δ implied that |f(p) − f(q)| < 1. Set pn = n, qn = n + δ/2. Then |pn − qn| < δ, but |f(pn) − f(qn)| = qn² − pn² = nδ + δ²/4 > nδ > 1 for n sufficiently large, in fact for n > 1/δ.
Example 3.10. Let X = [−M, +M], Y = R with the usual metrics and let f : X → Y with f(p) = p². Then f is uniformly continuous.
To see this, given ε > 0, set δ = ε/(2M). Then |p − q| < δ implies that |p² − q²| = |p + q||p − q| ≤ 2M|p − q| < 2Mδ = ε.
We now look at a characterization of continuity in terms of open and closed sets.
Recall that if f : X → Y and S ⊆ Y, then the preimage of S is the subset of X given by

f⁻¹(S) = {x ∈ X : f(x) ∈ S}.

It is important to realize that to make this definition we do not need the function f to have an inverse function.
Theorem 3.11. Let (X, d) and (Y, ρ) be metric spaces and let f : X → Y. Then the following are equivalent:
1. f is continuous,
2. for every open set U ⊆ Y, the set f⁻¹(U) is an open set in X,
3. for every closed set C ⊆ Y, the set f⁻¹(C) is a closed subset of X.
Proof. First we show that 2) and 3) are equivalent. To see this, note that (f⁻¹(S))ᶜ = {x ∈ X : f(x) ∉ S} = f⁻¹(Sᶜ). So if, say, 2) holds and C is a closed set, then Cᶜ is an open set. Hence, (f⁻¹(C))ᶜ = f⁻¹(Cᶜ) is open, which implies that f⁻¹(C) is closed. The proof that 3) implies 2) is similar.
Now we prove that 1) implies 2). Let U ⊆ Y be open. We must prove that f⁻¹(U) is open. Since we will be talking about balls in two different metric spaces, we will use a subscript to help indicate which metric. Let p0 ∈ f⁻¹(U), so that f(p0) ∈ U. Since U is open, there is ε > 0 so that B_ρ(f(p0); ε) ⊆ U. Because f is continuous at p0, there is a δ > 0 so that d(p0, p) < δ implies that ρ(f(p0), f(p)) < ε.
But this last statement shows that B_d(p0; δ) ⊆ f⁻¹(U), because p ∈ B_d(p0; δ) implies that d(p0, p) < δ, which implies that ρ(f(p0), f(p)) < ε, which implies that f(p) ∈ B_ρ(f(p0); ε), which implies that f(p) ∈ U.
Hence, f⁻¹(U) is open in X.
Conversely, assume that 2) holds and we will prove 1). Given p0 ∈ X and an ε > 0, we have that U = B_ρ(f(p0); ε) is open. Hence, by assumption, f⁻¹(U) is open and p0 ∈ f⁻¹(U). Thus, we may pick δ > 0 so that B_d(p0; δ) ⊆ f⁻¹(U). But this implies that if d(p0, p) < δ, then p ∈ f⁻¹(U), which implies that f(p) ∈ U = B_ρ(f(p0); ε), which implies that ρ(f(p0), f(p)) < ε.
Hence, f is continuous at p0, and since this was an arbitrary point, we have that f is continuous at every point in X.
Proposition 3.12. Let (X, d), (Y, ρ) and (Z, η) be metric spaces and let f : X → Y and g : Y → Z be continuous functions. Then g ∘ f : X → Z is continuous.
Proof. First note that if S ⊆ Z, then x ∈ (g ∘ f)⁻¹(S) iff (g ∘ f)(x) ∈ S iff g(f(x)) ∈ S iff f(x) ∈ g⁻¹(S) iff x ∈ f⁻¹(g⁻¹(S)). Thus, we have that (g ∘ f)⁻¹(S) = f⁻¹(g⁻¹(S)).
We use the above theorem. If U ⊆ Z is open in Z, then g⁻¹(U) is open in Y, and hence f⁻¹(g⁻¹(U)) is open in X. Thus, when U is open in Z, (g ∘ f)⁻¹(U) is open in X. Hence, g ∘ f is continuous.
Problem 3.13. Let f : R → R be defined by f(x) = x³. Prove that f is continuous.
Problem 3.14. Prove that f(x) = x³ is not uniformly continuous on R.
Problem 3.15. Prove that f : [−M, +M] → R with f(x) = x³ is uniformly continuous.
Problem 3.16. Let (X, d) and (Y, ρ) be metric spaces. A function f : X → Y is called Lipschitz continuous provided that there is a constant K > 0 so that ρ(f(p), f(q)) ≤ Kd(p, q). Prove that every Lipschitz continuous function is uniformly continuous.
Given a function f : X → Y and S ⊆ X, the image of S is the subset of Y given by f(S) = {f(p) : p ∈ S}.

Problem 3.17. Let (X, d) and (Y, ρ) be metric spaces and f : X → Y a function. Prove that f is continuous at p0 if and only if for every ε > 0 there exists a δ > 0 so that the image of B_d(p0; δ) is contained in B_ρ(f(p0); ε).

3.1 Functions into Euclidean space

Let (X, d) be a metric space. Given functions fi : X → R, i = 1, ..., k, we can define a function F : X → Rᵏ by setting F(x) = (f1(x), ..., fk(x)). Conversely, any function F : X → Rᵏ is readily seen to be of this form. We call the functions f1, ..., fk the component functions or, more simply, the components of F.
To avoid possible confusion, we shall let d2 denote the Euclidean metric on Rᵏ.
Theorem 3.18. Let (X, d) be a metric space and let (Rᵏ, d2) be Euclidean space. A function F : X → Rᵏ is continuous at a point p0 ∈ X if and only if each of its component functions fi : X → R, i = 1, ..., k, is continuous at p0. The function F is continuous if and only if each of its component functions is continuous.
Proof. First assume that F is continuous at p0. Given ε > 0, there exists δ > 0 such that when d(p0, p) < δ, then

d2(F(p0), F(p)) = √((f1(p0) − f1(p))² + ... + (fk(p0) − fk(p))²) < ε.

For each i, since |fi(p0) − fi(p)| ≤ d2(F(p0), F(p)), we see that d(p0, p) < δ implies that |fi(p0) − fi(p)| < ε. Thus, fi is continuous at p0 for each i.
Conversely, assume that each fi is continuous at p0 and let ε > 0 be given. For each i, there exists δi > 0 so that d(p0, p) < δi implies that |fi(p0) − fi(p)| < ε/√k. Now let δ = min{δ1, ..., δk}. Then for d(p0, p) < δ we have that d2(F(p0), F(p)) < ε.
The second equivalence follows from the first and the fact that continuous means continuous at every point.
Theorem 3.19. Let (X, d) be a metric space, let p0 ∈ X, let R have the usual metric and let f, g : X → R be functions. If f and g are both continuous at p0, then:
1. f + g is continuous at p0,
2. fg is continuous at p0,
3. for any constant c, cf is continuous at p0,
4. if g(p) ≠ 0 for all p, then f/g is continuous at p0.

Proof. To prove the first statement, given ε > 0, there is δ1 > 0 so that d(p0, p) < δ1 implies that |f(p0) − f(p)| < ε/2. Also, there is δ2 > 0 so that d(p0, p) < δ2 implies that |g(p0) − g(p)| < ε/2. Let δ = min{δ1, δ2}.
To prove the second, given ε > 0, first pick δ1 > 0 and δ2 > 0 so that d(p0, p) < δ1 implies |f(p0) − f(p)| < 1 and d(p0, p) < δ2 implies that |g(p0) − g(p)| < 1. Now pick δ3 > 0 and δ4 > 0 so that d(p0, p) < δ3 implies that |f(p0) − f(p)| < ε/(2(|g(p0)| + 1)) and d(p0, p) < δ4 implies that |g(p0) − g(p)| < ε/(2(|f(p0)| + 1)). Let δ = min{δ1, ..., δ4}; then for d(p0, p) < δ we have that

|f(p0)g(p0) − f(p)g(p)| ≤ |f(p0) − f(p)||g(p0)| + |g(p0) − g(p)||f(p)| < (ε/(2(|g(p0)| + 1)))|g(p0)| + (ε/(2(|f(p0)| + 1)))(|f(p0)| + 1) < ε.

Statement 3) follows from 2) and the fact that constant functions are continuous.
Statement 4) will follow from 2) if we prove that the function 1/g is continuous at p0. Given ε > 0, we first pick δ1 > 0 so that d(p0, p) < δ1 implies that |g(p0) − g(p)| < |g(p0)|/2. This guarantees that |g(p)| > |g(p0)|/2. We now pick δ2 > 0 so that d(p0, p) < δ2 implies that |g(p0) − g(p)| < |g(p0)|²ε/2. Now if we let δ = min{δ1, δ2}, then when d(p0, p) < δ, we have

|1/g(p0) − 1/g(p)| = |g(p) − g(p0)|/(|g(p0)||g(p)|) ≤ |g(p) − g(p0)| · 2/|g(p0)|² < (|g(p0)|²ε/2) · 2/|g(p0)|² = ε.
Corollary 3.20. Let (X, d) be a metric space, let R have the usual metric and let f, g : X → R be functions. If f and g are both continuous, then:
1. f + g is continuous,
2. fg is continuous,
3. for any constant c, cf is continuous,
4. if g(p) ≠ 0 for all p, then f/g is continuous.

Corollary 3.21. Let (X, d) be a metric space, let p0 ∈ X, and let F, G : X → Rᵏ. If F, G are both continuous at p0 (respectively, both continuous), then
1. F + G is continuous at p0 (respectively, continuous),
2. cF is continuous at p0 (respectively, continuous),
3. F · G is continuous at p0 (respectively, continuous).
Proof. We only prove 3); the rest are similar. If we let f1, ..., fk and g1, ..., gk be the component functions for F and G, then the fact that F and G are continuous at p0 implies that f1, ..., fk, g1, ..., gk are all continuous at p0. By the above result about continuity of products, each fi gi is continuous at p0, and so by the result about continuity of sums, F · G = f1 g1 + ... + fk gk is continuous at p0.
The results about continuity on all of X then follow since they are true at each point.

3.2 Continuity of Some Basic Functions

Proposition 3.22. Every polynomial defines a continuous function on R.
Proof. The proof is by induction on the degree of the polynomial. First note that p1(x) = x is not only continuous but uniformly continuous, since we can pick δ = ε. Applying Corollary 3.20.2 with f(x) = g(x) = x, we get that the function p2(x) = x² is continuous. Now assume that we have shown that pn(x) = xⁿ is continuous. Then applying Corollary 3.20.2 again, we have that pn(x)p1(x) = xⁿ⁺¹ = pn+1(x) is continuous.
Since pn is continuous, applying Corollary 3.20.3, any function of the form an xⁿ, with an ∈ R a constant, is continuous. Since constants are continuous, applying Corollary 3.20.1 to f(x) = a1 x and g(x) = a0, we get that any first degree polynomial is continuous. Now assume that we have proved that all polynomials of degree n are continuous and we are given an (n + 1)-st degree polynomial, q(x) = an+1 xⁿ⁺¹ + qn(x), where qn is a polynomial of degree at most n. By our inductive hypothesis qn is continuous and, by our earlier work, an+1 xⁿ⁺¹ is continuous. Thus, applying Corollary 3.20.1 gives that q is continuous.
Thus, all polynomials are continuous.
Proposition 3.23. Let p, q be polynomials and let E = {x ∈ R : q(x) ≠ 0}. Then r : E → R defined by r(x) = p(x)/q(x) is continuous.
Proof. We know that p and q are continuous on R by the above result, and hence they are continuous on E. The result now follows by applying Corollary 3.20.4 with X = E.
We now prove continuity of our favorite trigonometric functions. We will need a few important facts and inequalities:

cos(x + y) = cos(x)cos(y) − sin(x)sin(y),  sin(x + y) = sin(x)cos(y) + cos(x)sin(y),
|sin(x)| ≤ |x|,  |1 − cos(x)| ≤ |x|.

The first two equalities are the angle addition formulas. The two inequalities follow by examining the unit circle and recalling that x in radian measure is the length of the arc.
Proposition 3.24. The functions sin, cos : R → R are both Lipschitz continuous with constant 2.
Proof. We have that

|sin(x) − sin(y)| = |sin((x − y) + y) − sin(y)|
= |sin(x − y)cos(y) + cos(x − y)sin(y) − sin(y)|
≤ |sin(x − y)cos(y)| + |(cos(x − y) − 1)sin(y)|
≤ |sin(x − y)| + |cos(x − y) − 1| ≤ 2|x − y|.

Thus, d(sin(x), sin(y)) ≤ 2d(x, y). The argument for cos is similar.
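The Lipschitz bound can be spot-checked on a grid of points (in fact the best constant for sin and cos is 1, so the constant 2 above has plenty of room):

```python
import math

# crude grid of points in R; for every pair, |sin(x) - sin(y)| <= 2|x - y|
pts = [0.37 * k - 10.0 for k in range(55)]
for x in pts:
    for y in pts:
        assert abs(math.sin(x) - math.sin(y)) <= 2 * abs(x - y)
        assert abs(math.cos(x) - math.cos(y)) <= 2 * abs(x - y)
```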
Recall that by Problem 3.16, the functions sin and cos are not only continuous on R but are uniformly continuous on R. Applying Corollary 3.20.4, we have that the functions tan, cot, sec, cosec are continuous wherever their denominators do not vanish.
We now look at some multivariable functions. We formally define functions xi : Rᵏ → R by xi((a1, ..., ak)) = ai. Thus, xi is the function that yields the i-th coordinate of a point. These are called the coordinate functions. By a polynomial in the coordinate functions, we mean any function p : Rᵏ → R that can be expressed as a finite sum of constants times products of coordinate functions. For example, 6x1²x3⁵ + 2x2³ + 7 is a polynomial in the coordinate functions.
Proposition 3.25. Every polynomial in the coordinate functions defines a continuous function from the Euclidean space Rᵏ to R.
Proof. Given p = (a1, ..., ak) and q = (b1, ..., bk), two points in Rᵏ, we have that d(xi(p), xi(q)) = |xi(p) − xi(q)| = |ai − bi| ≤ d2(p, q). This shows that each coordinate function is a Lipschitz continuous function from (Rᵏ, d2) to R. The remainder of the proof follows as in the case of ordinary polynomials: first, one does induction to show that every monomial function is continuous, and then one does induction on the number of terms in the polynomial.
Problem 3.26. Prove that the function g : [0, +∞) → R defined by g(x) = √x is continuous.
Problem 3.27. Prove or disprove that this function is uniformly continuous.

3.3 Continuity and Limits

In this section we generalize the concept of limit and establish the connections between limits and continuity. The first result gives a sequential test
for continuity.
Theorem 3.28. Let (X, d) and (Y, ρ) be metric spaces, let f : X → Y be a function and let p0 ∈ X. Then f is continuous at p0 if and only if whenever {pn} ⊆ X is a sequence that converges to p0, the sequence {f(pn)} ⊆ Y converges to f(p0).
Proof. Assume that f is continuous at p0 and let ε > 0. Then there exists δ > 0 such that d(p0, p) < δ implies that ρ(f(p0), f(p)) < ε. Since limn pn = p0, there is N so that n > N implies that d(p0, pn) < δ. Hence, if n > N, then ρ(f(p0), f(pn)) < ε. This proves that limn f(pn) = f(p0).
To prove the converse, we show its contrapositive. So assume that f is not continuous at p0. Then there exists ε > 0 for which we can obtain no δ that satisfies the definition of continuity. The fact that δ = 1/n fails to satisfy the definition means that there exists a point pn with d(p0, pn) < 1/n but ρ(f(p0), f(pn)) ≥ ε.
In this manner we obtain a sequence {pn} with d(p0, pn) < 1/n, and hence limn pn = p0. Yet ρ(f(p0), f(pn)) ≥ ε for every n, and so f(p0) cannot be the limit of the sequence {f(pn)}.
Corollary 3.29. Let (X, d) and (Y, ρ) be metric spaces and let f : X → Y. Then f is continuous if and only if whenever {pn} ⊆ X is a convergent sequence, we have limn f(pn) = f(limn pn).
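The sequential test makes Example 3.2 transparent: the step function fails the test along pn = 1/n → 0. A sketch:

```python
def f(x):
    # the step function of Example 3.2
    return 0.0 if x <= 0 else 1.0

p = [1.0 / n for n in range(1, 100)]   # p_n -> 0
assert all(f(x) == 1.0 for x in p)     # f(p_n) -> 1
assert f(0.0) == 0.0                   # but f(0) = 0, so f is not continuous at 0
```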

There is another notion of limit, familiar from calculus, that is also related to continuity.
Definition 3.30. Let (X, d) and (Y, ρ) be metric spaces, let p0 ∈ X be a cluster point, let f : X\{p0} → Y be a function and let q0 ∈ Y. We write

lim_{p→p0} f(p) = q0

provided that for every ε > 0 there is a δ > 0 such that whenever d(p0, p) < δ and p ≠ p0, then ρ(q0, f(p)) < ε.
Note that we need p0 to be a cluster point to guarantee that the set of p's satisfying d(p0, p) < δ and p ≠ p0 is non-empty. Often, when using this definition, the function f is actually defined on all of X, in which case we simply ignore its value at p0.
We leave the proof of the following result as an exercise.
Theorem 3.31. Let (X, d) and (Y, ρ) be metric spaces, let f : X → Y be a function and let p0 ∈ X be a cluster point. Then f is continuous at p0 if and only if lim_{p→p0} f(p) = f(p0).
Lemma 3.32. Let (X, d) and (Y, ρ) be metric spaces and let f : X → Y be any function. If p0 ∈ X is not a cluster point, then f is continuous at p0.
Proof. Since p0 is not a cluster point, there exists r > 0 so that d(p0, p) < r implies that p = p0. Thus, given any ε > 0, if we let δ = r, then d(p0, p) < δ implies that ρ(f(p0), f(p)) = 0 < ε.
Corollary 3.33. Let (X, d) and (Y, ρ) be metric spaces, and let f : X → Y be a function. Then f is continuous if and only if for every cluster point p0 ∈ X we have that lim_{p→p0} f(p) = f(p0).
Problem 3.34. Prove Theorem 3.31.
Problem 3.35. Prove Corollary 3.33.
Problem 3.36. Let (X, d) and (Y, ρ) be metric spaces, let f : X → Y be a function and let p0 ∈ X be a cluster point. Prove that lim_{p→p0} f(p) = q0 if and only if for every sequence of points {pn} ⊆ X such that pn ≠ p0 and limn pn = p0, we have limn f(pn) = q0.

3.4 Continuous Functions and Compact Sets
If we let C = {(x, y) ∈ R² : xy = 1}, then it is not hard to see that C is a closed subset of R². Also, we know that the function f : R² → R given by f((x, y)) = x is a continuous function; it is the first coordinate function. But the image f(C) = {x ∈ R : x ≠ 0} is not a closed subset of R. Thus, the continuous image of a closed set need not be a closed set. The story is quite different for compact sets.
Theorem 3.37. Let (X, d) and (Y, ρ) be metric spaces, and let f : X → Y be a continuous function. If K ⊆ X is compact, then its image f(K) ⊆ Y is compact.
Proof. Let {Uα}α∈A be an open cover of f(K); then each of the sets f⁻¹(Uα) is open and {f⁻¹(Uα)}α∈A is an open cover of K. Since K is compact, there is a finite subset F ⊆ A such that {f⁻¹(Uα)}α∈F covers K.
Note that f(f⁻¹(Uα)) ⊆ Uα. Hence, {Uα}α∈F covers f(K). Since every open cover of f(K) has a finite subcover, f(K) is compact.
Corollary 3.38. Let (X, d) and (Y, ρ) be metric spaces, and let f : X → Y be a continuous function. If X is compact, then f(X) is a bounded subset of Y.
Thus, when X is non-empty and compact and f : X → R is continuous, there exists M such that |f(x)| ≤ M for every x ∈ X. So, in particular, sup{f(x) : x ∈ X} and inf{f(x) : x ∈ X} exist. The following shows that not only do the supremum and infimum exist, but they are attained.
Corollary 3.39. Let (X, d) be a non-empty compact metric space and let f : X → R be continuous. Then there are points xm, xM ∈ X such that for any x ∈ X, f(xm) ≤ f(x) ≤ f(xM). That is, f(xm) = inf{f(x) : x ∈ X} and f(xM) = sup{f(x) : x ∈ X}.
Proof. The proof is similar to the proof of Proposition 1.126. Choose a sequence of points {xn} such that limn f(xn) = inf{f(x) : x ∈ X}. Then since X is sequentially compact, there is a subsequence {x_{n_k}} that converges to some point xm. Since f is continuous, f(xm) = limk f(x_{n_k}) = inf{f(x) : x ∈ X}. The proof for the supremum is similar.
The above result gives the proof that whenever f : [a, b] → R is continuous, then f attains its maximum and minimum values: first, by Heine-Borel, the interval [a, b] is compact; now apply the above result. Note that (0, 1) is a bounded interval and f(x) = 1/x is continuous on this set, but is not even bounded.

Theorem 3.40. Let (X, d) and (Y, ρ) be metric spaces, and let f : X → Y be a continuous function. If X is compact, then f is uniformly continuous.
Proof. Given ε > 0, since f is continuous at p, there is a δ(p) > 0 such that d(p, q) < δ(p) implies that ρ(f(p), f(q)) < ε/2.
The collection of sets {B(p; δ(p)/2)}p∈X is an open cover of X. Hence, there exists a finite collection of points {p1, ..., pn} so that

X ⊆ B(p1; δ(p1)/2) ∪ ... ∪ B(pn; δ(pn)/2).

Let δ = (1/2) min{δ(p1), ..., δ(pn)}.
Now let p, q ∈ X be any points with d(p, q) < δ. Because the above balls are a cover, there exists an i so that p ∈ B(pi; δ(pi)/2). Hence, ρ(f(pi), f(p)) < ε/2. Also, d(pi, q) ≤ d(pi, p) + d(p, q) < δ(pi)/2 + δ ≤ δ(pi), and so ρ(f(pi), f(q)) < ε/2. Thus, we have that ρ(f(p), f(q)) ≤ ρ(f(p), f(pi)) + ρ(f(pi), f(q)) < ε.
Thus, for example, any rational function r(x) = p(x)/q(x) such that q(x) ≠ 0 on the set [a, b] will be uniformly continuous on [a, b].
Problem 3.41. Let (X, d) be a metric space, fix p ∈ X and define f : X → R by f(x) = d(p, x). Prove that f is continuous. Use this fact to give another proof of Proposition 1.130.

3.5 Connected Sets and the Intermediate Value Theorem

We will see in this section that the intermediate value theorem from calculus
is really a consequence of the fact that an interval of real numbers is a
connected set. First we need to define this concept.
Definition 3.42. A metric space (X, d) is connected if the only subsets
of X that are both open and closed are X and the empty set. A subset S
of X is called connected provided that the subspace (S, d) is a connected
metric space. If S is not connected then we say that S is disconnected or
separated.
Proposition 3.43. A metric space (X, d) is disconnected if and only if X
can be written as a union of two disjoint, non-empty open sets.



Proof. We have that (X, d) is disconnected if and only if there is a subset A with A ≠ X, A non-empty, and A both open and closed. Let B = Aᶜ; then since A is closed, B is open, and since A ≠ X, B is non-empty. Thus, X = A ∪ B expresses X as a disjoint union of two non-empty open sets.
Conversely, suppose X = A ∪ B with A and B disjoint, non-empty and open. Then necessarily A = Bᶜ, and so A is both open and closed. Since neither set is empty, A ≠ X and A is non-empty.
Example 3.44. If (X, d) is a discrete metric space with two or more points, then X is disconnected, since X = {p0} ∪ {p0}ᶜ expresses X as a disjoint union of two non-empty open sets.
We now come to perhaps the most important example of a connected space. By an interval in R we mean an open interval, a closed interval, or a half-open interval. The endpoints can be either actual numbers or +∞ or −∞.
Theorem 3.45. Let I ⊆ R be an interval or all of R. Then I is a connected set.
Proof. Suppose not; then we could write I = A ∪ B where A and B are both non-empty and open in the metric space (I, d), where d is the usual metric. Let a ∈ A and b ∈ B. Without loss of generality, we can assume that a < b (otherwise just change the names of A and B). Let A1 = A ∩ [a, b] and B1 = B ∩ [a, b]. Then A1 and B1 are disjoint, non-empty open sets in the metric space ([a, b], d). Since A1 is closed (its complement in [a, b] is the open set B1) and bounded, it is compact, and hence c = sup{x : x ∈ A1} is an element of A1. Hence, c < b. But since A1 is open, there is an open interval centered at c that is still in A1. This open interval contains points of A1 that are larger than c. This contradiction completes the proof.
Now we come to a general version of the Intermediate Value Theorem.
Theorem 3.46 (Intermediate Value Theorem for Metric Spaces). Let (X, d) be a connected metric space and let f : X → R be a continuous function. If x0, x1 ∈ X with f(x0) < L < f(x1), then there is x2 ∈ X with f(x2) = L.
Proof. Suppose not. Then we would have that X = f⁻¹((−∞, L)) ∪ f⁻¹((L, +∞)). Since f is continuous, both of these sets are open, with x0 in the first set and x1 in the second set. Thus, X is written as a disjoint union of two non-empty open sets. This makes each of these sets both open and closed, a contradiction.

Corollary 3.47 (Intermediate Value Theorem). Let I ⊆ R be an interval or the whole real line and let f : I → R be continuous. If x0, x1 ∈ I and f(x0) < L < f(x1), then there is x2 between x0 and x1 with f(x2) = L.
Proof. If x0 < x1, apply the theorem to the connected space [x0, x1]; if x1 < x0, apply the theorem to the connected space [x1, x0].
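The Intermediate Value Theorem is what makes the bisection method work: keep an interval on which f changes sign and halve it repeatedly. A minimal sketch, assuming f(a) < 0 < f(b); the names and tolerance are illustrative:

```python
def bisect(f, a, b, tol=1e-10):
    # Invariant: f(a) < 0 < f(b), so by the Intermediate Value Theorem
    # a root lies in [a, b]; each step halves the interval.
    while b - a > tol:
        m = (a + b) / 2
        if f(m) < 0:
            a = m
        else:
            b = m
    return (a + b) / 2

root = bisect(lambda x: x * x - 2, 0.0, 2.0)
assert abs(root - 2 ** 0.5) < 1e-9
```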
Theorem 3.48. Let (X, d) and (Y, ρ) be metric spaces and let f : X → Y be continuous. If X is connected, then f(X) ⊆ Y is a connected subset.
Proof. Let S = f(X) and let (S, ρ) be the subspace. Then f : X → S is also continuous. If we could write S = A ∪ B as a disjoint union of two non-empty open sets, then we would have that X = f⁻¹(A) ∪ f⁻¹(B) expresses X as a disjoint union of non-empty open sets. Hence, S must be connected.
Definition 3.49. A metric space (X, d) is called pathwise connected provided that for any two points a, b ∈ X there exists a continuous function f : [0, 1] → X such that f(0) = a and f(1) = b.
Intuitively, a space is pathwise connected if and only if you can draw a
curve between any two points with no breaks in the curve.
Problem 3.50. Prove that if (X, d) is pathwise connected, then (X, d) is
connected.
Example 3.51. The subset of the plane defined by

X = {(x, sin(1/x)) : 0 < x ≤ 1} ∪ {(0, y) : −1 ≤ y ≤ +1}

is a space that is connected, but not pathwise connected.
Problem 3.52. Let X = {(x, y) : a ≤ x ≤ b, c ≤ y ≤ d} ⊆ R². Prove that X is pathwise connected.
Definition 3.53. Let g : [a, b] → R. By the graph of g we mean the set G = {(x, g(x)) : a ≤ x ≤ b} ⊆ R².
Problem 3.54. Let g : [a, b] → R. Prove that g is continuous if and only if the graph of g is a pathwise connected subset of the plane.
This last problem is often used in calculus to describe continuous functions intuitively as those whose graph can be drawn without lifting your pencil.

Chapter 4

The Contraction Mapping Principle
At this point we would like to present an important result that uses the concepts that we have developed, together with some of its applications. Logically, this material should come later, since it uses facts from calculus about derivatives and integrals which we have not yet discussed. But we find that a little practical math at this stage helps to motivate the rest of the course.
Definition 4.1. Let (X, d) be a metric space. A function f : X → X is called a contraction mapping provided that there is r, 0 < r < 1, so that d(f(x), f(y)) ≤ rd(x, y) for every x, y ∈ X.
Note that saying that f is a contraction mapping is the same as saying that it is Lipschitz continuous with constant r < 1. It is important to note that being a contraction mapping is stronger than merely satisfying d(f(x), f(y)) < d(x, y), since we need a single r < 1 that works for all pairs of points.
Given a function f : X → X, any point satisfying f(x0) = x0 is called a fixed point of the function.
Theorem 4.2 (Contraction Mapping Principle). Let (X, d) be a complete metric space and let f : X → X be a contraction mapping. Then:
1. there exists a unique point x0 ∈ X such that f(x0) = x0,
2. if x1 ∈ X is any point and we define a sequence inductively by setting xn+1 = f(xn), then limn xn = x0,
3. for this sequence, we have that d(x0, xn) ≤ d(x2, x1) rⁿ⁻¹/(1 − r).
Proof. First, we show that the inductively defined sequence converges. To see this, since X is complete, it will be enough to show that the sequence is Cauchy.
Let A = d(x2, x1). Then we have that d(x3, x2) = d(f(x2), f(x1)) ≤ rd(x2, x1) = rA. Similarly, d(x4, x3) = d(f(x3), f(x2)) ≤ rd(x3, x2) ≤ r²A. By induction, we prove that d(xn+1, xn) ≤ rⁿ⁻¹A.
Now, if m > n, then

d(xm, xn) ≤ d(xm, xm−1) + ... + d(xn+1, xn) ≤ (rᵐ⁻² + ... + rⁿ⁻¹)A ≤ Arⁿ⁻¹/(1 − r).

Given any ε > 0, we may choose an integer N such that Arᴺ/(1 − r) < ε. Then if m, n > N, we have that d(xm, xn) < ε. Thus, {xn} is Cauchy.
Let x0 = limn xn. Then since f is continuous, f(x0) = limn f(xn) = limn xn+1 = x0.
Now if we fix any n, then d(x0, xn) = limm d(xm, xn) ≤ Arⁿ⁻¹/(1 − r) by the above estimate, which proves 3).
Thus, we know that there is a point x0 with f(x0) = x0 and that the sequence converges to one such point. To complete the proof of the theorem, it will be enough to show that if f(x0) = x0 and f(p0) = p0, then p0 = x0. To see this last fact, note that d(p0, x0) = d(f(p0), f(x0)) ≤ rd(p0, x0). Since r < 1, this implies that d(p0, x0) = 0.
Thus, the contraction mapping principle not only guarantees us the existence of a fixed point, but shows that there is a unique fixed point and gives
us a method for approximating the fixed point, together with an estimate of
how close the sequence xn is to the fixed point! This is a remarkable amount
of information.
Here's a typical application of this theorem. Let f(x) = cos(x) and note that since 1 < π/2, for 0 ≤ x ≤ 1 we have that 0 ≤ f(x) ≤ 1. Thus, f : [0, 1] → [0, 1] and is continuous. Also, by the mean value theorem, for 0 ≤ x < y ≤ 1, there is c, x ≤ c ≤ y, with

f(y) − f(x) = f′(c)(y − x) = −sin(c)(y − x).

Hence, |f(y) − f(x)| ≤ sin(c)|y − x| ≤ sin(1)|y − x|. Thus, f(x) = cos(x) is a contraction mapping with r = sin(1) < 1.
Using the fact that [0, 1] is complete, by the contraction mapping principle there is a unique point, 0 ≤ x0 ≤ 1, such that cos(x0) = x0. Moreover, we can obtain this point (or at least approximate it) by choosing any number x1, 0 ≤ x1 ≤ 1, and forming the inductive sequence xn+1 = cos(xn).
The third part of the theorem gives us an estimate of the distance between our approximate fixed point xn and the true fixed point. In particular, if we pick x1 = 0, then x2 = cos(x1) = 1, so |x2 − x1| = 1. Hence, |x0 − xn| ≤ sin(1)^{n−1}/(1 − sin(1)).
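None of the following code appears in the notes, but the scheme is easy to try numerically. Here is a minimal Python sketch of the iteration xn+1 = cos(xn) starting at x1 = 0, checking the error estimate of part 3 of Theorem 4.2 along the way.

```python
import math

def fixed_point_iterate(f, x1, n):
    """Return the list [x1, x2, ..., xn] where x_{k+1} = f(x_k)."""
    xs = [x1]
    for _ in range(n - 1):
        xs.append(f(xs[-1]))
    return xs

xs = fixed_point_iterate(math.cos, 0.0, 100)
x_star = xs[-1]                    # essentially the true fixed point by now
assert abs(math.cos(x_star) - x_star) < 1e-10

r = math.sin(1)                    # the contraction constant found above
A = abs(xs[1] - xs[0])             # d(x2, x1) = 1 for this starting point
for n in range(2, 20):
    bound = A * r ** (n - 1) / (1 - r)
    # part 3 of Theorem 4.2: the true error never exceeds the a priori bound
    assert abs(x_star - xs[n - 1]) <= bound + 1e-12
```

Running this shows that the a priori bound, while valid, is conservative here: the actual error shrinks roughly like |f′(x0)|^n ≈ 0.674^n, since |f′| near the fixed point is smaller than sin(1).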
Problem 4.3. Let f(x) = x/2 + 1/x. Use basic calculus to show that f : [1, 2] → [1, 2] and use the mean value theorem to show that f is a contraction. Show that the unique fixed point of f is √2. Using x1 = 3/2, give an estimate on |√2 − xn|.

4.1 Application: Newton's Method

Newton's method gives an iterative method for approximating the solution to an equation of the form f(x) = 0, when f is differentiable.
The idea of the iteration is to choose any x1 and then get an improved
estimate to the solution by tracing the tangent line to the graph at the point
(x1 , f (x1 )) until it intersects the x-axis and letting this determine the point
x2. The formula that one obtains is

x2 = x1 − f(x1)/f′(x1).
Newton's method consists of repeating this formula iteratively to generate a sequence of points

xn+1 = xn − f(xn)/f′(xn)

that, hopefully, converges to the zero of the function.
Note that if we set g(x) = x − f(x)/f′(x), then f(x) = 0 if and only if g(x) = x. Thus, finding a zero of f is equivalent to finding a fixed point of g. Moreover, the above iteration is simply computing xn+1 = g(xn).
Thus, if we can construct an interval [a, b] such that g : [a, b] → [a, b] and such that g is a contraction mapping on [a, b], then we will have a criterion for convergence of Newton's method. The details are below.
Theorem 4.4 (Newton's Method). Let f be a twice continuously differentiable function and assume that there is a point x0 with f(x0) = 0 and f′(x0) ≠ 0. Then there is M > 0 so that for any x1 with x0 − M ≤ x1 ≤ x0 + M, the sequence of points obtained by Newton's method starting at x1 converges to x0.

Proof. Fix an r, 0 < r < 1. Let g be as above; then g′(x) = f(x)f″(x)/(f′(x))^2, which is continuous in a neighborhood of x0. Since g′(x0) = 0, there is some M > 0 so that |x − x0| ≤ M implies that |g′(x)| ≤ r. Hence, for any x in the interval I = [x0 − M, x0 + M], we have that |g(x) − x0| = |g(x) − g(x0)| = |g′(c)(x − x0)| ≤ rM < M. Hence, x ∈ I implies that g(x) ∈ I.
Thus, g : I → I, and for any x, y ∈ I, |g(x) − g(y)| = |g′(c)(x − y)| ≤ r|x − y|, since c ∈ I. Hence, we may apply the contraction mapping principle to g to obtain the desired result.
There are two difficulties in using the above theorem. First, the interval
is centered at x0 , which is a value that we are trying to obtain! The second
lies in having enough detailed information about g to explicitly find M.
However, since the proof of Newton's method uses the contraction mapping
principle, in some cases it is possible to say a great deal.
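As an illustration (ours, not from the notes), here is a minimal sketch of Newton's method written as fixed-point iteration of g(x) = x − f(x)/f′(x), applied to f(x) = x^2 − 5 with the starting point x1 = 2 used in Problem 4.5 below; the function name `newton` and the step count are our choices.

```python
def newton(f, fprime, x1, steps):
    """Iterate g(x) = x - f(x)/f'(x), the fixed-point form of Newton's method."""
    x = x1
    for _ in range(steps):
        x = x - f(x) / fprime(x)
    return x

# f(x) = x^2 - 5 has a zero at sqrt(5), which lies in [2, 3].
root = newton(lambda x: x * x - 5, lambda x: 2 * x, x1=2.0, steps=6)
assert abs(root * root - 5) < 1e-12   # converged to sqrt(5)
```

In practice the convergence is much faster than the geometric rate guaranteed by the contraction mapping principle; once the iterates are close to the root, Newton's method roughly doubles the number of correct digits per step.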
Problem 4.5. Let f(x) = x^2 − 5. Show that a zero of this equation lies somewhere in the interval [2, 3]. Calculate g(x) for this function and prove that g : [2, 3] → [2, 3] and |g′(x)| ≤ 1/2 for x ∈ [2, 3]. Prove that if we perform Newton's method with x1 = 2, then |xn − √5| ≤ (1/2)^n.
Problem 4.6. Let f(x) = x − cos(x), so that f(x0) = 0 if and only if x0 = cos(x0), the problem that we studied earlier. Compute the corresponding g for this f and find a concrete interval I = [a, b] with 0 ≤ a < b ≤ 1 such that g : I → I is a contraction mapping. How does this r compare to the earlier r from the last section?

4.2 Application: Solution of ODEs

In this section we show how the contraction mapping principle can be used to deduce that solutions exist and are unique for some very complicated ordinary differential equations.
Starting with a continuous function h(x, y) and an initial value y0, we wish to solve

dy/dx = h(x, y),  y(a) = y0.

That is, we seek a function f(x) so that f′(x) = h(x, f(x)) for a < x < b and f(a) = y0. Such a problem is often called an initial value problem (IVP).
For example, when h(x, y) = x^2 y^3, then we are trying to solve dy/dx = x^2 y^3, which can be done by separation of variables. But our function could be h(x, y) = sin(xy), in which case the differential equation becomes dy/dx = sin(xy), which cannot be solved by elementary means.
For this application, we will use a number of things that we have not yet
developed fully. But again, we stress that we are seeking motivation.

First note that by the fundamental theorem of calculus, a continuous function f : [a, b] → R satisfies the IVP if and only if, for a ≤ x ≤ b,

f(x) = ∫_a^x f′(t) dt + y0 = ∫_a^x h(t, f(t)) dt + y0.

This is often called the integral form of the IVP.


Now let C([a, b]) denote the set of all real-valued continuous functions on the interval [a, b]. Note that if we are given any f ∈ C([a, b]) and we set

g(x) = ∫_a^x h(t, f(t)) dt + y0,

then g is also a continuous function on [a, b].


Thus, we can define a map Φ : C([a, b]) → C([a, b]) by letting Φ(f) = g, where f and g are as above. We see that solving our IVP is the same as finding a fixed point of the map Φ! Also, we are now in a situation where, by a point in our space C([a, b]), we mean a function.
This is starting to look like an application of the contraction mapping principle. For this we would first need a metric d on the set C([a, b]) (again, points in this metric space are functions!) so that (C([a, b]), d) is a complete metric space, and then we would need Φ to be a contraction mapping.
It turns out that there is such a metric on C([a, b]) and that for many functions h the corresponding map Φ is a contraction mapping.
First for the metric. Given any two functions f, g ∈ C([a, b]), we set

d(f, g) = sup{|f(x) − g(x)| : a ≤ x ≤ b}.

This is the example that was introduced in our first section on metric spaces. Note that since f, g are continuous functions on the compact metric space [a, b], the continuous function f − g is bounded. Hence, the supremum is finite. Also, it is clear that d(f, g) = 0 if and only if f(x) = g(x) for all x, i.e., if and only if f = g. Note that d(f, g) = d(g, f). Finally, if f, g, h ∈ C([a, b]), then

d(f, g) = sup{|f(x) − h(x) + h(x) − g(x)| : a ≤ x ≤ b}
≤ sup{|f(x) − h(x)| + |h(x) − g(x)| : a ≤ x ≤ b}
≤ sup{|f(x) − h(x)| : a ≤ x ≤ b} + sup{|h(x) − g(x)| : a ≤ x ≤ b} = d(f, h) + d(h, g).
Thus, we see that d is indeed a metric on C([a, b]). To apply the contraction mapping principle, we need this metric space to be complete. This
fact is shown by the following theorem.

Theorem 4.7. The metric space (C([a, b]), d) is complete.

Proof. We need to prove that if {fn} ⊆ C([a, b]) is a Cauchy sequence in the d metric, then there is a continuous function f to which it converges.
First, note that if we fix x, a ≤ x ≤ b, then we have that
|fn(x) − fm(x)| ≤ sup{|fn(t) − fm(t)| : a ≤ t ≤ b} = d(fn, fm).

Thus, for each fixed x, the sequence of real numbers {fn(x)} is Cauchy and hence converges to a real number.
Thus, we may define a function by setting f(x) = limn fn(x). Doing this for each a ≤ x ≤ b defines f : [a, b] → R.
We now wish to prove that f is a continuous function, so that f defines
a point in the metric space C([a, b]) and then prove that the sequence {fn }
converges to this point.
First, to see that f is continuous at some point x0: given ε > 0, pick N so that n, m > N implies that d(fn, fm) < ε/3. Now fix an n > N; then we have that

|f(x) − fn(x)| = limm |fm(x) − fn(x)| ≤ ε/3.

Using that fn is continuous, pick δ > 0 so that |x0 − x| < δ implies that |fn(x0) − fn(x)| < ε/3. Then for |x0 − x| < δ we have that

|f(x0) − f(x)| = |f(x0) − fn(x0) + fn(x0) − fn(x) + fn(x) − f(x)|
≤ |f(x0) − fn(x0)| + |fn(x0) − fn(x)| + |fn(x) − f(x)| < ε.
Thus, we have that f is continuous at x0 and since x0 was arbitrary, f is
continuous.
Finally, we must show that limn d(f, fn) = 0. But to see this, given ε > 0, take N as before and recall that we have already shown that |f(x) − fn(x)| ≤ ε/3 for any n > N. Hence, for n > N, we have that d(f, fn) = sup{|f(x) − fn(x)| : a ≤ x ≤ b} ≤ ε/3 < ε.
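As a numerical aside (not in the notes), the metric d can only be approximated on a computer, since the supremum runs over infinitely many points; sampling a grid bounds it from below, and for the simple monotone examples below the grid maximum happens to be exact. A sketch:

```python
def sup_metric(f, g, a, b, samples=1001):
    """Approximate d(f, g) = sup{|f(x) - g(x)| : a <= x <= b} on a uniform grid.
    (A grid maximum only bounds the sup from below in general.)"""
    pts = [a + (b - a) * k / (samples - 1) for k in range(samples)]
    return max(abs(f(x) - g(x)) for x in pts)

zero = lambda x: 0.0
# fn(x) = x/n converges to the zero function uniformly: d(fn, 0) = 1/n -> 0.
for n in (1, 2, 10, 100):
    fn = lambda x, n=n: x / n
    assert abs(sup_metric(fn, zero, 0.0, 1.0) - 1.0 / n) < 1e-12
```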
We now have all the tools at our disposal needed to solve some IVPs.
Theorem 4.8. Let h(x, y) be a continuous function on [a, b] × R and assume that |h(x, y1) − h(x, y2)| ≤ K|y1 − y2|, with r = K(b − a) < 1. Then for any y0, the initial value problem f′(x) = h(x, f(x)), f(a) = y0, has a unique solution on the interval [a, b].

Proof. Consider the mapping Φ : C([a, b]) → C([a, b]) defined by

Φ(f)(x) = ∫_a^x h(t, f(t)) dt + y0.

Given f, g ∈ C([a, b]), we have that

|Φ(f)(x) − Φ(g)(x)| = |∫_a^x (h(t, f(t)) − h(t, g(t))) dt|
≤ ∫_a^x |h(t, f(t)) − h(t, g(t))| dt ≤ ∫_a^x K|f(t) − g(t)| dt
≤ ∫_a^x K d(f, g) dt = K(x − a) d(f, g) ≤ r d(f, g).

Thus, Φ is a contraction mapping in the metric d. Since (C([a, b]), d) is a complete metric space, by the contraction mapping principle there exists a unique f ∈ C([a, b]) such that Φ(f) = f. Earlier, we saw that an f was a solution to the IVP if and only if Φ(f) = f. Thus, the solution not only exists but is unique.
The contraction mapping principle not only gives us the existence and
uniqueness of solutions to the IVP, but it also gives us a method for approximating solutions and very nice bounds on the error of the approximation.
The above theorem is far from the most useful, because the conditions on h(x, y) are too restrictive for most applications. For example, h(x, y) = x^3 y^2 doesn't satisfy our hypothesis. For this reason you will seldom see it in a textbook. But it does have the advantage of being the simplest to prove, and we believe that it illustrates the key guiding principles of the proofs of more complicated results.
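The proof is constructive: starting from any f1 and iterating fn+1 = Φ(fn) (Picard iteration) converges to the solution. The sketch below is our own illustration, not part of the notes; functions are stored by their values on a grid and the integral in Φ is replaced by the trapezoid rule, so the final accuracy is limited by that discretization rather than by the contraction argument.

```python
import math

def picard_step(h, xs, fvals, y0):
    """One application of (Phi f)(x) = y0 + integral from a to x of h(t, f(t)) dt,
    with the integral approximated by the trapezoid rule on the grid xs."""
    g = [y0]
    for i in range(1, len(xs)):
        dt = xs[i] - xs[i - 1]
        g.append(g[-1] + 0.5 * dt * (h(xs[i - 1], fvals[i - 1]) + h(xs[i], fvals[i])))
    return g

# Test problem: dy/dx = y/2, y(0) = 1 on [0, 1], so K = 1/2 and
# r = K(b - a) = 1/2 < 1; the exact solution is e^(x/2).
n = 200
xs = [i / n for i in range(n + 1)]
f = [1.0] * (n + 1)                      # f1 = the constant function 1
for _ in range(25):                      # f_{k+1} = Phi(f_k)
    f = picard_step(lambda x, y: y / 2, xs, f, 1.0)

err = max(abs(f[i] - math.exp(xs[i] / 2)) for i in range(n + 1))
assert err < 1e-4   # limited by the trapezoid rule, not by the iteration
```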
Problem 4.9. Consider the IVP on the interval [0, π/2] given by dy/dx = sin(xy)/5, y(0) = 1. Prove that the corresponding map Φ is a contraction mapping with r = π^2/10 (if you get a better bound, that is good!). Let f denote the unique solution to this IVP and let fn+1 = Φ(fn) denote the sequence that one obtains starting with f1 equal to the constant function 1. Give an estimate on d(f, fn).

Chapter 5

Riemann and Riemann-Stieltjes Integration
In this chapter we develop the theory of the Riemann integral, which is the type of integration used in your calculus courses, and we also introduce Riemann-Stieltjes integration, which is widely used in probability, statistics and financial mathematics.
Given a closed interval [a, b], by a partition of [a, b] we mean a set P = {x0, x1, ..., xn−1, xn} with a = x0 < x1 < ... < xn = b. The norm or width of the partition is

‖P‖ = max{xi − xi−1 : 1 ≤ i ≤ n}.

Given two partitions P1 and P2, we say that P2 is a refinement of P1, or P2 refines P1, provided that as sets P1 ⊆ P2. Note that if P2 refines P1, then ‖P2‖ ≤ ‖P1‖.
Given a bounded function f : [a, b] → R and a partition P = {x0, ..., xn} of [a, b], for i = 1, ..., n we set

Mi = sup{f(x) : xi−1 ≤ x ≤ xi}  and  mi = inf{f(x) : xi−1 ≤ x ≤ xi}.

The upper Riemann sum of f given the partition P is the real number

U(f, P) = Σ_{i=1}^{n} Mi(xi − xi−1),

and the lower Riemann sum of f is

L(f, P) = Σ_{i=1}^{n} mi(xi − xi−1).

Note that if we hadn't assumed that f is a bounded function, then some of the numbers Mi or mi would have been infinite. This is the one reason that we can only define Riemann integrals for bounded functions.
By a general Riemann sum of f given P, we mean a sum of the form

Σ_{i=1}^{n} f(xi′)(xi − xi−1),

where xi′ is any choice of points satisfying xi−1 ≤ xi′ ≤ xi, for i = 1, ..., n. Since mi ≤ f(xi′) ≤ Mi, the upper and lower Riemann sums give an upper and lower bound for general Riemann sums, i.e.,

L(f, P) ≤ Σ_{i=1}^{n} f(xi′)(xi − xi−1) ≤ U(f, P).

In fact, since we can choose the points xi′ so that f(xi′) is arbitrarily close to Mi, we see that U(f, P) is actually the supremum of all general Riemann sums of f given P. Similarly, by choosing the points xi′ so that f(xi′) is arbitrarily close to mi, we see that L(f, P) is the infimum of all general Riemann sums.
Thus, if we want all general Riemann sums of a function to be close
to a value that we wish to think of as the integral of f, then it will be
enough to study the extreme cases of the upper and lower sums.
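As a numerical aside (ours, not from the notes), the upper and lower sums are easy to compute approximately: below, Mi and mi are estimated by sampling each subinterval, which happens to be exact for the monotone function used, since the extremes occur at the endpoints.

```python
def riemann_sums(f, partition, samples=200):
    """Approximate U(f, P) and L(f, P) by estimating M_i and m_i
    from sample points in each subinterval."""
    U = L = 0.0
    for x0, x1 in zip(partition, partition[1:]):
        vals = [f(x0 + (x1 - x0) * k / samples) for k in range(samples + 1)]
        U += max(vals) * (x1 - x0)
        L += min(vals) * (x1 - x0)
    return U, L

# f(x) = x^2 on [0, 1] with the uniform partition into n pieces:
# the sums trap the integral 1/3, and U - L = (f(1) - f(0))/n = 1/n.
n = 10
P = [i / n for i in range(n + 1)]
U, L = riemann_sums(lambda x: x * x, P)
assert L <= 1 / 3 <= U
assert abs((U - L) - 1 / n) < 1e-9
```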
Proposition 5.1. Let f : [a, b] → R be a bounded function and let P1 and P2 be partitions of [a, b] with P2 a refinement of P1. Then

L(f, P1) ≤ L(f, P2) ≤ U(f, P2) ≤ U(f, P1).
Proof. We have already observed that for any partition, L(f, P) ≤ U(f, P). So we only need to consider the other two inequalities.
It is enough to prove these inequalities in the case that P2 is obtained from P1 by adding just one extra point, because then P2 can be obtained from P1 by successively adding one point at a time, and in each case the inequalities move in the right direction.
So let us set P1 = {x0, ..., xn}, so that for some j, P2 = {x0, ..., xj−1, x̄, xj, ..., xn}, where xj−1 < x̄ < xj.
We start with the case of the upper sums. Note that most of the terms in the sum for U(f, P1) and U(f, P2) will be equal, except that the one term Mj(xj − xj−1) in the sum for U(f, P1) will be replaced by two terms in the sum for U(f, P2). These two new terms are Mj′(x̄ − xj−1) + Mj″(xj − x̄), where

Mj′ = sup{f(x) : xj−1 ≤ x ≤ x̄}  and  Mj″ = sup{f(x) : x̄ ≤ x ≤ xj}.

Since [xj−1, x̄] ⊆ [xj−1, xj] and [x̄, xj] ⊆ [xj−1, xj], we have that Mj′ ≤ Mj and Mj″ ≤ Mj. Hence, the two terms appearing in U(f, P2) satisfy

Mj′(x̄ − xj−1) + Mj″(xj − x̄) ≤ Mj(x̄ − xj−1) + Mj(xj − x̄) = Mj(xj − xj−1),

which is the one term appearing in U(f, P1). Thus, U(f, P2) ≤ U(f, P1).
The proof for the lower sums is similar, except that since we are dealing with infima, we will have that

mj′ = inf{f(x) : xj−1 ≤ x ≤ x̄} ≥ mj  and  mj″ = inf{f(x) : x̄ ≤ x ≤ xj} ≥ mj.

Definition 5.2. Let f : [a, b] → R be a bounded function. Then the upper Riemann integral of f is the number

∫̄_a^b f(x)dx = inf{U(f, P) : P a partition of [a, b]}.

The lower Riemann integral of f is the number

∫̲_a^b f(x)dx = sup{L(f, P) : P a partition of [a, b]}.

We say that f is Riemann integrable when these two numbers are equal, and in this case we define the Riemann integral of f to be

∫_a^b f(x)dx = ∫̄_a^b f(x)dx = ∫̲_a^b f(x)dx.

To help cement these definitions, let us consider the function f : [a, b] → R defined by

f(x) = 1 if x is rational, and f(x) = 0 if x is irrational;

then for any partition P we will have that Mi = 1 and mi = 0 for every i. Hence, U(f, P) = b − a and L(f, P) = 0. Thus,

∫̄_a^b f(x)dx = b − a  and  ∫̲_a^b f(x)dx = 0.

In particular, f is not Riemann integrable.


The following helps to explain the terms upper and lower.
Proposition 5.3. Let f : [a, b] → R be a bounded function and let P1 and P2 be any two partitions of [a, b]. Then L(f, P1) ≤ U(f, P2) and ∫̲_a^b f(x)dx ≤ ∫̄_a^b f(x)dx.

Proof. Let P3 be the partition of [a, b] that as a set is P3 = P1 ∪ P2. Then P3 is a refinement of P1 and of P2. So by the above result,

L(f, P1) ≤ L(f, P3) ≤ U(f, P3) ≤ U(f, P2).

Now we have that

∫̲_a^b f(x)dx = sup{L(f, P1) : P1 is a partition of [a, b]} ≤ U(f, P2),

and so

∫̲_a^b f(x)dx ≤ inf{U(f, P2) : P2 is a partition of [a, b]} = ∫̄_a^b f(x)dx.

The following gives an important means of determining if a function is Riemann integrable.

Theorem 5.4. Let f : [a, b] → R be a bounded function. Then f is Riemann integrable if and only if for every ε > 0 there exists a partition P such that U(f, P) − L(f, P) < ε.

Proof. Assuming that the latter condition is met, for any ε > 0 and P as above, we have

L(f, P) ≤ ∫̲_a^b f(x)dx ≤ ∫̄_a^b f(x)dx ≤ U(f, P),
which implies that

0 ≤ ∫̄_a^b f(x)dx − ∫̲_a^b f(x)dx ≤ U(f, P) − L(f, P) < ε.

Since ε > 0 was arbitrary, the lower and upper Riemann integrals must be equal.
Conversely, assume that f is Riemann integrable and that ε > 0 is given. Since the upper integral is an infimum, we may choose a partition P1 so that

∫_a^b f(x)dx ≤ U(f, P1) < ∫_a^b f(x)dx + ε/2.

Similarly, we may choose a partition P2 so that

∫_a^b f(x)dx − ε/2 < L(f, P2) ≤ ∫_a^b f(x)dx.

Let P3 = P1 ∪ P2; then

U(f, P3) − L(f, P3) ≤ U(f, P1) − L(f, P2) < (∫_a^b f(x)dx + ε/2) − (∫_a^b f(x)dx − ε/2) = ε,

since the upper and lower integrals are both equal to ∫_a^b f(x)dx. Thus, we have found a partition satisfying our criterion.
Definition 5.5. We say that a bounded function f : [a, b] → R satisfies the Riemann integrability criterion provided that for every ε > 0, there exists a partition P such that U(f, P) − L(f, P) < ε.
Thus, the above theorem says that a bounded function is Riemann integrable if and only if it satisfies the Riemann integrability criterion.
Theorem 5.6. Let f : [a, b] → R be a continuous function. Then f is Riemann integrable.

Proof. Since f is continuous and [a, b] is compact, we have that f is bounded and f is uniformly continuous. We will prove that f meets the Riemann integrability criterion.
Given any ε > 0, since f is uniformly continuous, there is a δ > 0 so that whenever |x − y| < δ, then |f(x) − f(y)| < ε/(b − a). Now let P = {a = x0, ..., xn = b} be any partition of [a, b] with ‖P‖ < δ. Then for each i, we have that xi−1 ≤ z, y ≤ xi implies that |f(z) − f(y)| < ε/(b − a). Since f is continuous, we also have that there are points zi and yi in [xi−1, xi] with Mi = f(zi) and mi = f(yi). Hence, 0 ≤ Mi − mi < ε/(b − a). This implies that

0 ≤ U(f, P) − L(f, P) = Σ_{i=1}^{n} (Mi − mi)(xi − xi−1) < Σ_{i=1}^{n} [ε/(b − a)](xi − xi−1) = ε.

Thus, the Riemann integrability criterion is met by f.


For the following problem, we consider the interval [0, 1] and let Pn = {0, 1/n, ..., (n−1)/n, 1} denote the partition of [0, 1] into n subintervals each of length 1/n. Thus, ‖Pn‖ = 1/n. We will also use the summation formula

Σ_{j=1}^{n} j = n(n + 1)/2.

Problem 5.7. Let f(x) = x. Compute U(f, Pn) and L(f, Pn). Use these formulas and the Riemann integrability criterion to prove that f is Riemann integrable on [0, 1] and to prove that ∫_0^1 x dx = 1/2.
Problem 5.8. Let a ≤ c < d ≤ b and let f : [a, b] → R be the function defined by

f(x) = 0 for a ≤ x ≤ c,  f(x) = 1 for c < x < d,  f(x) = 0 for d ≤ x ≤ b.

Prove that f is Riemann integrable on [a, b] and that ∫_a^b f(x)dx = d − c.

5.1 The Riemann-Stieltjes Integral

The Riemann-Stieltjes integral is a slight generalization of the Riemann integral. The new ingredient in Riemann-Stieltjes integration is a function

α : [a, b] → R

that is increasing, i.e., x ≤ y implies that α(x) ≤ α(y). It is best to think of α as a function that measures a new "length" of subintervals, by setting the length of a subinterval [xi−1, xi] equal to α(xi) − α(xi−1). One case where this concept arises is if we imagine that we have a piece of wire of varying density stretched from a to b, and α(xi) − α(xi−1) represents the weight of the section of wire from xi−1 to xi.
Given a bounded function f : [a, b] → R, the Riemann-Stieltjes integral is denoted

∫_a^b f dα,

and it is designed to also define a "signed area" under the graph of f, but now, if we want the area of a rectangle to be the length of the base times the height, then a rectangle from xi−1 to xi of height h should have area h(α(xi) − α(xi−1)).
Thus, given a bounded function f, an increasing function α, a partition P = {a = x0, ..., xn = b}, and the numbers Mi = sup{f(x) : xi−1 ≤ x ≤ xi} and mi = inf{f(x) : xi−1 ≤ x ≤ xi}, we are led to define the upper Riemann-Stieltjes sum as

U(f, P, α) = Σ_{i=1}^{n} Mi(α(xi) − α(xi−1))

and the lower Riemann-Stieltjes sum as

L(f, P, α) = Σ_{i=1}^{n} mi(α(xi) − α(xi−1)).

The upper Riemann-Stieltjes integral of f with respect to α is then defined to be

∫̄_a^b f dα = inf{U(f, P, α) : P is a partition of [a, b]}.

Similarly, the lower Riemann-Stieltjes integral of f with respect to α is defined to be

∫̲_a^b f dα = sup{L(f, P, α) : P is a partition of [a, b]}.

When

∫̄_a^b f dα = ∫̲_a^b f dα,

then we say that f is Riemann-Stieltjes integrable with respect to α, and we let ∫_a^b f dα denote this common value.


We repeat the key facts about Riemann-Stieltjes integration below. Since
the proofs are almost identical to the corresponding proofs in the case of
Riemann integration, we omit the details.
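Numerically, the only change from the Riemann sums computed earlier is that each subinterval is weighted by α(xi) − α(xi−1) instead of xi − xi−1. The sketch below is our own illustration, not from the notes; Mi and mi are again estimated by sampling, which is exact for the monotone f used.

```python
def stieltjes_sums(f, alpha, partition, samples=200):
    """Approximate U(f, P, alpha) and L(f, P, alpha) by estimating
    M_i and m_i from sample points in each subinterval."""
    U = L = 0.0
    for x0, x1 in zip(partition, partition[1:]):
        vals = [f(x0 + (x1 - x0) * k / samples) for k in range(samples + 1)]
        dalpha = alpha(x1) - alpha(x0)
        U += max(vals) * dalpha
        L += min(vals) * dalpha
    return U, L

# With alpha(x) = x^2 on [0, 1], the "length" of [x0, x1] is x1^2 - x0^2, so
# the right end of [0, 1] is weighted more heavily; for f(x) = x the sums
# trap the value 2/3 (the Riemann-Stieltjes integral of x with respect to x^2).
n = 100
P = [i / n for i in range(n + 1)]
U, L = stieltjes_sums(lambda x: x, lambda x: x * x, P)
assert L <= 2 / 3 <= U
assert abs((U - L) - 1 / n) < 1e-9
```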
Proposition 5.9. Let f : [a, b] → R be a bounded function, let α : [a, b] → R be an increasing function and let P1 and P2 be partitions of [a, b] with P2 a refinement of P1. Then

L(f, P1, α) ≤ L(f, P2, α) ≤ U(f, P2, α) ≤ U(f, P1, α).

Proposition 5.10. Let f : [a, b] → R be a bounded function, let α : [a, b] → R be increasing and let P1 and P2 be any two partitions of [a, b]. Then L(f, P1, α) ≤ U(f, P2, α) and ∫̲_a^b f dα ≤ ∫̄_a^b f dα.

Theorem 5.11. Let f : [a, b] → R be a bounded function and let α : [a, b] → R be increasing. Then f is Riemann-Stieltjes integrable with respect to α if and only if for every ε > 0 there exists a partition P such that U(f, P, α) − L(f, P, α) < ε.

Definition 5.12. Given an increasing function α : [a, b] → R, we say that a bounded function f : [a, b] → R satisfies the Riemann-Stieltjes integrability criterion with respect to α provided that for every ε > 0, there exists a partition P such that U(f, P, α) − L(f, P, α) < ε.

Thus, the above theorem says that a bounded function is Riemann-Stieltjes integrable with respect to α if and only if it satisfies the Riemann-Stieltjes integrability criterion with respect to α.

Theorem 5.13. Let f : [a, b] → R be a continuous function and let α : [a, b] → R be an increasing function. Then f is Riemann-Stieltjes integrable with respect to α.
One of the main motivations for Riemann-Stieltjes integration comes
from the concept of a cumulative distribution function of a random variable.
To understand the Riemann-Stieltjes integral, one need not understand any
probability theory, but we introduce these ideas from probability here, via an
example, in order to motivate the desire for the Riemann-Stieltjes integral.

For a probability space, suppose that we consider the flip of a biased coin, so that the probability of heads (H) is p and the probability of tails (T) is 1 − p, with 0 < p < 1. When p = 1/2, we think of the coin as a fair coin. A random variable would then be a function that assigns a real number to each of these possible outcomes. For example, we could define a random variable X by setting X(H) = 1, X(T) = 3. If we let Prob(X = x) denote the probability that X is equal to the real number x, then we would have Prob(X = 1) = p and Prob(X = 3) = 1 − p. The cumulative distribution function of X is the function Prob(X ≤ x). So in our case,

Prob(X ≤ x) = 0 for x < 1,  p for 1 ≤ x < 3,  1 for 3 ≤ x.

So our cumulative distribution function has two jump discontinuities, the first of size p occurring at 1 and the second of size 1 − p occurring at 3.
More generally, if one has a probability space and a random variable X, and we let α(x) = Prob(X ≤ x), then α will be an increasing function, i.e., one for which we can consider Riemann-Stieltjes integrals. Much work in probability and its applications, e.g., financial math, is concerned with computing these Riemann-Stieltjes integrals for various functions f in the case that α is the cumulative distribution function of a random variable.
Returning to our example, if for any c we let

Jc(x) = 0 for x < c, and Jc(x) = 1 for c ≤ x,

then we can also write Prob(X ≤ x) = pJ1(x) + (1 − p)J3(x).
These simple jump functions are useful for expressing the cumulative distributions of many random variables. For example, if we consider one roll of a fair die, so that each side has probability 1/6 of facing up, and we let X be the random variable that simply gives the number that is facing up, then

Prob(X ≤ x) = (1/6)[J1(x) + J2(x) + J3(x) + J4(x) + J5(x) + J6(x)].

For this reason we want to take a careful look at what Riemann-Stieltjes integration is when α(x) = hJc(x), for some real number c and h > 0.
Definition 5.14. Let f : [a, b] → R and let a < c ≤ b. If there is a number L so that for every ε > 0 there exists δ > 0 so that when c − δ < x < c we have |f(x) − L| < ε, then we say that L is the limit from the left of f at c. We write lim_{x→c−} f(x) = L and say that f is continuous from the left at c provided that lim_{x→c−} f(x) = f(c). When a ≤ c < b, the limit from the right of f at c is defined similarly and is denoted lim_{x→c+} f(x). We say that f is continuous from the right at c provided that lim_{x→c+} f(x) = f(c).
Theorem 5.15. Let a < c ≤ b, let h > 0, let α(x) = hJc(x), and let f : [a, b] → R be a bounded function. Then f is Riemann-Stieltjes integrable with respect to α if and only if f is continuous from the left at c. In this case,

∫_a^b f dα = hf(c).

Proof. First we assume that f is continuous from the left at c and prove that f satisfies the Riemann-Stieltjes integrability criterion. So let ε > 0 be given. Since lim_{x→c−} f(x) = f(c), there is δ > 0 so that c − δ < x < c implies that |f(x) − f(c)| < ε/(3h).
Now consider the partition P = {a = x0, x1 = c − δ/2, x2 = c, x3 = b}. Then we have that

U(f, P, α) − L(f, P, α) = Σ_{i=1}^{3} (Mi − mi)(α(xi) − α(xi−1)) = (M2 − m2)h,

since α(xi) − α(xi−1) = 0 when i ≠ 2. Now for x1 = c − δ/2 ≤ x ≤ c = x2, we have that |f(x) − f(c)| < ε/(3h), so that f(c) − ε/(3h) ≤ f(x) ≤ f(c) + ε/(3h). Hence, f(c) − ε/(3h) ≤ m2 ≤ M2 ≤ f(c) + ε/(3h), which implies that 0 ≤ M2 − m2 ≤ 2ε/(3h). Thus, U(f, P, α) − L(f, P, α) = (M2 − m2)h ≤ 2ε/3 < ε, and we have shown that the Riemann-Stieltjes integrability criterion is met.
Also, note that these inequalities show that

(f(c) − ε/(3h))h ≤ m2 h = L(f, P, α) ≤ ∫_a^b f dα ≤ U(f, P, α) ≤ (f(c) + ε/(3h))h,

which shows that |f(c)h − ∫_a^b f dα| ≤ ε/3 < ε. Since ε > 0 was arbitrary, it follows that ∫_a^b f dα = f(c)h.
Now we must prove that if f satisfies the Riemann-Stieltjes integrability criterion, then f is continuous from the left at c, i.e., lim_{x→c−} f(x) = f(c).
To this end, fix ε > 0 and pick a partition P so that U(f, P, α) − L(f, P, α) < εh. Let P2 be the partition that we obtain from P by including the point c. Since P2 is a refinement of P, we will have that

L(f, P, α) ≤ L(f, P2, α) ≤ U(f, P2, α) ≤ U(f, P, α),

and hence U(f, P2, α) − L(f, P2, α) ≤ U(f, P, α) − L(f, P, α) < εh. Now our partition has the form P2 = {a = x0 < ... < xj−1 < xj = c < ... < xn = b}, for some j. Since α(xi) − α(xi−1) = 0 for all i ≠ j, we have that (Mj − mj)h = U(f, P2, α) − L(f, P2, α) < εh, and so Mj − mj < ε. Thus, for xj−1 ≤ x ≤ c, we have that mj ≤ f(c) ≤ Mj and mj ≤ f(x) ≤ Mj, so that |f(x) − f(c)| ≤ Mj − mj < ε.
Thus, if we let δ = c − xj−1, then for xj−1 = c − δ < x < c, we have that |f(x) − f(c)| < ε. Since ε > 0 was arbitrary, lim_{x→c−} f(x) = f(c).
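Theorem 5.15 is easy to watch numerically: with α = hJc, the only subinterval with α(xi) − α(xi−1) ≠ 0 is the one whose right endpoint is c (when c is a partition point), so as the partition refines, both sums squeeze down to hf(c). The sketch below is our own illustration, not from the notes.

```python
def stieltjes_sums(f, alpha, partition, samples=200):
    """Approximate U(f, P, alpha) and L(f, P, alpha) by estimating
    M_i and m_i from sample points in each subinterval."""
    U = L = 0.0
    for x0, x1 in zip(partition, partition[1:]):
        vals = [f(x0 + (x1 - x0) * k / samples) for k in range(samples + 1)]
        dalpha = alpha(x1) - alpha(x0)
        U += max(vals) * dalpha
        L += min(vals) * dalpha
    return U, L

c, h = 0.5, 2.0
alpha = lambda x: h if x >= c else 0.0   # alpha = h * J_c
f = lambda x: x * x                       # continuous at c, so integrable

for n in (10, 100, 1000):                 # c = 0.5 is a point of each partition
    P = [i / n for i in range(n + 1)]
    U, L = stieltjes_sums(f, alpha, P)
    assert L - 1e-12 <= h * f(c) <= U + 1e-12   # the sums trap h*f(c) = 0.5
assert U - L < 5e-3                       # and squeeze down to it as n grows
```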
In probability, two events are called independent if the probability of both occurring is the product of the probabilities of each occurring. For example, suppose that we flip a biased coin with Prob(H) = p, Prob(T) = 1 − p twice, so that the possible outcomes are {HH, HT, TH, TT}. If we assume that the flips are independent, then the outcomes would have probabilities

Prob(HH) = Prob(H)Prob(H) = p^2,
Prob(HT) = Prob(TH) = Prob(H)Prob(T) = p(1 − p),
Prob(TT) = Prob(T)Prob(T) = (1 − p)^2.
Problem 5.16. Suppose that we flip our biased coin twice, assume that the flips are independent, and let our random variable be the sum of the values of our earlier random variable on the two flips, so that X(HH) = X(H) + X(H) = 2, X(HT) = X(TH) = 1 + 3, X(TT) = 3 + 3. Find the cumulative distribution function, α(t) = Prob(X ≤ t).
Problem 5.17. Suppose that we flip a fair coin and roll a fair die and assume that these events are independent, so that each possible outcome has probability (1/2)(1/6) = 1/12. Let X be the random variable that adds +1 to the number on the die when an H is flipped and +3 to the number on the die when a T is flipped. Find the cumulative distribution function of X.
Problem 5.18. Let α : [a, b] → R be an increasing function and let f : [a, b] → R be a bounded function. If m = inf{f(x) : a ≤ x ≤ b} and M = sup{f(x) : a ≤ x ≤ b}, then

m(α(b) − α(a)) ≤ ∫̲_a^b f dα ≤ ∫̄_a^b f dα ≤ M(α(b) − α(a)).
Problem 5.19. Write out the proof of Theorem 5.13. You may use the
earlier stated results in your proof without proving them.

5.2 Properties of the Riemann-Stieltjes Integral

Before trying to compute many integrals, it will be helpful to have proven some basic properties of these integrals.
Theorem 5.20. Let α : [a, b] → R be an increasing function, let f, g : [a, b] → R be bounded functions that are Riemann-Stieltjes integrable with respect to α, and let c ∈ R be a constant. Then:

1. cf is Riemann-Stieltjes integrable with respect to α and

∫_a^b cf dα = c ∫_a^b f dα,

2. f + g is Riemann-Stieltjes integrable with respect to α and

∫_a^b (f + g) dα = ∫_a^b f dα + ∫_a^b g dα,

3. fg is Riemann-Stieltjes integrable with respect to α.


Proof. To prove 1), first consider the case that c ≥ 0. Then we have that for any partition, L(cf, P, α) = cL(f, P, α) and U(cf, P, α) = cU(f, P, α). Thus,

∫̲_a^b cf dα = sup{L(cf, P, α)} = sup{cL(f, P, α)} = c ∫̲_a^b f dα = c ∫̄_a^b f dα = c inf{U(f, P, α)} = inf{cU(f, P, α)} = ∫̄_a^b cf dα,

which shows that the upper and lower integrals are equal for cf, so that cf is Riemann-Stieltjes integrable with respect to α, and shows that they are both equal to c ∫_a^b f dα. Thus, 1) is proven in the case that c ≥ 0.
When c < 0, recall that for any bounded set S of real numbers, sup{cs : s ∈ S} = c inf{s : s ∈ S} and inf{cs : s ∈ S} = c sup{s : s ∈ S}. Thus, for any interval, sup{cf(x) : xi−1 ≤ x ≤ xi} = c inf{f(x) : xi−1 ≤ x ≤ xi} and inf{cf(x) : xi−1 ≤ x ≤ xi} = c sup{f(x) : xi−1 ≤ x ≤ xi}. From this it follows that L(cf, P, α) = cU(f, P, α) and U(cf, P, α) = cL(f, P, α). Thus,

∫̲_a^b cf dα = sup{L(cf, P, α)} = sup{cU(f, P, α)} = c inf{U(f, P, α)} = c ∫̄_a^b f dα = c ∫̲_a^b f dα = c sup{L(f, P, α)} = inf{cL(f, P, α)} = inf{U(cf, P, α)} = ∫̄_a^b cf dα,

and again we have that cf is Riemann-Stieltjes integrable with respect to α and the equality of the integrals.
Now we prove 2). Note that for any interval we have that

sup{f(x) + g(x) : xi−1 ≤ x ≤ xi} ≤ sup{f(x) : xi−1 ≤ x ≤ xi} + sup{g(x) : xi−1 ≤ x ≤ xi},

from which it follows that for any partition, U(f + g, P, α) ≤ U(f, P, α) + U(g, P, α). Similarly,

inf{f(x) + g(x) : xi−1 ≤ x ≤ xi} ≥ inf{f(x) : xi−1 ≤ x ≤ xi} + inf{g(x) : xi−1 ≤ x ≤ xi},

and it follows that L(f + g, P, α) ≥ L(f, P, α) + L(g, P, α).
Given any ε > 0, choose partitions P1, P2 so that ∫_a^b f dα ≤ U(f, P1, α) < ∫_a^b f dα + ε/2 and ∫_a^b g dα ≤ U(g, P2, α) < ∫_a^b g dα + ε/2. If we let P3 = P1 ∪ P2 denote their common refinement, then

∫̄_a^b (f + g) dα ≤ U(f + g, P3, α) ≤ U(f, P3, α) + U(g, P3, α)
≤ U(f, P1, α) + U(g, P2, α) < ∫_a^b f dα + ∫_a^b g dα + ε.

Since ε > 0 was arbitrary, we have that

∫̄_a^b (f + g) dα ≤ ∫_a^b f dα + ∫_a^b g dα.

A similar calculation shows that

∫̲_a^b (f + g) dα ≥ ∫_a^b f dα + ∫_a^b g dα.

Since the lower integral is always less than or equal to the upper integral, and both bounds are equal to ∫_a^b f dα + ∫_a^b g dα, these inequalities must be equalities, and 2) follows.
Before proving 3), we first make a number of observations to reduce the work. First, suppose that we only prove that if a function h is Riemann-Stieltjes integrable with respect to α, then h^2 is Riemann-Stieltjes integrable. Then, since f and g are both integrable, by part 2), f + g is integrable, and hence we would have that (f + g)^2 = f^2 + 2fg + g^2, f^2 and g^2 are all integrable. Applying 1) and 2) again, we would have that (f + g)^2 − f^2 − g^2 = 2fg is integrable, and hence by 1), fg is integrable. Thus, it will be enough to prove that h integrable implies that h^2 is integrable.
Next, we claim that to show that squares of integrable functions are integrable, it will be enough to show that squares of non-negative integrable functions are integrable. To see this, given any h that is integrable, we know that h is bounded, so there is some constant c > 0 so that h + c ≥ 0. Since constants are continuous functions and continuous functions are integrable, h + c is integrable. Thus, if we know that squares of non-negative integrable functions are integrable, then we will have that (h + c)^2 = h^2 + 2ch + c^2 is integrable. But 2ch + c^2 is integrable by 1) and 2), so h^2 = (h + c)^2 − 2ch − c^2 is integrable.
Thus, it remains to show that if h 0 is Riemann-Stieltjes integrable
with respect to , then h2 is also Riemann-Stieltjes integrable with respect
to . To this end we show that h2 satisfies the Riemann-Stieltjes integrability
criterion. So let  > 0, be given and let K = sup{h(x) : a x b}. If h = 0,
then the result is clearly true, so we may assume that h 6= 0 and so K > 0.
Since h is Riemann-Stieltjes integrable, it satisfies the criterion and hence
there exists a partition P, so that U (h, P, ) L(h, P, ) < /2K. Let
P = {a = x0 , ..., xn = b}, let Mi = sup{h(x) : xi1 x xi } and let
mi = inf{h(x) : xi1 x xi }. Since h(x) 0, we have that sup{h(x)2 :
xi1 x xi } = Mi2 and inf{h(x)2 : xi1 x xi } = m2i . Thus, we have

5.2. PROPERTIES OF THE RIEMANN-STIELTJES INTEGRAL

77

that
U(h², P, α) − L(h², P, α) = Σ_{i=1}^n (M_i² − m_i²)(α(x_i) − α(x_{i-1}))
= Σ_{i=1}^n (M_i − m_i)(M_i + m_i)(α(x_i) − α(x_{i-1}))
≤ 2K Σ_{i=1}^n (M_i − m_i)(α(x_i) − α(x_{i-1}))
= 2K(U(h, P, α) − L(h, P, α)) < ε,

and we have shown that h² satisfies the Riemann-Stieltjes integrability criterion with respect to α.
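The 2K estimate in the proof is easy to watch numerically. The sketch below is my own and not from the notes: sup and inf on each subinterval are only approximated by sampling, so the computed sums are approximate upper and lower sums.

```python
# Hypothetical helper, not from the notes: approximate upper and lower
# Riemann-Stieltjes sums of f with respect to an increasing alpha on a
# uniform n-part partition, with sup/inf estimated from 51 sample points.

def upper_lower(f, alpha, a, b, n):
    U = L = 0.0
    for i in range(n):
        x0 = a + (b - a) * i / n
        x1 = a + (b - a) * (i + 1) / n
        samples = [f(x0 + (x1 - x0) * k / 50) for k in range(51)]
        d_alpha = alpha(x1) - alpha(x0)  # alpha increasing, so d_alpha >= 0
        U += max(samples) * d_alpha
        L += min(samples) * d_alpha
    return U, L

h = lambda x: x          # non-negative and integrable on [0, 1]
alpha = lambda x: x**2   # an increasing integrator
K = 1.0                  # sup of h on [0, 1]

U1, L1 = upper_lower(h, alpha, 0.0, 1.0, 1000)
U2, L2 = upper_lower(lambda x: h(x)**2, alpha, 0.0, 1.0, 1000)
# As in the proof, the gap for h**2 is at most 2K times the gap for h.
```

Here both gaps shrink as n grows, and the gap for h² stays below 2K times the gap for h, which is exactly the inequality used above.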
Proposition 5.21. Let α_1, α_2 : [a, b] → R be increasing functions, let c > 0 be a constant and let f : [a, b] → R be a bounded function. Then

∫̄_a^b f d(cα_1 + α_2) = c ∫̄_a^b f dα_1 + ∫̄_a^b f dα_2

and

∫̲_a^b f d(cα_1 + α_2) = c ∫̲_a^b f dα_1 + ∫̲_a^b f dα_2.

Proof. First note that for any partition P = {a = x_0, ..., x_n = b}, we have that

U(f, P, cα_1 + α_2) = Σ_{i=1}^n M_i[(cα_1(x_i) + α_2(x_i)) − (cα_1(x_{i-1}) + α_2(x_{i-1}))] = cU(f, P, α_1) + U(f, P, α_2).

Thus,

∫̄_a^b f d(cα_1 + α_2) = inf{cU(f, P, α_1) + U(f, P, α_2)} ≥ inf{cU(f, P, α_1)} + inf{U(f, P, α_2)} = c ∫̄_a^b f dα_1 + ∫̄_a^b f dα_2.

On the other hand, given any ε > 0, there is a partition P_1 so that

U(f, P_1, α_1) < ∫̄_a^b f dα_1 + ε/(2c)

and a partition P_2 so that

U(f, P_2, α_2) < ∫̄_a^b f dα_2 + ε/2.

Let P_3 = P_1 ∪ P_2, then

∫̄_a^b f d(cα_1 + α_2) ≤ U(f, P_3, cα_1 + α_2) = cU(f, P_3, α_1) + U(f, P_3, α_2) ≤ cU(f, P_1, α_1) + U(f, P_2, α_2) < c[∫̄_a^b f dα_1 + ε/(2c)] + [∫̄_a^b f dα_2 + ε/2].

Since ε > 0 was arbitrary, we have that

∫̄_a^b f d(cα_1 + α_2) ≤ c ∫̄_a^b f dα_1 + ∫̄_a^b f dα_2,

and these two inequalities for the upper Riemann-Stieltjes integrals prove equality of these integrals. The proof for the lower Riemann-Stieltjes integrals is similar.
Proposition 5.22. Let α_1, α_2 : [a, b] → R be increasing, let c > 0 and let f : [a, b] → R be a bounded function. If f is Riemann-Stieltjes integrable with respect to α_1 and α_2, then f is Riemann-Stieltjes integrable with respect to cα_1 + α_2 and

∫_a^b f d(cα_1 + α_2) = c ∫_a^b f dα_1 + ∫_a^b f dα_2.

Proof. The previous result shows that the upper and lower Riemann-Stieltjes integrals of f with respect to cα_1 + α_2 are equal and are given by the above formula.
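This linearity in the integrator is also easy to check numerically. The sketch below is my own (the helper name rs and the particular f, α_1, α_2 are illustrative choices, not from the notes); a midpoint Riemann-Stieltjes sum stands in for the integral.

```python
# Hypothetical helper, not from the notes: a midpoint Riemann-Stieltjes sum
# on a uniform n-part partition, used here as a stand-in for the integral.

def rs(f, alpha, a, b, n=50000):
    s = 0.0
    for i in range(n):
        x0 = a + (b - a) * i / n
        x1 = a + (b - a) * (i + 1) / n
        s += f(0.5 * (x0 + x1)) * (alpha(x1) - alpha(x0))
    return s

f = lambda x: x + 1.0
a1 = lambda x: x**2
a2 = lambda x: x**3
c = 2.0

combined = rs(f, lambda x: c * a1(x) + a2(x), 0.0, 1.0)
split = c * rs(f, a1, 0.0, 1.0) + rs(f, a2, 0.0, 1.0)
# The two agree up to floating-point roundoff, because each partial sum is
# itself linear in the integrator, exactly as in the proof above.
```

For these smooth choices the common value also approximates the exact integral ∫_0^1 (x + 1) d(2x² + x³) = 61/12.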
Theorem 5.23. Let a < c_1 < ... < c_n ≤ b, let h_i > 0, i = 1, ..., n and let α = h_1 J_{c_1} + ··· + h_n J_{c_n}. If f : [a, b] → R is a bounded function that is continuous from the left at each c_i, i = 1, ..., n, then f is Riemann-Stieltjes integrable with respect to α and

∫_a^b f dα = h_1 f(c_1) + ··· + h_n f(c_n).

Proof. Apply Theorem 5.15 together with the above result.
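The collapse of the integral to a finite sum can be seen numerically. In the sketch below (all names are mine, not from the notes) J(c) is taken to be the unit jump that is 0 for x < c and 1 for x ≥ c, which matches the left-continuity hypothesis, and a left-endpoint Riemann-Stieltjes sum on a fine partition is compared with h_1 f(c_1) + h_2 f(c_2).

```python
# Hypothetical names, not from the notes. J(c) is the unit jump at c:
# 0 for x < c and 1 for x >= c.

def J(c):
    return lambda x: 0.0 if x < c else 1.0

def rs_left(f, alpha, a, b, n):
    """Left-endpoint Riemann-Stieltjes sum on a uniform n-part partition."""
    total = 0.0
    for i in range(n):
        x0 = a + (b - a) * i / n
        x1 = a + (b - a) * (i + 1) / n
        total += f(x0) * (alpha(x1) - alpha(x0))
    return total

cs, hs = [1.0, 2.0], [0.5, 2.0]                # jump locations and heights
alpha = lambda x: sum(h * J(c)(x) for c, h in zip(cs, hs))
f = lambda x: x**2                             # continuous, so left-continuous at each c_i

approx = rs_left(f, alpha, 0.0, 3.0, 100000)
exact = sum(h * f(c) for c, h in zip(cs, hs))  # h_1 f(c_1) + h_2 f(c_2)
```

Only the two subintervals containing the jump points contribute to the sum, and the left endpoints converge to f(c_i) from the left, which is why left-continuity is the right hypothesis here.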


Problem 5.24. Let a < 1 and 3 < b and let α be the cumulative distribution function for the random variable on page 69 for one flip of a biased coin. Compute μ = ∫_a^b t dα(t) and σ² = ∫_a^b (t − μ)² dα(t).

These two values are known as the mean and variance of the random variable. The square root of the variance, σ, is called the standard deviation.
Problem 5.25. Let a < 2 and 6 < b. Compute the mean and variance of
the random variable of Problem 5.16.
Problem 5.26. Let a < 2 and 9 < b. Compute the mean and variance of
the random variable of Problem 5.17.

Problem 5.27. Let α : [−1, 3] → R be defined by

α(x) = 0 for x < 0,
α(x) = 1 for 0 ≤ x < 1,
α(x) = 1 + x for 1 ≤ x < 2,
α(x) = 7 for 2 ≤ x ≤ 3.

Compute ∫_{−1}^3 t dα(t).
Problem 5.28. Let α : [a, b] → R be an increasing function, let a < c < b and let f : [a, b] → R be a bounded function. Prove:

∫̄_a^b f dα = ∫̄_a^c f dα + ∫̄_c^b f dα and ∫̲_a^b f dα = ∫̲_a^c f dα + ∫̲_c^b f dα.
Problem 5.29. Let α : [a, b] → R be an increasing function, let a < c < b and let f : [a, b] → R be Riemann-Stieltjes integrable. Prove that f is Riemann-Stieltjes integrable on [a, c] and on [c, b] and that

∫_a^b f dα = ∫_a^c f dα + ∫_c^b f dα.

5.3 The Fundamental Theorem of Calculus

Before proceeding it will be useful to have a few more facts about the Riemann-Stieltjes integral.



Theorem 5.30 (Mean Value Theorem for Integrals). Let α : [a, b] → R be an increasing function and let f : [a, b] → R be a continuous function. Then there exists c, a ≤ c ≤ b, such that

∫_a^b f dα = f(c)(α(b) − α(a)).

Proof. When α(b) − α(a) = 0, then both sides of the equation are 0 and any point c will suffice. So assume that α(b) − α(a) ≠ 0. By Problem 5.18,

m ≤ (∫_a^b f dα) / (α(b) − α(a)) ≤ M,

where m = f(x_0) and M = f(x_1) are the minimum and maximum of f on [a, b]. Thus, by the intermediate value theorem, there is a point c between x_0 and x_1 such that f(c) is equal to this fraction.
Proposition 5.31. Let a ≤ c < d ≤ b, let α : [a, b] → R be an increasing function and let f : [a, b] → R be a bounded function that is Riemann-Stieltjes integrable with respect to α on [a, b]. Then f is Riemann-Stieltjes integrable with respect to α on [c, d].

Proof. Given ε > 0 there is a partition P of [a, b] with U(f, P, α) − L(f, P, α) < ε. Let P_1 = P ∪ {c, d}, then P_1 is a refinement of P and so U(f, P_1, α) − L(f, P_1, α) ≤ U(f, P, α) − L(f, P, α) < ε. Now if we only consider the points of P_1 that lie in [c, d], then they form a partition of [c, d], and the terms in the above difference corresponding to those subintervals sum to something even smaller. Thus, we obtain a partition of [c, d] that satisfies the condition in the Riemann-Stieltjes integrability criterion.
In the case that α(x) = x, the following theorem reduces to the classic fundamental theorem of calculus.
Theorem 5.32 (Fundamental Theorem of Calculus). Let α : [a, b] → R be an increasing function that is differentiable on (a, b) and let f : [a, b] → R be continuous. If F(x) = ∫_a^x f(t) dα(t), then F is differentiable on (a, b) and F′(x) = f(x)α′(x).
Proof. Let a < x < x + h < b, then by the mean value theorem for integrals, there is c, x ≤ c ≤ x + h, so that

(F(x + h) − F(x))/h = (1/h) ∫_x^{x+h} f(t) dα(t) = f(c)(α(x + h) − α(x))/h.

By choosing h sufficiently small, |f(c) − f(x)| and |(α(x + h) − α(x))/h − α′(x)| can be made arbitrarily small, so the difference quotient converges to f(x)α′(x). The case of h < 0 is similar.
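The conclusion F′(x) = f(x)α′(x) admits a direct numerical check. The sketch below is my own (the particular f, α, interval, and step sizes are illustrative assumptions): F is approximated by a midpoint Riemann-Stieltjes sum and F′ by a central difference, so the comparison is only approximate.

```python
import math

f = lambda t: math.cos(t)
alpha = lambda t: t**3             # increasing and differentiable on (1, 2)
alpha_prime = lambda t: 3 * t**2

def F(x, a=1.0, n=20000):
    """Midpoint Riemann-Stieltjes sum approximating the integral of f dalpha over [a, x]."""
    total = 0.0
    for i in range(n):
        t0 = a + (x - a) * i / n
        t1 = a + (x - a) * (i + 1) / n
        total += f(0.5 * (t0 + t1)) * (alpha(t1) - alpha(t0))
    return total

x, h = 1.5, 1e-4
numeric = (F(x + h) - F(x - h)) / (2 * h)  # central-difference estimate of F'(x)
expected = f(x) * alpha_prime(x)           # the value the theorem predicts
```

For these smooth choices the two quantities agree to several digits, mirroring the |f(c) − f(x)| and difference-quotient estimates in the proof.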
Corollary 5.33. Let α : [a, b] → R be an increasing, continuous function that is differentiable on (a, b) and let f : [a, b] → R be continuous. If F : [a, b] → R is a continuous function that is differentiable on (a, b) and F′(x) = f(x)α′(x), then ∫_a^b f dα = F(b) − F(a).

Proof. Let G(x) = ∫_a^x f dα + F(a), then for a < x < b, we have that G′(x) = F′(x) and G(a) = F(a). Hence, G(x) = F(x) for a < x < b. Since |G(b) − G(x)| = |∫_x^b f dα| ≤ K(α(b) − α(x)), where K = sup{|f(x)| : a ≤ x ≤ b}, we see that G(b) = lim_{x→b} G(x) = lim_{x→b} F(x) = F(b), since F is continuous. Therefore, F(b) = G(b) = ∫_a^b f dα + F(a) and the formula follows.
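As a concrete instance (my own example, not from the notes): take f(x) = x and α(x) = x² on [0, 1]. Then f(x)α′(x) = 2x², so F(x) = (2/3)x³ qualifies, and the corollary predicts ∫_0^1 x d(x²) = F(1) − F(0) = 2/3. The sketch below compares this with a midpoint Riemann-Stieltjes sum.

```python
# All names are mine; rs is only an approximation of the integral.

def rs(f, alpha, a, b, n=50000):
    """Midpoint Riemann-Stieltjes sum on a uniform n-part partition."""
    s = 0.0
    for i in range(n):
        x0 = a + (b - a) * i / n
        x1 = a + (b - a) * (i + 1) / n
        s += f(0.5 * (x0 + x1)) * (alpha(x1) - alpha(x0))
    return s

integral = rs(lambda x: x, lambda x: x**2, 0.0, 1.0)

F = lambda x: (2.0 / 3.0) * x**3   # an antiderivative with F'(x) = f(x) * alpha'(x)
by_ftc = F(1.0) - F(0.0)           # 2/3, as the corollary predicts
```

Problem 5.34 below shows that this evaluation rule can fail when α is not continuous, which is why continuity of α is part of the hypothesis.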
Problem 5.34. Let α : [0, 2] → R be defined by

α(x) = x for 0 ≤ x < 2 and α(2) = 5,

and let f(x) = x^m. Calculate ∫_0^2 f dα. Show that if we let F be continuous on [0, 2] with F′(x) = f(x)α′(x) for 0 < x < 2, then F(2) − F(0) ≠ ∫_0^2 f dα.
