MA131B Analysis II Revision Guide: Written by David McCormick


WMS

MA131B
Analysis II

Revision Guide

Written by David McCormick


Contents
1 Review of MA131A Analysis I

2 Continuity
2.1 Defining Continuity
2.2 Continuity and Completeness
2.3 Limits
2.4 Continuous functions on closed bounded intervals

3 Differentiation
3.1 Basic Properties of Derivatives
3.2 Derivatives and Completeness

4 Power Series
4.1 Differentiability and Taylor Series
4.2 Upper and Lower Limits

5 Special functions of analysis
5.1 The Exponential and Logarithm Functions
5.2 The Sine and Cosine Functions

6 L’Hôpital’s Rule

Introduction
This revision guide for MA131B Analysis II has been designed as an aid to revision, not a substitute for
it. As can be seen from the length of this revision guide, Analysis II is a fairly long course with lots of
confusing definitions and big theorems; hopefully this guide will help you make sense of how everything
fits together. Only some proofs are included; the inclusion of a particular proof is no indication of how
likely it is to appear on the exam. However, the exam format doesn’t change much from year to year, so
the best way of revising is to do past exam papers, as well as assignment questions, using this guide as
a reference.
For further practice, the questions in R. P. Burn’s Numbers and Functions are invaluable, and indeed
one of the principal sources of this revision guide. For the more brave of heart, Rudin’s Principles of
Mathematical Analysis is a classic, unparalleled textbook, but is pitched somewhat above the level of the
course, with some of the most challenging analysis questions known to man.

Disclaimer: Use at your own risk. No guarantee is made that this revision guide is accurate or
complete, or that it will improve your exam performance. Use of this guide will increase entropy,
contributing to the heat death of the universe. Contains no GM ingredients. Your mileage may vary.
All your base are belong to us.

Authors
Written by D. S. McCormick ([email protected])
Based upon lectures given by David Elworthy at the University of Warwick, 2005 and 2006.
Minor corrections by Stephen Loke & Guy Barwell, 2012.
Any corrections or improvements should be entered into our feedback form at http://tinyurl.com/WMSGuides
(alternatively email [email protected]).

History
First Edition: 2007.
Current Edition: February 21, 2016.

1 Review of MA131A Analysis I


Analysis is a linear subject, in the sense that most results depend critically on previous results. As
such, a sound knowledge of MA131A Analysis I is vital for success in the Analysis II exam, and one
question on the Analysis II exam will be exclusively on material from Analysis I. We state some of the
key definitions and theorems for reference:
Definitions 1.1. A (real) sequence is a list of (real) numbers in a definite order (or a function $\mathbb{N} \to \mathbb{R}$), written $(a_n)_{n=1}^\infty = (a_1, a_2, \dots)$.
• A sequence $(a_n)$ is increasing if $a_n \le a_{n+1}$ for all $n$; it is strictly increasing if $a_n < a_{n+1}$ for all $n$; similarly for decreasing. A sequence is monotonic if it is either increasing or decreasing.
• A sequence $(a_n)$ is bounded above if $\exists\, U$ s.t. $a_n \le U$ for all $n$; similarly for bounded below. A sequence is bounded if it is bounded above and below.
• A sequence $(a_n)$ converges to $a$ if $\forall \varepsilon > 0,\ \exists\, N \in \mathbb{N}$ s.t. $n \ge N \implies |a_n - a| < \varepsilon$; write $(a_n) \to a$.
• A sequence $(a_n)$ is Cauchy if $\forall \varepsilon > 0,\ \exists\, N \in \mathbb{N}$ s.t. $n, m \ge N \implies |a_n - a_m| < \varepsilon$.
• A sequence $(a_n)$ tends to infinity if $\forall C > 0,\ \exists\, N \in \mathbb{N}$ s.t. $n \ge N \implies a_n > C$; write $(a_n) \to \infty$.
A subsequence of $(a_n)$ is a sequence of the form $(a_{n_i})$, where $(n_i)$ is a strictly increasing sequence of natural numbers.
Proposition 1.2. Let $a, b \in \mathbb{R}$. Suppose $(a_n) \to a$ and $(b_n) \to b$. Then for any $c, d \in \mathbb{R}$, $(ca_n + db_n) \to ca + db$, $(a_n b_n) \to ab$, and if¹ $b \neq 0$ then $\frac{a_n}{b_n} \to \frac{a}{b}$.

Theorem 1.3 (Closed Interval Rule). Suppose $(a_n) \to a$. If (eventually²) $A \le a_n \le B$, then $A \le a \le B$.


Completeness Axiom. Every non-empty subset A ⊂ R that is bounded above (resp. below) has a
least upper bound, sup A (resp. greatest lower bound, inf A). Equivalently:
• Every bounded monotonic sequence is convergent.
• Every bounded sequence has a convergent subsequence (this is the Bolzano–Weierstrass theorem).
• Every Cauchy sequence is convergent.
Definitions 1.4. A series is an expression of the form $\sum_{n=1}^\infty a_n = a_1 + a_2 + \cdots$. Consider the series $\sum_{n=1}^\infty a_n$ with partial sums $(s_n)$, where $s_n = \sum_{i=1}^n a_i$. We say:
• $\sum a_n$ converges if $(s_n)$ converges. If $s_n \to S$ then we write $\sum_{n=1}^\infty a_n = S$.
• $\sum a_n$ diverges if $(s_n)$ does not converge.
• $\sum a_n$ diverges to infinity if $(s_n)$ tends to infinity.
• $\sum a_n$ diverges to minus infinity if $(s_n)$ tends to minus infinity.
The series $\sum a_n$ is absolutely convergent if $\sum |a_n|$ is convergent.
Lemma 1.5. Every absolutely convergent series is convergent.
Proposition 1.6. Suppose $\sum_{n=1}^\infty a_n$ and $\sum_{n=1}^\infty b_n$ are convergent series. Then, for all $c, d \in \mathbb{R}$, $\sum_{n=1}^\infty (ca_n + db_n)$ is a convergent series and $\sum_{n=1}^\infty (ca_n + db_n) = c\sum_{n=1}^\infty a_n + d\sum_{n=1}^\infty b_n$.
Theorem 1.7 (Null Sequence Test). If $(a_n)$ does not tend to zero, then $\sum_{n=1}^\infty a_n$ diverges.
Theorem 1.8 (Comparison Test). Suppose $0 \le a_n \le b_n$ for all natural numbers $n$. If $\sum b_n$ converges then $\sum a_n$ converges and $\sum_{n=1}^\infty a_n \le \sum_{n=1}^\infty b_n$.
Lemma 1.9 (Geometric Series). The series $\sum_{n=0}^\infty x^n$ is convergent if $|x| < 1$, and the sum is $\frac{1}{1-x}$. It is divergent if $|x| \ge 1$.

Theorem 1.10 (Ratio Test). Suppose $a_n \neq 0$. If $\left|\frac{a_{n+1}}{a_n}\right| \to k$, then $\sum a_n$ converges absolutely if $0 \le k < 1$ and diverges if $k > 1$. If $\left|\frac{a_{n+1}}{a_n}\right| \to \infty$, then $\sum a_n$ diverges.
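As a quick numerical sanity check of the Ratio Test, here is a minimal Python sketch for one example series, chosen purely for illustration and not taken from the course: $a_n = n/2^n$, for which $\frac{a_{n+1}}{a_n} = \frac{n+1}{2n} \to \frac{1}{2}$. It uses exact rational arithmetic (`fractions.Fraction`) to avoid rounding. The helper names `term`, `ratio` and `partial_sum` are hypothetical. The printed numbers illustrate the theorem; they do not prove anything.

```python
from fractions import Fraction


def term(n: int) -> Fraction:
    """The example term a_n = n / 2^n, as an exact rational number."""
    return Fraction(n, 2 ** n)


def ratio(n: int) -> Fraction:
    """The ratio a_{n+1} / a_n, which simplifies to (n + 1) / (2n)."""
    return term(n + 1) / term(n)


def partial_sum(N: int) -> Fraction:
    """The partial sum s_N = a_1 + a_2 + ... + a_N."""
    return sum((term(n) for n in range(1, N + 1)), Fraction(0))


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, float(ratio(n)))          # ratios approach k = 1/2
    for N in (5, 10, 20, 40):
        print(N, float(partial_sum(N)))    # partial sums approach 2
```

One can check by induction that $s_N = 2 - \frac{N+2}{2^N}$. The partial sums therefore increase to $2$, which is consistent with the absolute convergence the Ratio Test predicts for $k = \frac{1}{2} < 1$.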

In Analysis II we have a better way (in most cases) of proving that a series converges, namely Cauchy’s nth Root Test. This also provides the radius of convergence (covered in the “Power Series” section).
¹ $b \neq 0 \implies b_n \neq 0$ eventually; for the finitely many $n$ where $b_n = 0$ we can ignore the terms $a_n/b_n$.
² Something holds eventually if there exists an $N \in \mathbb{N}$ such that it holds for all $n \ge N$.

2 Continuity
Fundamental to all of mathematics is the notion of a function:
Definition 2.1. Given two sets A and B, a function f : A → B is a pairing of each element of A with
an element of B. The element of B which is paired with a ∈ A is denoted by f (a).
Here the set A is called the domain of the function, while the set B is called the co-domain of the
function. The range of the function is the set of points in B which are mapped to by some point in A,
i.e. {f (a) | a ∈ A} ⊂ B. Throughout this module we only consider functions f : A → R, where A ⊂ R;
that is, real-valued functions of one real variable.
The general definition of function simply requires pairing each value x in the domain with a value
f (x) in the co-domain, even when something like a fresh formula is needed for each x, and so does not
require the value of f (x) to be anywhere close to the value of f (y), even when x and y are arbitrarily
close to one another. We are thus moved to define various classes of functions which are well-behaved;
most important are continuous functions, which make rigorous the idea that if x and y are close to each
other, then f (x) and f (y) should also be close to each other.

2.1 Defining Continuity


In Analysis I, we used limits of sequences to get “close” to a number using a sequence tending to that number. Another way to describe nearness is to consider neighbourhoods of a point $a$, i.e. sets $\{x \mid a - \delta < x < a + \delta\}$ for some $\delta > 0$. These two ideas give two definitions of continuity, which we will call neighbourhood continuity and sequential continuity for the time being:
Definition 2.2. Given a function $f : A \to \mathbb{R}$, where $A \subset \mathbb{R}$, and given $a \in A$, $f$ is (neighbourhood) continuous at $x = a$ if, for each $\varepsilon > 0$, we can find a $\delta > 0$ such that
$$x \in A \text{ and } |x - a| < \delta \implies |f(x) - f(a)| < \varepsilon.$$
For a given function $f$, our choice of $\delta$ depends on the point $a$ and the challenge $\varepsilon$; we sometimes write $\delta = \delta_a(\varepsilon)$ to emphasise this.
Definition 2.3. Given a function $f : A \to \mathbb{R}$, where $A \subset \mathbb{R}$, and given $a \in A$, $f$ is sequentially continuous at $x = a$ if, for every sequence $(a_n)$ converging to $a$, where $a_n \in A$ for all $n$, the sequence $f(a_n)$ converges to $f(a)$. This is commonly expressed as $f(x) \to f(a)$ as $x \to a$.
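The example below makes concrete how δ depends on both a and ε in Definition 2.2. It uses f(x) = x², a function chosen only for illustration. If |x − a| < δ ≤ 1, then |x + a| < 1 + 2|a|, so the choice δ_a(ε) = min(1, ε/(1 + 2|a|)) works. The Python sketch encodes this choice and spot-checks it on random samples. The names `delta` and `spot_check` are hypothetical. Random sampling can only hunt for counterexamples; it cannot prove continuity.

```python
import random


def f(x: float) -> float:
    """The example function f(x) = x^2."""
    return x * x


def delta(a: float, eps: float) -> float:
    """A delta that works for f(x) = x^2 at the point a and challenge eps.

    If |x - a| < delta <= 1 then |x + a| <= |x - a| + 2|a| < 1 + 2|a|, so
    |f(x) - f(a)| = |x - a| * |x + a| < delta * (1 + 2|a|) <= eps.
    """
    return min(1.0, eps / (1.0 + 2.0 * abs(a)))


def spot_check(a: float, eps: float, trials: int = 10_000, seed: int = 0) -> bool:
    """Sample x with |x - a| < delta(a, eps) and test |f(x) - f(a)| < eps.

    Returns False as soon as a counterexample is found. Passing is evidence,
    not a proof: floating-point sampling cannot cover every x.
    """
    d = delta(a, eps)
    rng = random.Random(seed)
    for _ in range(trials):
        x = a + rng.uniform(-d, d)
        if abs(f(x) - f(a)) >= eps:
            return False
    return True


if __name__ == "__main__":
    for a in (0.0, 1.0, 10.0, 100.0):
        print(a, delta(a, 0.01), spot_check(a, 0.01))
```

Note that δ_a(ε) shrinks as |a| grows. No single δ serves every a for this f, which shows that the choice genuinely depends on the point as well as on ε.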

Having two different definitions of continuity would be disastrous if they led to different consequences.
Thankfully, however, we now show that the two definitions are in fact equivalent.
Theorem 2.4. Given a function f : A → R, where A ⊂ R, and given a ∈ A, then f is (neighbourhood)
continuous at x = a if and only if f is sequentially continuous at x = a.
Proof. (⟹) Suppose $f$ is neighbourhood continuous at $x = a$, and let $(a_n) \to a$ be a sequence in $A$. Given $\varepsilon > 0$, there exists $\delta > 0$ such that $x \in A$ and $|x - a| < \delta \implies |f(x) - f(a)| < \varepsilon$. Since $(a_n) \to a$, there exists $N$ such that $|a_n - a| < \delta$ for all $n \ge N$; as $a_n \in A$, this gives $|f(a_n) - f(a)| < \varepsilon$ for all $n \ge N$. So $f(a_n) \to f(a)$.
(⟸) We use the contrapositive; suppose $f$ is not neighbourhood continuous at $x = a$. Then $\exists\, \varepsilon > 0$ such that, given any $\delta > 0$, there is at least one $x \in A$ for which $|x - a| < \delta$ but $|f(x) - f(a)| \ge \varepsilon$. For each $n \in \mathbb{N}$, let $\delta = \frac{1}{n}$, choose an $x$ satisfying this condition and label it $a_n$. Now for each $n$ we have $|a_n - a| < \frac{1}{n}$, so $(a_n) \to a$. But $|f(a_n) - f(a)| \ge \varepsilon$ for every $n$, with the same fixed $\varepsilon > 0$. Hence $f(a_n)$ does not tend to $f(a)$, and $f$ is not sequentially continuous at $x = a$.
Definition 2.5. If a function f : A → R is continuous at every point a ∈ A, then we say f is continuous
on A, or simply a continuous function.

Lemma 2.6 (Preservation of sign). If f : E → R is continuous at c ∈ E and f (c) > 0 then there exists
a δ > 0 such that f (x) > 0 for all x ∈ E with |x − c| < δ. Similarly, if f (c) < 0 then there exists a δ > 0
such that f (x) < 0 for all x ∈ E with |x − c| < δ.
Proof. Suppose f is continuous at c ∈ E and f(c) > 0. Taking ε = f(c) > 0, there exists δ > 0 such that x ∈ E and |x − c| < δ ⟹ |f(x) − f(c)| < f(c), so f(x) > f(c) − f(c) = 0. By the same reasoning with ε = −f(c), if f(c) < 0 then there exists δ > 0 such that x ∈ E and |x − c| < δ ⟹ f(x) < 0.
Notice that if f is not continuous at c ∈ E then this need not hold. For example, f(x) = −1 for x ≠ 0 and f(x) = 1 for x = 0 is not continuous at 0, and preservation of sign fails there. Notice also that we can replace 0 in the inequalities f(c) > 0 or f(c) < 0 by any number p to give a similar result. For example, if f is continuous at c and f(c) > p, then there is a δ > 0 such that f(x) > p for all x ∈ E with |x − c| < δ (similarly for f(c) < p). Can you prove this?

Most functions we are familiar with are continuous. For example, the functions f, g : R → R given
by f (x) = c (for any constant c ∈ R) and g(x) = x are continuous. Moreover, when we have continuous
functions, we can create new continuous functions by adding, subtracting, multiplying or dividing them.
The following result follows immediately from the sequential definition of continuity and the analogous
results for sequences:
Proposition 2.7. Given $A \subset \mathbb{R}$, and given $a \in A$, suppose that $f, g : A \to \mathbb{R}$ are continuous at $a$. Then:
1. the function $f + g : A \to \mathbb{R}$ defined by $(f + g)(x) := f(x) + g(x)$ is continuous at $a$;
2. the function $f \cdot g : A \to \mathbb{R}$ defined by $(f \cdot g)(x) := f(x) \cdot g(x)$ is continuous at $a$; and
3. if also $g(x) \neq 0$ for all $x \in A$, then the function $\frac{f}{g} : A \to \mathbb{R}$ defined by $\frac{f}{g}(x) := \frac{f(x)}{g(x)}$ is continuous at $a$.
Since the function $x \mapsto x$ is continuous, we can multiply it by itself repeatedly to get that $x \mapsto x^n$ is continuous for any $n \in \mathbb{N}$. Multiplying by any constant $a_n$ shows that $x \mapsto a_n x^n$ is continuous. Hence the sum $x \mapsto a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$ is continuous, i.e.:
Proposition 2.8. Any (finite) polynomial function $p : \mathbb{R} \to \mathbb{R}$ given by $p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$ is continuous.
Corollary 2.9. Any rational function $q : A \to \mathbb{R}$ given by $q(x) = \frac{p_1(x)}{p_2(x)}$, where $p_1, p_2$ are (finite) polynomials, is continuous provided that $p_2(x) \neq 0$ for all $x \in A$.
Theorem 2.10. If the function f : A → R is continuous at x = a, and the function g : B → R is
continuous at f (a), where f (A) ⊂ B, then g ◦ f is continuous at x = a.
That is, a continuous function of a continuous function is continuous.

Proof. Let $(a_n) \to a$ be a sequence in $A$. By the continuity of $f$ at $a$, $f(a_n) \to f(a)$, and $f(a_n) \in B$ for all $n$. Then, since $g$ is continuous at $f(a)$, $g(f(a_n)) \to g(f(a))$. Hence $g \circ f$ is continuous at $x = a$.
Corollary 2.11. If $f : E \to \mathbb{R}$ is continuous at $c$ and $f(c) \neq 0$, then $\frac{1}{f}$ is well-defined in a neighbourhood of $c$ (by preservation of sign, Lemma 2.6) and is continuous at $c$.
Examples 2.12. Continuous functions are so familiar that to clarify the meaning of this definition we
need some examples of discontinuity, illustrating the absence of continuity. The continuity of a function
at a point $a$ is recognised by the way the function affects every sequence $(a_n)$ tending to $a$. Discontinuity is established by finding just one sequence $(a_n) \to a$ for which $f(a_n)$ does not tend to $f(a)$.
1. Consider the integer part function $f(x) = [x]$. $f$ is continuous at every non-integer $x$ and discontinuous at every integer $x$. Thus $f$ has infinitely many isolated discontinuities.
2. Consider the function f : R → R defined by f (x) = 0 when x is rational, and f (x) = x when x is
irrational. This is continuous at just a single point, x = 0.

3. Consider the function $f(x) = \sin\frac{1}{x}$ for $x > 0$. This is continuous for every $x > 0$, but what about $x = 0$? Let $a_n = \frac{1}{2n\pi}$ and $b_n = \frac{1}{(2n + \frac{1}{2})\pi}$. Now $(a_n) \to 0$ and $(b_n) \to 0$, but $f(a_n) = 0$ for all $n$, and $f(b_n) = 1$ for all $n$. Hence, if we try to extend $f$ to a continuous function by defining
$$\tilde{f}(x) = \begin{cases} f(x) & \text{if } x > 0, \\ k & \text{if } x = 0, \end{cases}$$
then we will fail, since whatever value of $k$ we choose there will exist sequences $(a_n) \to 0$ such that $\tilde{f}(a_n)$ does not tend to $k$.

Lemma 2.13. For $x \in (0, \frac{\pi}{2})$ we have $0 < \sin x < x < \tan x$.

Proof. Take a unit circle with centre $O$ and two distinct points $A, B$ on its circumference such that the angle $x$ between $OA$ and $OB$ lies in $(0, \frac{\pi}{2})$. Let $E$ be the point where the tangent to the circle at $B$ meets the ray $OA$ extended, so that $OBE$ is a right angle. Then area of triangle $OAB$ < area of sector $OAB$ < area of triangle $OBE$, i.e. $\frac{1}{2}\sin x < \frac{1}{2}x < \frac{1}{2}\tan x$. Since also $\sin x > 0$ for $x \in (0, \frac{\pi}{2})$, this gives the desired inequality.

Theorem 2.14. The function $f : \mathbb{R} \to [-1, 1]$ given by $f(x) = \sin x$ is continuous.

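Proof. Let $c \in \mathbb{R}$ and $\varepsilon > 0$, and set $\delta = \varepsilon$. Note that $\sin x = \sin\left(\frac{x+c}{2} + \frac{x-c}{2}\right)$ and $\sin c = \sin\left(\frac{x+c}{2} - \frac{x-c}{2}\right)$. Using these identities and the addition formula for sine, we get
$$|\sin x - \sin c| = \left|2 \sin\left(\tfrac{x-c}{2}\right) \cos\left(\tfrac{x+c}{2}\right)\right|.$$
We have $|\cos t| \le 1$ and $|\sin t| \le |t|$ for all $t$. The second bound follows from Lemma 2.13 and the oddness of sine when $0 < |t| < \frac{\pi}{2}$, and from $|\sin t| \le 1 < |t|$ when $|t| \ge \frac{\pi}{2}$. Hence for $|x - c| < \delta$ we have
$$|\sin x - \sin c| \le 2\left|\sin\left(\tfrac{x-c}{2}\right)\right| \le |x - c| < \delta = \varepsilon,$$
and therefore $\sin x$ is continuous.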
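The proof above rests on two facts: Lemma 2.13 and the bound $|\sin x - \sin c| \le |x - c|$ that follows from it. Both are easy to spot-check numerically. The Python sketch below tests them on finitely many floating-point samples, allowing a tiny tolerance for rounding. It is a sanity check, not a substitute for the geometric proof. The function names are hypothetical.

```python
import math
import random


def check_lemma_2_13(samples: int = 1000) -> bool:
    """Check 0 < sin x < x < tan x at evenly spaced points of (0, pi/2)."""
    for i in range(1, samples):
        x = (math.pi / 2) * i / samples
        if not (0 < math.sin(x) < x < math.tan(x)):
            return False
    return True


def check_sine_bound(trials: int = 10_000, seed: int = 0, tol: float = 1e-12) -> bool:
    """Check |sin x - sin c| <= |x - c| (up to rounding tol) for random x, c."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, c = rng.uniform(-10, 10), rng.uniform(-10, 10)
        if abs(math.sin(x) - math.sin(c)) > abs(x - c) + tol:
            return False
    return True


if __name__ == "__main__":
    print(check_lemma_2_13(), check_sine_bound())
```

The choice $\delta = \varepsilon$ in the proof of Theorem 2.14 is exactly this bound: sine moves by no more than its argument does.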

2.2 Continuity and Completeness


Up until now, our results on continuity have not depended on the property of completeness. However,
using completeness gets us some of the most useful results to do with continuity.
Intervals, or connected sets, on the real line are subsets of the real line which contain all the real
numbers lying between any two points of the subset. The set I is an interval if when r, s ∈ I, and r < s,
then every x such that r < x < s also belongs to I. The seven types of interval are distinguished by
their boundedness and their boundaries:
• bounded above and below:
(i) singleton point, {a}
(ii) closed³ interval, [a, b] = {x ∈ R | a ≤ x ≤ b}
(iii) open⁴ interval, (a, b) = {x ∈ R | a < x < b}
(iv) half-open interval, [a, b) = {x ∈ R | a ≤ x < b} or (a, b] = {x ∈ R | a < x ≤ b}
• bounded above or below, but not both:
(v) open half-ray, (a, ∞) = {x ∈ R | x > a} or (−∞, a) = {x ∈ R | x < a}
(vi) closed half-ray, [a, ∞) = {x ∈ R | x ≥ a} or (−∞, a] = {x ∈ R | x ≤ a}
• unbounded:
(vii) the whole real line, R

Definition 2.15. Let $E \subset \mathbb{R}$. A function $f : E \to \mathbb{R}$ is strictly increasing if $\forall x, y \in E$, $x > y \implies f(x) > f(y)$; increasing if $\forall x, y \in E$, $x > y \implies f(x) \ge f(y)$; decreasing if $\forall x, y \in E$, $x > y \implies f(x) \le f(y)$; and strictly decreasing if $\forall x, y \in E$, $x > y \implies f(x) < f(y)$.

The Intermediate Value Theorem, or IVT, tells us that a continuous function maps intervals onto
intervals:

Theorem 2.16 (IVT version 1). If f : [a, b] → R is a continuous function with f (a) < f (b), then for
each k such that f (a) < k < f (b), there exists c ∈ (a, b) such that f (c) = k.
³ The word closed as we have used it here indicates that every convergent sequence within each closed set converges to a point of the set (see the Closed Interval Rule, Theorem 1.3).
⁴ The word open as we have used it here indicates that there is space within each open set around each point of the set. For example, if $c \in (a, b)$, then $a < \frac{1}{2}(a + c) < c < \frac{1}{2}(c + b) < b$. This is also described by saying that an open set contains a neighbourhood of each of its points.

Proof. Let $S = \{x \in [a, b] \mid f(x) \le k\}$, so $a \in S$ and $x \in S \implies x \le b$. Therefore $S$ is non-empty and bounded above, and so has a least upper bound, $\sup S$. Set $c = \sup S$. We claim $f(c) = k$. Now, since $c = \sup S$, there exists a sequence $(x_n)$ in $S$ with $x_n \to c$ as $n \to \infty$. As $f$ is continuous at $c$, $f(x_n) \to f(c)$ as $n \to \infty$. But $f(x_n) \le k$ for all $n$ by definition of $S$, hence $f(c) \le k$ (by the Closed Interval Rule).
Next, for contradiction, assume $f(c) < k$. Then $c \neq b$ (as $f(b) > k$), so $c < b$. Take $\varepsilon = \frac{1}{2}(k - f(c)) > 0$. Since $f$ is continuous at $c$, $\exists\, \delta > 0$ such that $x \in [a, b]$ and $|x - c| < \delta \implies |f(x) - f(c)| < \frac{1}{2}(k - f(c))$. Hence for such $x$, $f(x) < f(c) + \frac{1}{2}(k - f(c)) = \frac{1}{2}(k + f(c)) < k$, so $x \in S$. In particular every $x \in (c, \min(c + \delta, b))$ lies in $S$, which is impossible since $c = \sup S$. Hence $f(c) \ge k$. So $f(c) = k$ as required, and $c \in (a, b)$ since $f(a) < k < f(b)$.

Corollary 2.17 (IVT version 2). If f : [a, b] → R is a continuous function with f (b) < f (a), then for
each k such that f (b) < k < f (a), there exists c ∈ (a, b) such that f (c) = k.
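The proof of the IVT is non-constructive because it goes through $\sup S$. The theorem nevertheless underlies the familiar bisection method: if $g = f - k$ changes sign on an interval, it changes sign on one of the two halves, and repeating this traps a point $c$ with $f(c) = k$. The Python sketch below assumes that $f$ is continuous on $[a, b]$ and that $k$ lies strictly between $f(a)$ and $f(b)$, in either order, so it covers both versions of the IVT. The function name `ivt_bisect` and its `tol`/`max_iter` parameters are hypothetical. In floating point we can only stop at a small interval; in $\mathbb{R}$, completeness is what guarantees the nested intervals close in on an actual point.

```python
import math
from typing import Callable


def ivt_bisect(f: Callable[[float], float], a: float, b: float, k: float,
               tol: float = 1e-12, max_iter: int = 200) -> float:
    """Approximate some c in (a, b) with f(c) = k by bisection.

    Assumes f is continuous on [a, b] and k lies strictly between f(a) and
    f(b), in either order (IVT versions 1 and 2).
    """
    def g(x: float) -> float:
        return f(x) - k

    g_lo = g(a)
    if g_lo * g(b) >= 0:
        raise ValueError("k must lie strictly between f(a) and f(b)")
    lo, hi = a, b
    for _ in range(max_iter):
        if hi - lo <= tol:
            break
        mid = lo + (hi - lo) / 2
        g_mid = g(mid)
        if g_mid == 0:
            return mid                      # hit a solution exactly
        if (g_mid < 0) == (g_lo < 0):
            lo, g_lo = mid, g_mid           # sign change lies in [mid, hi]
        else:
            hi = mid                        # sign change lies in [lo, mid]
    return lo + (hi - lo) / 2


if __name__ == "__main__":
    print(ivt_bisect(lambda x: x * x, 0.0, 2.0, 2.0))   # approximately sqrt(2)
    print(ivt_bisect(math.cos, 0.0, math.pi, 0.5))      # approximately pi/3
```

Consider the case $f(x) = x^2$, $k = 2$. Run in exact rational arithmetic, the loop would never hit an exact solution, because $\sqrt{2}$ is irrational. This echoes the failure of the IVT over $\mathbb{Q}$.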

Note that we most definitely require completeness for the Intermediate Value Theorem – if we work over $\mathbb{Q}$ the theorem vanishes in a puff of smoke. For example, consider $f : \mathbb{Q} \to \mathbb{Q}$ given by $f(x) = -1$ if $x^2 < 2$ and $f(x) = 1$ if $x^2 > 2$. Then $f$ is continuous with $f(0) = -1$ and $f(2) = 1$, but there is no $c$ with $f(c) = 0$. Notice also that the IVT can fail if $f$ is not continuous at even a single point. For example, if $f(x) = -1$ for $x \neq 0$ and $f(x) = 1$ for $x = 0$, then $f(0) = 1$ and $f(1) = -1$, but there is no point $c \in [0, 1]$ such that $f(c) = \frac{1}{2}$.

Lemma 2.18. Any odd degree polynomial has a real root.


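Proof. Consider $P(x) = \sum_{j=0}^{2n+1} a_j x^j$ with $a_{2n+1} \neq 0$. Without loss of generality $a_{2n+1} > 0$; otherwise consider $-P$, which has the same roots. Let $S = \sum_{j=0}^{2n} |a_j|$ and $M = 1 + \frac{S}{a_{2n+1}}$, so $M \ge 1$. For $|x| \ge 1$ we have $\left|\sum_{j=0}^{2n} a_j x^j\right| \le S|x|^{2n}$. Hence for $x \ge M$,
$$P(x) \ge a_{2n+1}x^{2n+1} - S x^{2n} = x^{2n}(a_{2n+1}x - S) \ge x^{2n} a_{2n+1} > 0,$$
and for $x \le -M$,
$$P(x) \le a_{2n+1}x^{2n+1} + S|x|^{2n} = |x|^{2n}(S - a_{2n+1}|x|) \le -|x|^{2n} a_{2n+1} < 0.$$
So $P(-M) < 0 < P(M)$, and by the IVT there exists a root somewhere in $(-M, M)$.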
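The bound $M$ in the proof of Lemma 2.18 is explicit, so it can be checked on examples. The Python sketch below computes $M = 1 + S/a_{2n+1}$ in exact rational arithmetic and confirms $P(-M) < 0 < P(M)$, which is exactly the sign change the IVT needs. The helper names are hypothetical. The sketch assumes polynomials are given as coefficient lists `[a_0, a_1, ..., a_{2n+1}]`.

```python
from fractions import Fraction
from typing import Sequence, Union

Number = Union[int, Fraction]


def evaluate(coeffs: Sequence[Number], x: Number) -> Fraction:
    """P(x) = a_0 + a_1 x + ... + a_d x^d, evaluated by Horner's rule."""
    result = Fraction(0)
    for c in reversed(coeffs):
        result = result * x + c
    return result


def root_bracket(coeffs: Sequence[Number]) -> Fraction:
    """Return M = 1 + (|a_0| + ... + |a_{2n}|) / a_{2n+1} from Lemma 2.18.

    Assumes odd degree (an even number of coefficients) and a positive
    leading coefficient, as in the proof.
    """
    cs = [Fraction(c) for c in coeffs]
    *lower, lead = cs
    if len(cs) % 2 != 0 or lead <= 0:
        raise ValueError("need odd degree and a positive leading coefficient")
    return 1 + sum(abs(c) for c in lower) / lead


if __name__ == "__main__":
    p = [-5, -2, 0, 1]                          # x^3 - 2x - 5
    M = root_bracket(p)
    print(M, evaluate(p, -M), evaluate(p, M))   # a sign change on [-M, M]
```

The bracket is crude: the real root of $x^3 - 2x - 5$ is near $2.09$, while $M = 8$. Any bracket with a sign change is enough for the IVT.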

If we consider $f : [a, b] \to [a, b]$ and apply the IVT to $f(x) - x$, we get a simple fixed point theorem⁵:

Theorem 2.19 (Fixed point theorem). If f : [a, b] → [a, b] is continuous, then there exists at least one
c ∈ [a, b] such that f (c) = c.

The Intermediate Value Theorem tells us that the range of a continuous function on an interval is
also an interval. A continuous function on a closed interval has special properties that lead to a stronger
conclusion.

2.3 Limits
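Definition 2.20 (Continuous limit). Let $f : (a, b)\setminus\{c\} \to \mathbb{R}$, where $c \in (a, b)$. We say that $f(x)$ tends to $\alpha$ as $x$ tends to $c$, denoted $f(x) \to \alpha$ as $x \to c$, if $\forall \varepsilon > 0\ \exists\, \delta > 0$ such that $0 < |x - c| < \delta \implies |f(x) - \alpha| < \varepsilon$.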

Proposition 2.21 (Sandwich Rule). Suppose that $f, g, h : (a, b)\setminus\{c\} \to \mathbb{R}$ satisfy $f(x) \le g(x) \le h(x)$ for all $x \in (a, b)\setminus\{c\}$. If $\lim_{x\to c} f(x) = \lim_{x\to c} h(x) = \alpha$, then $\lim_{x\to c} g(x) = \alpha$.

Proposition 2.22 (Uniqueness of Limits). If limx→a f (x) = l1 and limx→a f (x) = l2 , then l1 = l2 .

Some kinds of discontinuities may be identified and discussed by considering the two sides of the
point in question separately. On this basis we define one-sided limits:

Definition 2.23. Let $f : A \to \mathbb{R}$. If, for any sequence $(a_n)$ in $A$ with $a_n < a$ and $(a_n) \to a$, we have $f(a_n) \to l$, we write $\lim_{x\to a^-} f(x) = l$. Similarly, if, for any sequence $(a_n)$ in $A$ with $a_n > a$ and $(a_n) \to a$, we have $f(a_n) \to l$, we write $\lim_{x\to a^+} f(x) = l$.

For example, for any integer $a \in \mathbb{Z}$, we have $\lim_{x\to a^-} [x] = [a] - 1$ and $\lim_{x\to a^+} [x] = [a]$. Note that we do not require $a \in A$; we do not take account of the value of $f(a)$ – in fact we do not even require that $f$ be defined at $a$, only that we can get arbitrarily close to $a$ and still take values of $f(x)$.
⁵ In higher dimensions this is completely non-trivial, since we do not have any inequalities when dealing with vectors. The full version is known as Brouwer’s fixed point theorem and is covered in detail in MA3H5 Manifolds.



Just as we can define continuity in terms of sequences and neighbourhoods, we can do the same for limits. The equivalence of the two descriptions is proved in a similar way to Theorem 2.4.
Definition 2.25. Given a function $f : A \to \mathbb{R}$, $\lim_{x\to a^-} f(x) = l$ if and only if for each $\varepsilon > 0$, there is a $\delta > 0$ such that $x \in A$ and $a - \delta < x < a \implies |f(x) - l| < \varepsilon$.
Similarly, $\lim_{x\to a^+} f(x) = l$ if and only if for each $\varepsilon > 0$, there is a $\delta > 0$ such that $x \in A$ and $a < x < a + \delta \implies |f(x) - l| < \varepsilon$.
If, for a particular point $a$, we have $\lim_{x\to a^-} f(x) = l$ and $\lim_{x\to a^+} f(x) = l$, then it is customary to write $\lim_{x\to a} f(x) = l$. By chasing definitions it is easy to show that for continuous functions the limit of the function is the function of the limit:
Proposition 2.26. $f : A \to \mathbb{R}$ is continuous at $a \in A$ if and only if $\lim_{x\to a^-} f(x) = \lim_{x\to a^+} f(x) = f(a)$, i.e. if and only if $\lim_{x\to a} f(x) = f(a)$.

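Definition 2.27. We say that $f(x) \to \alpha$ as $x \to \infty$ if $\forall \varepsilon > 0\ \exists\, R$ such that $|f(x) - \alpha| < \varepsilon$ for all $x > R$, and that $f(x) \to \alpha$ as $x \to -\infty$ if $\forall \varepsilon > 0\ \exists\, R$ such that $|f(x) - \alpha| < \varepsilon$ for all $x < R$.
Definition 2.28. We say that $f(x) \to \infty$ as $x \to c$ if $\forall R\ \exists\, \delta > 0$ such that $f(x) > R$ for all $x$ with $0 < |x - c| < \delta$, and we say that $f(x) \to -\infty$ as $x \to c$ if $\forall R\ \exists\, \delta > 0$ such that $f(x) < R$ for all $x$ with $0 < |x - c| < \delta$.
Definition 2.29. We say that $f(x) \to \infty$ as $x \to \infty$ if $\forall R\ \exists\, \delta$ such that $f(x) > R$ for all $x > \delta$, and we say that $f(x) \to -\infty$ as $x \to \infty$ if $\forall R\ \exists\, \delta$ such that $f(x) < R$ for all $x > \delta$.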
Many of the results we established for continuous functions – such as sum and product rules – carry
over easily into limits, and are proved easily using the analogous results for sequences.

Theorem 2.30 (Algebra of Limits). Suppose $\lim_{x\to a} f(x) = l$ and $\lim_{x\to a} g(x) = m$. Then $\lim_{x\to a} (f(x) + g(x)) = l + m$, $\lim_{x\to a} f(x) \cdot g(x) = l \cdot m$, and if $m \neq 0$ then $\lim_{x\to a} \frac{f(x)}{g(x)} = \frac{l}{m}$.

Note, however, that while the composition of continuous functions is continuous, limits do not behave
as nicely under composition, and there is no analogous simple statement about limits of composed
functions.
By replacing “for all ε > 0” with “for all C > 0” and changing |f (x) − l| < ε to f (x) > C, we get
infinite limits. We can also replace “there exists δ > 0” with “there exists C > 0” and change |x − a| < δ
to x > C to get limits at infinity. Practise playing with the definitions, and also with evaluating all
manner of limits; such questions are popular on exams.

2.4 Continuous functions on closed bounded intervals


Theorem 2.31 (Extreme Value Theorem). If f : [a, b] → R is continuous, then the range of f , i.e. f ([a, b])
is bounded. Furthermore, f attains its bounds, i.e. there exist x1 , x2 ∈ [a, b] such that f (x1 ) =
inf x∈[a,b] f (x) and f (x2 ) = supx∈[a,b] f (x).
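Notice that if the domain is not a closed bounded interval, e.g. $(0, 1]$ or $(0, 1)$, then the EVT may fail. For example, $f : (0, 1) \to \mathbb{R}$, $f(x) = x$ is bounded, but there are no $x_1, x_2 \in (0, 1)$ with $f(x_1) = \inf_{x\in(0,1)} f(x) = 0$ or $f(x_2) = \sup_{x\in(0,1)} f(x) = 1$. Similarly, $f : (0, 1] \to \mathbb{R}$, $f(x) = \frac{1}{x}$ is not even bounded.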
We used the Intermediate Value Theorem to find individual solutions, i.e. when we have a function
on a closed interval [a, b] we can say that the function takes every value between f (a) and f (b). What
would be much more useful would be to be able to say that f is a bijection – that f has a well-defined
inverse function. However, continuity alone is not sufficient to ensure that f is a bijection – we need
each value in the range to be taken exactly once in order to invert f, i.e. we require f to be injective.
Once we assume this, we get:

Proposition 2.32. Suppose f : [a, b] → R is continuous, injective and f (a) < f (b). Then the range of
f is the interval [f (a), f (b)], and f : [a, b] → [f (a), f (b)] is a strictly increasing bijection, whose inverse
f −1 : [f (a), f (b)] → [a, b] is also continuous and strictly increasing.
Proposition 2.33. If f : [a, b] → R is continuous with f (a) < f (b) but is not strictly increasing then it
is not injective.
Proposition 2.34. If f : [a, b] → [f (a), f (b)] is non-decreasing and surjective then it is continuous.
Corollary 2.35. Let f : [a, b] → R be continuous and strictly monotonic. Then f −1 : f ([a, b]) → [a, b] is
continuous and strictly monotonic.
Example 2.36. The function $\sin : [-\frac{\pi}{2}, \frac{\pi}{2}] \to \mathbb{R}$ is continuous and injective; hence it is strictly increasing and bijective onto $[-1, 1]$, and its inverse $\arcsin : [-1, 1] \to [-\frac{\pi}{2}, \frac{\pi}{2}]$ is also continuous.

3 Differentiation
Having dealt with continuous functions, the next class of “useful” functions which we come to are the
differentiable functions. We first ask a simple question: how can we find the gradient, or rate of change, of
f ? If we consider a point (a, f (a)), then the line through this point with gradient m is y−f (a) = m(x−a).
Now, the equation of the chord joining $(a, f(a))$ and $(a + h, f(a + h))$ is $y - f(a) = \frac{f(a+h) - f(a)}{h} \cdot (x - a)$. If we let $m = \lim_{h\to 0} \frac{f(a+h) - f(a)}{h}$, then we would surely describe the line $y - f(a) = m(x - a)$ as being a tangent to the curve at $x = a$, and $m$ as being the gradient of this tangent. On the strength of these
ideas, we define the derivative at a point a of its domain:
Definition 3.1. For a function $f : A \to \mathbb{R}$ and a point $a \in A$, if
$$\lim_{h\to 0} \frac{f(a+h) - f(a)}{h} = \lim_{x\to a} \frac{f(x) - f(a)}{x - a} = m$$
for some real number $m$, then $m$ is called the derivative of $f$ at $a$, usually denoted by $f'(a)$, and $f$ is said to be differentiable at $a$. (The two limits are equivalent; the first exists if and only if the second exists, and they are equal if they both exist.)
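Definition 3.1 can be explored numerically by evaluating the difference quotient $\frac{f(a+h) - f(a)}{h}$ for shrinking $h$. The Python sketch below uses exact rationals so that rounding does not swamp small $h$. The example $f(x) = x^3$ at $a = 2$ and the helper names are illustrative choices, not from the notes. Here the quotient equals $12 + 6h + h^2$ exactly, which tends to $12 = f'(2)$ as $h \to 0$ from either side.

```python
from fractions import Fraction
from typing import Callable


def difference_quotient(f: Callable[[Fraction], Fraction], a: Fraction, h: Fraction) -> Fraction:
    """Slope of the chord joining (a, f(a)) and (a + h, f(a + h)); needs h != 0."""
    return (f(a + h) - f(a)) / h


def cube(x: Fraction) -> Fraction:
    """The example function f(x) = x^3."""
    return x ** 3


if __name__ == "__main__":
    a = Fraction(2)
    for k in range(1, 6):
        h = Fraction(1, 10 ** k)
        # chords to the right (h > 0) and to the left (-h < 0)
        print(float(difference_quotient(cube, a, h)),
              float(difference_quotient(cube, a, -h)))
```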
Although derivatives arise naturally from considering the geometric notion of tangent, there is still
one circumstance when a tangent to the graph of a function may exist without a derivative of the function
at the point in question; this occurs when the slope of the chord tends to ∞. The derivative does not
exist (since ∞ is not a real number), but the tangent still exists and is vertical on the graph.

3.1 Basic Properties of Derivatives


We can easily establish that a constant function must have zero derivative, simply by working with the
definition:
Proposition 3.2. If $f : \mathbb{R} \to \mathbb{R}$, $f(x) = c$ for some $c \in \mathbb{R}$, then $f'(a) = 0$ for all $a \in \mathbb{R}$.
The attempt to establish a converse to Proposition 3.2 exposes some unexpected subtleties; we will
come back to this after the Mean Value Theorem. Similarly, we can easily establish the derivative of a
straight line:
Proposition 3.3. If $f : \mathbb{R} \to \mathbb{R}$, $f(x) = mx + c$ for some $m, c \in \mathbb{R}$, then $f'(a) = m$ for all $a \in \mathbb{R}$.
The following rules will be familiar from A-level:
Theorem 3.4. Suppose $f, g : A \to \mathbb{R}$ are both differentiable at $a$. Then:
• $f + g : A \to \mathbb{R}$, $(f + g)(x) = f(x) + g(x)$ is differentiable at $a$, and $(f + g)'(a) = f'(a) + g'(a)$;
• $fg : A \to \mathbb{R}$, $(fg)(x) = f(x) \cdot g(x)$ is differentiable at $a$, and $(fg)'(a) = f'(a)g(a) + f(a)g'(a)$;
• provided that $g(x) \neq 0$ for all $x \in A$, $\frac{f}{g} : A \to \mathbb{R}$, $\frac{f}{g}(x) = \frac{f(x)}{g(x)}$ is differentiable at $a$, and $\left(\frac{f}{g}\right)'(a) = \frac{f'(a)g(a) - f(a)g'(a)}{(g(a))^2}$.

Theorem 3.5 (Chain Rule⁶). Let $A \subset \mathbb{R}$ and $B \subset \mathbb{R}$. If $f : A \to \mathbb{R}$ is differentiable at $a$, and $g : B \to \mathbb{R}$ is differentiable at $f(a)$, where $f(A) \subset B$, then the composite function $g \circ f : A \to \mathbb{R}$ defined by $(g \circ f)(x) = g(f(x))$ is differentiable at $a$ and $(g \circ f)'(a) = g'(f(a)) \cdot f'(a)$.
Theorem 3.6 (Carathéodory formulation of differentiability). Let $f : (a, b) \to \mathbb{R}$ and $x_0 \in (a, b)$. Then $f$ is differentiable at $x_0$ with derivative $f'(x_0)$ if and only if there exists a function $\phi : (a, b) \to \mathbb{R}$ that is continuous at $x_0$, with $\phi(x_0) = f'(x_0)$ and $f(x) = f(x_0) + \phi(x)(x - x_0)$ for all $x \in (a, b)$.
Proof. Suppose $f$ is differentiable at $x_0$. Define $\phi(x) = \frac{f(x) - f(x_0)}{x - x_0}$ for $x \neq x_0$ and $\phi(x_0) = f'(x_0)$. Then $\phi$ is continuous at $x_0$ by the definition of differentiability, and $f(x) = f(x_0) + \phi(x)(x - x_0)$ for all $x$ (trivially so at $x = x_0$). Conversely, suppose there exists $\phi$ continuous at $x_0$ with $f(x) = f(x_0) + \phi(x)(x - x_0)$. Then $\phi(x) = \frac{f(x) - f(x_0)}{x - x_0}$ for $x \neq x_0$. Since $\phi$ is continuous at $x_0$, this quotient tends to $\phi(x_0)$ as $x \to x_0$, so $f$ is differentiable at $x_0$ with $f'(x_0) = \phi(x_0)$.
From these it is easy to show that:
Proposition 3.7. If $f : \mathbb{R} \to \mathbb{R}$, where $f(x) = x^n$ for some natural number $n$, then $f'(a) = n \cdot a^{n-1}$ for any $a \in \mathbb{R}$.
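Theorem 3.6 gives a short proof of Proposition 3.7. Since $x^n - a^n = (x - a)(x^{n-1} + x^{n-2}a + \cdots + a^{n-1})$, the function $\phi(x) = x^{n-1} + x^{n-2}a + \cdots + a^{n-1}$ is a polynomial, hence continuous, and $\phi(a) = n a^{n-1}$. The Python sketch below checks the factorisation $f(x) = f(a) + \phi(x)(x - a)$ in exact integer and rational arithmetic for a few sample values. The name `phi` is hypothetical, and the check illustrates the identity rather than proving it.

```python
from fractions import Fraction
from typing import Union

Number = Union[int, Fraction]


def phi(x: Number, a: Number, n: int) -> Number:
    """Caratheodory slope function for f(x) = x**n at the point a:
    phi(x) = x^(n-1) + x^(n-2) a + ... + a^(n-1)."""
    return sum(x ** k * a ** (n - 1 - k) for k in range(n))


def identity_holds(x: Number, a: Number, n: int) -> bool:
    """Check f(x) = f(a) + phi(x) (x - a) exactly for f(x) = x**n."""
    return x ** n == a ** n + phi(x, a, n) * (x - a)


if __name__ == "__main__":
    n, a = 5, 3
    print(phi(a, a, n), n * a ** (n - 1))   # both 405
```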
Theorem 3.4 can be proved using the algebra of limits with some fancy trickery; however, it is
much more difficult (though possible) to prove the chain rule in such a manner, and the proof does
not generalise to higher dimensions. The problem stems from the lack of a rule for limits of composite
functions. However, we know how composite functions behave when they are continuous; so we seek to
reduce the problem of differentiability to one of continuity.
One other problem we encounter when trying to generalise differentiation to higher dimensions is that
division cannot always be defined. To solve both these problems, we reject the definition of the derivative
as the slope of the tangent, and simply use the tangent itself. The most general straight line through
the point (a, f (a)) is of the form y = f (a) + k(x − a), and this is tangent to the graph of f precisely
when $k = f'(a)$. Now, if we let $k$ vary in such a way that $y = f(x)$, then we require that $k(x) \to f'(a)$ as $x \to a$; in other words we need the function $k$ to be continuous at $a$. Stated formally, this is exactly the Carathéodory formulation of differentiability, Theorem 3.6.
Using Theorem 3.6 we can prove the product, quotient and chain rules quite easily⁷. Furthermore, since each of the functions on the right-hand side of the expression $f(x) = f(a) + (x - a)\phi(x)$ is continuous at $a$, it is immediately clear that:
Lemma 3.8. If f is differentiable at a, then f is continuous at a.
The converse is not true. Some examples follow:
• Consider the absolute value function f : R → R given by f (x) = |x|. Since for x ≥ 0 we have
f (x) = x and for x < 0 we have f (x) = −x, f is clearly continuous and differentiable at any point
except 0; however, f is continuous but not differentiable at x = 0.
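• We saw earlier that the function $f(x) = \sin\frac{1}{x}$ for $x > 0$ could not be extended to a continuous function on $x \ge 0$ no matter what we defined $f(0)$ as. It follows that $f$ cannot be differentiable at $x = 0$, by the contrapositive of Lemma 3.8. However, $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x \sin\frac{1}{x}$ for $x \neq 0$ with $f(0) = 0$ is continuous at $x = 0$. It fails to be differentiable there, since $\frac{f(h) - f(0)}{h} = \sin\frac{1}{h}$ has no limit as $h \to 0$. On the other hand, $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x^2 \sin\frac{1}{x}$ for $x \neq 0$ with $f(0) = 0$ is continuous and differentiable at $x = 0$, with $f'(0) = 0$, since $\left|\frac{f(h) - f(0)}{h}\right| = \left|h \sin\frac{1}{h}\right| \le |h|$.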
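The contrast between $x \sin\frac{1}{x}$ and $x^2 \sin\frac{1}{x}$ shows up clearly in their difference quotients at $0$. For the first function the quotient is $\sin\frac{1}{h}$. Along $h = \frac{1}{2n\pi}$ it is (numerically) $0$, and along $h = \frac{1}{(2n + \frac{1}{2})\pi}$ it is $1$, so it has no limit. For the second function the quotient is $h \sin\frac{1}{h}$, which is bounded by $|h|$. The Python sketch below computes both quotients; the helper names `g1`, `g2` and `quotient_at_zero` are hypothetical. Floating-point evaluation only approximates these values, so any comparisons made with them need small tolerances.

```python
import math


def g1(x: float) -> float:
    """x sin(1/x), extended by g1(0) = 0: continuous but not differentiable at 0."""
    return x * math.sin(1.0 / x) if x != 0 else 0.0


def g2(x: float) -> float:
    """x^2 sin(1/x), extended by g2(0) = 0: differentiable at 0 with derivative 0."""
    return x * x * math.sin(1.0 / x) if x != 0 else 0.0


def quotient_at_zero(g, h: float) -> float:
    """Difference quotient (g(h) - g(0)) / h for h != 0."""
    return (g(h) - g(0.0)) / h


if __name__ == "__main__":
    for n in (1, 10, 100):
        h_zero = 1.0 / (2 * n * math.pi)          # sin(1/h) = 0 here
        h_one = 1.0 / ((2 * n + 0.5) * math.pi)   # sin(1/h) = 1 here
        print(n,
              round(quotient_at_zero(g1, h_zero), 9),
              round(quotient_at_zero(g1, h_one), 9),
              quotient_at_zero(g2, h_one))
```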
Definitions 3.9. A function $f : (a, b) \to \mathbb{R}$ is differentiable if it is differentiable at every $c \in (a, b)$. A function $f : [a, b] \to \mathbb{R}$ is differentiable if it is differentiable at every $c \in (a, b)$ and, in addition, the limits $\lim_{x\to a^+} \frac{f(x) - f(a)}{x - a}$ and $\lim_{x\to b^-} \frac{f(x) - f(b)}{x - b}$ exist as real numbers. We call these the right derivative of $f$ at $a$ and the left derivative of $f$ at $b$, respectively.
Definitions 3.10. If a function f : A → R is differentiable, we define the derived function f' : A → R by x ↦ f'(x). If the derived function is differentiable at a point a ∈ A, we say that f is twice differentiable at a, and denote (f')'(a) by f''(a). Similarly, we say f is n times differentiable at a if $f^{(n)}(a)$ exists, where $f^{(n)}$ is defined inductively by $f^{(0)}(a) = f(a)$ and $f^{(n)}(a) = (f^{(n-1)})'(a)$.
If f : A → R is n times differentiable, and $f^{(n)} : A \to R$ is continuous, then we say that f is $C^n(A)$. If f is n times differentiable for all n, then we say f is $C^\infty(A)$.
6 In Leibniz notation, the chain rule says that $\frac{dg}{dx}(x) = \frac{dg}{df}(f(x)) \cdot \frac{df}{dx}(x)$.
7 These proofs are fairly simple, involving mainly algebraic manipulation, and are examinable.

3.2 Derivatives and Completeness


We now consider maxima and minima of functions.

Definition 3.11. For f : A → R, a ∈ A is a local maximum of f if ∃ δ > 0 s.t. |x − a| < δ ⟹ f(x) ≤ f(a). Similarly, a ∈ A is a local minimum of f if ∃ δ > 0 s.t. |x − a| < δ ⟹ f(x) ≥ f(a).

You may be used to finding local maxima or minima by setting f'(x) = 0 and solving for x; however, f'(x) = 0 is a necessary condition for x to be a local maximum or minimum, but not a sufficient one (for example, f(x) = x³ has f'(0) = 0, but 0 is neither a local maximum nor a local minimum):

Lemma 3.12. If f : A → R has a local maximum or local minimum at some a ∈ A, and f is differentiable at a, then f'(a) = 0.

To better understand the situation, we turn to more general questions. The following powerful theorem seems obvious at first glance; however, it relies on the completeness of R and is hence non-trivial.

Theorem 3.13 (Rolle’s Theorem). If f : [a, b] → R is continuous on [a, b] and differentiable on (a, b), and additionally f(a) = f(b), then there is a point c ∈ (a, b) such that f'(c) = 0.

Proof. As f is continuous on the closed bounded interval [a, b], it is bounded and attains its bounds, i.e. there exist x₁, x₂ ∈ [a, b] such that f(x₁) ≤ f(x) ≤ f(x₂) for all x ∈ [a, b]. Set A = f(a) = f(b). If f(x₁) < A, then x₁ ∈ (a, b) is a local minimum, and hence f'(x₁) = 0. If f(x₂) > A, then x₂ ∈ (a, b) is a local maximum, and hence f'(x₂) = 0. Otherwise f(x₁) = f(x₂) = A and hence f is constant, so f'(x) = 0 for all x ∈ (a, b).

Notice that the interval must be closed in order to use the Extreme Value Theorem. Geometrically, Rolle’s Theorem states that there is a tangent to the curve y = f(x), at some point between x = a and x = b, which is parallel to the chord joining the points (a, f(a)) and (b, f(b)). In this theorem, the chord is horizontal; we now generalise this to allow the chord to have any gradient.

Theorem 3.14 (The Mean Value Theorem). If f : [a, b] → R is continuous on [a, b] and differentiable on (a, b), then there is a point c ∈ (a, b) such that $f'(c) = \frac{f(b)-f(a)}{b-a}$.
Proof. Consider the line joining the points (a, f(a)) and (b, f(b)), whose gradient is $m = \frac{f(b)-f(a)}{b-a}$. This is given by k : [a, b] → R where k(x) = f(a) + m(x − a). Applying Rolle’s theorem to g(x) = f(x) − k(x), which has g(a) = g(b) = 0, yields some c ∈ (a, b) with g'(c) = 0. But k'(x) = m for all x ∈ (a, b), so g'(c) = f'(c) − m = 0, i.e. $f'(c) = m = \frac{f(b)-f(a)}{b-a}$ as required.
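The Mean Value Theorem only asserts that c exists; in simple cases we can locate it. Here is a minimal Python sketch, assuming (beyond the theorem's hypotheses) that f' is continuous and that f'(x) − m changes sign on [a, b], so that bisection applies. The function name mvt_point and the test function f(x) = x³ on [0, 2] are illustrative choices. The chord gradient is m = (8 − 0)/2 = 4, and solving 3c² = 4 gives c = 2/√3.

```python
def mvt_point(f_prime, a, b, slope, tol=1e-12):
    """Bisection for c in (a, b) with f'(c) = slope.

    Assumes f' - slope is continuous and changes sign on [a, b];
    this is an extra assumption, not part of the Mean Value Theorem.
    """
    lo, hi = a, b
    g_lo = f_prime(lo) - slope
    while hi - lo > tol:
        mid = (lo + hi) / 2
        g_mid = f_prime(mid) - slope
        if (g_lo < 0) == (g_mid < 0):
            lo, g_lo = mid, g_mid   # same sign as left end: root lies in [mid, hi]
        else:
            hi = mid                # sign change in [lo, mid]
    return (lo + hi) / 2

f = lambda x: x ** 3
f_prime = lambda x: 3 * x ** 2
a, b = 0.0, 2.0
slope = (f(b) - f(a)) / (b - a)     # chord gradient m = 4
c = mvt_point(f_prime, a, b, slope)
print(c)                            # close to 2/sqrt(3), about 1.1547
```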

Corollary 3.15. If f : [a, b] → R is continuous on [a, b] and differentiable on (a, b), then
• if f'(x) = 0 for all x ∈ (a, b), then f is constant on [a, b];
• if f'(x) > 0 for all x ∈ (a, b), then f is strictly increasing on [a, b];
• if f'(x) < 0 for all x ∈ (a, b), then f is strictly decreasing on [a, b].

The Mean Value Theorem does not carry over directly to higher dimensions (it can fail, for example, for vector-valued functions), but the Mean Value Inequality often does hold:

Proposition 3.16 (Mean Value Inequality). Let f : [a, b] → R be continuous on [a, b] and differentiable on (a, b). If there exists a real number K such that |f'(x)| ≤ K for all x ∈ (a, b), then for all x, y ∈ [a, b] we have |f(x) − f(y)| ≤ K|x − y|.

This says that if the derivative is bounded, f is Lipschitz. (A function f : A → R is Lipschitz if there exists K > 0 such that |f(x) − f(y)| ≤ K|x − y| for all x, y ∈ A; a Lipschitz function is necessarily continuous.) In particular, this occurs when f' : [a, b] → R is continuous, since it is then automatically bounded. The Mean Value Inequality converts local information into global information: |f'(x)| ≤ K tells us that the local rate of change is no greater than K in absolute value, and then |f(x) − f(y)| ≤ K|x − y| says that the average rate of change between any two points in the domain is no greater than K in absolute value.
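As a quick sanity check of the Mean Value Inequality, here is a minimal Python sketch using f = sin, for which |f'(x)| = |cos x| ≤ 1, so K = 1. The sampling range [−10, 10], the seed and the number of pairs are arbitrary illustrative choices; random sampling can only illustrate the inequality, not prove it.

```python
import math
import random

# f = sin has |f'(x)| = |cos x| <= 1, so the Mean Value Inequality predicts
# |sin x - sin y| <= K |x - y| for all x, y, with K = 1.
K = 1.0
random.seed(0)
pairs = [(random.uniform(-10, 10), random.uniform(-10, 10)) for _ in range(10_000)]

# largest observed "average rate of change" |sin x - sin y| / |x - y|
worst = max(abs(math.sin(x) - math.sin(y)) / abs(x - y) for x, y in pairs if x != y)
print(worst)  # at most K, up to floating-point rounding
```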
When f is invertible, we can relate the derivatives of f and f⁻¹, as in Proposition 3.20 below.⁸
8 In Leibniz notation, the inverse rule says that $\frac{dx}{dy}(y) = 1 \big/ \frac{dy}{dx}(x)$, where y(x) = f(x).

We turn now to Taylor’s theorem. When we write the Mean Value Theorem in the form f(b) = f(a) + (b − a)f'(c), for some c ∈ (a, b), we can use the term (b − a)f'(c) to estimate how far the value of f at b is from its value at a. This is a linear approximation to f around the point a. What if we try to approximate f by a polynomial instead? Let $p_n(x) = f(a) + k_1(x-a) + \cdots + k_n(x-a)^n$, and suppose $f(x) = p_n(x) + r_n(x)$, where $r_n(x)$ is a remainder term for the nth-order approximation. If we choose the $k_j$ such that f and $p_n$ have the same derivatives of orders 1 to n at x = a, then we get $p_n^{(j)}(a) = j!\,k_j = f^{(j)}(a)$ for 1 ≤ j ≤ n, and hence
$$p_n(x) = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(x-a)^n.$$
Does this approximate f well? In order to know that, we need to know what form the remainder term $r_n(x)$ takes. Taylor’s theorem states that the remainder term for the approximation by an nth-order polynomial depends on $f^{(n+1)}(c)$ for some c ∈ (a, b).

Theorem 3.17 (Cauchy’s Mean Value Theorem). Suppose f, g : [a, b] → R are continuous on [a, b] and differentiable on (a, b), and g'(x) ≠ 0 for all x ∈ (a, b). Then there exists c ∈ (a, b) for which
$$\frac{f(b)-f(a)}{g(b)-g(a)} = \frac{f'(c)}{g'(c)}.$$

Theorem 3.18 (Taylor’s Theorem). If f : [a, b] → R has a continuous nth derivative on [a, b] and is n + 1 times differentiable on (a, b), then there is a point c ∈ (a, b) such that
$$f(b) = f(a) + (b-a)f'(a) + \frac{(b-a)^2}{2!}f''(a) + \cdots + \frac{(b-a)^n}{n!}f^{(n)}(a) + \frac{(b-a)^{n+1}}{(n+1)!}f^{(n+1)}(c).$$
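To see the Lagrange remainder at work, here is a minimal Python sketch for f = exp about a = 0 (anticipating Section 5, and using the facts that every derivative of exp is exp and that exp is increasing). With b = x > 0, the remainder is $\frac{x^{n+1}}{(n+1)!}e^{c}$ for some c ∈ (0, x), so it is at most $\frac{x^{n+1}}{(n+1)!}e^{x}$. The function names and the choice x = 1.5 are illustrative.

```python
import math

def taylor_exp(x, n):
    """n-th order Taylor polynomial of exp about a = 0, evaluated at x."""
    return sum(x ** k / math.factorial(k) for k in range(n + 1))

def lagrange_bound(x, n):
    """Upper bound on |exp(x) - p_n(x)| for x > 0 from the Lagrange remainder.

    The remainder is x^(n+1)/(n+1)! * exp(c) for some c in (0, x),
    and exp(c) < exp(x) because exp is increasing.
    """
    return x ** (n + 1) / math.factorial(n + 1) * math.exp(x)

x = 1.5
for n in range(1, 11):
    err = abs(math.exp(x) - taylor_exp(x, n))
    print(n, err, lagrange_bound(x, n))
```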

By putting a = 0, b = x and c = θx into Taylor’s theorem, for some θ ∈ (0, 1), we get the special case known as Maclaurin’s Theorem. The final term, known as the Lagrange form of the remainder, is most like the term in the Mean Value Theorem: it depends on a derivative of f at some point c ∈ (a, b). As we do not, in general, know what the point c is, this does not always give useful information, so we consider an alternative formulation of Taylor’s Theorem, with Cauchy’s form of the remainder:

Theorem 3.19 (Taylor’s Theorem with Cauchy’s form of the remainder). If f : [a, b] → R has a continuous nth derivative on [a, b] and is n + 1 times differentiable on (a, b), then there is a point c ∈ (a, b) such that
$$f(b) = f(a) + (b-a)f'(a) + \frac{(b-a)^2}{2!}f''(a) + \cdots + \frac{(b-a)^n}{n!}f^{(n)}(a) + \frac{(b-c)^n(b-a)}{n!}f^{(n+1)}(c).$$

Proposition 3.20. Let A ⊂ R and B ⊂ R. For a bijection f : A → B, with inverse f⁻¹ : B → A, if f is differentiable at a and f⁻¹ is differentiable at f(a), then $(f^{-1})'(f(a)) = \frac{1}{f'(a)}$.

Note that this means that if f and f⁻¹ are inverse functions and f'(a) = 0, then f⁻¹ cannot be differentiable at f(a). However, if f is a continuous bijection whose domain is an interval, then the differentiability of f at a, along with the requirement that f'(a) ≠ 0, is sufficient to show that f⁻¹ is differentiable at the point f(a).

Proposition 3.21. Suppose f : (a, b) → (h, k) is a continuous bijection with inverse f⁻¹ : (h, k) → (a, b). If f is differentiable at x ∈ (a, b) with f'(x) ≠ 0, then f⁻¹ is differentiable at f(x).

Using this, we get the Inverse Function Theorem⁹:

Theorem 3.22 (The Inverse Function Theorem). If f : (a, b) → R is continuous and differentiable on (a, b) with f'(x) > 0 for all x ∈ (a, b), then, writing h and k for the infimum and supremum of f over (a, b) (either of which may be infinite), f : (a, b) → (h, k) is a bijection with inverse f⁻¹ : (h, k) → (a, b) which is continuous and differentiable on (h, k), with
$$(f^{-1})'(y) = \frac{1}{f'(f^{-1}(y))} \quad\text{for all } y \in (h, k).$$
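The formula $(f^{-1})'(y) = 1/f'(f^{-1}(y))$ can be checked numerically. This minimal Python sketch anticipates Section 5, taking f = exp (so f' = exp) and f⁻¹ = log, and compares a central difference quotient of log with 1/exp(log y). The helper central_difference and the step h = 1e-6 are illustrative choices.

```python
import math

def central_difference(f, y, h=1e-6):
    # symmetric difference quotient approximating f'(y)
    return (f(y + h) - f(y - h)) / (2 * h)

ys = [0.5, 1.0, 2.0, 10.0]
for y in ys:
    numeric = central_difference(math.log, y)
    formula = 1 / math.exp(math.log(y))   # 1 / f'(f^{-1}(y)) with f = exp, f' = exp
    print(y, numeric, formula)
```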

9 This is not the most general form of the Inverse Function Theorem; the lectures included a slightly stronger statement. However, this is good enough for all intents and purposes.



4 Power Series
So far, the only interesting functions we have considered have been polynomials, built from the monomials f(x) = xⁿ, and the functions that can be made by combining finitely many of them through addition, multiplication, or even division. To obtain functions other than rational functions, we must remove the restriction to a finite number of operations; in other words, we must admit limiting processes.
In the previous section, we considered polynomial approximations to differentiable functions using Taylor’s Theorem. In each case we had a finite polynomial of the form $\sum_{k=0}^{n} a_k x^k$, and the difference between the function and the polynomial was given by a remainder term depending on the values of $f^{(n+1)}$. We now ask the question: what if these remainder terms tend to 0 as n → ∞? Do we get an “infinite polynomial” representation of the function?
We must first consider what we mean by an “infinite polynomial”. One likely candidate is a series of the form $\sum_{n=0}^{\infty} a_n x^n$, where the $a_n$ are fixed coefficients; this is known as a power series. Power series have many beautiful and interesting properties which make them very useful in analysis.

4.1 Differentiability and Taylor Series


We begin with a few examples:

Examples 4.1.
• Lemma 1.9 from Analysis I tells us that the power series $\sum x^n$ converges for |x| < 1 and diverges for |x| ≥ 1.
• Consider the power series $\sum n x^n$. By the ratio test, this converges if |x| < 1 and diverges if |x| > 1.
• Consider the power series $\sum \frac{x^n}{n!}$. By the ratio test, this converges for any value of x.
• Consider the power series $\sum n!\,x^n$. By the ratio test, this diverges for every x ≠ 0, but converges for x = 0.
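In each of these examples the ratio test reduces to the limit k of $|a_{n+1}/a_n|$: the ratio of consecutive terms is $|a_{n+1}/a_n|\,|x|$, so the series converges for |x| < 1/k. This minimal Python sketch just tabulates those coefficient ratios for the four series above (the dictionary labels are illustrative). The ratios tend to 1, 1, 0 and ∞ respectively, matching radii of convergence 1, 1, ∞ and 0.

```python
# |a_{n+1} x^{n+1}| / |a_n x^n| = |a_{n+1} / a_n| * |x|; the ratio test looks at
# k = lim |a_{n+1} / a_n| and gives convergence for |x| < 1/k.
coefficient_ratios = {
    "x^n":    lambda n: 1.0,               # a_n = 1
    "n x^n":  lambda n: (n + 1) / n,       # a_n = n
    "x^n/n!": lambda n: 1 / (n + 1),       # a_n = 1/n!, so n!/(n+1)! = 1/(n+1)
    "n! x^n": lambda n: float(n + 1),      # a_n = n!, so (n+1)!/n! = n+1
}

for name, ratio in coefficient_ratios.items():
    print(name, [round(ratio(n), 4) for n in (10, 100, 1000)])
```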

This motivates our definition of radius of convergence:

Theorem 4.2. For any power series $\sum a_n x^n$, either:
1. the series converges absolutely for all x ∈ R; or
2. there is a real number R > 0 such that the series is absolutely convergent for |x| < R and divergent for |x| > R; or
3. the series converges only if x = 0.
We call R the radius of convergence of the power series; in the first case we write R = ∞, and in the third case we set R = 0.

We can use the ratio test (although Cauchy’s nth root test, Theorem 4.13, is often more effective) to find the radius of convergence as in the above examples; again, this is popular in exam questions. However, note that in the second case, the theorem tells us nothing about the convergence of $\sum a_n x^n$ on the boundary, where |x| = R.
What is even more beautiful about power series is that they are guaranteed to be differentiable within
their radius of convergence, and that the derivative of a function defined by a power series is simply the
power series obtained by differentiating each term:
Lemma 4.3. The power series $\sum a_n x^n$, $\sum n a_n x^{n-1}$ and $\sum \frac{a_n x^{n+1}}{n+1}$ have the same radius of convergence.

Theorem 4.4 (Term by Term Differentiation). Suppose $\sum a_n x^n$ has radius of convergence R with R > 0 (or R = ∞), and define f : (−R, R) → R by $f(x) = \sum_{n=0}^{\infty} a_n x^n$. Then f is differentiable and $f'(x) = \sum_{n=1}^{\infty} n a_n x^{n-1}$ for x ∈ (−R, R).
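Term-by-term differentiation can be tested on the geometric series $f(x) = \sum x^n = \frac{1}{1-x}$ for |x| < 1, whose derivative is $\frac{1}{(1-x)^2}$. This minimal Python sketch truncates the differentiated series $\sum n x^{n-1}$ at a fixed number of terms (2000, an arbitrary choice that is ample for |x| ≤ 0.9) and compares it with the closed form.

```python
# Geometric series: f(x) = sum x^n = 1/(1-x) for |x| < 1 (radius of convergence R = 1).
# Theorem 4.4 says f'(x) = sum n x^(n-1), which should equal 1/(1-x)^2.
def termwise_derivative(x, terms=2000):
    # partial sum of sum_{n>=1} n x^(n-1), truncated after terms - 1 terms
    return sum(n * x ** (n - 1) for n in range(1, terms))

xs = (-0.9, -0.5, 0.0, 0.5, 0.9)
for x in xs:
    print(x, termwise_derivative(x), 1 / (1 - x) ** 2)
```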

This is a powerful theorem: given a power series that is convergent on (−R, R), we know automatically that it is differentiable, and its derivative is given by differentiating the power series termwise, as we would a polynomial. But if we differentiate a power series, we get another power series with the same radius of convergence (by Lemma 4.3), which we can differentiate again and again; so power series can be differentiated arbitrarily many times! Furthermore, as differentiability implies continuity, not only is the power series “infinitely” differentiable, each derivative (not to mention the power series itself) is continuous. Power series really are very nice functions.

By applying Theorem 4.4 k times to $f(x) = \sum a_n x^n$ and setting x = 0, we see that $a_k = \frac{1}{k!} f^{(k)}(0)$. Hence if f is equal to some power series, then we have
$$f(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(0)}{k!} x^k.$$

This looks very much like Taylor’s theorem, but with infinitely many terms; we call it the Taylor series of f about 0. If we can differentiate f arbitrarily many times, i.e. if f is $C^\infty$, can we then say that f is equal to its Taylor series? Many great mathematicians, such as Lagrange, thought so. But the answer is NO!
Firstly, the Taylor series may not converge; and even if it does, it may not converge to the function. The starkest example is due to Cauchy: consider f : R → R defined by
$$f(x) := \begin{cases} e^{-1/x^2} & \text{for } x \neq 0, \\ 0 & \text{for } x = 0. \end{cases}$$
Then f possesses derivatives of all orders at every point, but $f^{(n)}(0) = 0$ for all n, so its Taylor series about 0 is zero everywhere! Where does this example break down? Simple: we have neglected the statement of Taylor’s theorem, which says that f is equal to a polynomial plus a remainder term. In this example every Taylor polynomial about x = 0 is identically zero, so the remainder term $r_n(x)$ equals $f(x) = e^{-1/x^2}$ for every n, and does not converge to 0 for any x ≠ 0.
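A minimal Python sketch of Cauchy’s example (the name cauchy_f is illustrative). Since every Taylor polynomial about 0 is identically zero, the remainder $r_n(x)$ is just f(x), which is independent of n and strictly positive for x ≠ 0. The last line illustrates how flat f is at 0: even f(x)/x¹⁰ is tiny for small x, which is consistent with all derivatives vanishing there (numerics alone do not prove this).

```python
import math

def cauchy_f(x):
    # Cauchy's example: exp(-1/x^2) for x != 0, and 0 at x = 0
    return math.exp(-1 / x ** 2) if x != 0 else 0.0

# Every Taylor polynomial of f about 0 is identically 0, so the remainder
# r_n(x) = f(x) - 0 = f(x) does not depend on n and is never 0 for x != 0.
for x in (0.25, 0.5, 1.0, 2.0):
    print(x, cauchy_f(x))

# f is extremely flat at 0: f(x) / x^10 is tiny for small x
print([cauchy_f(x) / x ** 10 for x in (0.1, 0.05)])
```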

Definition 4.5. The function f : A → R is called analytic if, for all a ∈ A, there are a radius δ > 0 and coefficients $a_n$ such that $f(x) = \sum_{n=0}^{\infty} a_n (x-a)^n$ for all x with |x − a| < δ.
In other words, f is analytic if, near each point of its domain, f equals a power series centred at that point (necessarily its Taylor series about that point). Any analytic function is automatically $C^\infty$, but we have seen that not all $C^\infty$ functions are analytic.

4.2 Upper and Lower Limits


Theorem 4.2 is beautiful: a power series must converge for every x ∈ (−R, R) for some R ∈ [0, ∞].
Unfortunately, however, it does not tell us how to find the radius of convergence R. As we did in the
examples, we can use the ratio test:

Proposition 4.6. For a power series $\sum c_n x^n$, suppose that $\left|\frac{c_{n+1}}{c_n}\right| \to k$ for some 0 < k < ∞. Then $\sum c_n x^n$ is convergent if $|x| < \frac{1}{k}$ and divergent if $|x| > \frac{1}{k}$.

However, this limit may not exist. Hence, we search for an explicit expression for the radius of
convergence which always exists. Consider the sequence (0.9, 3.1, 0.99, 3.01, 0.999, 3.001, . . . ). This does
not have a limit, but we can find a subsequence, (0.9, 0.99, 0.999, . . . ) that approaches 1, and another
subsequence (3.1, 3.01, 3.001, . . . ) which approaches 3. No subsequence approaches anything larger than
3. We say that 3 is the upper limit, or lim sup, of this sequence, and that the lower limit, or lim inf, is
1. It should be noted that 3 is not the least upper bound of the elements in this sequence. There are
infinitely many terms that are strictly greater than 3. But if we take any positive ε, then eventually all
of the terms will fall below 3 + ε and stay below. We thus define $\limsup a_n$ as follows:
Definition 4.7. The upper limit (or “limit superior”) of a sequence $(a_n)$, denoted $\limsup a_n$, is defined as $\limsup_{n\to\infty} a_n := \lim_{k\to\infty} \sup_{n\ge k} a_n$. Similarly, the lower limit (or “limit inferior”) of a sequence $(a_n)$, denoted $\liminf a_n$, is defined as $\liminf_{n\to\infty} a_n := \lim_{k\to\infty} \inf_{n\ge k} a_n$.
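Definition 4.7 can be made concrete on the sequence (0.9, 3.1, 0.99, 3.01, 0.999, 3.001, …) from above. This minimal Python sketch truncates it to 30 terms (an assumption needed to compute anything, so it only illustrates the limits) and computes the tail suprema $\sup_{n\ge k} a_n$ and tail infima $\inf_{n\ge k} a_n$. The tail suprema decrease towards 3 and the tail infima increase towards 1.

```python
# The sequence (0.9, 3.1, 0.99, 3.01, 0.999, 3.001, ...), truncated to 30 terms:
# a_{2m-1} = 1 - 10^(-m) and a_{2m} = 3 + 10^(-m) for m = 1, ..., 15.
seq = []
for m in range(1, 16):
    seq += [1 - 10.0 ** -m, 3 + 10.0 ** -m]

def tail_sup(k):
    # sup_{n >= k} a_n over the truncated sequence (0-based index k)
    return max(seq[k:])

def tail_inf(k):
    # inf_{n >= k} a_n over the truncated sequence (0-based index k)
    return min(seq[k:])

for k in (0, 10, 20):
    print(k, tail_sup(k), tail_inf(k))
```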
By definition, $(a_n) \to a$ if and only if, for each ε > 0, there is a number N such that n ≥ N implies $|a_n - a| < \varepsilon$. This can be characterised by saying that, for any ε > 0, $a - \varepsilon < a_n < a + \varepsilon$ for all but finitely many n. This is our working definition of the limit; our definitions of the upper and lower limits do not obviously resemble it. However, for the lim sup and lim inf, one of the inequalities in $a - \varepsilon < a_n < a + \varepsilon$ holds for all but finitely many n, while the other is only guaranteed to hold for infinitely many n.
Proposition 4.8. Given a sequence (an ) in R, we have lim sup an = α ∈ R if and only if, for each
ε > 0, the following two conditions hold:
(i) for all but finitely many n (that is, for all n ≥ N for some N ∈ N), we have an < α + ε; and

(ii) for infinitely many n, we have an > α − ε.

Proposition 4.9. Given a sequence (an ) in R, we have lim inf an = β ∈ R if and only if, for each
ε > 0, the following two conditions hold:
(i) for all but finitely many n (that is, for all n ≥ N for some N ∈ N), we have an > β − ε; and
(ii) for infinitely many n, we have an < β + ε.

Proposition 4.10. Given a sequence (an ) in R, (an ) → l if and only if lim inf an = lim sup an = l.

There are many uses of lim sup and lim inf; one of these is in analysing convergent subsequences. Given a bounded sequence, the Bolzano-Weierstrass Theorem guarantees that it contains a convergent subsequence. It says nothing, however, about how many of them there are. If the sequence is convergent, then all subsequences converge to the same limit, that of the original sequence. On the other hand, a sequence which oscillates, e.g. (sin n), will have many convergent subsequences, converging to different limits. We call these points limit points:

Definition 4.11. If $(a_n)$ is a sequence in R, we say that l ∈ R is a limit point of $(a_n)$ if there exists a subsequence $(a_{n_k})$ with $a_{n_k} \to l$ as k → ∞.

If the lim sup is the “limit towards which the greatest values converge” (as Cauchy, somewhat heuristically, defined it), then we would expect the largest value to which a subsequence can converge to be the lim sup of the sequence; similarly, we expect the smallest such value to be the lim inf. This is indeed the case, and what is more, $\limsup a_n$ and $\liminf a_n$ are themselves limit points:

Proposition 4.12. Suppose (an ) is a bounded sequence in R. Then lim sup an and lim inf an are limit
points of (an ), and for any limit point l of (an ) we have lim inf an ≤ l ≤ lim sup an .

A general sequence may not have a limit, but it always has a lim sup and a lim inf (possibly ±∞). We put this to good use in Cauchy’s nth root test:

Theorem 4.13 (Cauchy’s nth Root Test). The series $\sum a_n$ is absolutely convergent if $\limsup |a_n|^{1/n} < 1$ and divergent if $\limsup |a_n|^{1/n} > 1$.

This leads straight away to our desired expression for the radius of convergence of a power series:
Corollary 4.14 (The Cauchy–Hadamard formula). For a power series $\sum a_n x^n$, set
$$R = \frac{1}{\limsup \sqrt[n]{|a_n|}},$$
with the convention that R = 0 if $\limsup \sqrt[n]{|a_n|} = +\infty$ and R = ∞ if $\limsup \sqrt[n]{|a_n|} = 0$. Then $\sum a_n x^n$ is absolutely convergent if |x| < R, and diverges if |x| > R.
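A minimal Python sketch of the Cauchy–Hadamard formula for the illustrative series $\sum n 2^n x^n$. Here $|a_n|^{1/n} = 2 n^{1/n} \to 2$ (the limit exists, so it equals the lim sup), giving R = 1/2. The helper nth_root_abs is hypothetical and works with log|a_n| to avoid overflowing floats; finitely many terms can only suggest the lim sup, not compute it.

```python
import math

def nth_root_abs(log_abs_a, n):
    # |a_n|^(1/n), computed as exp(log|a_n| / n) to avoid overflow
    return math.exp(log_abs_a(n) / n)

# Example series sum n * 2^n * x^n, so log|a_n| = log n + n log 2.
log_abs_a = lambda n: math.log(n) + n * math.log(2)

for n in (10, 100, 1000, 10_000, 10**6):
    root = nth_root_abs(log_abs_a, n)
    print(n, root, 1 / root)   # root tends to 2, so 1/root tends to R = 1/2
```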

5 Special functions of analysis


5.1 The Exponential and Logarithm Functions
One of the ultimate applications of mathematical analysis is to solve the problems which present themselves in the natural sciences, engineering, economics and other disciplines. Commonly, the step from the experimental data or hypotheses to the conclusions that they yield lies in solving differential equations. In solving these problems, particularly in physics, a number of functions occur repeatedly and play a vital role in all these disciplines. The most important of these are the exponential and logarithm functions, whose principal properties we now develop.
Definition 5.1. Define exp : R → R by $\exp(x) = \sum_{n=0}^{\infty} \frac{x^n}{n!}$.

We must now, of course, show that all its familiar properties can be retrieved from this definition. First of all, we note that exp 0 = 1, since 0⁰ = 1 and 0ⁿ = 0 for all n ≥ 1; comparing exp x with $e := \sum_{n=0}^{\infty} \frac{1}{n!}$, we see that exp 1 = e. Since the series defining exp has radius of convergence R = ∞, we can use Theorem 4.4 to show that exp is its own derivative:

Proposition 5.2. For all x ∈ R, exp'(x) = exp(x).
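A minimal Python sketch of Definition 5.1 and Proposition 5.2: the partial sum exp_series (an illustrative name, truncated at 60 terms) builds each term $x^{n+1}/(n+1)!$ from the previous one, and a central difference quotient of it is compared with the function itself. The multiplicative formula of Proposition 5.3 below can be spot-checked the same way.

```python
import math

def exp_series(x, terms=60):
    """Partial sum of sum_{n>=0} x^n / n!, built term by term."""
    total, term = 0.0, 1.0          # term starts at x^0/0! = 1
    for n in range(terms):
        total += term
        term *= x / (n + 1)         # turn x^n/n! into x^(n+1)/(n+1)!
    return total

print(exp_series(0.0), exp_series(1.0), math.e)

# exp is its own derivative: compare a central difference quotient with exp_series(x)
h, x = 1e-6, 0.7
print((exp_series(x + h) - exp_series(x - h)) / (2 * h), exp_series(x))
```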

Another important fact is the multiplicative formula:



Proposition 5.3. For all x, y ∈ R, we have exp(x) exp(y) = exp(x + y).

Propositions 5.2 and 5.3 yield between them, in one way or another, just about every property we seek to show of exp. By letting y = −x in Proposition 5.3, we get that exp(−x) exp x = exp 0 = 1 for all real numbers x; hence exp x ≠ 0 for all x ∈ R, and $\exp(-x) = \frac{1}{\exp x}$.

Lemma 5.4. exp x > 0 for all real numbers x.

Now, since exp'(x) = exp x > 0 for all x ∈ R, using the Mean Value Theorem (via Corollary 3.15) we get that

Proposition 5.5. exp : R → R is strictly increasing (and hence injective).

We now consider some common limits involving exp. First of all, for x ≥ 0 every term of the power series is non-negative, so exp x ≥ 1 + x, and hence exp x → ∞ as x → ∞. So, as $\exp(-x) = \frac{1}{\exp x}$, we have that exp x → 0 as x → −∞. Secondly, we consider the ratio of any power of x and the exponential function:
Lemma 5.6. For all n ∈ N, $\lim_{x\to\infty} \frac{x^n}{\exp x} = 0$.

In other words exp x → ∞ “faster” than any power of x as x → ∞.
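A minimal Python sketch of Lemma 5.6 with the illustrative choice n = 5. Note that $x^5/\exp x$ first grows (its maximum is at x = 5, where the derivative of $x^5 e^{-x}$ vanishes) before exp takes over; the lemma is only about the eventual behaviour.

```python
import math

# x^n / exp(x) for n = 5: polynomial growth eventually loses to exponential growth
n = 5
xs = [1, 5, 10, 20, 50, 100]
ratios = [x ** n / math.exp(x) for x in xs]
for x, r in zip(xs, ratios):
    print(f"x = {x:>3}: x^5 / exp(x) = {r:.3e}")
```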


We know that exp is continuous and differentiable on all of R; furthermore, exp'(x) > 0 for all x. As $\lim_{x\to-\infty} \exp x = 0$ and $\lim_{x\to\infty} \exp x = \infty$, we can apply the Inverse Function Theorem to get that exp is a bijection onto (0, ∞), with inverse (0, ∞) → R:

Theorem 5.7. exp : R → (0, ∞) is bijective with inverse log : (0, ∞) → R that is differentiable and $\log'(y) = \frac{1}{y}$ for all y > 0.

The usual properties of the logarithm now fall out almost immediately, again using proposition 5.3:

Proposition 5.8. For all a, b > 0 and any r ∈ Q, we have
(i) log 1 = 0;
(ii) log a + log b = log ab; and
(iii) $\log(a^r) = r \log a$.

So far we have only defined $a^x$ for rational x, where a > 0. By part (iii) of the above proposition, for any rational x we have $a^x = \exp(x \log a)$. In order to extend $a^x$ to any real x, we define $a^x := \exp(x \log a)$ for all real x. In the spirit of this section, we prove the following familiar properties of $a^x$:

Proposition 5.9. For a > 0 we have
(i) $a^x > 0$ for all x ∈ R;
(ii) $a^x a^y = a^{x+y}$ for all x, y ∈ R;
(iii) $(a^x)^y = a^{xy}$ for all x, y ∈ R.

In particular, since exp 1 = e, we have that log e = 1, and hence that $e^x = \exp x$. We can use this definition of $a^x$ to extend our ability to differentiate $x^k$ to all real numbers k, as well as to differentiate $a^x$ with respect to x.

Proposition 5.10. The functions f, g : (0, ∞) → R defined by $f(x) = a^x$ (for fixed a > 0) and $g(x) = x^k$ (for fixed k ∈ R) are differentiable with $f'(x) = a^x \log a$ and $g'(x) = k x^{k-1}$.
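A minimal Python sketch of the definition $a^x := \exp(x \log a)$ and of Proposition 5.10. The helper names power and derivative, and the sample values a = 2, k = √2, x = 3, are illustrative; the derivatives are only approximated by central difference quotients.

```python
import math

def power(a, x):
    # a^x defined as exp(x log a), valid for a > 0 and any real x
    return math.exp(x * math.log(a))

def derivative(f, x, h=1e-6):
    # central difference approximation to f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

a, k, x = 2.0, math.sqrt(2), 3.0
print(power(a, 0.5), math.sqrt(2))                                       # agrees with the rational-exponent value
print(derivative(lambda t: power(a, t), x), power(a, x) * math.log(a))   # f'(x) = a^x log a
print(derivative(lambda t: power(t, k), x), k * power(x, k - 1))         # g'(x) = k x^(k-1)
```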

Finally, applying Taylor’s theorem yields two key results:



Proposition 5.11. For −1 < x ≤ 1, $\log(1+x) = \sum_{n=1}^{\infty} (-1)^{n+1} \frac{x^n}{n}$.
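A minimal Python sketch of Proposition 5.11 (log1p_series is an illustrative name; it is not Python’s own math.log1p). For x = 0.5 the partial sums converge quickly, while at the endpoint x = 1 the series still converges to log 2, but only slowly, with error of order 1/N after N terms, as for any alternating series with terms 1/n.

```python
import math

def log1p_series(x, terms):
    """Partial sum of sum_{n=1}^{terms} (-1)^(n+1) x^n / n, for -1 < x <= 1."""
    return sum((-1) ** (n + 1) * x ** n / n for n in range(1, terms + 1))

print(log1p_series(0.5, 50), math.log(1.5))

# At x = 1 the series still converges, but only slowly (alternating series):
for terms in (10, 100, 1000, 10_000):
    print(terms, log1p_series(1.0, terms), math.log(2))
```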

Proposition 5.12 (The Binomial Theorem for any real index). For any a ∈ R and −1 < x < 1,
$$(1+x)^a = \sum_{n=0}^{\infty} \frac{a(a-1)\cdots(a-n+1)}{n!} x^n = 1 + ax + \frac{a(a-1)}{2!}x^2 + \frac{a(a-1)(a-2)}{3!}x^3 + \cdots$$
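A minimal Python sketch of the binomial series (binomial_series and the 80-term truncation are illustrative choices). Each coefficient $\frac{a(a-1)\cdots(a-n+1)}{n!}$ is obtained from the previous one by multiplying by (a − n)/(n + 1). When a is a natural number the coefficients become 0 from n = a + 1 onwards, recovering the ordinary binomial theorem.

```python
import math

def binomial_series(a, x, terms=80):
    """Partial sum of sum_n a(a-1)...(a-n+1)/n! x^n, for -1 < x < 1."""
    total, coeff = 0.0, 1.0          # coeff holds a(a-1)...(a-n+1)/n!, starting at n = 0
    for n in range(terms):
        total += coeff * x ** n
        coeff *= (a - n) / (n + 1)   # next generalised binomial coefficient
    return total

print(binomial_series(0.5, 0.2), math.sqrt(1.2))
print(binomial_series(-1.0, 0.3), 1 / 1.3)
print(binomial_series(3.0, 0.7), 1.7 ** 3)   # series terminates when a is a natural number
```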

5.2 sine and cosine functions


We define s, c : R → R by
$$s(x) = \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k+1}}{(2k+1)!}, \qquad c(x) = \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k}}{(2k)!}.$$

Lemma 5.13. For all x ∈ R, $c(x)^2 + s(x)^2 = 1$.


Lemma 5.14. For all a, b ∈ R, s(a + b) = s(a)c(b) + s(b)c(a) and c(a + b) = c(a)c(b) − s(a)s(b).
Lemma 5.15. The function c : R → R has its smallest positive root γ between √2 and √3. Then 2γ is the smallest positive root of s, and s(γ) = 1.
When asked to prove that a root exists between two given values, a useful method is to group the terms of the series so that each group has a fixed sign (positive or negative). This gives the required inequalities at the two endpoints, and we can then apply the Intermediate Value Theorem.
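The IVT argument for Lemma 5.15 can be mirrored numerically. This minimal Python sketch uses partial sums of the series for c and s (truncated at 30 terms, an illustrative choice), checks the sign change c(√2) > 0 > c(√3), and then bisects. Lemma 5.15 (not the code) guarantees that the root found is the smallest positive one, γ. Numerically γ agrees with the familiar value π/2, and s(γ) ≈ 1.

```python
import math

def c(x, terms=30):
    # partial sum of sum_k (-1)^k x^(2k) / (2k)!
    return sum((-1) ** k * x ** (2 * k) / math.factorial(2 * k) for k in range(terms))

def s(x, terms=30):
    # partial sum of sum_k (-1)^k x^(2k+1) / (2k+1)!
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1) for k in range(terms))

# c(sqrt 2) > 0 > c(sqrt 3), so the IVT gives a root in between; locate it by bisection.
lo, hi = math.sqrt(2), math.sqrt(3)
assert c(lo) > 0 > c(hi)
for _ in range(60):
    mid = (lo + hi) / 2
    if c(mid) > 0:
        lo = mid
    else:
        hi = mid
gamma = (lo + hi) / 2
print(gamma, 2 * gamma, s(gamma))
```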

6 L’Hôpital’s Rule
The Mean Value Theorem is one of the most powerful theorems in all of analysis, and yet it is relatively simple to understand. It thus comes as something of a surprise that it leads to such strong results as L’Hôpital’s Rule, a means of evaluating limits which would otherwise take the indeterminate form 0/0, by differentiating the numerator and denominator separately without affecting the limit.
The first form of L’Hôpital’s Rule is concerned with the evaluation of limits of the form $\lim_{x\to a} \frac{f(x)}{g(x)}$ when f(a) = g(a) = 0. The original Mean Value Theorem considered the ratio $\frac{f(b)-f(a)}{b-a}$, where f is a differentiable function. But we now have two functions, both (presumably) differentiable; how do we relate their rates of change? The answer comes in the form of Cauchy’s Mean Value Theorem (Theorem 3.17) for two functions f and g.
Using this we get the first form of L’Hôpital’s Rule, for right limits:
Proposition 6.1 (L’Hôpital’s Rule for right limits, case I). Suppose f, g : [a, b] → R are continuous on [a, b] and differentiable on (a, b), and g'(x) ≠ 0 for all x ∈ (a, b). If f(a) = g(a) = 0 and $\lim_{x\to a^+} \frac{f'(x)}{g'(x)} = l$ for some l ∈ R ∪ {±∞}, then $\lim_{x\to a^+} \frac{f(x)}{g(x)} = \lim_{x\to a^+} \frac{f'(x)}{g'(x)} = l$.

Analogously, we can use left-sided limits; we can also use two-sided limits, giving us the standard
form of L’Hôpital’s Rule:
Theorem 6.2 (L’Hôpital’s Rule, case I). Suppose f, g : [a, b] → R are differentiable on (a, b), and g'(x) ≠ 0 for all x ∈ (a, b). If for some c ∈ (a, b) we have f(c) = g(c) = 0 and $\lim_{x\to c} \frac{f'(x)}{g'(x)} = l$ for some l ∈ R ∪ {±∞}, then $\lim_{x\to c} \frac{f(x)}{g(x)} = \lim_{x\to c} \frac{f'(x)}{g'(x)} = l$.
Example 6.3. Suppose we wish to calculate the limit $\lim_{x\to 0} \frac{\sqrt{x+2}-\sqrt{2}}{\sqrt{x+1}-1}$. If we let $f(x) = \sqrt{x+2} - \sqrt{2}$ and $g(x) = \sqrt{x+1} - 1$, then we see that f(0) = g(0) = 0. We compute $f'(x) = \frac{1}{2\sqrt{x+2}}$ and $g'(x) = \frac{1}{2\sqrt{x+1}}$; then g'(x) ≠ 0, so
$$\lim_{x\to 0} \frac{f'(x)}{g'(x)} = \lim_{x\to 0} \frac{1/(2\sqrt{x+2})}{1/(2\sqrt{x+1})} = \lim_{x\to 0} \frac{\sqrt{x+1}}{\sqrt{x+2}} = \frac{1}{\sqrt{2}}.$$
So $\lim_{x\to 0} \frac{f'(x)}{g'(x)} = \frac{1}{\sqrt{2}}$, and hence by L’Hôpital’s Rule we have $\lim_{x\to 0} \frac{\sqrt{x+2}-\sqrt{2}}{\sqrt{x+1}-1} = \frac{1}{\sqrt{2}}$.
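As a numerical cross-check of Example 6.3, this minimal Python sketch evaluates the original quotient (the name ratio is illustrative) at points approaching 0 from both sides and compares it with 1/√2 ≈ 0.7071. Very small x is avoided, since both the numerator and the denominator then suffer from floating-point cancellation.

```python
import math

def ratio(x):
    # (sqrt(x+2) - sqrt(2)) / (sqrt(x+1) - 1), undefined at x = 0
    return (math.sqrt(x + 2) - math.sqrt(2)) / (math.sqrt(x + 1) - 1)

for x in (1e-1, 1e-2, 1e-3, 1e-4, -1e-4):
    print(x, ratio(x))
print(1 / math.sqrt(2))
```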
So far, L’Hôpital’s Rule has dealt with limits of the form $\lim_{x\to a} \frac{f(x)}{g(x)}$ in which f(x), g(x) → 0 as x → a. However, if g(x) → ∞ as x → a, then $\lim_{x\to a} \frac{f(x)}{g(x)}$ will equal 0 if f stays bounded near a, but if also f(x) → ∞ as x → a, we have an indeterminate form. For this reason, we study Case II of L’Hôpital’s Rule:
Proposition 6.4 (L’Hôpital’s Rule for right limits, case II). Suppose f, g : (a, b) → R are differentiable, and that g(x) ≠ 0 and g'(x) ≠ 0 for all x ∈ (a, b). If $\lim_{x\to a^+} g(x) = +\infty$ and $\lim_{x\to a^+} \frac{f'(x)}{g'(x)} = l$ for some l ∈ R ∪ {±∞}, then $\lim_{x\to a^+} \frac{f(x)}{g(x)} = \lim_{x\to a^+} \frac{f'(x)}{g'(x)} = l$.

Analogously, we can use left-sided limits; we can also use two-sided limits:
Theorem 6.5 (L’Hôpital’s Rule, case II). Let c ∈ (a, b), and suppose f, g : [a, b] \ {c} → R are differentiable on (a, c) and (c, b), with g'(x) ≠ 0 for all x ∈ (a, b) \ {c}. If $\lim_{x\to c} g(x) = +\infty$ and $\lim_{x\to c} \frac{f'(x)}{g'(x)} = l$ for some l ∈ R ∪ {±∞}, then $\lim_{x\to c} \frac{f(x)}{g(x)} = \lim_{x\to c} \frac{f'(x)}{g'(x)} = l$.
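As a short worked illustration of case II (an example of our own choosing, using the logarithm of Section 5), take f(x) = log x and g(x) = 1/x on (0, b). Here g(x) ≠ 0, g'(x) = −1/x² ≠ 0 and g(x) → +∞ as x → 0⁺, so Proposition 6.4 applies once we check that the limit of f'/g' exists:

```latex
\lim_{x\to 0^+} \frac{f'(x)}{g'(x)} = \lim_{x\to 0^+} \frac{1/x}{-1/x^{2}} = \lim_{x\to 0^+} (-x) = 0,
\qquad\text{hence}\qquad
\lim_{x\to 0^+} x\log x = \lim_{x\to 0^+} \frac{\log x}{1/x} = 0.
```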

The evaluation of limits using L’Hôpital’s Rule is a very useful application of the Mean Value Theorem,
and is popular in exam questions.
