MA131B
Analysis II
Revision Guide
WMS
Contents
1 Review of MA131A Analysis I
2 Continuity
2.1 Defining Continuity
2.2 Continuity and Completeness
2.3 Limits
2.4 Continuous functions on closed bounded intervals
3 Differentiation
3.1 Basic Properties of Derivatives
3.2 Derivatives and Completeness
4 Power Series
4.1 Differentiability and Taylor Series
4.2 Upper and Lower Limits
5 The Exponential and Logarithmic Functions
6 L’Hôpital’s Rule
Introduction
This revision guide for MA131B Analysis II has been designed as an aid to revision, not a substitute for
it. As can be seen from the length of this revision guide, Analysis II is a fairly long course with lots of
confusing definitions and big theorems; hopefully this guide will help you make sense of how everything
fits together. Only some proofs are included; the inclusion of a particular proof is no indication of how
likely it is to appear on the exam. However, the exam format doesn’t change much from year to year, so
the best way of revising is to do past exam papers, as well as assignment questions, using this guide as
a reference.
For further practice, the questions in R. P. Burn’s Numbers and Functions are invaluable, and indeed
one of the principal sources of this revision guide. For the braver of heart, Rudin’s Principles of
Mathematical Analysis is a classic, unparalleled textbook, but is pitched somewhat above the level of the
course, with some of the most challenging analysis questions known to man.
Disclaimer: Use at your own risk. No guarantee is made that this revision guide is accurate or
complete, or that it will improve your exam performance. Use of this guide will increase entropy,
contributing to the heat death of the universe. Contains no GM ingredients. Your mileage may vary.
All your base are belong to us.
Authors
Written by D. S. McCormick ([email protected])
Based upon lectures given by David Elworthy at the University of Warwick, 2005 and 2006.
Minor corrections by Stephen Loke & Guy Barwell, 2012.
Any corrections or improvements should be entered into our feedback form at http://tinyurl.com/WMSGuides
(alternatively email [email protected]).
History
First Edition: 2007.
Current Edition: February 21, 2016.
In Analysis II we have a better way (in most cases) of proving that the series converges, namely
Cauchy’s nth Root Test. This also provides the radius of convergence (covered in the “Power Series” section).
1 b ≠ 0 =⇒ bn ≠ 0 eventually; for the finitely many n where bn = 0 we can ignore the terms an /bn .
2 Something holds eventually if there exists an N ∈ N such that it holds for all n ≥ N .
2 Continuity
Fundamental to all of mathematics is the notion of a function:
Definition 2.1. Given two sets A and B, a function f : A → B is a pairing of each element of A with
an element of B. The element of B which is paired with a ∈ A is denoted by f (a).
Here the set A is called the domain of the function, while the set B is called the co-domain of the
function. The range of the function is the set of points in B which are mapped to by some point in A,
i.e. {f (a) | a ∈ A} ⊂ B. Throughout this module we only consider functions f : A → R, where A ⊂ R;
that is, real-valued functions of one real variable.
The general definition of function simply requires pairing each value x in the domain with a value
f (x) in the co-domain, even when something like a fresh formula is needed for each x, and so does not
require the value of f (x) to be anywhere close to the value of f (y), even when x and y are arbitrarily
close to one another. We are thus moved to define various classes of functions which are well-behaved; most important are the continuous functions, which make rigorous the idea that if x and y are close to each other, then f (x) and f (y) should also be close to each other. There are two standard ways of making this precise:
Definition 2.2. A function f : A → R is (neighbourhood) continuous at a ∈ A if for every ε > 0 there exists δ > 0 such that x ∈ A and |x − a| < δ =⇒ |f (x) − f (a)| < ε.
Definition 2.3. A function f : A → R is sequentially continuous at a ∈ A if for every sequence (an ) in A with (an ) → a, we have f (an ) → f (a).
Having two different definitions of continuity would be disastrous if they led to different consequences. Thankfully, however, we now show that the two definitions are in fact equivalent.
Theorem 2.4. Given a function f : A → R, where A ⊂ R, and given a ∈ A, then f is (neighbourhood)
continuous at x = a if and only if f is sequentially continuous at x = a.
Proof. (=⇒) Suppose f is neighbourhood continuous at x = a. Let (an ) → a be a sequence in A. As f
is neighbourhood continuous at a, we have that ∀ε > 0, ∃ δ > 0 such that x ∈ A and |x − a| < δ =⇒
|f (x) − f (a)| < ε. Now an ∈ A for all n, so |an − a| < δ =⇒ |f (an ) − f (a)| < ε. But since (an ) → a,
we have that ∀ δ > 0, ∃ N such that for n ≥ N we have |an − a| < δ, and hence that |f (an ) − f (a)| < ε.
So f (an ) → f (a).
(⇐=) We use the contrapositive; suppose f is not neighbourhood continuous at x = a. Then ∃ ε > 0 such that given any δ > 0, there is at least one x ∈ A for which |x − a| < δ but |f (x) − f (a)| ≥ ε. For each n ∈ N, let δ = 1/n, choose an x satisfying this condition, and label it an . Now for each n we have |an − a| < 1/n, so (an ) → a. But |f (an ) − f (a)| ≥ ε for every n, with ε > 0 fixed. Hence f (an ) does not tend to f (a), and f is not sequentially continuous at x = a.
Definition 2.5. If a function f : A → R is continuous at every point a ∈ A, then we say f is continuous
on A, or simply a continuous function.
Lemma 2.6 (Preservation of sign). If f : E → R is continuous at c ∈ E and f (c) > 0 then there exists
a δ > 0 such that f (x) > 0 for all x ∈ E with |x − c| < δ. Similarly, if f (c) < 0 then there exists a δ > 0
such that f (x) < 0 for all x ∈ E with |x − c| < δ.
Proof. Suppose f : E → R, E ⊂ R, is continuous at c ∈ E and f (c) > 0. Taking ε = f (c) in the definition of continuity, ∃ δ > 0 such that x ∈ E and |x − c| < δ =⇒ |f (x) − f (c)| < f (c), so f (x) > f (c) − f (c) = 0. By the same reasoning, if f (c) < 0 then there exists δ > 0 such that x ∈ E and |x − c| < δ =⇒ f (x) < 0.
Notice that if f is not continuous at c ∈ E this does not have to hold: the function given by f (x) = −1 for x ≠ 0 and f (x) = 1 for x = 0 is not continuous at 0, f (0) = 1 > 0, and yet preservation of sign fails in every neighbourhood of 0. Notice also that we can replace 0 in the inequalities f (c) > 0 or f (c) < 0 by any number p to give a similar result. For example, for a function continuous at c and satisfying the inequality f (c) > p, there is a neighbourhood A ⊂ E of c such that for every x ∈ A the inequality f (x) > p holds (similarly for f (c) < p). Can you prove this?
Most functions we are familiar with are continuous. For example, the functions f, g : R → R given
by f (x) = c (for any constant c ∈ R) and g(x) = x are continuous. Moreover, when we have continuous
functions, we can create new continuous functions by adding, subtracting, multiplying or dividing them.
The following result follows immediately from the sequential definition of continuity and the analogous
results for sequences:
Proposition 2.7. Given A ⊂ R, and given a ∈ A, suppose that f, g : A → R are continuous at a. Then:
1. the function f + g : A → R defined by (f + g)(x) := f (x) + g(x) is continuous at a;
2. the function f · g : A → R defined by (f · g)(x) := f (x) · g(x) is continuous at a; and
3. if also g(x) ≠ 0 for all x ∈ A, then the function f /g : A → R defined by (f /g)(x) := f (x)/g(x) is continuous at a.
Since the function x ↦ x is continuous, we can multiply it by itself to get that x ↦ xn is continuous for any n ∈ N. Multiplying by any constant an shows that x ↦ an xn is continuous. Hence the sum x ↦ a0 + a1 x + a2 x2 + · · · + an xn is continuous, i.e.:
Proposition 2.8. Any (finite) polynomial function p : R → R given by p(x) = a0 +a1 x+a2 x2 +· · ·+an xn
is continuous.
Corollary 2.9. Any rational function q : A → R given by q(x) = p1 (x)/p2 (x), where p1 , p2 are (finite) polynomials, is continuous provided that p2 (x) ≠ 0 for all x ∈ A.
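As a quick illustration of corollary 2.9: q(x) = (x2 + 1)/(x − 2) is continuous on A = R \ {2}, since the polynomials p1 (x) = x2 + 1 and p2 (x) = x − 2 are continuous and p2 (x) ≠ 0 for every x ∈ A.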
Theorem 2.10. If the function f : A → R is continuous at x = a, and the function g : B → R is
continuous at f (a), where f (A) ⊂ B, then g ◦ f is continuous at x = a.
That is, a continuous function of a continuous function is continuous.
Proof. Let (an ) → a be a sequence
in A. By the continuity of f at a, f (an ) → f (a). Then, since g is
continuous at f (a), g(f (an )) → g(f (a)). Hence g ◦ f is continuous at x = a.
Corollary 2.11. If f : E → R is continuous at c and f (c) ≠ 0 then 1/f is well-defined in a neighbourhood of c and is continuous at c.
Examples 2.12. Continuous functions are so familiar that to clarify the meaning of this definition we
need some examples of discontinuity, illustrating the absence of continuity. The continuity of a function
at a point a is recognised by the way the function affects every sequence
(an ) tending to a. Discontinuity
is established by finding just one sequence (an ) → a for which f (an ) does not tend to f (a).
1. Consider again the integer part function f (x) = [x]. f is continuous for any non-integer x and
discontinuous at any integer x. Thus f has an infinity of isolated discontinuities.
2. Consider the function f : R → R defined by f (x) = 0 when x is rational, and f (x) = x when x is
irrational. This is continuous at just a single point, x = 0.
3. Consider the function f (x) = sin(1/x). This is continuous if x ≠ 0, but what about x = 0? Let an = 1/(2nπ) and bn = 1/((2n + 1/2)π). Now (an ) → 0 and (bn ) → 0, but f (an ) = 0 for all n, and f (bn ) = 1 for all n. Hence, if we try to extend f to a continuous function by defining
f̃(x) = f (x) if x > 0, and f̃(x) = k if x = 0,
then we will fail, since whatever value of k we choose there will exist sequences (an ) such that f (an ) does not tend to k.
Lemma 2.13. For x ∈ (0, π/2) we have that 0 < sin x < x < tan x.
Proof. Firstly, take a unit circle with centre O and two distinct points A, B on its circumference with the angle between OA and OB less than π/2. Then take a point E outside the circle such that OBE is a right angle, and notice that the area of triangle OAB < the area of sector OAB < the area of triangle OBE. Finally, denoting the angle of the sector by x, we get the desired inequality.
Lemma 2.14. The function sin : R → R is continuous.
Proof. Firstly, note that sin x = sin((x − c)/2 + (x + c)/2) and sin c = sin((x + c)/2 − (x − c)/2), and set δ = ε (where |x − c| < δ). Using these identities and the addition formula for sine we get |sin x − sin c| = |2 sin((x − c)/2) cos((x + c)/2)|. Hence, |2 sin((x − c)/2) cos((x + c)/2)| ≤ |2 sin((x − c)/2)| ≤ |x − c| < δ = ε, and therefore sin is continuous.
Definition 2.15. Let E ⊂ R. A function f : E → R is strictly increasing if ∀x, y ∈ E, x > y =⇒ f (x) > f (y); increasing if ∀x, y ∈ E, x > y =⇒ f (x) ≥ f (y); decreasing if ∀x, y ∈ E, x > y =⇒ f (x) ≤ f (y); and strictly decreasing if ∀x, y ∈ E, x > y =⇒ f (x) < f (y).
The Intermediate Value Theorem, or IVT, tells us that a continuous function maps intervals onto
intervals:
Theorem 2.16 (IVT version 1). If f : [a, b] → R is a continuous function with f (a) < f (b), then for
each k such that f (a) < k < f (b), there exists c ∈ (a, b) such that f (c) = k.
3 The word closed as we have used it here indicates that every convergent sequence within each closed set converges to a point of the set (see the Closed Interval Rule, theorem 1.3).
4 The word open as we have used it here indicates that there is space within each open set around each point of the set. For example, if c ∈ (a, b), then a < (a + c)/2 < c < (c + b)/2 < b. This is also described by saying that an open set contains a neighbourhood of each of its points.
Corollary 2.17 (IVT version 2). If f : [a, b] → R is a continuous function with f (b) < f (a), then for
each k such that f (b) < k < f (a), there exists c ∈ (a, b) such that f (c) = k.
Note that we most definitely require completeness for the Intermediate Value Theorem – if we work over Q the theorem vanishes in a puff of smoke. For example, consider f : Q → Q given by f (x) = −1 if x2 < 2 and f (x) = 1 if x2 > 2. Then f is continuous with f (0) = −1 and f (2) = 1, but there is no c with f (c) = 0. Notice also that the IVT fails if the function f is not continuous at a point: if f (x) = 0 for x ≠ 0 and f (x) = 1 for x = 0, then f (0) = 1 and f (1) = 0, but there is no point c ∈ [0, 1] such that f (c) = 1/2.
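The IVT is also the standard tool for locating roots. As a quick illustration, consider p(x) = x3 + x − 1, which is continuous by proposition 2.8. Since p(0) = −1 < 0 < 1 = p(1), IVT version 1 (with k = 0) gives a point c ∈ (0, 1) with p(c) = 0; that is, the cubic has a real root between 0 and 1.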
If we consider f : [a, b] → [a, b] and apply the IVT to f (x) − x, we get a simple fixed point theorem5 :
Theorem 2.19 (Fixed point theorem). If f : [a, b] → [a, b] is continuous, then there exists at least one
c ∈ [a, b] such that f (c) = c.
The Intermediate Value Theorem tells us that the range of a continuous function on an interval is
also an interval. A continuous function on a closed interval has special properties that lead to a stronger
conclusion.
2.3 Limits
Definition 2.20 (Continuous limit). Let f : (a, b)\{c} → R. We say that f (x) tends to α as x tends to c, denoted f (x) → α as x → c, if ∀ε > 0 ∃δ > 0 such that x ∈ (a, b)\{c} and |x − c| < δ =⇒ |f (x) − α| < ε.
Proposition 2.21 (Sandwich Rule). Suppose that f, g, h : (a, b)\{c} → R, and that f (x) ≤ g(x) ≤ h(x) for all x ∈ (a, b)\{c}. If limx→c f (x) = limx→c h(x) = α, then limx→c g(x) = α.
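As a typical application of the Sandwich Rule, take g(x) = x sin(1/x) on (−1, 1)\{0}. Since |sin(1/x)| ≤ 1, we have −|x| ≤ x sin(1/x) ≤ |x|, and ±|x| → 0 as x → 0; hence limx→0 x sin(1/x) = 0, even though sin(1/x) itself has no limit at 0 (compare examples 2.12).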
Proposition 2.22 (Uniqueness of Limits). If limx→a f (x) = l1 and limx→a f (x) = l2 , then l1 = l2 .
Some kinds of discontinuities may be identified and discussed by considering the two sides of the
point in question separately. On this basis we define one-sided limits:
Definition 2.23. Let f : A → R. If, for any sequence (an ) in A with an < a and (an ) → a, we have f (an ) → l, we write limx→a− f (x) = l. Similarly, if, for any sequence (an ) in A with an > a and (an ) → a, we have f (an ) → l, we write limx→a+ f (x) = l.
For example, for some integer a ∈ Z, we have that limx→a− [x] = [a] − 1 and limx→a+ [x] = [a]. Note
that we do not require a ∈ A; we do not take account of the value of f (a) – in fact we do not even
require that f be defined at a, only that you can get arbitrarily close to a and still take values of f (x).
5 In higher dimensions this is completely non-trivial, since we do not have any inequalities when dealing with vectors.
The full version is known as Brouwer’s fixed point theorem and is covered in detail in MA3H5 Manifolds.
Theorem 2.24 (Algebra of Limits). Suppose limx→a f (x) = l and limx→a g(x) = m. Then limx→a (f (x) + g(x)) = l + m, limx→a f (x) · g(x) = l · m, and if m ≠ 0 then limx→a f (x)/g(x) = l/m.
Just as we can define continuity in terms of sequences and neighbourhoods, we can do the same for
limits. We prove this in a similar way to the proof of theorem 2.4.
Definition 2.25. Given a function f : A → R, limx→a− f (x) = l if and only if for each ε > 0, there is a
δ > 0 such that x ∈ A and a − δ < x < a =⇒ |f (x) − l| < ε.
Similarly, limx→a+ f (x) = l if and only if for each ε > 0, there is a δ > 0 such that x ∈ A and
a < x < a + δ =⇒ |f (x) − l| < ε.
If, for a particular point a, we have limx→a− f (x) = l and limx→a+ f (x) = l, then it is customary to write limx→a f (x) = l. The properties of these so-called two-sided limits follow immediately from those of one-sided limits; moreover, by chasing definitions it is easy to show that for continuous functions the limit of the function is the function of the limit:
Proposition 2.26. f : A → R is continuous at a ∈ A if and only if limx→a− f (x) = limx→a+ f (x) = f (a), i.e. if and only if limx→a f (x) = f (a).
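For example, let f (x) = (x2 − 1)/(x − 1) for x ≠ 1; then f (x) = x + 1 for all x ≠ 1, so limx→1− f (x) = limx→1+ f (x) = 2. The function is not defined at 1, but by proposition 2.26, setting f (1) := 2 extends f to a function continuous at 1; any other choice of value at 1 gives a discontinuity.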
Definition 2.27. We say that f (x) → α as x → ∞ if ∀ε > 0 ∃ R such that ∀x > R we have |f (x) − α| < ε, and that f (x) → α as x → −∞ if ∀ε > 0 ∃ R such that ∀x < R we have |f (x) − α| < ε.
Definition 2.28. We say that f (x) → ∞ as x → c if ∀R ∃δ > 0 such that f (x) > R for all x with 0 < |x − c| < δ, and we say that f (x) → −∞ as x → c if ∀R ∃δ > 0 such that f (x) < R for all x with 0 < |x − c| < δ.
Definition 2.29. We say that f (x) → ∞ as x → ∞ if ∀R ∃δ such that f (x) > R for all x > δ, and we say that f (x) → −∞ as x → ∞ if ∀R ∃δ such that f (x) < R for all x > δ.
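For example, to verify from definition 2.28 that 1/x2 → ∞ as x → 0: given any R > 0, take δ = 1/√R; then 0 < |x| < δ =⇒ x2 < δ 2 = 1/R =⇒ 1/x2 > R.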
Many of the results we established for continuous functions – such as sum and product rules – carry
over easily into limits, and are proved easily using the analogous results for sequences.
Theorem 2.30 (Algebra of Limits). Suppose limx→a f (x) = l and limx→a g(x) = m. Then limx→a (f (x) + g(x)) = l + m, limx→a f (x) · g(x) = l · m, and if m ≠ 0 then limx→a f (x)/g(x) = l/m.
Note, however, that while the composition of continuous functions is continuous, limits do not behave
as nicely under composition, and there is no analogous simple statement about limits of composed
functions.
By replacing “for all ε > 0” with “for all C > 0” and changing |f (x) − l| < ε to f (x) > C, we get
infinite limits. We can also replace “there exists δ > 0” with “there exists C > 0” and change |x − a| < δ
with x > C to get limits at infinity. Practise playing with the definitions, and also with evaluating all
manner of limits; such questions are popular on exams.
Proposition 2.32. Suppose f : [a, b] → R is continuous, injective and f (a) < f (b). Then the range of
f is the interval [f (a), f (b)], and f : [a, b] → [f (a), f (b)] is a strictly increasing bijection, whose inverse
f −1 : [f (a), f (b)] → [a, b] is also continuous and strictly increasing.
Proposition 2.33. If f : [a, b] → R is continuous with f (a) < f (b) but is not strictly increasing then it
is not injective.
Proposition 2.34. If f : [a, b] → [f (a), f (b)] is non-decreasing and surjective then it is continuous.
Corollary 2.35. Let f : [a, b] → R be continuous and strictly monotonic. Then f −1 : f ([a, b]) → [a, b] is
continuous and strictly monotonic.
Example 2.36. The function sin : [−π/2, π/2] → R is continuous and injective; hence it is strictly increasing and bijective onto [−1, 1], and its inverse arcsin : [−1, 1] → [−π/2, π/2] is also continuous.
3 Differentiation
Having dealt with continuous functions, the next class of “useful” functions which we come to are the
differentiable functions. We first ask a simple question: how can we find the gradient, or rate of change, of
f ? If we consider a point (a, f (a)), then the line through this point with gradient m is y − f (a) = m(x − a). Now, the equation of the chord joining (a, f (a)) and (a + h, f (a + h)) is y − f (a) = [(f (a + h) − f (a))/h] · (x − a). If we let m = limh→0 (f (a + h) − f (a))/h, then we would surely describe the line y − f (a) = m(x − a) as being a tangent to the curve at x = a, and m as being the gradient of this tangent. On the strength of these ideas, we define the derivative at a point a of its domain:
ideas, we define the derivative at a point a of its domain:
Definition 3.1. For a function f : A → R and a point a ∈ A, if
limh→0 (f (a + h) − f (a))/h = limx→a (f (x) − f (a))/(x − a) = m
for some real number m, then m is called the derivative of f at a, usually denoted by f ′(a), and f is said
to be differentiable at a. (The two limits are equivalent; the first exists if and only if the second exists,
and they are equal if they both exist.)
Although derivatives arise naturally from considering the geometric notion of tangent, there is still
one circumstance when a tangent to the graph of a function may exist without a derivative of the function
at the point in question; this occurs when the slope of the chord tends to ∞. The derivative does not
exist (since ∞ is not a real number), but the tangent still exists and is vertical on the graph.
For a function f : [a, b] → R, the one-sided limits limh→0+ (f (a + h) − f (a))/h and limh→0− (f (b + h) − f (b))/h may exist as real numbers; we call these the right derivative of f at a and the left derivative of f at b, respectively.
Definitions 3.10. If a function f : A → R is differentiable, we define the derived function f ′ : A → R. If the derived function is differentiable at a point a ∈ A, we say that f is twice differentiable, and denote (f ′)′(a) by f ′′(a). Similarly, we say f is n times differentiable at a if f (n) (a) exists, where f (n) is defined inductively by f (n) (a) = (f (n−1) )′(a) and f (0) (a) = f (a).
If f : A → R is n times differentiable, and f (n) : A → R is continuous, then we say that f is C n (A). If f is n times differentiable for all n, then we say f is C ∞ (A).
6 In Leibniz notation, the chain rule says that dg/dx (x) = dg/df (f (x)) · df /dx (x).
7 These proofs are fairly simple, involving mainly algebraic manipulation, and are examinable.
You may be used to finding local maxima or minima by setting f ′(x) = 0 and solving for x; however, f ′(x) = 0 is a necessary condition for x to be a maximum or minimum, but not a sufficient one:
Lemma 3.12. If f : A → R has a local maximum or local minimum at some a ∈ A, and f is differentiable at a, then f ′(a) = 0.
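To see that the condition f ′(a) = 0 is not sufficient, consider f (x) = x3 : here f ′(0) = 0, but f is strictly increasing, so 0 is neither a local maximum nor a local minimum.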
To better understand the situation, we turn to more general questions. The following powerful theorem seems obvious at first glance; however, it relies on the completeness of R and is hence non-trivial.
Theorem 3.13 (Rolle’s Theorem). If f : [a, b] → R is continuous on [a, b] and differentiable on (a, b), and additionally f (a) = f (b), then there is a point c ∈ (a, b) such that f ′(c) = 0.
Proof. As f is continuous, it is bounded and attains its bounds, i.e. there exist x1 , x2 ∈ [a, b] such that f (x1 ) ≤ f (x) ≤ f (x2 ) for all x ∈ [a, b]. Set A = f (a) = f (b). If f (x1 ) < A, then x1 ∈ (a, b) is a local minimum, and hence f ′(x1 ) = 0. If f (x2 ) > A, then x2 ∈ (a, b) is a local maximum, and hence f ′(x2 ) = 0. Otherwise f (x1 ) = f (x2 ) = A, and hence f is constant, so f ′(x) = 0 for all x ∈ (a, b).
Notice that the interval must be closed in order to use the Extreme Value Theorem. Geometrically,
Rolle’s Theorem states that there is a tangent to the curve y = f (x), at some point between x = a and
x = b, which is parallel to the chord joining the points (a, f (a)) and (b, f (b)). In this theorem, the chord
is horizontal; we now generalise this to allow the chord to have any gradient.
Theorem 3.14 (The Mean Value Theorem). If f : [a, b] → R is continuous on [a, b] and differentiable on (a, b), then there is a point c ∈ (a, b) such that f ′(c) = (f (b) − f (a))/(b − a).
Proof. Consider the line joining the points (a, f (a)) and (b, f (b)), whose gradient is m = (f (b) − f (a))/(b − a). This line is given by k : [a, b] → R where k(x) = f (a) + m(x − a). Applying Rolle’s theorem to g(x) = f (x) − k(x), which has g(a) = g(b) = 0, yields some c ∈ (a, b) with g′(c) = 0. But k′(x) = m for all x ∈ (a, b), so g′(c) = f ′(c) − (f (b) − f (a))/(b − a) = 0, i.e. f ′(c) = (f (b) − f (a))/(b − a) as required.
Corollary 3.15. If f : [a, b] → R is continuous on [a, b] and differentiable on (a, b), then
• if f ′(x) = 0 for all x ∈ (a, b), then f is constant on [a, b];
• if f ′(x) > 0 for all x ∈ (a, b), then f is strictly increasing on [a, b];
• if f ′(x) < 0 for all x ∈ (a, b), then f is strictly decreasing on [a, b].
In higher dimensions, the Mean Value Theorem does not hold, but often the Mean Value Inequality
does hold:
Proposition 3.16 (Mean Value Inequality). Let f : [a, b] → R be continuous on [a, b] and differentiable on (a, b). If there exists a real number K such that |f ′(x)| ≤ K for all x ∈ (a, b), then for all x, y ∈ [a, b] we have |f (x) − f (y)| ≤ K|x − y|.
This says that if the derivative is bounded, f is Lipschitz. (A function f : A → R is Lipschitz if there exists K > 0 such that |f (x) − f (y)| ≤ K|x − y| for all x, y ∈ A; a Lipschitz function is necessarily continuous.) In particular, this occurs when f ′ : [a, b] → R is continuous, since it is then automatically bounded. The Mean Value Inequality converts local information into global information: |f ′(x)| ≤ K tells us that the local rate of change is no greater than K, and then |f (x) − f (y)| ≤ K|x − y| says that the rate of change between any two points in the domain is no greater than K.
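For instance, granting the standard fact that sin′ = cos, we have |sin′ (x)| = |cos x| ≤ 1 for all x, so applying the Mean Value Inequality with K = 1 on the interval between x and y gives |sin x − sin y| ≤ |x − y| for all x, y ∈ R: sin is Lipschitz on all of R.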
When f is invertible, we can relate the derivatives of f and f −1 as follows8 :
8 In Leibniz notation, the inverse rule says that dx/dy (y) = 1/(dy/dx (x)), where y = f (x).
We turn now to Taylor’s theorem. When we write the Mean Value Theorem in the form f (b) = f (a) + (b − a)f ′(c), for some c ∈ (a, b), we can use the term (b − a)f ′(c) to give us an estimate of how near the value of f at b is to its value at a. This is a linear approximation to f around the point a. What if we try to approximate it by a polynomial? Well, let pn (x) = f (a) + k1 (x − a) + · · · + kn (x − a)n , and suppose f (x) = pn (x) + rn (x), where rn (x) is a remainder term for the nth-order approximation. If we choose the ki such that f and pn have the same derivatives of orders 1 to n at x = a, then we get pn (j) (a) = j! kj = f (j) (a) for 1 ≤ j ≤ n, and hence kj = f (j) (a)/j!.
Theorem 3.17 (Cauchy’s Mean Value Theorem). Suppose f, g : [a, b] → R are continuous on [a, b] and differentiable on (a, b), and g′(x) ≠ 0 for all x ∈ (a, b). Then there exists c ∈ (a, b) for which (f (b) − f (a))/(g(b) − g(a)) = f ′(c)/g′(c).
Theorem 3.18 (Taylor’s Theorem). If f : [a, b] → R has a continuous nth derivative on [a, b] and is n + 1 times differentiable on (a, b), then there is a point c ∈ (a, b) such that
f (b) = f (a) + (b − a)f ′(a) + ((b − a)2 /2!)f ′′(a) + · · · + ((b − a)n /n!)f (n) (a) + ((b − a)n+1 /(n + 1)!)f (n+1) (c).
By putting a = 0, b = x and c = θx into Taylor’s theorem, for some θ ∈ (0, 1), we get the special case
known as Maclaurin’s Theorem. The final term, known as the Lagrange form of the remainder, is most
like that in the Mean Value Theorem; a term dependent on a derivative of f at some point c ∈ (a, b).
As we do not, in general, know what the point c is, this does not always give useful information, so we
consider an alternative formulation of Taylor’s Theorem, with Cauchy’s form of the remainder:
Theorem 3.19 (Taylor’s Theorem with Cauchy’s form of the remainder). If f : [a, b] → R has a continuous nth derivative on [a, b] and is n + 1 times differentiable on (a, b), then there is a point c ∈ (a, b) such that
f (b) = f (a) + (b − a)f ′(a) + · · · + ((b − a)n /n!)f (n) (a) + ((b − a)(b − c)n /n!)f (n+1) (c).
Note that this means that if f and f −1 are inverse functions, then if f ′(a) = 0 then f −1 cannot be differentiable at f (a). However, if the domain of f is an interval, then the differentiability of f at a, along with the requirement that f ′(a) ≠ 0, is sufficient to show that f −1 is differentiable at the point f (a).
Proposition 3.21. Suppose f : (a, b) → (h, k) is a continuous bijection with inverse f −1 : (h, k) → (a, b). Then if f is differentiable at x ∈ (a, b) with f ′(x) ≠ 0, then f −1 is differentiable at f (x).
Theorem 3.22 (The Inverse Function Theorem). If f : (a, b) → R is continuous and differentiable on (a, b) with f ′(x) > 0 for all x ∈ (a, b), then f : (a, b) → (h, k) is a bijection with inverse f −1 : (h, k) → (a, b) which is continuous and differentiable on (h, k), with (f −1 )′(y) = 1/f ′(f −1 (y)) for all y ∈ (h, k).
9 This is not the most general form of the Inverse Function Theorem; the lectures included a slightly stronger statement.
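The function f (x) = x3 shows why the hypothesis f ′(x) > 0 (or at least f ′(x) ≠ 0) cannot simply be dropped: f is a continuous, strictly increasing bijection from R to R, but f ′(0) = 0, and its inverse f −1 (y) = y1/3 is continuous but not differentiable at 0, since (f −1 (h) − f −1 (0))/h = h−2/3 → ∞ as h → 0.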
4 Power Series
So far, the only interesting functions we have considered have been polynomials, f (x) = xn , and what
can be made by combining finitely many through addition, multiplication, or even division. To obtain
functions other than rational functions, we must remove the restriction to a finite number of operations;
in other words, we must admit limiting processes.
In the previous section, we considered polynomial approximations to differentiable functions using Taylor’s Theorem. In each case we had a finite polynomial of the form Σn k=0 ak xk , and the difference between the function and the polynomial was given by a remainder term depending on the values of f (n+1) . We now ask the question: what if these remainder terms tend to 0 as n → ∞? Do we get an “infinite polynomial” representation of the function?
We must first consider what we mean by an “infinite polynomial”. One likely candidate is a series of the form Σ∞ n=0 an xn , where the an are fixed coefficients; this is known as a power series. Power series have many beautiful and interesting properties which make using them in analysis very useful indeed.
Examples 4.1.
• Lemma 1.9 from Analysis I tells us that the power series Σ xn converges for |x| < 1 and diverges for |x| ≥ 1.
• Consider the power series Σ nxn . By the ratio test, this converges if |x| < 1 and diverges if |x| > 1.
• Consider the power series Σ xn /n!. By the ratio test, this converges for any value of x.
• Consider the power series Σ n!xn . By the ratio test, this diverges for every x ≠ 0, but converges for x = 0.
Theorem 4.2. For any power series Σ an xn , either:
1. the series converges absolutely for all x ∈ R; or
2. there is a real number R, such that the series is absolutely convergent for |x| < R and divergent
for |x| > R; or
3. the series converges only if x = 0.
We call R the radius of convergence of the power series; in the first case we write R = ∞, and in the
third case we set R = 0.
We can use the ratio test (although most of the time Cauchy’s nth Root Test is better) to find the radius of convergence as in the above examples; again this is popular in exam questions. However, note that in the second case, we know nothing about the convergence of Σ an xn on the boundary where |x| = R.
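For example, consider Σ xn /2n . The ratio of successive terms is |xn+1 /2n+1 |/|xn /2n | = |x|/2, so by the ratio test the series converges absolutely for |x| < 2 and diverges for |x| > 2; hence R = 2. On the boundary |x| = 2 the terms have modulus 1 and do not tend to 0, so here the series diverges at both endpoints – but in general anything can happen when |x| = R.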
What is even more beautiful about power series is that they are guaranteed to be differentiable within
their radius of convergence, and that the derivative of a function defined by a power series is simply the
power series obtained by differentiating each term:
Lemma 4.3. The power series Σ an xn , Σ nan xn−1 and Σ an xn+1 /(n + 1) have the same radius of convergence.
Theorem 4.4. If f (x) = Σ an xn has radius of convergence R > 0, then f is differentiable on (−R, R), with f ′(x) = Σ nan xn−1 for all |x| < R.
This is a powerful theorem: given a power series that is convergent on (−R, R), we know automat-
ically that it is differentiable, and its derivative is given by differentiating the power series termwise,
as we would a polynomial. But if we differentiate a power series, we get another power series, which
we can differentiate again and again; so power series can be differentiated arbitrarily many times! Fur-
thermore, as differentiability implies continuity, not only is the power series “infinitely” differentiable,
each derivative (not to mention the power series itself) is continuous. Power series really are very nice
functions.
By applying theorem 4.4 repeatedly to f (x) = Σ an xn and setting x = 0, we see that ak = f (k) (0)/k!. Hence if f is equal to some power series, then we have
f (x) = Σ∞ k=0 (f (k) (0)/k!) xk .
This looks very much like Taylor’s theorem, but with infinitely many terms; we call it the Taylor series
of f about 0. If we can differentiate f arbitrarily many times, i.e. if f is C ∞ , then can we say that f is
equal to its Taylor series? Many great mathematicians, such as Lagrange, thought so. But the answer
is NO!
Firstly, the Taylor series may not converge; and even if it does, it may not converge to the function.
The starkest example is due to Cauchy: consider f : R → R defined by
f (x) := e−1/x2 for x ≠ 0, and f (x) := 0 for x = 0.
Then f possesses derivatives of all orders at every point, but f (n) (0) = 0 for all n, so its Taylor series
about 0 is zero everywhere! Where does this example break down? Simple: we have neglected the
statement of Taylor’s theorem. This states that f is equal to a polynomial plus a remainder term! In
this example, the remainder terms taken about x = 0 do not converge to 0 for any x 6= 0.
Definition 4.5. The function f : A → R is called analytic if, for all a ∈ A, there is a radius δ > 0 and coefficients (an ) such that f (x) = Σ∞ n=0 an (x − a)n for all x with |x − a| < δ.
This says that if f equals its Taylor series about any point in its domain, then f is analytic. Any analytic function is automatically C ∞ , but we have seen that not all C ∞ functions are analytic.
However, the limit used in the ratio test may not exist. Hence, we search for an explicit expression for the radius of convergence which always exists. Consider the sequence (0.9, 3.1, 0.99, 3.01, 0.999, 3.001, . . . ). This does
not have a limit, but we can find a subsequence, (0.9, 0.99, 0.999, . . . ) that approaches 1, and another
subsequence (3.1, 3.01, 3.001, . . . ) which approaches 3. No subsequence approaches anything larger than
3. We say that 3 is the upper limit, or lim sup, of this sequence, and that the lower limit, or lim inf, is
1. It should be noted that 3 is not the least upper bound of the elements in this sequence. There are
infinitely many terms that are strictly greater than 3. But if we take any positive ε, then eventually all
of the terms will fall below 3 + ε and stay below. We thus define lim sup an :
Definition 4.7. The upper limit (or “limit superior”) of a sequence (an ), denoted lim sup an , is defined
as lim supn→∞ an := limk→∞ supn≥k an . Similarly, the lower limit (or “limit inferior”) of a sequence
(an ), denoted lim inf an , is defined as lim inf n→∞ an := limk→∞ inf n≥k an .
By definition, (an ) → a if and only if, for each ε > 0 there is a number N such that n ≥ N implies
|an − a| < ε. This can be characterised by saying that, for any ε > 0, a − ε < an < a + ε for all but finitely
many n. This is our working definition of the limit; but our definitions of the upper and lower limits bear no obvious relation to this. However, for the lim sup and lim inf, one of the inequalities in a − ε < an < a + ε holds for all but finitely many n, while the other holds for infinitely many n:
Proposition 4.8. Given a sequence (an ) in R, we have lim sup an = α ∈ R if and only if, for each
ε > 0, the following two conditions hold:
(i) for all but finitely many n (that is, for all n ≥ N for some N ∈ N), we have an < α + ε; and
(ii) for infinitely many n, we have an > α − ε.
Proposition 4.9. Given a sequence (an ) in R, we have lim inf an = β ∈ R if and only if, for each
ε > 0, the following two conditions hold:
(i) for all but finitely many n (that is, for all n ≥ N for some N ∈ N), we have an > β − ε; and
(ii) for infinitely many n, we have an < β + ε.
Proposition 4.10. Given a sequence (an ) in R, (an ) → l if and only if lim inf an = lim sup an = l.
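For example, let an = (−1)n (1 + 1/n). The even-indexed terms decrease to 1 and the odd-indexed terms increase to −1, so lim sup an = 1 and lim inf an = −1; by proposition 4.10, (an ) does not converge. Note also, as in proposition 4.8, that an < 1 + ε for all but finitely many n, while an > 1 − ε for infinitely many n.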
There are many uses of lim sup and lim inf; one of these is in analysing convergent subsequences. Given a bounded sequence, the Bolzano–Weierstrass Theorem guarantees that it contains a convergent subsequence. It says nothing, however, about how many of them there are; if the sequence is convergent, then all subsequences converge to the same limit, that of the original sequence. On the other hand, a sequence which oscillates, e.g. (sin n), will have many convergent subsequences, converging to different points. We call these points limit points:
Definition 4.11. If (an ) is a sequence in R, we say that l ∈ R is a limit point of (an ) if there exists a
subsequence (ank ) with (ank ) → l as k → ∞.
If the lim sup is the “limit towards which the greatest values converge” (as Cauchy, somewhat heuris-
tically, defined it), then we would expect that the largest value to which a subsequence can converge is
the lim sup of the sequence; similarly, we expect the smallest value to be the lim inf. This is indeed the
case, and what is more, lim sup an and lim inf an are themselves limit points:
Proposition 4.12. Suppose (an ) is a bounded sequence in R. Then lim sup an and lim inf an are limit
points of (an ), and for any limit point l of (an ) we have lim inf an ≤ l ≤ lim sup an .
A general sequence may not always have a limit, but it will always have a lim sup and a lim inf. We
put this to good use in Cauchy’s nth root test:
Theorem 4.13 (Cauchy’s nth Root Test). The series Σ an is absolutely convergent if lim sup |an |1/n < 1 and divergent if lim sup |an |1/n > 1.
This leads straight away to our desired expression for the radius of convergence of a power series:
Corollary 4.14 (The Cauchy–Hadamard formula). For a power series Σ an xn , set R = 1/ lim sup |an |1/n , with the convention that R = 0 if lim sup |an |1/n = +∞ and R = ∞ if lim sup |an |1/n = 0. Then Σ an xn is absolutely convergent if |x| < R, and diverges if |x| > R.
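The Cauchy–Hadamard formula applies even when the ratio test does not. For example, let an = 2n for even n and an = 0 for odd n, so that Σ an xn = Σ∞ m=0 (2x)2m . The ratios an+1 /an are not even defined, but |an |1/n takes the values 2 (n even) and 0 (n odd), so lim sup |an |1/n = 2 and R = 1/2; indeed, the geometric series Σ (2x)2m converges exactly when |2x| < 1.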
5 The Exponential and Logarithmic Functions
Definition 5.1. The exponential function exp : R → R is defined by exp x := Σ∞ n=0 xn /n!; by the ratio test, this power series has infinite radius of convergence, so exp x is defined for every x ∈ R.
We must now, of course, show that all the familiar properties of the exponential can be retrieved from this definition. First of all, we note that exp 0 = 1, since 00 = 1 and 0n = 0 for all nonzero n; comparing exp 1 with e := Σ∞ n=0 1/n!, we see that exp 1 = e. Using theorem 4.4, we can show that exp is its own derivative:
Proposition 5.2. exp is differentiable, with exp′ (x) = exp x for all x ∈ R.
Proposition 5.3. For all x, y ∈ R, exp(x + y) = exp x exp y.
Propositions 5.2 and 5.3 yield between them, in one way or another, just about every property we seek to show of exp. By letting y = −x in proposition 5.3, we get that exp(−x) exp x = exp 0 = 1 for all real numbers x; hence exp x ≠ 0 for all x ∈ R, and exp(−x) = 1/ exp x.
Now, since exp′ (x) = exp x > 0 for all x ∈ R, using the Mean Value Theorem we get that exp is strictly increasing on R.
We now consider some common limits involving exp. First of all, it is clear from the definition of the power series that exp x → ∞ as x → ∞. So, as exp(−x) = 1/ exp x, we have that exp x → 0 as x → −∞. Secondly, we consider the ratio of any power of x and the exponential function:
Lemma 5.6. For all n ∈ N, limx→∞ xn / exp x = 0.
Theorem 5.7. exp : R → (0, ∞) is bijective, with inverse log : (0, ∞) → R that is differentiable with log′ (y) = 1/y for all y > 0.
The usual properties of the logarithm now fall out almost immediately, again using proposition 5.3:
So far we have only defined ax for rational x; here a > 0. By part (iii) of the above proposition, for
any rational x we have ax = exp(x log a). In order to extend ax to any real x, we define ax := exp(x log a)
for any real x. In the spirit of the section, we prove the following obvious properties of ax :
In particular, since exp 1 = e, we have that log e = 1, and hence that ex = exp x. We can use this
definition of ax to extend our ability to differentiate xk to all real numbers k, as well as differentiating
ax with respect to x.
Proposition 5.10. The functions f, g : (0, ∞) → R defined by f (x) = ax , a > 0, and g(x) = xk , k ∈ R, are differentiable with f ′(x) = ax log a and g′(x) = kxk−1 .
Proposition 5.12 (The Binomial Theorem for any real index). For any a ∈ R and −1 < x < 1,
(1 + x)a = Σ∞ n=0 [a(a − 1) · · · (a − n + 1)/n!] xn = 1 + ax + (a(a − 1)/2!) x2 + (a(a − 1)(a − 2)/3!) x3 + · · ·
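For example, taking a = 1/2 gives the familiar expansion of the square root: for |x| < 1,
√(1 + x) = (1 + x)1/2 = 1 + x/2 − x2 /8 + x3 /16 − · · · ,
since a(a − 1)/2! = (1/2)(−1/2)/2 = −1/8 and a(a − 1)(a − 2)/3! = (1/2)(−1/2)(−3/2)/6 = 1/16.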
6 L’Hôpital’s Rule
The Mean Value Theorem is one of the most powerful theorems in all of analysis, and yet it is relatively simple to understand. It thus comes as something of a surprise that it can lead to such powerful results as L’Hôpital’s Rule, a means of evaluating limits which would otherwise result in 0/0 by differentiating both top and bottom without affecting the limit.
The first form of L’Hôpital’s Rule is concerned with the evaluation of limits of the form limx→a f (x)/g(x) when f (a) = g(a) = 0. The original Mean Value Theorem considered the ratio (f (b) − f (a))/(b − a), where f is a differentiable function. But we now have two functions, both (presumably) differentiable; how do we relate their rates of change? The answer comes in the form of Cauchy’s Mean Value Theorem for two functions f and g.
Using this we get the first form of L’Hôpital’s Rule, for right limits:
Proposition 6.1 (L’Hôpital’s Rule for right limits, case I). Suppose f, g : [a, b] → R are continuous on [a, b] and differentiable on (a, b), and g′(x) ≠ 0 for all x ∈ (a, b). If f (a) = g(a) = 0 and limx→a+ f ′(x)/g′(x) = l for some l ∈ R ∪ {±∞}, then limx→a+ f (x)/g(x) = limx→a+ f ′(x)/g′(x) = l.
Analogously, we can use left-sided limits; we can also use two-sided limits, giving us the standard
form of L’Hôpital’s Rule:
Theorem 6.2 (L’Hôpital’s Rule, case I). Suppose f, g : [a, b] → R are differentiable on (a, b), and g′(x) ≠ 0 for all x ∈ (a, b). If for some c ∈ (a, b) we have f (c) = g(c) = 0 and limx→c f ′(x)/g′(x) = l for some l ∈ R ∪ {±∞}, then limx→c f (x)/g(x) = limx→c f ′(x)/g′(x) = l.
Example 6.3. Suppose we wish to calculate the limit limx→0 (√(x + 2) − √2)/(√(x + 1) − 1). If we let f (x) = √(x + 2) − √2 and g(x) = √(x + 1) − 1 then we see that f (0) = g(0) = 0. We compute f ′(x) = 1/(2√(x + 2)) and g′(x) = 1/(2√(x + 1)); then g′(x) ≠ 0, so
limx→0 f ′(x)/g′(x) = limx→0 (2√(x + 1))/(2√(x + 2)) = limx→0 √(x + 1)/√(x + 2) = 1/√2.
So limx→0 f ′(x)/g′(x) = 1/√2, and hence by L’Hôpital’s Rule we have limx→0 (√(x + 2) − √2)/(√(x + 1) − 1) = 1/√2.
So far, L’Hôpital’s Rule has dealt with limits of the form limx→a f (x)/g(x) in which f (x), g(x) → 0 as x → a. However, if g(x) → ∞ as x → a, then limx→a f (x)/g(x) will equal 0 whenever f (x) tends to a finite limit; but if also f (x) → ∞ as x → a, we have an indeterminate form. For this reason, we study Case II of L’Hôpital’s Rule:
Proposition 6.4 (L’Hôpital’s Rule for right limits, case II). Suppose f, g : (a, b) → R are differentiable, and that g(x) ≠ 0 and g′(x) ≠ 0 for all x ∈ (a, b). If limx→a+ g(x) = +∞ and limx→a+ f ′(x)/g′(x) = l for some l ∈ R ∪ {±∞}, then limx→a+ f (x)/g(x) = limx→a+ f ′(x)/g′(x) = l.
Analogously, we can use left-sided limits; we can also use two-sided limits:
Theorem 6.5 (L’Hôpital’s Rule, case II). Suppose f, g : [a, b] \ {c} → R are differentiable on (a, c) and (c, b), and g′(x) ≠ 0 for all x ∈ (a, b) \ {c}. If for c ∈ (a, b) we have limx→c g(x) = +∞ and limx→c f ′(x)/g′(x) = l for some l ∈ R ∪ {±∞}, then limx→c f (x)/g(x) = limx→c f ′(x)/g′(x) = l.
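As a typical application, consider limx→0+ x log x, an indeterminate form of type 0 · ∞. Writing x log x = log x/(1/x), we have g(x) = 1/x → +∞ as x → 0+ with g′(x) = −1/x2 ≠ 0, and f ′(x)/g′(x) = (1/x)/(−1/x2 ) = −x → 0; hence by proposition 6.4, limx→0+ x log x = 0.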
The evaluation of limits using L’Hôpital’s Rule is a very useful application of the Mean Value Theorem,
and is popular in exam questions.