
Functional Analysis

Lecture notes for 18.102, Spring 2020

Richard Melrose

Department of Mathematics, MIT

Version 0.9A; Revised: 29-9-2018; Run: May 16, 2020.

Contents

Preface
Introduction

Chapter 1. Normed and Banach spaces
1. Vector spaces
2. Normed spaces
3. Banach spaces
4. Operators and functionals
5. Subspaces and quotients
6. Completion
7. More examples
8. Baire’s theorem
9. Uniform boundedness
10. Open mapping theorem
11. Closed graph theorem
12. Hahn-Banach theorem
13. Double dual
14. Axioms of a vector space

Chapter 2. The Lebesgue integral
1. Integrable functions
2. Linearity of $L^1$
3. The integral on $L^1$
4. Summable series in $L^1(\mathbb{R})$
5. The space $L^1(\mathbb{R})$
6. The three integration theorems
7. Notions of convergence
8. The space $L^2(\mathbb{R})$
9. Measurable and non-measurable sets
10. Measurable functions
11. The spaces $L^p(\mathbb{R})$
12. Lebesgue measure
13. Higher dimensions

Chapter 3. Hilbert spaces
1. pre-Hilbert spaces
2. Hilbert spaces
3. Orthonormal sequences
4. Gram-Schmidt procedure
5. Orthonormal bases
6. Isomorphism to $l^2$
7. Parallelogram law
8. Convex sets and length minimizer
9. Orthocomplements and projections
10. Riesz’ theorem
11. Adjoints of bounded operators
12. Compactness and equi-small tails
13. Finite rank operators
14. Compact operators
15. Weak convergence
16. The algebra B(H)
17. Spectrum of an operator
18. Spectral theorem for compact self-adjoint operators
19. Functional Calculus
20. Spectral projection
21. Polar Decomposition
22. Compact perturbations of the identity
23. Hilbert-Schmidt, Trace and Schatten ideals
24. Fredholm operators
25. Kuiper’s theorem

Chapter 4. Differential and Integral operators
1. Fourier series
2. Toeplitz operators
3. Cauchy problem
4. Dirichlet problem on an interval
5. Harmonic oscillator
6. Fourier transform
7. Fourier inversion
8. Convolution
9. Plancherel and Parseval
10. Weak and strong derivatives
11. Sobolev spaces
12. Schwartz distributions
13. Poisson summation formula

Appendix A. Problems for Chapter 1
1. For §1

Appendix B. Problems for Chapter 4
1. Hill’s equation
2. Mehler’s formula and completeness
3. Friedrichs’ extension
4. Dirichlet problem revisited
5. Isotropic space

Appendix. Bibliography

Preface
These are notes for the course ‘Introduction to Functional Analysis’ – or in the
MIT style, 18.102 – from various years culminating in Spring 2020. There are many
people whom I should like to thank for comments on and corrections to the notes
over the years, but for the moment I would simply like to thank, as a collective,
the MIT undergraduates who have made this course a joy to teach, as a result of
their interest and enthusiasm.

Introduction
This course is intended for ‘well-prepared undergraduates’, meaning specifically
that they have a rigorous background in analysis at roughly the level of the first
half of Rudin’s book [4] – at MIT this is 18.100B. In particular the basic theory of
metric spaces is used freely. Some familiarity with linear algebra is also assumed,
but not at a very sophisticated level.
The main aim of the course in a mathematical sense is the presentation of the
standard constructions of linear functional analysis, centred on Hilbert space and
its most significant analytic realization as the Lebesgue space $L^2(\mathbb{R})$ and leading up
to the spectral theory of ordinary differential operators. In a one-semester course
at MIT it is only just possible to get this far. Beyond the core material I have
included other topics that I believe may prove useful in showing how to use
the ‘elementary’ results in various directions.

Dirichlet problem. The treatment of the eigenvalue problem with potential
perturbation on an interval is one of the aims of this course, so let me describe it
briefly here for orientation.
Let $V : [0,1] \longrightarrow \mathbb{R}$ be a real-valued continuous function. We are interested in
‘oscillating modes’ on the interval; something like this arises in quantum mechanics,
for instance. Namely we want to know about functions $u(x)$ – twice continuously
differentiable on $[0,1]$ so that things make sense – which satisfy the differential
equation
$$-\frac{d^2u}{dx^2}(x) + V(x)u(x) = \lambda u(x) \tag{1}$$
and the boundary conditions
$$u(0) = u(1) = 0.$$
Here the eigenvalue $\lambda$ is an ‘unknown’ constant. More precisely, we wish to know
which such $\lambda$’s can occur. In fact all $\lambda$’s can occur with $u \equiv 0$, but this is the ‘trivial
solution’ which will always be there for such an equation. What other solutions are
there? The main result is that there is an infinite sequence $\lambda_j \in \mathbb{R}$ of $\lambda$’s for which there
is a non-trivial solution of (1); they are all real, and no non-real complex $\lambda$’s
can occur. For each of these $\lambda_j$ the space of solutions $u_j$ of (1) is exactly
one-dimensional. We can say a lot more about everything here, but one main
aim of this course is to get at least to this point. From a physical point of view,
(1) represents a linearized oscillating string with fixed ends.
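As a purely numerical illustration of this statement (a sketch added for orientation, not the method of these notes), one can replace $-d^2/dx^2 + V$ with the boundary conditions $u(0) = u(1) = 0$ by the standard central second-difference matrix on $n$ interior grid points of $[0,1]$. That matrix is real, symmetric and tridiagonal, so its eigenvalues are real, and the sketch finds the lowest few by Sturm-sequence bisection. The grid size `n`, the function names and the choice of discretization are all assumptions of this example. For $V \equiv 0$ the eigenvalues of (1) are $(j\pi)^2$, with $u_j(x) = \sin(j\pi x)$, and the discrete values approach these as $n$ grows.

```python
import math


def sturm_count(diag, off, x):
    """Number of eigenvalues below x of the symmetric tridiagonal matrix with
    diagonal `diag` and constant off-diagonal `off`: the count of negative
    pivots q_i in the LDL^T factorization of A - x I (Sylvester's inertia)."""
    count = 0
    q = 1.0
    for i, d in enumerate(diag):
        q = (d - x) - (off * off / q if i > 0 else 0.0)
        if q == 0.0:
            q = 1e-300  # avoid dividing by an exactly zero pivot
        if q < 0.0:
            count += 1
    return count


def dirichlet_eigenvalues(V, n=200, k=5, iters=100):
    """Approximate the k lowest eigenvalues of -u'' + V u = lam u, u(0) = u(1) = 0,
    using central differences on n interior points of [0, 1]."""
    h = 1.0 / (n + 1)
    xs = [(i + 1) * h for i in range(n)]
    diag = [2.0 / h**2 + V(x) for x in xs]
    off = -1.0 / h**2
    # Gershgorin bounds bracket every eigenvalue of the matrix.
    lo = min(diag) - 2.0 * abs(off)
    hi = max(diag) + 2.0 * abs(off)
    eigs = []
    for j in range(k):
        a, b = lo, hi
        for _ in range(iters):
            mid = 0.5 * (a + b)
            if sturm_count(diag, off, mid) > j:
                b = mid  # at least j + 1 eigenvalues lie below mid
            else:
                a = mid
        eigs.append(0.5 * (a + b))
    return eigs


if __name__ == "__main__":
    approx = dirichlet_eigenvalues(lambda x: 0.0)
    for j, lam in enumerate(approx, start=1):
        print(f"{lam:12.4f}   (j*pi)^2 = {(j * math.pi) ** 2:12.4f}")
```

The lowest computed values are close to $(j\pi)^2$, with the discretization error growing with $j$; adding a constant to $V$ shifts every eigenvalue by that constant, as it should.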
The journey to a discussion of the Dirichlet problem is rather extended and
apparently wayward. The relevance of Hilbert space and the Lebesgue integral is
not immediately apparent – and indeed one can prove the results as stated above
without Hilbert space methods – but I hope this will become clear as we proceed.
It is the completeness of the eigenfunctions which uses Hilbert space.
The basic idea of functional analysis is that we consider a space of all ‘putative’
solutions to the problem at hand. In this case one might take the space of all twice
continuously differentiable functions on $[0,1]$ – we will consider such spaces below.
One of the weaknesses of this choice of space is that it is not closely connected with
the ‘energy’ invariant of a solution, which is the integral
$$\int_0^1 \left( \left|\frac{du}{dx}\right|^2 + V(x)|u(x)|^2 \right) dx. \tag{2}$$

It is the importance of such integrals which brings in the Lebesgue integral and
leads to a Hilbert space structure.
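For instance (a simple check added here, taking $V \equiv 0$ and the first Dirichlet mode $u(x) = \sin(\pi x)$ for illustration), the energy (2) and the integral of $|u|^2$ can be computed directly:

```latex
\int_0^1 \left|\frac{du}{dx}\right|^2 dx = \pi^2 \int_0^1 \cos^2(\pi x)\,dx = \frac{\pi^2}{2},
\qquad
\int_0^1 |u(x)|^2\,dx = \int_0^1 \sin^2(\pi x)\,dx = \frac{1}{2}.
```

Their ratio is $\pi^2$, which is exactly the eigenvalue $\lambda$ in (1) for this $u$, since $-u'' = \pi^2 u$; this is one hint of how integrals like (2) are tied to the eigenvalue problem.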
In any case one of the significant properties of the equation (1) is that it is
‘linear’. So we start with a brief discussion of linear (I usually say vector) spaces.
What we are dealing with here can be thought of as the eigenvalue problem for an
‘infinite matrix’. This in fact is not a very good way of thinking about operators
on infinite-dimensional spaces; they are not really like infinite matrices, but in this
case it is justified by the appearance of compact operators, which are rather more
like infinite matrices. There was a matrix approach to quantum mechanics in the
early days but it was replaced by the sort of ‘operator’ theory on Hilbert space
that we will discuss below. One of the crucial distinctions between the treatment of
finite-dimensional matrices and an infinite-dimensional setting is that in the latter,
topology is encountered. This is enshrined in the notion of a normed linear space,
which is the first important topic we shall meet.
After a brief treatment of normed and Banach spaces, the course proceeds to
the construction of the Lebesgue integral and the associated spaces of ‘Lebesgue
integrable functions’ (as you will see this is by way of a universally accepted falsehood,
but a useful one). To some extent I follow here the idea of Jan Mikusiński that one
can simply define integrable functions as the almost everywhere limits of absolutely
summable series of step functions and, more significantly, that the basic properties can
be deduced this way. While still using this basic approach I have dropped the step
functions almost completely and instead emphasize the completion of the space of
continuous functions to get the Lebesgue space. Even so, Mikusiński’s approach
still underlies the explicit identification of elements of the completion with Lebesgue
‘functions’. This approach is followed in the book of Debnath and Mikusiński [1].
After about a two-week stint of integration and then a little measure theory
the course proceeds to the more gentle ground of Hilbert spaces. Here I have been
most guided by the (old now) book of Simmons [5] which is still very much worth
reading. We proceed to a short discussion of operators and the spectral theorem for
compact self-adjoint operators. I have also included in the notes (but generally not
in the lectures) various things that a young mathematician should know(!) such
as Kuiper’s Theorem. Then in the last third or so of the semester this theory is
applied to the treatment of the Dirichlet eigenvalue problem, followed by a short
discussion of the Fourier transform and the harmonic oscillator. Finally various
loose ends are brought together, or at least that is my hope.
CHAPTER 1

Normed and Banach spaces

In this chapter we introduce the basic setting of functional analysis, in the form
of normed spaces and bounded linear operators. We are particularly interested in
complete, i.e. Banach, spaces and the process of completion of a normed space to
a Banach space. In lectures I proceed to the next chapter, on Lebesgue integration,
after Section 7 and then return to the later sections of this chapter at appropriate
points in the course.
There are many good references for this material and it is always a good idea
to get at least a couple of different views. The treatment here, whilst quite brief,
does cover what is needed later.

1. Vector spaces
You should have some familiarity with linear, or I will usually say ‘vector’,
spaces. Should I break out the axioms? Not here I think, but they are included
in Section 14 at the end of the chapter. In short it is a space V in which we can
add elements and multiply by scalars with rules quite familiar to you from the
basic examples of $\mathbb{R}^n$ or $\mathbb{C}^n$. Whilst these special cases are (very) important below,
this is not what we are interested in studying here. What we want to come to grips
with are spaces of functions, hence the name of the course.
Note that for us the ‘scalars’ are either the real numbers or the complex numbers
– usually the latter. To be neutral we denote by K either R or C, but of course
consistently. Then our set V – the set of vectors with which we will deal – comes
with two ‘laws’. These are maps
$$+ : V \times V \longrightarrow V, \quad \cdot : \mathbb{K} \times V \longrightarrow V, \tag{1.1}$$
which we denote not by $+(v,w)$ and $\cdot(s,v)$ but by $v + w$ and $sv$. Then we impose
the axioms of a vector space – see Section 14 below! These are commutative group
axioms for $+$, axioms for the action of K and the distributive law linking the two.
The basic examples:
• The field K, which is either R or C, is a vector space over itself.
• The vector spaces $\mathbb{K}^n$ consisting of ordered $n$-tuples of elements of K.
Addition is by components and the action of K is by multiplication on
all components. You should be reasonably familiar with these spaces and
other finite-dimensional vector spaces.
• Seriously non-trivial examples such as $C([0,1])$, the space of continuous
functions on $[0,1]$ (say with complex values).
In these and many other examples we will encounter below, the ‘component
addition’ corresponds to the addition of functions.


Lemma 1.1. If X is a set then the spaces of all functions
$$\mathcal{F}(X;\mathbb{R}) = \{u : X \longrightarrow \mathbb{R}\}, \quad \mathcal{F}(X;\mathbb{C}) = \{u : X \longrightarrow \mathbb{C}\} \tag{1.2}$$
are vector spaces over R and C respectively.
Non-Proof. Since I have not written out the axioms of a vector space it is
hard to check this – and I leave it to you as the first of many important exercises.
In fact, better do it more generally as in Problem 1.2 – then you can sound
sophisticated by saying ‘if V is a linear space then $\mathcal{F}(X;V)$ inherits a linear structure’.
The main point to make sure you understand is precisely this: because we do know
how to add and multiply in either R or C, we can add functions and multiply
them by constants (we can multiply functions by each other, but that is not part of
the definition of a vector space, so we ignore it for the moment since many of the
spaces of functions we consider below are not multiplicative in this sense):
$$(c_1f_1 + c_2f_2)(x) = c_1f_1(x) + c_2f_2(x) \tag{1.3}$$
defines the function $c_1f_1 + c_2f_2$ if $c_1, c_2 \in \mathbb{K}$ and $f_1, f_2 \in \mathcal{F}(X;\mathbb{K})$. □

You should also be familiar with the notions of linear subspace and quotient
space. These are discussed a little below and most of the linear spaces we will meet
are either subspaces of these function-type spaces, or quotients of such subspaces –
see Problems 1.3 and 1.5.
Although you are probably most comfortable with finite-dimensional vector
spaces it is the infinite-dimensional case that is most important here. The notion
of dimension is based on the concept of the linear independence of a subset of a
vector space. Thus a subset $E \subset V$ is said to be linearly independent if for any
finite collection of distinct elements $v_i \in E$, $i = 1,\dots,N$, and any collection of
‘constants’ $a_i \in \mathbb{K}$, $i = 1,\dots,N$, we have the following implication
$$\sum_{i=1}^N a_iv_i = 0 \Longrightarrow a_i = 0 \ \forall\, i. \tag{1.4}$$
That is, it is a set in which there are ‘no non-trivial finite linear dependence
relations between the elements’. A vector space is finite-dimensional if every linearly
independent subset is finite. It follows in this case that there is a finite and
maximal linearly independent subset – a basis – where maximal means that if any new
element is added to the set E then it is no longer linearly independent. A basic
result is that any two such ‘bases’ in a finite-dimensional vector space have the
same number of elements – an outline of the finite-dimensional theory can be found
in Problem 1.1.
Still it is time to leave this secure domain behind, since we are most interested
in the other case, namely infinite-dimensional vector spaces. As usual with such
mysterious-sounding terms as ‘infinite-dimensional’ it is defined by negation.
Definition 1.1. A vector space is infinite-dimensional if it is not finite-dimensional,
i.e. for any $N \in \mathbb{N}$ there exist $N$ elements with no non-trivial linear
dependence relation between them.
Thus the infinite-dimensional vector spaces, which you may be quite keen to
understand, appear just as the non-existence of something. That is, it is the ‘residual’
case, where there is no finite basis. This means that it is ‘big’.
So, finite-dimensional vector spaces have finite bases, infinite-dimensional vector
spaces do not. Make sure that you see the gap between these two cases, i.e.
either a vector space has a finite basis or else it has an infinite linearly
independent set. In particular, if there is a linearly independent set with N elements
for every N then there is an infinite one. There is a point here: if an independent finite
set has the property that there is no element of the space which can be added to it
so that it remains independent, then it already is a basis and any other independent
set has the same or fewer elements.
The notion of a basis in an infinite-dimensional vector space needs to be
modified to be useful analytically. Convince yourself that the vector space in Lemma 1.1
is infinite-dimensional if and only if X is infinite.¹
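For a finite set X the hint in the footnote can be checked by hand. The sketch below is only an illustration: plain Python dictionaries stand in for functions $X \longrightarrow \mathbb{C}$, and the helper names `delta` and `lin_comb` are hypothetical. It builds the point-indicator functions $\delta_y$ and recovers an arbitrary f as the finite linear combination $\sum_y f(y)\delta_y$, using the pointwise operations (1.3).

```python
def delta(y, X):
    """The function on the finite set X equal to 1 at y and 0 elsewhere."""
    return {x: (1 if x == y else 0) for x in X}


def lin_comb(coeffs, funcs, X):
    """Pointwise linear combination as in (1.3): (sum_i c_i f_i)(x) = sum_i c_i f_i(x)."""
    return {x: sum(c * f[x] for c, f in zip(coeffs, funcs)) for x in X}


if __name__ == "__main__":
    X = ["a", "b", "c"]
    basis = [delta(y, X) for y in X]
    f = {"a": 2, "b": -1 + 3j, "c": 5}
    # every f is the finite combination sum_y f(y) delta_y ...
    print(lin_comb([f[y] for y in X], basis, X) == f)
    # ... and evaluating at y returns the coefficient of delta_y, so a
    # combination of the deltas vanishes only if every coefficient is 0.
    c = [3, 0, -7]
    g = lin_comb(c, basis, X)
    print([g[y] for y in X] == c)
```

When X is infinite the same functions $\delta_y$ are still linearly independent by the evaluation argument, but there are infinitely many of them, which is the content of the exercise.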

2. Normed spaces
We need to deal effectively with infinite-dimensional vector spaces. To do so we
need the control given by a metric (or even more generally a non-metric topology,
but we will only get to that much later in this course; first things first). A norm
on a vector space leads to a metric which is ‘compatible’ with the linear structure.
Definition 1.2. A norm on a vector space is a function, traditionally denoted
$$\|\cdot\| : V \longrightarrow [0,\infty), \tag{1.5}$$
with the following properties
(Definiteness)
$$v \in V,\ \|v\| = 0 \Longrightarrow v = 0. \tag{1.6}$$
(Absolute homogeneity) For any $\lambda \in \mathbb{K}$ and $v \in V$,
$$\|\lambda v\| = |\lambda|\,\|v\|. \tag{1.7}$$
(Triangle Inequality) For any two elements $v, w \in V$
$$\|v + w\| \le \|v\| + \|w\|. \tag{1.8}$$
Note that (1.7) implies that $\|0\| = 0$. Thus (1.6) means that $\|v\| = 0$ is
equivalent to $v = 0$. This definition is based on the same properties holding for the
standard norm(s), $|z|$, on R and C. You should make sure you understand that
$$\mathbb{R} \ni x \longmapsto |x| = \begin{cases} x & \text{if } x \ge 0 \\ -x & \text{if } x \le 0 \end{cases} \in [0,\infty)$$
is a norm, as is
$$\mathbb{C} \ni z = x + iy \longmapsto |z| = (x^2 + y^2)^{\frac12}. \tag{1.9}$$
Situations do arise in which we do not have (1.6):
Definition 1.3. A function (1.5) which satisfies (1.7) and (1.8) but possibly
not (1.6) is called a seminorm.

¹Hint: For each point $y \in X$ consider the function $f : X \longrightarrow \mathbb{C}$ which takes the value 1 at y
and 0 at every other point. Show that if X is finite then any function $X \longrightarrow \mathbb{C}$ is a finite linear
combination of these, and if X is infinite then this is an infinite set with no finite linear relations
between the elements.
A metric, or distance function, on a set is a map
$$d : X \times X \longrightarrow [0,\infty) \tag{1.10}$$
satisfying three standard conditions
$$d(x,y) = 0 \Longleftrightarrow x = y, \tag{1.11}$$
$$d(x,y) = d(y,x) \ \forall\, x, y \in X \quad \text{and} \tag{1.12}$$
$$d(x,y) \le d(x,z) + d(z,y) \ \forall\, x, y, z \in X. \tag{1.13}$$
As you are no doubt aware, a set equipped with such a metric function is called a
metric space.
If you do not know about metric spaces, then you are in trouble. I suggest that
you take the appropriate course now and come back next year. You could read the
first few chapters of Rudin’s book [4] before trying to proceed much further but it
will be a struggle to say the least. The point is
Proposition 1.1. If $\|\cdot\|$ is a norm on V then
$$d(v,w) = \|v - w\| \tag{1.14}$$
is a distance on V turning it into a metric space.
Proof. Clearly (1.11) corresponds to (1.6), (1.12) arises from the special case
$\lambda = -1$ of (1.7) and (1.13) arises from (1.8). □
We will not use any special notation for the metric, nor usually mention it
explicitly – we just subsume all of metric space theory from now on. So $\|v - w\|$ is
the distance between two points in a normed space.
Now, we need to talk about a few examples; there are more in Section 7.
The most basic ones are the usual finite-dimensional spaces $\mathbb{R}^n$ and $\mathbb{C}^n$ with their
Euclidean norms
$$|x| = \Big(\sum_i |x_i|^2\Big)^{\frac12} \tag{1.15}$$
where it is at first confusing that we just use single bars for the norm, just as for
R and C, but you just need to get used to that.
There are other norms on $\mathbb{C}^n$ (I will mostly talk about the complex case, but
the real case is essentially the same). The two most obvious ones are
$$|x|_\infty = \max_i |x_i|, \quad x = (x_1,\dots,x_n) \in \mathbb{C}^n, \qquad |x|_1 = \sum_i |x_i| \tag{1.16}$$
but as you will see (if you do the problems) there are also the norms
$$|x|_p = \Big(\sum_i |x_i|^p\Big)^{\frac1p}, \quad 1 \le p < \infty. \tag{1.17}$$

In fact, for p = 1, (1.17) reduces to the second norm in (1.16) and in a certain sense
the case p = ∞ is consistent with the first norm there.
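The sense in which $p = \infty$ is ‘consistent’ with the maximum norm can be seen numerically: for fixed $x$, $|x|_p$ decreases to $|x|_\infty$ as $p \to \infty$, since $|x|_\infty \le |x|_p \le n^{1/p}|x|_\infty$. A minimal sketch (the function names are illustrative):

```python
def p_norm(x, p):
    """|x|_p = (sum_i |x_i|^p)^(1/p), as in (1.17); requires 1 <= p < infinity."""
    if p < 1:
        raise ValueError("|.|_p is a norm only for p >= 1")
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)


def max_norm(x):
    """|x|_inf = max_i |x_i|, the first norm in (1.16)."""
    return max(abs(xi) for xi in x)


if __name__ == "__main__":
    x = [3 + 4j, -1, 0.5j]  # |x_i| = 5, 1, 0.5
    for p in (1, 2, 4, 16, 64):
        print(p, p_norm(x, p))
    print("max", max_norm(x))
```

Here $|x|_1 = 6.5$, $|x|_2 = \sqrt{26.25}$, and the values for larger $p$ settle down to $|x|_\infty = 5$.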
In lectures I usually do not discuss the notion of equivalence of norms straight
away. However, two norms on the one vector space – which we can denote $\|\cdot\|_{(1)}$
and $\|\cdot\|_{(2)}$ – are equivalent if there exist constants $C_1$ and $C_2$ such that
$$\|v\|_{(1)} \le C_1\|v\|_{(2)}, \quad \|v\|_{(2)} \le C_2\|v\|_{(1)} \quad \forall\, v \in V. \tag{1.18}$$
The equivalence of the norms implies that the metrics define the same open sets –
the topologies induced are the same. You might like to check that the reverse is also
true: if two norms induce the same topologies (just meaning the same collection
of open sets) through their associated metrics, then they are equivalent in the sense
of (1.18) (there are more efficient ways of doing this if you wait a little).
Look at Problem 1.6 to see why we are not so interested in norms in the finite-
dimensional case – namely any two norms on a finite-dimensional vector space are
equivalent and so in that case a choice of norm does not tell us much, although it
certainly has its uses.
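For example, on $\mathbb{C}^n$ the norms $|\cdot|_\infty$ and $|\cdot|_1$ of (1.16) satisfy $|x|_\infty \le |x|_1 \le n|x|_\infty$, which is (1.18) with $C_1 = 1$ and $C_2 = n$. The sketch below (random test vectors and helper names are assumptions of the illustration) checks this and the sharpness of both constants; note that $C_2 = n$ depends on the dimension.

```python
import random


def norm_1(x):
    return sum(abs(xi) for xi in x)


def norm_inf(x):
    return max(abs(xi) for xi in x)


def check_equivalence(n, trials=1000, seed=0):
    """Test |x|_inf <= |x|_1 <= n |x|_inf on random x in C^n, i.e. (1.18)
    for these two norms with C_1 = 1 and C_2 = n (small slack for rounding)."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
        if not norm_inf(x) <= norm_1(x) <= n * norm_inf(x) * (1 + 1e-12):
            return False
    return True


if __name__ == "__main__":
    print(check_equivalence(5))
    # Both constants are sharp: equality at e_1 and at (1, ..., 1).
    e1, ones = [1, 0, 0, 0, 0], [1, 1, 1, 1, 1]
    print(norm_inf(e1) == norm_1(e1), norm_1(ones) == 5 * norm_inf(ones))
```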
One important class of normed spaces consists of the spaces of bounded
continuous functions on a metric space X:
$$\mathcal{C}_\infty(X) = \mathcal{C}_\infty(X;\mathbb{C}) = \{u : X \longrightarrow \mathbb{C},\ \text{continuous and bounded}\}. \tag{1.19}$$
That this is a linear space follows from the (pretty obvious) result that a linear
combination of bounded functions is bounded and the (less obvious) result that a
linear combination of continuous functions is continuous; this we are supposed to
know. The norm is the best bound
$$\|u\|_\infty = \sup_{x \in X} |u(x)|. \tag{1.20}$$

That this is a norm is straightforward to check. Absolute homogeneity is clear,
$\|\lambda u\|_\infty = |\lambda|\,\|u\|_\infty$, and $\|u\|_\infty = 0$ means that $u(x) = 0$ for all $x \in X$, which is
exactly what it means for a function to vanish. The triangle inequality ‘is inherited
from C’ since for any two functions and any point,
$$|(u + v)(x)| \le |u(x)| + |v(x)| \le \|u\|_\infty + \|v\|_\infty \tag{1.21}$$
by the definition of the norms, and taking the supremum of the left gives
$$\|u + v\|_\infty \le \|u\|_\infty + \|v\|_\infty.$$
Of course the norm (1.20) is defined even for bounded, not necessarily
continuous functions on X. Note that convergence of a sequence $u_n \in \mathcal{C}_\infty(X)$ (remember
this means with respect to the distance induced by the norm) is precisely uniform
convergence
$$\|u_n - v\|_\infty \to 0 \Longleftrightarrow u_n(x) \to v(x) \text{ uniformly on } X. \tag{1.22}$$
Other examples of infinite-dimensional normed spaces are the spaces $l^p$, $1 \le
p \le \infty$, discussed in the problems below. Of these $l^2$ is the most important for us.
It is in fact one form of Hilbert space, with which we are primarily concerned:
$$l^2 = \Big\{a : \mathbb{N} \longrightarrow \mathbb{C};\ \sum_j |a(j)|^2 < \infty\Big\}. \tag{1.23}$$
It is not immediately obvious that this is a linear space, nor that
$$\|a\|_2 = \Big(\sum_j |a(j)|^2\Big)^{\frac12} \tag{1.24}$$
is a norm. It is. From now on we will generally use sequential notation and think
of a map from $\mathbb{N}$ to $\mathbb{C}$ as a sequence, so setting $a(j) = a_j$. Thus the ‘Hilbert space’
$l^2$ consists of the square summable sequences.
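Membership in $l^2$ is a statement about the partial sums of $|a_j|^2$ staying bounded. The sketch below (with illustrative names) computes the truncated norms (1.24) for $a_j = 1/j$, which lies in $l^2$ with $\|a\|_2 = \pi/\sqrt{6}$ by the classical identity $\sum_j 1/j^2 = \pi^2/6$, and for $a_j = 1/\sqrt{j}$, whose truncated norms grow without bound (like $(\log J)^{1/2}$), so that sequence is not in $l^2$.

```python
import math


def l2_partial_norm(a, J):
    """(sum_{j=1}^J |a_j|^2)^(1/2): the norm (1.24) of the first J terms."""
    return math.sqrt(sum(abs(a(j)) ** 2 for j in range(1, J + 1)))


def harmonic(j):
    return 1.0 / j  # in l^2, since sum 1/j^2 = pi^2/6


def inv_sqrt(j):
    return 1.0 / math.sqrt(j)  # not in l^2, since sum 1/j diverges


if __name__ == "__main__":
    for J in (10, 1000, 100000):
        print(J, l2_partial_norm(harmonic, J), l2_partial_norm(inv_sqrt, J))
    print("||1/j||_2 =", math.pi / math.sqrt(6))
```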
3. Banach spaces
You are supposed to remember from metric space theory that there are three
crucial properties, completeness, compactness and connectedness. It turns out that
normed spaces are always connected, so that is not very interesting, and they are
never compact (unless you consider the trivial case V = {0}) so that is not very
interesting either – in fact we will ultimately be very interested in compact subsets.
So that leaves completeness. This is so important that we give it a special name in
honour of Stefan Banach who first emphasized this property.
Definition 1.4. A normed space which is complete with respect to the induced
metric is a Banach space.
Lemma 1.2. The space $\mathcal{C}_\infty(X)$, defined in (1.19) for any metric space X, is a
Banach space.
Proof. This is a standard result from metric space theory – basically that the
uniform limit of a sequence of (bounded) continuous functions on a metric space is
continuous. However, it is worth recalling how one proves completeness at least in
outline. Suppose $u_n$ is a Cauchy sequence in $\mathcal{C}_\infty(X)$. This means that given $\delta > 0$
there exists N such that
$$n, m > N \Longrightarrow \|u_n - u_m\|_\infty = \sup_{x \in X} |u_n(x) - u_m(x)| < \delta. \tag{1.25}$$
Fixing $x \in X$ this implies that the sequence $u_n(x)$ is Cauchy in $\mathbb{C}$. We know that
this space is complete, so each sequence $u_n(x)$ must converge (we say the sequence
of functions converges pointwise). Since the limit of $u_n(x)$ can only depend on x,
we may define $u(x) = \lim_n u_n(x)$ in $\mathbb{C}$ for each $x \in X$ and so define a function
$u : X \longrightarrow \mathbb{C}$. Now, we need to show that this is bounded and continuous and is the
limit of $u_n$ with respect to the norm. Any Cauchy sequence is bounded in norm –
take $\delta = 1$ in (1.25) and it follows from the triangle inequality that
$$\|u_m\|_\infty \le \|u_{N+1}\|_\infty + 1, \quad m > N \tag{1.26}$$
and the finite set $\|u_n\|_\infty$ for $n \le N$ is certainly bounded. Thus $\|u_n\|_\infty \le C$, but this
means $|u_n(x)| \le C$ for all $x \in X$ and hence $|u(x)| \le C$ by properties of convergence
in $\mathbb{C}$ and thus $\|u\|_\infty \le C$, so the limit is bounded.
The uniform convergence of $u_n$ to u now follows from (1.25) since we may pass
to the limit in the inequality to find
$$n > N \Longrightarrow |u_n(x) - u(x)| = \lim_{m \to \infty} |u_n(x) - u_m(x)| \le \delta \Longrightarrow \|u_n - u\|_\infty \le \delta. \tag{1.27}$$
The continuity of u at $x \in X$ follows from the triangle inequality in the form
$$|u(y) - u(x)| \le |u(y) - u_n(y)| + |u_n(y) - u_n(x)| + |u_n(x) - u(x)| \le 2\|u - u_n\|_\infty + |u_n(x) - u_n(y)|.$$
Given $\delta > 0$ the first term on the far right can be made less than $\delta/2$ by choosing
n large using (1.27) and then, having chosen n, the second term can be made less
than $\delta/2$ by choosing $d(x,y)$ small enough, using the continuity of $u_n$. □
I have written out this proof (succinctly) because this general structure arises
often below – first find a candidate for the limit and then show it has the properties
that are required.
There is a space of sequences which is really an example of this Lemma.
Consider the space $c_0$ consisting of all the sequences $\{a_j\}$ (valued in $\mathbb{C}$) such
that $\lim_{j\to\infty} a_j = 0$. As remarked above, sequences are just functions $\mathbb{N} \longrightarrow \mathbb{C}$.
If we make $\{a_j\}$ into a function $\alpha : D = \{1, 1/2, 1/3, \dots\} \longrightarrow \mathbb{C}$ by setting
$\alpha(1/j) = a_j$ then we get a function on the metric space D. Add 0 to D to get
$\overline{D} = D \cup \{0\} \subset [0,1] \subset \mathbb{R}$; clearly 0 is a limit point of D and $\overline{D}$ is, as the
notation dangerously indicates, the closure of D in $\mathbb{R}$. Now, you will easily check (it is
really the definition) that $\alpha : D \longrightarrow \mathbb{C}$ corresponding to a sequence extends to a
continuous function on $\overline{D}$ vanishing at 0 if and only if $\lim_{j\to\infty} a_j = 0$, which is to
say, $\{a_j\} \in c_0$. Thus it follows, with a little thought which you should give it, that
$c_0$ is a Banach space with the norm
$$\|a\|_\infty = \sup_j |a_j|. \tag{1.28}$$

What is an example of a non-complete normed space, a normed space which is
not a Banach space? These are legion of course. The simplest way to get one is to
‘put the wrong norm’ on a space, one which does not correspond to the definition.
Consider for instance the linear space T of sequences $\mathbb{N} \longrightarrow \mathbb{C}$ which ‘terminate’,
i.e. each element $\{a_j\} \in T$ has $a_j = 0$ for $j > J$, where of course the J may depend
on the particular sequence. Then $T \subset c_0$, the norm on $c_0$ defines a norm on T but
it cannot be complete, since the closure of T is easily seen to be all of $c_0$ – so there
are Cauchy sequences in T without limit in T. Make sure you are not lost here –
you need to get used to the fact that we often need to discuss the ‘convergence of
sequences of convergent sequences’ as here.
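A concrete Cauchy sequence in T without a limit in T is given by the truncations of $a_j = 1/j$. For $J < K$ the truncations at J and K differ only for $J < j \le K$, so their distance in (1.28) is $1/(J+1)$, and the same number is the distance from the J-th truncation to $a$ itself; the limit $a \in c_0$ never terminates. A small sketch (function names are illustrative; the supremum is taken over a finite horizon, which is exact here because the largest difference occurs at $j = J + 1$):

```python
def a(j):
    """The sequence a_j = 1/j: it tends to 0, so lies in c0, but never terminates."""
    return 1.0 / j


def truncation(J):
    """The terminating sequence equal to a_j for j <= J and to 0 for j > J."""
    def t(j):
        return a(j) if j <= J else 0.0
    return t


def sup_dist(u, v, horizon):
    """max over 1 <= j <= horizon of |u_j - v_j|, a finite stand-in for (1.28)."""
    return max(abs(u(j) - v(j)) for j in range(1, horizon + 1))


if __name__ == "__main__":
    for J, K in [(10, 100), (100, 1000), (1000, 10000)]:
        # both distances equal 1/(J + 1), attained at j = J + 1 <= horizon
        print(J, K, sup_dist(truncation(J), truncation(K), 2 * K),
              sup_dist(truncation(J), a, 2 * K))
```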
One result we will exploit below, and I give it now just as preparation, concerns
absolutely summable series. Recall that a series is just a sequence where we ‘think’
about adding the terms. Thus if $v_n$ is a sequence in some vector space V then there
is the corresponding sequence of partial sums $w_N = \sum_{i=1}^N v_i$. I will say that $\{v_n\}$ is a
series if I am thinking about summing it.
Definition 1.5. A series $\{v_n\}$ with partial sums $\{w_N\}$ is said to be absolutely
summable if
$$\sum_n \|v_n\|_V < \infty, \quad \text{i.e.} \quad \sum_{N > 1} \|w_N - w_{N-1}\|_V < \infty. \tag{1.29}$$

Proposition 1.2. The sequence of partial sums of any absolutely summable
series in a normed space is Cauchy and a normed space is complete if and only
if every absolutely summable series in it converges, meaning that the sequence of
partial sums converges.
Proof. The sequence of partial sums is
$$w_n = \sum_{j=1}^n v_j. \tag{1.30}$$
Thus, if $m > n$ then
$$w_m - w_n = \sum_{j=n+1}^m v_j. \tag{1.31}$$
It follows from the triangle inequality that
$$\|w_n - w_m\|_V \le \sum_{j=n+1}^m \|v_j\|_V. \tag{1.32}$$
So if the series is absolutely summable then
$$\sum_{j=1}^\infty \|v_j\|_V < \infty \quad \text{and} \quad \lim_{n\to\infty} \sum_{j=n+1}^\infty \|v_j\|_V = 0.$$
Thus $\{w_n\}$ is Cauchy if $\{v_j\}$ is absolutely summable. Hence if V is complete then
every absolutely summable series is summable, i.e. the sequence of partial sums
converges.
Conversely, suppose that every absolutely summable series converges in this
sense. Then we need to show that every Cauchy sequence in V converges. Let
$u_n$ be a Cauchy sequence. It suffices to show that this has a subsequence which
converges, since a Cauchy sequence with a convergent subsequence is convergent.
To do so we just proceed inductively. Using the Cauchy condition we can for every
k find an integer $N_k$ such that
$$n, m > N_k \Longrightarrow \|u_n - u_m\| < 2^{-k}. \tag{1.33}$$
Now choose an increasing sequence $n_k$ where $n_k > N_k$ and $n_k > n_{k-1}$ to make it
increasing. Since $n_k$ and $n_{k-1}$ both exceed $N_{k-1}$, it follows that
$$\|u_{n_k} - u_{n_{k-1}}\| \le 2^{-k+1}. \tag{1.34}$$
Denoting this subsequence as $u'_k = u_{n_k}$ it follows from (1.34) and the triangle
inequality that
$$\sum_{n=2}^\infty \|u'_n - u'_{n-1}\| \le 4 \tag{1.35}$$
so the sequence $v_1 = u'_1$, $v_k = u'_k - u'_{k-1}$, $k > 1$, is absolutely summable. Its
sequence of partial sums is $w_j = u'_j$ so the assumption is that this converges, hence
the original Cauchy sequence converges and V is complete. □
Notice the idea here, of ‘speeding up the convergence’ of the Cauchy sequence
by dropping a lot of terms. We will use this idea of absolutely summable series
heavily in the discussion of Lebesgue integration.
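The ‘speeding up’ step can be watched on a concrete Cauchy sequence. The sketch below is an illustration under stated assumptions: it takes $V = \mathbb{R}$ and the partial sums $u_n$ of the alternating harmonic series (which converge to $\log 2$, but not absolutely), uses the elementary estimate $|u_n - u_m| \le 1/(\min(n,m) + 1)$ to take $N_k = 2^k$ in (1.33), and checks that the subsequence $u_{n_k}$ with $n_k = 2^k + 1$ obeys (1.34), so that its differences form an absolutely summable series as in (1.35). The function names are hypothetical.

```python
import math


def partial_sum(n):
    """u_n = sum_{j=1}^n (-1)^(j+1) / j, a Cauchy sequence in R (its limit is log 2)."""
    return sum((-1) ** (j + 1) / j for j in range(1, n + 1))


def fast_subsequence(K):
    """Indices n_k = 2**k + 1 > N_k = 2**k, k = 1..K; N_k works in (1.33) because
    |u_n - u_m| <= 1/(min(n, m) + 1) for this alternating series.
    The indices are already strictly increasing."""
    return [2 ** k + 1 for k in range(1, K + 1)]


if __name__ == "__main__":
    K = 12
    ns = fast_subsequence(K)
    sub = [partial_sum(n) for n in ns]  # u'_k = u_{n_k}
    # |v_k| = |u'_k - u'_{k-1}| for k >= 2, bounded by 2**(-k + 1) as in (1.34)
    diffs = [abs(sub[k] - sub[k - 1]) for k in range(1, K)]
    print("sum of |v_k|, k >= 2:", sum(diffs), "(bound in (1.35): 4)")
    print("u'_K =", sub[-1], "  log 2 =", math.log(2))
```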

4. Operators and functionals


The vector spaces we are most interested in are, as already remarked, spaces of
functions (or something a little more general). The elements of these are the objects
of primary interest, but we are especially interested in the way they are related by
linear maps. The sorts of maps we have in mind here are differential and integral
operators. For example the indefinite Riemann integral of a continuous function
$f : [0,1] \longrightarrow \mathbb{C}$ is also a continuous function of the upper limit:
$$I(f)(x) = \int_0^x f(s)\,ds. \tag{1.36}$$
So $I : C([0,1]) \longrightarrow C([0,1])$ is an ‘operator’ which turns one continuous function
into another. You might want to bear such an example in mind as you go through
this section.
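On a computer one can only sample functions, so the sketch below (an illustration, not part of the notes) approximates $I$ by the cumulative trapezoid rule on a uniform grid of $[0,1]$; the grid and the function name are assumptions. The discretized map is itself linear in the samples, and for $f = \cos$ it reproduces $I(f)(x) = \sin x$ up to the $O(h^2)$ quadrature error.

```python
import math


def indefinite_integral(f_vals, h):
    """Cumulative trapezoid rule: approximates I(f)(x_k) = int_0^{x_k} f(s) ds on
    the uniform grid x_k = k*h from the samples f_vals[k] = f(x_k)."""
    out = [0.0]
    for k in range(1, len(f_vals)):
        out.append(out[-1] + 0.5 * h * (f_vals[k - 1] + f_vals[k]))
    return out


if __name__ == "__main__":
    n = 1000
    h = 1.0 / n
    xs = [k * h for k in range(n + 1)]
    F = indefinite_integral([math.cos(x) for x in xs], h)
    # I(cos)(x) = sin(x); the trapezoid error is O(h^2)
    print("max error:", max(abs(Fk - math.sin(x)) for Fk, x in zip(F, xs)))
```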
A map between two vector spaces (over the same field, for us either R or C) is
linear if it takes linear combinations to linear combinations:
$$T : V \longrightarrow W, \quad T(a_1v_1 + a_2v_2) = a_1T(v_1) + a_2T(v_2), \ \forall\, v_1, v_2 \in V,\ a_1, a_2 \in \mathbb{K}. \tag{1.37}$$
In the finite-dimensional case linearity is enough to allow maps to be studied.
However in the case of infinite-dimensional normed spaces we will require
continuity, which is automatic in finite dimensions. It makes perfectly good sense to
say, demand or conclude, that a map as in (1.37) is continuous if V and W are
normed spaces since they are then metric spaces. Recall that for metric spaces
there are several different equivalent conditions that ensure a map, $T : V \longrightarrow W$,
is continuous:
$$v_n \to v \text{ in } V \Longrightarrow Tv_n \to Tv \text{ in } W \tag{1.38}$$
$$O \subset W \text{ open} \Longrightarrow T^{-1}(O) \subset V \text{ open} \tag{1.39}$$
$$C \subset W \text{ closed} \Longrightarrow T^{-1}(C) \subset V \text{ closed.} \tag{1.40}$$
For a linear map between normed spaces there is a direct characterization of
continuity in terms of the norm.
Proposition 1.3. A linear map (1.37) between normed spaces is continuous if
and only if it is bounded in the sense that there exists a constant C such that
$$\|Tv\|_W \le C\|v\|_V \quad \forall\, v \in V. \tag{1.41}$$
Of course bounded for a function on a metric space already has a meaning and this
is not it! The usual sense would be $\|Tv\| \le C$ for all v, but this would imply $\|T(av)\| =
|a|\,\|Tv\| \le C$ for all $a \in \mathbb{K}$, so $Tv = 0$. Hence it is not so dangerous to use the term ‘bounded’ for
(1.41) – it is really ‘relatively bounded’, i.e. takes bounded sets into bounded sets.
From now on, bounded for a linear map means (1.41).
Proof. If (1.41) holds then if $v_n \to v$ in V it follows that $\|Tv - Tv_n\| =
\|T(v - v_n)\| \le C\|v - v_n\| \to 0$ as $n \to \infty$, so $Tv_n \to Tv$ and continuity follows.
For the reverse implication we use the second characterization of continuity
above. Denote the ball around $v \in V$ of radius $\epsilon > 0$ by
$$B_V(v,\epsilon) = \{w \in V;\ \|v - w\| < \epsilon\}.$$
Thus if T is continuous then the inverse image of the unit ball around the
origin, $T^{-1}(B_W(0,1)) = \{v \in V;\ \|Tv\|_W < 1\}$, contains the origin in V and so,
being open, must contain some $B_V(0,\epsilon)$. This means that
$$T(B_V(0,\epsilon)) \subset B_W(0,1) \quad \text{so} \quad \|v\|_V < \epsilon \Longrightarrow \|Tv\|_W \le 1. \tag{1.42}$$
Now proceed by scaling. If $0 \ne v \in V$ then $\|v'\| < \epsilon$ where $v' = \epsilon v/(2\|v\|)$. So (1.42)
shows that $\|Tv'\| \le 1$, but this implies (1.41) with $C = 2/\epsilon$ – it is trivially true if
$v = 0$. □
Note that a bounded linear map is in fact uniformly continuous – given $\delta > 0$
there exists $\epsilon > 0$ such that
$$\|v - w\|_V = d_V(v,w) < \epsilon \Longrightarrow \|Tv - Tw\|_W = d_W(Tv,Tw) < \delta, \tag{1.43}$$
namely $\epsilon = \delta/C$. One consequence of this is that a linear map $T : U \longrightarrow W$ into a
Banach space W, defined and continuous on a linear subspace $U \subset V$ (with respect
to the restriction of the norm from V to U), extends uniquely to a continuous map
$\overline{T} : \overline{U} \longrightarrow W$ on the closure of U.
As a general rule we drop the distinguishing subscript for norms, since which
norm we are using can be determined by what it is being applied to.
So, if $T : V \longrightarrow W$ is continuous and linear between normed spaces, or from
now on ‘bounded’, then
$$\|T\| = \sup_{\|v\|=1} \|Tv\| < \infty. \tag{1.44}$$

Lemma 1.3. The bounded linear maps between normed spaces $V$ and $W$ form
a linear space $B(V,W)$ on which $\|T\|$ defined by (1.44), or equivalently
$$\|T\| = \inf\{C;\ (1.41) \text{ holds}\}, \tag{1.45}$$
is a norm.
Proof. First check that (1.44) is equivalent to (1.45). Define $\|T\|$ by (1.44).
Then for any $v \in V$, $v \ne 0$,
$$\|T\| \ge \Big\|T\Big(\frac{v}{\|v\|}\Big)\Big\| = \frac{\|Tv\|}{\|v\|} \Longrightarrow \|Tv\| \le \|T\|\,\|v\|, \tag{1.46}$$
and as always this is trivially true for $v = 0$. Thus $C = \|T\|$ is a constant for which
(1.41) holds.
Conversely, from the definition of $\|T\|$, if $\epsilon > 0$ then there exists $v \in V$ with
$\|v\| = 1$ such that $\|T\| - \epsilon < \|Tv\| \le C$ for any $C$ for which (1.41) holds. Since
$\epsilon > 0$ is arbitrary, $\|T\| \le C$ and hence $\|T\|$ is given by (1.45).
From the definition of $\|T\|$, $\|T\| = 0$ implies $Tv = 0$ for all $v \in V$, and for $\lambda \ne 0$,
$$\|\lambda T\| = \sup_{\|v\|=1} \|\lambda Tv\| = |\lambda|\,\|T\|, \tag{1.47}$$
and this is also obvious for $\lambda = 0$. This only leaves the triangle inequality to check:
for any $T, S \in B(V,W)$ and $v \in V$ with $\|v\| = 1$,
$$\|(T+S)v\|_W = \|Tv + Sv\|_W \le \|Tv\|_W + \|Sv\|_W \le \|T\| + \|S\|, \tag{1.48}$$
so taking the supremum, $\|T + S\| \le \|T\| + \|S\|$. □
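The norm properties (1.47) and (1.48) can be spot-checked in the same hypothetical setting: matrices acting on $\mathbb{R}^n$ with the supremum norm, with operator norm given by the maximal absolute row sum. Random testing of this kind is no substitute for the proof. The final example shows that the triangle inequality can be strict.

```python
import random

def op_norm_sup(A):
    """Operator norm of A on (R^n, sup norm): the maximal absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)

def add(A, B):
    """Entrywise sum of two matrices (lists of rows)."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(lam, A):
    """Scalar multiple lam * A."""
    return [[lam * a for a in row] for row in A]

random.seed(1)
for _ in range(500):
    T = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
    S = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
    lam = random.uniform(-5, 5)
    # triangle inequality, as in (1.48)
    assert op_norm_sup(add(T, S)) <= op_norm_sup(T) + op_norm_sup(S) + 1e-12
    # homogeneity, as in (1.47)
    assert abs(op_norm_sup(scale(lam, T)) - abs(lam) * op_norm_sup(T)) < 1e-9

# The triangle inequality can be strict:
T = [[1.0, 0.0], [0.0, 0.0]]
S = [[0.0, 0.0], [0.0, 1.0]]
print(op_norm_sup(T), op_norm_sup(S), op_norm_sup(add(T, S)))  # 1.0 1.0 1.0
```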

Thus we see the very satisfying fact that the space of bounded linear maps
between two normed spaces is itself a normed space, with the norm being the best
constant in the estimate (1.41). Make sure you absorb this! Such bounded linear
maps between normed spaces are often called ‘operators’ because we are thinking
of the normed spaces as being like function spaces.
You might like to check boundedness for the example, $I$, of a linear operator
in (1.36), namely that in terms of the supremum norm on $C([0,1])$, $\|I\| \le 1$.
One particularly important case is when $W = \mathbb{K}$ is the field, for us usually $\mathbb{C}$.
Then a simpler notation is handy and one sets $V' = B(V, \mathbb{C})$; this is called the
dual space of $V$ (also sometimes denoted $V^*$).
Proposition 1.4. If $W$ is a Banach space then $B(V,W)$, with the norm (1.44),
is a Banach space.
Proof. We simply need to show that if $W$ is a Banach space then every Cauchy
sequence in $B(V,W)$ is convergent. The first thing to do is to find the limit. To
say that $T_n \in B(V,W)$ is Cauchy is just to say that given $\epsilon > 0$ there exists $N$
such that $n, m > N$ implies $\|T_n - T_m\| < \epsilon$. By the definition of the norm, if $v \in V$
then $\|T_nv - T_mv\|_W \le \|T_n - T_m\|\,\|v\|_V$, so $T_nv$ is Cauchy in $W$ for each $v \in V$. By
assumption, $W$ is complete, so
$$T_nv \longrightarrow w \text{ in } W. \tag{1.49}$$
However, the limit can only depend on $v$, so we can define a map $T : V \longrightarrow W$ by
$Tv = w = \lim_{n\to\infty} T_nv$ as in (1.49).
This map defined from the limits is linear, since $T_n(\lambda v) = \lambda T_nv \longrightarrow \lambda Tv$ and
$T_n(v_1 + v_2) = T_nv_1 + T_nv_2 \longrightarrow Tv_1 + Tv_2 = T(v_1 + v_2)$. Moreover, $\big|\|T_n\| - \|T_m\|\big| \le
\|T_n - T_m\|$, so $\|T_n\|$ is Cauchy in $[0,\infty)$ and hence converges, with limit $S$, and
$$\|Tv\| = \lim_{n\to\infty} \|T_nv\| \le S\|v\|, \tag{1.50}$$
so $\|T\| \le S$ shows that $T$ is bounded.
Returning to the Cauchy condition above and passing to the limit in $\|T_nv -
T_mv\| \le \epsilon\|v\|$ as $m \to \infty$ shows that $\|T_n - T\| \le \epsilon$ if $n > N$, and hence $T_n \to T$ in
$B(V,W)$, which is therefore complete. □
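Completeness of $B(V,W)$ is what makes series of operators usable. As an illustration, assume $V = W = \mathbb{R}^2$ with the supremum norm and a matrix $A$ with $\|A\| < 1$. The partial sums $T_n = \sum_{k=0}^n A^k$ then form a Cauchy sequence in operator norm, since $\|T_m - T_n\| \le \sum_{k>n}\|A\|^k$, and the limit inverts $\mathrm{Id} - A$. This geometric (Neumann) series is a standard example brought in here for illustration; it is not taken from the text, and the helper names are hypothetical.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def matadd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def matsub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def op_norm_sup(A):
    """Operator norm for the sup norm: maximal absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)

I = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.2, 0.3], [0.1, 0.4]]  # ||A|| = 0.5 < 1

partial = []  # partial[n] = T_n = I + A + ... + A^n
T, P = I, I   # running sum and running power A^k
for n in range(40):
    partial.append(T)
    P = matmul(P, A)
    T = matadd(T, P)

# Cauchy in operator norm: ||T_m - T_n|| <= sum_{k>n} 0.5^k = 0.5^(n+1) / (1 - 0.5)
n, m = 10, 30
gap = op_norm_sup(matsub(partial[m], partial[n]))
assert gap <= 0.5 ** (n + 1) / (1 - 0.5)

# The limit satisfies (I - A) T = I, up to ||A^40|| <= 0.5^40.
residual = op_norm_sup(matsub(matmul(matsub(I, A), partial[-1]), I))
print(residual < 1e-10)  # True
```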
Note that this proof is structurally the same as that of Lemma 1.2.
One simple consequence of this is:-
Corollary 1.1. The dual space of a normed space is always a Banach space.
However you should be a little suspicious here since we have not shown that
the dual space $V'$ is non-trivial, meaning we have not eliminated the possibility
that $V' = \{0\}$ even when $V \ne \{0\}$. The Hahn-Banach Theorem, discussed below,
takes care of this.
One game you can play is ‘what is the dual of that space?’. Of course the dual
is the dual, but you may well be able to identify the dual space of $V$ with some
other Banach space $W$ by finding a linear bijection between $V'$ and $W$ which
identifies the norms as well. We will play this game a bit later.

5. Subspaces and quotients


The notion of a linear subspace of a vector space is natural enough, and you
are likely quite familiar with it. Namely $W \subset V$, where $V$ is a vector space, is a
(linear) subspace if $\lambda_1 w_1 + \lambda_2 w_2 \in W$ whenever $\lambda_1, \lambda_2 \in \mathbb{K}$ and
$w_1, w_2 \in W$. Thus $W$ ‘inherits’ its linear structure from $V$. Since we also have a
topology from the metric we will be especially interested in closed subspaces. Check
that you understand the (elementary) proof of
Lemma 1.4. A subspace of a Banach space is a Banach space in terms of the
restriction of the norm if and only if it is closed.
There is a second very important way to construct new linear spaces from old.
Namely we want to make a linear space out of ‘the rest’ of V, given that W is
a linear subspace. In finite dimensions one way to do this is to give V an inner
product and then take the subspace orthogonal to W. One problem with this is that
the result depends, although not in an essential way, on the inner product. Instead
we adopt the usual ‘myopia’ approach and take an equivalence relation on V which
identifies points which differ by an element of W. The equivalence classes are then
‘planes parallel to $W$’. I am going through this construction quickly here under
the assumption that it is familiar to most of you; if not, you should think about it
carefully since we need to do it several times later.
So, if $W \subset V$ is a linear subspace of $V$ we define a relation on $V$ (remember
this is just a subset of $V \times V$ with certain properties) by
$$v \sim_W v' \Longleftrightarrow v - v' \in W \Longleftrightarrow \exists\, w \in W \text{ s.t. } v = v' + w. \tag{1.51}$$
This satisfies the three conditions for an equivalence relation:
(1) $v \sim_W v$
(2) $v \sim_W v' \Longleftrightarrow v' \sim_W v$
(3) $v \sim_W v',\ v' \sim_W v'' \Longrightarrow v \sim_W v''$
which means that we can regard it as a ‘coarser notion of equality.’
Then $V/W$ is the set of equivalence classes with respect to $\sim_W$. You can think
of the elements of $V/W$ as being of the form $v + W$: a particular element of $V$
plus an arbitrary element of $W$. Then of course $v' \in v + W$ if and only if $v' - v \in W$,
meaning $v \sim_W v'$.
The crucial point here is that
$$V/W \text{ is a vector space.} \tag{1.52}$$
You should check the details; see Problem 1.5. Note that the ‘is’ in (1.52) should
really be expanded to ‘is in a natural way’ since, as usual, the linear structure is
inherited from $V$:
$$\lambda(v + W) = \lambda v + W, \quad (v_1 + W) + (v_2 + W) = (v_1 + v_2) + W. \tag{1.53}$$
The subspace $W$ appears as the origin in $V/W$.
Now, two cases of this are of special interest to us.
Proposition 1.5. If $\|\cdot\|$ is a seminorm on $V$ then
$$E = \{v \in V;\ \|v\| = 0\} \subset V \tag{1.54}$$
is a linear subspace and
$$\|v + E\|_{V/E} = \|v\| \tag{1.55}$$
defines a norm on $V/E$.
Proof. That $E$ is linear follows from the properties of a seminorm, since
$\|\lambda v\| = |\lambda|\,\|v\|$ shows that $\lambda v \in E$ if $v \in E$ and $\lambda \in \mathbb{K}$. Similarly the triangle
inequality shows that $v_1 + v_2 \in E$ if $v_1, v_2 \in E$.
To check that (1.55) defines a norm, first we need to check that it makes sense
as a function $\|\cdot\|_{V/E} : V/E \longrightarrow [0,\infty)$. This amounts to the statement that $\|v'\|$ is the
same for all elements $v' = v + e \in v + E$ for a fixed $v$. This however follows from
the triangle inequality applied twice:
$$\|v'\| \le \|v\| + \|e\| = \|v\| \le \|v'\| + \|-e\| = \|v'\|. \tag{1.56}$$
Now, I leave you the exercise of checking that $\|\cdot\|_{V/E}$ is a norm, see Problem ??. □
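For the exercise just mentioned, here is one way the three checks might go. It uses only (1.55) and the seminorm properties of $\|\cdot\|$, and it is a sketch rather than the official solution to the Problem:

```latex
\begin{aligned}
\|v+E\|_{V/E} = 0 &\implies \|v\| = 0 \implies v \in E \implies v + E = 0 + E,\\
\|\lambda(v+E)\|_{V/E} &= \|\lambda v\| = |\lambda|\,\|v\| = |\lambda|\,\|v+E\|_{V/E},\\
\|(v_1+E)+(v_2+E)\|_{V/E} &= \|v_1+v_2\| \le \|v_1\| + \|v_2\| = \|v_1+E\|_{V/E} + \|v_2+E\|_{V/E}.
\end{aligned}
```

The first line is the only place where passing to the quotient matters. The other two are inherited directly from $V$ via the linear structure (1.53).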
The second application is more serious, but in fact we will not use it for some
time so I usually do not do this in lectures at this stage.
Proposition 1.6. If $W \subset V$ is a closed subspace of a normed space then
$$\|v + W\|_{V/W} = \inf_{w \in W} \|v + w\|_V \tag{1.57}$$
defines a norm on $V/W$; if $V$ is a Banach space then so is $V/W$.
For the proof see Problems ?? and ??.
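In the simplest case the infimum in (1.57) is a distance to a subspace. The sketch below assumes $V = \mathbb{R}^n$ with the Euclidean norm and $W$ finite-dimensional, so automatically closed. Then $\inf_{w \in W}\|v + w\|$ is the length of the component of $v$ orthogonal to $W$, computed here by Gram-Schmidt. The function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: an orthonormal basis of span(vectors) in Euclidean R^n."""
    basis = []
    for w in vectors:
        r = list(w)
        for e in basis:
            c = dot(r, e)
            r = [ri - c * ei for ri, ei in zip(r, e)]
        length = math.sqrt(dot(r, r))
        if length > tol:  # skip vectors already in the span
            basis.append([ri / length for ri in r])
    return basis

def quotient_norm(v, spanning_set):
    """inf over w in W = span(spanning_set) of ||v + w||_2, i.e. the distance from v to W."""
    r = list(v)
    for e in orthonormal_basis(spanning_set):
        c = dot(r, e)
        r = [ri - c * ei for ri, ei in zip(r, e)]
    return math.sqrt(dot(r, r))

W = [[1.0, 0.0, 0.0]]
v = [1.0, 2.0, 3.0]
print(quotient_norm(v, W))                # sqrt(13), about 3.6056
# The value only depends on the coset v + W:
print(quotient_norm([6.0, 2.0, 3.0], W))  # the same value
```

Closedness of $W$ is what guarantees that the quotient norm vanishes only on $W$ itself. In finite dimensions it comes for free.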
6. Completion
A normed space not being complete, not being a Banach space, is considered
to be a defect which we might, indeed will, wish to rectify.
Let $V$ be a normed space with norm $\|\cdot\|_V$. A completion of $V$ is a Banach
space $B$ with the following properties:-
(1) There is an injective (i.e. 1-1) linear map $I : V \longrightarrow B$.
(2) The norms satisfy
$$\|I(v)\|_B = \|v\|_V \quad \forall\, v \in V. \tag{1.58}$$
(3) The range $I(V) \subset B$ is dense in $B$.
Notice that if V is itself a Banach space then we can take B = V with I the
identity map.
So, the main result is:
Theorem 1.1. Each normed space has a completion.
There are several ways to prove this; we will come across a more sophisticated
one (using the Hahn-Banach Theorem) later. In the meantime I will describe two
proofs. In the first the fact that any metric space has a completion in a similar
sense is recalled and then it is shown that the linear structure extends to the
completion. A second, ‘hands-on’, proof is also outlined with the idea of motivating
the construction of the Lebesgue integral – which is in our near future.
Proof 1. One of the neater proofs that any metric space has a completion is
to use Lemma 1.2. Pick a point in the metric space of interest, $p \in M$, and then
define a map
$$M \ni q \longmapsto f_q \in C_\infty(M), \quad f_q(x) = d(x,q) - d(x,p) \ \ \forall\, x \in M. \tag{1.59}$$
That $f_q \in C_\infty(M)$ is straightforward to check. It is bounded (because of the second
term) by the reverse triangle inequality
$$|f_q(x)| = |d(x,q) - d(x,p)| \le d(p,q)$$
and is continuous, as the difference of two continuous functions. Moreover the
distance between two functions in the image is
$$\sup_{x\in M} |f_q(x) - f_{q'}(x)| = \sup_{x\in M} |d(x,q) - d(x,q')| = d(q,q') \tag{1.60}$$
using the reverse triangle inequality (and evaluating at $x = q$). Thus the map (1.59)
is well-defined, injective and even distance-preserving. Since $C_\infty(M)$ is complete,
the closure of the image of (1.59) is a complete metric space, $X$, in which $M$ can
be identified as a dense subset.
Now, in case that $M = V$ is a normed space this all goes through. The
disconcerting thing is that the map $q \longmapsto f_q$ is not linear. Nevertheless, we can
give $X$ a linear structure so that it becomes a Banach space in which $V$ is a dense
linear subspace. Namely for any two elements $f_i \in X$, $i = 1, 2$, define
$$\lambda_1 f_1 + \lambda_2 f_2 = \lim_{n\to\infty} f_{\lambda_1 p_n + \lambda_2 q_n} \tag{1.61}$$
where $p_n$ and $q_n$ are sequences in $V$ such that $f_{p_n} \to f_1$ and $f_{q_n} \to f_2$. Such
sequences exist by the construction of $X$, and the result does not depend on the
choice of sequence, since if $p'_n$ is another choice in place of $p_n$ then $f_{p'_n} - f_{p_n} \to 0$
in $X$ (and similarly for $q_n$). So the element on the left in (1.61) is well-defined. All
of the properties of a linear space and normed space now follow by continuity from
$V \subset X$, and it also follows that $X$ is a Banach space (since a closed subset of a
complete space is complete). Unfortunately there are quite a few annoying details
to check! □
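The metric part of Proof 1 can be checked numerically on a finite metric space, where $C_\infty(M)$ is just the space of real functions on $M$ with the maximum norm. This sketch assumes $M$ is a handful of points in the plane with the Euclidean distance and that the base point $p$ is the first of them. It verifies the bound $|f_q(x)| \le d(p,q)$ and the isometry property (1.60).

```python
import itertools
import math

def d(x, y):
    """Euclidean distance in the plane."""
    return math.dist(x, y)

M = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0), (-1.0, -1.0)]
p = M[0]  # the base point p in (1.59)

def f(q):
    """f_q(x) = d(x, q) - d(x, p), tabulated on the finite space M."""
    return [d(x, q) - d(x, p) for x in M]

def sup_dist(g, h):
    """Distance in C_infty(M) for finite M: the maximum norm of g - h."""
    return max(abs(a - b) for a, b in zip(g, h))

# f_q is bounded by d(p, q), and q -> f_q preserves distances, as in (1.60).
for q in M:
    assert max(abs(t) for t in f(q)) <= d(p, q) + 1e-12
for q, q2 in itertools.combinations(M, 2):
    assert abs(sup_dist(f(q), f(q2)) - d(q, q2)) < 1e-12
print("isometric embedding verified on", len(M), "points")
```

Equality in (1.60) is attained at $x = q$, which is why it matters that $q$ itself belongs to $M$.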
‘Proof 2’ (the last bit is left to you). Let $V$ be a normed space. First
we introduce the rather large space
$$\widetilde V = \Big\{\{u_k\}_{k=1}^\infty;\ u_k \in V \text{ and } \sum_{k=1}^\infty \|u_k\| < \infty\Big\}, \tag{1.62}$$
the elements of which, if you recall, are said to be absolutely summable. Notice that
the elements of $\widetilde V$ are sequences valued in $V$, so two sequences are equal, are the
same, only when each entry in one is equal to the corresponding entry in the other
– no shifting around or anything is permitted as far as equality is concerned. We
think of these as series (remember this means nothing except changing the name, a
series is a sequence and a sequence is a series), the only difference is that we ‘think’
of taking the limit of a sequence but we ‘think’ of summing the elements of a series,
whether we can do so or not being a different matter.
Now, each element of $\widetilde V$ is a Cauchy series, meaning the corresponding
sequence of partial sums $v_N = \sum_{k=1}^N u_k$ is Cauchy if $\{u_k\}$ is absolutely summable. As
noted earlier, this is simply because if $M \ge N$ then
$$\|v_M - v_N\| = \Big\|\sum_{j=N+1}^M u_j\Big\| \le \sum_{j=N+1}^M \|u_j\| \le \sum_{j\ge N+1} \|u_j\| \tag{1.63}$$
gets small with $N$ by the assumption that $\sum_j \|u_j\| < \infty$.
Moreover, $\widetilde V$ is a linear space, where we add sequences, and multiply by
constants, by doing the operations on each component:-
$$t_1\{u_k\} + t_2\{u'_k\} = \{t_1u_k + t_2u'_k\}. \tag{1.64}$$
This always gives an absolutely summable series by the triangle inequality:
$$\sum_k \|t_1u_k + t_2u'_k\| \le |t_1| \sum_k \|u_k\| + |t_2| \sum_k \|u'_k\|. \tag{1.65}$$

Within $\widetilde V$ consider the linear subspace
$$S = \Big\{\{u_k\};\ \sum_k \|u_k\| < \infty,\ \sum_k u_k = 0\Big\} \tag{1.66}$$
of those which sum to 0. As discussed in Section 5 above, we can form the quotient
$$B = \widetilde V / S, \tag{1.67}$$
the elements of which are the ‘cosets’ of the form $\{u_k\} + S \subset \widetilde V$ where $\{u_k\} \in \widetilde V$.
This is our completion; we proceed to check the following properties of this $B$.
(1) A norm on $B$ (via a seminorm on $\widetilde V$) is defined by
$$\|b\|_B = \lim_{n\to\infty} \Big\|\sum_{k=1}^n u_k\Big\|, \quad b = \{u_k\} + S \in B. \tag{1.68}$$
(2) The original space $V$ is imbedded in $B$ by
$$V \ni v \longmapsto I(v) = \{u_k\} + S, \quad u_1 = v,\ u_k = 0\ \forall\, k > 1, \tag{1.69}$$
and the norm satisfies (1.58).
(3) $I(V) \subset B$ is dense.
(4) $B$ is a Banach space with the norm (1.68).
So, first that (1.68) is a norm. The limit on the right does exist since the limit
of the norm of a Cauchy sequence always exists: the sequence of norms
is itself Cauchy, but now in $\mathbb{R}$. Moreover, adding an element of $S$ to $\{u_k\}$ does not
change the limit of the norms of the partial sums, since the partial sums of the added series tend
to zero in norm. Thus $\|b\|_B$ is well-defined for each element $b \in B$, and $\|b\|_B = 0$
means exactly that the partial sums of the series $\{u_k\}$ used to define it tend to 0 in norm, hence $\{u_k\}$ is
in $S$ and so $b = 0$ in $B$. The other two properties of a norm are reasonably clear, since
if $b, b' \in B$ are represented by $\{u_k\}, \{u'_k\}$ in $\widetilde V$ then $tb$ and $b + b'$ are represented
by $\{tu_k\}$ and $\{u_k + u'_k\}$ and
$$
\begin{aligned}
&\lim_{n\to\infty}\Big\|\sum_{k=1}^n tu_k\Big\| = |t|\lim_{n\to\infty}\Big\|\sum_{k=1}^n u_k\Big\| \Longrightarrow \|tb\| = |t|\,\|b\|,\\
&\lim_{n\to\infty}\Big\|\sum_{k=1}^n (u_k + u'_k)\Big\| = A \Longrightarrow
\text{for } \epsilon > 0\ \exists\, N \text{ s.t. } \forall\, n \ge N,\ A - \epsilon \le \Big\|\sum_{k=1}^n (u_k + u'_k)\Big\| \Longrightarrow\\
&A - \epsilon \le \Big\|\sum_{k=1}^n u_k\Big\| + \Big\|\sum_{k=1}^n u'_k\Big\|\ \forall\, n \ge N \Longrightarrow A - \epsilon \le \|b\|_B + \|b'\|_B\ \forall\, \epsilon > 0 \Longrightarrow\\
&\|b + b'\|_B \le \|b\|_B + \|b'\|_B.
\end{aligned}
\tag{1.70}
$$
Now the norm of the element $I(v) = v, 0, 0, \cdots$ is the limit of the norms of the
sequence of partial sums and hence is $\|v\|_V$, so $\|I(v)\|_B = \|v\|_V$, and $I(v) = 0$
therefore implies $v = 0$; hence $I$ is also injective.
We need to check that $B$ is complete, and also that $I(V)$ is dense. Here is
an extended discussion of the difficulty; of course maybe you can see it directly
yourself (or have a better scheme). Note that I suggest that you write out your
own version of it carefully in Problem ??.
Okay, what does it mean for $B$ to be a Banach space? As discussed above, it
means that every absolutely summable series in $B$ is convergent. Such a series $\{b_n\}$
is given by $b_n = \{u_k^{(n)}\} + S$ where $\{u_k^{(n)}\} \in \widetilde V$, and the summability condition is
that
$$\infty > \sum_n \|b_n\|_B = \sum_n \lim_{N\to\infty} \Big\|\sum_{k=1}^N u_k^{(n)}\Big\|_V. \tag{1.71}$$
So, we want to show that $\sum_n b_n = b$ converges, and to do so we need to find the
limit $b$. It is supposed to be given by an absolutely summable series. The ‘problem’
is that this series should look like $\sum_n \sum_k u_k^{(n)}$ in some sense, because it is supposed
to represent the sum of the $b_n$’s. Now, it would be very nice if we had the estimate
$$\sum_n \sum_k \|u_k^{(n)}\|_V < \infty \tag{1.72}$$
since this should allow us to break up the double sum in some nice way so as to get
an absolutely summable series out of the whole thing. The trouble is that (1.72)
need not hold. We know that each of the sums over $k$, for given $n$, converges,
but not the sum of the sums. All we know here is that the sum of the ‘limits of the
norms’ in (1.71) converges.
So, that is the problem! One way to see the solution is to note that we do not
have to choose the original $\{u_k^{(n)}\}$ to ‘represent’ $b_n$; we can add to it any element
of $S$. One idea is to rearrange the $u_k^{(n)}$ (I am thinking here of fixed $n$) so that
it ‘converges even faster.’ I will not go through this in full detail but rather do it
later when we need the argument for the completeness of the space of Lebesgue
integrable functions. Given $\epsilon > 0$ we can choose $p_1$ so that for all $p \ge p_1$,
$$\Big|\Big\|\sum_{k\le p} u_k^{(n)}\Big\|_V - \|b_n\|_B\Big| \le \epsilon, \quad \sum_{k\ge p} \|u_k^{(n)}\|_V \le \epsilon. \tag{1.73}$$
Then in fact we can choose successive $p_j > p_{j-1}$ (remember that little $n$ is fixed
here) so that
$$\Big|\Big\|\sum_{k\le p_j} u_k^{(n)}\Big\|_V - \|b_n\|_B\Big| \le 2^{-j}\epsilon, \quad \sum_{k\ge p_j} \|u_k^{(n)}\|_V \le 2^{-j}\epsilon \quad \forall\, j. \tag{1.74}$$
Now, ‘resum the series’, defining instead $v_1^{(n)} = \sum_{k=1}^{p_1} u_k^{(n)}$, $v_j^{(n)} = \sum_{k=p_{j-1}+1}^{p_j} u_k^{(n)}$, and
do this setting $\epsilon = 2^{-n}$ for the $n$th series. Check that now
$$\sum_n \sum_k \|v_k^{(n)}\|_V < \infty. \tag{1.75}$$
Of course, you should also check that $b_n = \{v_k^{(n)}\} + S$, so that these new summable
series work just as well as the old ones.
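The resumming step can be sketched for a single real series ($V = \mathbb{R}$), truncated to finitely many terms so that it runs. The cut points are chosen as in the second estimate of (1.74), $\sum_{k \ge p_j}|u_k| \le 2^{-j}\epsilon$, and each block after the first is then bounded by a tail. The function `resum` and the sample series $u_k = (-1)^k/(k+1)^2$ are illustrative choices only. Indices are 0-based in the code.

```python
def resum(u, eps, levels):
    """Group the terms of a finite real series u[0], u[1], ... into blocks.

    The cut points p_0 = 0 <= p_1 <= ... <= p_levels are chosen minimal with
    sum_{k >= p_j} |u[k]| <= 2**-j * eps; block j collects u[p_j], ..., u[p_{j+1} - 1],
    and a final block collects everything from p_levels on.
    """
    n = len(u)
    tail = [0.0] * (n + 1)  # tail[p] = sum of |u[k]| for k >= p
    for k in range(n - 1, -1, -1):
        tail[k] = tail[k + 1] + abs(u[k])
    cuts = [0]
    for j in range(1, levels + 1):
        p = cuts[-1]
        while tail[p] > 2.0 ** (-j) * eps:
            p += 1
        cuts.append(p)
    blocks = [sum(u[a:b]) for a, b in zip(cuts, cuts[1:])]
    blocks.append(sum(u[cuts[-1]:]))
    return cuts, blocks

eps = 0.5
u = [(-1) ** k / (k + 1) ** 2 for k in range(20000)]
cuts, v = resum(u, eps, levels=8)

# Block j (j >= 1) only uses terms from index p_j on, so |v[j]| <= 2**-j * eps;
# hence sum_j |v[j]| <= |v[0]| + eps.
for j in range(1, len(v)):
    assert abs(v[j]) <= 2.0 ** (-j) * eps + 1e-12
# Regrouping does not change the sum of the series.
assert abs(sum(v) - sum(u)) < 1e-9
print(cuts)
```

In the completeness argument the same grouping is applied to every $b_n$ with $\epsilon = 2^{-n}$. This is what makes the double sum (1.75) finite.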
After this fiddling you can now try to find a limit for the series as
$$b = \{w_k\} + S, \quad w_k = \sum_{l+p=k+1} v_l^{(p)} \in V. \tag{1.76}$$
So, you need to check that this $\{w_k\}$ is absolutely summable in $V$ and that
$\sum_{n=1}^N b_n \to b$ in $B$ as $N \to \infty$.
Finally then there is the question of showing that $I(V)$ is dense in $B$. You can
do this using the same idea as above; in fact it might be better to do it first. Given
an element $b \in B$ we need to find elements $v_j \in V$ such that $\|I(v_j) - b\|_B \to 0$ as
$j \to \infty$. Take an absolutely summable series $u_k$ representing $b$ and take $v_j = \sum_{k=1}^{p_j} u_k$,
where the $p_j$’s are constructed as above, and check that $I(v_j) \to b$ by computing
$$\|I(v_j) - b\|_B = \lim_{N\to\infty} \Big\|\sum_{k=p_j+1}^{N} u_k\Big\|_V \le \sum_{k>p_j} \|u_k\|_V. \tag{1.77}$$



7. More examples
Let me collect some examples of normed and Banach spaces. Those mentioned
above and in the problems include:
• $c_0$ the space of convergent sequences in $\mathbb{C}$ with supremum norm, a Banach
space.
• $l^p$ one space for each real number $1 \le p < \infty$; the space of $p$-summable
series with corresponding norm; all Banach spaces. The most important
of these for us is the case $p = 2$, which is (a) Hilbert space.
• $l^\infty$ the space of bounded sequences with supremum norm, a Banach space
with $c_0 \subset l^\infty$ as a closed subspace with the same norm.
• $C([a,b])$ or more generally $C(M)$ for any compact metric space $M$: the
Banach space of continuous functions with supremum norm.
• $C_\infty(\mathbb{R})$, or more generally $C_\infty(M)$ for any metric space $M$: the Banach
space of bounded continuous functions with supremum norm.
• $C_0(\mathbb{R})$, or more generally $C_0(M)$ for any metric space $M$: the Banach
space of continuous functions which ‘vanish at infinity’ (see Problem ??)
with supremum norm. A closed subspace, with the same norm, in $C_\infty(M)$.
• $C^k([a,b])$ the space of $k$ times continuously differentiable (so $k \in \mathbb{N}$) functions
on $[a,b]$ with norm the sum of the supremum norms on the function
and its derivatives. Each is a Banach space; see Problem ??.
• The space $C([0,1])$ with norm
$$\|u\|_{L^1} = \int_0^1 |u|\,dx \tag{1.78}$$
given by the Riemann integral of the absolute value. A normed space, but
not a Banach space. We will construct the concrete completion, $L^1([0,1])$,
of Lebesgue integrable ‘functions’.
• The space $R([a,b])$ of Riemann integrable functions on $[a,b]$ with $\|u\|$
defined by (1.78). This is only a seminorm, since there are Riemann
integrable functions (note that $u$ Riemann integrable does imply that $|u|$ is
Riemann integrable) with $|u|$ having vanishing Riemann integral but which
are not identically zero. This cannot happen for continuous functions. So
the quotient is a normed space, but it is not complete.
• The same spaces, either of continuous or of Riemann integrable functions,
but with the (semi- in the second case) norm
$$\|u\|_{L^p} = \left(\int_a^b |u|^p\right)^{\frac1p}. \tag{1.79}$$
Not complete in either case, even after passing to the quotient to get a norm
for Riemann integrable functions. We can, and indeed will, define $L^p(a,b)$
as the completion of $C([a,b])$ with respect to the $L^p$ norm. However we
will get a concrete realization of it soon.
• Suppose $0 < \alpha < 1$ and consider the subspace of $C([a,b])$ consisting of the
‘Hölder continuous functions’ with exponent $\alpha$, that is those $u : [a,b] \longrightarrow
\mathbb{C}$ which satisfy
$$|u(x) - u(y)| \le C|x - y|^\alpha \text{ for some } C \ge 0. \tag{1.80}$$
Note that this already implies the continuity of $u$. As norm one can take
the sum of the supremum norm and the ‘best constant’, which is the same
as
$$\|u\|_{C^\alpha} = \sup_{x\in[a,b]} |u(x)| + \sup_{x\ne y\in[a,b]} \frac{|u(x) - u(y)|}{|x - y|^\alpha}; \tag{1.81}$$
it is a Banach space usually denoted $C^\alpha([a,b])$.
• Note the previous example works for $\alpha = 1$ as well; then it is not
denoted $C^1([a,b])$, since that is the space of once continuously differentiable
functions; this is the space of Lipschitz functions $\Lambda([a,b])$, again a
Banach space.
• We will also talk about Sobolev spaces later. These are functions with
‘Lebesgue integrable derivatives’. It is perhaps not easy to see how to
define these, but if one takes the norm on $C^1([a,b])$
$$\|u\|_{H^1} = \left(\|u\|_{L^2}^2 + \Big\|\frac{du}{dx}\Big\|_{L^2}^2\right)^{\frac12} \tag{1.82}$$
and completes it, one gets the Sobolev space $H^1([a,b])$; it is a Banach
space (and a Hilbert space). In fact it is a subspace of $C([a,b])$.
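To make (1.81) concrete, the sketch below approximates the Hölder norm by restricting both suprema to a finite grid, so it only gives a lower bound for the true norm. It assumes $[a,b] = [0,1]$ and $u(x) = \sqrt{x}$, for which $|\sqrt x - \sqrt y| \le |x - y|^{1/2}$, with equality when $y = 0$. So the $C^{1/2}$ norm is 2, while the Lipschitz quotient ($\alpha = 1$) blows up near 0 as the grid is refined. The helper `holder_norm` is hypothetical.

```python
import itertools
import math

def holder_norm(u, xs, alpha):
    """Grid version of (1.81):
    max |u(x)| + max over x != y of |u(x) - u(y)| / |x - y|**alpha, with x, y in xs."""
    sup_part = max(abs(u(x)) for x in xs)
    quotient = max(abs(u(x) - u(y)) / abs(x - y) ** alpha
                   for x, y in itertools.combinations(xs, 2))
    return sup_part + quotient

xs = [k / 200 for k in range(201)]      # uniform grid on [0, 1]
print(holder_norm(math.sqrt, xs, 0.5))  # close to 2: sup |u| = 1, best constant 1
print(holder_norm(math.sqrt, xs, 1.0))  # about 15.1 here, and unbounded under refinement
```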
Here is an example to see that the space of continuous functions on $[0,1]$ with
norm (1.78) is not complete; things are even worse than this example indicates! It
is a bit harder to show that the quotient of the Riemann integrable functions is not
complete; feel free to give it a try.
Take a simple non-negative continuous function on $\mathbb{R}$, for instance
$$f(x) = \begin{cases} 1 - |x| & \text{if } |x| \le 1\\ 0 & \text{if } |x| > 1.\end{cases} \tag{1.83}$$
Then $\int_{-1}^1 f(x)\,dx = 1$. Now scale it up and in by setting
$$f_N(x) = N f(N^3 x) = 0 \text{ if } |x| > N^{-3}. \tag{1.84}$$
So it vanishes outside $[-N^{-3}, N^{-3}]$ and has $\int_{-1}^1 f_N(x)\,dx = N^{-2}$. It follows that the
sequence $\{f_N\}$ is absolutely summable with respect to the integral norm in (1.78)
on $[-1,1]$. The pointwise series $\sum_N f_N(x)$ converges everywhere except at $x = 0$,
since at each point $x \ne 0$, $f_N(x) = 0$ if $N^3|x| > 1$. The resulting function, even if we
ignore the problem at $x = 0$, is not Riemann integrable because it is not bounded.
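A quick numerical look at this example follows, under the obvious assumptions of a trapezoidal rule for the integrals and finite truncations of the series. The integral of $f_N$ is $N^{-2}$ and the norms are summable. Yet the pointwise sum at $x = 0$ grows without bound, while at any $x \ne 0$ only finitely many terms are non-zero. The function names are hypothetical.

```python
import math

def f(x):
    """The tent function (1.83)."""
    return 1.0 - abs(x) if abs(x) <= 1.0 else 0.0

def f_N(N, x):
    """The scaled tent (1.84): height N, support [-N**-3, N**-3]."""
    return N * f(N ** 3 * x)

def integral(g, a, b, steps=20000):
    """Composite trapezoidal rule (exact for piecewise linear g whose kinks are nodes)."""
    h = (b - a) / steps
    return h * (0.5 * g(a) + sum(g(a + i * h) for i in range(1, steps)) + 0.5 * g(b))

for N in range(1, 6):
    approx = integral(lambda x: f_N(N, x), -N ** -3, N ** -3)
    assert abs(approx - N ** -2) < 1e-6

# The L^1 norms are summable ...
print(sum(N ** -2 for N in range(1, 10001)) < math.pi ** 2 / 6)  # True
# ... but the pointwise sum at x = 0 is 1 + 2 + ... + N, which is unbounded,
print(sum(f_N(N, 0.0) for N in range(1, 101)))                    # 5050.0
# while at x = 0.01 only the terms with N**3 * 0.01 < 1, i.e. N <= 4, are non-zero.
print([N for N in range(1, 101) if f_N(N, 0.01) > 0])              # [1, 2, 3, 4]
```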
You might respond that the sum of the series is ‘improperly Riemann integrable’.
This is true but does not help much.
It is at this point that I start doing Lebesgue integration in the lectures. The
following material is from later in the course but fits here quite reasonably.

8. Baire’s theorem
At least once I wrote a version of the following material on the blackboard
during the first mid-term test, in an attempt to distract people. It did not work
very well; it seems that MIT students have already been toughened up by this
stage. Baire’s theorem will be used later (it is also known as the ‘Baire category theorem’,
although it has nothing to do with categories in the modern sense).
This is a theorem about complete metric spaces; it could be included in the
earlier course ‘Real Analysis’ but the main applications are in Functional Analysis.

Theorem 1.2 (Baire). If $M$ is a non-empty complete metric space and $C_n \subset M$,
$n \in \mathbb{N}$, are closed subsets such that
$$M = \bigcup_n C_n \tag{1.85}$$
then at least one of the $C_n$’s has an interior point, i.e. contains a non-empty ball
in $M$.

Proof. We will assume that each of the $C_n$’s has empty interior, hoping to
arrive at a contradiction to (1.85) using the other properties. Thus if $p \in M$ and
$\epsilon > 0$ the open ball $B(p,\epsilon)$ is not contained in any one of the $C_n$.
We start by choosing $p_1 \in M \setminus C_1$, which must exist since $M$ is not empty and
otherwise $C_1 = M$. Now, there must exist $\epsilon_1 > 0$ such that $B(p_1, \epsilon_1) \cap C_1 = \emptyset$,
since $C_1$ is closed. No open ball around $p_1$ can be contained in $C_2$, so there exists
$p_2 \in B(p_1, \epsilon_1/3)$ which is not in $C_2$. Again since $C_2$ is closed there exists $\epsilon_2 > 0$,
$\epsilon_2 < \epsilon_1/3$, such that $B(p_2, \epsilon_2) \cap C_2 = \emptyset$.
Proceeding inductively we suppose there are $k$ points $p_i$, $i = 1, \dots, k$, and
positive numbers
$$0 < \epsilon_k < \epsilon_{k-1}/3 < \epsilon_{k-2}/3^2 < \cdots < \epsilon_1/3^{k-1} \tag{1.86}$$
such that
$$p_j \in B(p_{j-1}, \epsilon_{j-1}/3), \quad B(p_j, \epsilon_j) \cap C_j = \emptyset. \tag{1.87}$$
Then we can add another $p_{k+1}$ by using the properties of $C_{k+1}$: it has empty
interior, so there is some point in $B(p_k, \epsilon_k/3)$ which is not in $C_{k+1}$, and then
$B(p_{k+1}, \epsilon_{k+1}) \cap C_{k+1} = \emptyset$ where $\epsilon_{k+1} > 0$ but $\epsilon_{k+1} < \epsilon_k/3$. Thus, we have a
sequence $\{p_k\}$ in $M$ satisfying (1.86) and (1.87) for all $k$.
Since $d(p_{k+1}, p_k) < \epsilon_k/3$ this is a Cauchy sequence; in fact
$$d(p_k, p_{k+l}) < \epsilon_k/3 + \cdots + \epsilon_{k+l-1}/3 < \epsilon_k/2. \tag{1.88}$$
Since $M$ is assumed to be complete this sequence converges to a limit, $q \in M$. Notice
however that $p_l \in B(p_k, \epsilon_k/2)$ for all $l > k$, so $d(p_k, q) \le \epsilon_k/2 < \epsilon_k$. Thus $q$ lies in
$B(p_k, \epsilon_k)$, which does not meet $C_k$, so $q \notin C_k$ for any $k$. This is the desired contradiction to (1.85).
Thus, at least one of the $C_n$ must have non-empty interior. □
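The geometric series behind (1.88) can be written out as follows. By (1.86), $\epsilon_{k+i} \le 3^{-i}\epsilon_k$ for $i \ge 0$, and by (1.87), $d(p_{k+i}, p_{k+i+1}) < \epsilon_{k+i}/3$, so

```latex
d(p_k, p_{k+l}) \le \sum_{i=0}^{l-1} d(p_{k+i}, p_{k+i+1})
  < \sum_{i=0}^{l-1} \frac{\epsilon_{k+i}}{3}
  \le \frac{\epsilon_k}{3} \sum_{i=0}^{l-1} 3^{-i}
  < \frac{\epsilon_k}{3} \cdot \frac{1}{1 - 1/3} = \frac{\epsilon_k}{2}.
```

Letting $l \to \infty$ gives $d(p_k, q) \le \epsilon_k/2 < \epsilon_k$, so the limit $q$ stays inside each ball $B(p_k, \epsilon_k)$, which was chosen disjoint from $C_k$.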

In applications one might get a complete metric space written as a countable
union of subsets
$$M = \bigcup_n E_n, \quad E_n \subset M, \tag{1.89}$$
where the $E_n$ are not necessarily closed. We can still apply Baire’s theorem, however:
just take $C_n = \overline{E_n}$ to be the closures; then of course (1.85) holds since $E_n \subset C_n$.
The conclusion from (1.89) for a complete $M$ is
$$\text{For at least one } n \text{ the closure of } E_n \text{ has non-empty interior.} \tag{1.90}$$



9. Uniform boundedness
One application of Baire’s theorem is often called the uniform boundedness
principle or Banach-Steinhaus Theorem.
Theorem 1.3 (Uniform boundedness). Let $B$ be a Banach space and suppose
that $T_n$ is a sequence of bounded (i.e. continuous) linear operators $T_n : B \longrightarrow V$
where $V$ is a normed space. Suppose that for each $b \in B$ the set $\{T_n(b)\} \subset V$ is
bounded (in norm of course); then $\sup_n \|T_n\| < \infty$.
Proof. This follows from a pretty direct application of Baire’s theorem to $B$.
Consider the sets
$$S_p = \{b \in B;\ \|b\| \le 1,\ \|T_nb\|_V \le p\ \forall\, n\}, \quad p \in \mathbb{N}. \tag{1.91}$$
Each $S_p$ is closed because $T_n$ is continuous, so if $b_k \to b$ is a convergent sequence
in $S_p$ then $\|b\| \le 1$ and $\|T_n(b)\| \le p$. The union of the $S_p$ is the whole of the closed
ball of radius one around the origin in $B$:
$$\{b \in B;\ d(b, 0) \le 1\} = \bigcup_p S_p \tag{1.92}$$
because of the assumption of ‘pointwise boundedness’: each $b$ with $\|b\| \le 1$ must
be in one of the $S_p$’s.
So, by Baire’s theorem one of the sets $S_p$ has non-empty interior; it therefore
contains a closed ball of positive radius around some point. Thus for some $p$, some
$v \in S_p$, and some $\delta > 0$,
$$w \in B,\ \|w\|_B \le \delta \Longrightarrow \|T_n(v + w)\|_V \le p \ \ \forall\, n. \tag{1.93}$$
Since $v \in S_p$ is fixed it follows that $\|T_nw\| \le \|T_n(v + w)\| + \|T_nv\| \le 2p$ for all $w$
with $\|w\| \le \delta$. This however implies that the norms are uniformly bounded:
$$\|T_n\| \le 2p/\delta \tag{1.94}$$
as claimed. □
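Completeness of $B$ really is needed here. The following standard counterexample is an illustration, not taken from the text. Take the space of finitely supported sequences with the supremum norm, a normed space which is not complete. The maps $T_n(b) = n b_n$ are bounded with $\|T_n\| = n$. Yet for each fixed $b$ the set $\{T_n(b)\}$ is bounded, because only finitely many $b_n$ are non-zero. The sketch represents a finitely supported sequence by a Python list with indices starting at 0; the function names are hypothetical.

```python
def T(n, b):
    """T_n(b) = n * b_n on finitely supported sequences (b_n = 0 beyond the list)."""
    return n * b[n] if n < len(b) else 0.0

def sup_norm(b):
    """Supremum norm of a finitely supported sequence."""
    return max((abs(x) for x in b), default=0.0)

def e(n):
    """The unit vector e_n, of sup norm 1."""
    return [0.0] * n + [1.0]

b = [1.0, -2.0, 0.5]                           # a fixed finitely supported sequence
print(max(abs(T(n, b)) for n in range(1000)))  # 2.0: {T_n b} is bounded for this b ...
print([T(n, e(n)) / sup_norm(e(n)) for n in range(1, 6)])  # [1.0, 2.0, 3.0, 4.0, 5.0]
```

The last line shows $\|T_n\| \ge n$, so the conclusion of Theorem 1.3 fails once completeness is dropped.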

10. Open mapping theorem


The second major application of Baire’s theorem is to
Theorem 1.4 (Open Mapping). If $T : B_1 \longrightarrow B_2$ is a bounded and surjective
linear map between two Banach spaces then $T$ is open:
$$T(O) \subset B_2 \text{ is open if } O \subset B_1 \text{ is open.} \tag{1.95}$$
This is ‘wrong way continuity’ and as such can be used to prove the continuity
of inverse maps as we shall see. The proof uses Baire’s theorem pretty directly,
but then another similar sort of argument is needed to complete the proof. Note
however that the proof is considerably simplified if we assume that B1 is a Hilbert
space. There are more direct but more computational proofs, see Problem ??. I
prefer this one because I have a reasonable chance of remembering the steps.

Proof. What we will try to show is that the image under $T$ of the unit open
ball around the origin, $B(0,1) \subset B_1$, contains an open ball around the origin in $B_2$.
The first part of the proof, using Baire’s theorem, shows that the closure of the
image, so in $B_2$, has 0 as an interior point, i.e. it contains an open ball around
the origin in $B_2$:
$$\overline{T(B(0,1))} \supset B(0,\delta), \quad \delta > 0. \tag{1.96}$$
To see this we apply Baire’s theorem to the sets
$$C_p = \mathrm{cl}_{B_2}\, T(B(0,p)), \tag{1.97}$$
the closure of the image of the ball in $B_1$ of radius $p$. We know that
$$B_2 = \bigcup_p T(B(0,p)) \tag{1.98}$$
since that is what surjectivity means: every point is the image of something. Thus
one of the closed sets $C_p$ has an interior point, $v$, say $B(v,\delta) \subset C_p$. Since $T$ is surjective, $v = Tu$ for
some $u \in B_1$; choose $q$ with $\|u\| < q$. If $w \in B(0,\delta)$ then $v + w = \lim_k Ta_k$ with $\|a_k\| < p$, so
$w = \lim_k T(a_k - u)$ with $\|a_k - u\| < p + q$. Hence $B(0,\delta) \subset C_{p+q}$, i.e. 0 is an interior point as well, and replacing $p$ by $p + q$,
$$C_p \supset B(0,\delta) \tag{1.99}$$
for some $\delta > 0$. Rescaling by $p$, using the linearity of $T$, it follows that with $\delta$
replaced by $\delta/p$, we get (1.96).
If we assume that $B_1$ is a Hilbert space (and you are reading this after we
have studied Hilbert spaces) then (1.96) shows that if $v \in B_2$, $\|v\| < \delta$, there is
a sequence $u_n$ with $\|u_n\| \le 1$ and $Tu_n \to v$. As a bounded sequence, $u_n$ has a
weakly convergent subsequence, $u_{n_j} \rightharpoonup u$, where we know this implies $\|u\| \le 1$ and
$Tu_{n_j} \rightharpoonup Tu$, so $Tu = v$ since $Tu_n \to v$. This strengthens (1.96) to
$$T(B(0,1)) \supset B(0,\delta/2)$$
and proves that $T$ is an open map.
If $B_1$ is a Banach space but not a Hilbert space (or you don’t yet know about
Hilbert spaces) we need to work a little harder. Having applied Baire’s theorem,
consider now what (1.96) means. It follows that each $v \in B_2$ with $\|v\| = \delta$ is the
limit of a sequence $Tu_n$ where $\|u_n\| \le 1$. What we want to find is such a sequence,
$u_n$, which converges. To do so we need to choose the sequence more carefully.
Certainly we can stop somewhere along the way and see that
$$v \in B_2,\ \|v\| = \delta \Longrightarrow \exists\, u \in B_1,\ \|u\| \le 1,\ \|v - Tu\| \le \frac{\delta}{2} = \frac12\|v\|, \tag{1.100}$$
where of course we could replace $\frac{\delta}{2}$ by any positive constant, but the point is the
last inequality is now relative to the norm of $v$. Scaling again, if we take any $v \ne 0$
in $B_2$ and apply (1.100) to $\delta v/\|v\|$ we conclude that (for $C = 1/\delta$, with $\delta$ as in (1.96), a fixed constant)
$$v \in B_2 \Longrightarrow \exists\, u \in B_1,\ \|u\| \le C\|v\|,\ \|v - Tu\| \le \frac12\|v\|, \tag{1.101}$$
where the size of $u$ only depends on the size of $v$; of course this is also true for $v = 0$
by taking $u = 0$.
Using this we construct the desired better approximating sequence. Given
$w \in B_2$, choose $u_1 = u$ according to (1.101) for $v = w = w_1$. Thus $\|u_1\| \le C\|w\|$,
and $w_2 = w_1 - Tu_1$ satisfies $\|w_2\| \le \frac12\|w\|$. Now proceed by induction, supposing
that we have constructed a sequence $u_j$, $j < n$, in $B_1$ with $\|u_j\| \le C2^{-j+1}\|w\|$
and $\|w_j\| \le 2^{-j+1}\|w\|$ for $j \le n$, where $w_j = w_{j-1} - Tu_{j-1}$, which we have for
$n = 1$. Then we can choose $u_n$, using (1.101), so $\|u_n\| \le C\|w_n\| \le C2^{-n+1}\|w\|$
and such that $w_{n+1} = w_n - Tu_n$ has $\|w_{n+1}\| \le \frac12\|w_n\| \le 2^{-n}\|w\|$, to extend the
induction. Thus we get a sequence $u_n$ which is absolutely summable in $B_1$, since
$\sum_n \|u_n\| \le 2C\|w\|$, and hence converges to some $u \in B_1$, by the assumed completeness of $B_1$ this
time. Moreover
$$w - T\Big(\sum_{j=1}^n u_j\Big) = w_1 - \sum_{j=1}^n (w_j - w_{j+1}) = w_{n+1}, \tag{1.102}$$
so, since $w_{n+1} \to 0$ and $T$ is continuous, $Tu = w$ and $\|u\| \le 2C\|w\|$.
Thus finally we have shown that each $w \in B(0,1)$ in $B_2$ is the image of some
$u \in B_1$ with $\|u\| \le 2C$. Thus $T(B(0,3C)) \supset B(0,1)$. By scaling it follows that the
image of any open ball around the origin contains an open ball around the origin.
Now, the linearity of $T$ shows that the image $T(O)$ of any open set is open,
since if $w \in T(O)$ then $w = Tu$ for some $u \in O$, and hence $u + B(0,\epsilon) \subset O$ for some $\epsilon > 0$,
and then $w + B(0,\delta) \subset T(O)$ for $\delta > 0$ sufficiently small. □
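The iteration in the second half of the proof is essentially a correction scheme, and it can be tried out in finite dimensions, where the open mapping theorem itself is trivial. This sketch assumes $B_1 = B_2 = \mathbb{R}^2$ with the supremum norm and $T$ the matrix below. The hypothetical helper `approx_preimage` plays the role of (1.101) with $C = 1/4$. Summing the corrections $u_n$ produces an exact preimage with $\|u\| \le 2C\|w\|$.

```python
def mat_vec(A, v):
    """Matrix-vector product A v, with A given as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def sup_norm(v):
    return max(abs(x) for x in v)

T = [[4.0, 1.0], [1.0, 4.0]]

def approx_preimage(v):
    """A crude approximate inverse in the spirit of (1.101): u = v / 4 has
    ||u|| = (1/4) ||v||, and ||v - T u|| <= (1/4) ||v|| <= (1/2) ||v|| in the sup norm."""
    return [x / 4.0 for x in v]

def exact_preimage(w, steps=60):
    """Sum the corrections u_n as in the proof, with w_{n+1} = w_n - T u_n."""
    u = [0.0] * len(w)
    wn = list(w)
    for _ in range(steps):
        un = approx_preimage(wn)
        u = [a + b for a, b in zip(u, un)]
        wn = [a - b for a, b in zip(wn, mat_vec(T, un))]
    return u, wn  # wn is the remainder w - T u

w = [1.0, -3.0]
u, remainder = exact_preimage(w)
print(sup_norm(remainder) < 1e-12)            # True: T u = w up to rounding
print(sup_norm(u) <= 2 * 0.25 * sup_norm(w))  # True: ||u|| <= 2C ||w||
```

Here the exact preimage is $u = (7/15, -13/15)$. The halving estimate in (1.101) is what makes the corrections summable.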

One important corollary of this is something that seems like it should be obvious,
but definitely needs completeness to be true.
Corollary 1.2. If $T : B_1 \longrightarrow B_2$ is a bounded linear map between Banach
spaces which is 1-1 and onto, i.e. is a bijection, then it is a homeomorphism,
meaning its inverse, which is necessarily linear, is also bounded.
Proof. The only confusing thing is the notation. Note that $T^{-1}$ is generally
used both for the inverse, when it exists, and also to denote the inverse map on sets
even when there is no true inverse. The inverse of $T$, let’s call it $S : B_2 \longrightarrow B_1$, is
certainly linear. If $O \subset B_1$ is open then $S^{-1}(O) = T(O)$, since to say $v \in S^{-1}(O)$
means $S(v) \in O$, which is just $v \in T(O)$; and $T(O)$ is open by the Open Mapping theorem, so
$S$ is continuous. □

11. Closed graph theorem


For the next application you should check (it is one of the problems) that the
product of two Banach spaces, $B_1 \times B_2$, which is just the linear space of all pairs
$(u,v)$, $u \in B_1$ and $v \in B_2$, is a Banach space with respect to the sum of the norms
$$\|(u,v)\| = \|u\|_1 + \|v\|_2. \tag{1.103}$$
Theorem 1.5 (Closed Graph). If $T : B_1 \longrightarrow B_2$ is a linear map between
Banach spaces then it is bounded if and only if its graph
$$\mathrm{Gr}(T) = \{(u,v) \in B_1 \times B_2;\ v = Tu\} \tag{1.104}$$
is a closed subset of the Banach space $B_1 \times B_2$.
Proof. Suppose first that T is bounded, i.e. continuous. A sequence (un , vn ) ∈
B1 × B2 is in Gr(T ) if and only if vn = T un . So, if it converges, then un → u and
vn = T un → T v by the continuity of T, so the limit is in Gr(T ) which is therefore
closed.
Conversely, suppose the graph is closed. This means that viewed as a normed
space in its own right it is complete. Given the graph we can reconstruct the map
it comes from (whether linear or not) in a little diagram. From B1 × B2 consider
the two projections, π1 (u, v) = u and π2 (u, v) = v. Both of them are continuous
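since the norm of either $u$ or $v$ is at most the norm in (1.103). Restricting them to $\operatorname{Gr}(T) \subset B_1 \times B_2$ gives the diagram
$$\begin{array}{ccc} & \operatorname{Gr}(T) & \\ {\scriptstyle \pi_1}\swarrow\ \nearrow{\scriptstyle S} & & \searrow{\scriptstyle \pi_2} \\ B_1 & \xrightarrow{\quad T \quad} & B_2 \end{array} \tag{1.105}$$
where $S : B_1 \longrightarrow \operatorname{Gr}(T)$, $S(u) = (u, Tu)$, is the inverse of $\pi_1$.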

This little diagram commutes. Indeed there are two ways to map a point (u, v) ∈
Gr(T ) to B2 , either directly, sending it to v or first sending it to u ∈ B1 and then
to T u. Since v = T u these are the same.
Now, as already noted, Gr(T ) ⊂ B1 × B2 is a closed subspace, so it too is a
Banach space and π1 and π2 remain continuous when restricted to it. The map π1
is 1-1 and onto, because each u occurs as the first element of precisely one pair,
namely $(u, Tu) \in \operatorname{Gr}(T)$. Thus the Corollary above applies to $\pi_1$ to show that its inverse, $S$, is continuous. But then $T = \pi_2 \circ S$, from the commutativity, is also continuous, proving the theorem. □
You might wish to entertain yourself by showing that conversely the Open
Mapping Theorem is a consequence of the Closed Graph Theorem.
The characterization of continuous linear maps through the fact that their graphs are closed has led to significant extensions. For instance, consider a linear map defined only on a subspace $D \subset B$, where $B$ is a Banach space. Often $D$ is required to be dense, in which case the map is said to be ‘densely defined’:
$$A : D \longrightarrow B \ \text{linear}. \tag{1.106}$$
Such a map is said to be closed if its graph
$$\operatorname{Gr}(A) = \{(u, Au);\ u \in D\} \subset B \times B$$
is closed.
Check, for example, the space $H^1(\mathbb{R}) \subset L^2(\mathbb{R})$ (I'm assuming that you are reading this near the end of the course . . . ) defined in Chapter 4, consisting of the elements with a strong derivative in $L^2(\mathbb{R})$. Then
$$\frac{d}{dx} : D = H^1(\mathbb{R}) \longrightarrow L^2(\mathbb{R}) \ \text{is closed}. \tag{1.107}$$
This follows for instance from the ‘weak implies strong’ result for differentiation. If $u_n \in H^1(\mathbb{R})$ is a sequence such that $u_n \to u$ in $L^2(\mathbb{R})$ and $du_n/dx \to v$ in $L^2(\mathbb{R})$ (which is convergence in $L^2(\mathbb{R}) \times L^2(\mathbb{R})$), then $u \in H^1(\mathbb{R})$ and $v = du/dx$ in the same strong sense.
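Such a closed operator, $A$, can be turned into a bounded operator by changing the norm on the domain $D$ to the ‘graph norm’
$$\|u\|_{\mathrm{Gr}} = \|u\| + \|Au\|. \tag{1.108}$$
Indeed, $D$ with this norm is isometric to $\operatorname{Gr}(A)$ with the norm (1.103), so it is complete because the graph is closed. The estimate $\|Au\| \le \|u\|_{\mathrm{Gr}}$ then shows that $A : D \longrightarrow B$ is bounded.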
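To see why the graph norm is needed, note that $d/dx$ is not bounded with respect to the $L^2$ norm alone. The sketch below uses the rescaled Gaussians $u_n(x) = e^{-n^2x^2/2}$, a choice made here for illustration. A direct computation gives $\|u_n'\|_{L^2}/\|u_n\|_{L^2} = n/\sqrt{2}$, which is unbounded in $n$, while $\|u_n'\| \le \|u_n\|_{\mathrm{Gr}}$ always holds. The Python approximates the integrals by a midpoint rule on $[-L, L]$. The helper `l2_norm` and the values of `L` and `steps` are hypothetical choices, not part of the notes.

```python
import math

def l2_norm(f, L=6.0, steps=100_000):
    # Midpoint-rule approximation of (integral over [-L, L] of |f(x)|^2 dx)^(1/2);
    # the functions used below are negligible outside [-L, L].
    h = 2 * L / steps
    total = sum(f(-L + (k + 0.5) * h) ** 2 for k in range(steps))
    return math.sqrt(total * h)

def u(n):
    # u_n(x) = exp(-n^2 x^2 / 2), a rescaled Gaussian
    return lambda x: math.exp(-((n * x) ** 2) / 2)

def du(n):
    # exact derivative u_n'(x) = -n^2 x exp(-n^2 x^2 / 2)
    return lambda x: -n * n * x * math.exp(-((n * x) ** 2) / 2)

for n in (1, 4, 16):
    norm_u, norm_du = l2_norm(u(n)), l2_norm(du(n))
    graph_norm = norm_u + norm_du          # (1.108) with A = d/dx
    print(n, norm_du / norm_u, n / math.sqrt(2), norm_du <= graph_norm)
```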

12. Hahn-Banach theorem


Now, there is always a little pressure to state and prove the Hahn-Banach
Theorem. This is about extension of functionals. Stated starkly, the basic question
is: Does a normed space have any non-trivial continuous linear functionals on it?
That is, is the dual space always non-trivial (of course there is always the zero linear
functional but that is not very amusing). We do not really encounter this problem
since for a Hilbert space, or even a pre-Hilbert space, there is always the space itself,
giving continuous linear functionals through the pairing – Riesz’ Theorem says that
in the case of a Hilbert space that is all there is. If you are following the course
then at this point you should also see that the only continuous linear functionals
on a pre-Hilbert space correspond to points in the completion. I could have used
the Hahn-Banach Theorem to show that any normed space has a completion, but
I gave a more direct argument for this, which was in any case much more relevant
for the cases of L1 (R) and L2 (R) for which we wanted concrete completions.
Theorem 1.6 (Hahn-Banach). If $M \subset V$ is a linear subspace of a normed space and $u : M \longrightarrow \mathbb{C}$ is a linear map such that
$$|u(t)| \le C\|t\|_V \quad \forall\ t \in M \tag{1.109}$$
then there exists a bounded linear functional $U : V \longrightarrow \mathbb{C}$ with $\|U\| \le C$ and $U|_M = u$.
First, by computation, we show that we can extend any continuous linear func-
tional ‘a little bit’ without increasing the norm.
Lemma 1.5. Suppose $M \subset V$ is a subspace of a normed linear space, $x \notin M$ and $u : M \longrightarrow \mathbb{C}$ is a bounded linear functional as in (1.109). Then there exists $u' : M' \longrightarrow \mathbb{C}$, where $M' = \{t' \in V;\ t' = t + ax,\ a \in \mathbb{C}\}$, such that
$$u'|_M = u, \quad |u'(t + ax)| \le C\|t + ax\|_V \quad \forall\ t \in M,\ a \in \mathbb{C}. \tag{1.110}$$

Proof. Note that the decomposition $t' = t + ax$ of a point in $M'$ is unique. Indeed, $t + ax = \tilde t + \tilde a x$ implies $(a - \tilde a)x \in M$, so $a = \tilde a$ since $x \notin M$, and hence $t = \tilde t$ as well. Thus
$$u'(t + ax) = u'(t) + a\,u'(x) = u(t) + \lambda a, \quad \lambda = u'(x) \tag{1.111}$$
and all we have at our disposal is the choice of $\lambda$. Any choice will give a linear functional extending $u$; the problem of course is to arrange the continuity estimate without increasing the constant $C$. In fact if $C = 0$ then $u = 0$ and we can take the zero extension. So we might as well assume that $C = 1$, since dividing $u$ by $C$ arranges this, and if $u'$ extends $u/C$ then $Cu'$ extends $u$ and the norm estimate in (1.110) follows. So we now assume that
$$|u(t)| \le \|t\|_V \quad \forall\ t \in M. \tag{1.112}$$
We want to choose $\lambda$ so that
$$|u(t) + a\lambda| \le \|t + ax\|_V \quad \forall\ t \in M,\ a \in \mathbb{C}. \tag{1.113}$$
Certainly when $a = 0$ this represents no restriction on $\lambda$. For $a \ne 0$ we can divide through by $-a$ and (1.113) becomes
$$|a|\,\Big|u\Big(-\frac{t}{a}\Big) - \lambda\Big| = |u(t) + a\lambda| \le \|t + ax\|_V = |a|\,\Big\|-\frac{t}{a} - x\Big\|_V \tag{1.114}$$
and since $-t/a \in M$ we only need to arrange that
$$|u(t) - \lambda| \le \|t - x\|_V \quad \forall\ t \in M \tag{1.115}$$
and the general case will follow by reversing the scaling.
A complex linear functional such as $u$ can be recovered from its real part, as we see below, so set
$$w(t) = \operatorname{Re}(u(t)), \quad |w(t)| \le \|t\|_V \quad \forall\ t \in M. \tag{1.116}$$
We proceed to show the real version of the Lemma: $w$ can be extended, without increasing the norm, to a linear functional $w' : M + \mathbb{R}x \longrightarrow \mathbb{R}$ if $x \notin M$. The same argument as above shows that the only freedom is the choice of $\lambda = w'(x)$, and we need to choose $\lambda \in \mathbb{R}$ so that
$$|w(t) - \lambda| \le \|t - x\|_V \quad \forall\ t \in M. \tag{1.117}$$
The norm estimate on $w$ shows that
$$|w(t_1) - w(t_2)| \le |u(t_1) - u(t_2)| \le \|t_1 - t_2\|_V \le \|t_1 - x\|_V + \|t_2 - x\|_V. \tag{1.118}$$
Writing this out using the reality we find
$$\begin{gathered} w(t_1) - w(t_2) \le \|t_1 - x\|_V + \|t_2 - x\|_V \Longrightarrow \\ w(t_1) - \|t_1 - x\|_V \le w(t_2) + \|t_2 - x\|_V \quad \forall\ t_1, t_2 \in M. \end{gathered} \tag{1.119}$$
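We can then take the supremum on the left and the infimum on the right and choose $\lambda$ in between; namely we have shown that there exists $\lambda \in \mathbb{R}$ with
$$\begin{aligned} w(t) - \|t - x\|_V &\le \sup_{t_1 \in M} \big(w(t_1) - \|t_1 - x\|_V\big) \le \lambda \\ &\le \inf_{t_2 \in M} \big(w(t_2) + \|t_2 - x\|_V\big) \le w(t) + \|t - x\|_V \quad \forall\ t \in M. \end{aligned} \tag{1.120}$$
This in turn implies that
$$-\|t - x\|_V \le -w(t) + \lambda \le \|t - x\|_V \Longrightarrow |w(t) - \lambda| \le \|t - x\|_V \quad \forall\ t \in M. \tag{1.121}$$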
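So we have an extension of $w$ to a real functional $w' : M + \mathbb{R}x \longrightarrow \mathbb{R}$ with $|w'(t + ax)| \le \|t + ax\|_V$ for all $t \in M$, $a \in \mathbb{R}$. We can repeat this argument, with $M + \mathbb{R}x$ in place of $M$ and $ix$ in place of $x$, to obtain a further extension $w'' : M + \mathbb{C}x = M + \mathbb{R}x + \mathbb{R}(ix) \longrightarrow \mathbb{R}$ without increasing the norm.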
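The choice of $\lambda$ in (1.120) can be tried out in a small real example, assumed here purely for illustration. Take $V = \mathbb{R}^2$, $M = \mathbb{R}e_1$, $w(te_1) = t$ and $x = e_2$, so that $\|te_1 - x\| = \|(t,-1)\|$. For the $\ell^1$ norm the bounds are $-1 \le \lambda \le 1$, so the extension is far from unique. For the $\ell^\infty$ norm they force $\lambda = 0$. The sketch below takes the supremum and infimum only over a finite grid of $t$, which is an approximation in general but attains the exact bounds in these two cases. The name `lambda_interval` is hypothetical.

```python
from fractions import Fraction

def lambda_interval(norm, ts):
    # Bounds from (1.120) in the example M = R e1, w(t e1) = t, x = e2,
    # so that t e1 - x = (t, -1):
    #   sup_t (w(t) - ||t e1 - x||) <= lambda <= inf_t (w(t) + ||t e1 - x||).
    # The sup/inf are taken only over the finite grid ts.
    lower = max(t - norm(t, -1) for t in ts)
    upper = min(t + norm(t, -1) for t in ts)
    return lower, upper

def l1(a, b):
    return abs(a) + abs(b)

def linf(a, b):
    return max(abs(a), abs(b))

ts = [Fraction(k, 100) for k in range(-1000, 1001)]   # exact grid on [-10, 10]

print(lambda_interval(l1, ts))    # lower -1, upper 1: many admissible lambda
print(lambda_interval(linf, ts))  # lower 0, upper 0: lambda = 0 is forced
```

Exact rational arithmetic is used so that the grid values $t = \pm 1$, where the bounds are attained, involve no rounding.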
Now we find the desired extension of $u$ by setting
$$u'(t + cx) = w''(t + ax + b(ix)) - i\,w''(it - bx + a(ix)) : M + \mathbb{C}x \longrightarrow \mathbb{C} \tag{1.122}$$
where $c = a + ib$. This is certainly linear over the reals, and linearity over complex coefficients follows since
$$\begin{aligned} u'(it + icx) &= w''(it - bx + a(ix)) - i\,w''(-t - ax - b(ix)) \\ &= i\big(w''(t + ax + b(ix)) - i\,w''(it - bx + a(ix))\big) = i\,u'(t + cx). \end{aligned} \tag{1.123}$$
The uniqueness of a complex linear functional with given real part also shows that $u'|_M = u$.
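Finally, to estimate the norm of $u'$, notice that for each $t \in M$ and $c \in \mathbb{C}$ there is some $\theta \in [0, 2\pi)$ (unique unless $u'(t + cx) = 0$) such that
$$|u'(t + cx)| = \operatorname{Re}\big(e^{i\theta} u'(t + cx)\big) = w''(e^{i\theta} t + e^{i\theta} cx) \le \|e^{i\theta} t + e^{i\theta} cx\|_V = \|t + cx\|_V. \tag{1.124}$$
This completes the proof of the Lemma. □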
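The reconstruction (1.122) has the form $u'(z) = w''(z) - i\,w''(iz)$ with $w'' = \operatorname{Re} u'$, and it is easy to test on a concrete complex linear functional. The sketch below assumes $V = \mathbb{C}^3$ and $u(z) = \sum_k a_k z_k$ with arbitrarily chosen coefficients `a`. Only the real part `w` is used to rebuild `u`, and the complex linearity (1.123) is checked as well.

```python
def u(z, a=(1 + 2j, -0.5j, 3.0)):
    # A complex-linear functional on C^3: u(z) = sum_k a_k z_k (coefficients arbitrary).
    return sum(ak * zk for ak, zk in zip(a, z))

def w(z):
    # Its real part: real-linear and real-valued.
    return u(z).real

def rebuilt(z):
    # (1.122) in the form u(z) = w(z) - i w(iz); only w is used here.
    iz = [1j * zk for zk in z]
    return w(z) - 1j * w(iz)

z = (0.3 - 1j, 2 + 0.5j, -1.25j)
print(u(z), rebuilt(z))                                  # agree up to rounding
print(rebuilt([1j * zk for zk in z]), 1j * rebuilt(z))   # complex linearity, as in (1.123)
```

The identity works because $w(iz) = \operatorname{Re}(i\,u(z)) = -\operatorname{Im} u(z)$.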

Proof of Hahn-Banach. This is an application of Zorn’s Lemma. I am not going to get into the derivation of Zorn’s Lemma from the Axiom of Choice, but if you believe the latter (and you are advised to do so, at least before lunchtime) you should believe the former. See also the discussion in Section 2.9.
Zorn’s Lemma is a statement about partially ordered sets. A partial order on a set $E$ is a subset of $E \times E$, so a relation, where the condition that $(e, f)$ be in the relation is written $e \prec f$. It must satisfy the three conditions
$$\begin{gathered} e \prec e \quad \forall\ e \in E \\ e \prec f \text{ and } f \prec e \Longrightarrow e = f \\ e \prec f \text{ and } f \prec g \Longrightarrow e \prec g. \end{gathered} \tag{1.125}$$
So, what distinguishes this from a (total) order is that two elements need not be related at all, either way.
A subset of a partially ordered set inherits the partial order and such a subset
is said to be a chain if each pair of its elements is related one way or the other.
An upper bound on a subset D ⊂ E is an element e ∈ E such that d ≺ e for all
d ∈ D. A maximal element of E is one which is not majorized, that is e ≺ f, f ∈ E,
implies e = f.
Lemma 1.6 (Zorn). If every chain in a (non-empty) partially ordered set has
an upper bound then the set contains at least one maximal element.
So, we are just accepting this Lemma as axiomatic. However, make sure that you appreciate that it is true for countable sets. Namely, if $C$ is countable and has no maximal element then it must contain a chain which has no upper bound. To see this, note first that $C$ must be infinite, since a finite non-empty partially ordered set always has a maximal element, and write $C$ as a sequence $\{c_i\}_{i \in \mathbb{N}}$. Set $k_1 = 1$ and $x_1 = c_1$. Since $x_1$ is not maximal, there exists some $c_k$, $k > 1$, with $c_1 \prec c_k$ and $c_k \ne c_1$. From the properties of $\mathbb{N}$ there is a smallest $k = k_2$ such that $c_k$ has this property. Let $x_2 = c_{k_2}$ and proceed in the same way: $x_3 = c_{k_3}$, where $k_3 > k_2$ is the smallest integer for which $c_{k_2} \prec c_{k_3}$ and $c_{k_3} \ne c_{k_2}$. This grinds out an infinite chain $x_l$. Now you can check that this chain cannot have an upper bound. Every element of $C$ is either one of the $x_l$, and so is strictly below $x_{l+1}$, or else it is $c_j$ with $k_l < j < k_{l+1}$ for some $l$. In the latter case, by the choice of $k_{l+1}$, it is either equal to $x_l$ or not greater than $x_l$, so in neither case is it an upper bound.
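The countable construction above is mechanical enough to run. As an illustration (the choice of partially ordered set is an assumption made here), enumerate the positive integers $c_i = i$ ordered by divisibility. This set has no maximal element, and the procedure produces the chain $1 \prec 2 \prec 4 \prec 8 \prec \cdots$. The names `divides` and `greedy_chain` are hypothetical.

```python
def divides(e, f):
    # The partial order e ≺ f on the positive integers: e divides f.
    return f % e == 0

def greedy_chain(prec, c, length):
    # c(i), i = 1, 2, ..., enumerates a countable partially ordered set with no
    # maximal element. Following the argument above: k_1 = 1 and k_{l+1} is the
    # smallest k > k_l with c(k_l) ≺ c(k) and c(k) != c(k_l).
    ks = [1]
    while len(ks) < length:
        k = ks[-1] + 1
        while not (prec(c(ks[-1]), c(k)) and c(k) != c(ks[-1])):
            k += 1            # terminates because c(k_l) is not maximal
        ks.append(k)
    return [c(k) for k in ks]

print(greedy_chain(divides, lambda i: i, 6))   # [1, 2, 4, 8, 16, 32]
```

Any odd number $j > 1$ illustrates the last step of the argument: it lies between two consecutive indices $k_l$ and $k_{l+1}$, and it is not divisible by $x_l$.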
The point of Zorn’s Lemma is precisely that it applies to uncountable sets.
One consequence of Zorn’s Lemma is the existence of ‘Hamel bases’ for infinite
dimensional vector spaces. This is pretty much irrelevant for us but I include it
since you can use this to show the existence of non-continuous linear functionals on
a Banach space; the proof is also an easier version of the proof of the Hahn-Banach
theorem.
Definition 1.6. A Hamel basis $B \subset V$ of a vector space is a linearly independent subset which spans, i.e. every element of $V$ is a finite linear combination of elements of $B$.
Hamel bases have a strong tendency to be big. Notice that for a finite dimensional space this is just the usual notion of a basis. Every vector space $V \ne \{0\}$ has a Hamel basis.
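Proof. Look at the collection $\mathcal{X}$ of all linearly independent subsets of $V$. This is non-empty: assuming $V \ne \{0\}$ we can take $\{x\} \in \mathcal{X}$ for any $x \ne 0$ in $V$. Inclusion is a partial order on $\mathcal{X}$. Suppose $\mathcal{Y} \subset \mathcal{X}$ is a chain with respect to this partial order, so for any two elements one is contained in the other. Let $L = \bigcup \mathcal{Y}$ be the union of all the elements of $\mathcal{Y}$. Any finite subset of $L$ lies in a single element of $\mathcal{Y}$, by the chain condition, so $L$ is linearly independent, i.e. $L \in \mathcal{X}$. Each element of $\mathcal{Y}$ is contained in $L$, so $L$ is an upper bound for $\mathcal{Y}$. Thus by Zorn’s Lemma $\mathcal{X}$ must contain a maximal element, say $B$. Then $B$ is a Hamel basis, since if some element $w \in V$ were not a finite linear combination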
of elements of $B$ then $B \cup \{w\}$ would also be linearly independent, and hence an element of $\mathcal{X}$ which contains $B$, which contradicts its maximality. □
So, back to Hahn-Banach.
We are given a functional $u : M \longrightarrow \mathbb{C}$ defined on some linear subspace $M \subset V$ of a normed space, where $u$ is bounded with respect to the induced norm on $M$. We will apply Zorn’s Lemma to the set $E$ consisting of all extensions $(v, N)$ of $u$ with the same norm; it is generally uncountable. That is,
$$V \supset N \supset M, \quad v|_M = u \quad \text{and} \quad \|v\|_N = \|u\|_M.$$
This is certainly non-empty since it contains $(u, M)$. It has the natural partial order that $(v_1, N_1) \prec (v_2, N_2)$ if $N_1 \subset N_2$ and $v_2|_{N_1} = v_1$. You should check that this is a partial order.
Let $C$ be a chain in this set of extensions. Thus for any two elements $(v_i, N_i) \in C$, $i = 1, 2$, either $(v_1, N_1) \prec (v_2, N_2)$ or the other way around. This means that
$$\tilde N = \bigcup \{N;\ (v, N) \in C \text{ for some } v\} \subset V \tag{1.126}$$
is a linear space. Note that this union need not be countable, or anything like that. However, any two elements of $\tilde N$ are each in one of the $N$’s, and one of these must be contained in the other by the chain condition. Thus each pair of elements of $\tilde N$ is actually in a common $N$, and hence so is their linear span. Similarly we can define an extension
$$\tilde v : \tilde N \longrightarrow \mathbb{C}, \quad \tilde v(x) = v(x) \ \text{if}\ x \in N,\ (v, N) \in C. \tag{1.127}$$
There may be many pairs $(v, N) \in C$ satisfying $x \in N$ for a given $x$, but the chain condition implies that $v(x)$ is the same for all of them. Thus $\tilde v$ is well defined. It is clearly also linear, extends $u$ and satisfies the norm condition $|\tilde v(x)| \le \|u\|_M \|x\|_V$. Thus $(\tilde v, \tilde N)$ is an upper bound for the chain $C$.
So, the set of all extensions, $E$, with the norm condition, satisfies the hypothesis of Zorn’s Lemma, so must (at least in the mornings) have a maximal element $(\tilde u, \tilde M)$. If $\tilde M = V$ then we are done. However, in the contrary case there exists $x \in V \setminus \tilde M$. This means we can apply our little lemma and construct an extension $(u', \tilde M')$ of $(\tilde u, \tilde M)$. It is therefore also an element of $E$, and it satisfies $(\tilde u, \tilde M) \prec (u', \tilde M')$ with $(u', \tilde M') \ne (\tilde u, \tilde M)$. This however contradicts the condition that $(\tilde u, \tilde M)$ be maximal, so is forbidden by Zorn. □
There are many applications of the Hahn-Banach Theorem. As remarked earlier, one significant one is that the dual space of a non-trivial normed space is itself non-trivial.
Proposition 1.7. For any normed space $V$ and element $0 \ne v \in V$ there is a continuous linear functional $f : V \longrightarrow \mathbb{C}$ with $f(v) = 1$ and $\|f\| = 1/\|v\|_V$.
Proof. Start with the one-dimensional space, $M$, spanned by $v$ and define $u(zv) = z$. Since $|u(zv)| = |z| = \|zv\|_V/\|v\|_V$, this has norm $1/\|v\|_V$. Extend it using the Hahn-Banach Theorem and you will get a continuous functional $f$ as desired; its norm cannot be smaller than that of $u$, since $f$ extends $u$. □
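In finite dimensions the functional of Proposition 1.7 can be written down explicitly. The sketch below assumes $V = \mathbb{R}^n$ with the $\ell^1$ norm and relies on the standard fact that the corresponding operator norm of $f(w) = \sum_k c_k w_k$ is $\max_k |c_k|$. Then $f(w) = \sum_k \operatorname{sgn}(v_k) w_k/\|v\|_1$ satisfies $f(v) = 1$ and $\|f\| = 1/\|v\|_1$. The helper names `sign` and `norming_functional` are hypothetical, and integer entries are used so that the arithmetic is exact.

```python
from fractions import Fraction

def sign(t):
    return (t > 0) - (t < 0)

def norming_functional(v):
    # For a nonzero integer vector v in R^n with the l^1 norm, return the
    # coefficients c of f(w) = sum_k c_k w_k, where c_k = sgn(v_k) / ||v||_1.
    norm_v = sum(abs(t) for t in v)
    return [Fraction(sign(t), norm_v) for t in v]

v = [3, -1, 0, 2]
c = norming_functional(v)
f_of_v = sum(ck * vk for ck, vk in zip(c, v))
dual_norm = max(abs(ck) for ck in c)   # operator norm of f w.r.t. the l^1 norm
print(f_of_v, dual_norm)               # 1 1/6
```

In infinite dimensions no such formula is available in general, which is where the Hahn-Banach Theorem is needed.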

13. Double dual


Let me give another application of the Hahn-Banach theorem, although I have generally not covered this in lectures. If $V$ is a normed space, we know its dual space, $V'$, to be a Banach space. Let $V'' = (V')'$ be the dual of the dual.
Proposition 1.8. If $v \in V$ then the linear map on $V'$:
$$T_v : V' \longrightarrow \mathbb{C}, \quad T_v(v') = v'(v) \tag{1.128}$$
is continuous, and this defines an isometric linear injection $V \hookrightarrow V''$, $\|T_v\| = \|v\|$.
Proof. The definition of $T_v$ is ‘tautologous’, meaning it is almost the definition of $V'$. First check $T_v$ in (1.128) is linear. Indeed, if $v'_1, v'_2 \in V'$ and $\lambda_1, \lambda_2 \in \mathbb{C}$ then
$$T_v(\lambda_1 v'_1 + \lambda_2 v'_2) = (\lambda_1 v'_1 + \lambda_2 v'_2)(v) = \lambda_1 v'_1(v) + \lambda_2 v'_2(v) = \lambda_1 T_v(v'_1) + \lambda_2 T_v(v'_2).$$
That $T_v \in V''$, i.e. is bounded, follows too since $|T_v(v')| = |v'(v)| \le \|v'\|_{V'} \|v\|_V$; this also shows that $\|T_v\|_{V''} \le \|v\|$. On the other hand, by Proposition 1.7 above, if $\|v\| = 1$ then there exists $v' \in V'$ such that $v'(v) = 1$ and $\|v'\|_{V'} = 1$. Then $T_v(v') = v'(v) = 1$ shows that $\|T_v\| = 1$, so in general, by scaling, $\|T_v\| = \|v\|$. It also needs to be checked that $V \ni v \longmapsto T_v \in V''$ is a linear map; this is clear from the definition. It is necessarily 1-1 since $\|T_v\| = \|v\|$. □
Now, it is definitely not the case in general that $V'' = V$ in the sense that this injection is also a surjection. Since $V''$ is always a Banach space, one necessary condition is that $V$ itself should be a Banach space. In fact the closure of the image of $V$ in $V''$ is a completion of $V$. If the map to $V''$ is a bijection then $V$ is said to be reflexive. It is pretty easy to find examples of non-reflexive Banach spaces; the most familiar is $c_0$, the space of infinite sequences converging to 0. Its dual can be identified with $\ell^1$, the space of summable sequences. Its dual in turn, the bidual of $c_0$, is the space $\ell^\infty$ of bounded sequences, into which the embedding is the obvious one, so $c_0$ is not reflexive. In fact $\ell^1$ is not reflexive either. There are useful characterizations of reflexive Banach spaces. You may be interested enough to look up James’ Theorem: a Banach space is reflexive if and only if every continuous linear functional on it attains its supremum on the unit ball.
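The failure described by James’ Theorem is easy to see for $c_0$, using the identification of $(c_0)'$ with $\ell^1$ described above. The element $a = (2^{-n})_{n \ge 1} \in \ell^1$ gives a functional of norm $\sum_n 2^{-n} = 1$ on $c_0$. If $\|x\|_\infty \le 1$ and $x_n \to 0$, then $|x_n| < 1$ for some $n$, so $|\sum_n 2^{-n} x_n| < 1$ and the supremum is not attained. The exact computation below evaluates the functional on the truncated sequences $x^{(N)} = (1, \dots, 1, 0, 0, \dots) \in c_0$ and gets $1 - 2^{-N}$. The names `pair` and `N_MAX` are hypothetical.

```python
from fractions import Fraction

def pair(a, x):
    # The pairing sum_n a_n x_n between a in l^1 and a finitely supported x in c_0.
    return sum(an * xn for an, xn in zip(a, x))

N_MAX = 40
a = [Fraction(1, 2 ** n) for n in range(1, N_MAX + 1)]   # first N_MAX terms of (2^-n)

for N in (1, 5, 20):
    x = [1] * N + [0] * (N_MAX - N)    # sup norm 1, finitely supported, so in c_0
    value = pair(a, x)
    print(N, value, value == 1 - Fraction(1, 2 ** N))
```

The values approach the norm 1 but never reach it. In $\ell^\infty$, the bidual, the constant sequence $(1, 1, \dots)$ does attain it.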

14. Axioms of a vector space


In case you missed out on one of the basic linear algebra courses, or have a
poor memory, here are the axioms of a vector space over a field K (either R or C
for us).
A vector space structure on a set $V$ is a pair of maps
$$+ : V \times V \longrightarrow V, \quad \cdot : K \times V \longrightarrow V \tag{1.129}$$
satisfying the conditions listed below. These maps are written $+(v_1, v_2) = v_1 + v_2$ and $\cdot(\lambda, v) = \lambda v$, for $\lambda \in K$ and $v, v_1, v_2 \in V$.
additive commutativity v1 + v2 = v2 + v1 for all v1 , v2 ∈ V.
additive associativity v1 + (v2 + v3 ) = (v1 + v2 ) + v3 for all v1 , v2 , v3 ∈ V.
existence of zero There is an element 0 ∈ V such that v + 0 = v for all v ∈ V.
additive invertibility For each v ∈ V there exists w ∈ V such that v + w = 0.
distributivity of scalar additivity (λ1 + λ2 )v = λ1 v + λ2 v for all λ1 , λ2 ∈ K and
v ∈ V.
multiplicativity λ1 (λ2 v) = (λ1 λ2 )v for all λ1 , λ2 ∈ K and v ∈ V.
action of multiplicative identity 1v = v for all v ∈ V.
distributivity of space additivity λ(v1 + v2 ) = λv1 + λv2 for all λ ∈ K, v1 , v2 ∈ V.
CHAPTER 2

The Lebesgue integral

In this second part of the course the basic theory of the Lebesgue integral is
presented. Here I follow an idea of Jan Mikusiński, of completing the space of
step functions on the line under the L1 norm but in such a way that the limiting
objects are seen directly as functions (defined almost everywhere). There are other
places you can find this, for instance the book of Debnath and Mikusiński [1]. Here
I start from the Riemann integral, since this is a prerequisite of the course; this
streamlines things a little. The objective is to arrive at a working knowledge of
Lebesgue integration as quickly as seems acceptable, to pass on to the discussion
of Hilbert space and then to more analytic questions.
So, the treatment of the Lebesgue integral here is intentionally compressed,
while emphasizing the completeness of the spaces L1 and L2 . In lectures everything
is done for the real line but in such a way that the extension to higher dimensions
– carried out partly in the text but mostly in the problems – is not much harder.

1. Integrable functions
Recall that the Riemann integral is defined for a certain class of bounded func-
tions u : [a, b] −→ C (namely the Riemann integrable functions) which includes all
continuous functions. It depends on the compactness of the interval and the bound-
edness of the function, but can be extended to an ‘improper integral’ on the whole
real line for which however some of the good properties fail. This is NOT what
we will do. Rather we consider the space of continuous functions ‘with compact
support’:
$$C_c(\mathbb{R}) = \{u : \mathbb{R} \longrightarrow \mathbb{C};\ u \text{ is continuous and } \exists\ R \text{ such that } u(x) = 0 \text{ if } |x| > R\}. \tag{2.1}$$

Thus each element u ∈ Cc (R) vanishes outside an interval [−R, R] where the R
depends on the u. Note that the support of a continuous function is defined to be
the complement of the largest open set on which it vanishes (or as the closure of the
set of points at which it is non-zero – make sure you see why these are the same).
Thus (2.1) says that the support, which is necessarily closed, is contained in some
interval [−R, R], which is equivalent to saying it is compact.


Lemma 2.1. The Riemann integral defines a continuous linear functional on $C_c(\mathbb{R})$ equipped with the $L^1$ norm
$$\begin{gathered} \int_{\mathbb{R}} u = \lim_{R \to \infty} \int_{[-R,R]} u(x)\,dx, \\ \|u\|_{L^1} = \lim_{R \to \infty} \int_{[-R,R]} |u(x)|\,dx, \\ \Big|\int_{\mathbb{R}} u\Big| \le \|u\|_{L^1}. \end{gathered} \tag{2.2}$$
The limits here are trivial in the sense that the functions involved are constant for large $R$.
Proof. These are basic properties of the Riemann integral; see Rudin [4]. □
Note that Cc (R) is a normed space with respect to kukL1 as defined above; that
it is not complete is one of the main reasons for passing to the Lebesgue integral.
With this small preamble we can directly define the ‘space’ of Lebesgue integrable
functions on R.
Definition 2.1. A function $f : \mathbb{R} \longrightarrow \mathbb{C}$ is Lebesgue integrable, written $f \in L^1(\mathbb{R})$, if there exists a series with partial sums $f_n = \sum_{j=1}^{n} w_j$, $w_j \in C_c(\mathbb{R})$, which is absolutely summable,
$$\sum_j \int |w_j| < \infty \tag{2.3}$$
and such that
$$\sum_j |w_j(x)| < \infty \Longrightarrow \lim_{n \to \infty} f_n(x) = \sum_j w_j(x) = f(x). \tag{2.4}$$

This is a somewhat convoluted definition which you should think about a bit. Its virtue is that it is all there. The problem is that it takes a bit of unravelling. Before we go any further, note that the sequence $w_j$ obviously determines the sequence of partial sums $f_n$, both in $C_c(\mathbb{R})$, but the converse is also true since
$$w_1 = f_1, \quad w_k = f_k - f_{k-1},\ k > 1, \qquad \sum_j \int |w_j| < \infty \Longleftrightarrow \sum_{k > 1} \int |f_k - f_{k-1}| < \infty. \tag{2.5}$$

You might also notice that we can do some finite manipulation, for instance replace the sequence $w_j$ by
$$W_1 = \sum_{j \le N} w_j, \quad W_k = w_{N+k-1},\ k > 1 \tag{2.6}$$
and nothing much changes. The convergence conditions in (2.3) and (2.4) are properties only of the tail of the sequences, and the sum in (2.4) for $w_j(x)$ converges if and only if the corresponding sum for $W_k(x)$ converges, and then converges to the same limit.
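The bookkeeping in (2.5) and (2.6) is pointwise, so it can be checked on the values $w_j(x)$ at a single fixed point $x$. In the sketch below (the helper names and sample values are hypothetical), passing to partial sums and back recovers the $w_j$. The regrouped series $W_k$ has partial sums $f_N, f_{N+1}, \dots$, so it has the same limit.

```python
from fractions import Fraction
from itertools import accumulate

def partial_sums(w):
    # f_n = w_1 + ... + w_n
    return list(accumulate(w))

def increments(f):
    # (2.5): w_1 = f_1 and w_k = f_k - f_{k-1} for k > 1
    return [f[0]] + [f[k] - f[k - 1] for k in range(1, len(f))]

def regroup(w, N):
    # (2.6): W_1 = w_1 + ... + w_N and W_k = w_{N+k-1} for k > 1
    return [sum(w[:N])] + w[N:]

w = [Fraction((-1) ** j, j * j) for j in range(1, 11)]   # sample values w_j(x)
f = partial_sums(w)
assert increments(f) == w
assert partial_sums(regroup(w, 4)) == f[3:]   # W has partial sums f_4, f_5, ..., f_10
```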
Before massaging the definition a little, let me give a simple example and check
that this definition does include continuous functions defined on an interval and
extended to be zero outside – the theory we develop will include the usual Riemann
integral although I will not quite prove this in full, but only because it is not
particularly interesting.
Lemma 2.2. If $f \in C([a,b])$ then
$$\tilde f(x) = \begin{cases} f(x) & \text{if } x \in [a,b] \\ 0 & \text{otherwise} \end{cases} \tag{2.7}$$
is an integrable function.
Proof. Just ‘add legs’ to $\tilde f$ by considering the sequence
$$f_n(x) = \begin{cases} 0 & \text{if } x < a - 1/n \text{ or } x > b + 1/n, \\ (1 + n(x - a))f(a) & \text{if } a - 1/n \le x < a, \\ (1 - n(x - b))f(b) & \text{if } b < x \le b + 1/n, \\ f(x) & \text{if } x \in [a,b]. \end{cases} \tag{2.8}$$

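This is a continuous function on each of the open subintervals in the description, with common limits at the endpoints, so $f_n \in C_c(\mathbb{R})$. By construction, $f_n(x) \to \tilde f(x)$ for each $x \in \mathbb{R}$. Define the sequence $w_j$ which has partial sums the $f_n$, as in (2.5) above. Then $w_j = 0$ in $[a,b]$ for $j > 1$, and it can be written in terms of the ‘legs’
$$l_n = \begin{cases} 0 & \text{if } x < a - 1/n \text{ or } x \ge a \\ 1 + n(x - a) & \text{if } a - 1/n \le x < a, \end{cases} \qquad r_n = \begin{cases} 0 & \text{if } x \le b \text{ or } x > b + 1/n \\ 1 - n(x - b) & \text{if } b < x \le b + 1/n, \end{cases}$$
as $w_n = (l_n - l_{n-1}) f(a) + (r_n - r_{n-1}) f(b)$. Since $0 \le l_n \le l_{n-1}$, $0 \le r_n \le r_{n-1}$ and the two legs have disjoint supports,
$$|w_n(x)| = (l_{n-1} - l_n)|f(a)| + (r_{n-1} - r_n)|f(b)|, \quad n > 1. \tag{2.9}$$
Each leg $l_n$, $r_n$ is a triangle of height 1 and base $1/n$, with integral $1/(2n)$, so it follows that
$$\int |w_n(x)| = \Big(\frac{1}{2(n-1)} - \frac{1}{2n}\Big)\big(|f(a)| + |f(b)|\big) = \frac{|f(a)| + |f(b)|}{2n(n-1)},$$
so $\{w_n\}$ is an absolutely summable sequence, showing that $\tilde f \in L^1(\mathbb{R})$. □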
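The value of $\int |w_n|$ can be checked numerically against $(|f(a)| + |f(b)|)/(2n(n-1))$, the value given by the triangle areas $1/(2n)$ of the legs. The sketch below assumes the hypothetical data $[a,b] = [0,1]$ and $f(x) = 2 + x$, so that $|f(a)| + |f(b)| = 5$. It builds $f_n$ from (2.8) and applies a midpoint rule to $|f_n - f_{n-1}|$.

```python
def f_n(x, n, f, a, b):
    # The function (2.8): f on [a, b] extended by linear 'legs' of width 1/n.
    if x < a - 1 / n or x > b + 1 / n:
        return 0.0
    if x < a:
        return (1 + n * (x - a)) * f(a)
    if x > b:
        return (1 - n * (x - b)) * f(b)
    return f(x)

def integral_abs_w(n, f, a, b, steps=100_000):
    # Midpoint rule for the integral of |w_n| = |f_n - f_{n-1}| over [a - 1, b + 1],
    # which contains the supports of all legs for n >= 2.
    lo, hi = a - 1, b + 1
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += abs(f_n(x, n, f, a, b) - f_n(x, n - 1, f, a, b))
    return total * h

f, a, b = (lambda x: 2 + x), 0.0, 1.0
for n in (2, 3, 10):
    exact = (abs(f(a)) + abs(f(b))) / (2 * n * (n - 1))
    print(n, integral_abs_w(n, f, a, b), exact)
```

Since the integrand is piecewise linear, the midpoint rule is exact except in the few cells containing a corner, so the agreement is very close.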
Returning to the definition, notice that we only say ‘there exists’ an absolutely summable sequence, and that it is required to converge to the function only at points at which the pointwise sequence is absolutely summable. At other points anything is permitted. So it is not immediately clear that there are any functions not satisfying this condition. Indeed if there were a sequence like $w_j$ above with $\sum_j |w_j(x)| = \infty$ always, then (2.4) would represent no restriction at all. So the point of the definition is that absolute summability, a condition on the integrals in (2.3), does imply something about (absolute) convergence of the pointwise series. Let us reinforce this idea with another definition.
Definition 2.2. A set $E \subset \mathbb{R}$ is said to be of measure zero in the sense of Lebesgue (which is pretty much always the meaning here) if there is a series $g_n = \sum_{j=1}^{n} v_j$, $v_j \in C_c(\mathbb{R})$, which is absolutely summable, $\sum_j \int |v_j| < \infty$, and such that
$$\sum_j |v_j(x)| = \infty \quad \forall\ x \in E. \tag{2.10}$$
Notice that we do not require E to be precisely the set of points at which the
series in (2.10) diverges, only that it does so at all points of E, so E is just a subset
of the set on which some absolutely summable series of functions in Cc (R) does
not converge absolutely. So any subset of a set of measure zero is automatically
of measure zero. To introduce the little trickery we use to unwind the definition
above, consider first the following (important) result.
Lemma 2.3. Any finite union of sets of measure zero is a set of measure zero.
Proof. Since we can proceed in steps, it suffices to show that the union of
two sets of measure zero has measure zero. So, let the two sets be E and F and
two corresponding absolutely summable sequences, as in Definition 2.2, be vj and
$w_j$. Consider the alternating sequence
$$u_k = \begin{cases} v_j & \text{if } k = 2j - 1 \text{ is odd} \\ w_j & \text{if } k = 2j \text{ is even.} \end{cases} \tag{2.11}$$
Thus $\{u_k\}$ simply interlaces the two sequences. It follows that $u_k$ is absolutely summable, since
$$\sum_k \|u_k\|_{L^1} = \sum_j \|v_j\|_{L^1} + \sum_j \|w_j\|_{L^1}. \tag{2.12}$$
Moreover, the pointwise series $\sum_k |u_k(x)|$ diverges precisely where one or other of the two series $\sum_j |v_j(x)|$ or $\sum_j |w_j(x)|$ diverges. In particular it must diverge on $E \cup F$, which is therefore, from the definition, a set of measure zero. □

The definition of f ∈ L1 (R) above certainly requires that the equality on the
right in (2.4) should hold outside a set of measure zero, but in fact a specific one,
the one on which the series on the left diverges. Using the same idea as in the
lemma above we can get rid of this restriction.
Proposition 2.1. If $f : \mathbb{R} \longrightarrow \mathbb{C}$ and there exists a series $f_n = \sum_{j=1}^{n} w_j$ with $w_j \in C_c(\mathbb{R})$ which is absolutely summable, so $\sum_j \|w_j\|_{L^1} < \infty$, and a set $E \subset \mathbb{R}$ of measure zero such that
$$x \in \mathbb{R} \setminus E \Longrightarrow f(x) = \lim_{n \to \infty} f_n(x) = \sum_{j=1}^{\infty} w_j(x) \tag{2.13}$$
then $f \in L^1(\mathbb{R})$.
Recall that when one writes down an equality such as on the right in (2.13) one is implicitly saying that $\sum_j w_j(x)$ converges and the equality holds for the limit. We will call a series such as the $w_j$ above an ‘approximating series’ for $f \in L^1(\mathbb{R})$.
This is indeed a refinement of the definition, since all $f \in L^1(\mathbb{R})$ arise this way, taking $E$ to be the set where $\sum_j |w_j(x)| = \infty$ for a series as in the definition.

Proof. By definition of a set of measure zero there is some series $v_j$ as in (2.10). Now, consider the series obtained by alternating the terms between $w_j$, $v_j$
and $-v_j$. Explicitly, set
$$u_j = \begin{cases} w_k & \text{if } j = 3k - 2 \\ v_k & \text{if } j = 3k - 1 \\ -v_k & \text{if } j = 3k. \end{cases} \tag{2.14}$$
This defines a series in $C_c(\mathbb{R})$ which is absolutely summable, with
$$\sum_j \|u_j\|_{L^1} = \sum_k \|w_k\|_{L^1} + 2\sum_k \|v_k\|_{L^1}. \tag{2.15}$$
The same sort of identity is true for the pointwise series, which shows that
$$\sum_j |u_j(x)| < \infty \ \text{iff} \ \sum_k |w_k(x)| < \infty \ \text{and} \ \sum_k |v_k(x)| < \infty. \tag{2.16}$$
So if the pointwise series on the left converges absolutely, then $x \notin E$, by definition, and hence, using (2.13), we find that
$$\sum_j |u_j(x)| < \infty \Longrightarrow f(x) = \sum_j u_j(x) \tag{2.17}$$
since the sequence of partial sums of the $u_j$ cycles through $f_n(x)$, $f_n(x) + v_n(x)$, then $f_n(x)$ and then to $f_{n+1}(x)$. Since $\sum_k |v_k(x)| < \infty$ the sequence $|v_n(x)| \to 0$, so (2.17) indeed follows from (2.13). □

This is the trick at the heart of the definition of integrability above. Namely, we can manipulate the series involved in this sort of way to prove things about the elements of $L^1(\mathbb{R})$. One point to note is that if $w_j$ is an absolutely summable series in $C_c(\mathbb{R})$ then
$$F(x) = \begin{cases} \sum_j |w_j(x)| & \text{when this is finite} \\ 0 & \text{otherwise} \end{cases} \Longrightarrow F \in L^1(\mathbb{R}). \tag{2.18}$$

The sort of property (2.13), where some condition holds on the complement
of a set of measure zero is so commonly encountered in integration theory that we
give it a simpler name.

Definition 2.3. A condition that holds on R \ E for some set of measure zero,
E, is said to hold almost everywhere. In particular we write

(2.19) f = g a.e. if f (x) = g(x) ∀ x ∈ R \ E, E of measure zero.

Of course as yet we are living dangerously, because we have done nothing to show that sets of measure zero are ‘small’, let alone ‘ignorable’ as this definition seems to imply. Beware of the trap of ‘proof by declaration’!
Now Proposition 2.1 can be paraphrased as ‘A function f : R −→ C is Lebesgue
integrable if and only if it is the pointwise sum a.e. of an absolutely summable series
in Cc (R).’

2. Linearity of L1
The word ‘space’ is quoted in the definition of $L^1(\mathbb{R})$ above, because it is not immediately obvious that $L^1(\mathbb{R})$ is a linear space. Even more importantly, it is far from obvious that the integral of a function in $L^1(\mathbb{R})$ is well defined (which is the point of the exercise after all). In fact we wish to define the integral to be
$$\int_{\mathbb{R}} f = \sum_n \int w_n \tag{2.20}$$
where $w_n \in C_c(\mathbb{R})$ is any ‘approximating series’, meaning now as the $w_j$ in Proposition 2.1. This is fine in so far as the series on the right (of complex numbers) does converge: we demanded that $\sum_n \int |w_n| < \infty$, so this series converges absolutely. It is not fine in so far as the answer might well depend on which series we choose which ‘approximates $f$’ in the sense of the definition or Proposition 2.1.
So, the immediate aim is to prove these two things. First we will do a little
more than prove the linearity of L1 (R). Recall that a function is ‘positive’ if it takes
only non-negative values.
Proposition 2.2. The space L1 (R) is linear (over C) and if f ∈ L1 (R) the
real and imaginary parts, Re f, Im f are Lebesgue integrable as are their positive
parts and as is also the absolute value, |f |. For a real Lebesgue integrable function
there is an approximating sequence as in Proposition 2.1 which is real and if f ≥ 0
the sequence of partial sums can be arranged to be non-negative.
Proof. We first consider the real part of a function $f \in L^1(\mathbb{R})$. Suppose $w_n \in C_c(\mathbb{R})$ is an approximating series as in Proposition 2.1. Then consider $v_n = \operatorname{Re} w_n$. This is absolutely summable, since $\int |v_n| \le \int |w_n|$, and
$$\sum_n w_n(x) = f(x) \Longrightarrow \sum_n v_n(x) = \operatorname{Re} f(x). \tag{2.21}$$
Since the left identity holds a.e., so does the right, and hence $\operatorname{Re} f \in L^1(\mathbb{R})$ by Proposition 2.1. The same argument with the imaginary parts shows that $\operatorname{Im} f \in L^1(\mathbb{R})$. This also shows that a real element has a real approximating sequence.
The fact that the sum of two integrable functions is integrable really is a simple consequence of Proposition 2.1 and Lemma 2.3. Indeed, if $f, g \in L^1(\mathbb{R})$ have approximating series $w_n$ and $v_n$ as in Proposition 2.1, then $u_n = w_n + v_n$ is absolutely summable,
$$\sum_n \int |u_n| \le \sum_n \int |w_n| + \sum_n \int |v_n| \tag{2.22}$$
and
$$\sum_n w_n(x) = f(x), \quad \sum_n v_n(x) = g(x) \Longrightarrow \sum_n u_n(x) = f(x) + g(x).$$
The first two conditions hold outside (probably different) sets of measure zero, E
and F, so the conclusion holds outside E ∪ F which is of measure zero. Thus
f + g ∈ L1 (R). The case of cf for c ∈ C is more obvious.
The proof that |f | ∈ L1 (R) if f ∈ L1 (R) is similar but perhaps a little trickier.
Again, let {wn } be an approximating series as in the definition showing that f ∈
$L^1(\mathbb{R})$. To make a series for $|f|$ we can try the ‘obvious’ thing. Namely, we know that
$$\sum_{j=1}^{n} w_j(x) \to f(x) \quad \text{if} \quad \sum_j |w_j(x)| < \infty \tag{2.23}$$
so certainly it follows that
$$\Big|\sum_{j=1}^{n} w_j(x)\Big| \to |f(x)| \quad \text{if} \quad \sum_j |w_j(x)| < \infty.$$
So, set
$$v_1(x) = |w_1(x)|, \quad v_k(x) = \Big|\sum_{j=1}^{k} w_j(x)\Big| - \Big|\sum_{j=1}^{k-1} w_j(x)\Big| \quad \forall\ x \in \mathbb{R}. \tag{2.24}$$
Then, for sure,
$$\sum_{k=1}^{N} v_k(x) = \Big|\sum_{j=1}^{N} w_j(x)\Big| \to |f(x)| \quad \text{if} \quad \sum_j |w_j(x)| < \infty. \tag{2.25}$$
So equality holds off a set of measure zero, and we only need to check that $\{v_j\}$ is an absolutely summable series.
The triangle inequality in the ‘reverse’ form $\big||v| - |w|\big| \le |v - w|$ shows that, for $k > 1$,
$$|v_k(x)| = \Big|\Big|\sum_{j=1}^{k} w_j(x)\Big| - \Big|\sum_{j=1}^{k-1} w_j(x)\Big|\Big| \le |w_k(x)|. \tag{2.26}$$
Thus
$$\sum_k \int |v_k| \le \sum_k \int |w_k| < \infty \tag{2.27}$$
so the $v_k$’s do indeed form an absolutely summable series and (2.25) holds almost everywhere, so $|f| \in L^1(\mathbb{R})$.
For a positive function this last argument yields a real approximating sequence with positive partial sums. □
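The construction (2.24) is pointwise, so it can be checked on the values $w_j(x)$ at one point. In the sketch below (the helper name and the sample complex values are hypothetical), the $v_k$ telescope to $|\sum_{j \le N} w_j(x)|$ as in (2.25). Each $v_k$ also satisfies the reverse triangle inequality bound (2.26).

```python
from itertools import accumulate

def abs_series(w):
    # (2.24) at a single point x: v_1 = |w_1|, v_k = |w_1+...+w_k| - |w_1+...+w_{k-1}|
    s = list(accumulate(w))
    return [abs(s[0])] + [abs(s[k]) - abs(s[k - 1]) for k in range(1, len(s))]

w = [1 - 2j, -0.5 + 1j, 0.25j, -0.125, 0.0625 + 0.0625j]   # sample values w_j(x)
v = abs_series(w)
print(sum(v), abs(sum(w)))                                       # (2.25): equal up to rounding
print(all(abs(vk) <= abs(wk) + 1e-12 for vk, wk in zip(v, w)))   # (2.26)
```

The $v_k$ can be negative even though their partial sums are not, which is why (2.26) bounds $|v_k|$ rather than $v_k$.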
By combining these results we can see again that if $f, g \in L^1(\mathbb{R})$ are both real valued then
$$f_+ = \max(f, 0),\ \max(f, g),\ \min(f, g) \in L^1(\mathbb{R}). \tag{2.28}$$
Indeed, the positive part is $f_+ = \frac{1}{2}(|f| + f)$, while $\max(f, g) = g + (f - g)_+$ and $\min(f, g) = -\max(-f, -g)$.
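The identities behind (2.28) are pointwise facts about real numbers. A quick exact check on sample values (chosen arbitrarily here; `pos` is a hypothetical helper) is therefore enough to catch a sign slip:

```python
from fractions import Fraction
from itertools import product

def pos(t):
    # positive part t_+ = max(t, 0)
    return max(t, 0)

samples = [Fraction(k, 3) for k in range(-6, 7)]
for f, g in product(samples, repeat=2):
    assert pos(f) == (abs(f) + f) / 2
    assert max(f, g) == g + pos(f - g)
    assert min(f, g) == -max(-f, -g)
print("identities hold on", len(samples) ** 2, "pairs of values")
```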

3. The integral on L1
Next we want to show that the integral is well defined via (2.20) for any approximating series. From Proposition 2.2 it is enough to consider only real functions. For this, recall a result (Dini's theorem) concerning a case where uniform convergence of continuous functions follows from pointwise convergence, namely when the convergence is monotone, the limit is continuous, and the space is compact. It works on a general compact metric space, but we can concentrate on the case at hand.
Lemma 2.4. If $u_n \in C_c(\mathbb{R})$ is a decreasing sequence of non-negative functions such that $\lim_{n \to \infty} u_n(x) = 0$ for each $x \in \mathbb{R}$, then $u_n \to 0$ uniformly on $\mathbb{R}$ and
$$\lim_{n \to \infty} \int u_n = 0. \tag{2.29}$$

Proof. Since all the un (x) ≥ 0 and they are decreasing (which really means
not increasing of course) if u1 (x) vanishes at x then all the other un (x) vanish there
too. Thus there is one R > 0 such that un (x) = 0 if |x| > R for all n, namely
one that works for u1 . So we only need consider what happens on [−R, R] which is
compact. For any $\epsilon > 0$ look at the sets
$$S_n = \{x \in [-R, R];\ u_n(x) \ge \epsilon\}.$$
This can also be written $S_n = u_n^{-1}([\epsilon, \infty)) \cap [-R, R]$ and since $u_n$ is continuous it follows that $S_n$ is closed and hence compact. Moreover the fact that the $u_n(x)$ are decreasing means that $S_{n+1} \subset S_n$ for all n. Finally,
$$\bigcap_n S_n = \emptyset$$
since, by assumption, $u_n(x) \to 0$ for each x. Now the property of compact sets in a metric space that we use is that if such a sequence of decreasing compact sets has empty intersection then the sets themselves are empty from some n onwards. This means that there exists N such that $\sup_x u_n(x) < \epsilon$ for all $n > N$. Since $\epsilon > 0$ was arbitrary, $u_n \to 0$ uniformly.
One of the basic properties of the Riemann integral is that the integral of the limit of a uniformly convergent sequence (even of Riemann integrable functions but here continuous) is the limit of the sequence of integrals, which is (2.29) in this case. □
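Lemma 2.4 is a form of Dini’s theorem, and it is the monotonicity that makes it work. The minimal Python sketch below is purely illustrative: the decreasing sequence u_n(x) = max(0, 1 − |x|)·|x|^n and the non-monotone ‘moving bump’ are examples chosen here, not taken from the notes, and the sup and the integral are only approximated on a grid. Both sequences tend to 0 pointwise, but only the monotone one has sup_x u_n(x) → 0.

```python
def u(n, x):
    """u_n(x) = max(0, 1 - |x|) * |x|**n: continuous, supported in [-1, 1],
    non-negative, decreasing in n, and u_n(x) -> 0 for every x."""
    return max(0.0, 1.0 - abs(x)) * abs(x) ** n

def moving_bump(n, x):
    """Non-monotone comparison: a tent of height 1 centred at 1/n with
    half-width 1/n. It tends to 0 pointwise, but its sup is always 1."""
    return max(0.0, 1.0 - n * abs(x - 1.0 / n))

def sup_and_integral(f, n, a=-2.0, b=2.0, m=40_000):
    """Approximate sup_x f(n, x) on a uniform grid over [a, b] and the
    Riemann integral of f(n, .) over [a, b] by the trapezoid rule."""
    h = (b - a) / m
    vals = [f(n, a + i * h) for i in range(m + 1)]
    integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return max(vals), integral

for n in (1, 10, 100):
    print(n, sup_and_integral(u, n), sup_and_integral(moving_bump, n))
```

For this particular u_n one can compute exactly that ∫ u_n = 2/((n+1)(n+2)) and sup_x u_n ≤ 1/(n+1), which the grid values reproduce; the moving bump shows that without monotonicity pointwise convergence need not be uniform.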
We can easily extend this in a useful way – the direction of monotonicity is reversed really just to mentally distinguish this from the preceding lemma.
Lemma 2.5. If $v_n \in C_c(\mathbb{R})$ is any increasing sequence such that $\lim_{n\to\infty} v_n(x) \ge 0$ for each $x \in \mathbb{R}$ (where the possibility $v_n(x) \to \infty$ is included) then
$$\lim_{n\to\infty} \int v_n\,dx \ge 0 \quad \text{including possibly } +\infty. \tag{2.30}$$

Proof. This is really a corollary of the preceding lemma. Consider the se-
quence of functions
$$w_n(x) = \begin{cases} 0 & \text{if } v_n(x) \ge 0 \\ -v_n(x) & \text{if } v_n(x) < 0. \end{cases} \tag{2.31}$$
Since this is the maximum of two continuous functions, namely −vn and 0, it is
continuous and it vanishes for large x, so wn ∈ Cc (R). Since vn (x) is increasing,
wn is decreasing and it follows that lim wn (x) = 0 for all x – either it gets there
for some finite n and then stays 0 or the limit of vn (x) is zero. Thus Lemma 2.4
applies to $w_n$ so
$$\lim_{n\to\infty} \int w_n(x)\,dx = 0.$$
Now, $v_n(x) \ge -w_n(x)$ for all x, so for each n, $\int v_n \ge -\int w_n$. From properties of the Riemann integral, $v_{n+1} \ge v_n$ implies that $\int v_n\,dx$ is an increasing sequence and it is bounded below by one that converges to 0, so (2.30) is the only possibility. □

From this result applied carefully we see that the integral behaves sensibly for
absolutely summable series.
Lemma 2.6. Suppose $u_n \in C_c(\mathbb{R})$ is an absolutely summable series of real-valued functions, so $\sum_n \int |u_n|\,dx < \infty$, and also suppose that
$$\sum_n u_n(x) = 0 \text{ a.e.} \tag{2.32}$$
then
$$\sum_n \int u_n\,dx = 0. \tag{2.33}$$
n

Proof. As already noted, the series (2.33) does converge, since the inequality $|\int u_n\,dx| \le \int |u_n|\,dx$ shows that it is absolutely convergent (hence Cauchy, hence convergent).
If E is a set of measure zero such that (2.32) holds on the complement then
we can modify un as in (2.14) by adding and subtracting a non-negative absolutely
summable sequence vk which diverges absolutely on E. For the new sequence un
(2.32) is strengthened to
$$\sum_n |u_n(x)| < \infty \implies \sum_n u_n(x) = 0 \tag{2.34}$$
and the conclusion (2.33) holds for the new sequence if and only if it holds for the
old one.
Now, we need to get ourselves into a position to apply Lemma 2.5. To do
this, just choose some integer N (large but it doesn’t matter yet) and consider the
sequence of functions – it depends on N but I will suppress this dependence –
$$U_1(x) = \sum_{n=1}^{N+1} u_n(x), \quad U_j(x) = |u_{N+j}(x)|,\ j \ge 2. \tag{2.35}$$

This is a sequence in $C_c(\mathbb{R})$ and it is absolutely summable – the convergence of $\sum_j \int |U_j|\,dx$ only depends on the ‘tail’ which is the same as for $u_n$. For the same reason,
$$\sum_j |U_j(x)| < \infty \iff \sum_n |u_n(x)| < \infty. \tag{2.36}$$

Now the sequence of partial sums


$$g_p(x) = \sum_{j=1}^{p} U_j(x) = \sum_{n=1}^{N+1} u_n(x) + \sum_{j=2}^{p} |u_{N+j}(x)| \tag{2.37}$$

is increasing with p – since we are adding non-negative functions. If the two equiv-
alent conditions in (2.36) hold then
$$\sum_n u_n(x) = 0 \implies \sum_{n=1}^{N+1} u_n(x) + \sum_{j=2}^{\infty} |u_{N+j}(x)| \ge 0 \implies \lim_{p\to\infty} g_p(x) \ge 0, \tag{2.38}$$

since we are only increasing each term. On the other hand if these conditions do
not hold then the tail, any tail, sums to infinity so
$$\lim_{p\to\infty} g_p(x) = \infty. \tag{2.39}$$

Thus the conditions of Lemma 2.5 hold for gp and hence


$$\sum_{n=1}^{N+1} \int u_n + \sum_{j \ge N+2} \int |u_j(x)|\,dx \ge 0. \tag{2.40}$$

Using the same inequality as before this implies that


$$\sum_{n=1}^{\infty} \int u_n \ge -2 \sum_{j \ge N+2} \int |u_j(x)|\,dx. \tag{2.41}$$
This is true for any N and, since the series is absolutely summable, $\lim_{N\to\infty} \sum_{j \ge N+2} \int |u_j(x)|\,dx = 0$. So the fixed number on the left in (2.41), which is what we are interested in, must be non-negative.
In fact the signs in the argument can be reversed, considering instead
$$h_1(x) = -\sum_{n=1}^{N+1} u_n(x), \quad h_p(x) = |u_{N+p}(x)|,\ p \ge 2 \tag{2.42}$$
and the final conclusion is the opposite inequality in (2.41). That is, we conclude what we wanted to show, that
$$\sum_{n=1}^{\infty} \int u_n = 0. \tag{2.43}$$

Finally then we are in a position to show that the integral of an element of
L1 (R) is well-defined.
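Proposition 2.3. If $f \in L^1(\mathbb{R})$ then
$$\int f = \lim_{N\to\infty} \sum_{n=1}^{N} \int u_n \tag{2.44}$$
is independent of the approximating sequence, $u_n$, used to define it. Moreover,
$$\begin{aligned} \int |f| &= \lim_{N\to\infty} \int \Big|\sum_{k=1}^{N} u_k\Big|, \\ \Big|\int f\Big| &\le \int |f| \quad \text{and} \\ \lim_{n\to\infty} \int \Big|f - \sum_{j=1}^{n} u_j\Big| &= 0. \end{aligned} \tag{2.45}$$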

So in some sense the definition of the Lebesgue integral ‘involves no cancellations’.


There are various extensions of the integral which do exploit cancellations – I invite
you to look into the definition of the Henstock integral (and its relatives).
Proof. The uniqueness of $\int f$ follows from Lemma 2.6. Namely, if $u_n$ and $u'_n$ are two series approximating f as in Proposition 2.1 then the real and imaginary parts of the difference $u'_n - u_n$ satisfy the hypothesis of Lemma 2.6 so it follows that
$$\sum_n \int u_n = \sum_n \int u'_n.$$

Then the first part of (2.45) follows from this definition of the integral applied
to |f | and the approximating series for |f | devised in the proof of Proposition 2.2.
The inequality
$$\Big|\sum_n \int u_n\Big| \le \sum_n \int |u_n|, \tag{2.46}$$
which follows from the finite inequalities for the Riemann integrals
$$\Big|\sum_{n \le N} \int u_n\Big| \le \sum_{n \le N} \int |u_n| \le \sum_n \int |u_n|,$$
gives the second part.


The final part follows by applying the same arguments to the series $\{u_k\}_{k>n}$, as an absolutely summable series approximating $f - \sum_{j=1}^{n} u_j$, and observing that the integral is bounded by
$$\int \Big|f - \sum_{k=1}^{n} u_k\Big| \le \sum_{k=n+1}^{\infty} \int |u_k| \to 0 \text{ as } n \to \infty. \tag{2.47}$$


4. Summable series in L1 (R)


The next thing we want to know is when the ‘norm’, which is in fact only a seminorm, on $L^1(\mathbb{R})$, vanishes. That is, when does $\int |f| = 0$? One way is fairly easy. The full result we are after is:-


Proposition 2.4. For an integrable function $f \in L^1(\mathbb{R})$, the vanishing of $\int |f|$ implies that f is a null function in the sense that
$$f(x) = 0 \ \ \forall\, x \in \mathbb{R} \setminus E \text{ where } E \text{ is of measure zero.} \tag{2.48}$$
Conversely, if (2.48) holds then $f \in L^1(\mathbb{R})$ and $\int |f| = 0$.
Proof. The main part of this is the first part, that the vanishing of $\int |f|$ implies that f is null. The converse is the easier direction in the sense that we have already done it.
Namely, if f is null in the sense of (2.48) then |f| is the limit a.e. of the absolutely summable series with all terms 0. It follows from the definition of the integral above that $|f| \in L^1(\mathbb{R})$ and $\int |f| = 0$. □
For the forward argument we will use the following more technical result, which is also closely related to the completeness of $L^1(\mathbb{R})$ (note the small notational difference: $L^1(\mathbb{R})$ is the Banach space which is the quotient of $\mathcal{L}^1(\mathbb{R})$ by the null functions, see below).
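Proposition 2.5. If $f_n \in L^1(\mathbb{R})$ is an absolutely summable series, meaning that $\sum_n \int |f_n| < \infty$, then
$$E = \{x \in \mathbb{R};\ \sum_n |f_n(x)| = \infty\} \text{ has measure zero.} \tag{2.49}$$
If $f : \mathbb{R} \longrightarrow \mathbb{C}$ satisfies
$$f(x) = \sum_n f_n(x) \text{ a.e.} \tag{2.50}$$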
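then $f \in L^1(\mathbb{R})$,
$$\begin{aligned} \int f &= \sum_n \int f_n, \\ \Big|\int f\Big| &\le \int |f| = \lim_{n\to\infty} \int \Big|\sum_{j=1}^{n} f_j\Big| \le \sum_j \int |f_j| \quad \text{and} \\ \lim_{n\to\infty} \int \Big|f - \sum_{j=1}^{n} f_j\Big| &= 0. \end{aligned} \tag{2.51}$$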

This basically says we can replace ‘continuous function of compact support’ by ‘Lebesgue integrable function’ in the definition and get the same result. Of course this makes no sense without the original definition, so what we are showing is that iterating it makes no difference – we do not get a bigger space.

Proof. The proof is very like the proof of completeness via absolutely sum-
mable series for a normed space outlined in the preceding chapter.
By assumption each $f_n \in L^1(\mathbb{R})$, so there exists a sequence $u_{n,j} \in C_c(\mathbb{R})$ with $\sum_j \int |u_{n,j}| < \infty$ and
$$\sum_j |u_{n,j}(x)| < \infty \implies f_n(x) = \sum_j u_{n,j}(x). \tag{2.52}$$

We might hope that f (x) is given by the sum of the un,j (x) over both n and j, but
in general, this double series is not absolutely summable. However we can replace
it by one that is. For each n choose Nn so that
$$\sum_{j > N_n} \int |u_{n,j}| < 2^{-n}. \tag{2.53}$$

This is possible by the assumed absolute summability – the tail of the series there-
fore being small. Having done this, we replace the series un,j by
$$u'_{n,1} = \sum_{j \le N_n} u_{n,j}(x), \quad u'_{n,j}(x) = u_{n,N_n+j-1}(x) \ \ \forall\, j \ge 2, \tag{2.54}$$

summing the first $N_n$ terms. This still sums to $f_n$ on the same set as in (2.52). So in fact we can simply replace $u_{n,j}$ by $u'_{n,j}$ and we have in addition the estimate
$$\sum_j \int |u'_{n,j}| \le \int |f_n| + 2^{-n+1} \ \ \forall\, n. \tag{2.55}$$

This follows from the triangle inequality since, using (2.53),


$$\int \Big|u'_{n,1} + \sum_{j=2}^{N} u'_{n,j}\Big| \ge \int |u'_{n,1}| - \sum_{j \ge 2} \int |u'_{n,j}| \ge \int |u'_{n,1}| - 2^{-n} \tag{2.56}$$
and the left side converges to $\int |f_n|$ by (2.45) as $N \to \infty$. Using (2.53) again gives (2.55).
Dropping the primes from the notation and denoting the new series again as un,j
we can let vk be some enumeration of the un,j – using the standard diagonalization
procedure for instance. This gives a new series of continuous functions of compact
support which is absolutely summable since
$$\sum_{k=1}^{N} \int |v_k| \le \sum_{n,j} \int |u_{n,j}| \le \sum_n \Big(\int |f_n| + 2^{-n+1}\Big) < \infty. \tag{2.57}$$

Using the freedom to rearrange absolutely convergent series we see that


$$\sum_{n,j} |u_{n,j}(x)| < \infty \implies f(x) = \sum_k v_k(x) = \sum_n \sum_j u_{n,j}(x) = \sum_n f_n(x). \tag{2.58}$$

The set where (2.58) fails is a set of measure zero, by definition. Thus $f \in L^1(\mathbb{R})$ and (2.49) also follows. To get the final result (2.51), rearrange the double series for the integral (which is also absolutely convergent). □
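The ‘standard diagonalization procedure’ used above to list the doubly indexed $u_{n,j}$ as a single sequence $v_k$ can be sketched as follows, walking along the anti-diagonals n + j = 2, 3, 4, .... This is one standard choice of enumeration (an assumption; any bijection between pairs and positive integers would do), and the terms $2^{-(n+j)}$ below are a hypothetical stand-in for the values $u_{n,j}(x)$ at one point.

```python
from itertools import count, islice

def diagonal_pairs():
    """Enumerate all pairs (n, j) of positive integers exactly once, one
    anti-diagonal n + j = s at a time: (1,1), (1,2), (2,1), (1,3), ..."""
    for s in count(2):
        for n in range(1, s):
            yield (n, s - n)

# v_k is u_{n,j} for the k-th pair produced by diagonal_pairs().
first_pairs = list(islice(diagonal_pairs(), 10))

# Hypothetical absolutely summable double series u_{n,j} = 2**-(n + j), whose
# total is (sum over n of 2**-n)**2 = 1; partial sums of v_k approach it.
partial = sum(2.0 ** -(n + j) for n, j in islice(diagonal_pairs(), 55))
print(first_pairs, partial)
```

Because every pair appears exactly once, and absolutely convergent series may be rearranged freely, summing the $v_k$ in this order gives the same value as summing over n and then j, which is the rearrangement used in (2.58).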
For the moment we only need the weakest part, (2.49), of this. To paraphrase
this, for any absolutely summable series of integrable functions the absolute point-
wise series converges off a set of measure zero – it can only diverge on a set of
measure zero. It is rather shocking but this allows us to prove the rest of Proposition 2.4! Namely, suppose $f \in L^1(\mathbb{R})$ and $\int |f| = 0$. Then Proposition 2.5 applies to the series with each term being |f|. This is absolutely summable since all the integrals are zero. So it must converge pointwise except on a set of measure zero. Clearly it diverges whenever $f(x) \ne 0$, so
$$\int |f| = 0 \implies \{x;\ f(x) \ne 0\} \text{ has measure zero} \tag{2.59}$$

which is what we wanted to show to finally complete the proof of Proposition 2.4.

5. The space L1 (R)


At this point we are able to define the standard Lebesgue space
$$L^1(\mathbb{R}) = \mathcal{L}^1(\mathbb{R})/\mathcal{N}, \quad \mathcal{N} = \{\text{null functions}\} \tag{2.60}$$
and to check that it is a Banach space with the norm (arising from, to be pedantic) $\int |f|$.
Theorem 2.1. The quotient space L1 (R) defined by (2.60) is a Banach space
in which the continuous functions of compact support form a dense subspace.
The elements of L1 (R) are equivalence classes of functions
$$[f] = f + \mathcal{N}, \quad f \in \mathcal{L}^1(\mathbb{R}). \tag{2.61}$$
That is, we ‘identify’ two elements of L1 (R) if (and only if) their difference is null,
which is to say they are equal off a set of measure zero. Note that the set which is
ignored here is not fixed, but can depend on the functions.
Proof. For an element of L1 (R) the integral of the absolute value is well-
defined by Propositions 2.2 and 2.4
$$\|[f]\|_{L^1} = \int |f|, \quad f \in [f] \tag{2.62}$$
and gives a semi-norm on $\mathcal{L}^1(\mathbb{R})$. It follows from Proposition 1.5 that on the quotient, $\|[f]\|$ is indeed a norm.
The completeness of L1 (R) is a direct consequence of Proposition 2.5. Namely,
to show a normed space is complete it is enough to check that any absolutely summable series converges. If $[f_j]$ is an absolutely summable series in $L^1(\mathbb{R})$ then $f_j$ is absolutely summable in $\mathcal{L}^1(\mathbb{R})$ and by Proposition 2.5 the sum of the series exists so we can use (2.50) to define f off the set E and take it to be zero on E. Then, $f \in \mathcal{L}^1(\mathbb{R})$ and the last part of (2.51) means precisely that
$$\lim_{n\to\infty} \Big\|[f] - \sum_{j<n} [f_j]\Big\|_{L^1} = \lim_{n\to\infty} \int \Big|f - \sum_{j<n} f_j\Big| = 0 \tag{2.63}$$
showing the desired completeness. □

Note that despite the fact that it is technically incorrect, everyone says ‘L1 (R)
is the space of Lebesgue integrable functions’ even though it is really the space
of equivalence classes of these functions modulo equality almost everywhere. Not
much harm can come from this mild abuse of language.
Another consequence of Proposition 2.5 and the proof above is an extension of
Lemma 2.3.
Proposition 2.6. Any countable union of sets of measure zero is a set of
measure zero.
Proof. If E is a set of measure zero then any function f which is defined on $\mathbb{R}$ and vanishes outside E is a null function – is in $L^1(\mathbb{R})$ and has $\int |f| = 0$. Conversely if the characteristic function of E, the function equal to 1 on E and zero in $\mathbb{R} \setminus E$, is integrable and has integral zero then E has measure zero. This is the characterization of null functions above. Now, if $E_j$ is a sequence of sets of measure zero and $\chi_k$ is the characteristic function of
$$\bigcup_{j \le k} E_j \tag{2.64}$$
then $\int |\chi_k| = 0$ so this is an absolutely summable series with sum, the characteristic function of the union, integrable and of integral zero. □

6. The three integration theorems


Even though we now ‘know’ which functions are Lebesgue integrable, it is often
quite tricky to use the definitions to actually show that a particular function has
this property. There are three standard results on convergence of sequences of
integrable functions which are powerful enough to cover most situations that arise
in practice – a Monotonicity Lemma, Fatou’s Lemma and Lebesgue’s Dominated
Convergence theorem.
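Lemma 2.7 (Monotonicity). If $f_j \in L^1(\mathbb{R})$ is a monotone sequence, either $f_j(x) \ge f_{j+1}(x)$ for all $x \in \mathbb{R}$ and all j or $f_j(x) \le f_{j+1}(x)$ for all $x \in \mathbb{R}$ and all j, and $\int f_j$ is bounded then
$$\{x \in \mathbb{R};\ \lim_{j\to\infty} f_j(x) \text{ is finite}\} = \mathbb{R} \setminus E \tag{2.65}$$
where E has measure zero and
$$f = \lim_{j\to\infty} f_j(x) \text{ a.e. is an element of } L^1(\mathbb{R}) \text{ with } \int f = \lim_{j\to\infty} \int f_j \text{ and } \lim_{j\to\infty} \int |f - f_j| = 0. \tag{2.66}$$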

In the usual approach through measure one has the concept of a measurable, non-negative, function for which the integral ‘exists but is infinite’ – we do not have
this (but we could easily do it, or rather you could). Using this one can drop the
assumption about the finiteness of the integral but the result is not significantly
stronger.

Proof. Since we can change the sign of the $f_i$ it suffices to assume that the $f_i$ are monotonically increasing. The sequence of integrals is therefore also monotonically increasing and, being bounded, converges. Turning the sequence into a series, by setting $g_1 = f_1$ and $g_j = f_j - f_{j-1}$ for $j \ge 2$, the $g_j$ are non-negative for $j \ge 2$ and
$$\sum_{j \ge 2} \int |g_j| = \sum_{j \ge 2} \int g_j = \lim_{n\to\infty} \int f_n - \int f_1 \tag{2.67}$$

converges. So this is indeed an absolutely summable series. We therefore know from Proposition 2.5 that it converges absolutely a.e., that the limit, f, is integrable and that
$$\int f = \sum_j \int g_j = \lim_{n\to\infty} \int f_n. \tag{2.68}$$

The second part, corresponding to convergence for the equivalence classes in $L^1(\mathbb{R})$, follows from the last part of (2.51), but here it also follows from the monotonicity since $f(x) \ge f_j(x)$ a.e. so
$$\int |f - f_j| = \int f - \int f_j \to 0 \text{ as } j \to \infty. \tag{2.69}$$
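To see Lemma 2.7 at work with an unbounded limit, take the illustrative (not from the notes) increasing sequence $f_j(x) = \min(j, x^{-1/2})$ on (0, 1], extended by zero. Here $\int f_j = 2 - 1/j$ is bounded, and the a.e. limit $x^{-1/2}$ is integrable with integral 2 even though it is not bounded. The minimal sketch below only approximates the integrals with a midpoint rule.

```python
import math

def f(j, x):
    """f_j(x) = min(j, 1/sqrt(x)) for 0 < x <= 1, and 0 elsewhere."""
    return min(j, 1.0 / math.sqrt(x)) if 0.0 < x <= 1.0 else 0.0

def midpoint_integral(g, a, b, m=20_000):
    """Composite midpoint rule for the Riemann integral of g over [a, b]."""
    h = (b - a) / m
    return h * sum(g(a + (i + 0.5) * h) for i in range(m))

# Integrals of f_1, ..., f_10; the exact values are 2 - 1/j.
integrals = [midpoint_integral(lambda x, j=j: f(j, x), 0.0, 1.0) for j in range(1, 11)]
print([round(I, 6) for I in integrals])
```

The integrals increase and stay below 2, which is exactly the bounded-integral hypothesis of the lemma; the conclusion then identifies 2 as the integral of the limit.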

Now, to Fatou’s Lemma. This really just takes the monotonicity result and applies it to a sequence of integrable functions with bounded integral. You should recall that the max and min of two real-valued integrable functions are integrable and that
$$\int \min(f, g) \le \min\Big(\int f, \int g\Big). \tag{2.70}$$

This follows from the identities
$$2\max(f, g) = |f - g| + f + g, \quad 2\min(f, g) = -|f - g| + f + g. \tag{2.71}$$
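Lemma 2.8 (Fatou). Let $f_j \in L^1(\mathbb{R})$ be a sequence of real-valued non-negative integrable functions such that $\int f_j$ is bounded; then
$$f(x) = \liminf_{n\to\infty} f_n(x) \text{ exists a.e.}, \quad f \in L^1(\mathbb{R}) \quad \text{and} \quad \int \liminf_{n\to\infty} f_n = \int f \le \liminf_{n\to\infty} \int f_n. \tag{2.72}$$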

Proof. You should remind yourself of the properties of lim inf as necessary!
Fix k and consider
$$F_{k,n} = \min_{k \le p \le k+n} f_p(x) \in L^1(\mathbb{R}). \tag{2.73}$$
As discussed above this is integrable. Moreover, this is a decreasing sequence, as n increases, because the minimum is over an increasing set of functions. The $F_{k,n}$ are non-negative so Lemma 2.7 applies and shows that
$$g_k(x) = \inf_{p \ge k} f_p(x) \in L^1(\mathbb{R}), \quad \int g_k \le \int f_n \ \ \forall\, n \ge k. \tag{2.74}$$
Note that for a decreasing sequence of non-negative numbers the limit exists and
is indeed the infimum. Thus in fact,
$$\int g_k \le \liminf \int f_n \ \ \forall\, k. \tag{2.75}$$

Now, let k vary. Then, the infimum in (2.74) is over a set which decreases as k
increases. Thus the $g_k(x)$ are increasing. The integrals of this sequence are bounded above in view of (2.75) since we assumed a bound on the $\int f_n$’s. So, we can apply the monotonicity result again to see that
$$f(x) = \lim_{k\to\infty} g_k(x) \text{ exists a.e. and } f \in L^1(\mathbb{R}) \text{ has } \int f \le \liminf \int f_n. \tag{2.76}$$
Since $f(x) = \liminf f_n(x)$, by definition of the latter, we have proved the Lemma. □
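The inequality in Fatou’s Lemma can be strict, because mass can escape to infinity. As an illustration (not from the notes), let $f_n$ be the tent of height 1 centred at n. Each $\int f_n = 1$, yet for every fixed x the $f_n(x)$ are eventually 0, so $\int \liminf f_n = 0 < 1 = \liminf \int f_n$. The minimal sketch below checks both facts numerically; `tail_inf` is a finite stand-in (a hypothetical helper) for $g_k$ in (2.74).

```python
def f(n, x):
    """Tent of height 1 centred at n: continuous, supported in [n - 1, n + 1],
    with integral exactly 1."""
    return max(0.0, 1.0 - abs(x - n))

def tail_inf(x, k, n_max=1000):
    """min of f_p(x) over k <= p <= n_max, a finite stand-in for g_k(x)."""
    return min(f(p, x) for p in range(k, n_max + 1))

def tent_integral(n, m=2000):
    """Midpoint-rule integral of f_n over its support [n - 1, n + 1]."""
    a, h = n - 1.0, 2.0 / m
    return h * sum(f(n, a + (i + 0.5) * h) for i in range(m))

points = [-3.0, 0.0, 0.5, 7.25, 42.0]
lim_inf_values = [tail_inf(x, 1) for x in points]     # the infima vanish
integrals = [tent_integral(n) for n in (1, 10, 100)]  # each is (close to) 1
print(lim_inf_values, integrals)
```

So the limit function here is identically 0, and the gap between the two sides of (2.72) is the unit of mass that moves off to infinity.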


Now, we apply Fatou’s Lemma to prove what we are really after:-


Theorem 2.2 (Dominated convergence). Suppose fj ∈ L1 (R) is a sequence of
integrable functions such that
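$$\exists\, h \in L^1(\mathbb{R}) \text{ with } |f_j(x)| \le h(x) \text{ a.e. and } f(x) = \lim_{j\to\infty} f_j(x) \text{ exists a.e.} \tag{2.77}$$
then $f \in L^1(\mathbb{R})$ and $[f_j] \to [f]$ in $L^1(\mathbb{R})$, so $\int f = \lim_{n\to\infty} \int f_n$ (including the assertion that this limit exists).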
Proof. First, we can assume that the fj are real since the hypotheses hold for
the real and imaginary parts of the sequence and together give the desired result.
Moreover, we can change all the fj ’s to make them zero on the set on which the
initial estimate in (2.77) does not hold. Then this bound on the fj ’s becomes
(2.78) −h(x) ≤ fj (x) ≤ h(x) ∀ x ∈ R.
In particular this means that gj = h − fj is a non-negative sequence of integrable
functions and the sequence of integrals is also bounded, since (2.77) also implies that $\int |f_j| \le \int h$, so $\int g_j \le 2\int h$. Thus Fatou’s Lemma applies to the $g_j$. Since we have assumed that the sequence $g_j(x)$ converges a.e. to $h - f$ we know that
$$h - f(x) = \liminf g_j(x) \text{ a.e. and } \int h - \int f \le \liminf \int (h - f_j) = \int h - \limsup \int f_j. \tag{2.79}$$

Notice the change on the right from liminf to limsup because of the sign.

Now we can apply the same argument to $g'_j(x) = h(x) + f_j(x)$ since this is also non-negative and has integrals bounded above. This converges a.e. to $h(x) + f(x)$ so this time we conclude that
$$\int h + \int f \le \liminf \int (h + f_j) = \int h + \liminf \int f_j. \tag{2.80}$$
In both inequalities (2.79) and (2.80) we can cancel an $\int h$ and combining them we find
$$\limsup \int f_j \le \int f \le \liminf \int f_j. \tag{2.81}$$

In particular the limsup on the left is smaller than, or equal to, the liminf on the
right, for the same real sequence. This however implies that they are equal and that the sequence $\int f_j$ converges. Thus indeed
$$\int f = \lim_{n\to\infty} \int f_n. \tag{2.82}$$

Convergence of $f_n$ to f in $L^1(\mathbb{R})$ follows by applying the results proved so far to $|f - f_n|$, which converges almost everywhere to 0 and is dominated almost everywhere by $2h$. In this case (2.82) becomes
$$\lim_{n\to\infty} \int |f - f_n| = 0.$$
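Both hypotheses in Theorem 2.2 matter. The minimal Python sketch below (illustrative examples, not from the notes) contrasts $f_n(x) = e^{-x^2}\cos(x/n)$, which is dominated by $h(x) = e^{-x^2}$ and whose integrals $\sqrt{\pi}\,e^{-1/(4n^2)}$ converge to $\int h = \sqrt{\pi}$, with spikes $s_n$, tents of height n on [0, 2/n], which tend to 0 pointwise while every $\int s_n = 1$; no integrable function dominates all the $s_n$. Integrals are approximated with a midpoint rule.

```python
import math

def midpoint_integral(g, a, b, m=20_000):
    """Composite midpoint rule for the Riemann integral of g over [a, b]."""
    h = (b - a) / m
    return h * sum(g(a + (i + 0.5) * h) for i in range(m))

def f(n, x):
    """Dominated: |f_n(x)| <= exp(-x**2), and f_n(x) -> exp(-x**2)."""
    return math.exp(-x * x) * math.cos(x / n)

def spike(n, x):
    """Tent of height n on [0, 2/n]: integral 1 for every n, pointwise limit 0."""
    return max(0.0, n - n * n * abs(x - 1.0 / n))

ns = (1, 2, 5, 10, 100)
# [-10, 10] captures all but about exp(-100) of each Gaussian-type integral.
dominated = [midpoint_integral(lambda x, n=n: f(n, x), -10.0, 10.0) for n in ns]
spikes = [midpoint_integral(lambda x, n=n: spike(n, x), 0.0, 2.0) for n in ns]
print(dominated, spikes)
```

The first list approaches $\sqrt{\pi} \approx 1.7725$, as the theorem predicts; the second stays at 1 although the pointwise limit is 0, showing that the domination hypothesis cannot simply be dropped.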

Generally in applications it is Lebesgue’s dominated convergence which is used to prove that some function is integrable. Of course, since we deduced it from
Fatou’s lemma, and the latter from the Monotonicity lemma, you might say that
Lebesgue’s theorem is the weakest of the three! However, it is very handy and often
a combination does the trick. For instance

Lemma 2.9. A continuous function $u \in C(\mathbb{R})$ is Lebesgue integrable if and only if the ‘improper Riemann integral’
$$\lim_{R\to\infty} \int_{-R}^{R} |u(x)|\,dx < \infty. \tag{2.83}$$

Note that the ‘improper integral’ without the absolute value can converge without
u being Lebesgue integrable.

Proof. If (2.83) holds then consider the sequence of functions vN = χ[−N,N ] |u|,
which we know to be in L1 (R) by Lemma 2.2. This is monotonic increasing with
limit |u|, so the Monotonicity Lemma shows that |u| ∈ L1 (R). Then consider
wN = χ[−N,N ] u which we also know to be in L1 (R). Since it is bounded by |u| and
converges pointwise to u, it follows from Dominated Convergence that u ∈ L1 (R).
Conversely, if u ∈ L1 (R) then |u| ∈ L1 (R) and χ[−N,N ] |u| ∈ L1 (R) converges to |u|
so by Dominated Convergence (2.83) must hold. □

So (2.83) holds for any u ∈ L1 (R).
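The note after Lemma 2.9 can be seen numerically with the standard example u(x) = sin(x)/x, extended continuously by u(0) = 1 (an illustration, not from the notes). Its improper integral over [−R, R] tends to π, but the corresponding integral of |u| grows without bound, roughly like a multiple of log R, so u is not Lebesgue integrable. The minimal sketch below uses a midpoint rule with R = Nπ.

```python
import math

def midpoint_integral(g, a, b, m):
    """Composite midpoint rule; it never evaluates g at the endpoint 0."""
    h = (b - a) / m
    return h * sum(g(a + (i + 0.5) * h) for i in range(m))

def u(x):
    """sin(x)/x for x != 0 (the midpoint rule never asks for x = 0 here)."""
    return math.sin(x) / x

results = []
for N in (10, 100, 1000):
    R = N * math.pi
    m = 100 * N  # 100 cells on each interval of length pi
    # Both integrands are even, so integrate over [0, R] and double.
    signed = 2 * midpoint_integral(u, 0.0, R, m)
    absolute = 2 * midpoint_integral(lambda x: abs(u(x)), 0.0, R, m)
    results.append((N, signed, absolute))
    print(f"R = {N} pi: signed {signed:.5f}, absolute {absolute:.3f}")
```

The lower bound $\int_{k\pi}^{(k+1)\pi} |\sin x|/x\,dx \ge 2/((k+1)\pi)$ shows that the ‘absolute’ column must grow at least like $4/\pi$ times the harmonic numbers.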



7. Notions of convergence
We have been dealing with two basic notions of convergence, but really there
are more. Let us pause to clarify the relationships between these different concepts.
(1) Convergence of a sequence in $L^1(\mathbb{R})$ (or by slight abuse of language in $\mathcal{L}^1(\mathbb{R})$) – f and $f_n \in \mathcal{L}^1(\mathbb{R})$ and
$$\|f - f_n\|_{L^1} \to 0 \text{ as } n \to \infty. \tag{2.84}$$
(2) Convergence almost everywhere:- For some sequence of functions fn and
function f,
(2.85) fn (x) → f (x) as n → ∞ for x ∈ R \ E
where E ⊂ R is of measure zero.
(3) Dominated convergence:- For fj ∈ L1 (R) (or representatives in L1 (R))
such that |fj | ≤ F (a.e.) for some F ∈ L1 (R) and (2.85) holds.
(4) What we might call ‘absolutely summable convergence’. Thus $f_n \in L^1(\mathbb{R})$ are such that $f_n = \sum_{j=1}^{n} g_j$ where $g_j \in L^1(\mathbb{R})$ and $\sum_j \int |g_j| < \infty$. Then (2.85) holds for some f.
(5) Monotone convergence. For $f_j \in L^1(\mathbb{R})$, real valued and monotonic, we require that $\int f_j$ is bounded and it then follows that $f_j \to f$ almost everywhere, with $f \in L^1(\mathbb{R})$, that the convergence is also in $L^1$, and that $\int f = \lim_j \int f_j$.
So, one important point to know is that 1 does not imply 2. Nor conversely
does 2 imply 1 even if we assume that all the fj and f are in L1 (R).
However, monotone convergence implies dominated convergence. Namely if f is the limit then $|f_j| \le |f| + 2|f_1|$ and $f_j \to f$ almost everywhere. Also, monotone convergence implies convergence with absolute summability: taking the series to have first term $f_1$ and subsequent terms $f_j - f_{j-1}$ (assuming that $f_j$ is monotonically increasing), one gets an absolutely summable series whose sequence of finite sums converges to f. Similarly absolutely summable convergence implies dominated convergence for the sequence of partial sums; by monotone convergence the series $\sum_n |g_n(x)|$ converges a.e. and in $L^1$ to some function F which dominates the partial sums, which in turn converge pointwise. I suggest that you make a
diagram with these implications in it so that you are clear about the relationships
between them.
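Two standard counterexamples (not from the notes) show that (1) and (2) are independent. The ‘typewriter’ sequence of indicators of the dyadic intervals $[i/2^k, (i+1)/2^k]$ converges to 0 in $L^1$ but converges at no point of [0, 1], while $g_n = n\chi_{(0,1/n)}$ converges to 0 everywhere although $\|g_n\|_{L^1} = 1$ for all n. The minimal sketch below uses exact rational arithmetic from Python’s `fractions` module.

```python
from fractions import Fraction

def typewriter(n):
    """For n = 2**k + i with 0 <= i < 2**k, return the endpoints of the n-th
    interval [i / 2**k, (i + 1) / 2**k]; f_n is its indicator function."""
    k = n.bit_length() - 1
    i = n - 2 ** k
    return Fraction(i, 2 ** k), Fraction(i + 1, 2 ** k)

def typewriter_l1_norm(n):
    """||f_n||_{L^1} is the length of the n-th interval, namely 2**-k."""
    left, right = typewriter(n)
    return right - left

def hits(x, n_max):
    """The indices n < n_max with f_n(x) = 1."""
    return [n for n in range(1, n_max) if typewriter(n)[0] <= x <= typewriter(n)[1]]

def spike(n, x):
    """g_n = n times the indicator of (0, 1/n); its L^1 norm is 1 for every n."""
    return n if 0 < x < Fraction(1, n) else 0

x = Fraction(1, 3)
print(typewriter_l1_norm(2 ** 10), len(hits(x, 2 ** 10)), [spike(n, x) for n in range(1, 6)])
```

At the point x = 1/3 the typewriter sequence equals 1 once on every dyadic level (and 0 elsewhere), so it oscillates forever even though its norms tend to 0; the spikes vanish at x from n = 3 on, yet their norms never shrink.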

8. The space L2 (R)


So far we have discussed the Banach space L1 (R). The real aim is to get a
good hold on the (Hilbert) space L2 (R). This can be approached in several ways.
We could start off as for L1 (R) and define L2 (R) as the completion of Cc (R) with
respect to the norm
$$\|f\|_{L^2} = \Big(\int |f|^2\Big)^{1/2}. \tag{2.86}$$

This would be rather repetitious; instead we adopt an approach based on Dominated Convergence. You might think, by the way, that it is enough just to ask that $|f|^2 \in L^1(\mathbb{R})$. This does not work, since even if real the sign of f could jump around and make it non-integrable (provided you believe in the axiom of choice). Nor would this approach work for $L^1(\mathbb{R})$ since $|f| \in L^1(\mathbb{R})$ does not imply that $f \in L^1(\mathbb{R})$.
Definition 2.4. A function $f : \mathbb{R} \longrightarrow \mathbb{C}$ is said to be ‘Lebesgue square integrable’, written $f \in \mathcal{L}^2(\mathbb{R})$, if there exists a sequence $u_n \in C_c(\mathbb{R})$ such that
$$u_n(x) \to f(x) \text{ a.e. and } |u_n(x)|^2 \le F(x) \text{ a.e. for some } F \in L^1(\mathbb{R}). \tag{2.87}$$
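Proposition 2.7. The space $\mathcal{L}^2(\mathbb{R})$ is linear, $f \in \mathcal{L}^2(\mathbb{R})$ implies $|f|^2 \in L^1(\mathbb{R})$ and (2.86) defines a seminorm on $\mathcal{L}^2(\mathbb{R})$ which vanishes precisely on the null functions $\mathcal{N} \subset \mathcal{L}^2(\mathbb{R})$.
Definition 2.5. We define $L^2(\mathbb{R}) = \mathcal{L}^2(\mathbb{R})/\mathcal{N}$.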
So we know that L2 (R) is a normed space. It is in fact complete and much more!

Proof. First to see the linearity of L2 (R) note that if f ∈ L2 (R) and c ∈ C
then cf ∈ L2 (R) since if un is a sequence as in the definition for f then cun is such
a sequence for cf.
Similarly if f, g ∈ L2 (R) with sequences un and vn then wn = un + vn has the
first property – since we know that the union of two sets of measure zero is a set
of measure zero and the second follows from the estimate
$$|w_n(x)|^2 = |u_n(x) + v_n(x)|^2 \le 2|u_n(x)|^2 + 2|v_n(x)|^2 \le 2(F + G)(x) \tag{2.88}$$
where $|u_n(x)|^2 \le F(x)$ and $|v_n(x)|^2 \le G(x)$ with $F, G \in L^1(\mathbb{R})$.
Moreover, if $f \in \mathcal{L}^2(\mathbb{R})$ then the sequence $|u_n(x)|^2$ converges pointwise almost everywhere to $|f(x)|^2$ so by Lebesgue’s Dominated Convergence, $|f|^2 \in L^1(\mathbb{R})$. Thus $\|f\|_{L^2}$ is well-defined. It vanishes if and only if $|f|^2 \in \mathcal{N}$ but this is equivalent to $f \in \mathcal{N}$ – conversely $\mathcal{N} \subset \mathcal{L}^2(\mathbb{R})$ since the zero sequence works in the definition above.
So we only need to check the triangle inequality, absolute homogeneity being clear, to deduce that $L^2 = \mathcal{L}^2/\mathcal{N}$ is at least a normed space. In fact we checked this earlier on $C_c(\mathbb{R})$ and the general case follows by continuity:-
$$\|u_n + v_n\|_{L^2} \le \|u_n\|_{L^2} + \|v_n\|_{L^2} \ \ \forall\, n \implies \|f + g\|_{L^2} = \lim_{n\to\infty} \|u_n + v_n\|_{L^2} \le \|f\|_{L^2} + \|g\|_{L^2}. \tag{2.89}$$

We will get a direct proof of the triangle inequality as soon as we start talking
about (pre-Hilbert) spaces.
So it only remains to check the completeness of L2 (R), which is really the whole
point of the discussion of Lebesgue integration.
Theorem 2.3. The space $L^2(\mathbb{R})$ is complete with respect to $\|\cdot\|_{L^2}$ and is a completion of $C_c(\mathbb{R})$ with respect to this norm.
Proof. That Cc (R) ⊂ L2 (R) follows directly from the definition and the fact
that a continuous null function must vanish. This is a dense subset since, if $f \in \mathcal{L}^2(\mathbb{R})$, a sequence $u_n \in C_c(\mathbb{R})$ as in Definition 2.4 satisfies
$$|u_n(x) - u_m(x)|^2 \le 4F(x) \ \ \forall\, n, m, \tag{2.90}$$
and converges almost everywhere to $|f(x) - u_m(x)|^2$ as $n \to \infty$. Thus, by Dominated Convergence, $|f(x) - u_m(x)|^2 \in L^1(\mathbb{R})$. As $m \to \infty$, $|f(x) - u_m(x)|^2 \to 0$ almost everywhere and $|f(x) - u_m(x)|^2 \le 4F(x)$ so again by dominated convergence
$$\|f - u_m\|_{L^2} = \big(\||f - u_m|^2\|_{L^1}\big)^{1/2} \to 0. \tag{2.91}$$
This shows the density of Cc (R) in L2 (R), the quotient by the null functions.
To prove completeness, we only need show that any absolutely L2 -summable
sequence in Cc (R) converges in L2 and the general case follows by density. So,
suppose φn ∈ Cc (R) is such a sequence:
$$\sum_n \|\phi_n\|_{L^2} < \infty.$$
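Consider $F_k(x) = \big(\sum_{n \le k} |\phi_n(x)|\big)^2$. This is an increasing sequence in $C_c(\mathbb{R})$ and its $L^1$ norm is bounded:
$$\|F_k\|_{L^1} = \Big\|\sum_{n \le k} |\phi_n|\Big\|_{L^2}^2 \le \Big(\sum_{n \le k} \|\phi_n\|_{L^2}\Big)^2 \le C^2 < \infty \tag{2.92}$$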

using the triangle inequality and absolute $L^2$ summability. Thus, by Monotone Convergence, $F_k(x) \to F(x)$ a.e., $F_k \to F \in L^1(\mathbb{R})$ and $F_k(x) \le F(x)$ a.e., where we define F(x) to be the limit when this exists and zero otherwise.
Thus the sequence of partial sums $u_k(x) = \sum_{n \le k} \phi_n(x)$ converges almost everywhere – since it converges (absolutely) on the set where $F_k$ is bounded. Let f(x) be the limit. We want to show that $f \in \mathcal{L}^2(\mathbb{R})$ but this follows from the definition since
$$|u_k(x)|^2 \le \Big(\sum_{n \le k} |\phi_n(x)|\Big)^2 = F_k(x) \le F(x) \text{ a.e.} \tag{2.93}$$

As in (2.91) it follows that
$$\int |u_k(x) - f(x)|^2 \to 0. \tag{2.94}$$
As for the case of $L^1(\mathbb{R})$ it now follows that $L^2(\mathbb{R})$ is complete. □


We want to check that L2 (R) is a Hilbert space (which I will define very soon,
even though it is in the next Chapter); to do so observe that if f, g ∈ L2 (R)
have approximating sequences $u_n$, $v_n$ as in Definition 2.4, so $|u_n(x)|^2 \le F(x)$ and $|v_n(x)|^2 \le G(x)$ with $F, G \in L^1(\mathbb{R})$, then
$$u_n(x)\overline{v_n(x)} \to f(x)\overline{g(x)} \text{ a.e. and } |u_n(x)\overline{v_n(x)}| \le \tfrac{1}{2}\big(F(x) + G(x)\big) \tag{2.95}$$
shows that $f\overline{g} \in L^1(\mathbb{R})$ by Dominated Convergence. This leads to the basic property of the norm on a (pre)-Hilbert space – that it comes from an inner product. In this case
$$\langle f, g\rangle_{L^2} = \int f(x)\overline{g(x)}, \quad \|f\|_{L^2} = \langle f, f\rangle^{\frac12}. \tag{2.96}$$
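As a concrete check of (2.96), the minimal sketch below (illustrative functions, not from the notes) approximates ⟨f, g⟩ for $f(x) = e^{-x^2}e^{ix}$ and $g(x) = e^{-x^2}$ by a midpoint rule on [−8, 8]. The Gaussian integral $\int e^{-2x^2}e^{ix}\,dx = \sqrt{\pi/2}\,e^{-1/8}$ gives the exact value, and since here $\|f\|_{L^2}^2 = \|g\|_{L^2}^2 = \sqrt{\pi/2}$, the Cauchy–Schwarz inequality $|\langle f, g\rangle| \le \|f\|_{L^2}\|g\|_{L^2}$ is visible directly.

```python
import cmath
import math

def inner(f, g, L=8.0, m=16_000):
    """Midpoint-rule approximation of <f, g> = integral of f(x) * conj(g(x))
    over [-L, L]; adequate here because f and g decay like exp(-x**2)."""
    h = 2 * L / m
    total = 0j
    for i in range(m):
        x = -L + (i + 0.5) * h
        total += f(x) * g(x).conjugate()
    return total * h

def f(x):
    return math.exp(-x * x) * cmath.exp(1j * x)

def g(x):
    return complex(math.exp(-x * x))

fg = inner(f, g)
norm_f = math.sqrt(inner(f, f).real)
norm_g = math.sqrt(inner(g, g).real)
print(fg, norm_f * norm_g)
```

The conjugate on the second factor is what makes ⟨f, f⟩ real and non-negative, so that $\|f\|_{L^2} = \langle f, f\rangle^{1/2}$ makes sense for complex-valued f.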

At this point I normally move on to the next chapter on Hilbert spaces with
L2 (R) as one motivating example.

9. Measurable and non-measurable sets


The σ-algebra of Lebesgue measurable sets on the line is discussed below but
we can directly consider the notion of a set of finite Lebesgue measure. Namely such a set $A \subset \mathbb{R}$ is defined by the condition that the characteristic function
$$\chi_A(x) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \notin A \end{cases} \tag{2.97}$$
is Lebesgue integrable, $\chi_A \in L^1(\mathbb{R})$. The measure of the set (think ‘length’) is then $\mu(A) = \int_{\mathbb{R}} \chi_A$, the properties of which are discussed below. Certainly if $A \subset [-R, R]$ has finite measure then $\mu(A) \le 2R$ from the properties of the integral. Similarly if $A_i \subset [-R, R]$ are a sequence of sets of finite measure which are disjoint, $A_i \cap A_j = \emptyset$, $i \ne j$, then
$$A = \bigsqcup_i A_i \text{ has finite measure and } \mu(A) = \sum_i \mu(A_i) \tag{2.98}$$
using Monotone Convergence.
Now the question arises, enquiring minds want to know after all:- Are there
bounded sets which are not of finite measure? Similarly, are there functions of
bounded support which are not integrable? It turns out this question gets us into
somewhat deep water, but it is important to understand some of the limitations that the insistence on precision in Mathematics places on its practitioners!
Let me present a standard construction of a non-(Lebesgue-)measurable subset
of [0, 1] and then comment on the issues that it raises. We start with the quotient
space and quotient map
(2.99) q : R −→ R/Q, q(x) = {y ∈ R; y = x + r, r ∈ Q}.
This partitions R into disjoint subsets
$$\mathbb{R} = \bigsqcup_{\tau \in \mathbb{R}/\mathbb{Q}} q^{-1}(\tau). \tag{2.100}$$

Two of these sets intersect if and only if they have elements differing by a rational,
and then they are the same.
Now, each of these sets $q^{-1}(\tau)$ intersects [0, 1]. This follows from the density of the rationals in the reals, since if $x \in q^{-1}(\tau)$ there exists $r \in \mathbb{Q}$ such that $|x - r| < \frac12$ and then $x' = x + (-r + \frac12) \in q^{-1}(\tau) \cap [0, 1]$. So we can ‘localize’ (2.100) to
$$[0, 1] = \bigsqcup_{\tau \in \mathbb{R}/\mathbb{Q}} L(\tau), \quad L(\tau) = q^{-1}(\tau) \cap [0, 1] \tag{2.101}$$
where all the sets $L(\tau)$ are non-empty.


Definition 2.6. A Vitali set, V ⊂ [0, 1], is a set which contains precisely one
element from each of the L(τ ).
Take such a set V and consider the translates of it by rationals in [−1, 1],
(2.102) Vr = {y ∈ [−1, 2]; y = x + r, x ∈ V }, r ∈ Q, |r| ≤ 1.
For different r these are disjoint – since by construction no two distinct elements
of V differ by a rational. The union of these sets however satisfies
$$[0, 1] \subset \bigsqcup_{r \in \mathbb{Q},\, |r| \le 1} V_r \subset [-1, 2]. \tag{2.103}$$

Now, we can simply order the sets Vr into a sequence Ai by ordering the rationals
in [−1, 1].
Suppose V is of finite Lebesgue measure. Then we know that all the Vr are
of finite measure and µ(Vr ) = µ(V ) = µ(Ai ) for all i, from the properties of the
Lebesgue integral. This means that (2.98) applies, so we have the inequalities

$$\mu([0, 1]) = 1 \le \sum_{i=1}^{\infty} \mu(V) \le 3 = \mu([-1, 2]). \tag{2.104}$$
Clearly we have a problem! The only way the right-hand inequality can hold is if
µ(V ) = 0, but then the left-hand inequality fails.
Our conclusion then is that V cannot be Lebesgue measurable! Or is it? Since
we are careful people we trace back through the discussion and see (it took people
a long, long, time to recognize this) more precisely:-
Proposition 2.8. If a Vitali set, V ⊂ [0, 1] exists, containing precisely one
element of each of the sets L(τ ), then it is bounded and not of finite Lebesgue
measure; its characteristic function is a non-negative function of bounded support
which is not Lebesgue integrable.
Okay, so what is the ‘issue’ here? It is that the existence of such a Vitali set requires the Axiom of Choice. There are lots of sets L(τ) so from the standard (Zermelo-Fraenkel) axioms of set theory it does not follow that you can ‘choose an
element from each’ to form a new set. That is a (slightly informal) version of the
additional axiom. Now, it has been shown (namely by Gödel and Cohen) that the
Axiom of Choice is independent of the Zermelo-Fraenkel Axioms. This does not
mean consistency, it means conditional consistency. The Zermelo-Fraenkel axioms
together with the Axiom of Choice are inconsistent if and only if the Zermelo-
Fraenkel axioms on their own are inconsistent.
Conclusion: as a working mathematician you are free to choose whether or not to believe in the Axiom of Choice. It will make your life easier if you do, but it is up to you. Note that rejecting the Axiom of Choice does not allow you to prove that all bounded subsets of $\mathbb{R}$ are measurable; rather, it is consistent to believe this (as shown by Solovay).
See also the discussion of the Hahn-Banach Theorem in Section 1.12.

10. Measurable functions


From our original definition of $L^1(\mathbb{R})$ we know that $C_c(\mathbb{R})$ is dense in $L^1(\mathbb{R})$. We also know that elements of $C_c(\mathbb{R})$ can be approximated uniformly, and hence in $L^1(\mathbb{R})$, by step functions, that is, finite linear combinations of the characteristic functions of intervals. It is usual in measure theory to consider a somewhat larger class of functions which contains the step functions:
Definition 2.7. A simple function on $\mathbb{R}$ is a finite linear combination (generally with complex coefficients) of characteristic functions of subsets of finite measure:
$$f = \sum_{j=1}^{N} c_j \chi(B_j), \quad \chi(B_j) \in \mathcal{L}^1(\mathbb{R}),\ c_j \in \mathbb{C}. \tag{2.105}$$

The real and imaginary parts of a simple function are simple and the positive
and negative parts of a real simple function are simple. Since step functions are
simple, we know that simple functions are dense in $L^1(\mathbb{R})$ and that if $0 \le F \in \mathcal{L}^1(\mathbb{R})$ then there exists a sequence of simple functions $f_n \ge 0$ (take them to be a summable sequence of step functions) such that $f_n \to F$ almost everywhere and $f_n \le G$ for some other $G \in \mathcal{L}^1(\mathbb{R})$.
We elevate a special case of the second notion of convergence above to a definition.
Definition 2.8. A function $f : \mathbb{R} \longrightarrow \mathbb{C}$ is (Lebesgue) measurable if it is the pointwise limit almost everywhere of a sequence of simple functions.
Lemma 2.10. A function is Lebesgue measurable if and only if it is the pointwise limit, almost everywhere, of a sequence of continuous functions of compact support.
Proof. Continuous functions of compact support are uniform limits of step functions, so this condition certainly implies measurability in the sense of Definition 2.8. Conversely, suppose a function $f$ is the limit almost everywhere of a sequence $u_n$ of simple functions. Each of these functions is integrable, so we can find $\varphi_n \in C_c(\mathbb{R})$ such that $\|u_n - \varphi_n\|_{L^1} < 2^{-n}$. Then the telescoped sequence $v_1 = u_1 - \varphi_1$, $v_k = (u_k - \varphi_k) - (u_{k-1} - \varphi_{k-1})$, $k > 1$, is absolutely summable, so its partial sums $u_n - \varphi_n$ converge almost everywhere, and the limit vanishes almost everywhere since $\|u_n - \varphi_n\|_{L^1} \to 0$. Hence $\varphi_n \to f$ off a set of measure zero. □

So replacing ‘simple functions’ by continuous functions in Definition 2.8 makes no difference, and the same is true for approximation by elements of $\mathcal{L}^1(\mathbb{R})$.
The measurable functions form a linear space: if $f$ and $g$ are measurable and $f_n$, $g_n$ are sequences of simple functions as required by the definition, then $c_1 f_n(x) + c_2 g_n(x) \to c_1 f(x) + c_2 g(x)$ on the intersection of the sets where $f_n(x) \to f(x)$ and $g_n(x) \to g(x)$, which is the complement of a set of measure zero.
Now, from the discussion above, we know that each element of $\mathcal{L}^1(\mathbb{R})$ is measurable. Conversely:
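Lemma 2.11. A function $f : \mathbb{R} \longrightarrow \mathbb{C}$ is an element of $\mathcal{L}^1(\mathbb{R})$ if and only if it is measurable and there exists $F \in \mathcal{L}^1(\mathbb{R})$ such that $|f| \le F$ almost everywhere.
Proof. If $f$ is measurable there exists a sequence of simple functions $f_n$ such that $f_n \to f$ almost everywhere. The real part, $\operatorname{Re} f$, is also measurable as the limit almost everywhere of $\operatorname{Re} f_n$, and from the hypothesis $|\operatorname{Re} f| \le F$. We know that there exists a sequence of simple functions $g_n$ with $g_n \to F$ almost everywhere and $0 \le g_n \le G$ for another element $G \in \mathcal{L}^1(\mathbb{R})$. Then set
$$u_n(x) = \begin{cases} g_n(x) & \text{if } \operatorname{Re} f_n(x) > g_n(x), \\ \operatorname{Re} f_n(x) & \text{if } -g_n(x) \le \operatorname{Re} f_n(x) \le g_n(x), \\ -g_n(x) & \text{if } \operatorname{Re} f_n(x) < -g_n(x). \end{cases} \tag{2.106}$$
Thus $u_n = \max(v_n, -g_n)$ where $v_n = \min(\operatorname{Re} f_n, g_n)$, so $u_n$ is simple, and $u_n \to \operatorname{Re} f$ almost everywhere since $|\operatorname{Re} f| \le F$. Since $|u_n| \le g_n \le G$ it follows from Lebesgue Dominated Convergence that $\operatorname{Re} f \in \mathcal{L}^1(\mathbb{R})$. The same argument shows $\operatorname{Im} f = -\operatorname{Re}(if) \in \mathcal{L}^1(\mathbb{R})$, so $f \in \mathcal{L}^1(\mathbb{R})$ as claimed. □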
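The truncation (2.106) is nothing but clamping $\operatorname{Re} f_n(x)$ into the interval $[-g_n(x), g_n(x)]$. As a pointwise sanity check, here is a minimal plain-Python sketch; the sample values and sequences are made up for illustration, and functions are represented only by their values at a single point. It confirms that the three-case formula agrees with $\max(\min(\cdot, g_n), -g_n)$, that $|u_n| \le g_n$, and that $u_n \to \operatorname{Re} f$ when $\operatorname{Re} f_n \to \operatorname{Re} f$ and $g_n \to F \ge |\operatorname{Re} f|$:

```python
def truncate_cases(a, g):
    """Formula (2.106) at one point: a plays Re f_n(x), g plays g_n(x) >= 0."""
    if a > g:
        return g
    if a < -g:
        return -g
    return a


def truncate_minmax(a, g):
    """The same truncation written as max(min(a, g), -g), as in the proof."""
    return max(min(a, g), -g)


# Hypothetical sample values (Re f_n(x), g_n(x)) at a single point x.
samples = [(-3.0, 1.0), (-0.5, 1.0), (0.0, 0.0), (0.7, 0.5), (2.0, 2.0), (5.0, 0.25)]
for a, g in samples:
    assert truncate_cases(a, g) == truncate_minmax(a, g)
    assert abs(truncate_cases(a, g)) <= g  # |u_n| <= g_n <= G

# Pointwise convergence: if a_n -> a and g_n -> F with |a| <= F, then u_n -> a.
a, F = 0.8, 1.0
for n in (10, 100, 1000):
    a_n = a + (-1) ** n * 2.0 / n  # hypothetical sequence tending to a
    g_n = F + 1.0 / n              # hypothetical sequence tending to F
    print(n, truncate_minmax(a_n, g_n))
```

Writing the clamp as a `max` of a `min` is exactly what shows $u_n$ is again simple, since the maximum and minimum of two simple functions are simple.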

11. The spaces Lp (R)


We use Lemma 2.11 as a model:

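Definition 2.9. For $1 \le p < \infty$ we set
$$\mathcal{L}^p(\mathbb{R}) = \{f : \mathbb{R} \longrightarrow \mathbb{C};\ f \text{ is measurable and } |f|^p \in \mathcal{L}^1(\mathbb{R})\}. \tag{2.107}$$
For $p = \infty$ we set
$$\mathcal{L}^\infty(\mathbb{R}) = \{f : \mathbb{R} \longrightarrow \mathbb{C};\ f \text{ is measurable and } \exists\, C \text{ s.t. } |f(x)| \le C \text{ a.e.}\}. \tag{2.108}$$
Observe that, in view of Lemma 2.10, the case $p = 2$ gives the same space as Definition 2.4.
Proposition 2.9. For each $1 \le p < \infty$,
$$\|u\|_{L^p} = \left( \int |u|^p \right)^{1/p} \tag{2.109}$$
is a seminorm on the linear space $\mathcal{L}^p(\mathbb{R})$ vanishing only on the null functions and making the quotient $L^p(\mathbb{R}) = \mathcal{L}^p(\mathbb{R})/\mathcal{N}$ into a Banach space.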
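Proof. The real part of an element of $\mathcal{L}^p(\mathbb{R})$ is in $\mathcal{L}^p(\mathbb{R})$ since it is measurable and $|\operatorname{Re} f|^p \le |f|^p$, so $|\operatorname{Re} f|^p \in \mathcal{L}^1(\mathbb{R})$ by Lemma 2.11. Similarly, $\mathcal{L}^p(\mathbb{R})$ is linear; it is clear that $cf \in \mathcal{L}^p(\mathbb{R})$ if $f \in \mathcal{L}^p(\mathbb{R})$ and $c \in \mathbb{C}$, and the sum of two elements, $f, g$, is measurable and satisfies $|f + g|^p \le 2^p(|f|^p + |g|^p)$, so $|f + g|^p \in \mathcal{L}^1(\mathbb{R})$.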
We next strengthen (2.107) to the approximation condition that there exists a sequence of simple functions $v_n$ such that
$$v_n \to f \text{ a.e. and } |v_n|^p \le F \in \mathcal{L}^1(\mathbb{R}) \text{ a.e.}, \tag{2.110}$$
which certainly implies (2.107). As in the proof of Lemma 2.11, suppose $f \in \mathcal{L}^p(\mathbb{R})$ is real and choose real-valued simple functions $f_n$ converging to $f$ almost everywhere. Since $|f|^p \in \mathcal{L}^1(\mathbb{R})$ there is a sequence of simple functions $0 \le h_n$ such that $h_n \le F$ for some $F \in \mathcal{L}^1(\mathbb{R})$ and $h_n \to |f|^p$ almost everywhere. Then set $g_n = h_n^{1/p}$, which is also a sequence of simple functions, and define $v_n$ as $u_n$ is defined in (2.106), with $f_n$ in place of $\operatorname{Re} f_n$. Since $|v_n| \le g_n$ we have $|v_n|^p \le h_n \le F$, so (2.110) holds for real $f$; combining sequences for the real and imaginary parts, such a sequence exists in general.
The advantage of the approximation condition (2.110) is that it allows us to conclude that the triangle inequality holds for $\|u\|_{L^p}$ defined by (2.109): we know it for simple functions and, from (2.110) and Dominated Convergence, $|v_n|^p \to |f|^p$ in $L^1(\mathbb{R})$, so $\|v_n\|_{L^p} \to \|f\|_{L^p}$. Then if $w_n$ is a similar sequence for $g \in \mathcal{L}^p(\mathbb{R})$,
$$\|f + g\|_{L^p} \le \limsup_{n} \|v_n + w_n\|_{L^p} \le \limsup_{n} \|v_n\|_{L^p} + \limsup_{n} \|w_n\|_{L^p} = \|f\|_{L^p} + \|g\|_{L^p}. \tag{2.111}$$
The other two conditions being clear, it follows that $\|u\|_{L^p}$ is a seminorm on $\mathcal{L}^p(\mathbb{R})$.
The vanishing of $\|u\|_{L^p}$ implies that $|u|^p$, and hence $u$, is in $\mathcal{N}$, and the converse follows immediately. Thus $L^p(\mathbb{R}) = \mathcal{L}^p(\mathbb{R})/\mathcal{N}$ is a normed space and it only remains to check completeness.
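We know that completeness is equivalent to the convergence of any absolutely summable series. So, we can suppose $f_n \in \mathcal{L}^p(\mathbb{R})$ satisfy
$$\sum_n \left( \int |f_n|^p \right)^{1/p} < \infty. \tag{2.112}$$
Consider the sequence $g_n = f_n \chi_{[-R,R]}$ for some fixed $R > 0$. This is in $\mathcal{L}^1(\mathbb{R})$ and, with $q$ the conjugate exponent, $\frac{1}{p} + \frac{1}{q} = 1$,
$$\|g_n\|_{L^1} \le (2R)^{1/q} \|f_n\|_{L^p} \tag{2.113}$$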

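by the integral form of Hölder’s inequality
$$f \in \mathcal{L}^p(\mathbb{R}),\ g \in \mathcal{L}^q(\mathbb{R}),\ \frac{1}{p} + \frac{1}{q} = 1 \Longrightarrow fg \in \mathcal{L}^1(\mathbb{R}) \text{ and } \left| \int fg \right| \le \|f\|_{L^p} \|g\|_{L^q}, \tag{2.114}$$
which can be proved by the same approximation argument as above, see Problem ??.
Thus the series $\sum_n g_n$ is absolutely summable in $L^1$ and so converges absolutely almost everywhere. It follows that the series $\sum_n f_n(x)$ converges absolutely almost everywhere on $[-R,R]$, since it is just $\sum_n g_n(x)$ there, and hence, taking $R = 1, 2, \dots$, almost everywhere on $\mathbb{R}$, to a function $f$.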
So, we only need to show that $f \in \mathcal{L}^p(\mathbb{R})$ and that $\int |f - F_n|^p \to 0$ as $n \to \infty$, where $F_n = \sum_{k=1}^{n} f_k$. By Minkowski’s inequality we know that $h_n = \left( \sum_{k=1}^{n} |f_k| \right)^p$ has bounded $L^1$ norm, since
$$\|h_n\|_{L^1}^{1/p} = \Big\| \sum_{k=1}^{n} |f_k| \Big\|_{L^p} \le \sum_k \|f_k\|_{L^p}. \tag{2.115}$$
Thus $h_n$ is an increasing sequence of functions in $\mathcal{L}^1(\mathbb{R})$ with bounded integral, so by the Monotonicity Lemma it converges a.e. to a function $h \in \mathcal{L}^1(\mathbb{R})$. Since $|F_n|^p \le h$ and $|F_n|^p \to |f|^p$ a.e., it follows by Dominated Convergence that
$$|f|^p \in \mathcal{L}^1(\mathbb{R}), \quad \big\||f|^p\big\|_{L^1}^{1/p} \le \sum_n \|f_n\|_{L^p}, \tag{2.116}$$
and hence $f \in \mathcal{L}^p(\mathbb{R})$. Applying the same reasoning to $f - F_n$, which is the sum of the series starting at term $n + 1$, gives the norm convergence:
$$\|f - F_n\|_{L^p} \le \sum_{k > n} \|f_k\|_{L^p} \to 0 \text{ as } n \to \infty. \tag{2.117}$$
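On step functions every integral above is a finite sum, so the inequalities used in this proof can be checked directly. The plain-Python sketch below is only an illustration under stated assumptions: the functions are random complex step functions on a common partition of $[-R, R]$ into $m$ equal pieces, and the choices $p = 3$, $R = 2$, $m = 8$ are arbitrary. It checks Minkowski’s inequality (the triangle inequality behind (2.111)), Hölder’s inequality (2.114) and the bound (2.113):

```python
import random


def lp_norm(values, widths, p):
    """(integral of |f|^p)^(1/p) for a step function equal to values[j] on an
    interval of length widths[j]; the intervals are assumed disjoint."""
    return sum(abs(c) ** p * w for c, w in zip(values, widths)) ** (1.0 / p)


def integral(values, widths):
    """Integral of a step function: a finite sum of value times length."""
    return sum(c * w for c, w in zip(values, widths))


def random_step(m):
    """Random complex values on m pieces (hypothetical test data)."""
    return [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(m)]


random.seed(0)
p = 3.0
q = p / (p - 1.0)  # conjugate exponent: 1/p + 1/q = 1
R, m = 2.0, 8
widths = [2 * R / m] * m  # a common partition of [-R, R] into m equal pieces

for _ in range(1000):
    f, g = random_step(m), random_step(m)
    f_plus_g = [a + b for a, b in zip(f, g)]
    fg = [a * b for a, b in zip(f, g)]
    # Minkowski: the triangle inequality for the L^p norm.
    assert lp_norm(f_plus_g, widths, p) <= lp_norm(f, widths, p) + lp_norm(g, widths, p) + 1e-12
    # Hölder's inequality (2.114).
    assert abs(integral(fg, widths)) <= lp_norm(f, widths, p) * lp_norm(g, widths, q) + 1e-12
    # The bound (2.113); here f vanishes outside [-R, R], so f * chi_[-R,R] = f.
    assert lp_norm(f, widths, 1.0) <= (2 * R) ** (1.0 / q) * lp_norm(f, widths, p) + 1e-12
```

Using a common partition keeps sums and products of step functions as simple lists; for step functions on different partitions one would first pass to a common refinement.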

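A function $f : \mathbb{R} \longrightarrow \mathbb{C}$ is locally integrable if
$$F_{[-N,N]} \in \mathcal{L}^1(\mathbb{R})\ \ \forall\, N, \quad \text{where } F_{[-N,N]}(x) = \begin{cases} f(x) & \text{if } x \in [-N,N], \\ 0 & \text{if } |x| > N. \end{cases} \tag{2.118}$$
So any continuous function on $\mathbb{R}$ is locally integrable, as is any element of $\mathcal{L}^1(\mathbb{R})$.
Lemma 2.12. The locally integrable functions form a linear space, $\mathcal{L}^1_{\mathrm{loc}}(\mathbb{R})$, and
$$\begin{aligned} \mathcal{L}^p(\mathbb{R}) &= \{f \in \mathcal{L}^1_{\mathrm{loc}}(\mathbb{R});\ |f|^p \in \mathcal{L}^1(\mathbb{R})\}, \quad 1 \le p < \infty, \\ \mathcal{L}^\infty(\mathbb{R}) &= \{f \in \mathcal{L}^1_{\mathrm{loc}}(\mathbb{R});\ \sup_{\mathbb{R} \setminus E} |f(x)| < \infty \text{ for some } E \text{ of measure zero}\}. \end{aligned} \tag{2.119}$$
The proof is left as an exercise.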

12. Lebesgue measure


In case anyone is interested in how to define Lebesgue measure from where we are now: we can just use the integral.
Definition 2.10. A set $A \subset \mathbb{R}$ is measurable if its characteristic function $\chi_A$ is locally integrable. A measurable set $A$ has finite measure if $\chi_A \in \mathcal{L}^1(\mathbb{R})$ and then
$$\mu(A) = \int \chi_A \tag{2.120}$$

is the Lebesgue measure of $A$. If $A$ is measurable but not of finite measure then $\mu(A) = \infty$ by definition.

We know immediately that any interval $(a, b)$ is measurable (indeed whether open, semi-open or closed) and has finite measure if and only if it is bounded; then the measure is $b - a$.

Proposition 2.10. The complement of a measurable set is measurable and any countable union or countable intersection of measurable sets is measurable.

Proof. The first part follows from the fact that the constant function $1$ is locally integrable, and hence $\chi_{\mathbb{R} \setminus A} = 1 - \chi_A$ is locally integrable if and only if $\chi_A$ is locally integrable.
Notice the relationship between a characteristic function and the set it defines:
$$\chi_{A \cup B} = \max(\chi_A, \chi_B), \quad \chi_{A \cap B} = \min(\chi_A, \chi_B). \tag{2.121}$$
If we have a sequence of sets $A_n$ then $B_n = \bigcup_{k \le n} A_k$ is clearly an increasing sequence of sets and
$$\chi_{B_n} \to \chi_B, \quad B = \bigcup_n A_n, \tag{2.122}$$
is an increasing sequence which converges pointwise (at each point it either jumps to $1$ somewhere and then stays there, or else stays at $0$). Now, if we multiply by $\chi_{[-N,N]}$ then
$$f_n = \chi_{[-N,N]} \chi_{B_n} \to \chi_{B \cap [-N,N]} \tag{2.123}$$
is an increasing sequence of integrable functions (assuming, that is, that the $A_k$’s are measurable) with integral bounded above by $2N$. Thus by the monotonicity lemma the limit is integrable, so $\chi_B$ is locally integrable and hence $\bigcup_n A_n$ is measurable.
For countable intersections the argument is similar, with the sequence of characteristic functions decreasing. □

Corollary 2.1. The (Lebesgue) measurable subsets of $\mathbb{R}$ form a collection $\mathcal{M}$ of subsets of $\mathbb{R}$ (a subset of the power set), including $\emptyset$ and $\mathbb{R}$, which is closed under complements, countable unions and countable intersections.

Such a collection of subsets of a set $X$ is called a ‘$\sigma$-algebra’; so a $\sigma$-algebra $\Sigma$ in a set $X$ is a collection of subsets of $X$ containing $X$ and $\emptyset$, the complement of any element, and the countable unions and intersections of any sequence of elements. A (positive) measure is usually defined as a map $\mu : \Sigma \longrightarrow [0, \infty]$ with $\mu(\emptyset) = 0$ and such that
$$\mu\Big( \bigcup_n E_n \Big) = \sum_n \mu(E_n) \tag{2.124}$$
for any sequence $\{E_m\}$ of sets in $\Sigma$ which are disjoint (in pairs).
As for Lebesgue measure, a set $A \in \Sigma$ is said to be ‘measurable’, and if $\mu(A)$ is not finite then $A$ is said to have infinite measure; for instance $\mathbb{R}$ is of infinite measure in this sense. Since the measure of a set is always non-negative (or undefined if it isn’t measurable) this does not cause any problems, and in fact Lebesgue measure is countably additive as in (2.124) provided we allow $\infty$ as a value of the measure. It is a good exercise to prove this!

13. Higher dimensions


I have never actually covered this in lectures; there is simply not enough time. Still, it is worth knowing that the Lebesgue integral in higher-dimensional Euclidean spaces can be obtained following the same line of reasoning. So, we want, with the advantage of a little more experience, to go back to the beginning and define $\mathcal{L}^1(\mathbb{R}^n)$, $L^1(\mathbb{R}^n)$, $\mathcal{L}^2(\mathbb{R}^n)$ and $L^2(\mathbb{R}^n)$. In fact relatively little changes, but there are some things that one needs to check a little carefully.
The first hurdle is that I am not assuming that you have covered the Riemann integral in higher dimensions; it is in my view a rather pointless thing to do anyway. Fortunately we do not really need that, since we can just iterate the one-dimensional Riemann integral for continuous functions. So, define
$$C_c(\mathbb{R}^n) = \{u : \mathbb{R}^n \longrightarrow \mathbb{C};\ u \text{ is continuous and } u(x) = 0 \text{ for } |x| > R\} \tag{2.125}$$
where of course $R$ can depend on the element. Now, if we hold, say, the last $n - 1$ variables fixed, we get a continuous function of one variable which vanishes when $|x| > R$:
$$u(\cdot, x_2, \dots, x_n) \in C_c(\mathbb{R}) \text{ for each } (x_2, \dots, x_n) \in \mathbb{R}^{n-1}. \tag{2.126}$$
So we can integrate it and get a function
$$I_1(x_2, \dots, x_n) = \int_{\mathbb{R}} u(x, x_2, \dots, x_n)\,dx, \quad I_1 : \mathbb{R}^{n-1} \longrightarrow \mathbb{C}. \tag{2.127}$$
Lemma 2.13. For each $u \in C_c(\mathbb{R}^n)$, $I_1 \in C_c(\mathbb{R}^{n-1})$.
Proof. Certainly if $|(x_2, \dots, x_n)| > R$ then $u(\cdot, x_2, \dots, x_n) \equiv 0$ as a function of the first variable, and hence $I_1 = 0$ in $|(x_2, \dots, x_n)| > R$. The continuity follows from the uniform continuity of a function on the compact set $|x| \le R$, $|(x_2, \dots, x_n)| \le R$ of $\mathbb{R}^n$. Thus, writing $y = (x_2, \dots, x_n)$, given $\epsilon > 0$ there exists $\delta > 0$ such that
$$|x - x'| < \delta,\ |y - y'|_{\mathbb{R}^{n-1}} < \delta \Longrightarrow |u(x, y) - u(x', y')| < \epsilon. \tag{2.128}$$
From the standard estimate for the Riemann integral,
$$|I_1(y) - I_1(y')| \le \int_{-R}^{R} |u(x, y) - u(x, y')|\,dx \le 2R\epsilon \tag{2.129}$$
if $|y - y'| < \delta$. This implies the (uniform) continuity of $I_1$. Thus $I_1 \in C_c(\mathbb{R}^{n-1})$. □

The upshot of this lemma is that we can integrate again, and hence a total of $n$ times, and so define the (iterated) Riemann integral as
$$\int_{\mathbb{R}^n} u(z)\,dz = \int_{-R}^{R} \cdots \int_{-R}^{R} u(x_1, x_2, x_3, \dots, x_n)\,dx_1\,dx_2 \dots dx_n \in \mathbb{C}. \tag{2.130}$$
Lemma 2.14. The iterated Riemann integral is a well-defined linear map
$$C_c(\mathbb{R}^n) \longrightarrow \mathbb{C} \tag{2.131}$$
which satisfies
$$\Big| \int u \Big| \le \int |u| \le (2R)^n \sup |u| \quad \text{if } u \in C_c(\mathbb{R}^n) \text{ and } u(z) = 0 \text{ in } |z| > R. \tag{2.132}$$
Proof. This follows from the standard estimate in one dimension. □
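The two-step structure of (2.130) for $n = 2$ can be imitated numerically. In the plain-Python sketch below each one-dimensional Riemann integral is replaced by a midpoint sum; the test function $u$ and the number of subdivisions are arbitrary choices, not from the notes. For midpoint sums the two orders of integration add up the same terms, so their agreement is much weaker than independence of order for the Riemann integral itself; the point is only to show the iterated structure and the estimate (2.132):

```python
import math


def u(x, y):
    """Continuous on R^2 and zero for x^2 + y^2 >= 1, so u is in C_c(R^2) with R = 1.
    The odd terms x and x*y integrate to zero, so the exact integral is 2 * (pi / 2) = pi."""
    return max(0.0, 1.0 - x * x - y * y) * (2.0 + x + x * y)


def midpoint_1d(f, R, n):
    """Midpoint-rule approximation to the Riemann integral of f over [-R, R]."""
    h = 2.0 * R / n
    return h * sum(f(-R + (k + 0.5) * h) for k in range(n))


def iterated(g, R, n, x_first=True):
    """Approximate iterated integral of g over [-R, R]^2, as in (2.130) with n = 2."""
    def inner_integral(t):
        # The analogue of I_1 in (2.127): one variable integrated, the other fixed at t.
        if x_first:
            return midpoint_1d(lambda x: g(x, t), R, n)
        return midpoint_1d(lambda y: g(t, y), R, n)
    return midpoint_1d(inner_integral, R, n)


R, n = 1.0, 400
I_xy = iterated(u, R, n, x_first=True)
I_yx = iterated(u, R, n, x_first=False)
I_abs = iterated(lambda x, y: abs(u(x, y)), R, n)
sup_bound = 4.0  # |u| <= 1 * (2 + |x| + |xy|) <= 4 on the unit disk
print(I_xy, I_yx, math.pi)
assert abs(I_xy - I_yx) < 1e-9                         # both orders of integration
assert abs(I_xy) <= I_abs <= (2 * R) ** 2 * sup_bound  # the estimate (2.132)
```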



Now, one slightly annoying thing is that we would really want to know that the integral is independent of the order of integration. In fact this is not hard to show; see Problem XX. Again using properties of the one-dimensional Riemann integral we find:
Lemma 2.15. The iterated integral
$$\|u\|_{L^1} = \int_{\mathbb{R}^n} |u| \tag{2.133}$$
is a norm on $C_c(\mathbb{R}^n)$.
Definition 2.11. The space $\mathcal{L}^1(\mathbb{R}^n)$ is defined to consist of those functions $f : \mathbb{R}^n \longrightarrow \mathbb{C}$ such that there exists a sequence $\{f_n\}$ in $C_c(\mathbb{R}^n)$ which is absolutely summable with respect to the $L^1$ norm and such that
$$\sum_n |f_n(x)| < \infty \Longrightarrow \sum_n f_n(x) = f(x). \tag{2.134}$$
Now you can go through the whole discussion above in this higher-dimensional case, and the only changes are really notational!
Things get a little more complicated in the discussion of change of variable. This is covered in the problems. There are also a few other theorems it is good to know!
CHAPTER 3

Hilbert spaces

There are really three ‘types’ of Hilbert spaces (over $\mathbb{C}$): the finite-dimensional ones, essentially just $\mathbb{C}^n$ for different integer values of $n$, with which you are pretty familiar, and two infinite-dimensional types corresponding to being separable (having a countable dense subset) or not. As we shall see, there is really only one separable infinite-dimensional Hilbert space (no doubt you realize that the $\mathbb{C}^n$ are separable) and that is what we are mostly interested in. Nevertheless we try to state results in general and then give proofs (usually they are the nicest ones) which work in the non-separable cases too.
I will first discuss the definition of pre-Hilbert and Hilbert spaces and prove the Cauchy-Schwarz inequality and the parallelogram law. This material can be found in many other places, so the discussion here will be kept succinct. One nice source is the book of G.F. Simmons, “Introduction to topology and modern analysis” [5]. I like it, but I think it is long out of print.

1. pre-Hilbert spaces

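A pre-Hilbert space, $H$, is a vector space (usually over the complex numbers, but there is a real version as well) with a Hermitian inner product
$$\langle \cdot, \cdot \rangle : H \times H \longrightarrow \mathbb{C}, \quad \langle \lambda_1 v_1 + \lambda_2 v_2, w \rangle = \lambda_1 \langle v_1, w \rangle + \lambda_2 \langle v_2, w \rangle, \quad \langle w, v \rangle = \overline{\langle v, w \rangle} \tag{3.1}$$
for any $v_1, v_2, v$ and $w \in H$ and $\lambda_1, \lambda_2 \in \mathbb{C}$, which is positive-definite:
$$\langle v, v \rangle \ge 0, \quad \langle v, v \rangle = 0 \Longrightarrow v = 0. \tag{3.2}$$
Note that the reality of $\langle v, v \rangle$ follows from the ‘Hermitian symmetry’ condition in (3.1); the positivity is an additional assumption, as is the definiteness.
The combination of the two conditions in (3.1) implies ‘anti-linearity’ in the second variable
$$\langle v, \lambda_1 w_1 + \lambda_2 w_2 \rangle = \overline{\lambda_1} \langle v, w_1 \rangle + \overline{\lambda_2} \langle v, w_2 \rangle, \tag{3.3}$$
which is used without comment below.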
The notion of ‘definiteness’ for such a Hermitian inner product exists without the need for positivity; it just means
$$\langle u, v \rangle = 0\ \forall\, v \in H \Longrightarrow u = 0. \tag{3.4}$$
Lemma 3.1. If $H$ is a pre-Hilbert space with Hermitian inner product $\langle \cdot, \cdot \rangle$ then
$$\|u\| = \langle u, u \rangle^{1/2} \tag{3.5}$$
is a norm on $H$.

Proof. The first condition on a norm follows from (3.2). Absolute homogeneity follows from (3.1) since
$$\|\lambda u\|^2 = \langle \lambda u, \lambda u \rangle = |\lambda|^2 \|u\|^2. \tag{3.6}$$
So, it is only the triangle inequality we need. This follows from the next lemma, which is the Cauchy-Schwarz inequality in this setting, (3.8). Indeed, using the ‘sesqui-linearity’ to expand out the norm,
$$\begin{aligned} \|u + v\|^2 &= \langle u + v, u + v \rangle \\ &= \|u\|^2 + \langle u, v \rangle + \langle v, u \rangle + \|v\|^2 \le \|u\|^2 + 2\|u\|\|v\| + \|v\|^2 \\ &= (\|u\| + \|v\|)^2. \end{aligned} \tag{3.7}$$

Lemma 3.2. The Cauchy-Schwarz inequality,
$$|\langle u, v \rangle| \le \|u\|\|v\| \quad \forall\, u, v \in H, \tag{3.8}$$
holds in any pre-Hilbert space.
Proof. This inequality is trivial if either $u$ or $v$ vanishes. For any non-zero $u, v \in H$ and $s \in \mathbb{R}$, positivity of the norm shows that
$$0 \le \|u + sv\|^2 = \|u\|^2 + 2s \operatorname{Re}\langle u, v \rangle + s^2 \|v\|^2. \tag{3.9}$$
This is a quadratic polynomial in $s$ with positive leading coefficient $\|v\|^2$, so it has a single minimum, at the point where the derivative vanishes, i.e. where
$$2s\|v\|^2 + 2\operatorname{Re}\langle u, v \rangle = 0. \tag{3.10}$$
Substituting this value of $s$ into (3.9) gives
$$\|u\|^2 - (\operatorname{Re}\langle u, v \rangle)^2 / \|v\|^2 \ge 0 \Longrightarrow |\operatorname{Re}\langle u, v \rangle| \le \|u\|\|v\|, \tag{3.11}$$
which is what we want except that it is only the real part. However, we know that, for some $z \in \mathbb{C}$ with $|z| = 1$, $\operatorname{Re}\langle zu, v \rangle = \operatorname{Re} z\langle u, v \rangle = |\langle u, v \rangle|$, and applying (3.11) with $u$ replaced by $zu$ gives (3.8). □
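A numerical walk through the proof, as a plain-Python sketch on $\mathbb{C}^5$ with the standard inner product (random vectors, an arbitrary test case): it computes the critical $s$ from (3.10), checks that the value of (3.9) there is $\|u\|^2 - (\operatorname{Re}\langle u, v \rangle)^2/\|v\|^2 \ge 0$, and checks that rotating $u$ by $z = \overline{\langle u, v \rangle}/|\langle u, v \rangle|$ gives $\operatorname{Re}\langle zu, v \rangle = |\langle u, v \rangle|$:

```python
import random


def inner(u, v):
    """Standard inner product on C^n: linear in u, conjugate-linear in v."""
    return sum(a * b.conjugate() for a, b in zip(u, v))


def norm(u):
    return inner(u, u).real ** 0.5


def rand_vec(n):
    return [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]


random.seed(1)
for _ in range(1000):
    u, v = rand_vec(5), rand_vec(5)
    uv = inner(u, v)
    s = -uv.real / norm(v) ** 2                        # the critical point from (3.10)
    q = norm([a + s * b for a, b in zip(u, v)]) ** 2   # value of (3.9) at that s
    assert q >= -1e-12
    assert abs(q - (norm(u) ** 2 - uv.real ** 2 / norm(v) ** 2)) < 1e-9  # (3.11)
    z = uv.conjugate() / abs(uv)                       # |z| = 1 and z <u, v> = |<u, v>|
    assert abs(inner([z * a for a in u], v).real - abs(uv)) < 1e-9
    assert abs(uv) <= norm(u) * norm(v) + 1e-12        # (3.8)
```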

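Corollary 3.1. The inner product is continuous on the metric space $H \times H$ (i.e. with respect to the norm).
Proof. Corollaries really aren’t supposed to require proof! If $(u_j, v_j) \to (u, v)$ then, by definition, $\|u - u_j\| \to 0$ and $\|v - v_j\| \to 0$, so from
$$\begin{aligned} |\langle u, v \rangle - \langle u_j, v_j \rangle| &\le |\langle u, v \rangle - \langle u, v_j \rangle| + |\langle u, v_j \rangle - \langle u_j, v_j \rangle| \\ &\le \|u\|\|v - v_j\| + \|u - u_j\|\|v_j\| \end{aligned} \tag{3.12}$$
continuity follows, since $\|v_j\|$ is bounded. □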

Corollary 3.2. The Cauchy-Schwarz inequality is optimal in the sense that
$$\|u\| = \sup_{v \in H;\ \|v\| \le 1} |\langle u, v \rangle|. \tag{3.13}$$
I really will leave this one to you.



2. Hilbert spaces
Definition 3.1. A Hilbert space $H$ is a pre-Hilbert space which is complete with respect to the norm induced by the inner product.
As examples we know that $\mathbb{C}^n$ with the usual inner product
$$\langle z, z' \rangle = \sum_{j=1}^{n} z_j \overline{z'_j} \tag{3.14}$$
is a Hilbert space, since any finite-dimensional normed space is complete. The example we had from the beginning of the course is $l^2$ with the extension of (3.14)
$$\langle a, b \rangle = \sum_{j=1}^{\infty} a_j \overline{b_j}, \quad a, b \in l^2. \tag{3.15}$$
Completeness was shown earlier.


The whole outing into Lebesgue integration was so that we could have the ‘standard example’ at our disposal, namely
$$L^2(\mathbb{R}) = \{u \in \mathcal{L}^1_{\mathrm{loc}}(\mathbb{R});\ |u|^2 \in \mathcal{L}^1(\mathbb{R})\}/\mathcal{N} \tag{3.16}$$
where $\mathcal{N}$ is the space of null functions. The inner product is
$$\langle u, v \rangle = \int u\overline{v}. \tag{3.17}$$
Note that we showed that if $u, v \in \mathcal{L}^2(\mathbb{R})$ then $u\overline{v} \in \mathcal{L}^1(\mathbb{R})$. We also showed that if $\int |u|^2 = 0$ then $u = 0$ almost everywhere, i.e. $u \in \mathcal{N}$, which is the definiteness of the inner product (3.17). It is fair to say that we went to some trouble to prove the completeness of this norm, so $L^2(\mathbb{R})$ is indeed a Hilbert space.

3. Orthonormal sequences
Two elements of a pre-Hilbert space $H$ are said to be orthogonal if
$$\langle u, v \rangle = 0, \text{ which can be written } u \perp v. \tag{3.18}$$
A sequence of elements $e_i \in H$ (finite or infinite) is said to be orthonormal if $\|e_i\| = 1$ for all $i$ and $\langle e_i, e_j \rangle = 0$ for all $i \ne j$.
Proposition 3.1 (Bessel’s inequality). If $e_i$, $i \in \mathbb{N}$, is an orthonormal sequence in a pre-Hilbert space $H$, then
$$\sum_i |\langle u, e_i \rangle|^2 \le \|u\|^2 \quad \forall\, u \in H. \tag{3.19}$$

Proof. Start with the finite case, $i = 1, \dots, N$. Then, for any $u \in H$ set
$$v = \sum_{i=1}^{N} \langle u, e_i \rangle e_i. \tag{3.20}$$
This is supposed to be ‘the projection of $u$ onto the span of the $e_i$’. Anyway, computing away we see that
$$\langle v, e_j \rangle = \sum_{i=1}^{N} \langle u, e_i \rangle \langle e_i, e_j \rangle = \langle u, e_j \rangle \tag{3.21}$$

using orthonormality. Thus, $u - v \perp e_j$ for all $j$, so $u - v \perp v$ and hence
$$0 = \langle u - v, v \rangle = \langle u, v \rangle - \|v\|^2. \tag{3.22}$$
Thus $\|v\|^2 = |\langle u, v \rangle|$ and applying the Cauchy-Schwarz inequality we conclude that $\|v\|^2 \le \|v\|\|u\|$, so either $v = 0$ or $\|v\| \le \|u\|$; in either case $\|v\| \le \|u\|$. Expanding out the norm (and observing that all cross-terms vanish)
$$\|v\|^2 = \sum_{i=1}^{N} |\langle u, e_i \rangle|^2 \le \|u\|^2,$$
which is (3.19).
In case the sequence is infinite this argument applies to any finite subsequence, $e_i$, $i = 1, \dots, N$, since it just uses orthonormality, so (3.19) follows by taking the supremum over $N$. □
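Bessel’s inequality is easy to observe numerically. In the plain-Python sketch below the orthonormal sequence is taken, as an arbitrary illustrative choice, to be three of the normalized discrete Fourier vectors $e_k(m) = e^{2\pi i mk/n}/\sqrt{n}$ in $\mathbb{C}^8$; it checks (3.19), the identity $\|v\|^2 = \sum_i |\langle u, e_i \rangle|^2$ for $v$ as in (3.20), and $u - v \perp e_j$:

```python
import cmath
import random


def inner(u, v):
    """Standard inner product on C^n, linear in the first slot."""
    return sum(a * b.conjugate() for a, b in zip(u, v))


def dft_vector(k, n):
    """e_k(m) = exp(2 pi i m k / n) / sqrt(n); these are orthonormal in C^n."""
    return [cmath.exp(2j * cmath.pi * m * k / n) / n ** 0.5 for m in range(n)]


n = 8
es = [dft_vector(k, n) for k in (0, 1, 3)]  # a finite orthonormal sequence, N = 3
for i, ei in enumerate(es):
    for k, ek in enumerate(es):
        assert abs(inner(ei, ek) - (1 if i == k else 0)) < 1e-12

random.seed(2)
u = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
coeffs = [inner(u, e) for e in es]
v = [sum(c * e[m] for c, e in zip(coeffs, es)) for m in range(n)]  # v as in (3.20)
r = [a - b for a, b in zip(u, v)]

bessel_sum = sum(abs(c) ** 2 for c in coeffs)
assert bessel_sum <= inner(u, u).real             # Bessel's inequality (3.19)
assert abs(inner(v, v).real - bessel_sum) < 1e-9  # ||v||^2 = sum |<u, e_i>|^2
assert all(abs(inner(r, e)) < 1e-9 for e in es)   # u - v is orthogonal to each e_j
```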

4. Gram-Schmidt procedure
Definition 3.2. An orthonormal sequence, $\{e_i\}$, (finite or infinite) in a pre-Hilbert space is said to be maximal if
$$u \in H,\ \langle u, e_i \rangle = 0\ \forall\, i \Longrightarrow u = 0. \tag{3.23}$$
Theorem 3.1. Every separable pre-Hilbert space contains a maximal orthonormal sequence.
Proof. Take a countable dense subset, which can be arranged as a sequence $\{v_j\}$ and the existence of which is the definition of separability, and orthonormalize it. First, if $v_1 \ne 0$ set $e_1 = v_1/\|v_1\|$. Proceeding by induction we can suppose we have found, for a given integer $n$, elements $e_i$, $i = 1, \dots, m$, where $m \le n$, which are orthonormal and such that the linear span
$$\operatorname{sp}(e_1, \dots, e_m) = \operatorname{sp}(v_1, \dots, v_n). \tag{3.24}$$
We certainly have this for $n = 1$. To show the inductive step observe that if $v_{n+1}$ is in the span(s) in (3.24) then the same $e_i$’s work for $n + 1$. So we may as well assume that the next element, $v_{n+1}$, is not in the span in (3.24). It follows that
$$w = v_{n+1} - \sum_{j=1}^{m} \langle v_{n+1}, e_j \rangle e_j \ne 0, \quad \text{so} \quad e_{m+1} = \frac{w}{\|w\|} \tag{3.25}$$
makes sense. By construction it is orthogonal to all the earlier $e_i$’s, so adding $e_{m+1}$ gives the equality of the spans for $n + 1$.
Thus we may continue indefinitely, since in fact the only way the dense set could be finite is if we were dealing with the space with one element, $0$, in the first place. There are only two possibilities: either we get a finite set of $e_i$’s or an infinite sequence. In either case this must be a maximal orthonormal sequence. That is, we claim
$$H \ni u \perp e_j\ \forall\, j \Longrightarrow u = 0. \tag{3.26}$$
This uses the density of the $v_j$’s. For $u$ assumed to satisfy (3.26), there must exist a sequence $w_k$, where each $w_k$ is a $v_j$, such that $w_k \to u$ in $H$. Now, each $v_j$, and hence each $w_k$, is a finite linear combination of the $e_l$’s so, by Bessel’s inequality,
$$\|w_k\|^2 = \sum_l |\langle w_k, e_l \rangle|^2 = \sum_l |\langle u - w_k, e_l \rangle|^2 \le \|u - w_k\|^2 \tag{3.27}$$

where $\langle u, e_l \rangle = 0$ for all $l$ has been used. Thus $\|w_k\| \to 0$ and hence $u = 0$. □
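The inductive step (3.25) is the Gram-Schmidt procedure. Here is a minimal plain-Python sketch on $\mathbb{C}^4$. It assumes a numerical threshold `tol` (an arbitrary choice) to decide when $v_{n+1}$ already lies in the span; such vectors are skipped exactly as in the proof:

```python
def inner(u, v):
    """Standard inner product on C^n, linear in the first slot."""
    return sum(a * b.conjugate() for a, b in zip(u, v))


def gram_schmidt(vs, tol=1e-10):
    """Orthonormalize vs following the proof of Theorem 3.1.

    For each new vector, subtract its components along the e_j found so far,
    as in (3.25).  If the remainder is (numerically) zero the vector was already
    in the span and is skipped, so the number m of e_j's never exceeds the
    number n of v_j's processed.  The threshold tol is an arbitrary choice.
    """
    es = []
    for v in vs:
        w = list(v)
        for e in es:
            c = inner(v, e)  # <v_{n+1}, e_j> as in (3.25)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        norm_w = inner(w, w).real ** 0.5
        if norm_w > tol:
            es.append([wi / norm_w for wi in w])  # e_{m+1} = w / ||w||
    return es


vs = [
    [1, 1, 0, 0],
    [2, 2, 0, 0],    # a multiple of the first vector: skipped
    [1, 0, 1j, 0],
    [0, 1, -1j, 0],  # the first vector minus the third: skipped
    [0, 0, 0, 3],
]
es = gram_schmidt(vs)
print(len(es))  # prints 3
```

This follows (3.25) literally (the ‘classical’ variant, projecting the original $v_{n+1}$); in floating point the ‘modified’ variant, which projects the running remainder `w` instead, is usually preferred for numerical stability.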
Although a non-complete but separable pre-Hilbert space has maximal or-
thonormal sets, these are not much use without completeness.

5. Orthonormal bases
Definition 3.3. In view of the following result, a maximal orthonormal se-
quence in a separable Hilbert space will be called an orthonormal basis; it is often
called a ‘complete orthonormal basis’ but the ‘complete’ is really redundant.
This notion of basis is not quite the same as in the finite dimensional case (although
it is a legitimate extension of it). There are other, quite different, notions of a basis
in infinite dimensions. See for instance ‘Hamel basis’ which arises in some settings
– it is discussed briefly in §1.12 and can be used to show the existence of a non-
continuous functional on a Banach space.
Theorem 3.2. If $\{e_i\}$ is an orthonormal basis (a maximal orthonormal sequence) in a Hilbert space then for any element $u \in H$ the ‘Fourier-Bessel series’ converges to $u$:
$$u = \sum_{i=1}^{\infty} \langle u, e_i \rangle e_i. \tag{3.28}$$

In particular a Hilbert space with an orthonormal basis is separable!


Proof. The sequence of partial sums of the Fourier-Bessel series
$$u_N = \sum_{i=1}^{N} \langle u, e_i \rangle e_i \tag{3.29}$$
is Cauchy. Indeed, if $m < m'$ then
$$\|u_{m'} - u_m\|^2 = \sum_{i=m+1}^{m'} |\langle u, e_i \rangle|^2 \le \sum_{i > m} |\langle u, e_i \rangle|^2, \tag{3.30}$$
which is small for large $m$ by Bessel’s inequality. Since we are now assuming completeness, $u_m \to w$ in $H$. However, $\langle u_m, e_i \rangle = \langle u, e_i \rangle$ as soon as $m \ge i$, and $|\langle w - u_m, e_i \rangle| \le \|w - u_m\|$, so in fact
$$\langle w, e_i \rangle = \lim_{m \to \infty} \langle u_m, e_i \rangle = \langle u, e_i \rangle \tag{3.31}$$
for each $i$. Thus $u - w$ is orthogonal to all the $e_i$, so by the assumed completeness (maximality) of the orthonormal basis it must vanish. Thus indeed (3.28) holds. □

6. Isomorphism to l2
A finite-dimensional Hilbert space is isomorphic to $\mathbb{C}^n$ with its standard inner product. Similarly, from the result above:
Proposition 3.2. Any infinite-dimensional separable Hilbert space (over the complex numbers) is isomorphic to $l^2$; that is, there exists a linear map
$$T : H \longrightarrow l^2 \tag{3.32}$$
which is 1-1, onto and satisfies $\langle Tu, Tv \rangle_{l^2} = \langle u, v \rangle_H$ and $\|Tu\|_{l^2} = \|u\|_H$ for all $u, v \in H$.

Proof. Choose an orthonormal basis, which exists by the discussion above, and set
$$Tu = \{\langle u, e_j \rangle\}_{j=1}^{\infty}. \tag{3.33}$$
This maps $H$ into $l^2$ by Bessel’s inequality. Moreover, it is linear since the entries in the sequence are linear in $u$. It is 1-1 since $Tu = 0$ implies $\langle u, e_j \rangle = 0$ for all $j$, which implies $u = 0$ by the assumed completeness of the orthonormal basis. It is surjective since if $\{c_j\}_{j=1}^{\infty} \in l^2$ then
$$u = \sum_{j=1}^{\infty} c_j e_j \tag{3.34}$$
converges in $H$. This is the same argument as above: the sequence of partial sums is Cauchy since if $n > m$,
$$\Big\| \sum_{j=m+1}^{n} c_j e_j \Big\|_H^2 = \sum_{j=m+1}^{n} |c_j|^2. \tag{3.35}$$
Again by continuity of the inner product, $\langle u, e_j \rangle = c_j$ for each $j$, i.e. $Tu = \{c_j\}$, so $T$ is surjective.
The equality of the norms follows from equality of the inner products, and the latter follows by computation for finite linear combinations of the $e_j$ and then in general by continuity. □
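In finite dimensions the map $T$ of (3.33) can be written down explicitly. The plain-Python sketch below uses $H = \mathbb{C}^8$ with the normalized discrete Fourier vectors as orthonormal basis (an arbitrary choice; there are no convergence issues here) and checks that $T$ preserves inner products and norms and that the finite analogue of the series (3.34) inverts it:

```python
import cmath
import random


def inner(u, v):
    """Standard inner product on C^n, linear in the first slot."""
    return sum(a * b.conjugate() for a, b in zip(u, v))


n = 8
# Orthonormal basis of C^n: e_k(m) = exp(2 pi i m k / n) / sqrt(n), k = 0, ..., n - 1.
basis = [[cmath.exp(2j * cmath.pi * m * k / n) / n ** 0.5 for m in range(n)] for k in range(n)]


def T(u):
    """u -> (<u, e_k>)_k, the finite-dimensional analogue of (3.33)."""
    return [inner(u, e) for e in basis]


def synthesize(c):
    """c -> sum_k c_k e_k, the finite-dimensional analogue of (3.34)."""
    return [sum(ck * e[m] for ck, e in zip(c, basis)) for m in range(n)]


random.seed(3)
u = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
v = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
c = [1, 2j, 0, 0, -1, 0, 0, 0.5]  # an arbitrary coefficient sequence

assert abs(inner(T(u), T(v)) - inner(u, v)) < 1e-9                  # <Tu, Tv> = <u, v>
assert abs(inner(T(u), T(u)).real - inner(u, u).real) < 1e-9        # ||Tu|| = ||u||
assert max(abs(x - y) for x, y in zip(synthesize(T(u)), u)) < 1e-9  # (3.28), so T is 1-1
assert max(abs(x - y) for x, y in zip(T(synthesize(c)), c)) < 1e-9  # T is onto
```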

7. Parallelogram law
What exactly is the difference between a general Banach space and a Hilbert
space? It is of course the existence of the inner product defining the norm. In fact
it is possible to formulate this condition intrinsically in terms of the norm itself.

Proposition 3.3. In any pre-Hilbert space the parallelogram law holds:
$$\|v + w\|^2 + \|v - w\|^2 = 2\|v\|^2 + 2\|w\|^2 \quad \forall\, v, w \in H. \tag{3.36}$$
Proof. Just expand out using the inner product,
$$\|v + w\|^2 = \|v\|^2 + \langle v, w \rangle + \langle w, v \rangle + \|w\|^2, \tag{3.37}$$
and the same for $\|v - w\|^2$, and see the cancellation. □

Proposition 3.4. Any normed space where the norm satisfies the parallelogram law, (3.36), is a pre-Hilbert space in the sense that
$$\langle v, w \rangle = \frac{1}{4} \left( \|v + w\|^2 - \|v - w\|^2 + i\|v + iw\|^2 - i\|v - iw\|^2 \right) \tag{3.38}$$
is a positive-definite Hermitian inner product which reproduces the norm.
Proof. A problem below. □

So, when we use the parallelogram law and completeness we are using the
essence of the Hilbert space.
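Propositions 3.3 and 3.4 can be illustrated numerically. The plain-Python sketch below (random vectors in $\mathbb{C}^3$, an arbitrary test case) checks that the Euclidean norm satisfies (3.36) and that (3.38) recovers the inner product from the norm alone. As a contrast it also uses the norm $\sum_j |u_j|$, chosen here only for illustration, which fails the parallelogram law and so does not come from any inner product:

```python
import random


def inner(u, v):
    """Standard inner product on C^n, linear in the first slot."""
    return sum(a * b.conjugate() for a, b in zip(u, v))


def norm2(u):
    """The norm (3.5) coming from the inner product."""
    return inner(u, u).real ** 0.5


def norm1(u):
    """sum_j |u_j|: a norm used here only as a contrast."""
    return sum(abs(a) for a in u)


def combine(v, w, lam):
    """The vector v + lam * w."""
    return [a + lam * b for a, b in zip(v, w)]


def polarize(norm, v, w):
    """Right-hand side of (3.38), computed from the norm alone."""
    return (norm(combine(v, w, 1)) ** 2 - norm(combine(v, w, -1)) ** 2
            + 1j * norm(combine(v, w, 1j)) ** 2 - 1j * norm(combine(v, w, -1j)) ** 2) / 4


def parallelogram_defect(norm, v, w):
    """Left side minus right side of (3.36)."""
    return (norm(combine(v, w, 1)) ** 2 + norm(combine(v, w, -1)) ** 2
            - 2 * norm(v) ** 2 - 2 * norm(w) ** 2)


random.seed(4)
v = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(3)]
w = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(3)]
assert abs(parallelogram_defect(norm2, v, w)) < 1e-9
assert abs(polarize(norm2, v, w) - inner(v, w)) < 1e-9  # (3.38) recovers <v, w>

print(parallelogram_defect(norm1, [1, 0], [0, 1]))  # prints 4
```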

8. Convex sets and length minimizer


The following result does not need the hypothesis of separability of the Hilbert
space and allows us to prove the subsequent results – especially Riesz’ theorem –
in full generality.
Proposition 3.5. If $C \subset H$ is a subset of a Hilbert space which is
(1) Non-empty
(2) Closed
(3) Convex, in the sense that $v_1, v_2 \in C$ implies $\frac{1}{2}(v_1 + v_2) \in C$
then there exists a unique element $v \in C$ closest to the origin, i.e. such that
$$\|v\| = \inf_{u \in C} \|u\|. \tag{3.39}$$

Proof. By definition of the infimum of a non-empty set of real numbers which is bounded below (in this case by $0$) there must exist a sequence $\{v_n\}$ in $C$ such that $\|v_n\| \to d = \inf_{u \in C} \|u\|$. We show that $v_n$ converges and that the limit is the point we want. The parallelogram law can be written
$$\|v_n - v_m\|^2 = 2\|v_n\|^2 + 2\|v_m\|^2 - 4\|(v_n + v_m)/2\|^2. \tag{3.40}$$
Since $\|v_n\| \to d$, given $\epsilon > 0$, if $N$ is large enough then $n > N$ implies $2\|v_n\|^2 < 2d^2 + \epsilon^2/2$. By convexity, $(v_n + v_m)/2 \in C$, so $\|(v_n + v_m)/2\|^2 \ge d^2$. Combining these estimates gives
$$n, m > N \Longrightarrow \|v_n - v_m\|^2 \le 4d^2 + \epsilon^2 - 4d^2 = \epsilon^2, \tag{3.41}$$
so $\{v_n\}$ is Cauchy. Since $H$ is complete, $v_n \to v$, and $v \in C$ since $C$ is closed. Moreover, the norm is continuous, so $\|v\| = \lim_{n \to \infty} \|v_n\| = d$.
Thus $v$ exists, and uniqueness follows again from the parallelogram law. If $v$ and $v'$ are two points in $C$ with $\|v\| = \|v'\| = d$ then $(v + v')/2 \in C$, so
$$\|v - v'\|^2 = 2\|v\|^2 + 2\|v'\|^2 - 4\|(v + v')/2\|^2 \le 0 \Longrightarrow v = v'. \tag{3.42}$$
Alternatively, you can just observe that we have actually shown above that any sequence in $C$ such that $\|v_n\| \to d$ converges, so this is true for the alternating sequence of $v$ and $v'$. □
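A concrete instance of Proposition 3.5, assuming the real Hilbert space $\mathbb{R}^3$ (the proof above uses only the parallelogram law, so it works equally well over $\mathbb{R}$). For a closed box $C = \prod_i [a_i, b_i]$, an arbitrary example of a non-empty closed convex set, the squared norm splits into independent coordinate terms, so the closest point to the origin is obtained by clamping $0$ into each interval. The plain-Python sketch compares this with a grid of points of $C$ and also checks the consequence of (3.40) used for uniqueness:

```python
import itertools
import math


def closest_point_in_box(lo, hi):
    """Minimal-norm point of the box C = prod_i [lo_i, hi_i] in R^n.
    ||x||^2 = sum_i x_i^2 splits into independent terms, so each coordinate
    is the point of [lo_i, hi_i] nearest to 0, i.e. 0 clamped into the interval."""
    return [min(max(0.0, a), b) for a, b in zip(lo, hi)]


def norm(x):
    return math.sqrt(sum(t * t for t in x))


lo, hi = [1.0, -2.0, -3.0], [2.0, 1.0, -0.5]  # an arbitrary example box
v = closest_point_in_box(lo, hi)
print(v)  # prints [1.0, 0.0, -0.5]

steps = 20  # grid resolution, an arbitrary choice
grids = [[a + (b - a) * k / steps for k in range(steps + 1)] for a, b in zip(lo, hi)]
for p in itertools.product(*grids):
    assert norm(p) >= norm(v) - 1e-12
    # From (3.40) with (p + v)/2 in C: ||p - v||^2 <= 2||p||^2 - 2||v||^2,
    # so points of nearly minimal norm must be close to v.
    diff = [s - t for s, t in zip(p, v)]
    assert norm(diff) ** 2 <= 2 * norm(p) ** 2 - 2 * norm(v) ** 2 + 1e-9
```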

9. Orthocomplements and projections


Proposition 3.6. If $W \subset H$ is a linear subspace of a Hilbert space then
$$W^\perp = \{u \in H;\ \langle u, w \rangle = 0\ \forall\, w \in W\} \tag{3.43}$$
is a closed linear subspace and $W \cap W^\perp = \{0\}$. If $W$ is also closed then
$$H = W \oplus W^\perp, \tag{3.44}$$
meaning that any $u \in H$ has a unique decomposition $u = w + w^\perp$ where $w \in W$ and $w^\perp \in W^\perp$.
Proof. That $W^\perp$ defined by (3.43) is a linear subspace follows from the linearity of the condition defining it. If $u \in W^\perp$ and $u \in W$ then $u \perp u$ by the definition, so $\langle u, u \rangle = \|u\|^2 = 0$ and $u = 0$; thus $W \cap W^\perp = \{0\}$. Since the map $H \ni u \longmapsto \langle u, w \rangle \in \mathbb{C}$ is continuous for each $w \in H$, its null space, the inverse image of $0$, is closed. Thus
$$W^\perp = \bigcap_{w \in W} \{u \in H;\ \langle u, w \rangle = 0\} \tag{3.45}$$

is closed.
Now, suppose $W$ is closed. If $W = H$ then $W^\perp = \{0\}$ and there is nothing to show; for $u \in W$ the decomposition $u = u + 0$ is trivial. So consider $u \in H$, $u \notin W$, and set
$$C = u + W = \{u' \in H;\ u' = u + w,\ w \in W\}. \tag{3.46}$$
Then $C$ is closed, since a sequence in it is of the form $u'_n = u + w_n$ where $w_n$ is a sequence in $W$, and $u'_n$ converges if and only if $w_n$ converges, with limit in $W$ since $W$ is closed. Also, $C$ is non-empty, since $u \in C$, and it is convex since $u' = u + w'$ and $u'' = u + w''$ in $C$ implies $(u' + u'')/2 = u + (w' + w'')/2 \in C$.
Thus the length minimization result above applies and there exists a unique $v \in C$ such that $\|v\| = \inf_{u' \in C} \|u'\|$. The claim is that this $v$ is orthogonal to $W$ (draw a picture in two real dimensions!). To see this consider an arbitrary point $w \in W$ and $\lambda \in \mathbb{C}$; then $v + \lambda w \in C$ and
$$\|v + \lambda w\|^2 = \|v\|^2 + 2\operatorname{Re}(\lambda \langle w, v \rangle) + |\lambda|^2 \|w\|^2. \tag{3.47}$$
Choose $\lambda = te^{i\theta}$ where $t$ is real and the phase is chosen so that $e^{i\theta}\langle w, v \rangle = |\langle v, w \rangle| \ge 0$. Then the fact that $\|v\|$ is minimal means that
$$\begin{aligned} &\|v\|^2 + 2t|\langle v, w \rangle| + t^2\|w\|^2 \ge \|v\|^2 \Longrightarrow \\ &t\left(2|\langle v, w \rangle| + t\|w\|^2\right) \ge 0\ \forall\, t \in \mathbb{R} \Longrightarrow |\langle v, w \rangle| = 0, \end{aligned} \tag{3.48}$$
since otherwise the left side would be negative for small $t < 0$. This is what we wanted to show.
Thus indeed, given $u \in H \setminus W$, we have constructed $v \in W^\perp$ such that $u = v + w$, $w \in W$. This is (3.44), with the uniqueness of the decomposition already shown, since it reduces to $0$ having only the decomposition $0 + 0$, and this in turn is $W \cap W^\perp = \{0\}$. □
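A finite-dimensional sketch of the decomposition (3.44), assuming the real space $\mathbb{R}^4$ and a subspace $W$ spanned by two non-orthogonal vectors (all data arbitrary). Rather than minimizing over $C = u + W$ directly, it uses the orthogonality condition the proof derives: writing $w = c_1 b_1 + c_2 b_2$, the requirement $\langle u - w, b_i \rangle = 0$ is a $2 \times 2$ linear system, solved here by Cramer’s rule. It then checks that $u - w$ really is the shortest element of $C$:

```python
def dot(x, y):
    """Inner product on the real Hilbert space R^n."""
    return sum(a * b for a, b in zip(x, y))


def decompose(u, b1, b2):
    """Split u = w + w_perp with w in W = span(b1, b2) and w_perp in W^perp.
    Writing w = c1*b1 + c2*b2, the conditions <u - w, b_i> = 0 form a 2x2
    linear system (the normal equations), solved by Cramer's rule.
    Assumes b1 and b2 are linearly independent."""
    g11, g12, g22 = dot(b1, b1), dot(b1, b2), dot(b2, b2)
    r1, r2 = dot(u, b1), dot(u, b2)
    det = g11 * g22 - g12 * g12
    c1 = (r1 * g22 - g12 * r2) / det
    c2 = (g11 * r2 - g12 * r1) / det
    w = [c1 * x + c2 * y for x, y in zip(b1, b2)]
    w_perp = [a - b for a, b in zip(u, w)]
    return w, w_perp


b1, b2 = [1.0, 1.0, 0.0, 0.0], [1.0, 0.0, 1.0, 0.0]  # span W; not orthogonal
u = [3.0, -1.0, 2.0, 5.0]
w, w_perp = decompose(u, b1, b2)
assert abs(dot(w_perp, b1)) < 1e-12 and abs(dot(w_perp, b2)) < 1e-12  # w_perp in W^perp

# w_perp = u - w is the shortest element of C = u + W: moving within W only lengthens it.
for t, s in [(1.0, 0.0), (0.0, -2.0), (0.5, 0.5), (-0.1, 0.3)]:
    other = [x + t * a + s * c for x, a, c in zip(w_perp, b1, b2)]  # another point of C
    assert dot(other, other) >= dot(w_perp, w_perp)
```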
Since the construction in the preceding proof associates a unique element of $W$, a closed linear subspace, to each $u \in H$, it defines a map
$$\Pi_W : H \longrightarrow W. \tag{3.49}$$
This map is linear, by the uniqueness: if $u_i = v_i + w_i$, with $w_i \in W$ and $v_i \in W^\perp$, are the decompositions of two elements then
$$\lambda_1 u_1 + \lambda_2 u_2 = (\lambda_1 v_1 + \lambda_2 v_2) + (\lambda_1 w_1 + \lambda_2 w_2) \tag{3.50}$$
must be the corresponding decomposition. Moreover $\Pi_W w = w$ for any $w \in W$, and $\|u\|^2 = \|v\|^2 + \|w\|^2$, Pythagoras’ Theorem, shows that
$$\Pi_W^2 = \Pi_W, \quad \|\Pi_W u\| \le \|u\| \Longrightarrow \|\Pi_W\| \le 1. \tag{3.51}$$
Thus, projection onto $W$ is an operator of norm $1$ (unless $W = \{0\}$) equal to its own square. Such an operator is called a projection or sometimes an idempotent (which sounds fancier).
Finite-dimensional subspaces are always closed by the Heine-Borel theorem.
Lemma 3.3. If $\{e_j\}$ is any finite or countable orthonormal set in a Hilbert space then the orthogonal projection onto the closure of the span of these elements is
$$Pu = \sum_k \langle u, e_k \rangle e_k. \tag{3.52}$$
Proof. We know that the series in (3.52) converges and defines a bounded linear operator of norm at most one by Bessel’s inequality. Clearly $P^2 = P$ by the same argument. If $W$ is the closure of the span then $Pu \in W$, and $(u - Pu) \perp W$ since $(u - Pu) \perp e_k$ for each $k$ and the inner product is continuous. Thus $u = (u - Pu) + Pu$ is the orthogonal decomposition with respect to $W$. □
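A plain-Python check of Lemma 3.3 for a two-element orthonormal set in $\mathbb{C}^4$ (an arbitrary choice). It computes $P$ directly from (3.52) and verifies $P^2 = P$, $\|Pu\| \le \|u\|$, $u - Pu \perp e_k$ and Pythagoras’ theorem:

```python
import random


def inner(u, v):
    """Standard inner product on C^n, linear in the first slot."""
    return sum(a * b.conjugate() for a, b in zip(u, v))


def project(u, es):
    """P u = sum_k <u, e_k> e_k as in (3.52); es must be orthonormal."""
    out = [0j] * len(u)
    for e in es:
        c = inner(u, e)
        out = [o + c * x for o, x in zip(out, e)]
    return out


s = 2 ** -0.5
es = [[s, s, 0, 0], [0, 0, s, 1j * s]]  # an orthonormal pair in C^4 (arbitrary choice)

random.seed(5)
u = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(4)]
Pu = project(u, es)
PPu = project(Pu, es)
r = [a - b for a, b in zip(u, Pu)]

assert max(abs(a - b) for a, b in zip(PPu, Pu)) < 1e-12  # P^2 = P
assert inner(Pu, Pu).real <= inner(u, u).real + 1e-12     # ||Pu|| <= ||u||
assert all(abs(inner(r, e)) < 1e-12 for e in es)          # u - Pu is orthogonal to W
assert abs(inner(u, u).real - inner(Pu, Pu).real - inner(r, r).real) < 1e-9  # Pythagoras
```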
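Lemma 3.4. If $W \subset H$ is a linear subspace of a Hilbert space which contains the orthocomplement of a finite-dimensional space then $W$ is closed and $W^\perp$ is finite-dimensional.
Proof. Let $U \subset W$ be a closed subspace with finite-dimensional orthocomplement; then $W^\perp \subset U^\perp$ is finite-dimensional. The space $(\operatorname{Id} - \Pi_U)W$ is a subspace of $U^\perp$, so it has an orthonormal basis of some finite number $N$ of elements $v_i$, each of which is the image of some $w_i \in W$; thus $v_i = w_i - \Pi_U w_i \in W$. Since $U$ is the null space of $\operatorname{Id} - \Pi_U$ it follows that any element of $W$ can be written uniquely in the form
$$w = u + \sum_{i=1}^{N} c_i v_i, \quad u = \Pi_U w \in U, \quad c_i = \langle w, v_i \rangle. \tag{3.53}$$
Then if $\varphi_n$ is a sequence in $W$ which converges in $H$ it follows that $\Pi_U \varphi_n$ converges in $U$ and each $\langle \varphi_n, v_i \rangle$ converges, and hence the limit is in $W$. □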
Note that the existence of a non-continuous linear functional $H \longrightarrow \mathbb{C}$ is equivalent to the existence of a non-closed subspace of H with a one-dimensional complement. Namely the null space of a non-continuous linear functional cannot be closed, since continuity would follow if it were, but it does have a one-dimensional complement (not orthocomplement!).
Question 1. Does there exist a non-continuous linear functional on an infinite-dimensional Hilbert space?¹

10. Riesz’ theorem


The most important application of the convexity result above is to prove Riesz’ representation theorem (for Hilbert space; there is another one to do with measures).
Theorem 3.3. If H is a Hilbert space then for any continuous linear functional $T : H \longrightarrow \mathbb{C}$ there exists a unique element $\phi \in H$ such that
$$T(u) = \langle u, \phi\rangle \quad \forall\, u \in H. \tag{3.54}$$
Proof. If T is the zero functional then $\phi = 0$ gives (3.54). Otherwise there exists some $u_0 \in H$ such that $T(u_0) \neq 0$ and then there is some $u \in H$, namely $u = u_0/T(u_0)$ will work, such that $T(u) = 1$. Thus
$$C = \{u \in H;\ T(u) = 1\} = T^{-1}(\{1\}) \neq \emptyset. \tag{3.55}$$
The continuity of T implies that C is closed, as the inverse image of a closed set under a continuous map. Moreover C is convex since
$$T((u + u')/2) = (T(u) + T(u'))/2. \tag{3.56}$$
Thus, by Proposition 3.5, there exists an element $v \in C$ of minimal length. Notice that $C = \{v + w;\ w \in N\}$ where $N = T^{-1}(\{0\})$ is the null space of T. Thus, as in Proposition 3.6 above, v is orthogonal to N. In this case it is the unique element orthogonal to N with $T(v) = 1$.
¹ The existence of such a functional requires some form of the Axiom of Choice (maybe a little weaker in the separable case). You are free to believe that all linear functionals are continuous but you will make your life difficult this way.
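Now, for any $u \in H$,
$$u - T(u)v \ \text{satisfies}\ T(u - T(u)v) = T(u) - T(u)T(v) = 0 \Longrightarrow u = w + T(u)v, \quad w \in N. \tag{3.57}$$
Then $\langle u, v\rangle = T(u)\|v\|^2$ since $\langle w, v\rangle = 0$. Thus if $\phi = v/\|v\|^2$ then
$$u = w + \langle u, \phi\rangle v \Longrightarrow T(u) = \langle u, \phi\rangle T(v) = \langle u, \phi\rangle. \tag{3.58}$$
Uniqueness follows since if $\phi'$ also satisfies (3.54) then $\langle u, \phi - \phi'\rangle = 0$ for all $u \in H$, and taking $u = \phi - \phi'$ gives $\phi = \phi'$. □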
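The construction in this proof can be followed step by step in finite dimensions. Here is a minimal sketch, assuming $H = \mathbb{C}^5$ and a hypothetical functional $T(u) = \sum_i t_i u_i$ with randomly chosen coefficients: the element of minimal length in $C = T^{-1}(\{1\})$ is obtained from NumPy’s least-squares solver (which returns the minimum-norm solution of an underdetermined system), and then $\phi = v/\|v\|^2$.

```python
import numpy as np

def inner(u, v):
    # <u, v> = sum_i u_i * conj(v_i): linear in u, conjugate-linear in v
    return np.vdot(v, u)

rng = np.random.default_rng(1)
n = 5
t = rng.standard_normal(n) + 1j * rng.standard_normal(n)

def T(u):
    # A (hypothetical) continuous linear functional on C^n
    return t @ u

# v = the element of minimal length in C = T^{-1}({1}): lstsq returns the
# minimum-norm solution of the underdetermined system t . v = 1.
v, *_ = np.linalg.lstsq(t.reshape(1, n), np.array([1.0 + 0j]), rcond=None)

# phi = v / ||v||^2 represents T, as in (3.58)
phi = v / np.linalg.norm(v) ** 2

u = rng.standard_normal(n) + 1j * rng.standard_normal(n)
print(T(u), inner(u, phi))   # the two values agree up to rounding
```

In $\mathbb{C}^n$ this just recovers $\phi = \overline{t}$; the point of the proof is that the minimal-length step still works when no such coordinate formula is available.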


11. Adjoints of bounded operators


As an application of Riesz’ Theorem we can see that to any bounded linear operator on a Hilbert space
$$A : H \longrightarrow H, \quad \|Au\| \le C\|u\| \quad \forall\, u \in H \tag{3.59}$$
there corresponds a unique adjoint operator. This has profound consequences for the theory of operators on a Hilbert space, as we shall see.
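Proposition 3.7. For any bounded linear operator $A : H \longrightarrow H$ on a Hilbert space there is a unique bounded linear operator $A^* : H \longrightarrow H$ such that
$$\langle Au, v\rangle_H = \langle u, A^*v\rangle_H \quad \forall\, u, v \in H \quad \text{and} \quad \|A\| = \|A^*\|. \tag{3.60}$$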
Proof. To see the existence of $A^*v$ we need to work out what $A^*v \in H$ should be for each fixed $v \in H$. So, fix v in the desired identity (3.60), which is to say consider
$$H \ni u \longmapsto \langle Au, v\rangle \in \mathbb{C}. \tag{3.61}$$
This is a linear map and it is clearly bounded, since
$$|\langle Au, v\rangle| \le \|Au\|\|v\| \le (\|A\|\|v\|)\|u\|. \tag{3.62}$$
Thus it is a continuous linear functional on H which depends on v. In fact it is just the composite of two continuous linear maps
$$H \xrightarrow{\ u \mapsto Au\ } H \xrightarrow{\ w \mapsto \langle w, v\rangle\ } \mathbb{C}. \tag{3.63}$$
By Riesz’ theorem there is a unique element in H, which we can denote $A^*v$ (since it only depends on v), such that
$$\langle Au, v\rangle = \langle u, A^*v\rangle \quad \forall\, u \in H. \tag{3.64}$$
This defines the map $A^* : H \longrightarrow H$ but we need to check that it is linear and continuous. Linearity follows from the uniqueness part of Riesz’ theorem. Thus if $v_1, v_2 \in H$ and $c_1, c_2 \in \mathbb{C}$ then
$$\begin{aligned}\langle Au, c_1v_1 + c_2v_2\rangle &= \overline{c_1}\langle Au, v_1\rangle + \overline{c_2}\langle Au, v_2\rangle\\ &= \overline{c_1}\langle u, A^*v_1\rangle + \overline{c_2}\langle u, A^*v_2\rangle = \langle u, c_1A^*v_1 + c_2A^*v_2\rangle\end{aligned} \tag{3.65}$$
where we have used the definitions of $A^*v_1$ and $A^*v_2$; by uniqueness we must have $A^*(c_1v_1 + c_2v_2) = c_1A^*v_1 + c_2A^*v_2$.
Using the optimality of Cauchy’s inequality,
$$\|A^*v\| = \sup_{\|u\|=1} |\langle u, A^*v\rangle| = \sup_{\|u\|=1} |\langle Au, v\rangle| \le \|A\|\|v\|. \tag{3.66}$$
This shows that $A^*$ is bounded and that
$$\|A^*\| \le \|A\|. \tag{3.67}$$
The defining identity (3.60) also shows that $(A^*)^* = A$, so the reverse inequality in (3.67) also holds and therefore
$$\|A^*\| = \|A\|. \tag{3.68}$$
□
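In $\mathbb{C}^n$ the adjoint of a matrix is its conjugate transpose. A minimal NumPy sketch, assuming a random complex $4 \times 4$ matrix as the operator, checks the defining identity (3.60), the equality of norms (3.68) (the operator norm is the largest singular value, `np.linalg.norm(A, 2)`), and $(A^*)^* = A$.

```python
import numpy as np

def inner(u, v):
    # <u, v> = sum_i u_i * conj(v_i): linear in u, conjugate-linear in v
    return np.vdot(v, u)

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A_star = A.conj().T          # the adjoint in C^n: conjugate transpose

u = rng.standard_normal(n) + 1j * rng.standard_normal(n)
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)

print(np.isclose(inner(A @ u, v), inner(u, A_star @ v)))              # (3.60)
print(np.isclose(np.linalg.norm(A, 2), np.linalg.norm(A_star, 2)))    # ||A|| = ||A*||
print(np.allclose(A_star.conj().T, A))                                # (A*)* = A
```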


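One useful property of the adjoint operator is that
$$\mathrm{Nul}(A^*) = (\mathrm{Ran}(A))^\perp. \tag{3.69}$$
Indeed $w \in (\mathrm{Ran}(A))^\perp$ means precisely that $\langle w, Av\rangle = 0$ for all $v \in H$, and since $\langle w, Av\rangle = \overline{\langle Av, w\rangle} = \overline{\langle v, A^*w\rangle} = \langle A^*w, v\rangle$ this translates to
$$w \in (\mathrm{Ran}(A))^\perp \Longleftrightarrow \langle A^*w, v\rangle = 0 \ \ \forall\, v \in H \Longleftrightarrow A^*w = 0. \tag{3.70}$$
Note that in the finite-dimensional case (3.69) is equivalent to $\mathrm{Ran}(A) = (\mathrm{Nul}(A^*))^\perp$, but in the infinite-dimensional case $\mathrm{Ran}(A)$ is often not closed, in which case this cannot be true and you can only be sure that
$$\overline{\mathrm{Ran}(A)} = (\mathrm{Nul}(A^*))^\perp. \tag{3.71}$$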
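Identity (3.69) can be checked on a rank-deficient matrix. This is a minimal sketch, assuming a random rank-2 operator on $\mathbb{C}^5$; the singular value decomposition gives an orthonormal basis of $\mathrm{Ran}(A)$ (the first left singular vectors) and of its orthocomplement (the remaining ones), and the code confirms that $A^*$ annihilates the latter and that the dimensions match.

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 5, 2
# A rank-2 operator on C^5, so Ran(A) is a proper subspace
A = (rng.standard_normal((n, r)) + 1j * rng.standard_normal((n, r))) @ \
    (rng.standard_normal((r, n)) + 1j * rng.standard_normal((r, n)))

U, s, Vh = np.linalg.svd(A)
rank = int(np.sum(s > 1e-10 * s[0]))
ran_A = U[:, :rank]          # orthonormal basis of Ran(A)
ran_A_perp = U[:, rank:]     # orthonormal basis of Ran(A)^perp

# Every w in Ran(A)^perp is annihilated by A* ...
print(np.allclose(A.conj().T @ ran_A_perp, 0))
# ... and dim Nul(A*) = n - rank(A*) equals dim Ran(A)^perp
print(n - np.linalg.matrix_rank(A.conj().T) == ran_A_perp.shape[1])
```

In finite dimensions the range is automatically closed, so this cannot exhibit the distinction between (3.69) and (3.71).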

12. Compactness and equi-small tails


A compact subset in a general metric space is one with the property that any
sequence in it has a convergent subsequence, with its limit in the set. You will recall,
with pleasure no doubt, the equivalence of this condition to the (more general since
it makes good sense in an arbitrary topological space) covering condition, that any
open cover of the set has a finite subcover. So, in a Hilbert space the notion of a
compact set is already fixed. We want to characterize it, actually in two closely
related ways.
In any metric space a compact set is both closed and bounded, so this must be true in a Hilbert space. The Heine-Borel theorem gives a converse to this: for $\mathbb{R}^n$ or $\mathbb{C}^n$ (and hence in any finite-dimensional normed space) any closed and bounded set is compact. Also recall that the convergence of a sequence in $\mathbb{C}^n$ is equivalent to the convergence of the n sequences given by its components, and this is what is used to pass first from $\mathbb{R}$ to $\mathbb{C}$ and then to $\mathbb{C}^n$. All of this fails in infinite dimensions and we need some condition in addition to being bounded and closed for a set to be compact.
To see where this might come from, observe that
Lemma 3.5. In any metric space the set, S, consisting of the points of a con-
vergent sequence, together with its limit, is compact.
Proof. We show that S is compact by checking that any sequence in S has a convergent subsequence. To see this, observe that a sequence $\{t_j\}$ in S either has a subsequence converging to the limit, s, of the original sequence or it does not. So we only need consider the latter case, but this means that, for some $\epsilon > 0$, $d(t_j, s) \ge \epsilon$ for all but finitely many j; but then these $t_j$ take values in a finite set, since $S \setminus B(s, \epsilon)$ is finite. Hence some value is repeated infinitely often and there is a convergent (indeed constant) subsequence. □
Lemma 3.6. The image of a convergent sequence in a Hilbert space is a set with equi-small tails with respect to any orthonormal sequence, i.e. if $e_k$ is an orthonormal sequence and $u_n \to u$ is a convergent sequence then given $\epsilon > 0$ there exists N such that
$$\sum_{k>N} |\langle u_n, e_k\rangle|^2 < \epsilon^2 \quad \forall\, n. \tag{3.72}$$

Proof. Bessel’s inequality shows that for any $u \in H$,
$$\sum_k |\langle u, e_k\rangle|^2 \le \|u\|^2. \tag{3.73}$$
The convergence of this series means that (3.72) can be arranged for any single element $u_n$ or the limit u by choosing N large enough; thus given $\epsilon > 0$ we can choose $N'$ so that
$$\sum_{k>N'} |\langle u, e_k\rangle|^2 < \epsilon^2/2. \tag{3.74}$$

Consider the closure of the subspace spanned by the $e_k$ with $k > N$. The orthogonal projection onto this space (see Lemma 3.3) is
$$P_N u = \sum_{k>N} \langle u, e_k\rangle e_k. \tag{3.75}$$

Then, taking $N = N'$, the convergence $u_n \to u$ implies the convergence in norm $\|P_N u_n\| \to \|P_N u\|$, so by (3.74)
$$\|P_N u_n\|^2 = \sum_{k>N} |\langle u_n, e_k\rangle|^2 < \epsilon^2, \quad n > n_0. \tag{3.76}$$
So, we have arranged (3.72) for $n > n_0$ for some N. This estimate remains valid if N is increased, since the tails get smaller, and we may arrange it for the finitely many $n \le n_0$ by choosing N large enough. Thus indeed (3.72) holds for all n if N is chosen large enough. □
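To see Lemma 3.6 next to the failure exploited in Proposition 3.8, here is a rough numerical sketch. It assumes a truncation of $l^2$ to its first 2000 coordinates, so it only suggests, and does not prove, the infinite-dimensional behaviour. It compares the largest tail $\sup_n \sum_{k>N} |\langle x_n, e_k\rangle|^2$ for the convergent sequence $u_n = u + e_n/n$, with $u = (1/k)$, against the bounded sequence $e_n$, whose tails never become small.

```python
import numpy as np

D = 2000                        # truncation of l^2 (an assumption for this sketch)
k = np.arange(1, D + 1)
u = 1.0 / k                     # a fixed element of l^2

def e(n):
    # the nth standard basis vector (1-based index)
    x = np.zeros(D)
    x[n - 1] = 1.0
    return x

def tail(x, N):
    # sum_{k > N} |<x, e_k>|^2 in the standard basis
    return np.sum(np.abs(x[N:]) ** 2)

ns = range(1, 500)
conv = [u + e(n) / n for n in ns]   # u_n -> u in norm: equi-small tails
bad = [e(n) for n in ns]            # bounded, but no convergent subsequence

for N in (10, 100, 400):
    # sup of the tails over the sequence, for each of the two sequences
    print(N, max(tail(x, N) for x in conv), max(tail(x, N) for x in bad))
```

For the convergent sequence the supremum of the tails decreases as N grows, while for $e_n$ it stays equal to 1 as long as some $n > N$ is included.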
This suggests one useful characterization of compact sets in a separable Hilbert
space since the equi-smallness of the tails, as in (3.72), for all u ∈ K just means
that the Fourier-Bessel series converges uniformly.
Proposition 3.8. A set $K \subset H$ in a separable Hilbert space is compact if and only if it is bounded, closed and the Fourier-Bessel series with respect to any (one) complete orthonormal basis converges uniformly on it.
Proof. We already know that a compact set in a metric space is closed and bounded. Suppose the equi-smallness of tails condition fails with respect to some orthonormal basis $e_k$. This means that for some $\epsilon > 0$ and all p there is an element $u_p \in K$ such that
$$\sum_{k>p} |\langle u_p, e_k\rangle|^2 \ge \epsilon^2. \tag{3.77}$$
Consider the sequence $\{u_p\}$ generated this way. No subsequence of it can have equi-small tails (recalling that the tail decreases with p). Thus, by Lemma 3.6, it cannot have a convergent subsequence, so K cannot be compact if the equi-smallness condition fails.
Thus we have proved the equi-smallness of tails condition to be necessary for the compactness of a closed, bounded set. It remains to show that it is sufficient.
So, suppose K is closed, bounded and satisfies the equi-small tails condition with respect to an orthonormal basis $e_k$, and $\{u_n\}$ is a sequence in K. We only need show that $\{u_n\}$ has a Cauchy subsequence, since this will converge (H being complete) and the limit will be in K (since it is closed). Consider each of the sequences of coefficients $\langle u_n, e_k\rangle$ in $\mathbb{C}$. Here k is fixed. This sequence is bounded:
$$|\langle u_n, e_k\rangle| \le \|u_n\| \le C \tag{3.78}$$
by the boundedness of K. So, by the Heine-Borel theorem, there is a subsequence $u_{n_l}$ such that $\langle u_{n_l}, e_k\rangle$ converges as $l \to \infty$.
We can apply this argument for each $k = 1, 2, \dots$. First extract a subsequence $\{u_{n,1}\}$ of $\{u_n\}$ so that the sequence $\langle u_{n,1}, e_1\rangle$ converges. Then extract a subsequence $\{u_{n,2}\}$ of $\{u_{n,1}\}$ so that $\langle u_{n,2}, e_2\rangle$ also converges. Then continue inductively. Now pass to the ‘diagonal’ subsequence $v_n$ of $\{u_n\}$ which has kth entry the kth term, $u_{k,k}$, in the kth subsequence. It is ‘eventually’ a subsequence of each of the subsequences previously constructed, meaning it coincides with a subsequence from some point onward (namely the kth term onward for the kth subsequence). Thus, for this subsequence each of the $\langle v_n, e_k\rangle$ converges.
Consider the identity (the orthonormal set $e_k$ is complete by assumption) for the difference
$$\begin{aligned}\|v_n - v_{n+l}\|^2 &= \sum_{k\le N} |\langle v_n - v_{n+l}, e_k\rangle|^2 + \sum_{k>N} |\langle v_n - v_{n+l}, e_k\rangle|^2\\ &\le \sum_{k\le N} |\langle v_n - v_{n+l}, e_k\rangle|^2 + 2\sum_{k>N} |\langle v_n, e_k\rangle|^2 + 2\sum_{k>N} |\langle v_{n+l}, e_k\rangle|^2\end{aligned} \tag{3.79}$$
where the parallelogram law on $\mathbb{C}$, in the form $|a - b|^2 \le 2|a|^2 + 2|b|^2$, has been used. To make this sum less than $\epsilon^2$ we may choose N so large that the last two terms together are less than $\epsilon^2/2$, and this may be done for all n and l by the equi-smallness of the tails. Now, choose n so large that each of the terms in the first sum is less than $\epsilon^2/2N$, for all $l > 0$, using the Cauchy condition on each of the finite number of sequences $\langle v_n, e_k\rangle$. Thus, $\{v_n\}$ is a Cauchy subsequence of $\{u_n\}$ and hence, as already noted, convergent in K. Thus K is indeed compact. □

This criterion for compactness is useful but is too closely tied to the existence
of an orthonormal basis to be easily applicable. However the condition can be
restated in a way that holds even in the non-separable case (and of course in the
finite-dimensional case, where it is trivial).
Proposition 3.9. A subset $K \subset H$ of a Hilbert space is compact if and only if it is closed and bounded and for every $\epsilon > 0$ there is a finite-dimensional subspace $W \subset H$ such that
$$\sup_{u \in K} \inf_{w \in W} \|u - w\| < \epsilon. \tag{3.80}$$

So we see that the extra condition needed is ‘finite-dimensional approximability’.

Proof. Before proceeding to the proof consider (3.80). Since W is finite-dimensional we know it is closed and hence the discussion in §9 above applies. In particular $u = w + w^\perp$ with $w \in W$ and $w^\perp \perp W$, where
$$\inf_{w' \in W} \|u - w'\| = \|w^\perp\|. \tag{3.81}$$
This can be restated in the form
$$\sup_{u \in K} \|(\mathrm{Id} - \Pi_W)u\| < \epsilon \tag{3.82}$$
where $\Pi_W$ is the orthogonal projection onto W (so $\mathrm{Id} - \Pi_W$ is the orthogonal projection onto $W^\perp$).
Now, let us first assume that H is separable, so we already have a condition for compactness in Proposition 3.8. Then if K is compact we can consider an orthonormal basis of H and the finite-dimensional spaces $W_N$ spanned by the first N elements in the basis, with $\Pi_N$ the orthogonal projection onto it. Then $\|(\mathrm{Id} - \Pi_N)u\|$ is precisely the length of the ‘tail’ of u with respect to the basis. So indeed, by Proposition 3.8, given $\epsilon > 0$ there is an N such that $\|(\mathrm{Id} - \Pi_N)u\| < \epsilon/2$ for all $u \in K$ and hence (3.82) holds for $W = W_N$.
Now suppose that $K \subset H$ is closed and bounded and for each $\epsilon > 0$ we can find a finite-dimensional subspace W such that (3.82) holds. Take a sequence $\{u_n\}$ in K. The sequence $\Pi_W u_n \in W$ is bounded in a finite-dimensional space so has a convergent subsequence. Now, for each $j \in \mathbb{N}$ there is a finite-dimensional subspace $W_j$ (not necessarily corresponding to an orthonormal basis) so that (3.82) holds for $\epsilon = 1/j$; write $\Pi_j$ for the orthogonal projection onto $W_j$. Proceeding as above, we can find successive subsequences of $u_n$ such that the image under $\Pi_j$ in $W_j$ converges for each j. Passing to the diagonal subsequence $u_{n_i}$ it follows that $\Pi_j u_{n_i}$ converges for each j since it is eventually a subsequence of the jth choice of subsequence above. Now, the triangle inequality shows that
$$\|u_{n_i} - u_{n_k}\| \le \|\Pi_j(u_{n_i} - u_{n_k})\|_{W_j} + \|(\mathrm{Id} - \Pi_j)u_{n_i}\| + \|(\mathrm{Id} - \Pi_j)u_{n_k}\|. \tag{3.83}$$
Given $\epsilon > 0$ first choose j so large that the last two terms are each less than $1/j < \epsilon/3$, using the choice of $W_j$. Then if $i, k > N$ with N large enough, the first term on the right in (3.83) is also less than $\epsilon/3$ by the convergence of $\Pi_j u_{n_i}$. Thus $u_{n_i}$ is Cauchy in H and hence converges, with limit in K since K is closed, and it follows that K is compact.
This converse argument does not require the separability of H, so to complete the proof we only need to show the necessity of (3.80) in the non-separable case. Thus suppose K is compact. Then K itself is separable, i.e. has a countable dense subset, using the finite covering property (for each $p \in \mathbb{N}$ there are finitely many balls of radius $1/p$ which cover K, so take the set consisting of all the centers for all p). It follows that the closure of the span of K, the finite linear combinations of elements of K, is a separable Hilbert subspace of H which contains K. Thus any compact subset of a non-separable Hilbert space is contained in a separable Hilbert subspace and hence (3.80) holds. □

13. Finite rank operators


Now, we need to start thinking a little more seriously about operators on a Hilbert space. Remember that an operator is just a continuous linear map $T : H \longrightarrow H$, and the space of them (a Banach space) is denoted B(H) (rather than the more cumbersome B(H, H) which is needed when the domain and target spaces are different).
Definition 3.4. An operator T ∈ B(H) is of finite rank if its range has fi-
nite dimension (and that dimension is called the rank of T ); the set of finite rank
operators will be denoted R(H).
Why not F(H)? Because we want to use this for the Fredholm operators.
Clearly the sum of two operators of finite rank has finite rank, since the range is contained in the sum of the ranges (but is often smaller):
$$(T_1 + T_2)u \in \mathrm{Ran}(T_1) + \mathrm{Ran}(T_2) \quad \forall\, u \in H. \tag{3.84}$$
Since the range of a constant multiple of T is contained in the range of T it follows
that the finite rank operators form a linear subspace of B(H).
What does a finite rank operator look like? It really looks like a matrix.
Lemma 3.7. If $T : H \longrightarrow H$ has finite rank then there is a finite orthonormal set $\{e_k\}_{k=1}^L$ in H and constants $c_{ij} \in \mathbb{C}$ such that
$$Tu = \sum_{i,j=1}^L c_{ij}\langle u, e_j\rangle e_i. \tag{3.85}$$

Proof. By definition, the range of T, $R = T(H)$, is a finite-dimensional subspace. So, it has a basis which we can orthonormalize (by the Gram-Schmidt procedure) to get an orthonormal basis $e_i$, $i = 1, \dots, p$. Now, since this is a basis of the range, Tu can be expanded relative to it for any $u \in H$:
$$Tu = \sum_{i=1}^p \langle Tu, e_i\rangle e_i. \tag{3.86}$$
On the other hand, the map $u \longmapsto \langle Tu, e_i\rangle$ is a continuous linear functional on H, so $\langle Tu, e_i\rangle = \langle u, v_i\rangle$ for some $v_i \in H$; notice in fact that $v_i = T^*e_i$. This means the formula (3.86) becomes
$$Tu = \sum_{i=1}^p \langle u, v_i\rangle e_i. \tag{3.87}$$
Now, the Gram-Schmidt procedure can be applied to orthonormalize the sequence $e_1, \dots, e_p, v_1, \dots, v_p$, resulting in $e_1, \dots, e_L$. This means that each $v_i$ is a linear combination which we can write as
$$v_i = \sum_{j=1}^L \overline{c_{ij}}\, e_j. \tag{3.88}$$
Inserting this into (3.87) gives (3.85) (where the constants for $i > p$ are zero). □
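In $\mathbb{C}^n$ the formula (3.87) says that a finite rank operator is the matrix $T = EV^*$, where the columns of E are the orthonormal $e_i$ and the columns of V are the $v_i$. A minimal sketch, assuming random choices of both (the helper `cplx` is hypothetical), checks (3.87) against the matrix product, that the rank is p, and that $v_i = T^*e_i$.

```python
import numpy as np

def inner(u, v):
    # <u, v> = sum_i u_i * conj(v_i): linear in u, conjugate-linear in v
    return np.vdot(v, u)

rng = np.random.default_rng(4)

def cplx(*shape):
    # a random complex array of the given shape (hypothetical helper)
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

n, p = 6, 2
E, _ = np.linalg.qr(cplx(n, p))   # orthonormal e_1, ..., e_p as columns
V = cplx(n, p)                    # arbitrary v_1, ..., v_p as columns

T = E @ V.conj().T                # T u = sum_i <u, v_i> e_i as a matrix

u = cplx(n)
Tu_sum = sum(inner(u, V[:, i]) * E[:, i] for i in range(p))
print(np.allclose(T @ u, Tu_sum))        # formula (3.87)
print(np.linalg.matrix_rank(T))          # the rank p
print(np.allclose(T.conj().T @ E, V))    # v_i = T* e_i
```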
It is clear that
$$B \in \mathcal{B}(H) \ \text{and}\ T \in \mathcal{R}(H) \Longrightarrow BT \in \mathcal{R}(H). \tag{3.89}$$
Indeed, the range of BT is the image under B of the range of T, and this is certainly finite-dimensional since it is spanned by the image of a basis of Ran(T). Similarly $TB \in \mathcal{R}(H)$ since the range of TB is contained in the range of T. Thus we have in fact proved most of
Proposition 3.10. The finite rank operators form a $*$-closed two-sided ideal in B(H), which is to say a linear subspace such that
$$B_1, B_2 \in \mathcal{B}(H),\ T \in \mathcal{R}(H) \Longrightarrow B_1TB_2,\ T^* \in \mathcal{R}(H). \tag{3.90}$$
Proof. It is only left to show that $T^*$ is of finite rank if T is, but this is an immediate consequence of Lemma 3.7 since if T is given by (3.85) then
$$T^*u = \sum_{i,j=1}^L \overline{c_{ij}}\langle u, e_i\rangle e_j \tag{3.91}$$
is also of finite rank. □

Lemma 3.8 (Row rank = Column rank). For any finite rank operator T on a Hilbert space, the dimension of the range of T is equal to the dimension of the range of $T^*$.
Proof. From the formula (3.87) for a finite rank operator, it follows that the $v_i$, $i = 1, \dots, p$, must be linearly independent, since the $e_i$ form a basis for the range and a linear relation between the $v_i$ would show the range had dimension less than p. Thus in fact the null space of T is precisely the orthocomplement of the span of the $v_i$, the space of vectors orthogonal to each $v_i$. Since
$$\begin{aligned}\langle Tu, w\rangle = \sum_{i=1}^p \langle u, v_i\rangle\langle e_i, w\rangle &\Longrightarrow \langle w, Tu\rangle = \sum_{i=1}^p \langle v_i, u\rangle\langle w, e_i\rangle\\ &\Longrightarrow T^*w = \sum_{i=1}^p \langle w, e_i\rangle v_i\end{aligned} \tag{3.92}$$
the range of $T^*$ is the span of the $v_i$, so is also of dimension p. □
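Both conclusions of this proof, equal ranks and $\mathrm{Nul}(T) = \{v_i\}^\perp$, can be checked numerically. This is a minimal sketch, assuming $H = \mathbb{C}^7$ and $T = EV^*$ built from random data as in (3.87) (generic random $v_i$ are linearly independent); the null space of T is read off from the right singular vectors beyond the rank.

```python
import numpy as np

rng = np.random.default_rng(5)

def cplx(*shape):
    # a random complex array of the given shape (hypothetical helper)
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

n, p = 7, 3
E, _ = np.linalg.qr(cplx(n, p))   # orthonormal e_i as columns
V = cplx(n, p)                    # v_i as columns (generically independent)
T = E @ V.conj().T                # T u = sum_i <u, v_i> e_i
T_star = T.conj().T               # T* w = sum_i <w, e_i> v_i, as in (3.92)

print(np.linalg.matrix_rank(T), np.linalg.matrix_rank(T_star))   # both equal p

# Null space of T: conjugates of the right singular vectors beyond the rank
_, _, Vh = np.linalg.svd(T)
null_T = Vh[p:].conj().T
print(np.allclose(V.conj().T @ null_T, 0))   # Nul(T) is orthogonal to every v_i
```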

14. Compact operators


Definition 3.5. An element $K \in \mathcal{B}(H)$, the bounded operators on a separable Hilbert space, is said to be compact (the old terminology was ‘totally bounded’ or ‘completely continuous’) if the image of the unit ball is precompact, i.e. has compact closure; that is, if the closure of $K\{u \in H;\ \|u\|_H \le 1\}$ is compact in H.
Notice that in a metric space, to say that a set has compact closure is the same as saying it is contained in a compact set; such a set is said to be precompact.
Proposition 3.11. An operator K ∈ B(H), bounded on a separable Hilbert
space, is compact if and only if it is the limit of a norm-convergent sequence of
finite rank operators.
Proof. First, we need to show that a compact operator is the limit of a convergent sequence of finite rank operators. To do this we use the characterizations of compact subsets of a separable Hilbert space discussed earlier. Namely, if $\{e_i\}$ is an orthonormal basis of H then a subset $I \subset H$ is compact if and only if it is closed and bounded and has equi-small tails with respect to $\{e_i\}$, meaning given $\epsilon > 0$ there exists N such that
$$\sum_{i>N} |\langle v, e_i\rangle|^2 < \epsilon^2 \quad \forall\, v \in I. \tag{3.93}$$
Now we shall apply this to the set $K(B(0,1))$ where we assume that K is compact (as an operator; don’t be confused by the double usage, in the end it turns out to be constructive), so this set is contained in a compact set. Hence (3.93) applies to it. Namely this means that for any $\epsilon > 0$ there exists n such that
$$\sum_{i>n} |\langle Ku, e_i\rangle|^2 < \epsilon^2 \quad \forall\, u \in H,\ \|u\|_H \le 1. \tag{3.94}$$
For each n consider the first part of these sequences and define
$$K_n u = \sum_{i \le n} \langle Ku, e_i\rangle e_i. \tag{3.95}$$
This is clearly a linear operator and has finite rank, since its range is contained in the span of the first n elements of $\{e_i\}$. Since this is an orthonormal basis,
$$\|Ku - K_nu\|_H^2 = \sum_{i>n} |\langle Ku, e_i\rangle|^2. \tag{3.96}$$
Thus (3.94) shows that $\|Ku - K_nu\|_H \le \epsilon$. Now, increasing n makes $\|Ku - K_nu\|$ smaller, so given $\epsilon > 0$ there exists n such that for all $N \ge n$,
$$\|K - K_N\|_{\mathcal{B}} = \sup_{\|u\| \le 1} \|Ku - K_Nu\|_H \le \epsilon. \tag{3.97}$$
Thus indeed, $K_n \to K$ in norm and we have shown that the compact operators are contained in the norm closure of the finite rank operators.
For the converse we assume that $T_n \to K$ is a norm-convergent sequence in B(H) where each of the $T_n$ is of finite rank; of course we know nothing about the rank except that it is finite. We want to conclude that K is compact, so we need to show that $K(B(0,1))$ is precompact. It is certainly bounded, by the norm of K. Let $W_n = T_nH$ be the range of $T_n$. By definition it is a finite-dimensional subspace and hence closed. Let $\Pi_n$ be the orthogonal projection onto $W_n$, so $\mathrm{Id} - \Pi_n$ is projection onto $W_n^\perp$. Thus the composite $(\mathrm{Id} - \Pi_n)T_n = 0$ and hence, since $\|\mathrm{Id} - \Pi_n\| \le 1$,
$$(\mathrm{Id} - \Pi_n)K = (\mathrm{Id} - \Pi_n)(K - T_n) \Longrightarrow \|(\mathrm{Id} - \Pi_n)K\| \to 0 \ \text{as}\ n \to \infty. \tag{3.98}$$
So, for any $\epsilon > 0$ there exists n such that
$$\sup_{u \in B(0,1)} \inf_{w \in W_n} \|Ku - w\| \le \sup_{\|u\| \le 1} \|(\mathrm{Id} - \Pi_n)Ku\| < \epsilon \tag{3.99}$$
and it follows from Proposition 3.9, applied to the closure of $K(B(0,1))$, that $K(B(0,1))$ is precompact and hence K is compact. □
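The truncations (3.95) can be computed explicitly for the simplest compact operator. This is a minimal sketch, assuming the diagonal operator $Ke_k = e_k/k$ on $l^2$, truncated to 200 coordinates for the computation; then $K - K_n$ is diagonal with largest entry $1/(n+1)$, so $\|K - K_n\| = 1/(n+1) \to 0$, as Proposition 3.11 predicts.

```python
import numpy as np

D = 200                                  # truncation dimension (an assumption)
d = 1.0 / np.arange(1, D + 1)            # K e_k = e_k / k
K = np.diag(d)

def K_n(n):
    # K_n u = sum_{i <= n} <Ku, e_i> e_i: keep the first n diagonal entries
    return np.diag(np.where(np.arange(1, D + 1) <= n, d, 0.0))

for n in (1, 10, 100):
    # operator norm of K - K_n (largest singular value); equals 1/(n+1)
    print(n, np.linalg.norm(K - K_n(n), 2))
```

By contrast, the same truncations of the identity always leave an error of norm 1, which is one way to see that the identity on an infinite-dimensional space is not compact.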
Proposition 3.12. For any separable Hilbert space, the compact operators form a closed and $*$-closed two-sided ideal in B(H).
Proof. In any metric space (applied to B(H)) the closure of a set is closed, so the compact operators are closed, being the closure of the finite rank operators. Similarly the fact that it is closed under passage to adjoints follows from the same fact for finite rank operators. The ideal properties also follow from the corresponding properties for the finite rank operators, or we can prove them directly anyway. Namely if B is bounded and T is compact then for some $c > 0$ (namely $1/\|B\|$ unless it is zero) $cB$ maps $B(0,1)$ into itself. Thus $cTB = TcB$ is compact since the image of the unit ball under it is contained in the image of the unit ball under T; hence TB is also compact. Similarly BT is compact since B is continuous and then
$$BT(B(0,1)) \subset B\big(\overline{T(B(0,1))}\big), \ \text{which is compact,} \tag{3.100}$$
being the image under a continuous map of a compact set. □

15. Weak convergence


It is convenient to formalize the idea that a sequence be bounded and that each of the $\langle u_n, e_k\rangle$, the sequence of coefficients of some particular Fourier-Bessel series, should converge.
Definition 3.6. A sequence $\{u_n\}$ in a Hilbert space H is said to converge weakly to an element $u \in H$ if it is bounded in norm and $\langle u_n, v\rangle \to \langle u, v\rangle$ converges in $\mathbb{C}$ for each $v \in H$. This relationship is written
$$u_n \rightharpoonup u. \tag{3.101}$$
In fact, as we shall see below, the assumptions that $\|u_n\|$ is bounded and that u exists are both unnecessary. That is, a sequence converges weakly if and only if $\langle u_n, v\rangle$ converges in $\mathbb{C}$ for each $v \in H$. Consequently, there is no harm in assuming it is bounded and that the ‘weak limit’ $u \in H$ exists. Note that the weak limit is unique since if u and $u'$ both have this property then $\langle u - u', v\rangle = \lim_{n\to\infty}\langle u_n, v\rangle - \lim_{n\to\infty}\langle u_n, v\rangle = 0$ for all $v \in H$, and setting $v = u - u'$ it follows that $u = u'$.
Lemma 3.9. A (strongly) convergent sequence is weakly convergent with the same limit.
Proof. This is the continuity of the inner product. If $u_n \to u$ then
$$|\langle u_n, v\rangle - \langle u, v\rangle| \le \|u_n - u\|\|v\| \to 0 \tag{3.102}$$
for each $v \in H$, which shows weak convergence (a convergent sequence is bounded). □
Lemma 3.10. For a bounded sequence in a separable Hilbert space, weak convergence is equivalent to component convergence with respect to an orthonormal basis.
Proof. Let $e_k$ be an orthonormal basis. Then if $u_n$ is weakly convergent it follows immediately that $\langle u_n, e_k\rangle \to \langle u, e_k\rangle$ converges for each k. Conversely, suppose this is true for a bounded sequence, just that $\langle u_n, e_k\rangle \to c_k$ in $\mathbb{C}$ for each k. The norm boundedness and Bessel’s inequality show that
$$\sum_{k\le p} |c_k|^2 = \lim_{n\to\infty} \sum_{k\le p} |\langle u_n, e_k\rangle|^2 \le \sup_n \|u_n\|^2 \tag{3.103}$$
for all p. Thus in fact $\{c_k\} \in l^2$ and hence
$$u = \sum_k c_k e_k \in H \tag{3.104}$$
by the completeness of H. Clearly $\langle u_n, e_k\rangle \to \langle u, e_k\rangle$ for each k. It remains to show that $\langle u_n, v\rangle \to \langle u, v\rangle$ for all $v \in H$. This is certainly true for any finite linear combination of the $e_k$ and for a general v we can write
$$\begin{aligned}\langle u_n, v\rangle - \langle u, v\rangle &= \langle u_n, v_p\rangle - \langle u, v_p\rangle + \langle u_n, v - v_p\rangle - \langle u, v - v_p\rangle \Longrightarrow\\ |\langle u_n, v\rangle - \langle u, v\rangle| &\le |\langle u_n, v_p\rangle - \langle u, v_p\rangle| + 2C\|v - v_p\|\end{aligned} \tag{3.105}$$
where $v_p = \sum_{k\le p} \langle v, e_k\rangle e_k$ is a finite part of the Fourier-Bessel series for v and C is a bound for $\|u_n\|$, and hence by (3.103) also for $\|u\|$. Now the convergence $v_p \to v$ implies that the last term in (3.105) can be made small by choosing p large, independent of n. Then the first term on the right can be made small by choosing n large since $v_p$ is a finite linear combination of the $e_k$. Thus indeed, $\langle u_n, v\rangle \to \langle u, v\rangle$ for all $v \in H$ and it follows that $u_n$ converges weakly to u. □
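The standard example to keep in mind is the orthonormal sequence itself: $e_n \rightharpoonup 0$, since $\langle e_n, v\rangle$ is the nth coefficient of v and tends to 0, yet $\|e_n - 0\| = 1$ for every n. A minimal numerical sketch, assuming a truncation of $l^2$ to 10 000 coordinates and the fixed test vector $v = (1/k)$:

```python
import numpy as np

D = 10_000                  # truncation of l^2 (an assumption for this sketch)
k = np.arange(1, D + 1)
v = 1.0 / k                 # a fixed test vector in l^2

def e(n):
    # the nth standard basis vector (1-based index)
    x = np.zeros(D)
    x[n - 1] = 1.0
    return x

for n in (1, 10, 100, 1000):
    en = e(n)
    # <e_n, v> = 1/n -> 0 (weak convergence to 0), while ||e_n|| = 1 (no strong convergence)
    print(n, np.vdot(v, en), np.linalg.norm(en))
```

This sequence also shows that the inequality $\|u\| \le \liminf \|u_n\|$ of Lemma 3.11 below can be strict: here $0 < 1$.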
Proposition 3.13. Any bounded sequence $\{u_n\}$ in a separable Hilbert space has a weakly convergent subsequence.
This can be thought of as a different extension to infinite dimensions of the Heine-Borel theorem. As opposed to the characterization of compact sets above, which involves adding the extra condition of finite-dimensional approximability, here we weaken the notion of convergence.
Proof. Choose an orthonormal basis $\{e_k\}$ and apply the procedure in the proof of Proposition 3.8 to it. Thus, we may extract successive subsequences along the kth of which $\langle u_{n_p}, e_k\rangle \to c_k \in \mathbb{C}$. Passing to the diagonal subsequence, $v_n$, which is eventually a subsequence of each of these, ensures that $\langle v_n, e_k\rangle \to c_k$ for each k. Now apply the preceding Lemma to conclude that this subsequence converges weakly. □
Lemma 3.11. For a weakly convergent sequence $u_n \rightharpoonup u$
$$\|u\| \le \liminf \|u_n\| \tag{3.106}$$
and a weakly convergent sequence converges strongly if and only if the weak limit satisfies $\|u\| = \lim_{n\to\infty}\|u_n\|$.
Proof. Choose an orthonormal basis $e_k$ and observe that
$$\sum_{k\le p} |\langle u, e_k\rangle|^2 = \lim_{n\to\infty} \sum_{k\le p} |\langle u_n, e_k\rangle|^2. \tag{3.107}$$
The sum on the right is bounded by $\|u_n\|^2$ independently of p, so
$$\sum_{k\le p} |\langle u, e_k\rangle|^2 \le \liminf_n \|u_n\|^2 \tag{3.108}$$
by the definition of lim inf. Then let $p \to \infty$ to conclude that
$$\|u\|^2 \le \liminf_n \|u_n\|^2 \tag{3.109}$$
from which (3.106) follows.
Now, suppose $u_n \rightharpoonup u$; then
$$\|u - u_n\|^2 = \|u\|^2 - 2\,\mathrm{Re}\langle u, u_n\rangle + \|u_n\|^2. \tag{3.110}$$
Weak convergence implies that the middle term converges to $-2\|u\|^2$, so if the last term converges to $\|u\|^2$ then $u_n \to u$. Conversely, if $u_n \to u$ strongly then $\|u_n\| \to \|u\|$ by the continuity of the norm. □
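Observe that for any $A \in \mathcal{B}(H)$, if $u_n \rightharpoonup u$ then $Au_n \rightharpoonup Au$, using the existence of the adjoint (and $\|Au_n\| \le \|A\|\|u_n\|$ for boundedness):
$$\langle Au_n, v\rangle = \langle u_n, A^*v\rangle \to \langle u, A^*v\rangle = \langle Au, v\rangle \quad \forall\, v \in H. \tag{3.111}$$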
Lemma 3.12. An operator $K \in \mathcal{B}(H)$ is compact if and only if the image $Ku_n$ of any weakly convergent sequence $\{u_n\}$ in H is strongly, i.e. norm, convergent.
This is the origin of the old name ‘completely continuous’ for compact operators, since they turn even weakly convergent sequences into strongly convergent ones.
Proof. First suppose that $u_n \rightharpoonup u$ is a weakly convergent sequence in H and that K is compact. We know that $\|u_n\| < C$ is bounded so the sequence $Ku_n$ is contained in $CK(B(0,1))$ and hence in a compact set (clearly if D is compact then so is cD for any constant c). Thus, any subsequence of $Ku_n$ has a convergent subsequence, and the limit is necessarily Ku since $Ku_n \rightharpoonup Ku$ (by Lemma 3.9 and the uniqueness of weak limits). But the condition on a sequence in a metric space that every subsequence of it has a subsequence which converges to a fixed limit implies convergence. (If you don’t remember this, reconstruct the proof: to say a sequence $v_n$ does not converge to v is to say that for some $\epsilon > 0$ there is a subsequence along which $d(v_{n_k}, v) \ge \epsilon$. This is impossible given the subsequence of subsequence condition converging to the fixed limit v.)
Conversely, suppose that K has this property of turning weakly convergent into strongly convergent sequences. We want to show that $K(B(0,1))$ has compact closure. This just means that any sequence in $K(B(0,1))$ has a (strongly) convergent subsequence, where we do not have to worry about whether the limit is in the set or not. Such a sequence is of the form $Ku_n$ where $u_n$ is a sequence in $B(0,1)$. However we know that we can pass to a subsequence which converges weakly, $u_{n_j} \rightharpoonup u$. Then, by the assumption of the Lemma, $Ku_{n_j} \to Ku$ converges strongly. Thus $Ku_n$ does indeed have a convergent subsequence and hence $K(B(0,1))$ must have compact closure. □
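Lemma 3.12 can be seen on the simplest example, under the same truncation assumption as before. The diagonal operator $Ke_k = e_k/k$ is a norm limit of its finite-rank truncations, hence compact by Proposition 3.11; it sends the weakly null sequence $e_n$ to $e_n/n$, which tends to 0 in norm, whereas the identity, which is not compact, does not. The function name `K` is just for this sketch.

```python
import numpy as np

D = 5000                      # truncation of l^2 (an assumption for this sketch)
k = np.arange(1, D + 1)

def K(x):
    # diagonal compact operator K e_k = e_k / k
    return x / k

def e(n):
    # the nth standard basis vector (1-based index); e_n converges weakly to 0
    x = np.zeros(D)
    x[n - 1] = 1.0
    return x

for n in (1, 10, 100, 1000):
    # ||K e_n|| = 1/n -> 0, while ||Id e_n|| = 1 for all n
    print(n, np.linalg.norm(K(e(n))), np.linalg.norm(e(n)))
```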

As noted above, it is not really necessary to assume that a weakly convergent sequence in a Hilbert space is bounded, provided one has the Uniform Boundedness Principle, Theorem 1.3, at the ready.
Proposition 3.14. If $u_n \in H$ is a sequence in a Hilbert space and for all $v \in H$
$$\langle u_n, v\rangle \to F(v) \ \text{converges in}\ \mathbb{C} \tag{3.112}$$
then $\|u_n\|_H$ is bounded and there exists $w \in H$ such that $u_n \rightharpoonup w$.
Proof. Apply the Uniform Boundedness Theorem to the continuous functionals
$$T_n(u) = \langle u, u_n\rangle, \quad T_n : H \longrightarrow \mathbb{C} \tag{3.113}$$
where we reverse the order to make them linear rather than anti-linear. Thus, for each u the set $\{|T_n(u)|\}$ is bounded in $\mathbb{C}$, since the sequence $T_n(u) = \overline{\langle u_n, u\rangle}$ is convergent. It follows from the Uniform Boundedness Principle that there is a bound
$$\|T_n\| \le C. \tag{3.114}$$
However, this norm as a functional is just $\|T_n\| = \|u_n\|_H$, so the original sequence must be bounded in H. Define $T : H \longrightarrow \mathbb{C}$ as the limit for each u:
$$T(u) = \lim_{n\to\infty} T_n(u) = \lim_{n\to\infty} \langle u, u_n\rangle. \tag{3.115}$$
This exists for each u by hypothesis. It is a linear map and from (3.114) it is bounded, $\|T\| \le C$. Thus by the Riesz Representation theorem, there exists $w \in H$ such that
$$T(u) = \langle u, w\rangle \quad \forall\, u \in H. \tag{3.116}$$
Thus $\langle u_n, u\rangle \to \langle w, u\rangle$ for all $u \in H$, so $u_n \rightharpoonup w$ as claimed. □

16. The algebra B(H)


Recall the basic properties of the Banach space, and algebra, of bounded operators B(H) on a separable Hilbert space H. In particular, it is a Banach space with respect to the norm
$$\|A\| = \sup_{\|u\|_H = 1} \|Au\|_H \tag{3.117}$$
and the norm satisfies
$$\|AB\| \le \|A\|\|B\| \tag{3.118}$$
as follows from the fact that
$$\|ABu\| \le \|A\|\|Bu\| \le \|A\|\|B\|\|u\|.$$
Consider the set of invertible elements:
$$GL(H) = \{A \in \mathcal{B}(H);\ \exists\, B \in \mathcal{B}(H),\ BA = AB = \mathrm{Id}\}. \tag{3.119}$$
Note that this is equivalent to saying A is 1-1 and onto, in view of the Open Mapping Theorem, Theorem 1.4.
This set is open; to see this, first consider a neighbourhood of the identity.
Lemma 3.13. If $A \in \mathcal{B}(H)$ and $\|A\| < 1$ then
$$\mathrm{Id} - A \in GL(H). \tag{3.120}$$
Proof. This follows from the convergence of the Neumann series. If $\|A\| < 1$ then $\|A^j\| \le \|A\|^j$, from (3.118), and it follows that
$$B = \sum_{j=0}^\infty A^j \tag{3.121}$$
(where $A^0 = \mathrm{Id}$ by definition) is absolutely summable in B(H) since $\sum_{j=0}^\infty \|A^j\|$ converges. Since B(H) is a Banach space, the sum converges. Moreover by the continuity of the product with respect to the norm
$$AB = A\lim_{n\to\infty} \sum_{j=0}^n A^j = \lim_{n\to\infty} \sum_{j=1}^{n+1} A^j = B - \mathrm{Id} \tag{3.122}$$
and similarly $BA = B - \mathrm{Id}$. Thus $(\mathrm{Id} - A)B = B(\mathrm{Id} - A) = \mathrm{Id}$ shows that B is a (and hence the) two-sided inverse of $\mathrm{Id} - A$. □
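The Neumann series can be summed numerically. This is a minimal sketch, assuming a random real $5 \times 5$ matrix rescaled so that $\|A\| = 0.9$; after 400 terms the neglected tail has norm at most $0.9^{400}/0.1$, far below rounding error, and the partial sum B is checked to be a two-sided inverse of $\mathrm{Id} - A$ as in (3.122).

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5
A = rng.standard_normal((n, n))
A *= 0.9 / np.linalg.norm(A, 2)        # rescale so that ||A|| = 0.9 < 1

I = np.eye(n)
B = np.zeros((n, n))
term = I.copy()                        # A^0 = Id
for j in range(400):
    B += term                          # partial sum of sum_j A^j
    term = term @ A

print(np.allclose((I - A) @ B, I), np.allclose(B @ (I - A), I))   # two-sided inverse
print(np.allclose(B, np.linalg.inv(I - A)))                       # agrees with a direct inverse
```

The bound $\|A^j\| \le \|A\|^j$ from (3.118) is what controls the number of terms needed: the error after m terms is at most $\|A\|^m/(1 - \|A\|)$.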
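Proposition 3.15. The invertible elements form an open subset $GL(H) \subset \mathcal{B}(H)$.
Proof. Suppose $G \in GL(H)$, meaning it has a two-sided (and unique) inverse $G^{-1} \in \mathcal{B}(H)$:
$$G^{-1}G = GG^{-1} = \mathrm{Id}. \tag{3.123}$$
Then we wish to show that $B(G; \epsilon) \subset GL(H)$ for some $\epsilon > 0$. In fact we shall see that we can take $\epsilon = \|G^{-1}\|^{-1}$. To show that $G + B$ is invertible, for $\|B\| < \epsilon$, set
$$E = -G^{-1}B \Longrightarrow G + B = G(\mathrm{Id} + G^{-1}B) = G(\mathrm{Id} - E). \tag{3.124}$$
From Lemma 3.13 we know that
$$\|B\| < 1/\|G^{-1}\| \Longrightarrow \|G^{-1}B\| < 1 \Longrightarrow \mathrm{Id} - E \ \text{is invertible}. \tag{3.125}$$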
Then $(\mathrm{Id} - E)^{-1}G^{-1}$ satisfies
$$(\mathrm{Id} - E)^{-1}G^{-1}(G + B) = (\mathrm{Id} - E)^{-1}(\mathrm{Id} - E) = \mathrm{Id}. \tag{3.126}$$
Moreover $E' = -BG^{-1}$ also satisfies $\|E'\| \le \|B\|\|G^{-1}\| < 1$ and
$$(G + B)G^{-1}(\mathrm{Id} - E')^{-1} = (\mathrm{Id} - E')(\mathrm{Id} - E')^{-1} = \mathrm{Id}. \tag{3.127}$$
Thus $G + B$ has both a ‘left’ and a ‘right’ inverse. The associativity of the operator product (that $A(BC) = (AB)C$) then shows that
$$G^{-1}(\mathrm{Id} - E')^{-1} = (\mathrm{Id} - E)^{-1}G^{-1}(G + B)G^{-1}(\mathrm{Id} - E')^{-1} = (\mathrm{Id} - E)^{-1}G^{-1} \tag{3.128}$$
so the left and right inverses are equal and hence $G + B$ is invertible. □
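The radius $\epsilon = \|G^{-1}\|^{-1}$ in this proof can be tested directly. This is a minimal sketch, assuming a random $4 \times 4$ matrix G (shifted by $3\,\mathrm{Id}$, which makes it invertible for this seed) and a perturbation B rescaled to norm $\epsilon/2$; it forms $E = -G^{-1}B$ as in (3.124) and checks that $(\mathrm{Id} - E)^{-1}G^{-1}$ inverts $G + B$ on both sides. For brevity $(\mathrm{Id} - E)^{-1}$ is computed with `np.linalg.inv` rather than by the Neumann series of Lemma 3.13.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 4
I = np.eye(n)
G = rng.standard_normal((n, n)) + 3 * I      # an invertible G for this example
G_inv = np.linalg.inv(G)
radius = 1.0 / np.linalg.norm(G_inv, 2)      # epsilon = ||G^{-1}||^{-1}

B = rng.standard_normal((n, n))
B *= 0.5 * radius / np.linalg.norm(B, 2)     # ||B|| = radius / 2 < radius

E = -G_inv @ B                               # (3.124): G + B = G (Id - E)
left_inv = np.linalg.inv(I - E) @ G_inv      # (Id - E)^{-1} G^{-1}

print(np.linalg.norm(E, 2) < 1)                                      # hypothesis of Lemma 3.13
print(np.allclose(left_inv @ (G + B), I), np.allclose((G + B) @ left_inv, I))
```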

Thus $GL(H) \subset \mathcal{B}(H)$, the set of invertible elements, is open. It is also a group, since if $G_1, G_2 \in GL(H)$ then the inverse of $G_1G_2$ is $G_2^{-1}G_1^{-1}$.
This group of invertible elements has a smaller subgroup, U(H), the unitary group, defined by
$$U(H) = \{U \in GL(H);\ U^{-1} = U^*\}. \tag{3.129}$$
The unitary group consists of the linear isometric isomorphisms of H onto itself; thus
$$\langle Uu, Uv\rangle = \langle u, v\rangle, \quad \|Uu\| = \|u\| \quad \forall\, u, v \in H,\ U \in U(H). \tag{3.130}$$
This is an important object and we will use it a little bit later on.
The groups $GL(H)$ and $\mathcal{U}(H)$ for a separable Hilbert space may seem very similar to the familiar groups of invertible and unitary $n \times n$ matrices, $GL(n)$ and $U(n)$, but this is somewhat deceptive. For one thing they are much bigger. In fact there are other important qualitative differences. One important fact that you should know, and there is a proof towards the end of this chapter, is that both $GL(H)$ and $\mathcal{U}(H)$ are contractible as metric spaces: they have no significant topology. This is to be contrasted with $GL(n)$ and $U(n)$, which have a lot of topology and are not at all simple spaces, especially for large $n$. One upshot of this is that $\mathcal{U}(H)$ does not look much like the limit of the $U(n)$ as $n \to \infty$. In fact there is another group which is essentially the large-$n$ limit of the $U(n)$, namely
$$\mathcal{U}_{-\infty}(H) = \{\mathrm{Id} + K \in \mathcal{U}(H);\ K \in \mathcal{K}(H)\}. \tag{3.131}$$
It does have lots of interesting (and useful) topology.
Another important fact that we will discuss below is that $GL(H)$ is not dense in $\mathcal{B}(H)$, in contrast to the finite dimensional case. In other words there are operators which are not invertible and cannot be made invertible by small perturbations.

17. Spectrum of an operator


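Another direct application of Lemma 3.13, the convergence of the Neumann series, is that if $A \in \mathcal{B}(H)$ and $\lambda \in \mathbb{C}$ has $|\lambda| > \|A\|$ then $\|\lambda^{-1}A\| < 1$, so $(\mathrm{Id} - \lambda^{-1}A)^{-1}$ exists and satisfies
$$(\lambda\,\mathrm{Id} - A)\lambda^{-1}(\mathrm{Id} - \lambda^{-1}A)^{-1} = \mathrm{Id} = \lambda^{-1}(\mathrm{Id} - \lambda^{-1}A)^{-1}(\lambda - A). \tag{3.132}$$
Thus $\lambda\,\mathrm{Id} - A \in GL(H)$, which we usually abbreviate to $\lambda - A$, has inverse $(\lambda - A)^{-1} = \lambda^{-1}(\mathrm{Id} - \lambda^{-1}A)^{-1}$. The set of $\lambda$ for which this operator is invertible is called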

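the resolvent set and we have shown
$$\mathrm{Res}(A) = \{\lambda \in \mathbb{C};\ (\lambda\,\mathrm{Id} - A) \in GL(H)\} \subset \mathbb{C},\qquad \{|\lambda| > \|A\|\} \subset \mathrm{Res}(A). \tag{3.133}$$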
From the discussion above, it is an open, and non-empty, set on which $(A - \lambda)^{-1}$, called the resolvent of $A$, is defined. The complement of the resolvent set is called the spectrum of $A$:
$$\mathrm{Spec}(A) = \{\lambda \in \mathbb{C};\ \lambda\,\mathrm{Id} - A \notin GL(H)\} \subset \{\lambda \in \mathbb{C};\ |\lambda| \le \|A\|\}. \tag{3.134}$$
As follows from the discussion above, it is a compact set (closed, as the complement of an open set, and bounded); in fact it cannot be empty. This is quite easy to see if you know a little complex analysis, since it follows from Liouville’s Theorem. One way to show that $\lambda \in \mathrm{Spec}(A)$ is to check that $\lambda - A$ is not injective, since then it cannot be invertible. This means precisely that $\lambda$ is an eigenvalue of $A$:
$$\exists\, 0 \ne u \in H \text{ s.t. } Au = \lambda u. \tag{3.135}$$
However, you should strongly resist the temptation to think that the spectrum is the set of eigenvalues of $A$; this is sometimes, but by no means always, true. The other way to show that $\lambda \in \mathrm{Spec}(A)$ is to prove that $\lambda - A$ is not surjective. Note that by the Open Mapping Theorem, if $\lambda - A$ is both surjective and injective then $\lambda \in \mathrm{Res}(A)$.
For a finite rank operator the spectrum does consist of the set of eigenvalues. For a bounded self-adjoint operator we can say quite a bit more.
Proposition 3.16. If $A : H \longrightarrow H$ is a bounded operator on a Hilbert space and $A^* = A$ then $A - \lambda\,\mathrm{Id}$ is invertible for all $\lambda \in \mathbb{C} \setminus [-\|A\|, \|A\|]$ and, conversely, at least one of $A - \|A\|\,\mathrm{Id}$ and $A + \|A\|\,\mathrm{Id}$ is not invertible.
The proof of this depends on a different characterization of the norm in the self-adjoint case.
Lemma 3.14. If $A^* = A \in \mathcal{B}(H)$ then
$$\|A\| = \sup_{\|u\|=1} |\langle Au, u\rangle|. \tag{3.136}$$

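Proof. Certainly, $|\langle Au, u\rangle| \le \|A\|\|u\|^2$, so the right side can only be smaller than or equal to the left. Set
$$a = \sup_{\|u\|=1} |\langle Au, u\rangle| \le \|A\|.$$
Then for any $u, v \in H$ with $\|u\| = \|v\| = 1$, $|\langle Au, v\rangle| = \langle Ae^{i\theta}u, v\rangle$ for some $\theta \in [0, 2\pi)$, so replacing $u$ by $u' = e^{i\theta}u$ we can arrange that $\langle Au', v\rangle = |\langle Au, v\rangle|$ is non-negative, with $\|u'\| = 1 = \|u\| = \|v\|$. Dropping the primes and computing using the polarization identity
$$4\langle Au, v\rangle = \langle A(u+v), u+v\rangle - \langle A(u-v), u-v\rangle + i\langle A(u+iv), u+iv\rangle - i\langle A(u-iv), u-iv\rangle. \tag{3.137}$$
Since $A$ is self-adjoint each $\langle Aw, w\rangle$ is real, so by the reality of the left side we can drop the last two terms and use the bound $|\langle Aw, w\rangle| \le a\|w\|^2$ on the first two to see that
$$4\langle Au, v\rangle \le a(\|u+v\|^2 + \|u-v\|^2) = 2a(\|u\|^2 + \|v\|^2) = 4a. \tag{3.138}$$
Thus, $\|A\| = \sup_{\|u\|=\|v\|=1} |\langle Au, v\rangle| \le a$ and hence $\|A\| = a$. □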
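Lemma 3.14 genuinely needs self-adjointness. The Python sketch below approximates $\sup_{\|u\|=1}|\langle Au, u\rangle|$ by sampling real unit vectors (sufficient for these particular real examples; for the matrix $N$ below, complex unit vectors give the same supremum $1/2$) and compares it with $\|A\|$. Both matrices are hypothetical examples: the quantities agree for the symmetric $S$ but not for the nilpotent $N$, where the supremum is $1/2$ while $\|N\| = 1$.

```python
import math


def quad_form(A, u):
    """<Au, u> for a real 2x2 matrix A and a real vector u."""
    return sum(A[i][j] * u[j] * u[i] for i in range(2) for j in range(2))


def numerical_radius(A, samples=20000):
    """Approximate sup_{||u||=1} |<Au, u>| by sampling real unit vectors.

    u and -u give the same value, so angles in [0, pi) suffice."""
    return max(abs(quad_form(A, (math.cos(math.pi * k / samples),
                                 math.sin(math.pi * k / samples))))
               for k in range(samples))


def operator_norm(A):
    """||A|| = square root of the largest eigenvalue of the symmetric matrix A^T A."""
    p = A[0][0] ** 2 + A[1][0] ** 2
    q = A[0][0] * A[0][1] + A[1][0] * A[1][1]
    r = A[0][1] ** 2 + A[1][1] ** 2
    return math.sqrt((p + r) / 2 + math.hypot((p - r) / 2, q))


S = [[2.0, 1.0], [1.0, -3.0]]   # self-adjoint: the two quantities agree
N = [[0.0, 1.0], [0.0, 0.0]]    # not self-adjoint: 1/2 versus 1
print(numerical_radius(S), operator_norm(S))
print(numerical_radius(N), operator_norm(N))
```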

This suggests an improvement on the last part of the statement of Proposition 3.16, namely, if $A^* = A \in \mathcal{B}(H)$ then
$$a_-, a_+ \in \mathrm{Spec}(A) \text{ and } \mathrm{Spec}(A) \subset [a_-, a_+],\quad \text{where } a_- = \inf_{\|u\|=1}\langle Au, u\rangle,\ a_+ = \sup_{\|u\|=1}\langle Au, u\rangle. \tag{3.139}$$
Observe that Lemma 3.14 shows that $\|A\| = \max(a_+, -a_-)$.


Proof of Proposition 3.16. First we show that if $A^* = A$ then $\mathrm{Spec}(A) \subset \mathbb{R}$. Thus we need to show that if $\lambda = s + it$ where $t \ne 0$ then $A - \lambda$ is invertible. Now $A - \lambda = (A - s) - it$ and $A - s$ is bounded and self-adjoint, so it is enough to consider the special case that $\lambda = it$. Then for any $u \in H$,
$$\mathrm{Im}\langle (A - it)u, u\rangle = -t\|u\|^2. \tag{3.140}$$
So, certainly $A - it$ is injective, since $(A - it)u = 0$ implies $u = 0$ if $t \ne 0$. The adjoint of $A - it$ is $A + it$, so the adjoint is injective too. It follows from (3.71) that the range of $A - it$ is dense in $H$. By this density of the range, if $w \in H$ there exists a sequence $u_n \in H$ with $w_n = (A - it)u_n \to w$. So again we find that
$$\begin{aligned} |\mathrm{Im}\langle (A - it)(u_n - u_m), u_n - u_m\rangle| &= |t|\|u_n - u_m\|^2 \\ &= |\mathrm{Im}\langle w_n - w_m, u_n - u_m\rangle| \le \|w_n - w_m\|\|u_n - u_m\| \\ \Longrightarrow \|u_n - u_m\| &\le \frac{1}{|t|}\|w_n - w_m\|. \end{aligned} \tag{3.141}$$
Since $w_n \to w$ it is a Cauchy sequence, hence $u_n$ is Cauchy, so by completeness $u_n \to u$ and hence $(A - it)u = w$. Thus $A - it$ is 1-1 and onto and $\|(A - it)^{-1}\| \le 1/|t|$. So we have shown that $\mathrm{Spec}(A) \subset \mathbb{R}$.
We already know that $\mathrm{Spec}(A) \subset \{z \in \mathbb{C}; |z| \le \|A\|\}$, so finally we need to show that one of $A \pm \|A\|\,\mathrm{Id}$ is NOT invertible. This follows from (3.136). Indeed, by the definition of sup there is a sequence $u_n \in H$ with $\|u_n\| = 1$ such that either $\langle Au_n, u_n\rangle \to \|A\|$ or $\langle Au_n, u_n\rangle \to -\|A\|$. Assume we are in the first case, so $\langle Au_n, u_n\rangle \to \|A\|$. Then
$$\begin{aligned} \|(A - \|A\|)u_n\|^2 &= \|Au_n\|^2 - 2\|A\|\langle Au_n, u_n\rangle + \|A\|^2\|u_n\|^2 \\ &\le 2\|A\|^2 - 2\|A\|\langle Au_n, u_n\rangle \to 0. \end{aligned} \tag{3.142}$$
Since the left side is non-negative it follows that $\|(A - \|A\|)u_n\| \to 0$. This means that $A - \|A\|\,\mathrm{Id}$ is not invertible, since if it had a bounded inverse $B$ then $1 = \|u_n\| \le \|B\|\|(A - \|A\|)u_n\|$, which is impossible. In the other case it follows similarly that $A + \|A\|$ is not invertible, or one can replace $A$ by $-A$ and use the same argument. So one of $A \pm \|A\|$ is not invertible. □
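The two estimates driving the first half of this proof, the identity (3.140) and the resulting lower bound $\|(A - it)u\| \ge |t|\,\|u\|$, are easy to test numerically. The Python sketch below uses a hypothetical Hermitian $2 \times 2$ matrix and random complex vectors; it is only a sanity check of the inequalities, not a proof.

```python
import random


def apply(M, u):
    return [sum(M[i][k] * u[k] for k in range(len(u))) for i in range(len(M))]


def inner(u, v):
    """<u, v>: linear in u, antilinear in v."""
    return sum(a * b.conjugate() for a, b in zip(u, v))


A = [[2.0 + 0j, 1 - 1j], [1 + 1j, -1.0 + 0j]]   # hypothetical Hermitian matrix, A^* = A
t = 0.3

random.seed(0)
worst_identity_error = 0.0     # max |Im<(A - it)u, u> + t ||u||^2|, cf. (3.140)
min_ratio = float("inf")       # min ||(A - it)u|| / ||u||, should be >= |t|
for _ in range(1000):
    u = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(2)]
    w = [x - 1j * t * y for x, y in zip(apply(A, u), u)]    # w = (A - it)u
    norm_u_sq = inner(u, u).real
    worst_identity_error = max(worst_identity_error,
                               abs(inner(w, u).imag + t * norm_u_sq))
    min_ratio = min(min_ratio, (inner(w, w).real / norm_u_sq) ** 0.5)

print(worst_identity_error, min_ratio)
```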
Only slight modifications of this proof are needed to give (3.139), which we restate in a slightly different form.
Lemma 3.15. If $A = A^* \in \mathcal{B}(H)$ then
$$\mathrm{Spec}(A) \subset [\alpha_-, \alpha_+] \Longleftrightarrow \alpha_- \le \langle Au, u\rangle \le \alpha_+ \quad \forall\, u \in H,\ \|u\| = 1. \tag{3.143}$$
Proof. Take $a_\pm$ to be defined as in (3.139), then set $c = (a_+ + a_-)/2$ and $b = (a_+ - a_-)/2$ and consider $B = A - c\,\mathrm{Id}$, which is self-adjoint and clearly satisfies
$$\sup_{\|u\|=1} |\langle Bu, u\rangle| = b. \tag{3.144}$$

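Thus $\|B\| = b$ and $\mathrm{Spec}(B) \subset [-b, b]$, and the argument in the proof above shows that both end-points are in the spectrum. Since $\mathrm{Spec}(A) = c + \mathrm{Spec}(B)$, it follows that
$$\{a_-\} \cup \{a_+\} \subset \mathrm{Spec}(A) \subset [a_-, a_+], \tag{3.145}$$
from which the statement follows. □
In particular if $A = A^*$ then
$$\mathrm{Spec}(A) \subset [0, \infty) \Longleftrightarrow \langle Au, u\rangle \ge 0 \quad \forall\, u \in H. \tag{3.146}$$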
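The shift by the midpoint $c = (a_+ + a_-)/2$ can be checked on a hypothetical real symmetric $2 \times 2$ matrix. The Python sketch below estimates $a_\pm$ by sampling real unit vectors (adequate for a real symmetric matrix), compares them with the exact eigenvalues, and confirms that $B = A - c\,\mathrm{Id}$ has $\inf\langle Bu, u\rangle = -b$ and $\sup\langle Bu, u\rangle = b$ with $b = (a_+ - a_-)/2$.

```python
import math


def quad_form(A, u):
    """<Au, u> for a real 2x2 matrix and a real vector."""
    return sum(A[i][j] * u[j] * u[i] for i in range(2) for j in range(2))


def extremes(A, samples=20000):
    """Approximate (inf, sup) of <Au, u> over real unit vectors u = (cos s, sin s)."""
    values = [quad_form(A, (math.cos(math.pi * k / samples), math.sin(math.pi * k / samples)))
              for k in range(samples)]
    return min(values), max(values)


A = [[2.0, 1.0], [1.0, -3.0]]     # hypothetical self-adjoint example
a_minus, a_plus = extremes(A)
c = (a_plus + a_minus) / 2        # midpoint of [a_-, a_+]
b = (a_plus - a_minus) / 2        # half-width

B = [[A[0][0] - c, A[0][1]], [A[1][0], A[1][1] - c]]   # B = A - c Id
b_minus, b_plus = extremes(B)                          # should be (-b, b)

# Exact eigenvalues of the symmetric 2x2 matrix A.
mean = (A[0][0] + A[1][1]) / 2
radius = math.hypot((A[0][0] - A[1][1]) / 2, A[0][1])
print((a_minus, a_plus), (mean - radius, mean + radius), (b_minus, b_plus), b)
```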

18. Spectral theorem for compact self-adjoint operators


One of the important differences between a general bounded self-adjoint operator and a compact self-adjoint operator is that the latter has eigenvalues and eigenvectors, lots of them.
Theorem 3.4. If $A \in \mathcal{K}(H)$ is a self-adjoint, compact operator on a separable Hilbert space, so $A^* = A$, then $H$ has an orthonormal basis $u_j$ consisting of eigenvectors of $A$,
$$Au_j = \lambda_j u_j,\quad \lambda_j \in \mathbb{R}, \tag{3.147}$$
combining an orthonormal basis for the possibly infinite-dimensional (closed) null space (where $\lambda_j = 0$) and eigenvectors with non-zero eigenvalues $\lambda_j \in \mathbb{R} \setminus \{0\}$, which can be arranged into a sequence such that $|\lambda_j|$ is non-increasing and $\lambda_j \to 0$ as $j \to \infty$ (in case $\mathrm{Nul}(A)^\perp$ is finite dimensional, this sequence is finite).
The operator $A$ maps $\mathrm{Nul}(A)^\perp$ into itself, so it may be clearer to first split off the null space and then look at the operator acting on $\mathrm{Nul}(A)^\perp$, which has an orthonormal basis of eigenvectors with non-vanishing eigenvalues.
Before going to the proof, let’s notice some useful conclusions. One is that we have ‘Fredholm’s alternative’ in this case.
Corollary 3.3. If $A \in \mathcal{K}(H)$ is a compact self-adjoint operator on a separable Hilbert space then the equation
$$u - Au = f \tag{3.148}$$
either has a unique solution for each $f \in H$ or else there is a non-trivial finite dimensional space of solutions to
$$u - Au = 0 \tag{3.149}$$
and then (3.148) has a solution if and only if $f$ is orthogonal to all these solutions.
Proof. This is just saying that the null space of $\mathrm{Id} - A$ is a complement to the range, which is closed. So, either $\mathrm{Id} - A$ is invertible or, if not, the range is precisely the orthocomplement of $\mathrm{Nul}(\mathrm{Id} - A)$. You might say there is not much alternative from this point of view, since it just says the range is always the orthocomplement of the null space. □
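In finite dimensions every matrix is compact, so the alternative can be seen directly. The Python sketch below builds a hypothetical symmetric $2 \times 2$ matrix $A$ with eigenvalues $1$ and $\tfrac12$ from a chosen orthonormal eigenbasis and solves $u - Au = f$ eigenvector by eigenvector. The helper `solve_fredholm` is illustrative, not from the notes; it returns `None` exactly when $f$ fails to be orthogonal to $\mathrm{Nul}(\mathrm{Id} - A)$.

```python
import math


def solve_fredholm(eigenpairs, f, tol=1e-12):
    """Solve u - Au = f for A = sum_j lam_j <., e_j> e_j, the e_j a real orthonormal basis.

    Returns None when f has a component along Nul(Id - A); otherwise returns the
    solution orthogonal to Nul(Id - A) (any element of Nul(Id - A) may be added)."""
    u = [0.0] * len(f)
    for lam, e in eigenpairs:
        coeff = sum(fi * ei for fi, ei in zip(f, e))       # <f, e_j>
        if abs(1.0 - lam) < tol:                            # e_j lies in Nul(Id - A)
            if abs(coeff) > tol:
                return None
            continue
        u = [ui + coeff / (1.0 - lam) * ei for ui, ei in zip(u, e)]
    return u


alpha = 0.4                                   # hypothetical rotation angle
e1 = (math.cos(alpha), math.sin(alpha))
e2 = (-math.sin(alpha), math.cos(alpha))
pairs = [(1.0, e1), (0.5, e2)]                # eigenvalues 1 and 1/2
A = [[sum(lam * e[i] * e[j] for lam, e in pairs) for j in range(2)] for i in range(2)]

f_good = (3 * e2[0], 3 * e2[1])               # orthogonal to Nul(Id - A) = span(e1)
f_bad = (e1[0] + e2[0], e1[1] + e2[1])        # has a component along e1
u = solve_fredholm(pairs, f_good)
residual = [u[i] - sum(A[i][j] * u[j] for j in range(2)) - f_good[i] for i in range(2)]
print(u, residual, solve_fredholm(pairs, f_bad))
```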
Let me separate off the heart of the argument from the bookkeeping.
Lemma 3.16. If $A \in \mathcal{K}(H)$ is a self-adjoint compact operator on a separable (possibly finite-dimensional) Hilbert space then
$$F(u) = \langle Au, u\rangle,\quad F : \{u \in H; \|u\| = 1\} \longrightarrow \mathbb{R} \tag{3.150}$$

is a continuous function on the unit sphere which attains its supremum if this is positive, and its infimum if this is negative (in finite dimensions both are always attained), where
$$\sup_{\|u\|=1} |F(u)| = \|A\|. \tag{3.151}$$
Furthermore, if the maximum or minimum of $F(u)$ is non-zero it is attained at an eigenvector of $A$ with this extremal value as eigenvalue.
Proof. Since $|F(u)|$ is the function considered in (3.136), (3.151) is a direct consequence of Lemma 3.14. Moreover, continuity of $F$ follows from continuity of $A$ and of the inner product, so
$$|F(u) - F(u')| \le |\langle Au, u\rangle - \langle Au, u'\rangle| + |\langle Au, u'\rangle - \langle Au', u'\rangle| \le 2\|A\|\|u - u'\| \tag{3.152}$$
since both $u$ and $u'$ have norm one.
If we were in finite dimensions this would almost finish the proof, since the sphere is then compact and a continuous function on a compact set attains its supremum and infimum. In the general case we need to use the compactness of $A$. Certainly $F$ is bounded,
$$|F(u)| \le \sup_{\|u\|=1} |\langle Au, u\rangle| \le \|A\|. \tag{3.153}$$
Thus, there is a sequence $u_n^+$ such that $F(u_n^+) \to \sup F$ and another $u_n^-$ such that $F(u_n^-) \to \inf F$. The properties of weak convergence mean that we can pass to a weakly convergent subsequence in each case, and so assume that $u_n^\pm \rightharpoonup u^\pm$ converges weakly; then $\|u^\pm\| \le 1$ by the properties of weak convergence. The compactness of $A$ means that $Au_n^\pm \to Au^\pm$ converges strongly, i.e. in norm. But then we can write
$$\begin{aligned} |F(u_n^\pm) - F(u^\pm)| &\le |\langle A(u_n^\pm - u^\pm), u_n^\pm\rangle| + |\langle Au^\pm, u_n^\pm - u^\pm\rangle| \\ &= |\langle A(u_n^\pm - u^\pm), u_n^\pm\rangle| + |\langle u^\pm, A(u_n^\pm - u^\pm)\rangle| \le 2\|Au_n^\pm - Au^\pm\| \end{aligned} \tag{3.154}$$
to deduce that $F(u^\pm) = \langle Au^\pm, u^\pm\rangle = \lim F(u_n^\pm)$ are respectively the supremum and infimum of $F$. If $\sup F > 0$ then $u^+ \ne 0$, and since $F(u^+/\|u^+\|) = \sup F/\|u^+\|^2$ cannot exceed $\sup F$ we must have $\|u^+\| = 1$; similarly $\|u^-\| = 1$ if $\inf F < 0$. Thus indeed, as in the finite dimensional case, the (non-zero) supremum and infimum are attained, and hence are the max and min. Note that this is not typically true if $A$ is not compact as well as self-adjoint.
Now, suppose that $\Lambda_+ = \sup F > 0$. Then for any $v \in H$ with $v \perp u^+$ and $\|v\| = 1$, the curve
$$L_v : (-\pi, \pi) \ni \theta \longmapsto \cos\theta\, u^+ + \sin\theta\, v \tag{3.155}$$
lies in the unit sphere. Expanding out
$$F(L_v(\theta)) = \langle AL_v(\theta), L_v(\theta)\rangle = \cos^2\theta\, F(u^+) + \sin(2\theta)\,\mathrm{Re}\langle Au^+, v\rangle + \sin^2\theta\, F(v), \tag{3.156}$$
we know that this function must take its maximum at $\theta = 0$. The derivative there (it is certainly continuously differentiable on $(-\pi, \pi)$) is $2\,\mathrm{Re}\langle Au^+, v\rangle$, which must therefore vanish. The same is true for $iv$ in place of $v$, so in fact
$$\langle Au^+, v\rangle = 0 \quad \forall\, v \perp u^+,\ \|v\| = 1. \tag{3.157}$$
Taking the span of these $v$’s it follows that $\langle Au^+, v\rangle = 0$ for all $v \perp u^+$, so $Au^+$ must be a multiple of $u^+$ itself. Inserting this into the definition of $F$ it follows that $Au^+ = \Lambda_+u^+$ is an eigenvector with eigenvalue $\Lambda_+ = \sup F$.

The same argument applies to $\inf F$ if it is negative, for instance by replacing $A$ by $-A$. This completes the proof of the Lemma. □
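In finite dimensions the conclusion of Lemma 3.16 can be observed directly: maximize (or minimize) $F(u) = \langle Au, u\rangle$ over the unit circle by brute-force sampling and check that the extremizer is approximately an eigenvector. The Python sketch below uses a hypothetical real symmetric matrix and real unit vectors; the sampling resolution limits the accuracy of the eigenvector to roughly the grid spacing.

```python
import math

A = [[2.0, 1.0], [1.0, -3.0]]       # hypothetical real symmetric example


def apply(u):
    return (A[0][0] * u[0] + A[0][1] * u[1], A[1][0] * u[0] + A[1][1] * u[1])


def F(u):
    """Quadratic form F(u) = <Au, u> on real unit vectors."""
    Au = apply(u)
    return Au[0] * u[0] + Au[1] * u[1]


samples = 20000
sphere = [(math.cos(math.pi * k / samples), math.sin(math.pi * k / samples))
          for k in range(samples)]

results = {}
for name, pick in (("sup", max), ("inf", min)):
    u = pick(sphere, key=F)                 # sampled extremizer of F
    lam = F(u)
    Au = apply(u)
    defect = math.hypot(Au[0] - lam * u[0], Au[1] - lam * u[1])   # ||Au - F(u) u||
    results[name] = (lam, defect)
print(results)
```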

Proof of Theorem 3.4. First consider the Hilbert space $H_0 = \mathrm{Nul}(A)^\perp \subset H$. Then, as noted above, $A$ maps $H_0$ into itself, since
$$\langle Au, v\rangle = \langle u, Av\rangle = 0 \quad \forall\, u \in H_0,\ v \in \mathrm{Nul}(A) \Longrightarrow Au \in H_0. \tag{3.158}$$
Moreover, $A_0$, which is $A$ restricted to $H_0$, is again a compact self-adjoint operator, where the compactness follows from the fact that $A(B(0,1))$ for $B(0,1) \subset H_0$ is smaller than (actually of course equal to) the whole image of the unit ball.
Thus we can apply the Lemma above to $A_0$, with quadratic form $F_0$, and find an eigenvector. Let’s agree to take the one associated to $\sup F_0$ unless $\sup F_0 < -\inf F_0$, in which case we take one associated to the inf. Now, what can go wrong here? Nothing except if $F_0 \equiv 0$. However in that case we know from Lemma 3.14 that $\|A_0\| = 0$, so $A_0 = 0$ and hence $A = 0$.
So, we now know that we can find an eigenvector with non-zero eigenvalue unless $A \equiv 0$, which would imply $\mathrm{Nul}(A) = H$. Now we proceed by induction. Suppose we have found $N$ mutually orthogonal eigenvectors $e_j$ for $A$, all with norm $1$ and eigenvalues $\lambda_j$: an orthonormal set of eigenvectors, all in $H_0$. Then we consider
$$H_N = \{u \in H_0 = \mathrm{Nul}(A)^\perp;\ \langle u, e_j\rangle = 0,\ j = 1, \dots, N\}. \tag{3.159}$$
From the argument above, $A$ maps $H_N$ into itself, since
$$\langle Au, e_j\rangle = \langle u, Ae_j\rangle = \lambda_j\langle u, e_j\rangle = 0 \text{ if } u \in H_N \Longrightarrow Au \in H_N. \tag{3.160}$$
Moreover this restricted operator is self-adjoint and compact on $H_N$ as before, so we can again find an eigenvector, with eigenvalue either the max or min of the new $F$ for $H_N$. This process will not stop unless $F \equiv 0$ at some stage, but then $A \equiv 0$ on $H_N$, and since $H_N \perp \mathrm{Nul}(A)$ this implies $H_N = \{0\}$, so $H_0$ must have been finite dimensional.
Thus, either $H_0$ is finite dimensional or we can grind out an infinite orthonormal sequence $e_i$ of eigenvectors of $A$ in $H_0$ with the corresponding sequence of eigenvalues such that $|\lambda_i|$ is non-increasing, since the successive $F_N$’s are restrictions of the previous ones, so the max and min are getting closer to (or at least no further from) $0$.
So we need to rule out the possibility that there is an infinite orthonormal sequence of eigenfunctions $e_j$ with corresponding eigenvalues $\lambda_j$ where $\inf_j |\lambda_j| = a > 0$. Such a sequence cannot exist, since $e_j \rightharpoonup 0$, so by the compactness of $A$, $Ae_j \to 0$ (in norm), but $\|Ae_j\| \ge a$, which is a contradiction. Thus if $\mathrm{Nul}(A)^\perp$ is not finite dimensional then the sequence of eigenvalues constructed above must converge to $0$.
Finally then, we need to check that this orthonormal sequence of eigenvectors constitutes an orthonormal basis of $H_0$. If not, then we can form the closure $H'$ of the span of the $e_i$ we have constructed, and its orthocomplement in $H_0$, which would have to be non-trivial. However, as before, $F$ restricts to this space to be $F'$ for the restriction $A'$ of $A_0$ to it, which is again a compact self-adjoint operator. So, if $F'$ is not identically zero we can again construct an eigenfunction, with non-zero eigenvalue, which contradicts the fact that we are always choosing a largest eigenvalue, in absolute value at least. Thus in fact $F' \equiv 0$, so $A' \equiv 0$; then this orthocomplement lies in both $\mathrm{Nul}(A)$ and $\mathrm{Nul}(A)^\perp$, so it is trivial and the

eigenvectors form an orthonormal basis of $\mathrm{Nul}(A)^\perp$. This completes the proof of the theorem. □
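The inductive construction in this proof (take an eigenvector of largest $|\lambda|$, restrict to the orthocomplement, repeat) has a familiar finite-dimensional counterpart: power iteration with deflation. The Python sketch below uses that numerical technique in place of the variational argument of Lemma 3.16; it assumes a real symmetric matrix whose eigenvalues have distinct absolute values and a starting vector with a component along each eigenvector (the matrix and start vector are hypothetical).

```python
def apply(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project_out(v, basis):
    """Remove from v its components along the orthonormal vectors in basis."""
    for e in basis:
        c = dot(v, e)
        v = [vi - c * ei for vi, ei in zip(v, e)]
    return v


def normalize(v):
    norm = dot(v, v) ** 0.5
    return [x / norm for x in v]


def deflated_eigenpairs(A, start, iterations=500):
    """Eigenpairs of a real symmetric A, largest |eigenvalue| first: power
    iteration restricted to the orthocomplement of the eigenvectors found so far."""
    found = []
    for _ in range(len(A)):
        basis = [e for _, e in found]
        v = normalize(project_out(start, basis))
        for _ in range(iterations):
            # Re-projecting at every step stops rounding errors from reintroducing
            # components along eigenvectors that were already found.
            v = normalize(project_out(apply(A, v), basis))
        found.append((dot(apply(A, v), v), v))   # Rayleigh quotient <Av, v>
    return found


A = [[2.0, 1.0, 0.0], [1.0, 2.0, 0.0], [0.0, 0.0, -4.0]]   # eigenvalues -4, 3, 1
pairs = deflated_eigenpairs(A, [1.0, 0.5, 0.25])
for lam, e in pairs:
    print(round(lam, 10), [round(x, 6) for x in e])
```

The eigenvalues come out ordered by non-increasing $|\lambda|$, mirroring the ordering in Theorem 3.4.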

19. Functional Calculus


As we have seen, the non-zero eigenvalues of a compact self-adjoint operator $A$ form the image of a sequence in $[-\|A\|, \|A\|]$ either converging to zero or finite. If $e_i$ is an orthonormal sequence of eigenfunctions which spans $\mathrm{Nul}(A)^\perp$, with associated eigenvalues $\lambda_i$, then
$$A = \sum_i \lambda_i P_i,\quad P_i u = \langle u, e_i\rangle e_i \tag{3.161}$$
being the projection onto the span $\mathbb{C}e_i$. Since $P_iP_j = 0$ if $i \ne j$ and $P_i^2 = P_i$, it follows inductively that the positive powers of $A$ are given by similar sums converging in $\mathcal{B}(H)$:
$$A^k = \sum_i \lambda_i^k P_i,\quad P_i u = \langle u, e_i\rangle e_i,\ k \in \mathbb{N}. \tag{3.162}$$
There is a similar formula for the identity of course, except we need to remember that the null space of $A$ then appears (and the series does not usually converge in the norm topology on $\mathcal{B}(H)$):
$$\mathrm{Id} = \sum_i P_i + P_N,\quad N = \mathrm{Nul}(A). \tag{3.163}$$
The sum (3.163) can be interpreted in terms of a strong limit of operators, meaning that the result converges when applied term by term to an element of $H$, so
$$u = \sum_i P_i u + P_N u \quad \forall\, u \in H, \tag{3.164}$$
which is a form of the Fourier-Bessel series. Combining these formulæ we see that for any polynomial $p(z)$
$$p(A) = \sum_i p(\lambda_i)P_i + p(0)P_N \tag{3.165}$$
converges strongly, and in norm provided $p(0) = 0$.


In fact we can do this more generally, by choosing $f \in C([-\|A\|, \|A\|])$ and defining an operator by
$$f(A) \in \mathcal{B}(H),\quad f(A)u = \sum_i f(\lambda_i)\langle u, e_i\rangle e_i + f(0)P_N u. \tag{3.166}$$
This series converges in the norm topology provided $f(0) = 0$, and then to a compact operator; if $f$ is real then $f(A)$ is self-adjoint. You can easily check that, always for $A = A^*$ compact here, this formula defines a bounded linear map
$$C([-\|A\|, \|A\|]) \longrightarrow \mathcal{B}(H) \tag{3.167}$$
which has nice properties. Most importantly
$$(fg)(A) = f(A)g(A),\quad (f(A))^* = \bar{f}(A), \tag{3.168}$$
so it takes the product of two continuous functions to the product of the operators.
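For a symmetric matrix the recipe (3.161)-(3.166) can be carried out literally, since there is no null-space subtlety once all eigenvectors are listed. The Python sketch below computes $f(A) = \sum_i f(\lambda_i)P_i$ for a hypothetical real symmetric $2 \times 2$ matrix, using closed-form eigenvectors that assume a non-zero off-diagonal entry, and checks agreement with the polynomial $A^2$ and the multiplicativity $(fg)(A) = f(A)g(A)$ of (3.168).

```python
import math


def eigenpairs_sym2(A):
    """Orthonormal eigenpairs of a real symmetric 2x2 matrix; assumes A[0][1] != 0."""
    a, b, d = A[0][0], A[0][1], A[1][1]
    mean, r = (a + d) / 2, math.hypot((a - d) / 2, b)
    pairs = []
    for lam in (mean - r, mean + r):
        v = (b, lam - a)                      # (A - lam) v = 0 since b != 0
        n = math.hypot(*v)
        pairs.append((lam, (v[0] / n, v[1] / n)))
    return pairs


def func_calc(A, f):
    """f(A) = sum_i f(lam_i) P_i with P_i u = <u, e_i> e_i, as in (3.166)."""
    return [[sum(f(lam) * e[i] * e[j] for lam, e in eigenpairs_sym2(A)) for j in range(2)]
            for i in range(2)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


A = [[1.0, 2.0], [2.0, -1.0]]                 # hypothetical example, eigenvalues +-sqrt(5)


def f(t):
    return t * t


def g(t):
    return t + 1


fA, gA = func_calc(A, f), func_calc(A, g)
fgA = func_calc(A, lambda t: f(t) * g(t))
print(fA, matmul(A, A))                # f(A) agrees with the polynomial A^2
print(fgA, matmul(fA, gA))             # (fg)(A) = f(A) g(A), cf. (3.168)
```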
We will proceed to show that such a map exists for any bounded self-adjoint operator, even though such an operator may not have eigenfunctions or, even if it does, might not have an orthonormal basis of eigenvectors. Even so, it is still possible to

define $f(A)$ for a continuous function defined on $[a_-, a_+]$ if $\mathrm{Spec}(A) \subset [a_-, a_+]$. (In fact it only has to be defined on the compact set $\mathrm{Spec}(A)$, which might be quite a lot smaller.) This is an effective extension of the spectral theorem to the case of non-compact self-adjoint operators.
How does one define $f(A)$? Well, it is easy enough in case $f$ is a polynomial, since then we can simply substitute $A^n$ in place of $z^n$. If we factorize the polynomial this is the same as setting
$$f(z) = c(z - z_1)(z - z_2)\cdots(z - z_N) \Longrightarrow f(A) = c(A - z_1)(A - z_2)\cdots(A - z_N), \tag{3.169}$$
and this is equivalent to (3.166) in case $A$ is also compact.
Notice that the result does not depend on the order of the factors or anything like that. To pass to the case of a general continuous function we need to estimate the norm in the polynomial case.
Proposition 3.17. If $A = A^* \in \mathcal{B}(H)$ is a bounded self-adjoint operator on a Hilbert space then for any polynomial $f$ with real coefficients
$$\|f(A)\| \le \sup_{z \in [a_-, a_+]} |f(z)|,\quad \mathrm{Spec}(A) \subset [a_-, a_+]. \tag{3.170}$$

Proof. For a polynomial we have defined $f(A)$ by (3.169). We can drop the constant $c$ since it will just contribute a factor of $|c|$ to both sides of (3.170). Now, $f(A)$ is self-adjoint since $f$ has real coefficients, so recall from Lemma 3.14 and (3.139) that its norm can be realized as
$$\|f(A)\| = \sup\{|t|;\ t \in \mathrm{Spec}(f(A))\}. \tag{3.171}$$
That is, we need to think about when $f(A) - t$ is invertible. However, $f(z) - t$ is another polynomial (with leading term $z^N$ because we normalized the leading coefficient to be $1$). Thus it can also be factorized:
$$f(z) - t = \prod_{j=1}^N (z - \zeta_j(t)),\qquad f(A) - t = \prod_{j=1}^N (A - \zeta_j(t)), \tag{3.172}$$
where the $\zeta_j \in \mathbb{C}$ are the roots (which might be complex even though the polynomial is real). Written in this way we can see that
$$(f(A) - t)^{-1} = \prod_{j=1}^N (A - \zeta_j(t))^{-1} \quad \text{if } \zeta_j(t) \notin \mathrm{Spec}(A)\ \forall\, j. \tag{3.173}$$
Indeed the converse is also true, i.e. the inverse exists if and only if all the $A - \zeta_j(t)$ are invertible, but in any case we see that
$$\mathrm{Spec}(f(A)) \subset \{t \in \mathbb{C};\ \zeta_j(t) \in \mathrm{Spec}(A) \text{ for some } j = 1, \dots, N\}, \tag{3.174}$$
since if $t$ is not in the right side then $f(A) - t$ is invertible.
Now this can be restated as
$$\mathrm{Spec}(f(A)) \subset f(\mathrm{Spec}(A)), \tag{3.175}$$
since $t \notin f(\mathrm{Spec}(A))$ means $f(z) \ne t$ for $z \in \mathrm{Spec}(A)$, which means that there is no root of $f(z) = t$ in $\mathrm{Spec}(A)$, and hence (3.174) shows that $t \notin \mathrm{Spec}(f(A))$. In fact it is easy to see that there is equality in (3.175).

Then (3.170) follows from (3.171): the norm is the sup of $|z|$ for $z \in \mathrm{Spec}(f(A))$, so
$$\|f(A)\| \le \sup_{t \in \mathrm{Spec}(A)} |f(t)| \le \sup_{z \in [a_-, a_+]} |f(z)|.$$
This allows one to pass by continuity to $f$ in the uniform closure of the polynomials, which by the Stone-Weierstrass theorem is the whole of $C([a_-, a_+])$.
Theorem 3.5. If $A = A^* \in \mathcal{B}(H)$ for a Hilbert space $H$ then the map defined on polynomials through (3.169) extends by continuity to a bounded linear map
$$C([a_-, a_+]) \longrightarrow \mathcal{B}(H) \quad \text{if } \mathrm{Spec}(A) \subset [a_-, a_+],\ \text{with } \mathrm{Spec}(f(A)) \subset f([a_-, a_+]). \tag{3.176}$$
Proof. By the Stone-Weierstrass theorem polynomials are dense in continuous functions on any compact interval, in the supremum norm. □
Remark 3.1. You should check the properties of this map, which also follow by continuity, especially that (3.168) holds in this more general context. In particular, $f(A)$ is self-adjoint if $f \in C([a_-, a_+])$ is real-valued and is non-negative if $f \ge 0$ on $\mathrm{Spec}(A)$.

20. Spectral projection


I have not discussed this in lectures but it is natural at this point to push a little further towards the full spectral theorem for bounded self-adjoint operators. If $A \in \mathcal{B}(H)$ is self-adjoint, and $[a_-, a_+] \supset \mathrm{Spec}(A)$, we have defined $f(A) \in \mathcal{B}(H)$ for $f \in C([a_-, a_+])$ real-valued and hence, for each $u \in H$,
$$C([a_-, a_+]) \ni f \longmapsto \langle f(A)u, u\rangle \in \mathbb{R}. \tag{3.177}$$
Thinking back to the treatment of the Lebesgue integral, you can think of this as a replacement for the Riemann integral and ask whether it can be extended further, to functions which are not necessarily continuous.
In fact (3.177) is essentially given by a Riemann-Stieltjes integral and this suggests finding the increasing function which defines it. Of course we have the rather large issue that this depends on a vector in Hilbert space as well; clearly we want to allow this to vary too.
One direct approach is to try to define the ‘integral’ of the characteristic function of $(-\infty, a]$ for fixed $a \in \mathbb{R}$. To do this, consider
$$Q_a(u) = \inf\{\langle f(A)u, u\rangle;\ f \in C([a_-, a_+]),\ f(t) \ge 0,\ f(t) \ge 1 \text{ on } [a_-, a]\}. \tag{3.178}$$
Since $f \ge 0$ we know that $\langle f(A)u, u\rangle \ge 0$, so the infimum exists and is non-negative. In fact there must exist a sequence $f_n$ such that
$$Q_a(u) = \lim \langle f_n(A)u, u\rangle,\quad f_n \in C([a_-, a_+]),\ f_n \ge 0,\ f_n(t) \ge 1,\ a_- \le t \le a, \tag{3.179}$$
where the sequence $f_n$ could depend on $u$. Consider an obvious choice for $f_n$ given what we did earlier, namely
$$f_n(t) = \begin{cases} 1 & a_- \le t \le a \\ 1 - n(t - a) & a \le t \le a + 1/n \\ 0 & t > a + 1/n. \end{cases} \tag{3.180}$$
Certainly
$$Q_a(u) \le \lim \langle f_n(A)u, u\rangle \tag{3.181}$$

where the limit exists since the sequence is decreasing and non-negative: $f_n - f_{n+1} \ge 0$, so $\langle (f_n - f_{n+1})(A)u, u\rangle \ge 0$.


Lemma 3.17. For any $a \in [a_-, a_+]$,
$$Q_a(u) = \lim \langle f_n(A)u, u\rangle. \tag{3.182}$$
Proof. For any given $f$ as in (3.178), and any $\epsilon > 0$, there exists $n$ such that $f(t) \ge 1/(1+\epsilon)$ in $a \le t \le a + 1/n$, by continuity. This means that $(1+\epsilon)f \ge f_n$ and hence $\langle f(A)u, u\rangle \ge (1+\epsilon)^{-1}\langle f_n(A)u, u\rangle$, from which (3.182) follows, given (3.181). □
Thus in fact one sequence gives the infimum for all $u$. Now, use the polarization identity to define
$$Q_a(u, v) = \frac14\left(Q_a(u+v) - Q_a(u-v) + iQ_a(u+iv) - iQ_a(u-iv)\right). \tag{3.183}$$
The corresponding identity holds for $\langle f_n(A)u, v\rangle$, so in fact
$$Q_a(u, v) = \lim_{n\to\infty} \langle f_n(A)u, v\rangle. \tag{3.184}$$
It follows that $Q_a(u, v)$ is a sesquilinear form, linear in the first variable and antilinear in the second. Moreover the $f_n(A)$ are uniformly bounded in $\mathcal{B}(H)$ (with norm at most $1$ in fact), so
$$|Q_a(u, v)| \le C\|u\|\|v\|. \tag{3.185}$$
Now, using the antilinearity in $v$ of $Q_a(u, v)$ and the Riesz Representation theorem, it follows that for each $u \in H$ there exists a unique $Q_au \in H$ such that
$$Q_a(u, v) = \langle Q_au, v\rangle\ \forall\, v \in H,\quad \|Q_au\| \le \|u\|. \tag{3.186}$$
From the uniqueness, $H \ni u \longmapsto Q_au$ is linear, so (3.186) shows that it is a bounded linear operator. Thus we have proved most of
Proposition 3.18. For each $a \in [a_-, a_+] \supset \mathrm{Spec}(A)$ there is a uniquely defined operator $Q_a \in \mathcal{B}(H)$ such that
$$Q_a(u) = \langle Q_au, u\rangle \tag{3.187}$$
recovers (3.182), and $Q_a^* = Q_a = Q_a^2$ is a projection satisfying
$$Q_aQ_b = Q_bQ_a = Q_b \text{ if } b \le a,\quad [Q_a, f(A)] = 0\ \forall\, f \in C([a_-, a_+]). \tag{3.188}$$
This operator, or really the whole family $Q_a$, is called the spectral projection of $A$.
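Proof. We have already shown the existence of $Q_a \in \mathcal{B}(H)$ with the property (3.187), and since we defined it directly from $Q_a(u)$ it is unique. Self-adjointness follows from the reality of $Q_a(u) \ge 0$, since $\langle Q_au, v\rangle = \langle u, Q_av\rangle$ then follows from (3.186).
From (3.184) it follows that
$$\begin{aligned} &\langle Q_au, v\rangle = \lim_{n\to\infty}\langle f_n(A)u, v\rangle \Longrightarrow \\ &\langle Q_au, f(A)v\rangle = \lim_{n\to\infty}\langle f_n(A)u, f(A)v\rangle = \langle Q_af(A)u, v\rangle \end{aligned} \tag{3.189}$$
for real-valued $f$ (and then for all $f$ by linearity), since $f(A)$ is then self-adjoint and commutes with $f_n(A)$. This proves the commutation statement in (3.188). Since $f_nf_m$ is admissible in the definition of $Q_a$ in (3.178) and $f_nf_m \le f_n$, (3.182) and polarization give
$$\langle Q_au, v\rangle = \lim_{n\to\infty}\langle (f_nf_m)(A)u, v\rangle = \lim_{n\to\infty}\langle f_n(A)u, f_m(A)v\rangle = \langle Q_au, f_m(A)v\rangle \tag{3.190}$$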

and now letting $m \to \infty$ shows that $\langle Q_au, v\rangle = \langle Q_au, Q_av\rangle$, so $Q_a^2 = Q_a$. A similar argument shows the first identity in (3.188). □
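For a symmetric matrix the construction of $Q_a$ can be followed step by step. In the Python sketch below (a hypothetical $2 \times 2$ example with spectrum $\{1, 2.25\}$ and $a = 2$), the cut-off functions $f_n$ are taken as $1 - n(t - a)$ on $[a, a + 1/n]$ so that they are continuous, and $f_n(A)$ is computed by the finite-dimensional functional calculus. The numbers $\langle f_n(A)u, u\rangle$ decrease as in (3.181), and $f_n(A)$ stabilizes at the orthogonal projection onto the eigenvector with eigenvalue $1 \le a$, which is idempotent, self-adjoint and commutes with $A$, as Proposition 3.18 predicts.

```python
import math


def f_n(t, a, n):
    """Continuous cut-off: 1 for t <= a, linear on [a, a + 1/n], 0 beyond."""
    if t <= a:
        return 1.0
    if t <= a + 1.0 / n:
        return 1.0 - n * (t - a)
    return 0.0


# Hypothetical symmetric 2x2 operator with chosen orthonormal eigenvectors e1, e2.
alpha = 0.3
e1 = (math.cos(alpha), math.sin(alpha))
e2 = (-math.sin(alpha), math.cos(alpha))
eigenpairs = [(1.0, e1), (2.25, e2)]      # spectrum {1, 2.25}


def calc(f):
    """Functional calculus f(A) = sum_i f(lam_i) <., e_i> e_i."""
    return [[sum(f(lam) * e[i] * e[j] for lam, e in eigenpairs) for j in range(2)]
            for i in range(2)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


A = calc(lambda t: t)
a = 2.0
u = (0.6, 0.8)
values = []                              # <f_n(A) u, u> for n = 1, ..., 7
for n in range(1, 8):
    Fn = calc(lambda t, n=n: f_n(t, a, n))
    values.append(sum(Fn[i][j] * u[j] * u[i] for i in range(2) for j in range(2)))

Q = calc(lambda t: f_n(t, a, 10))       # here f_n(A) no longer changes once n >= 4
print(values)
```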

Returning to the original thought that (3.177) represents a Riemann-Stieltjes integral for each $u$, we see that collectively what we have is a map
$$[a_-, a_+] \ni a \longmapsto Q_a \in \mathcal{B}(H) \tag{3.191}$$
taking values in the self-adjoint projections and increasing in the sense of (3.188). A little more work allows one to recover the functional calculus as an integral which can be written
$$f(A) = \int_{[a_-, a_+]} f(t)\, dQ_t, \tag{3.192}$$
which does indeed reduce to a Riemann-Stieltjes integral for each $u$:
$$\langle f(A)u, u\rangle = \int_{[a_-, a_+]} f(t)\, d\langle Q_tu, u\rangle. \tag{3.193}$$
This, meaning (3.192), is the spectral resolution of the self-adjoint operator $A$, replacing (and reducing to) the decomposition as a sum in the compact case
$$f(A) = \sum_j f(\lambda_j)P_j, \tag{3.194}$$
where the $P_j$ are the orthogonal projections onto the eigenspaces for $\lambda_j$.

21. Polar Decomposition


One nice application of the functional calculus for self-adjoint operators is to get the polar decomposition of a general bounded operator.
Lemma 3.18. If $A \in \mathcal{B}(H)$ then $E = (A^*A)^{\frac12}$, defined by the functional calculus, is a non-negative self-adjoint operator.
Proof. Since $\langle A^*Au, u\rangle = \|Au\|^2 \ge 0$, (3.146) gives $\mathrm{Spec}(A^*A) \subset [0, \infty)$, where $t \mapsto t^{1/2}$ is continuous. That $E$ exists as a self-adjoint operator satisfying $E^2 = A^*A$ then follows directly from Theorem 3.5, and positivity follows as in Remark 3.1. □

Proposition 3.19. Any bounded operator $A$ can be written as a product
$$A = U(A^*A)^{\frac12},\quad U \in \mathcal{B}(H),\ U^*U = \mathrm{Id} - \Pi_{\mathrm{Nul}(A)},\ UU^* = \Pi_{\overline{\mathrm{Ran}(A)}}. \tag{3.195}$$
Proof. Set $E = (A^*A)^{\frac12}$. We want to define $U$, and we can see from the first condition, $A = UE$, that
$$U(w) = Av \text{ if } w = Ev. \tag{3.196}$$
This makes sense since $Ev = 0$ implies $\langle Ev, Ev\rangle = 0$ and hence $\langle A^*Av, v\rangle = 0$, so $\|Av\| = 0$ and $Av = 0$. So let us define
$$U(w) = \begin{cases} Av & \text{if } w \in \mathrm{Ran}(E),\ w = Ev \\ 0 & \text{if } w \in (\mathrm{Ran}(E))^\perp. \end{cases} \tag{3.197}$$

So $U$ is defined on a dense subspace of $H$, $\mathrm{Ran}(E) \oplus (\mathrm{Ran}(E))^\perp$, which may not be closed if $\mathrm{Ran}(E)$ is not closed. It follows that
$$\begin{aligned} &U(w_1 + w_2) = U(w_1) = Av_1 \Longrightarrow \\ &\|U(w_1 + w_2)\|^2 = \langle Av_1, Av_1\rangle = \langle E^2v_1, v_1\rangle = \|Ev_1\|^2 = \|w_1\|^2 \le \|w_1 + w_2\|^2 \\ &\text{if } w_1 = Ev_1,\ w_2 \in (\mathrm{Ran}\,E)^\perp. \end{aligned} \tag{3.198}$$
Thus $U$ is bounded on the dense subspace on which it is defined, so has a unique continuous extension to a bounded operator $U \in \mathcal{B}(H)$. From the definition of $U$ the first, factorization, condition in (3.195) holds.
From the definition $U$ vanishes on $\mathrm{Ran}(E)^\perp$. We can now check that the continuous extension is a bijection
$$U : \overline{\mathrm{Ran}(E)} \longrightarrow \overline{\mathrm{Ran}(A)}. \tag{3.199}$$
Indeed, if $w \in \overline{\mathrm{Ran}(E)}$ then $\|w\| = \|Uw\|$ by (3.198) and continuity, so (3.199) is injective. The same identity shows that the range of $U$ in (3.199) is closed, since if $Uw_n$ converges, $\|w_n - w_m\| = \|U(w_n - w_m)\|$ shows that the sequence $w_n$ is Cauchy and hence converges; since this range contains $\mathrm{Ran}(A)$ and lies in its closure, it is therefore $\overline{\mathrm{Ran}(A)}$. This same identity, $\|Uw\| = \|w\|$ for $w \in \overline{\mathrm{Ran}(E)}$, implies that
$$\langle Uw, Uw'\rangle = \langle w, w'\rangle,\quad w, w' \in \overline{\mathrm{Ran}(E)}. \tag{3.200}$$
This follows from the polarization identity
$$\begin{aligned} 4\langle Uw, Uw'\rangle &= \|U(w+w')\|^2 - \|U(w-w')\|^2 + i\|U(w+iw')\|^2 - i\|U(w-iw')\|^2 \\ &= \|w+w'\|^2 - \|w-w'\|^2 + i\|w+iw'\|^2 - i\|w-iw'\|^2 = 4\langle w, w'\rangle. \end{aligned} \tag{3.201}$$
The adjoint $U^*$ of $U$ has range contained in the orthocomplement of the null space of $U$, so in $\overline{\mathrm{Ran}(E)}$, and null space precisely $\mathrm{Ran}(A)^\perp$, so defines a linear map from $\overline{\mathrm{Ran}(A)}$ to $\overline{\mathrm{Ran}(E)}$. As such it follows from (3.200) that
$$U^*U = \mathrm{Id} \text{ on } \overline{\mathrm{Ran}(E)} \Longrightarrow U^* = U^{-1} \text{ on } \overline{\mathrm{Ran}(A)}; \tag{3.202}$$
since $U$ is a bijection, $U^*$ is the two-sided inverse of $U$ as a map in (3.199). The remainder of (3.195) follows from this, so completing the proof of the Proposition. □
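For an invertible real $2 \times 2$ matrix the construction in this proof reduces to $E = (A^TA)^{1/2}$ and $U = AE^{-1}$, and $U$ comes out orthogonal (the real case of unitary). The Python sketch below computes $E$ by the finite-dimensional functional calculus; the matrix is a hypothetical example, and the square-root helper assumes a non-zero off-diagonal entry in $A^TA$. For non-invertible $A$ one would instead define $U$ on $\mathrm{Ran}(E)$ and set it to zero on $\mathrm{Ran}(E)^\perp$, as in (3.197).

```python
import math


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transpose(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]


def sqrt_psd(S):
    """S^{1/2} for a real symmetric positive semi-definite 2x2 S with S[0][1] != 0 (assumed)."""
    a, b, d = S[0][0], S[0][1], S[1][1]
    mean, r = (a + d) / 2, math.hypot((a - d) / 2, b)
    out = [[0.0, 0.0], [0.0, 0.0]]
    for lam in (mean - r, mean + r):
        v = (b, lam - a)                      # eigenvector for lam
        n2 = v[0] ** 2 + v[1] ** 2
        for i in range(2):
            for j in range(2):
                # max(..., 0) guards against tiny negative rounding in lam
                out[i][j] += math.sqrt(max(lam, 0.0)) * v[i] * v[j] / n2
    return out


A = [[1.0, 2.0], [0.0, 3.0]]           # hypothetical invertible example
E = sqrt_psd(matmul(transpose(A), A))  # E = (A^* A)^{1/2}
det = E[0][0] * E[1][1] - E[0][1] * E[1][0]
E_inv = [[E[1][1] / det, -E[0][1] / det], [-E[1][0] / det, E[0][0] / det]]
U = matmul(A, E_inv)                   # U(Ev) = Av, cf. (3.196)
print(U)
```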

A bounded linear operator with the properties of $U$ above, that there are two decompositions of $H = H_1 \oplus H_2 = H_3 \oplus H_4$ into orthogonal closed subspaces such that $U = 0$ on $H_2$ and $U : H_1 \longrightarrow H_3$ is a bijection with $\|Uw\| = \|w\|$ for all $w \in H_1$, is called a partial isometry. So the polar decomposition writes a general bounded operator as a product $A = UE$, where $U$ is a partial isometry from $\overline{\mathrm{Ran}(E)}$ onto $\overline{\mathrm{Ran}(A)}$ and $E = (A^*A)^{\frac12}$. If $A$ is injective with dense range then $U$ is actually unitary.
Exercise 1. Show that in the same sense, $A = FV$ where $F = (AA^*)^{\frac12}$ and $V$ is a partial isometry from $\overline{\mathrm{Ran}(A^*)}$ to $\overline{\mathrm{Ran}\,F}$.

22. Compact perturbations of the identity


I have generally not had a chance to discuss most of the material in this section, or the next, in the lectures.
Compact operators are, as we know, ‘small’ in the sense that they are norm limits of finite rank operators. Accepting this, then you will want to say that an operator such as
$$\mathrm{Id} - K,\quad K \in \mathcal{K}(H) \tag{3.203}$$
is ‘big’. We are quite interested in this operator because of spectral theory. To say that $\lambda \in \mathbb{C}$ is an eigenvalue of $K$ is to say that there is a non-trivial solution of
$$Ku - \lambda u = 0, \tag{3.204}$$
where non-trivial means other than the solution $u = 0$, which always exists. If $\lambda$ is an eigenvalue of $K$ then certainly $\lambda \in \mathrm{Spec}(K)$, since $\lambda - K$ cannot be invertible. For general operators the converse is not correct, but for compact operators it is, at least for $\lambda \ne 0$.
Lemma 3.19. If $K \in \mathcal{B}(H)$ is a compact operator then $\lambda \in \mathbb{C} \setminus \{0\}$ is an eigenvalue of $K$ if and only if $\lambda \in \mathrm{Spec}(K)$.
Proof. Since we can divide by $\lambda$ we may replace $K$ by $\lambda^{-1}K$ and consider the special case $\lambda = 1$. Now, if $K$ is actually finite rank the result is straightforward. By Lemma 3.7 we can choose a basis so that (3.85) holds. Let $W$ be the span of all the $e_i$ and $f_i$ in (3.85); since it is finite dimensional it is closed. Then $\mathrm{Id} - K$ acts rather simply: decomposing $H = W \oplus W^\perp$, $u = w + w'$,
$$(\mathrm{Id} - K)(w + w') = w' + (\mathrm{Id}_W - K')w,\quad K' : W \longrightarrow W \tag{3.205}$$
being a matrix with respect to the basis. It follows that $1$ is an eigenvalue of $K$ if and only if $1$ is an eigenvalue of $K'$ as an operator on the finite-dimensional space $W$. A matrix, such as $\mathrm{Id}_W - K'$, is invertible if and only if it is injective, or equivalently surjective. So, the same is true for $\mathrm{Id} - K$.
In the general case we use the approximability of $K$ by finite rank operators. Thus, we can choose a finite rank operator $F$ such that $\|K - F\| < 1/2$. Thus, $(\mathrm{Id} - K + F)^{-1} = \mathrm{Id} - B$ is invertible. Then we can write
$$\mathrm{Id} - K = \mathrm{Id} - (K - F) - F = (\mathrm{Id} - (K - F))(\mathrm{Id} - L),\quad L = (\mathrm{Id} - B)F. \tag{3.206}$$
Thus, $\mathrm{Id} - K$ is invertible if and only if $\mathrm{Id} - L$ is invertible. Thus, if $\mathrm{Id} - K$ is not invertible then $\mathrm{Id} - L$ is not invertible and hence, since $L$ has finite rank, has non-trivial null space by the finite rank case; from (3.206) it follows that $\mathrm{Id} - K$ has non-trivial null space, i.e. $K$ has $1$ as an eigenvalue. □
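The algebra in (3.206) is easy to check with matrices. In the Python sketch below (every matrix is compact, so this only illustrates the bookkeeping), $K$ is a hypothetical $2 \times 2$ matrix having $1$ as an eigenvalue, with eigenvector $(3, 1)$, and $F$ is a rank-one matrix with $\|K - F\| = \sqrt{0.2} < \tfrac12$. The factorization reproduces $\mathrm{Id} - K$, and $\mathrm{Id} - L$ is singular with the same null vector.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def sub(X, Y):
    return [[X[i][j] - Y[i][j] for j in range(2)] for i in range(2)]


def inv2(X):
    """Inverse of an invertible 2x2 matrix."""
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return [[X[1][1] / det, -X[0][1] / det], [-X[1][0] / det, X[0][0] / det]]


def apply(X, u):
    return [X[i][0] * u[0] + X[i][1] * u[1] for i in range(2)]


I = [[1.0, 0.0], [0.0, 1.0]]
K = [[0.9, 0.3], [0.2, 0.4]]        # hypothetical example; K(3, 1) = (3, 1)
F = [[0.9, 0.3], [0.0, 0.0]]        # rank-one approximation with ||K - F|| < 1/2

first = sub(I, sub(K, F))           # Id - (K - F), invertible by the Neumann series
Id_minus_B = inv2(first)            # (Id - K + F)^{-1} = Id - B
L = matmul(Id_minus_B, F)           # L = (Id - B) F, finite rank like F
product = matmul(first, sub(I, L))  # should reproduce Id - K, cf. (3.206)

u = (3.0, 1.0)
print(product, apply(sub(I, K), u), apply(sub(I, L), u))
```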

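A little more generally:
Proposition 3.20. If $K \in \mathcal{K}(H)$ is a compact operator on a separable Hilbert space then
$$\begin{aligned} \mathrm{null}(\mathrm{Id} - K) &= \{u \in H;\ (\mathrm{Id} - K)u = 0\} \text{ is finite dimensional,} \\ \mathrm{Ran}(\mathrm{Id} - K) &= \{v \in H;\ \exists\, u \in H,\ v = (\mathrm{Id} - K)u\} \text{ is closed, and} \\ \mathrm{Ran}(\mathrm{Id} - K)^\perp &= \{w \in H;\ \langle w, (\mathrm{Id} - K)u\rangle = 0\ \forall\, u \in H\} \text{ is finite dimensional,} \end{aligned} \tag{3.207}$$
and moreover
$$\dim(\mathrm{null}(\mathrm{Id} - K)) = \dim \mathrm{Ran}(\mathrm{Id} - K)^\perp. \tag{3.208}$$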

Proof of Proposition 3.20. First let’s check this in the case of a finite rank operator $K = T$. Then
$$\mathrm{Nul}(\mathrm{Id} - T) = \{u \in H;\ u = Tu\} \subset \mathrm{Ran}(T). \tag{3.209}$$
A subspace of a finite dimensional space is certainly finite dimensional, so this proves the first condition in the finite rank case.
Similarly, still assuming that $T$ is finite rank, consider the range
$$\mathrm{Ran}(\mathrm{Id} - T) = \{v \in H;\ v = (\mathrm{Id} - T)u \text{ for some } u \in H\}. \tag{3.210}$$
Consider the subspace $\{u \in H;\ Tu = 0\}$. We know that this is closed, since $T$ is certainly continuous. On the other hand from (3.210), since $(\mathrm{Id} - T)u = u$ for $u \in \mathrm{Nul}(T)$,
$$\mathrm{Ran}(\mathrm{Id} - T) \supset \mathrm{Nul}(T). \tag{3.211}$$
Now, $\mathrm{Nul}(T)$ is closed and has finite codimension: its orthocomplement is mapped injectively by $T$ into the finite dimensional space $\mathrm{Ran}(T)$. As shown in Lemma 3.4 it follows from this that $\mathrm{Ran}(\mathrm{Id} - T)$ itself is closed with finite dimensional complement.
This takes care of the case that $K = T$ has finite rank! What about the general case where $K$ is compact? If $K$ is compact then there exist $B \in \mathcal{B}(H)$ and $T$ of finite rank such that
$$K = B + T,\quad \|B\| < \frac12. \tag{3.212}$$
Now, consider the null space of $\mathrm{Id} - K$ and use (3.212) to write
$$\mathrm{Id} - K = (\mathrm{Id} - B) - T = (\mathrm{Id} - B)(\mathrm{Id} - T'),\quad T' = (\mathrm{Id} - B)^{-1}T. \tag{3.213}$$
Here we have used the convergence of the Neumann series, so $(\mathrm{Id} - B)^{-1}$ does exist. Now, $T'$ is of finite rank, by the ideal property, so
$$\mathrm{Nul}(\mathrm{Id} - K) = \mathrm{Nul}(\mathrm{Id} - T') \text{ is finite dimensional.} \tag{3.214}$$
Here of course we use the fact that $(\mathrm{Id} - K)u = 0$ is equivalent to $(\mathrm{Id} - T')u = 0$, since $\mathrm{Id} - B$ is invertible. So, this is the first condition in (3.207).
Similarly, to examine the second we do the same thing but the other way around and write
$$\mathrm{Id} - K = (\mathrm{Id} - B) - T = (\mathrm{Id} - T'')(\mathrm{Id} - B),\quad T'' = T(\mathrm{Id} - B)^{-1}. \tag{3.215}$$
Now, $T''$ is again of finite rank and
$$\mathrm{Ran}(\mathrm{Id} - K) = \mathrm{Ran}(\mathrm{Id} - T'') \text{ is closed and of finite codimension.} \tag{3.216}$$
What about (3.208)? This time let’s first check that it is enough to consider the finite rank case. For a compact operator we have written
$$(\mathrm{Id} - K) = G(\mathrm{Id} - T) \tag{3.217}$$
where $G = \mathrm{Id} - B$ with $\|B\| < \frac12$ is invertible and $T$ is of finite rank. Since $\mathrm{Ran}(\mathrm{Id} - K)^\perp = \mathrm{Nul}(\mathrm{Id} - K^*)$, what we want to see is that
$$\dim \mathrm{Nul}(\mathrm{Id} - K) = \dim \mathrm{Nul}(\mathrm{Id} - T) = \dim \mathrm{Nul}(\mathrm{Id} - K^*). \tag{3.218}$$
However, $\mathrm{Id} - K^* = (\mathrm{Id} - T^*)G^*$ and $G^*$ is also invertible, so
$$\dim \mathrm{Nul}(\mathrm{Id} - K^*) = \dim \mathrm{Nul}(\mathrm{Id} - T^*) \tag{3.219}$$
and hence it is enough to check that $\dim \mathrm{Nul}(\mathrm{Id} - T) = \dim \mathrm{Nul}(\mathrm{Id} - T^*)$, which is to say the same thing for finite rank operators.

Now, for a finite rank operator, written out as (3.85), we can look at the vector
space $W$ spanned by all the $f_i$’s and all the $e_i$’s together – note that there is
nothing to stop there being linear dependence relations among the combined set, although
the $e_i$ and the $f_i$ are separately independent. Now, $T:W\longrightarrow W$, as is immediately clear, and
$$T^*v=\sum_{i=1}^N(v,f_i)e_i \tag{3.220}$$
so $T^*:W\longrightarrow W$ too. In fact $Tw'=0$ and $T^*w'=0$ if $w'\in W^\perp$ since then
$(w',e_i)=0$ and $(w',f_i)=0$ for all $i$. It follows that if we write $R:W\longrightarrow W$ for
the linear map on this finite dimensional space which is equal to $\mathrm{Id}-T$ acting on
it, then $R^*$ is given by $\mathrm{Id}-T^*$ acting on $W$, where we use the Hilbert space structure
on $W$ induced as a subspace of $H$. So, what we have just shown is that
$$(\mathrm{Id}-T)u=0\iff u\in W\ \text{and}\ Ru=0,\qquad(\mathrm{Id}-T^*)u=0\iff u\in W\ \text{and}\ R^*u=0. \tag{3.221}$$
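Thus we really are reduced to the finite-dimensional theorem
$$\dim\operatorname{Nul}(R)=\dim\operatorname{Nul}(R^*)\ \text{on}\ W. \tag{3.222}$$
You no doubt know this result. It follows by observing that in this case everything is now in $W$, $\operatorname{Ran}(R)=\operatorname{Nul}(R^*)^\perp$ (the orthocomplement being taken in $W$) and in finite dimensions
$$\dim\operatorname{Nul}(R)+\dim\operatorname{Ran}(R)=\dim W=\dim\operatorname{Ran}(R)+\dim\operatorname{Nul}(R^*). \tag{3.223}$$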
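The finite-dimensional statement (3.222) is easy to test numerically. A minimal sketch, assuming NumPy; the helper `nullity` is ours, and the random singular matrix `R` merely stands in for $\mathrm{Id}-T$ acting on $W$ (the theorem holds for any square matrix).

```python
import numpy as np


def nullity(M, tol=1e-10):
    """dim Nul(M), counted as the number of columns minus the numerical rank."""
    s = np.linalg.svd(M, compute_uv=False)
    return M.shape[1] - int(np.sum(s > tol))


rng = np.random.default_rng(1)
dim_W, rank = 6, 4
# A random singular 6x6 matrix standing in for R = Id - T acting on W.
X = rng.standard_normal((dim_W, rank)) + 1j * rng.standard_normal((dim_W, rank))
Y = rng.standard_normal((rank, dim_W)) + 1j * rng.standard_normal((rank, dim_W))
R = X @ Y
R_star = R.conj().T            # the adjoint with respect to the standard inner product

rank_R = dim_W - nullity(R)    # dim Ran(R)
```

Both null spaces come out two-dimensional, and the two rank-nullity identities in (3.223) are visible in the counts.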


23. Hilbert-Schmidt, Trace and Schatten ideals


As well as the finite rank and compact operators there are other important
ideals. Since these results are not exploited in the subsequent sections, many of the
proofs are relegated to exercises.
First consider the Hilbert-Schmidt operators. The definition is based on the following lemma.
Lemma 3.20. For a separable Hilbert space, $H$, if $A\in B(H)$ and the sum
$$\|A\|_{HS}^2=\sum_i\|Ae_i\|^2 \tag{3.224}$$
is finite for one orthonormal basis $\{e_i\}$, then it is finite for any other orthonormal basis and is independent of the choice
of basis.
It is straightforward to show that the operators of finite rank have finite Hilbert-Schmidt norm (3.224); this
is basically Bessel’s inequality.
Proof. This is Problem XX. Starting from (3.224) for some orthonormal basis
$e_i$, consider any other orthonormal basis $f_j$. Using the completeness, expand using
Bessel’s identity
$$\|Ae_i\|^2=\sum_j|\langle Ae_i,f_j\rangle|^2=\sum_j|\langle e_i,A^*f_j\rangle|^2. \tag{3.225}$$
All the terms are non-negative, so the convergence of (3.224) implies the convergence of
the double sum, which can then be rearranged to give
$$\sum_i\|Ae_i\|^2=\sum_i\sum_j|\langle e_i,A^*f_j\rangle|^2=\sum_j\sum_i|\langle e_i,A^*f_j\rangle|^2=\sum_j\|A^*f_j\|^2 \tag{3.226}$$
where Bessel’s identity is used again. Thus the sum for $A^*$ with respect to the new
basis is finite. Applying this argument again shows that the sum is independent of
the basis, and the same for the adjoint.
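In finite dimensions the sum (3.224) is the squared Frobenius norm, and Lemma 3.20 can be checked by computing it in two orthonormal bases. A minimal sketch, assuming NumPy; `hs_norm_sq` is a hypothetical helper name, and the second basis is taken from a QR factorization of a random complex matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 7
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))


def hs_norm_sq(A, basis):
    """Sum over the orthonormal columns e_i of `basis` of ||A e_i||^2, as in (3.224)."""
    return sum(np.linalg.norm(A @ basis[:, i]) ** 2 for i in range(basis.shape[1]))


E = np.eye(n)                                    # the standard basis
Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
F, _ = np.linalg.qr(Z)                           # a random orthonormal basis (columns of F)

s_E = hs_norm_sq(A, E)
s_F = hs_norm_sq(A, F)
s_adj = hs_norm_sq(A.conj().T, F)                # the sum for A* in the new basis, (3.226)
```

All three numbers agree, illustrating both the basis independence and the equality of the sums for $A$ and $A^*$.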

Proposition 3.21. The operators for which (3.224) is finite form a 2-sided
∗-ideal $HS(H)\subset B(H)$, contained in the ideal of compact operators; it is a Hilbert
space and the norm satisfies
$$\|A\|_B\le\|A\|_{HS}=\Big(\sum_i\|Ae_i\|^2\Big)^{1/2},\qquad\|AD\|_{HS}\le\|A\|_{HS}\|D\|_B,\quad A\in HS(H),\ D\in B(H). \tag{3.227}$$
The inner product is
$$\langle A,B\rangle_{HS}=\sum_i\langle Ae_i,Be_i\rangle_H,\quad A,B\in HS(H).$$

For a compact operator the polar decomposition can be given a more explicit
form and we can use this to give another characterization of the Hilbert-Schmidt
operators.
Proposition 3.22. If $A\in K(H)$ then there exist orthonormal bases $e_i$ of
$\operatorname{Nul}(A)^\perp$ and $f_i$ of $\operatorname{Nul}(A^*)^\perp$ such that
$$Au=\sum_i s_i\langle u,e_i\rangle f_i$$
where the $s_i$ are the non-zero eigenvalues of $(A^*A)^{1/2}$ repeated with multiplicity.
The $s_i$ are called the characteristic values of $A$.

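Proof. First take a basis $e_i$ of eigenvectors of $A^*A$ restricted to $\operatorname{Nul}(A)^\perp=\operatorname{Nul}(A^*A)^\perp$ with eigenvalues $s_i^2>0$, so the $s_i$ are the non-zero eigenvalues of
$|A|=(A^*A)^{1/2}$. Then $A=U|A|$ with $U$ a unitary operator from $\overline{\operatorname{Ran}(|A|)}=\operatorname{Nul}(A)^\perp$
to $\overline{\operatorname{Ran}(A)}=\operatorname{Nul}(A^*)^\perp$, so the expansion in Proposition 3.22 follows by taking $f_i=Ue_i$.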

Extending the $e_i$ to an orthonormal basis of $H$ it follows that
$$\|A\|_{HS}=\Big(\sum_i s_i^2\Big)^{1/2}=\|s_*\|_{l^2}.$$
So to say that $A$ is Hilbert-Schmidt is to say that the sequence of its characteristic
values is in $l^2$ (with the caveat that the sequence might be finite).
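For matrices, the expansion of Proposition 3.22 is the singular value decomposition. The sketch below, assuming NumPy, reads off $e_i$, $f_i$ and $s_i$ from `numpy.linalg.svd` and checks $Au=\sum_i s_i\langle u,e_i\rangle f_i$, $\|A\|_{HS}=\|s_*\|_{l^2}$ and that the $s_i^2$ are the non-zero eigenvalues of $A^*A$. The inner product is taken linear in the first entry, which is what `np.vdot(e, u)` computes as $\langle u,e\rangle$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 6, 3
X = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))
Y = rng.standard_normal((k, n)) + 1j * rng.standard_normal((k, n))
A = X @ Y                                        # a rank-k matrix

# numpy factors A = U diag(s) Vh: columns of U are the f_i, rows of Vh are conj(e_i).
U, s, Vh = np.linalg.svd(A)
keep = s > 1e-10                                 # keep only non-zero characteristic values
s, f, e = s[keep], U[:, keep], Vh[keep].conj().T

u = rng.standard_normal(n) + 1j * rng.standard_normal(n)
# np.vdot(e_i, u) = sum_j conj(e_i[j]) u[j] = <u, e_i> (linear in the first entry).
Au_expansion = sum(s[i] * np.vdot(e[:, i], u) * f[:, i] for i in range(len(s)))
```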
One reason that the Hilbert-Schmidt operators are of interest is their relation
to the ideal of operators ‘of trace class’, T (H).
Definition 3.7. The space $T(H)\subset B(H)$ for a separable Hilbert space consists
of those operators $A$ for which
$$\|A\|_{Tr}=\sup\sum_i|\langle Ae_i,f_i\rangle|<\infty \tag{3.228}$$
where the supremum is over pairs of orthonormal sequences $\{e_i\}$ and $\{f_i\}$.

Proposition 3.23. The trace class operators form an ideal, $T(H)\subset HS(H)$,
which is a Banach space with respect to the norm (3.228) and which satisfies
$$\|A\|_B\le\|A\|_{Tr},\qquad\|A\|_{HS}\le\|A\|_B^{1/2}\|A\|_{Tr}^{1/2}; \tag{3.229}$$
the following two conditions are equivalent to $A\in T(H)$:
(1) The operator defined by the functional calculus,
$$|A|^{1/2}=(A^*A)^{1/4}\in HS(H). \tag{3.230}$$
(2) There are operators $B_i,B_i'\in HS(H)$ such that
$$A=\sum_{i=1}^N B_i'B_i. \tag{3.231}$$
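In finite dimensions the trace norm (3.228) is the sum of the characteristic values, the supremum being attained at the singular vectors. The following sketch, assuming NumPy and a hypothetical helper `pairing`, checks this together with the two inequalities in (3.229).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
s = np.linalg.svd(A, compute_uv=False)           # characteristic values of A

tr_norm = s.sum()                                # ||A||_Tr
hs_norm = np.sqrt((s ** 2).sum())                # ||A||_HS
op_norm = s.max()                                # ||A||_B


def pairing(A, E, F):
    """sum_i |<A e_i, f_i>| over the orthonormal columns of E and F, as in (3.228)."""
    return sum(abs(np.vdot(F[:, i], A @ E[:, i])) for i in range(E.shape[1]))


# The supremum is attained with e_i the right and f_i the left singular vectors ...
U, _, Vh = np.linalg.svd(A)
best = pairing(A, Vh.conj().T, U)
# ... while a random pair of orthonormal bases gives no larger value.
E, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
F, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
other = pairing(A, E, F)
```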

Proof. Note first that $T(H)$ is a linear space and that $\|\cdot\|_{Tr}$ is a norm on
it. Now suppose $A\in T(H)$ and consider its polar decomposition $A=U(A^*A)^{1/2}$.
Here $U$ is a partial isometry mapping $\overline{\operatorname{Ran}((A^*A)^{1/2})}$ to $\overline{\operatorname{Ran}(A)}$ and vanishing on
$\operatorname{Ran}((A^*A)^{1/2})^\perp$. Consider an orthonormal basis $\{e_i\}$ of $\overline{\operatorname{Ran}((A^*A)^{1/2})}$. This is an orthonormal sequence in $H$, as is $f_i=Ue_i$. Inserting these into (3.228) shows that
$$\sum_i|\langle U(A^*A)^{1/2}e_i,f_i\rangle|=\sum_i|\langle(A^*A)^{1/4}e_i,(A^*A)^{1/4}e_i\rangle|<\infty \tag{3.232}$$
where we use the fact that $U^*f_i=U^*Ue_i=e_i$. Since the closure of the range of
$(A^*A)^{1/4}$ is the same as the closure of the range of $(A^*A)^{1/2}$, it follows from (3.232)
that (3.230) holds (since adding an orthonormal basis of $\operatorname{Ran}((A^*A)^{1/4})^\perp$ does not
increase the sum).
Next assume that (3.230) holds for $A\in B(H)$. Then the polar decomposition
can be written $A=\big(U(A^*A)^{1/4}\big)(A^*A)^{1/4}$, showing that $A$ is the product of two
Hilbert-Schmidt operators, so in particular of the form (3.231).
Now assume that $A$ is of the form (3.231), so is a sum of products of Hilbert-Schmidt operators. The linearity of $T(H)$ means it suffices to assume that $A=BB'$
where $B,B'\in HS(H)$. Then,
$$|\langle Ae_i,f_i\rangle|=|\langle B'e_i,B^*f_i\rangle|\le\|B'e_i\|_H\|B^*f_i\|_H. \tag{3.233}$$
Taking a finite sum and applying the Cauchy-Schwarz inequality
$$\sum_{i=1}^N|\langle Ae_i,f_i\rangle|\le\Big(\sum_{i=1}^N\|B'e_i\|^2\Big)^{1/2}\Big(\sum_{i=1}^N\|B^*f_i\|^2\Big)^{1/2}. \tag{3.234}$$
If the sequences are orthonormal the right side is bounded by the product of the
Hilbert-Schmidt norms (extend each sequence to an orthonormal basis and use $\|B^*\|_{HS}=\|B\|_{HS}$), so
$$\|BB'\|_{Tr}\le\|B\|_{HS}\|B'\|_{HS} \tag{3.235}$$
and $A=BB'\in T(H)$.
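The bound (3.235) can be probed numerically. The sketch below, assuming NumPy, samples random pairs of matrices and also checks the equality case $B'=B^*$, where $\|BB^*\|_{Tr}=\operatorname{Tr}(BB^*)=\|B\|_{HS}^2$. The helper names `trace_norm` and `hs_norm` are ours.

```python
import numpy as np


def trace_norm(M):
    """Sum of the singular values of M."""
    return np.linalg.svd(M, compute_uv=False).sum()


def hs_norm(M):
    """Hilbert-Schmidt (Frobenius) norm of M."""
    return np.linalg.norm(M, 'fro')


rng = np.random.default_rng(5)
n = 8
ratios = []
for _ in range(100):
    B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    B1 = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))  # plays the role of B'
    ratios.append(trace_norm(B @ B1) / (hs_norm(B) * hs_norm(B1)))
# By (3.235) each ratio should be at most 1.
```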
The first inequality in (3.229) follows from the choice of single unit vectors $u$ and $v$
as orthonormal sequences, so
$$|\langle Au,v\rangle|\le\|A\|_{Tr}\implies\|A\|\le\|A\|_{Tr}. \tag{3.236}$$
The completeness of T (H) with respect to the trace norm follows standard
arguments which can be summarized as follows

(1) If $A_n$ is Cauchy in $T(H)$ then, by the inequality just established, it is Cauchy
in $B(H)$ and so converges in norm to $A\in B(H)$.
(2) A Cauchy sequence is bounded, so there is a constant $C=\sup_n\|A_n\|_{Tr}$
such that for any $N$ and any orthonormal sequences $e_i$, $f_i$,
$$\sum_{i=1}^N|\langle A_ne_i,f_i\rangle|\le C. \tag{3.237}$$
Passing to the limit $A_n\to A$ in the finite sum gives the same bound with
$A_n$ replaced by $A$, and then allowing $N\to\infty$ shows that $A\in T(H)$.
Similarly the Cauchy condition means that for $\epsilon>0$ there exists $M$ such
that for all $N$ and any orthonormal sequences $e_i$, $f_i$
$$m,n>M\implies\sum_{i=1}^N|\langle(A_n-A_m)e_i,f_i\rangle|\le\epsilon. \tag{3.238}$$
Passing first to the limit $m\to\infty$ in the finite sum and then $N\to\infty$
shows that
$$n>M\implies\|A_n-A\|_{Tr}\le\epsilon$$
and so $A_n\to A$ in the trace norm.


Proposition 3.24. The trace functional
$$T(H)\ni A\longmapsto\operatorname{Tr}(A)=\sum_i\langle Ae_i,e_i\rangle \tag{3.239}$$
is a continuous linear functional (with respect to the trace norm) which is independent of the choice of orthonormal basis $\{e_i\}$ and which satisfies
$$\operatorname{Tr}(AB-BA)=0\ \text{if}\ A\in T(H),\ B\in B(H)\ \text{or}\ A,B\in HS(H). \tag{3.240}$$
Proof. The complex number $\operatorname{Tr}(AB-BA)$ depends linearly on $A$ and, separately, on $B$. The ideals are ∗-closed so decomposing $A=(A+A^*)/2+i(A-A^*)/2i$
and similarly for $B$ shows that it suffices to assume that $A$ and $B$ are self-adjoint. If
$A\in T(H)$ we can use an orthonormal basis of eigenvectors for it to evaluate
the trace. Then if $Ae_i=\lambda_ie_i$, with $\lambda_i$ real,
$$\operatorname{Tr}(AB-BA)=\sum_i\big(\langle Be_i,Ae_i\rangle-\langle Ae_i,Be_i\rangle\big)=\sum_i\big(\lambda_i\langle Be_i,e_i\rangle-\lambda_i\langle e_i,Be_i\rangle\big)=0, \tag{3.241}$$
since $\langle e_i,Be_i\rangle=\langle Be_i,e_i\rangle$ for $B$ self-adjoint.
The case that $A,B\in HS(H)$ is similar.
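A minimal numerical check of the basis independence of (3.239) and of the vanishing of traces of commutators (3.240), assuming NumPy. In finite dimensions both statements are elementary; the content of Proposition 3.24 lies in the hypotheses needed for them in infinite dimensions, which this sketch does not capture. The helper `trace_in_basis` is ours.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 6
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))


def trace_in_basis(M, basis):
    """sum_i <M e_i, e_i> over the orthonormal columns e_i of `basis`, as in (3.239)."""
    return sum(np.vdot(basis[:, i], M @ basis[:, i]) for i in range(basis.shape[1]))


F, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
tr_std = trace_in_basis(A, np.eye(n))
tr_F = trace_in_basis(A, F)
commutator_trace = trace_in_basis(A @ B - B @ A, F)   # should vanish, cf. (3.240)
```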

This is the fundamental property of the trace functional, that it vanishes on
commutators where one of the elements is of trace class and the other is bounded.
Two other important properties are that
Lemma 3.21. (1) If $A,B\in HS(H)$ then
$$\langle A,B\rangle_{HS}=\operatorname{Tr}(B^*A). \tag{3.242}$$
(2) If $T=T^*\in K(H)$ then $T\in T(H)$ if and only if the sequence of non-zero
eigenvalues $\lambda_j$ of $T$ (repeated with multiplicity) is in $l^1$, and then
$$\operatorname{Tr}(T)=\sum_j\lambda_j,\qquad\|T\|_{Tr}=\sum_j|\lambda_j|. \tag{3.243}$$
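Both parts of Lemma 3.21 can be checked for matrices. A minimal sketch, assuming NumPy: with the inner product linear in the first entry, as in (3.220), $\sum_i\langle Ae_i,Be_i\rangle=\operatorname{Tr}(B^*A)$, whose complex conjugate is $\operatorname{Tr}(A^*B)$; and for a self-adjoint $T$ the trace norm is the $l^1$ norm of the eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

# <A, B>_HS = sum_i <A e_i, B e_i> in the standard basis, where A e_i = A[:, i];
# np.vdot(y, x) = <x, y> with the inner product linear in the first entry.
hs_inner = sum(np.vdot(B[:, i], A[:, i]) for i in range(n))

# A self-adjoint T: its trace norm should be the l^1 norm of its eigenvalues, (3.243).
T = A + A.conj().T
lam = np.linalg.eigvalsh(T)
trace_norm_T = np.linalg.svd(T, compute_uv=False).sum()
```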

In fact the second result extends to Lidskii’s theorem: if $T\in T(H)$ then the
spectrum outside 0 is discrete, so countable, each point is an eigenvalue $\lambda_i$
of finite algebraic multiplicity $k_i$, and then $\operatorname{Tr}(T)=\sum_ik_i\lambda_i$, the series converging absolutely. The
algebraic multiplicity is the limit as $k\to\infty$ of the dimension of the null space of
$(T-\lambda_i)^k$. The standard proof of this is not elementary.
Next we turn to the more general Schatten classes.

Definition 3.8. An operator $A\in K(H)$ is ‘of Schatten class,’ $A\in Sc_p(H)$,
$p\in[1,\infty)$, if and only if $|A|^p\in T(H)$, i.e.
$$\|A\|_{Sc_p}=\Big(\sum_i s_i^p\Big)^{1/p}<\infty \tag{3.244}$$
where $s_i$ are the non-zero characteristic values of $A$ repeated with multiplicity.
So $T(H)=Sc_1(H)$, $HS(H)=Sc_2(H)$.


Of course the notation is suggestive, but we need to be a bit careful in proving
the results which are implied by the notation!

Proposition 3.25. Each of the Schatten classes is a two-sided ∗-ideal in $B(H)$
which is a Banach space with respect to the norm (3.244); the norm is also given
by
$$\|T\|_{Sc_p}^p=\sup\sum_i|\langle Te_i,f_i\rangle|^p \tag{3.245}$$
with the supremum over orthonormal sequences, with finiteness implying that $T\in Sc_p(H)$. If $q$ is the conjugate index to $p\in(1,\infty)$, so $1/p+1/q=1$, then
$$A\in Sc_q(H),\ B\in Sc_p(H)\implies AB\in T(H),\quad\|AB\|_{Tr}\le\|A\|_{Sc_q}\|B\|_{Sc_p} \tag{3.246}$$
and conversely, if $A\in B(H)$ then $A\in Sc_p(H)$ if and only if $AB\in T(H)$ for all
$B\in Sc_q(H)$, and
$$\|A\|_{Sc_p}=\sup_{\|B\|_{Sc_q}=1}\|AB\|_{Tr}.$$
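The non-commutative Hölder inequality (3.246) can be tested on random matrices. A minimal sketch, assuming NumPy; `schatten` is a hypothetical helper computing (3.244) from the singular values, and $p=3$ is an arbitrary choice.

```python
import numpy as np


def schatten(M, p):
    """The Schatten p-norm (3.244): the l^p norm of the singular values of M."""
    s = np.linalg.svd(M, compute_uv=False)
    return (s ** p).sum() ** (1.0 / p)


rng = np.random.default_rng(8)
n, p = 6, 3.0
q = p / (p - 1.0)                          # conjugate index: 1/p + 1/q = 1
worst = 0.0
for _ in range(200):
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    lhs = schatten(A @ B, 1.0)             # ||AB||_Tr
    rhs = schatten(A, q) * schatten(B, p)  # ||A||_{Sc_q} ||B||_{Sc_p}
    worst = max(worst, lhs / rhs)
# (3.246) predicts worst <= 1.
```

For $p=1$ and $p=2$ the helper reproduces the trace (nuclear) norm and the Hilbert-Schmidt (Frobenius) norm respectively.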

Proof. The alternate realization of the Schatten norm in (3.245) is particularly useful since, whilst it is clear from the definition that $cT\in Sc_p(H)$ if $T\in Sc_p(H)$ and $c\in\mathbb C$, it is not otherwise immediately clear that the space is linear (or
that the triangle inequality holds).
We first show, from the definition (3.244), that if $T$ is compact and self-adjoint then $T\in Sc_p(H)$ if and
only if
$$\sup\sum_i|\langle Tf_i,f_i\rangle|^p=\|T\|_{Sc_p}^p<\infty \tag{3.247}$$
with the supremum over orthonormal sequences. To see this let $e_j$ be an orthonormal
basis of eigenvectors for $T$. Then expanding in the Fourier-Bessel series
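$$\begin{aligned}|\langle Tf_i,f_i\rangle|&=\Big|\sum_j\lambda_j|\langle f_i,e_j\rangle|^2\Big|\le\sum_j|\lambda_j||\langle f_i,e_j\rangle|^{2/p}|\langle f_i,e_j\rangle|^{2/q}\\&\le\Big(\sum_j|\lambda_j|^p|\langle f_i,e_j\rangle|^2\Big)^{1/p}\Big(\sum_j|\langle f_i,e_j\rangle|^2\Big)^{1/q}\le\Big(\sum_j|\lambda_j|^p|\langle f_i,e_j\rangle|^2\Big)^{1/p}\end{aligned} \tag{3.248}$$
by Hölder’s inequality (with $q$ the conjugate index) and Bessel’s inequality, so
$$\sum_i|\langle Tf_i,f_i\rangle|^p\le\sum_j|\lambda_j|^p\sum_i|\langle f_i,e_j\rangle|^2\le\sum_j|\lambda_j|^p=\|T\|_{Sc_p}^p. \tag{3.249}$$
Equality is attained by taking $f_i=e_i$, so this proves (3.247) when $T=T^*\in K(H)$.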
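The inequality (3.249) and its equality case can be checked for a self-adjoint matrix. A minimal sketch, assuming NumPy, with random orthonormal sequences of length 3 taken from QR factorizations; `diag_sum` is our name for the left side of (3.249), and $p=2.5$ is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(9)
n, p = 7, 2.5
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
T = X + X.conj().T                       # a self-adjoint matrix
lam, V = np.linalg.eigh(T)               # eigenvalues and orthonormal eigenvectors e_j
bound = (np.abs(lam) ** p).sum()         # ||T||_{Sc_p}^p


def diag_sum(T, F, p):
    """sum_i |<T f_i, f_i>|^p over the orthonormal columns f_i of F."""
    return sum(abs(np.vdot(F[:, i], T @ F[:, i])) ** p for i in range(F.shape[1]))


at_eigenbasis = diag_sum(T, V, p)        # the equality case f_i = e_i
samples = []
for _ in range(200):
    Z = rng.standard_normal((n, 3)) + 1j * rng.standard_normal((n, 3))
    F, _ = np.linalg.qr(Z)               # an orthonormal sequence of length 3
    samples.append(diag_sum(T, F, p))
```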


Now consider (3.245). Let $P_N$ be the orthogonal projection onto the span
of the eigenspaces corresponding to the $N$ largest eigenvalues of $|T|$. Then
we replace $T$ by $T_N=TP_N$; certainly $TP_N\to T$ in norm. Since $T_N$ has finite
rank, both $\operatorname{Nul}(T_N)$ and $\operatorname{Nul}(T_N^*)$ are infinite dimensional so we can write the polar
decomposition as
$$T_N=U_NA_N,\qquad A_N=P_N|T|P_N$$
and take $U_N$ to be unitary (rather than a partial isometry) by extending it by
an isometric isomorphism $\operatorname{Nul}(T_N)\longrightarrow\operatorname{Nul}(T_N^*)$. Then using the Cauchy-Schwarz
inequality and then (3.247) for $A_N$,
$$\begin{aligned}\sum_i|\langle T_Ne_i,f_i\rangle|^p&=\sum_i|\langle A_N^{1/2}e_i,A_N^{1/2}U_N^*f_i\rangle|^p\\&\le\Big(\sum_i\|A_N^{1/2}e_i\|^{2p}\Big)^{1/2}\Big(\sum_i\|A_N^{1/2}U_N^*f_i\|^{2p}\Big)^{1/2}\\&=\Big(\sum_i|\langle A_Ne_i,e_i\rangle|^p\Big)^{1/2}\Big(\sum_i|\langle A_NU_N^*f_i,U_N^*f_i\rangle|^p\Big)^{1/2}\\&\le\|A_N\|_{Sc_p}^p=\|T_N\|_{Sc_p}^p\le\|T\|_{Sc_p}^p.\end{aligned} \tag{3.250}$$
As usual, dropping to a finite sum on the left we can pass to the limit as $N\to\infty$
and obtain a uniform bound on any finite sum for $T$, from which (3.245) follows.
At this point we know that if $A\in Sc_p(H)$ and $U_1,U_2$ are unitary then
$$U_1AU_2\in Sc_p(H)\ \text{and}\ \|U_1AU_2\|_{Sc_p}=\|A\|_{Sc_p}.$$
From (3.245) it follows directly that $Sc_p(H)$ is linear, that the triangle inequality holds, so that $\|\cdot\|_{Sc_p}$ is a norm, and that $Sc_p(H)$ is complete and ∗-closed.
Now, if $A\in Sc_q(H)$ and $B\in Sc_p(H)$ for conjugate indices $p,q\in(1,\infty)$, choose
a finite rank orthogonal projection $P$ and consider $PAPBP$, which is of finite rank and
hence of trace class. We can compute its trace with respect to any orthonormal
basis. Choose an orthonormal basis $e_i$ of the range of $PAP$ and $f_i$ so that the polar
decomposition of $PAP$ becomes
$$PAPf_i=s_ie_i\implies PA^*Pe_i=s_if_i$$

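where the $s_i$ are the characteristic values of $PAP$ (basis vectors orthogonal to the range of $PAP$ contribute nothing to the trace). With finite sums
$$\begin{aligned}|\operatorname{Tr}(PAPBP)|&=\Big|\sum_i\langle PAP^2BPe_i,e_i\rangle\Big|=\Big|\sum_i\langle PBPe_i,PA^*Pe_i\rangle\Big|\le\sum_{i=1}^Ns_i|\langle PBPe_i,f_i\rangle|\\&\le\Big(\sum_js_j^q\Big)^{1/q}\Big(\sum_i|\langle PBPe_i,f_i\rangle|^p\Big)^{1/p}\le\|PAP\|_{Sc_q}\|PBP\|_{Sc_p}\end{aligned} \tag{3.251}$$
by Hölder’s inequality and (3.245). Now, from the minimax arguments discussed earlier, the characteristic values satisfy $s_j(PBP)\le s_j(B)$ for all $j$ (and similarly for $A$).
So we see that
$$|\operatorname{Tr}(PAPBP)|\le\|A\|_{Sc_q}\|B\|_{Sc_p}. \tag{3.252}$$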
Fixing $B$, this is true for any $A$, so $A$ can be replaced by $UA$ with $U$ unitary, chosen
(using the polar decomposition) so that after this replacement $APB=|APB|$. We also know that $\|UA\|_{Sc_q}=\|A\|_{Sc_q}$ and that
$P|APB|P$ is positive, so
$$\operatorname{Tr}(PAPBP)=\operatorname{Tr}(P|APB|P)=\|P|APB|P\|_{Tr}\le\|A\|_{Sc_q}\|B\|_{Sc_p}. \tag{3.253}$$
Taking an increasing sequence of projections $P_N$, it follows that $P_N|AP_NB|P_N\to|AB|$ in trace norm and that (3.246) holds.
The proof of optimality in this ‘non-commutative Hölder inequality’ is left as
an exercise. That $Sc_p(H)$ is an ideal then follows from the fact that $T(H)$ is an
ideal.

24. Fredholm operators


Definition 3.9. A bounded operator F ∈ B(H) on a Hilbert space is said to
be Fredholm, written F ∈ F(H), if it has the three properties in (3.207) – its null
space is finite dimensional, its range is closed and the orthocomplement of its range
is finite dimensional.
In view of Proposition 3.20, if $K\in K(H)$ then $\mathrm{Id}-K\in F(H)$. For general Fredholm operators the row-rank=column-rank result (3.208) does not hold. Indeed the
difference of these two integers, called the index of the operator,
$$\operatorname{ind}(F)=\dim\operatorname{Nul}(F)-\dim\operatorname{Ran}(F)^\perp \tag{3.254}$$
is a very important number with lots of interesting properties and uses.
Notice that the last two conditions in (3.207) are really independent since the
orthocomplement of a subspace is the same as the orthocomplement of its closure.
There is for instance a bounded operator on a separable Hilbert space with trivial
null space whose range is dense but not closed. How could this be? Think for
instance of the operator on $L^2(0,1)$ which is multiplication by the function $x$.
This is assuredly bounded and an element of the null space would have to satisfy
$xu(x)=0$ almost everywhere, and hence vanish almost everywhere. Moreover the
density of the $L^2$ functions vanishing in $x<\epsilon$ for some (non-fixed) $\epsilon>0$ shows
that the range is dense. However the range is not closed, since the constant function 1 is not in it ($1/x\notin L^2(0,1)$), so this operator is not invertible and not Fredholm.
On the other hand we do know that the range of a bounded operator, if it has finite codimension, is
closed, so we can replace the last two conditions in Definition 3.9 by saying that
the range of the operator has finite codimension. I have not done this directly since
it is a little too easy to fall into the trap of thinking that it is enough to check that
the closure of the range has finite codimension; it isn’t!

Before looking at general Fredholm operators let’s check that, in the case of
operators of the form Id −K, with K compact the third conclusion in (3.207) really
follows from the first. This is a general fact which I mentioned, at least, earlier but
let me pause to prove it.
Proposition 3.26. If B ∈ B(H) is a bounded operator on a Hilbert space and
B ∗ is its adjoint then
$$\operatorname{Ran}(B)^\perp=\big(\overline{\operatorname{Ran}(B)}\big)^\perp=\{v\in H;\ (v,w)=0\ \forall\ w\in\operatorname{Ran}(B)\}=\operatorname{Nul}(B^*). \tag{3.255}$$
Proof. The definition of the orthocomplement of Ran(B) shows immediately
that
(3.256) v ∈ (Ran(B))⊥ ⇐⇒ (v, w) = 0 ∀ w ∈ Ran(B) ⇐⇒ (v, Bu) = 0 ∀ u ∈ H
⇐⇒ (B ∗ v, u) = 0 ∀ u ∈ H ⇐⇒ B ∗ v = 0 ⇐⇒ v ∈ Nul(B ∗ ).
On the other hand we have already observed that $V^\perp=(\overline V)^\perp$ for any subspace $V$,
since the right side is certainly contained in the left and $(u,v)=0$ for all $v\in V$
implies that $(u,w)=0$ for all $w\in\overline V$ by using the continuity of the inner product
to pass to the limit of a sequence $v_n\to w$.
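For a matrix, (3.255) says that the left singular vectors beyond the rank span $\operatorname{Nul}(B^*)$. A minimal sketch, assuming NumPy, with a random rank-2 operator on $\mathbb C^6$ standing in for $B$.

```python
import numpy as np

rng = np.random.default_rng(10)
n, r = 6, 2
X = rng.standard_normal((n, r)) + 1j * rng.standard_normal((n, r))
Y = rng.standard_normal((r, n)) + 1j * rng.standard_normal((r, n))
B = X @ Y                                # a rank-2 operator on C^6

U, s, Vh = np.linalg.svd(B)              # U is a 6x6 unitary matrix
rank = int(np.sum(s > 1e-10 * s[0]))
ran_perp = U[:, rank:]                   # orthonormal basis of Ran(B)^perp
```

Applying $B^*$ to these columns gives (numerically) zero, and their number matches $\dim\operatorname{Nul}(B^*)$.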
There is a more ‘analytic’ way of characterizing Fredholm operators, rather
than Definition 3.9.
Lemma 3.22. An operator F ∈ B(H) is Fredholm, F ∈ F(H), if and only if it
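has a generalized inverse $P$ satisfying
$$PF=\mathrm{Id}-\Pi_{\operatorname{Nul}(F)},\qquad FP=\mathrm{Id}-\Pi_{\operatorname{Ran}(F)^\perp} \tag{3.257}$$
with the two projections of finite rank.
Proof. If (3.257) holds then $F$ must be Fredholm: from the first identity its null space is
contained in $\operatorname{Nul}(PF)=\operatorname{Ran}(\Pi_{\operatorname{Nul}(F)})$, so is finite dimensional, and from the second identity the range of $F$ must contain the range of
$\mathrm{Id}-\Pi_{\operatorname{Ran}(F)^\perp}$ and hence it must be closed and of finite codimension.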
Conversely, suppose that $F\in F(H)$. We can divide $H$ into two pieces in two
ways as $H=\operatorname{Nul}(F)\oplus\operatorname{Nul}(F)^\perp$ and $H=\operatorname{Ran}(F)^\perp\oplus\operatorname{Ran}(F)$, where in each case
the first summand is finite-dimensional. Then $F$ defines four maps, from each
summand of the first decomposition to each summand of the second, but only one of these is
non-zero and so $F$ corresponds to a bounded linear map $\tilde F:\operatorname{Nul}(F)^\perp\longrightarrow\operatorname{Ran}(F)$.
These are two Hilbert spaces with a bounded linear bijection between them, so the
inverse map, $\tilde P:\operatorname{Ran}(F)\longrightarrow\operatorname{Nul}(F)^\perp$, is bounded by the Open Mapping Theorem
and we can define
$$P=\tilde P\circ\Pi_{\operatorname{Ran}(F)}. \tag{3.258}$$
Then (3.257) follows directly.
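The construction in the proof can be mimicked for a matrix, where every operator is Fredholm. A minimal sketch, assuming NumPy: in the singular bases $\tilde F$ is diagonal, so inverting it on $\operatorname{Nul}(F)^\perp\to\operatorname{Ran}(F)$ and composing with the orthogonal projection onto $\operatorname{Ran}(F)$ gives a matrix $P$ (the Moore-Penrose pseudo-inverse), which is then checked against the two identities in (3.257).

```python
import numpy as np

rng = np.random.default_rng(11)
n, r = 6, 4
X = rng.standard_normal((n, r)) + 1j * rng.standard_normal((n, r))
Y = rng.standard_normal((r, n)) + 1j * rng.standard_normal((r, n))
F = X @ Y                                # rank 4 on C^6: 2-dimensional null space and cokernel

U, s, Vh = np.linalg.svd(F)
rank = int(np.sum(s > 1e-10 * s[0]))
# In the singular bases F~ : Nul(F)^perp -> Ran(F) is diag(s[:rank]); invert it and
# precompose with the orthogonal projection onto Ran(F).
P = Vh[:rank].conj().T @ np.diag(1.0 / s[:rank]) @ U[:, :rank].conj().T

N = Vh[rank:].conj().T                   # orthonormal basis of Nul(F)
C = U[:, rank:]                          # orthonormal basis of Ran(F)^perp
Pi_nul = N @ N.conj().T
Pi_coker = C @ C.conj().T
I = np.eye(n)
```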
What we want to show is that the Fredholm operators form an open set in
B(H) and that the index is locally constant. To do this we show that a weaker
version of (3.257) also implies that F is Fredholm.
Lemma 3.23. An operator $F\in B(H)$ is Fredholm, $F\in F(H)$, if and only if it has a parametrix $Q\in B(H)$ in the sense that
$$QF=\mathrm{Id}-E_L,\qquad FQ=\mathrm{Id}-E_R \tag{3.259}$$

with EL and ER of finite rank. Moreover any two such parametrices differ by a
finite rank operator.
The term ‘parametrix’ refers to an inverse modulo an ideal. Here we are looking
at the ideal of finite rank operators. In fact this is equivalent to the existence of
an inverse modulo compact operators. One direction is obvious – since finite rank
operators are compact – the other is covered by one of the problems. Notice that
the parametrix Q is itself Fredholm, since reversing the two equations shows that
F is a parametrix for Q. Similarly it follows that if F is Fredholm then so is F ∗
and that the product of two Fredholm operators is Fredholm.

Proof. If $F$ is Fredholm then $Q=P$ certainly is a parametrix in this
sense. Conversely suppose that $Q$ as in (3.259) exists. Then $\operatorname{Nul}(\mathrm{Id}-E_L)$ is
finite dimensional – from (3.207) for instance. However, from the first identity
$\operatorname{Nul}(F)\subset\operatorname{Nul}(QF)=\operatorname{Nul}(\mathrm{Id}-E_L)$ so $\operatorname{Nul}(F)$ is finite dimensional too. Similarly,
the second identity shows that $\operatorname{Ran}(F)\supset\operatorname{Ran}(FQ)=\operatorname{Ran}(\mathrm{Id}-E_R)$ and the last
space is closed and of finite codimension, hence so is the first. Thus the existence
of such a parametrix $Q$ implies that $F$ is Fredholm.
Now if $Q$ and $Q'$ both satisfy (3.259), with finite rank error terms $E_L'$ and $E_R'$
for $Q'$, then
$$(Q'-Q)F=E_L-E_L' \tag{3.260}$$
is of finite rank. Applying the generalized inverse, $P$, of $F$ on the right shows that
the difference
$$Q'-Q=(E_L-E_L')P+(Q'-Q)\Pi_{\operatorname{Ran}(F)^\perp} \tag{3.261}$$
is indeed of finite rank.

Observe that the last statement of Lemma 3.23 can be reversed. If F is Fredholm, so has a parametrix
Q, then all the operators Q + E where E is of finite rank are also parametrices. It
is also the case that if F is Fredholm and K is compact then F + K is Fredholm.
Indeed, if you go through the proof above replacing ‘finite rank’ by ‘compact’ you
can check this. Thus an operator is Fredholm if and only if it has invertible image
in the Calkin algebra, B(H)/K(H).
Now recall that finite-rank operators are of trace class, that the trace is well-
defined and that the trace of a commutator where one factor is bounded and the
other trace class vanishes. Using this we show
Lemma 3.24. If Q and F satisfy (3.259) then
(3.262) ind(F ) = Tr(EL ) − Tr(ER ).
Proof. We certainly know that (3.262) holds in the special case that Q = P
is the generalized inverse of F, since then EL = ΠNul(F ) and ER = ΠRan(F )⊥ and
the traces are the dimensions of these spaces.
Now, if $Q$ is a parametrix as in (3.259), consider the straight line of operators
$Q_t=(1-t)P+tQ$. Using the two sets of identities for the generalized inverse and
parametrix
$$\begin{aligned}Q_tF&=(1-t)PF+tQF=\mathrm{Id}-(1-t)\Pi_{\operatorname{Nul}(F)}-tE_L,\\FQ_t&=(1-t)FP+tFQ=\mathrm{Id}-(1-t)\Pi_{\operatorname{Ran}(F)^\perp}-tE_R.\end{aligned} \tag{3.263}$$
Thus $Q_t$ is a curve of parametrices and what we need to show is that
$$J(t)=\operatorname{Tr}\big((1-t)\Pi_{\operatorname{Nul}(F)}+tE_L\big)-\operatorname{Tr}\big((1-t)\Pi_{\operatorname{Ran}(F)^\perp}+tE_R\big) \tag{3.264}$$
is constant. This is a linear function of $t$, as is $Q_t$. We can differentiate (3.263) with
respect to $t$ and see that
$$\frac{d}{dt}\big((1-t)\Pi_{\operatorname{Nul}(F)}+tE_L\big)-\frac{d}{dt}\big((1-t)\Pi_{\operatorname{Ran}(F)^\perp}+tE_R\big)=[F,Q-P]\implies J'(t)=0 \tag{3.265}$$
since it is the trace of the commutator of a bounded and a finite rank operator
(using the last part of Lemma 3.23).
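A finite-dimensional illustration of why (3.262) does not depend on the choice of parametrix. Here, as an assumption going beyond the text (which treats operators on one space $H$), $F$ maps $\mathbb C^n$ to $\mathbb C^m$; then every $Q$ is a parametrix, and since $\operatorname{Tr}(QF)=\operatorname{Tr}(FQ)$ the formula always returns $n-m$, which is $\dim\operatorname{Nul}(F)-\dim\operatorname{Ran}(F)^\perp$. A minimal sketch, assuming NumPy; `index_from_parametrix` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(12)
n, m = 7, 4                              # F : C^n -> C^m (an assumption for illustration)
F = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))


def index_from_parametrix(Q, F):
    """Tr(E_L) - Tr(E_R) with E_L = Id - QF and E_R = Id - FQ, as in (3.262)."""
    E_L = np.eye(F.shape[1]) - Q @ F
    E_R = np.eye(F.shape[0]) - F @ Q
    return (np.trace(E_L) - np.trace(E_R)).real


indices = []
for _ in range(5):
    Q = rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))  # any Q will do here
    indices.append(index_from_parametrix(Q, F))

s = np.linalg.svd(F, compute_uv=False)
rank = int(np.sum(s > 1e-10 * s[0]))
true_index = (n - rank) - (m - rank)     # dim Nul(F) - dim Ran(F)^perp
```

The cancellation $\operatorname{Tr}(QF)-\operatorname{Tr}(FQ)=0$ used here is the finite-dimensional shadow of the commutator argument in the proof.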
Proposition 3.27. The Fredholm operators form an open set in B(H) on which
the index is locally constant.
Proof. We need to show that if $F$ is Fredholm then there exists $\epsilon>0$ such
that $F+B$ is Fredholm if $\|B\|<\epsilon$. Set $B'=\Pi_{\operatorname{Ran}(F)}B\Pi_{\operatorname{Nul}(F)^\perp}$; then $\|B'\|\le\|B\|$
and $B-B'$ is of finite rank. If $\tilde F$ is the operator constructed in the proof of Lemma 3.22,
then $\tilde F+B'$ is invertible as an operator from $\operatorname{Nul}(F)^\perp$ to $\operatorname{Ran}(F)$ if $\epsilon>0$ is small.
The inverse, $P_{B'}$, extended as 0 on $\operatorname{Ran}(F)^\perp$ as $P$ is defined in that proof, satisfies
$$\begin{aligned}P_{B'}(F+B)&=\mathrm{Id}-\Pi_{\operatorname{Nul}(F)}+P_{B'}(B-B'),\\(F+B)P_{B'}&=\mathrm{Id}-\Pi_{\operatorname{Ran}(F)^\perp}+(B-B')P_{B'}\end{aligned} \tag{3.266}$$
and so is a parametrix for $F+B$. Thus the set of Fredholm operators is open.
The index of $F+B$ is given, by Lemma 3.24, by the difference of the traces of the finite rank error
terms in the first and second lines here. It depends continuously on $B$ in $\|B\|<\epsilon$
so, being integer-valued, is constant.
This shows in particular that there is an open subset of $B(H)$ which contains no
invertible operators, namely the Fredholm operators of non-zero index, in strong contrast to the finite dimensional case. In fact even
the Fredholm operators do not form a dense subset of $B(H)$. One open subset disjoint from them
consists of the semi-Fredholm operators which are not Fredholm: those with closed range and with exactly one of the
null space and the complement of the range finite-dimensional.
Why is the index important? For one thing it actually labels the components
of the Fredholm operators – two Fredholm operators can be connected by a curve
of Fredholms if and only if they have the same index. One of the main applications
of the index is quite trivial to see – if the index of a Fredholm operator is positive
then the operator must have non-trivial null space. This is a remarkably powerful
method for showing that certain sorts (‘elliptic’ for one) of equations have non-
trivial solutions.

25. Kuiper’s theorem


For finite dimensional spaces, such as CN , the group of invertible operators –
in this case matrices and denoted typically GL(N ) – is a particularly important
example of a Lie group. One reason it is important is that it carries a good deal
of ‘topological’ structure. In particular – if you have done a little topology – its
fundamental group is not trivial, in fact it is isomorphic to Z. This corresponds to
the fact that a continuous closed curve $c:\mathbb S^1\longrightarrow GL(N)$ is contractible if and only
if its winding number is zero – the effective number of times that the determinant
goes around the origin in $\mathbb C$. There is a lot more topology than this and it is actually
quite complicated.
Perhaps surprisingly, the corresponding group of the invertible bounded oper-
ators on a separable (complex) infinite-dimensional Hilbert space is contractible.
This is Kuiper’s theorem, and means that this group, GL(H), has no ‘topology’ at
all, no holes in any dimension and for topological purposes it is like a big open ball.
The proof is not really hard, but it is not exactly obvious either. It depends on
an earlier idea, ‘Eilenberg’s swindle’ - it is an unusual name for a theorem - which
shows how the infinite-dimensionality is exploited. As you can guess, this is sort
of amusing (if you have the right attitude . . . ). The proof I give here is due to B.
Mityagin, [3].
Let’s denote by GL(H) this group. In view of the open mapping theorem we
know that
(3.267) GL(H) = {A ∈ B(H); A is injective and surjective}.
Contractibility means precisely that there is a continuous map
$$\gamma:[0,1]\times GL(H)\longrightarrow GL(H)\ \text{s.t.}\ \gamma(0,A)=A,\ \gamma(1,A)=\mathrm{Id}\quad\forall\ A\in GL(H). \tag{3.268}$$
Continuity here means for the metric space [0, 1] × GL(H) where the metric comes
from the norms on R and B(H).
I will only show ‘weak contractibility’ of GL(H). This has nothing to do with
weak convergence, rather just means that we only look for an homotopy over com-
pact sets.
As a warm-up exercise, let us show that the group GL(H) is contractible to
the unitary subgroup
(3.269) U(H) = {U ∈ GL(H); U −1 = U ∗ }.
These are the isometric isomorphisms.
Proposition 3.28. There is a continuous map
$$\Gamma:[0,1]\times GL(H)\longrightarrow GL(H)\ \text{s.t.}\ \Gamma(0,A)=A,\ \Gamma(1,A)\in U(H)\quad\forall\ A\in GL(H). \tag{3.270}$$
Proof. This is a consequence of the functional calculus, giving the ‘polar
decomposition’ of invertible (and more generally bounded) operators. Namely, if
$A\in GL(H)$ then $AA^*\in GL(H)$ is self-adjoint. Its spectrum is then contained in
an interval $[a,b]$, where $0<a\le b=\|A\|^2$. It follows from what we showed earlier
that $R=(AA^*)^{1/2}$ is a well-defined bounded self-adjoint operator and $R^2=AA^*$.
Moreover, $R$ is invertible and the operator $U_A=R^{-1}A\in U(H)$. Certainly it is
bounded and $U_A^*=A^*R^{-1}$, so $U_A^*U_A=A^*R^{-2}A=\mathrm{Id}$ since $R^{-2}=(AA^*)^{-1}=(A^*)^{-1}A^{-1}$. Thus $U_A^*$ is a left inverse of $U_A$, and (since $U_A$ is a bijection) is the
unique inverse, so $U_A\in U(H)$. So we have shown that $A=RU_A$, and then
$$\Gamma(s,A)=(s\,\mathrm{Id}+(1-s)R)U_A,\quad s\in[0,1] \tag{3.271}$$
satisfies (3.270).
There is however the issue of continuity of this map. Continuity in $s$ is clear
enough but we also need to show that the map
$$GL(H)\ni A\longmapsto(A^*A)^{1/2}\in GL(H) \tag{3.272}$$
is continuous; applied to $A^*$ this gives the continuity of $A\longmapsto R=(AA^*)^{1/2}$.

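Certainly the map $A\longmapsto A^*A$ is (norm) continuous. Suppose $A_n\to A$ in
$GL(H)$; then given $\epsilon>0$ there is a polynomial $p$ such that
$$\|(B^*B)^{1/2}-p(B^*B)\|\le\epsilon/3\ \text{for}\ B=A,\ A_n\quad\forall\ n, \tag{3.273}$$
since the spectra of all these operators lie in a fixed interval $[0,M]$ (the $A_n$ being bounded in norm), so this follows from the Weierstrass approximation theorem. On the other hand $(A_n^*A_n)^k\to(A^*A)^k$ for any $k$ so
$$\|(A^*A)^{1/2}-(A_n^*A_n)^{1/2}\|\le\|(A^*A)^{1/2}-p(A^*A)\|+\|p(A^*A)-p(A_n^*A_n)\|+\|(A_n^*A_n)^{1/2}-p(A_n^*A_n)\|<\epsilon \tag{3.274}$$
for $n$ large.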
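The polar-decomposition homotopy (3.271) can be traced numerically for a matrix. A minimal sketch, assuming NumPy; the square root is computed by diagonalizing the positive matrix $AA^*$ with `numpy.linalg.eigh` (our choice of method, via the hypothetical helper `sqrtm_psd`), and we check $\Gamma(0,A)=A$, $\Gamma(1,A)=U_A$, that $U_A$ is unitary and that $\Gamma(s,A)$ stays invertible on a grid of $s$.

```python
import numpy as np


def sqrtm_psd(M):
    """Square root of a positive (semi)definite Hermitian matrix via eigh."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T


rng = np.random.default_rng(13)
n = 5
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))   # invertible (generically)
R = sqrtm_psd(A @ A.conj().T)            # R = (A A*)^{1/2}
U_A = np.linalg.solve(R, A)              # U_A = R^{-1} A


def gamma(s):
    """Gamma(s, A) = (s Id + (1 - s) R) U_A, as in (3.271)."""
    return (s * np.eye(n) + (1 - s) * R) @ U_A


# Smallest singular value along the path: positive means invertible throughout.
min_sv = min(np.linalg.svd(gamma(s), compute_uv=False).min() for s in np.linspace(0, 1, 21))
```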

So, for any compact subset $X\subset GL(H)$ we seek a continuous map
$$\gamma:[0,1]\times X\longrightarrow GL(H)\ \text{s.t.}\ \gamma(0,A)=A,\ \gamma(1,A)=\mathrm{Id}\quad\forall\ A\in X; \tag{3.275}$$
note that this is not contractibility of $X$, but of $X$ in $GL(H)$.
In fact, to carry out the construction without having to worry about too many
things at once, just consider (path) connectedness of GL(H) meaning that there is
a continuous map as in (3.275) where X = {A} just consists of one point – so the
map is just γ : [0, 1] −→ GL(H) such that γ(0) = A, γ(1) = Id .
The construction of γ is in three stages
(1) Creating a gap
(2) Rotating to a trivial factor
(3) Eilenberg’s swindle.
Lemma 3.25 (Creating a gap). If $B\in B(H)$ and $\epsilon>0$ is given there is a
decomposition $H=H_K\oplus H_L\oplus H_O$ into three closed mutually orthogonal infinite-dimensional subspaces such that if $Q_I$ is the orthogonal projection onto $H_I$ for
$I=K,L,O$ then
$$\|Q_LBQ_K\|<\epsilon. \tag{3.276}$$
Proof. Choose an orthonormal basis $e_j$, $j\in\mathbb N$, of $H$. The subspaces $H_I$ will
be determined by a corresponding decomposition
$$\mathbb N=K\cup L\cup O,\qquad K\cap L=K\cap O=L\cap O=\emptyset. \tag{3.277}$$
Thus $H_I$ has orthonormal basis $e_k$, $k\in I$, $I=K,L,O$. To ensure (3.276) we choose
the decomposition (3.277) so that all three sets are infinite and, writing $K=\{k_1<k_2<\cdots\}$ and $L=\{l_1<l_2<\cdots\}$,
$$|(e_{l_s},Be_{k_r})|<2^{-r-s}\epsilon\quad\forall\ r,s\in\mathbb N. \tag{3.278}$$
Once we have this, then for $u\in H$, $Q_Ku\in H_K$ can be expanded as $\sum_{k\in K}(u,e_k)e_k$
and expanding in $H_L$ similarly,
$$\begin{aligned}Q_LBQ_Ku&=\sum_{l\in L}(BQ_Ku,e_l)e_l=\sum_s\sum_r(Be_{k_r},e_{l_s})(u,e_{k_r})e_{l_s}\\\implies\|Q_LBQ_Ku\|^2&\le\sum_s\Big(\sum_r|(Be_{k_r},e_{l_s})|^2\Big)\Big(\sum_r|(u,e_{k_r})|^2\Big)\le\epsilon^2\sum_{r,s}4^{-r-s}\|u\|^2=\frac{\epsilon^2}{9}\|u\|^2\end{aligned} \tag{3.279}$$
by the Cauchy-Schwarz inequality in the sum over $r$, giving (3.276). The absolute convergence of the series follows from (3.278).
Thus, it remains to find a decomposition (3.277) for which (3.278) holds. This
follows from Bessel’s inequality, which shows that $(Be_k,e_l)\to0$ as $l\to\infty$ for each fixed $k$, and $(e_l,Be_k)=(B^*e_l,e_k)\to0$ as $k\to\infty$ for each fixed $l$. First choose $k_1=1$; then $|(e_l,Be_1)|<\epsilon/4$ for $l$ large enough and we take such an $l_1>2k_1$. Then we use
induction on $N$, choosing $K(N)$, $L(N)$ and $O(N)$ with
$K(N)=\{k_1=1<k_2<\cdots<k_N\}$,
$L(N)=\{l_1<l_2<\cdots<l_N\}$, $l_r>2k_r$, $k_r>l_{r-1}$ for $1<r\le N$, and
$O(N)=\{1,\dots,l_N\}\setminus(K(N)\cup L(N))$,
such that (3.278) holds for $r,s\le N$. Now, choose $k_{N+1}>l_N$ such that $|(e_{l_s},Be_{k_{N+1}})|<2^{-(N+1)-s}\epsilon$ for $s=1,\dots,N$, and
then $l_{N+1}>2k_{N+1}$ such that $|(e_{l_{N+1}},Be_{k_r})|<2^{-r-(N+1)}\epsilon$ for $r=1,\dots,N+1$; the inductive hypothesis then follows with $K(N+1)=K(N)\cup\{k_{N+1}\}$ and $L(N+1)=L(N)\cup\{l_{N+1}\}$. Since $l_r>2k_r\ge k_r+1$, each $k_r+1$ lies in $O$, so all three sets are infinite.

Given a fixed operator $A\in GL(H)$, Lemma 3.25 can be applied to $A$ (in place of $B$) with $\epsilon=\|A^{-1}\|^{-1}$. It then follows, from the convergence of the Neumann series, that the
curve
$$A(s)=A-sQ_LAQ_K,\quad s\in[0,1] \tag{3.280}$$
lies in $GL(H)$ and has endpoint satisfying
$$Q_LBQ_K=0,\ B=A(1),\qquad Q_LQ_K=0=Q_KQ_L,\ Q_K=Q_K^2,\ Q_L=Q_L^2 \tag{3.281}$$
where all three projections, $Q_L$, $Q_K$ and $\mathrm{Id}-Q_K-Q_L$, have infinite rank.
These three projections give an identification of $H$ with $H\oplus H\oplus H$ and so
replace the bounded operators by $3\times3$ matrices with entries which are bounded
operators on $H$. The condition (3.281) means that
$$B=\begin{pmatrix}B_{11}&B_{12}&B_{13}\\0&B_{22}&B_{23}\\B_{31}&B_{32}&B_{33}\end{pmatrix},\qquad Q_K=\begin{pmatrix}1&0&0\\0&0&0\\0&0&0\end{pmatrix},\qquad Q_L=\begin{pmatrix}0&0&0\\0&1&0\\0&0&0\end{pmatrix}. \tag{3.282}$$
So, now we have a ‘little hole’. Under the conditions (3.281) consider
$$P=BQ_KB^{-1}(\mathrm{Id}-Q_L). \tag{3.283}$$
The condition $Q_LBQ_K=0$ and the definition show that $Q_LP=0=PQ_L$. Moreover, since $(\mathrm{Id}-Q_L)BQ_K=BQ_K$,
$$P^2=BQ_KB^{-1}(\mathrm{Id}-Q_L)BQ_KB^{-1}(\mathrm{Id}-Q_L)=BQ_KB^{-1}BQ_KB^{-1}(\mathrm{Id}-Q_L)=P.$$
So, $P$ is a projection which acts on the range of $\mathrm{Id}-Q_L$; from its definition, the
range of $P$ is contained in the range of $BQ_K$. Since
$$PBQ_K=BQ_KB^{-1}(\mathrm{Id}-Q_L)BQ_K=BQ_K$$
it follows that $P$ is a projection onto the range of $BQ_K$.
The next part of the proof can be thought of as a result on 3 × 3 matrices
but applied to a decomposition of Hilbert space. First, observe a little result on
rotations.
Lemma 3.26. If $P$ and $Q$ are projections on a Hilbert space with $PQ=QP=0$
and $M=MP=QM$ restricts to an isomorphism from the range of $P$ to the range
of $Q$ with ‘inverse’ $M'=M'Q=PM'$ (so $M'M=P$ and $MM'=Q$), then
$$[-\pi/2,\pi/2]\ni\theta\longmapsto R(\theta)=\cos\theta\,P+\sin\theta\,M-\sin\theta\,M'+\cos\theta\,Q+(\mathrm{Id}-P-Q) \tag{3.284}$$
is a path in the space of invertible operators such that
$$R(0)P=P,\qquad R(\pi/2)P=M. \tag{3.285}$$
Proof. Computing directly, using $(P+Q)(M-M')=(M-M')(P+Q)=M-M'$ and $(M-M')^2=-(P+Q)$, gives $R(\theta)R(-\theta)=\mathrm{Id}$, from which the invertibility follows, as does (3.285).
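The direct computation can be checked on a block model in which $\operatorname{Ran}(P)$, $\operatorname{Ran}(Q)$ and the remaining summand are each a copy of $\mathbb C^2$, so $H=\mathbb C^6$. A minimal sketch, assuming NumPy; the invertible block `S0` is an arbitrary choice. Note that $R(\pi/2)P=MP=M$.

```python
import numpy as np

rng = np.random.default_rng(14)
k = 2                                    # Ran(P), Ran(Q) and the rest are copies of C^2
Z, I2 = np.zeros((k, k)), np.eye(k)

P = np.block([[I2, Z, Z], [Z, Z, Z], [Z, Z, Z]])
Q = np.block([[Z, Z, Z], [Z, I2, Z], [Z, Z, Z]])
S0 = rng.standard_normal((k, k)) + 3 * I2            # an (arbitrary) invertible block
M = np.block([[Z, Z, Z], [S0, Z, Z], [Z, Z, Z]])      # M = MP = QM
Mp = np.block([[Z, np.linalg.inv(S0), Z], [Z, Z, Z], [Z, Z, Z]])  # M' = M'Q = PM'


def R(theta):
    """The path R(theta) of (3.284)."""
    c, s = np.cos(theta), np.sin(theta)
    return c * P + s * M - s * Mp + c * Q + (np.eye(3 * k) - P - Q)
```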

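We have shown above that the projection $P$ has range equal to the range of
$BQ_K$; apply Lemma 3.26 with $Q=Q_L$ and $M=S(BQ_K)^{-1}P$, where $S$ is a fixed isomorphism
of the range of $Q_K$ to the range of $Q_L$ and $(BQ_K)^{-1}$ denotes the inverse of $BQ_K$ as a map from the range of $Q_K$ onto its range. Then
$$L_1(\theta)=R(\theta)B\ \text{has}\ L_1(0)=B,\ L_1(\pi/2)=B'\ \text{with}\ B'Q_K=Q_LSQ_K \tag{3.286}$$
an isomorphism onto the range of $Q_L$.
Next apply Lemma 3.26 again but for the projections $Q_K$ and $Q_L$ with the
isomorphism $S$ and its inverse $S'$, giving, with $Q_O=\mathrm{Id}-Q_K-Q_L$,
$$R'(\theta)=\cos\theta\,Q_K+\sin\theta\,S-\sin\theta\,S'+\cos\theta\,Q_L+Q_O. \tag{3.287}$$
Then the curve of invertibles
$$L_2(\theta)=R'(-\theta)B',\ \theta\in[0,\pi/2],\ \text{has}\ L_2(0)=B',\ L_2(\pi/2)=B'',\ B''Q_K=Q_K.$$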
So, we have succeeded, by successive homotopies through invertible elements, in
arriving at an operator
$$B''=\begin{pmatrix}\mathrm{Id}&E\\0&F\end{pmatrix} \tag{3.288}$$
where we are looking at the decomposition of $H=H\oplus H$ according to the projections $Q_K$ and $\mathrm{Id}-Q_K$. The invertibility of this is equivalent to the invertibility of
$F$ and the homotopy
$$B''(s)=\begin{pmatrix}\mathrm{Id}&(1-s)E\\0&F\end{pmatrix} \tag{3.289}$$
connects it to
$$L=\begin{pmatrix}\mathrm{Id}&0\\0&F\end{pmatrix},\qquad(B''(s))^{-1}=\begin{pmatrix}\mathrm{Id}&-(1-s)EF^{-1}\\0&F^{-1}\end{pmatrix} \tag{3.290}$$
through invertibles.
The final step is ‘Eilenberg’s swindle’. In (3.290) we arrived at a family of
operators on $H \oplus H$. Reversing the factors, we can consider
$$\begin{pmatrix} F & 0 \\ 0 & \mathrm{Id} \end{pmatrix}. \tag{3.291}$$
Eilenberg’s idea is to connect this to the identity by an explicit curve which ‘only uses $F$’ and so is uniform in parameters. So for the moment just take $F$ to be a fixed unitary operator on $H$.
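We use several isomorphisms involving $H$ and $l^2(H)$, which are of course isomorphic as separable Hilbert spaces. First consider the two simple ‘rotations’ on $H \oplus H$
$$\begin{pmatrix} \mathrm{Id}\cos t & F\sin t \\ -F^{-1}\sin t & \mathrm{Id}\cos t \end{pmatrix}, \quad \begin{pmatrix} F\cos t & F\sin t \\ -F^{-1}\sin t & F^{-1}\cos t \end{pmatrix}. \tag{3.292}$$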
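These are both unitary norm-continuous curves; the first starts at the identity, is off-diagonal at $t = \pi/2$ and is equal to the second at that point. So by following the first curve from $t = 0$ to $t = \pi/2$ and then running the second one backwards from $t = \pi/2$ to $t = 0$ we connect
$$\mathrm{Id} = \begin{pmatrix} \mathrm{Id} & 0 \\ 0 & \mathrm{Id} \end{pmatrix} \text{ to } \begin{pmatrix} F & 0 \\ 0 & F^{-1} \end{pmatrix}. \tag{3.293}$$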
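The two curves in (3.292) can be checked in the simplest possible model, where $H = \mathbb{C}$ and $F$ is a complex number of modulus one (so $F^{-1} = \overline{F}$). This is only an illustration under that assumption; for a genuine unitary operator $F$ the same computation goes through, since only $F$, $F^{-1} = F^*$ and scalars appear. The sketch checks unitarity along both curves and the three endpoint identities used to obtain (3.293).

```python
import cmath
import math

def mul2(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def adjoint2(A):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(A[c][r]).conjugate() for c in range(2)] for r in range(2)]

def close2(A, B, tol=1e-12):
    return all(abs(A[r][c] - B[r][c]) < tol for r in range(2) for c in range(2))

def first_curve(F, t):
    """First matrix in (3.292), with F a unit complex number."""
    c, s = math.cos(t), math.sin(t)
    return [[c, F * s], [-s / F, c]]

def second_curve(F, t):
    """Second matrix in (3.292)."""
    c, s = math.cos(t), math.sin(t)
    return [[F * c, F * s], [-s / F, c / F]]

F = cmath.exp(0.7j)   # a sample unitary on H = C; any |F| = 1 would do
Id2 = [[1, 0], [0, 1]]
```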
Now, we can also identify $H \oplus H$ with $l^2(H \oplus H)$. So an element of this space is an $l^2$ sequence with values in $H \oplus H$, and the identity just acts as the identity on each $2 \times 2$ block. We can perform the to-and-fro rotation in (3.292) in each block. That this is actually a norm-continuous curve acting on $l^2(H \oplus H)$ is a consequence of the fact that it is ‘the same’ in each block: it is a sequence of operators, each on $H \oplus H$, which are continuous, and uniformly so with respect to the index $i$ of the sequence in $l^2$; so the whole operator is continuous. This connects $\mathrm{Id}$ to the second matrix in (3.293) acting in each block of $l^2(H \oplus H)$.
So, here is one part of the swindle: we can reorder the space so that it becomes $l^2(H)$, where now the operator is diagonal but with alternating entries
$$\mathrm{Diag}(F^{-1}, F, F^{-1}, F, \dots) \text{ on } l^2(H). \tag{3.294}$$
This requires just a unitary isomorphism corresponding to relabelling the basis elements.
Now, go back to the operator (3.291) and look at the lower right identity entry, acting on $H$. We can identify the $H$ in this spot with $l^2(H)$, and then we have a curve linking this entry to (3.294). For the whole operator this gives a norm-continuous curve connecting
$$\begin{pmatrix} F & 0 \\ 0 & \mathrm{Id} \end{pmatrix} \text{ to } \mathrm{Diag}(F, F^{-1}, F, F^{-1}, F, \dots) \text{ on } H \oplus l^2(H), \tag{3.295}$$
just adding the first entry. But now we reverse the procedure, using $F^{-1}$ in place of $F$, so the end-point in (3.295) is connected to the identity on $l^2(H) = H$!
The fact that this construction only uses $F$ itself and $2 \times 2$ matrices means that it works uniformly when $F$ depends continuously on parameters in a compact set. So we have constructed a curve as desired in (3.275), and hence we have proved:
Theorem 3.6 (Kuiper). For any compact subset $X \subset \mathrm{GL}(H)$ there is a retraction $\gamma$ as in (3.275).
Note that it follows from a result of Milnor (on CW complexes) that in this case
contractibility follows from weak contractibility. If you are topologically inclined
you might like to look up some applications of Kuiper’s Theorem, for instance that the projective unitary group is a classifying space for degree-two integral cohomology, i.e. an Eilenberg-MacLane space $K(\mathbb{Z}, 2)$.
CHAPTER 4

Differential and Integral operators

In the last part of the course some more concrete analytic questions are considered. First the completeness of the Fourier basis is shown; this is one of the settings from which the notion of a Hilbert space originates. The index formula for Toeplitz operators on Hardy space is then derived. Next, operator methods are used to demonstrate the uniqueness of the solutions to the Cauchy problem. The completeness of the eigenbasis for ‘Sturm-Liouville’ theory is then deduced from the spectral theorem. The Fourier transform is examined and used to prove the completeness of the Hermite basis for $L^2(\mathbb{R})$. Once one has all this, one can do a lot more, but there is no time left. Such is life.

1. Fourier series
Let us now try applying our knowledge of Hilbert space to a concrete Hilbert
space such as L2 (a, b) for a finite interval (a, b) ⊂ R. Any such interval with b > a
can be mapped by a linear transformation onto (0, 2π) and so we work with this
special interval. You showed that L2 (a, b) is indeed a Hilbert space. One of the
reasons for developing Hilbert space techniques originally was precisely the following
result.
Theorem 4.1. If $u \in L^2(0, 2\pi)$ then the Fourier series of $u$,
$$\frac{1}{2\pi}\sum_{k \in \mathbb{Z}} c_k e^{ikx}, \quad c_k = \int_{(0,2\pi)} u(x)e^{-ikx}\,dx, \tag{4.1}$$
converges in $L^2(0, 2\pi)$ to $u$.
Notice that this does not say the series converges pointwise, or pointwise almost everywhere. In fact it is true that the Fourier series of a function in $L^2(0, 2\pi)$ converges almost everywhere to $u$, but it is hard to prove! It is an important result of L. Carleson. Here we are just claiming that
$$\lim_{n\to\infty} \int \Big|u(x) - \frac{1}{2\pi}\sum_{|k|\le n} c_k e^{ikx}\Big|^2 = 0 \tag{4.2}$$
for any $u \in L^2(0, 2\pi)$.
Our abstract Hilbert space theory has put us quite close to proving this. First observe that if $e'_k(x) = \exp(ikx)$ then these elements of $L^2(0, 2\pi)$ satisfy
$$\int_0^{2\pi} e'_k\overline{e'_j} = \int_0^{2\pi} \exp(i(k-j)x) = \begin{cases} 0 & \text{if } k \ne j \\ 2\pi & \text{if } k = j. \end{cases} \tag{4.3}$$
Thus the functions
$$e_k = \frac{e'_k}{\|e'_k\|} = \frac{1}{\sqrt{2\pi}}e^{ikx} \tag{4.4}$$
form an orthonormal set in $L^2(0, 2\pi)$. It follows that (4.1) is just the Fourier-Bessel series for $u$ with respect to this orthonormal set:
$$c_k = \sqrt{2\pi}\,(u, e_k) \Longrightarrow \frac{1}{2\pi}c_k e^{ikx} = (u, e_k)e_k. \tag{4.5}$$
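The orthonormality (4.3)-(4.4) and the identity $c_k = \sqrt{2\pi}\,(u, e_k)$ in (4.5) are easy to confirm numerically. The minimal Python sketch below uses an equally spaced rectangle rule, which integrates $e^{imx}$ over $[0, 2\pi]$ exactly (up to rounding) whenever $|m|$ is smaller than the number of nodes; the sample function $u(x) = 3 + 2\cos x$, for which $c_1 = 2\pi$, is an arbitrary choice for illustration.

```python
import cmath
import math

def inner(f, g, n=64):
    """(f, g) = int_0^{2 pi} f(x) conj(g(x)) dx via the n-point rectangle rule.

    For products of exponentials e^{imx} with |m| < n the rule is exact up to
    rounding, which is all it is used for here."""
    h = 2 * math.pi / n
    return sum(f(j * h) * complex(g(j * h)).conjugate() for j in range(n)) * h

def e(k):
    """e_k(x) = exp(ikx) / sqrt(2 pi), as in (4.4)."""
    return lambda x: cmath.exp(1j * k * x) / math.sqrt(2 * math.pi)

def u(x):
    """Sample trigonometric polynomial 3 + 2 cos x, whose coefficient c_1 is 2 pi."""
    return 3 + 2 * math.cos(x)
```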

So, we already know that this series converges in $L^2(0, 2\pi)$, thanks to Bessel’s inequality. So ‘all’ we need to show is
Proposition 4.1. The $e_k$, $k \in \mathbb{Z}$, form an orthonormal basis of $L^2(0, 2\pi)$, i.e. they are complete:
$$\int u e^{ikx} = 0 \ \forall\, k \implies u = 0 \text{ in } L^2(0, 2\pi). \tag{4.6}$$

This, however, is not so trivial to prove. An equivalent statement is that the finite linear span of the $e_k$ is dense in $L^2(0, 2\pi)$. I will prove this using Fejér’s method.
In this approach, we check that any continuous function on [0, 2π] satisfying the
additional condition that u(0) = u(2π) is the uniform limit on [0, 2π] of a sequence
in the finite span of the ek . Since uniform convergence of continuous functions cer-
tainly implies convergence in L2 (0, 2π) and we already know that the continuous
functions which vanish near 0 and 2π are dense in L2 (0, 2π) this is enough to prove
Proposition 4.1. However the proof is a serious piece of analysis, at least it seems so
to me! There are other approaches, for instance we could use the Stone-Weierstrass
Theorem; rather than do this we will deduce the Stone-Weierstrass Theorem from
Proposition 4.1. Another good reason to proceed directly is that Fejér’s approach
is clever and generalizes in various ways as we will see.
So, the problem is to find the sequence in the span of the $e_k$ which converges to a given continuous function, and the trick is to use the Fourier expansion that we want to check! The idea of Cesàro is close to one we have seen before, namely to make this Fourier expansion ‘converge faster’, or maybe better. For the moment we can work with a general function $u \in L^2(0, 2\pi)$, or think of it as continuous if you prefer. The truncated Fourier series of $u$ is a finite linear combination of the $e_k$:
$$U_n(x) = \frac{1}{2\pi}\sum_{|k|\le n}\Big(\int_{(0,2\pi)} u(t)e^{-ikt}\,dt\Big)e^{ikx} \tag{4.7}$$
where I have just inserted the definition of the $c_k$’s into the sum. Since this is a finite sum we can treat $x$ as a parameter and use the linearity of the integral to write it as
$$U_n(x) = \int_{(0,2\pi)} D_n(x-t)u(t)\,dt, \quad D_n(s) = \frac{1}{2\pi}\sum_{|k|\le n} e^{iks}. \tag{4.8}$$

Now this sum can be written as an explicit quotient, since, by telescoping,
$$2\pi D_n(s)(e^{is/2} - e^{-is/2}) = e^{i(n+\frac12)s} - e^{-i(n+\frac12)s}. \tag{4.9}$$
So in fact, at least where $s \ne 0$,
$$D_n(s) = \frac{e^{i(n+\frac12)s} - e^{-i(n+\frac12)s}}{2\pi(e^{is/2} - e^{-is/2})} \tag{4.10}$$
and the limit as $s \to 0$ exists just fine.
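Since $e^{i\phi} - e^{-i\phi} = 2i\sin\phi$, the quotient (4.10) is the same as $\sin((n+\frac12)s)/(2\pi\sin(s/2))$, and its limit at $s = 0$ is $(2n+1)/2\pi$. Here is a short Python sketch comparing this closed form with the defining sum in (4.8) at a few sample points (chosen arbitrarily, away from $s = 0$), and checking the value at $s = 0$.

```python
import cmath
import math

def dirichlet_sum(n, s):
    """D_n(s) = (1/2pi) sum_{|k|<=n} e^{iks}, summed directly (the sum is real)."""
    return sum(cmath.exp(1j * k * s) for k in range(-n, n + 1)).real / (2 * math.pi)

def dirichlet_closed(n, s):
    """The quotient (4.10), written as sin((n+1/2)s) / (2 pi sin(s/2)); needs sin(s/2) != 0."""
    return math.sin((n + 0.5) * s) / (2 * math.pi * math.sin(s / 2))
```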
As I said, Cesàro’s idea is to speed up the convergence by replacing $U_n$ by its average
$$V_n(x) = \frac{1}{n+1}\sum_{l=0}^{n} U_l. \tag{4.11}$$

Again plugging in the definitions of the $U_l$’s and using the linearity of the integral we see that
$$V_n(x) = \int_{(0,2\pi)} S_n(x-t)u(t)\,dt, \quad S_n(s) = \frac{1}{n+1}\sum_{l=0}^{n} D_l(s). \tag{4.12}$$

So again we want to compute a more useful form for $S_n(s)$, which is the Fejér kernel. Since the denominators in (4.10) are all the same,
$$2\pi(n+1)(e^{is/2} - e^{-is/2})S_n(s) = \sum_{l=0}^{n} e^{i(l+\frac12)s} - \sum_{l=0}^{n} e^{-i(l+\frac12)s}. \tag{4.13}$$

Using the same trick again,
$$(e^{is/2} - e^{-is/2})\sum_{l=0}^{n} e^{i(l+\frac12)s} = e^{i(n+1)s} - 1 \tag{4.14}$$
so
$$2\pi(n+1)(e^{is/2} - e^{-is/2})^2 S_n(s) = e^{i(n+1)s} + e^{-i(n+1)s} - 2 \implies S_n(s) = \frac{1}{n+1}\,\frac{\sin^2\big(\tfrac{n+1}{2}s\big)}{2\pi\sin^2\big(\tfrac{s}{2}\big)}. \tag{4.15}$$
Now, what can we say about this function? One thing we know immediately is that if we plug $u = 1$ into the discussion above, we get $U_n = 1$ for $n \ge 0$ and hence $V_n = 1$ as well. Thus in fact
$$\int_{(0,2\pi)} S_n(x - \cdot) = 1, \quad \forall\, x \in (0, 2\pi). \tag{4.16}$$
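The closed form (4.15), the positivity $S_n \ge 0$, the normalisation (4.16) and the decay estimate (4.17) can all be spot-checked numerically. In this sketch $S_n$ is computed both as the average (4.12) of Dirichlet kernels and from (4.15). The integral over $(0, 2\pi)$ uses a midpoint rule, which is exact up to rounding here because $S_n$ is a trigonometric polynomial of degree $n$; the value $\delta = 0.5$ and the sample points are arbitrary choices.

```python
import math

def dirichlet(l, s):
    """D_l(s) via (4.10); at s in 2 pi Z use the limiting value (2l+1)/(2 pi)."""
    if abs(math.sin(s / 2)) < 1e-14:
        return (2 * l + 1) / (2 * math.pi)
    return math.sin((l + 0.5) * s) / (2 * math.pi * math.sin(s / 2))

def fejer_average(n, s):
    """S_n(s) as the average (4.12) of D_0, ..., D_n."""
    return sum(dirichlet(l, s) for l in range(n + 1)) / (n + 1)

def fejer_closed(n, s):
    """Closed form (4.15)."""
    return math.sin((n + 1) * s / 2) ** 2 / ((n + 1) * 2 * math.pi * math.sin(s / 2) ** 2)

def fejer_integral(n, m=4000):
    """Midpoint-rule value of the integral of S_n over (0, 2 pi)."""
    h = 2 * math.pi / m
    return sum(fejer_closed(n, (j + 0.5) * h) for j in range(m)) * h
```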

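Looking directly at (4.15), the first thing to notice is that $S_n(s) \ge 0$. Also, we can see that the denominator only vanishes when $s = 0$ or $s = 2\pi$ in $[0, 2\pi]$. Thus if we stay away from there, say $s \in (\delta, 2\pi - \delta)$ for some $\delta > 0$, then, since the numerator $\sin^2(\frac{n+1}{2}s)$ is bounded by $1$,
$$|S_n(s)| \le (n+1)^{-1}C_\delta \text{ on } (\delta, 2\pi - \delta), \quad C_\delta = \frac{1}{2\pi\sin^2(\delta/2)}, \tag{4.17}$$
and, since $S_n$ is even, the same bound holds on $(-2\pi + \delta, -\delta)$.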
We are interested in how close $V_n(x)$ is to the given $u(x)$ in supremum norm, where now we will take $u$ to be continuous. Because of (4.16) we can write
$$u(x) = \int_{(0,2\pi)} S_n(x-t)u(x) \tag{4.18}$$
where $t$ denotes the variable of integration (and $x$ is fixed in $[0, 2\pi]$). This ‘trick’ means that the difference is
$$V_n(x) - u(x) = \int_{(0,2\pi)} S_n(x-t)(u(t) - u(x)). \tag{4.19}$$
For each $x$ we split this integral into two parts: the set $\Gamma(x)$ where $|x - t| \le \delta$ or $|x - t| \ge 2\pi - \delta$, and the remainder. So
$$|V_n(x) - u(x)| \le \int_{\Gamma(x)} S_n(x-t)|u(t) - u(x)| + \int_{(0,2\pi)\setminus\Gamma(x)} S_n(x-t)|u(t) - u(x)|. \tag{4.20}$$
Now on $\Gamma(x)$ either $|t - x| \le \delta$, so the points are close together, or $t$ is close to $0$ and $x$ to $2\pi$, so $2\pi - x + t \le \delta$, or conversely $x$ is close to $0$ and $t$ to $2\pi$, so $2\pi - t + x \le \delta$. In any case, by assuming that $u(0) = u(2\pi)$ and using the uniform continuity of a continuous function on $[0, 2\pi]$, given $\epsilon > 0$ we can choose $\delta$ so small that
$$|u(x) - u(t)| \le \epsilon/2 \text{ on } \Gamma(x). \tag{4.21}$$
On the complement of $\Gamma(x)$ we have (4.17), and since $u$ is bounded we get the estimate
$$|V_n(x) - u(x)| \le \frac{\epsilon}{2}\int_{\Gamma(x)} S_n(x-t) + (n+1)^{-1}q(\delta) \le \frac{\epsilon}{2} + (n+1)^{-1}q(\delta) \tag{4.22}$$
where $q(\delta) = 2\sin(\delta/2)^{-2}\sup|u|$ is a positive constant depending on $\delta$ (and $u$). Here the fact that $S_n$ is non-negative and has integral one has been used again to bound the integral of $S_n(x-t)$ over $\Gamma(x)$ by $1$. Having chosen $\delta$ to make the first term small, we can choose $n$ large to make the second term small, and it follows that
$$V_n(x) \to u(x) \text{ uniformly on } [0, 2\pi] \text{ as } n \to \infty \tag{4.23}$$
under the assumption that $u \in C([0, 2\pi])$ satisfies $u(0) = u(2\pi)$.
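To see (4.23) in action, take the continuous function $u(x) = |x - \pi|$, which satisfies $u(0) = u(2\pi)$. Its coefficients in (4.1) can be computed by hand (substitute $y = x - \pi$): $c_0 = \pi^2$, $c_k = 4/k^2$ for odd $k$ and $c_k = 0$ for even $k \ne 0$. Averaging $U_0, \dots, U_n$ as in (4.11) multiplies $c_k$ by $1 - |k|/(n+1)$. The Python sketch below (the sample function and the grid are arbitrary choices) evaluates $V_n$ on a grid and records the largest error, which should decrease as $n$ grows.

```python
import math

def coeff(k):
    """c_k = int_0^{2 pi} |x - pi| e^{-ikx} dx for k >= 0, computed by hand."""
    if k == 0:
        return math.pi ** 2
    return 4.0 / k ** 2 if k % 2 else 0.0

def fejer_mean(n, x):
    """V_n(x) = sum_{|k|<=n} (1 - |k|/(n+1)) c_k e^{ikx} / (2 pi).

    Here c_{-k} = c_k, so the k and -k terms combine into 2 c_k cos(kx)."""
    total = coeff(0) / (2 * math.pi)
    for k in range(1, n + 1):
        total += 2 * (1 - k / (n + 1)) * coeff(k) * math.cos(k * x) / (2 * math.pi)
    return total

def sup_error(n, points=401):
    """Largest |V_n - u| over an equally spaced grid on [0, 2 pi]."""
    xs = [2 * math.pi * j / (points - 1) for j in range(points)]
    return max(abs(fejer_mean(n, x) - abs(x - math.pi)) for x in xs)

errors = [sup_error(n) for n in (8, 32, 128)]
```

The errors do go down, but slowly; this is consistent with the fact that $u$ has corners, so one should not expect fast convergence here.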
So this proves Proposition 4.1, subject to the density in $L^2(0, 2\pi)$ of the continuous functions which vanish near (but not of course in a fixed neighbourhood of) the ends. In fact we know that the $L^2$ functions which vanish near the ends are dense, since we can chop off and use the fact that
$$\lim_{\delta\to 0}\Big(\int_{(0,\delta)} |f|^2 + \int_{(2\pi-\delta,2\pi)} |f|^2\Big) = 0. \tag{4.24}$$
This proves Theorem 4.1.


Notice that from what we have shown it follows that the finite linear combinations of the $\exp(ikx)$ are dense in the subspace of $C([0, 2\pi])$ consisting of the functions with equal values at the ends. Taking a general element $u \in C([0, 2\pi])$ we can choose constants so that
$$v = u - c - dx \in C([0, 2\pi]) \text{ satisfies } v(0) = v(2\pi) = 0. \tag{4.25}$$
Indeed we just need to take $c = u(0)$, $d = (u(2\pi) - c)/2\pi$. Then we know that $v$ is the uniform limit of a sequence of finite sums of the $\exp(ikx)$. However, the Taylor series
$$e^{ikx} = \sum_{l} \frac{(ik)^l}{l!}x^l \tag{4.26}$$
converges uniformly to $e^{ikx}$ in any (complex) disk. So it follows in turn that the polynomials are dense:
Theorem 4.2 (Stone-Weierstrass). The polynomials are dense in C([a, b]) for
any a < b, in the uniform topology.
Make sure you understand the change of variable argument to get to a general
(finite) interval.

2. Toeplitz operators
Although the convergence of Fourier series was stated above for functions on an interval $(0, 2\pi)$, it can be immediately reinterpreted in terms of periodic functions on the line, or equivalently functions on the circle $\mathbb{S}$. Namely a $2\pi$-periodic function
$$u : \mathbb{R} \longrightarrow \mathbb{C}, \quad u(x + 2\pi) = u(x)\ \forall\, x \in \mathbb{R} \tag{4.27}$$
is uniquely determined by its restriction to $[0, 2\pi)$, by just iterating to see that
$$u(x + 2\pi k) = u(x), \quad x \in [0, 2\pi),\ k \in \mathbb{Z}. \tag{4.28}$$
Conversely a function on $[0, 2\pi)$ determines a $2\pi$-periodic function this way. Thus a function on the circle
$$\mathbb{S} = \{z \in \mathbb{C} : |z| = 1\} \tag{4.29}$$
is the same as a periodic function on the line in terms of the standard angular variable
$$\mathbb{S} \ni z = e^{i\theta}, \quad \theta \in [0, 2\pi). \tag{4.30}$$
In particular we can identify $L^2(\mathbb{S})$ with $L^2(0, 2\pi)$ in this way, since the missing end-point corresponds to a set of measure zero. Equivalently this identifies $L^2(\mathbb{S})$ as the locally square-integrable functions on $\mathbb{R}$ which are $2\pi$-periodic.
Since $\mathbb{S}$ is a compact Lie group (what is that you say? Look it up!) this brings us into the realm of harmonic analysis. Just restating the results above: for any $u \in L^2(\mathbb{S})$ the Fourier series (thinking of each $\exp(ik\theta)$ as a $2\pi$-periodic function on the line) converges in $L^2(I)$ for any bounded interval $I$,
$$u(x) = \sum_{k\in\mathbb{Z}} a_k e^{ikx}, \quad a_k = \frac{1}{2\pi}\int_{(0,2\pi)} u(x)e^{-ikx}\,dx. \tag{4.31}$$

After this adjustment of attitude, we follow G.H. Hardy (you might enjoy “A Mathematician’s Apology”) in thinking about:
Definition 4.1. Hardy space is
$$H = \{u \in L^2(\mathbb{S});\ a_k = 0\ \forall\, k < 0\}. \tag{4.32}$$
There are lots of reasons to be interested in $H \subset L^2(\mathbb{S})$, but for the moment note that it is a closed subspace, since it is the intersection of the null spaces of the continuous linear functionals $L^2(\mathbb{S}) \ni u \longmapsto a_k$, $k < 0$. Thus there is a unique orthogonal projection
$$\pi_H : L^2(\mathbb{S}) \longrightarrow H \tag{4.33}$$
with range $H$.
If we go back to the definition of $L^2(\mathbb{S})$ we can see that a continuous function $\alpha \in C(\mathbb{S})$ defines a bounded linear operator on $L^2(\mathbb{S})$ by multiplication. It is invertible if and only if $\alpha(\theta) \ne 0$ for all $\theta \in [0, 2\pi)$, which is the same as saying that $\alpha$ is a continuous map
$$\alpha : \mathbb{S} \longrightarrow \mathbb{C}^* = \mathbb{C} \setminus \{0\}. \tag{4.34}$$
For such a map there is a well-defined ‘winding number’ giving the number of times that the curve in the plane defined by $\alpha$ goes around the origin. This is easy
to define using the properties of the logarithm. Suppose that $\alpha$ is once continuously differentiable and consider
$$\frac{1}{2\pi i}\int_{[0,2\pi]} \alpha^{-1}\frac{d\alpha}{d\theta}\,d\theta = \mathrm{wn}(\alpha). \tag{4.35}$$
If we can write
$$\alpha = \exp(2\pi i f(\theta)) \tag{4.36}$$
with $f : [0, 2\pi] \longrightarrow \mathbb{C}$ continuous, then necessarily $f$ is differentiable and
$$\mathrm{wn}(\alpha) = \int_0^{2\pi} \frac{df}{d\theta}\,d\theta = f(2\pi) - f(0) \in \mathbb{Z} \tag{4.37}$$
since $\alpha(0) = \alpha(2\pi)$ gives $\exp(2\pi i(f(0) - f(2\pi))) = 1$. In fact, even for a general $\alpha \in C(\mathbb{S}; \mathbb{C}^*)$, it is always possible to find a continuous $f$ satisfying (4.36), using the standard properties of the logarithm as a local inverse to $\exp$, which is only determined up to the addition of integral multiples of $2\pi i$ (so $f$ is only determined up to an additive integer). Then the winding number is given by the last expression in (4.37) and is independent of the choice of $f$.
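Formula (4.37) suggests a simple way to compute $\mathrm{wn}(\alpha)$ numerically: follow a continuous branch of $\arg\alpha$ along a fine grid and divide the total change by $2\pi$. The sketch below does this by summing principal-value phases of the ratios $\alpha(\theta_{j+1})/\alpha(\theta_j)$. It is only reliable when the grid is fine enough that each step changes the argument by less than $\pi$, which is an assumption on the sample functions used here.

```python
import cmath
import math

def winding_number(alpha, m=2000):
    """Approximate wn(alpha) for alpha: [0, 2 pi] -> C \\ {0} with alpha(0) = alpha(2 pi).

    Each principal-branch increment lies in [-pi, pi]; their sum tracks the
    change f(2 pi) - f(0) of a continuous choice of f with alpha = exp(2 pi i f),
    multiplied by 2 pi, provided every step is small."""
    h = 2 * math.pi / m
    total = 0.0
    for j in range(m):
        total += cmath.phase(alpha((j + 1) * h) / alpha(j * h))
    return round(total / (2 * math.pi))
```

For instance $e^{-2i\theta} + \frac12 e^{i\theta} = e^{-2i\theta}(1 + \frac12 e^{3i\theta})$ has winding number $-2$, since the second factor stays in the right half-plane.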
Definition 4.2. A Toeplitz operator on $H$ is an operator of the form
$$T_\alpha = \pi_H\alpha\pi_H : H \longrightarrow H, \quad \alpha \in C(\mathbb{S}). \tag{4.38}$$
The result I want is one of the first ‘geometric index theorems’; it is a very simple case of the celebrated Atiyah-Singer index theorem (which it much predates).
Theorem 4.3 (Toeplitz). If $\alpha \in C(\mathbb{S}; \mathbb{C}^*)$ then the Toeplitz operator (4.38) is Fredholm (on the Hardy space $H$) with index
$$\mathrm{ind}(T_\alpha) = -\mathrm{wn}(\alpha) \tag{4.39}$$
given in terms of the winding number of $\alpha$.
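Proof. First we need to show that $T_\alpha$ is indeed a Fredholm operator. To do this we decompose the original, multiplication, operator into four pieces
$$\alpha = T_\alpha + \pi_H\alpha(\mathrm{Id}-\pi_H) + (\mathrm{Id}-\pi_H)\alpha\pi_H + (\mathrm{Id}-\pi_H)\alpha(\mathrm{Id}-\pi_H) \tag{4.40}$$
which you can think of as a $2 \times 2$ matrix corresponding to writing
$$L^2(\mathbb{S}) = H \oplus H_-, \quad H_- = (\mathrm{Id}-\pi_H)L^2(\mathbb{S}) = H^\perp, \quad \alpha = \begin{pmatrix} T_\alpha & \pi_H\alpha(\mathrm{Id}-\pi_H) \\ (\mathrm{Id}-\pi_H)\alpha\pi_H & (\mathrm{Id}-\pi_H)\alpha(\mathrm{Id}-\pi_H) \end{pmatrix}. \tag{4.41}$$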
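Now, we will show that the two ‘off-diagonal’ terms are compact operators (on $L^2(\mathbb{S})$). Consider first $(\mathrm{Id}-\pi_H)\alpha\pi_H$. It was shown above, as a form of the Stone-Weierstrass Theorem, that the finite Fourier sums are dense in $C(\mathbb{S})$ in the supremum norm. This is not the convergence of the Fourier series, but there is a sequence $\alpha_k \to \alpha$ in supremum norm, where each
$$\alpha_k = \sum_{j=-N_k}^{N_k} a_{kj}e^{ij\theta}. \tag{4.42}$$
Since multiplication by $\beta \in C(\mathbb{S})$ has norm at most $\sup|\beta|$ on $L^2(\mathbb{S})$, and $\pi_H$ and $\mathrm{Id}-\pi_H$ have norm at most one, it follows that
$$\|(\mathrm{Id}-\pi_H)\alpha_k\pi_H - (\mathrm{Id}-\pi_H)\alpha\pi_H\|_{\mathcal{B}(L^2(\mathbb{S}))} \to 0. \tag{4.43}$$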
Now by (4.42) each $(\mathrm{Id}-\pi_H)\alpha_k\pi_H$ is a finite linear combination of terms
$$(\mathrm{Id}-\pi_H)e^{ij\theta}\pi_H, \quad |j| \le N_k. \tag{4.44}$$
However, each of these operators is of finite rank. They actually vanish if $j \ge 0$, and for $j < 0$ the rank is exactly $-j$. So each $(\mathrm{Id}-\pi_H)\alpha_k\pi_H$ is of finite rank and hence, as a norm limit of finite-rank operators, $(\mathrm{Id}-\pi_H)\alpha\pi_H$ is compact. A very similar argument works for $\pi_H\alpha(\mathrm{Id}-\pi_H)$ (or you can use adjoints).
Now, again assume that $\alpha$ does not vanish anywhere. Then the whole multiplication operator in (4.40) is invertible. If we remove the two compact terms we see that
$$T_\alpha + (\mathrm{Id}-\pi_H)\alpha(\mathrm{Id}-\pi_H) \text{ is Fredholm} \tag{4.45}$$
since the Fredholm operators are stable under addition of compact operators. Here the first part maps $H$ to $H$ and the second maps $H_-$ to $H_-$. It follows that the null space and range of $T_\alpha$ are the projections of the null space and range of the sum (4.45), so it must have finite-dimensional null space and closed range with a finite-dimensional complement as a map from $H$ to itself:
$$\alpha \in C(\mathbb{S}; \mathbb{C}^*) \implies T_\alpha \text{ is Fredholm in } \mathcal{B}(H). \tag{4.46}$$
So it remains to compute its index. Note that the index of the sum (4.45) acting on $L^2(\mathbb{S})$ vanishes, so that does not really help! The key here is the stability of both the index and the winding number.
Lemma 4.1. If $\alpha \in C(\mathbb{S}; \mathbb{C}^*)$ has winding number $p \in \mathbb{Z}$ then there is a curve
$$\alpha_t : [0, 1] \longrightarrow C(\mathbb{S}; \mathbb{C}^*), \quad \alpha_1 = \alpha, \quad \alpha_0 = e^{ip\theta}. \tag{4.47}$$
Proof. If you take a continuous function $f : [0, 2\pi] \longrightarrow \mathbb{C}$ then
$$\alpha = \exp(2\pi i f) \in C(\mathbb{S}; \mathbb{C}^*) \iff f(2\pi) = f(0) + p, \ p \in \mathbb{Z} \tag{4.48}$$
(so that $\alpha(2\pi) = \alpha(0)$), where $p$ is precisely the winding number of $\alpha$. So to construct a continuous family as in (4.47) we can deform $f$ instead, provided we keep the difference between the end values constant. Clearly
$$\alpha_t = \exp(2\pi i f_t), \quad f_t(\theta) = p\frac{\theta}{2\pi}(1-t) + f(\theta)t, \quad t \in [0, 1] \tag{4.49}$$
does this since $f_t(0) = f(0)t$, $f_t(2\pi) = p(1-t) + f(2\pi)t = f(0)t + p$, $f_0 = p\frac{\theta}{2\pi}$, $f_1(\theta) = f(\theta)$. □
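Lemma 4.1 can also be watched numerically: starting from an $f$ with $f(2\pi) = f(0) + p$, the family (4.49) keeps the winding number equal to $p$ for every $t$ and ends at $e^{ip\theta}$. In the sketch below the particular $f$ (with an arbitrary oscillating part and a small imaginary part, so that $|\alpha|$ is not constant) and the grid size are illustrative choices. The winding number is computed, as in (4.37), by following the argument along a fine grid, which assumes each grid step changes the argument by much less than $\pi$.

```python
import cmath
import math

def winding_number(alpha, m=4000):
    """wn(alpha) from summed principal-branch argument increments (a discretised (4.37))."""
    h = 2 * math.pi / m
    return round(sum(cmath.phase(alpha((j + 1) * h) / alpha(j * h))
                     for j in range(m)) / (2 * math.pi))

p = 2

def f(theta):
    """A sample continuous f on [0, 2 pi] with f(2 pi) = f(0) + p."""
    return p * theta / (2 * math.pi) + 0.3 * math.sin(3 * theta) + 0.1j * math.cos(theta)

def alpha_t(t):
    """alpha_t = exp(2 pi i f_t), f_t(theta) = p theta (1 - t)/(2 pi) + f(theta) t, as in (4.49)."""
    def alpha(theta):
        ft = p * theta * (1 - t) / (2 * math.pi) + f(theta) * t
        return cmath.exp(2j * math.pi * ft)
    return alpha
```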

It was shown above that the index of a Fredholm operator is constant on the components, so along any norm-continuous curve such as $T_{\alpha_t}$ where $\alpha_t$ is as in (4.47); this curve is norm continuous since $\|T_\alpha - T_\beta\| \le \sup|\alpha - \beta|$. Thus the index of $T_\alpha$, where $\alpha$ has winding number $p$, is the same as the index of the Toeplitz operator defined by $\exp(ip\theta)$, which has the same winding number (note that the winding number is also constant under deformations of $\alpha$). So we are left to compute the index of the operator $\pi_He^{ip\theta}\pi_H$ acting on $H$. This is just a shift by $p$ in the basis $\exp(ij\theta)$, $j \ge 0$. If $p \le 0$ it is actually surjective and has null space spanned by the $\exp(ij\theta)$ with $0 \le j < -p$, since these are mapped to $\exp(i(j+p)\theta)$ and hence killed by $\pi_H$. Thus indeed the index of $T_\alpha$ for $\alpha = \exp(ip\theta)$ is $-p$ in this case. For $p > 0$ we can take the adjoint, so we have proved Theorem 4.3. □
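The final counting step can be written out explicitly. In the basis $\exp(ij\theta)$, $j \ge 0$, of $H$, the operator $\pi_He^{ip\theta}\pi_H$ sends $\exp(ij\theta)$ to $\exp(i(j+p)\theta)$ when $j + p \ge 0$ and to $0$ otherwise, so its null space and the complement of its range are spanned by explicitly known basis vectors. The hypothetical helper below only scans finitely many indices, which suffices as long as the cutoff exceeds $|p|$.

```python
def toeplitz_shift_index(p, cutoff=50):
    """Index of pi_H e^{ip theta} pi_H on H, counted on the basis e^{ij theta}, j >= 0.

    Null space: the e^{ij theta} with j + p < 0.
    Complement of the range: the e^{im theta} with m - p < 0, since no j >= 0
    has j + p = m.  All such basis vectors have index below |p| <= cutoff."""
    if cutoff <= abs(p):
        raise ValueError("cutoff must exceed |p|")
    kernel = [j for j in range(cutoff) if j + p < 0]
    cokernel = [m for m in range(cutoff) if m - p < 0]
    return len(kernel) - len(cokernel)
```

For every $p$ this returns $-p$, which is $-\mathrm{wn}(e^{ip\theta})$, as (4.39) asserts.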
Why is this important? Suppose you have a function α ∈ C(S; C∗ ) and you
know it has winding number −k for k ∈ N. Then you know that the operator Tα
must have null space at least of dimension k. It could be bigger but this is an
existence theorem hence useful. The index is generally relatively easy to compute
and from that one can tell quite a lot about a Fredholm operator.

3. Cauchy problem
Most, if not all, of you will have had a course on ordinary differential equations
so the results here are probably familiar to you at least in outline. I am not going
to try to push things very far but I will use the Cauchy problem to introduce ‘weak
solutions’ of differential equations.
So, here is a form of the Cauchy problem. Let me stick to the standard interval we have been using but as usual it does not matter. So we are interested in solutions $u$ of the equation, for some positive integer $k$,
$$\begin{gathered} Pu(x) = \frac{d^ku}{dx^k}(x) + \sum_{j=0}^{k-1} a_j(x)\frac{d^ju}{dx^j}(x) = f(x) \text{ on } [0, 2\pi] \\ \frac{d^ju}{dx^j}(0) = 0, \quad j = 0, \dots, k-1 \\ a_j \in C^j([0, 2\pi]), \quad j = 0, \dots, k-1. \end{gathered} \tag{4.50}$$
So, the $a_j$ are fixed (corresponding if you like to some physical system), $u$ is the ‘unknown’ function and $f$ is also given. Recall that $C^j([0, 2\pi])$ is the space (complex valued here) of functions on $[0, 2\pi]$ which have $j$ continuous derivatives. The middle line consists of the ‘homogeneous’ Cauchy conditions, also called initial conditions, where homogeneous just means zero. The general case of non-zero initial conditions follows from this one.
If we want the equation to make ‘classical sense’ we need to assume, for instance, that $u$ has continuous derivatives up to order $k$ and $f$ is continuous. I have written out the first term, involving the highest order of differentiation, in (4.50) separately to suggest the following observation. Suppose $u$ is just $k$ times differentiable, but without assuming the $k$th derivative is continuous. The equation still makes sense, but if we assume that $f$ is continuous then it actually follows that $u$ is $k$ times continuously differentiable. In fact each of the terms in the sum is continuous, since this only involves derivatives up to order $k-1$ multiplied by continuous functions. We can (mentally if you like) move these to the right side of the equation, so together with $f$ this becomes a continuous function. But then the equation itself implies that $\frac{d^ku}{dx^k}$ is continuous, and so $u$ is actually $k$ times continuously differentiable. This is a rather trivial example of ‘elliptic’ regularity which we will push much further.
So, the problem is to prove
Theorem 4.4. For each f ∈ C([0, 2π]) there is a unique k times continuously
differentiable solution, u, to (4.50).
Note that in general there is no way of ‘writing the solution down’. We can
show it exists, and is unique, and we can say a lot about it but there is no formula
– although we will see that it is the sum of a reasonable series.
How to proceed? There are many ways, but to adopt the one I want to use I need to manipulate the equation in (4.50). There is a certain discriminatory property of the way I have written the equation. Although it seems rather natural, writing the ‘coefficients’ $a_j$ on the left involves an element of ‘handism’, if that is a legitimate concept. Instead we could try for the ‘rightist’ approach and look at the similar equation
$$\begin{gathered} \frac{d^ku}{dx^k}(x) + \sum_{j=0}^{k-1}\frac{d^j(b_j(x)u)}{dx^j}(x) = f(x) \text{ on } [0, 2\pi] \\ \frac{d^ju}{dx^j}(0) = 0, \quad j = 0, \dots, k-1 \\ b_j \in C^j([0, 2\pi]), \quad j = 0, \dots, k-1. \end{gathered} \tag{4.51}$$
As already written in (4.50), we think of $P$ as an operator, sending $u$ to this sum.
Lemma 4.2. For any functions $a_j \in C^j([0, 2\pi])$ there are unique functions $b_j \in C^j([0, 2\pi])$ so that (4.51) gives the same operator as (4.50).
Proof. Here we can simply write down a formula for the $b_j$ in terms of the $a_j$. Namely the product rule for derivatives means that
$$\frac{d^j(b_j(x)u)}{dx^j} = \sum_{p=0}^{j}\binom{j}{p}\frac{d^{j-p}b_j}{dx^{j-p}}\cdot\frac{d^pu}{dx^p}. \tag{4.52}$$
If you are not quite confident that you know this, you do know it for $j = 1$, which is just the usual product rule. So proceed by induction over $j$ and observe that the formula for $j+1$ follows from the formula for $j$ using the properties of the binomial coefficients.
Pulling out the coefficients of a fixed derivative of $u$ shows that we need the $b_j$ to satisfy
$$a_p = b_p + \sum_{j=p+1}^{k-1}\binom{j}{p}\frac{d^{j-p}b_j}{dx^{j-p}}. \tag{4.53}$$
This shows the uniqueness, since we can recover the $a_j$ from the $b_j$. On the other hand we can solve (4.53) for the $b_j$ too. The ‘top’ equation says $a_{k-1} = b_{k-1}$, and then successive equations determine $b_p$ in terms of $a_p$ and the $b_j$ with $j > p$, which we already know iteratively.
Note that $b_p \in C^p([0, 2\pi])$ for each $p$, by downward induction, since $d^{j-p}b_j/dx^{j-p} \in C^p([0, 2\pi])$ when $b_j \in C^j([0, 2\pi])$ and $j > p$. □
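The triangular system (4.53) is easy to solve mechanically. Here is a minimal sketch, assuming for simplicity that the coefficients $a_j$ and the test function $u$ are polynomials, stored as lists of integer coefficients with the lowest degree first, so that every derivative is exact; the helper names are hypothetical. It computes the $b_j$ from the $a_j$ top-down, as in the proof, and then checks that the forms (4.50) and (4.51) of $P$ agree on a sample $u$.

```python
from math import comb

def trim(p):
    """Drop trailing zero coefficients (so equal polynomials compare equal)."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)])

def scale(c, p):
    return trim([c * x for x in p])

def mul(p, q):
    if not p or not q:
        return []
    out = [0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return trim(out)

def deriv(p, m=1):
    """m-th derivative of a polynomial."""
    for _ in range(m):
        p = trim([i * p[i] for i in range(1, len(p))])
    return p

def apply_left(a, u):
    """Pu in the form (4.50): u^(k) + sum_j a_j u^(j), with k = len(a)."""
    out = deriv(u, len(a))
    for j, aj in enumerate(a):
        out = add(out, mul(aj, deriv(u, j)))
    return out

def apply_right(b, u):
    """Pu in the form (4.51): u^(k) + sum_j (b_j u)^(j), with k = len(b)."""
    out = deriv(u, len(b))
    for j, bj in enumerate(b):
        out = add(out, deriv(mul(bj, u), j))
    return out

def left_to_right(a):
    """Solve (4.53) from the top: b_p = a_p - sum_{j>p} C(j, p) b_j^(j-p)."""
    k = len(a)
    b = [None] * k
    for p in range(k - 1, -1, -1):
        bp = trim(a[p])
        for j in range(p + 1, k):
            bp = add(bp, scale(-comb(j, p), deriv(b[j], j - p)))
        b[p] = bp
    return b
```

For example, with $k = 3$, $a_0 = 1 + 2x$, $a_1 = 3x^2$, $a_2 = 5 + x^2$ one gets $b_2 = a_2$, $b_1 = 3x^2 - 4x$ and $b_0 = 3 - 4x$.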
So, what has been achieved by ‘writing the coefficients on the right’? The important idea is that we can solve (4.50) in one particular case, namely when all the $a_j$ (or equivalently $b_j$) vanish. Then we would just integrate $k$ times. Let us denote Riemann integration by
$$I : C([0, 2\pi]) \longrightarrow C([0, 2\pi]), \quad If(x) = \int_0^x f(s)\,ds. \tag{4.54}$$
Of course we can also think of this as Lebesgue integration, and then we know for instance that
$$I : L^2(0, 2\pi) \longrightarrow C([0, 2\pi]) \tag{4.55}$$
is a bounded linear operator. Note also that
$$(If)(0) = 0 \tag{4.56}$$
satisfies the first of the Cauchy conditions.


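Now, we can apply the operator $I$ to (4.51) and repeat $k$ times. By the fundamental theorem of calculus
$$u \in C^j([0, 2\pi]),\ \frac{d^pu}{dx^p}(0) = 0,\ p = 0, \dots, j-1 \implies I^j\Big(\frac{d^ju}{dx^j}\Big) = u \tag{4.57}$$
(applied to $u$ and to each $b_ju$, whose derivatives of order less than $j$ vanish at $0$ by (4.52)). Thus (4.51) becomes
$$(\mathrm{Id} + B)u = u + \sum_{j=0}^{k-1} I^{k-j}(b_ju) = I^kf. \tag{4.58}$$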

Notice that this argument is reversible. Namely if $u \in C^k([0, 2\pi])$ satisfies (4.58) for $f \in C([0, 2\pi])$ then $u$ does indeed satisfy (4.51). In fact even more is true:
Proposition 4.2. The operator $\mathrm{Id} + B$ is invertible on $L^2(0, 2\pi)$, and if $f \in C([0, 2\pi])$ then $u = (\mathrm{Id} + B)^{-1}I^kf \in C^k([0, 2\pi])$ is the unique solution of (4.51).
Proof. From (4.58) we see that $B$ is given as a sum of operators of the form $I^p \circ b$, where $b$ is multiplication by a continuous function, also denoted $b \in C([0, 2\pi])$, and $p \ge 1$. Writing out $I^p$ as an iterated (Riemann) integral,
$$I^pv(x) = \int_0^x\int_0^{y_1}\cdots\int_0^{y_{p-1}} v(y_p)\,dy_p\cdots dy_1. \tag{4.59}$$

For the case of $p = 1$ we can write
$$(I \cdot b_{k-1})v(x) = \int_0^{2\pi}\beta_{k-1}(x,t)v(t)\,dt, \quad \beta_{k-1}(x,t) = H(x-t)b_{k-1}(t) \tag{4.60}$$
where the Heaviside function restricts the integrand to $t \le x$. Similarly in the next case, by reversing the order of integration,
$$\begin{aligned} (I^2 \cdot b_{k-2})v(x) &= \int_0^x\int_0^s b_{k-2}(t)v(t)\,dt\,ds \\ &= \int_0^x\int_t^x b_{k-2}(t)v(t)\,ds\,dt = \int_0^{2\pi}\beta_{k-2}(x,t)v(t)\,dt, \\ \beta_{k-2} &= (x-t)_+b_{k-2}(t). \end{aligned} \tag{4.61}$$
In general
$$(I^p \cdot b_{k-p})v(x) = \int_0^{2\pi}\beta_{k-p}(x,t)v(t)\,dt, \quad \beta_{k-p} = \frac{1}{(p-1)!}(x-t)_+^{p-1}b_{k-p}(t). \tag{4.62}$$
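The kernel in (4.62) is Cauchy's formula for repeated integration, with the coefficient evaluated at the integration variable $t$. Here is a quick numerical check, assuming the sample choices $b(t) = t$ and $v = 1$, for which $I^p(bv) = x^{p+1}/(p+1)!$ exactly; the quadrature is a plain midpoint rule, and the sample points are arbitrary.

```python
import math

def kernel_form(p, b, v, x, m=2000):
    """Midpoint rule for int_0^x (x - t)^(p-1) / (p-1)! * b(t) * v(t) dt, the kernel form (4.62)."""
    h = x / m
    total = 0.0
    for j in range(m):
        t = (j + 0.5) * h
        total += (x - t) ** (p - 1) / math.factorial(p - 1) * b(t) * v(t)
    return total * h

def ip_of_identity(p, x):
    """Exact value of I^p applied to the function t -> t, namely x^(p+1)/(p+1)!."""
    return x ** (p + 1) / math.factorial(p + 1)

def b(t):
    """Sample continuous coefficient."""
    return t

def v(t):
    """Sample function to act on."""
    return 1.0
```

Placing the coefficient at $x$ instead would give $x \cdot x^p/p!$, which is already wrong for $p = 2$, $x = 2$ ($4$ instead of $4/3$).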
The explicit formula here is not that important, but (throwing away a lot of information) all the $\beta_*(x,t)$ have the property that they are of the form
$$\beta(x,t) = H(x-t)e(x,t), \quad e \in C([0, 2\pi]^2). \tag{4.63}$$
This is a Volterra operator
$$Bv(x) = \int_0^{2\pi}\beta(x,t)v(t) \tag{4.64}$$
with $\beta$ as in (4.63).
So now the point is that for any Volterra operator $B$, $\mathrm{Id} + B$ is invertible on $L^2(0, 2\pi)$. In fact we can make a stronger statement, that
$$B \text{ Volterra} \implies \sum_j (-1)^jB^j \text{ converges in } \mathcal{B}(L^2(0, 2\pi)). \tag{4.65}$$
This is just the Neumann series, but notice we are not claiming that $\|B\| < 1$, which would give the convergence as we know from earlier. Rather the key is that the powers of $B$ behave very much like the operators $I^k$ computed above.
Lemma 4.3. For a Volterra operator in the sense of (4.63) and (4.64),
$$B^jv(x) = \int_0^{2\pi} H(x-t)e_j(x,t)v(t), \quad e_j \in C([0, 2\pi]^2), \quad |e_j| \le \frac{C^j}{(j-1)!}(x-t)_+^{j-1}, \quad j \ge 1. \tag{4.66}$$
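Proof. Proceeding inductively, we can assume (4.66) holds for a given $j$ (for $j = 1$ it holds with $e_1 = e$ and $C \ge \sup|e|$). Then $B^{j+1} = B \circ B^j$ is of the form in (4.66) with
$$\begin{aligned} e_{j+1}(x,t) &= \int_0^{2\pi} H(x-s)e(x,s)H(s-t)e_j(s,t)\,ds = \int_t^x e(x,s)e_j(s,t)\,ds, \\ |e_{j+1}(x,t)| &\le \sup|e|\int_t^x \frac{C^j}{(j-1)!}(s-t)_+^{j-1}\,ds \le \frac{C^{j+1}}{j!}(x-t)_+^j \end{aligned} \tag{4.67}$$
provided $C \ge \sup|e|$. □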
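The estimate (4.67) means that, for a different constant $C$ (an integral operator on $L^2(0, 2\pi)$ with bounded kernel $k$ has norm at most $2\pi\sup|k|$),
$$\|B^j\|_{L^2} \le \frac{C^j}{(j-1)!}, \quad j \ge 1, \tag{4.68}$$
which is summable, so the Neumann series (4.65) does converge.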
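Here is a minimal discretised illustration of (4.65), under the simplifying assumption $B = cI$ with a constant $c$ (a Volterra operator with $e \equiv c$). Then $(\mathrm{Id} + B)u = 1$ has the solution $u = e^{-cx}$, and the Neumann series is $\sum_j (-cx)^j/j!$; its terms grow before they decay, yet it converges because of the factorials in (4.66), even though $\|B\| > 1$ for $c = 2$ (testing $I$ on the constant function $1$ already shows $\|I\| > 3.6$ on $L^2(0, 2\pi)$). The integral is replaced by a cumulative trapezoidal rule, so the comparison with $e^{-cx}$ is only up to discretisation error.

```python
import math

def integrate(v, h):
    """Cumulative trapezoidal approximation of (Iv)(x_i) = int_0^{x_i} v(s) ds."""
    out = [0.0]
    for i in range(1, len(v)):
        out.append(out[-1] + 0.5 * h * (v[i - 1] + v[i]))
    return out

def neumann_solve(g, c, h, terms=80):
    """Partial sum of sum_j (-B)^j g for the Volterra operator B = c I,
    i.e. an approximate solution of (Id + B) u = g."""
    u = list(g)
    term = list(g)
    for _ in range(1, terms):
        term = [-c * t for t in integrate(term, h)]
        u = [ui + ti for ui, ti in zip(u, term)]
    return u

N = 2000
h = 2 * math.pi / N
xs = [i * h for i in range(N + 1)]
c = 2.0
u = neumann_solve([1.0] * (N + 1), c, h)
max_error = max(abs(ui - math.exp(-c * x)) for ui, x in zip(u, xs))
```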
To see the regularity of $u = (\mathrm{Id} + B)^{-1}I^kf$ when $f \in C([0, 2\pi])$, consider (4.58). Each of the terms in the sum maps $L^2(0, 2\pi)$ into $C([0, 2\pi])$, so $u \in C([0, 2\pi])$. Proceeding iteratively, for each $p = 0, \dots, k-1$, each of these terms $I^{k-j}(b_ju)$ maps $C^p([0, 2\pi])$ into $C^{p+1}([0, 2\pi])$, so $u \in C^k([0, 2\pi])$. Similarly for the Cauchy conditions. Differentiating (4.58) recovers (4.51). □
As indicated above, the case of non-vanishing Cauchy data follows from Theorem 4.4. Let
$$\Sigma : C^k([0, 2\pi]) \longrightarrow \mathbb{C}^k \tag{4.69}$$
denote the Cauchy data map, evaluating the function and its first $k-1$ derivatives at $0$.
Proposition 4.3. The combined map
$$(\Sigma, P) : C^k([0, 2\pi]) \longrightarrow \mathbb{C}^k \oplus C([0, 2\pi]) \tag{4.70}$$
is an isomorphism.
Proof. The map $\Sigma$ in (4.69) is certainly surjective, since it is surjective even on polynomials of degree $k-1$. Thus given $z \in \mathbb{C}^k$ there exists $v \in C^k([0, 2\pi])$ with $\Sigma v = z$. Now, given $f \in C([0, 2\pi])$, Theorem 4.4 allows us to find $w \in C^k([0, 2\pi])$ with $Pw = f - Pv$ and $\Sigma w = 0$. So $u = v + w$ satisfies $(\Sigma, P)u = (z, f)$ and we have shown the surjectivity of (4.70). The injectivity again follows from Theorem 4.4, so $(\Sigma, P)$ is a bijection and hence an isomorphism using the Open Mapping Theorem (or directly). □
Let me finish this discussion of the Cauchy problem by introducing the notion of a weak solution. Let $\Sigma_{2\pi} : C^k([0, 2\pi]) \longrightarrow \mathbb{C}^k$ be the evaluation of the Cauchy data at the top end of the interval. Then if $u \in C^k([0, 2\pi])$ satisfies $\Sigma u = 0$ and $v \in C^k([0, 2\pi])$ satisfies $\Sigma_{2\pi}v = 0$, there are no boundary terms in integration by parts for derivatives up to order $k$, and it follows that
$$\int_{(0,2\pi)} Pu\,\overline{v} = \int_{(0,2\pi)} u\,\overline{Qv}, \quad Qv = (-1)^k\frac{d^kv}{dx^k} + \sum_{j=0}^{k-1}(-1)^j\frac{d^j(\overline{a_j}v)}{dx^j}. \tag{4.71}$$
Thus $Q$ is another operator just like $P$, called the ‘formal adjoint’ of $P$.
If $Pu = f$ then (4.71) is just
$$\langle u, Qv\rangle_{L^2} = \langle f, v\rangle_{L^2} \ \forall\, v \in C^k([0, 2\pi]) \text{ with } \Sigma_{2\pi}v = 0. \tag{4.72}$$
The significant point here is that (4.72) makes sense even if $u, f \in L^2(0, 2\pi)$.
Definition 4.3. If $u, f \in L^2(0, 2\pi)$ satisfy (4.72) then $u$ is said to be a weak solution of (4.51).
Exercise 2. Prove that ‘weak=strong’, meaning that if $f \in C([0, 2\pi])$ and $u \in L^2(0, 2\pi)$ is a weak solution, i.e. satisfies (4.72), then in fact $u \in C^k([0, 2\pi])$ satisfies (4.51) ‘in the classical sense’.

4. Dirichlet problem on an interval


I want to do a couple more ‘serious’ applications of what we have done so far. There are many to choose from, and I will mention some more, but let me first consider the Dirichlet problem on an interval. I will choose the interval $[0, 2\pi]$ because we looked at it before, but of course we could work on a general bounded interval instead. So, we are supposed to be trying to solve
$$-\frac{d^2u(x)}{dx^2} + V(x)u(x) = f(x) \text{ on } (0, 2\pi), \quad u(0) = u(2\pi) = 0 \tag{4.73}$$
where the last part are the Dirichlet boundary conditions. I will assume that the ‘potential’
$$V : [0, 2\pi] \longrightarrow \mathbb{R} \text{ is continuous and real-valued.} \tag{4.74}$$
Now, it certainly makes sense to try to solve the equation (4.73) for, say, a given $f \in C([0, 2\pi])$, looking for a solution which is twice continuously differentiable on the interval. It may not exist, depending on $V$, but one thing we can shoot for, which has the virtue of being explicit, is the following:
Proposition 4.4. If $V \ge 0$ as in (4.74) then for each $f \in C([0, 2\pi])$ there exists a unique twice continuously differentiable solution, $u$, to (4.73).
There are in fact various approaches to this but we want to go through $L^2$ theory, not surprisingly of course. How to start?
Well, we do know how to solve (4.73) if $V \equiv 0$, since we can use (Riemann) integration. Thus, ignoring the boundary conditions for the moment, we can find a solution to $-d^2v/dx^2 = f$ on the interval by integrating twice:
$$v(x) = -\int_0^x\int_0^y f(t)\,dt\,dy \text{ satisfies } -d^2v/dx^2 = f \text{ on } (0, 2\pi). \tag{4.75}$$
Moreover $v$ really is twice continuously differentiable if $f$ is continuous. So, what has this got to do with operators? Well, we can change the order of integration in (4.75) to write $v$ as
$$v(x) = -\int_0^x\int_t^x f(t)\,dy\,dt = \int_0^{2\pi} a(x,t)f(t)\,dt, \quad a(x,t) = (t-x)H(x-t) \tag{4.76}$$
where the Heaviside function $H(y)$ is $1$ when $y \ge 0$ and $0$ when $y < 0$. Thus $a(x,t)$ is actually continuous on $[0, 2\pi] \times [0, 2\pi]$, since the $t - x$ factor vanishes at the jump in $H(x-t)$. So (4.76) shows that $v$ is given by applying an integral operator, with continuous kernel on the square, to $f$.
Before thinking more seriously about this, recall that there is also the matter of the boundary conditions. Clearly, v(0) = 0 since we integrated from there. On the other hand, there is no particular reason why
$$ v(2\pi) = \int_0^{2\pi} (t - 2\pi)f(t)\,dt \tag{4.77} $$
should vanish. However, we can always add to v any linear function and still satisfy the differential equation. Since we do not want to spoil the vanishing at x = 0, we can only afford to add cx, but if we choose the constant c correctly this will work. Namely consider
$$ c = \frac{1}{2\pi}\int_0^{2\pi} (2\pi - t)f(t)\,dt, \quad\text{then}\quad (v + cx)(2\pi) = 0. \tag{4.78} $$
So, finally, the solution we want is
$$ w(x) = \int_0^{2\pi} b(x,t)f(t)\,dt, \quad b(x,t) = \min(t,x) - \frac{tx}{2\pi} \in C([0,2\pi]^2), \tag{4.79} $$
with the formula for b following by simple manipulation from
$$ b(x,t) = a(x,t) + x - \frac{tx}{2\pi}. \tag{4.80} $$
Thus there is a twice continuously differentiable solution of −d²w/dx² = f in (0, 2π) which vanishes at both end points, and it is given by the integral operator (4.79). It is unique, since the difference of two such solutions has vanishing second derivative, so is linear, and vanishes at both end points.
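As a quick sanity check on (4.79), here is a minimal Python sketch (our own illustration; the midpoint quadrature and the node count are assumptions, not part of the notes) that evaluates w(x) = ∫ b(x, t)f(t) dt numerically. For f ≡ 1 the solution can be found by hand as w = πx − x²/2, and for f = sin(x/2) it is w = 4 sin(x/2); both satisfy −w'' = f and vanish at 0 and 2π, so they make convenient test cases.

```python
import math

TWO_PI = 2 * math.pi


def green(x, t):
    """Kernel b(x, t) = min(t, x) - t*x/(2 pi) from (4.79)."""
    return min(t, x) - t * x / TWO_PI


def solve_dirichlet(f, x, n=4000):
    """Approximate w(x) = int_0^{2 pi} b(x, t) f(t) dt by an n-point midpoint rule."""
    h = TWO_PI / n
    return h * sum(green(x, (i + 0.5) * h) * f((i + 0.5) * h) for i in range(n))
```

For instance, `solve_dirichlet(lambda t: 1.0, math.pi)` should come out close to π²/2 ≈ 4.935. The kernel has a kink on the diagonal t = x, which limits the accuracy of the midpoint rule there to roughly the square of the step size; that is ample for a check of this kind.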
Lemma 4.4. The integral operator (4.79) extends by continuity from C([0, 2π])
to a compact, self-adjoint operator on L2 (0, 2π).
Proof. Since w is given by an integral operator with a continuous real-valued kernel which is symmetric in the sense that (check it)
$$ b(t,x) = b(x,t), \tag{4.81} $$
we might as well give a more general result. □
Proposition 4.5. If b ∈ C([0, 2π]²) then
$$ Bf(x) = \int_0^{2\pi} b(x,t)f(t)\,dt \tag{4.82} $$
defines a compact operator on L²(0, 2π); if in addition b is real-valued and satisfies
$$ b(t,x) = b(x,t) \tag{4.83} $$
then B is self-adjoint.
128 4. DIFFERENTIAL AND INTEGRAL OPERATORS

Proof. If f ∈ L²((0, 2π)) and v ∈ C([0, 2π]) then the product vf ∈ L²((0, 2π)) and ‖vf‖_{L²} ≤ ‖v‖_∞‖f‖_{L²}. This can be seen, for instance, by taking an absolutely summable approximation to f, which gives a sequence of continuous functions f_n converging a.e. to f and bounded by a fixed L² function, and observing that vf_n → vf a.e. with bound a constant multiple, sup |v|, of that function. It follows that for b ∈ C([0, 2π]²) the product
$$ b(x,y)f(y) \in L^2(0,2\pi) \tag{4.84} $$
for each x ∈ [0, 2π]. Thus Bf(x) is well-defined by (4.82), since L²((0, 2π)) ⊂ L¹((0, 2π)).
Not only that, but Bf ∈ C([0, 2π]), as can be seen from the Cauchy-Schwarz inequality,
$$ |Bf(x') - Bf(x)| = \Big|\int (b(x',y) - b(x,y))f(y)\,dy\Big| \le \sup_y |b(x',y) - b(x,y)|\,(2\pi)^{1/2}\|f\|_{L^2}, \tag{4.85} $$
where the right side tends to 0 as x' → x by the uniform continuity of b on the compact square. Essentially the same estimate shows that
$$ \sup_x |Bf(x)| \le (2\pi)^{1/2}\sup_{(x,y)}|b|\,\|f\|_{L^2} \tag{4.86} $$
so indeed B: L²(0, 2π) → C([0, 2π]) is a bounded linear operator.


When b satisfies (4.83) and f and g are continuous,
$$ \int Bf(x)\,g(x)\,dx = \int f(x)\,Bg(x)\,dx \tag{4.87} $$
and the general case follows by approximation in L² by continuous functions.


So, we need to see the compactness. If we fix x then b(x, y) ∈ C([0, 2π]) and then, if we let x vary,
$$ [0,2\pi] \ni x \longmapsto b(x,\cdot) \in C([0,2\pi]) \tag{4.88} $$
is continuous as a map into this Banach space. Again this is the uniform continuity of a continuous function on a compact set, which shows that
$$ \sup_y |b(x',y) - b(x,y)| \to 0 \ \text{as } x' \to x. \tag{4.89} $$
Since the inclusion map C([0, 2π]) → L²((0, 2π)) is bounded, i.e. continuous, it follows that the map (I have reversed the variables)
$$ [0,2\pi] \ni y \longmapsto b(\cdot,y) \in L^2((0,2\pi)) \tag{4.90} $$
is continuous and so has a compact range.
Take the Fourier basis e_k for [0, 2π] and expand b in the first variable. Given ε > 0, the compactness of the image of (4.90) implies that the Fourier-Bessel series converges uniformly (has uniformly small tails), so for some N
$$ \sum_{|k|>N} |(b(x,y), e_k(x))|^2 < \epsilon \quad \forall\, y \in [0,2\pi]. \tag{4.91} $$
The finite part of the Fourier series is continuous as a function of both arguments,
$$ b_N(x,y) = \sum_{|k|\le N} e_k(x)c_k(y), \quad c_k(y) = (b(x,y), e_k(x)), \tag{4.92} $$

and so defines another bounded linear operator B_N as before. This operator can be written out as
$$ B_Nf(x) = \sum_{|k|\le N} e_k(x)\int c_k(y)f(y)\,dy \tag{4.93} $$
and so is of finite rank – it always takes values in the span of the first 2N + 1 trigonometric functions. On the other hand the remainder B − B_N is given by a similar operator with kernel q_N = b − b_N, and by (4.91) this satisfies
$$ \sup_y \|q_N(\cdot,y)\|_{L^2((0,2\pi))} \to 0 \ \text{as } N \to \infty. \tag{4.94} $$
Applying the Cauchy-Schwarz inequality in y and then integrating in x shows that the norm of B − B_N as an operator on L²((0, 2π)) is at most (2π)^{1/2} sup_y ‖q_N(·, y)‖_{L²}. Thus B − B_N has small norm, so B is compact – it is the norm limit of finite rank operators. □
Now, recall from Problem# that u_k = π^{−1/2} sin(kx/2), k ∈ N, is also an orthonormal basis for L²(0, 2π) (it is not the Fourier basis!). Moreover, differentiating we find straight away that
$$ -\frac{d^2u_k}{dx^2} = \frac{k^2}{4}u_k. \tag{4.95} $$
Since of course u_k(0) = 0 = u_k(2π) as well, from the uniqueness above we conclude that
$$ Bu_k = \frac{4}{k^2}u_k \quad \forall\, k. \tag{4.96} $$
Thus, in this case we know the orthonormal basis of eigenfunctions for B. They are the u_k, each eigenspace is one-dimensional and the eigenvalues are 4k^{−2}.
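The eigenvalue relation (4.96) can be checked numerically. The sketch below, a minimal illustration under our own choices of quadrature (midpoint rule), node count and sample points, applies B with the kernel b(x, t) = min(t, x) − tx/(2π) of (4.79) to u_k and reports the largest deviation of Bu_k from (4/k²)u_k.

```python
import math

TWO_PI = 2 * math.pi


def green(x, t):
    """Kernel b(x, t) = min(t, x) - t*x/(2 pi) of B, from (4.79)."""
    return min(t, x) - t * x / TWO_PI


def u(k, x):
    """Orthonormal sine basis u_k(x) = pi^(-1/2) sin(k x/2) on (0, 2 pi)."""
    return math.sin(k * x / 2) / math.sqrt(math.pi)


def apply_B(g, x, n=4000):
    """(B g)(x) = int_0^{2 pi} b(x, t) g(t) dt, approximated by an n-point midpoint rule."""
    h = TWO_PI / n
    return h * sum(green(x, (i + 0.5) * h) * g((i + 0.5) * h) for i in range(n))


def eigen_residual(k, xs=(0.7, 2.0, 4.1, 5.9)):
    """Largest value of |B u_k - (4/k^2) u_k| over the sample points xs."""
    return max(abs(apply_B(lambda t: u(k, t), x) - 4 / k ** 2 * u(k, x)) for x in xs)
```

A residual of order 10⁻⁶ or smaller for small k is consistent with (4.96); it only measures quadrature error.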
Remark 4.1. As noted by Pavel Etingof, it is worthwhile to go back to the discussion of trace class operators to see that B is indeed of trace class. Its trace can be computed in two ways: as the sum of its eigenvalues, Σ_k 4/k², and as the integral of its kernel over the diagonal,
$$ \int_0^{2\pi} b(x,x)\,dx = \int_0^{2\pi}\Big(x - \frac{x^2}{2\pi}\Big)dx = 2\pi^2 - \frac{4\pi^2}{3} = \frac{2\pi^2}{3}. $$
Equating the two gives the well-known formula
$$ \frac{\pi^2}{6} = \sum_{k\in\mathbb{N}}\frac{1}{k^2}. \tag{4.97} $$
This is a simple example of a ‘trace formula’; you might like to look up some others!
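The two computations of the trace in Remark 4.1 can be compared directly. In this sketch (truncation and quadrature parameters are our own choices) the diagonal integral of the kernel is done with the midpoint rule and the eigenvalue sum is cut off after finitely many terms; both should be close to 2π²/3, and a quarter of the eigenvalue sum approximates π²/6.

```python
import math

TARGET = 2 * math.pi ** 2 / 3  # common value of both computations of the trace of B


def trace_from_kernel(n=100000):
    """Midpoint rule for int_0^{2 pi} b(x, x) dx, where b(x, x) = x - x^2/(2 pi)."""
    h = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += x - x * x / (2 * math.pi)
    return h * total


def trace_from_eigenvalues(n_terms=100000):
    """Partial sum of the eigenvalues 4/k^2 of B; the neglected tail is below 4/n_terms."""
    return math.fsum(4.0 / (k * k) for k in range(1, n_terms + 1))
```

The eigenvalue sum converges slowly (the tail is of size about 4/n_terms), which is why the kernel side is the practical way to evaluate the trace.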
So, this happenstance allows us to decompose B as the square of another operator defined directly on the orthonormal basis. Namely
$$ Au_k = \frac{2}{k}u_k \implies B = A^2. \tag{4.98} $$
Here again it is immediate that A is a compact self-adjoint operator on L²(0, 2π), since its eigenvalues are real and tend to 0. In fact we can see quite a lot more than this.
Lemma 4.5. The operator A maps L2 (0, 2π) into C([0, 2π]) and Af (0) = Af (2π) =
0 for all f ∈ L2 (0, 2π).
Proof. If f ∈ L²(0, 2π) we may expand it in a Fourier-Bessel series in terms of the u_k and find
$$ f = \sum_k c_ku_k, \quad \{c_k\} \in l^2. \tag{4.99} $$

Then of course, by definition,
$$ Af = \sum_k \frac{2c_k}{k}u_k. \tag{4.100} $$

Here each u_k is a bounded continuous function, with the bound |u_k| ≤ C = π^{−1/2} being independent of k. So in fact (4.100) converges uniformly and absolutely, since it is uniformly Cauchy: for any q > p,
$$ \Big|\sum_{k=p}^{q}\frac{2c_k}{k}u_k\Big| \le 2C\sum_{k=p}^{q}|c_k|k^{-1} \le 2C\Big(\sum_{k=p}^{q}k^{-2}\Big)^{1/2}\|f\|_{L^2} \tag{4.101} $$
where Cauchy-Schwarz has been used. This proves that
$$ A: L^2(0,2\pi) \longrightarrow C([0,2\pi]) $$
is bounded and, by the uniform convergence, the fact that u_k(0) = u_k(2π) = 0 for all k implies that Af(0) = Af(2π) = 0. □
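Lemma 4.5 and the identity B = A² can be seen at work for f ≡ 1. Its coefficients are c_k = (1, u_k) = 4/(k√π) for odd k and 0 for even k, so A acts on the truncated expansion by multiplying the k-th coefficient by 2/k. The sketch below (the truncation length is an arbitrary choice) checks that A²1 reproduces B1 = πx − x²/2, the solution of −w'' = 1 vanishing at 0 and 2π, and that A1 vanishes at both end points.

```python
import math


def coeff_of_one(k):
    """c_k = (1, u_k) for u_k = pi^(-1/2) sin(k x/2): 4/(k sqrt(pi)) if k is odd, else 0."""
    return 4 / (k * math.sqrt(math.pi)) if k % 2 else 0.0


def apply_A_power(power, x, n_terms=20001):
    """Evaluate (A^power 1)(x) from the truncated expansion; A scales the k-th coefficient by 2/k."""
    total = 0.0
    for k in range(1, n_terms + 1):
        total += (2 / k) ** power * coeff_of_one(k) * math.sin(k * x / 2) / math.sqrt(math.pi)
    return total
```

The terms of A²1 decay like k⁻³ and those of A1 like k⁻², so both truncated series converge uniformly, as in the proof of Lemma 4.5.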
So, going back to our original problem, we try to solve (4.73) by moving the Vu term to the right side of the equation (don’t worry about regularity yet) and hope to use the observation that
$$ u = -A^2(Vu) + A^2f \tag{4.102} $$
should satisfy the equation and boundary conditions. In fact, let’s anticipate that u = Av, which has to be true if (4.102) holds with v = −AVu + Af, and look instead for
$$ v = -AVAv + Af \implies (\mathrm{Id} + AVA)v = Af. \tag{4.103} $$
So, we know that multiplication by V, which is real and continuous, is a bounded self-adjoint operator on L²(0, 2π). Thus AVA is a self-adjoint compact operator, so we can apply our spectral theory to it and so examine the invertibility of Id + AVA. Working in terms of a complete orthonormal basis of eigenfunctions e_i of AVA, we see that Id + AVA is invertible if and only if it has trivial null space, i.e. if −1 is not an eigenvalue of AVA. Indeed, an element of the null space would have to satisfy u = −AVAu. On the other hand we know that AVA is non-negative since, A being self-adjoint,
$$ (AVAw, w) = (VAw, Aw) = \int_{(0,2\pi)} V(x)|Aw|^2 \ge 0, \quad\text{so}\quad u = -AVAu \implies \int_{(0,2\pi)}|u|^2 = -(AVAu, u) \le 0, \tag{4.104} $$
using the assumed non-negativity of V. So, there can be no null space – all the eigenvalues τ_i of AVA are non-negative and the inverse is the bounded operator given by its action on the basis
$$ (\mathrm{Id} + AVA)^{-1}e_i = (1 + \tau_i)^{-1}e_i, \quad AVAe_i = \tau_ie_i. \tag{4.105} $$
Thus Id + AVA is invertible on L²(0, 2π), with inverse of the form Id + Q, Q again compact and self-adjoint since τ_i → 0 and so (1 + τ_i)^{−1} − 1 → 0. Now, to solve (4.103) we just need to take
$$ v = (\mathrm{Id} + Q)Af \iff v + AVAv = Af \ \text{in } L^2(0,2\pi). \tag{4.106} $$
Then indeed
$$ u = Av \ \text{satisfies}\ u + A^2Vu = A^2f. \tag{4.107} $$

In fact, since v ∈ L²(0, 2π) by (4.106), Lemma 4.5 shows that u = Av ∈ C([0, 2π]) and that u vanishes at the end points.
Moreover, if f ∈ C([0, 2π]) we know that Bf = A²f is twice continuously differentiable, since it is given by two integrations – that is where B came from. Now, we know that u ∈ L² satisfies u = −A²(Vu) + A²f. Since Vu ∈ L²((0, 2π)), A(Vu) ∈ L²(0, 2π) and then, as seen above, A(A(Vu)) is continuous. So, combining this with the result about A²f, we see that u itself is continuous and hence so is Vu. But then, going through the routine again,
$$ u = -A^2(Vu) + A^2f \tag{4.108} $$
is the sum of two twice continuously differentiable functions. Thus it is so itself. In fact, from the properties of B = A², it satisfies
$$ -\frac{d^2u}{dx^2} = -Vu + f \tag{4.109} $$
which is what the result claims. So, we have proved the existence part of Proposition 4.4.
The uniqueness follows pretty much the same way. If there were two twice continuously differentiable solutions then the difference w would satisfy
$$ -\frac{d^2w}{dx^2} + Vw = 0,\quad w(0) = w(2\pi) = 0 \implies w = -B(Vw) = -A^2Vw. \tag{4.110} $$
Thus w = Aφ with φ = −AVw ∈ L²(0, 2π). Thus φ in turn satisfies φ = −AVAφ and hence is a solution of (Id + AVA)φ = 0, which we know has no non-trivial solutions (assuming V ≥ 0). Since φ = 0, w = 0.
This completes the proof of Proposition 4.4. To summarize, what we have shown is that Id + AVA is an invertible bounded operator on L²(0, 2π) (if V ≥ 0) and then the solution to (4.73) is precisely
$$ u = A(\mathrm{Id} + AVA)^{-1}Af \tag{4.111} $$
which is twice continuously differentiable and satisfies the Dirichlet conditions for each f ∈ C([0, 2π]).
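The formula (4.111) also suggests a numerical method. The sketch below is a Galerkin truncation of our own, not a construction from the notes: keeping only the first n eigenfunctions u_k, A becomes the diagonal matrix with entries 2/k, multiplication by V becomes the matrix ((Vu_l, u_j)) computed by the midpoint rule, and we solve the truncated form of (Id + AVA)v = Af and set u = Av. For comparison it also computes a standard second-order finite-difference solution of −u'' + Vu = f with u(0) = u(2π) = 0. The potential V(x) = 1 + cos x ≥ 0 and right side f ≡ 1 mentioned below are hypothetical test data.

```python
import math

TWO_PI = 2 * math.pi


def u_basis(k, x):
    """Eigenfunctions of A: u_k(x) = pi^(-1/2) sin(k x/2), with A u_k = (2/k) u_k."""
    return math.sin(k * x / 2) / math.sqrt(math.pi)


def solve_dense(a, rhs):
    """Gaussian elimination with partial pivoting (adequate for small dense systems)."""
    n = len(rhs)
    m = [list(row) + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def galerkin_solution(V, f, n_modes=40, n_quad=1000):
    """Truncation of u = A (Id + A V A)^{-1} A f, see (4.111), to the first n_modes u_k."""
    h = TWO_PI / n_quad
    xs = [(i + 0.5) * h for i in range(n_quad)]                # midpoint nodes
    U = [[u_basis(k, x) for x in xs] for k in range(1, n_modes + 1)]
    Vx = [V(x) for x in xs]
    fx = [f(x) for x in xs]
    a = [2.0 / k for k in range(1, n_modes + 1)]               # diagonal of A
    c = [h * sum(p * q for p, q in zip(Uj, fx)) for Uj in U]   # (f, u_j)
    lhs = [[0.0] * n_modes for _ in range(n_modes)]
    for j in range(n_modes):
        for k in range(j, n_modes):
            v_jk = h * sum(w * p * q for w, p, q in zip(Vx, U[j], U[k]))  # (V u_k, u_j)
            lhs[j][k] = lhs[k][j] = a[j] * v_jk * a[k]
        lhs[j][j] += 1.0                                        # Id + A V A
    v = solve_dense(lhs, [a[j] * c[j] for j in range(n_modes)])
    coeffs = [a[j] * v[j] for j in range(n_modes)]             # u = A v
    return lambda x: sum(cj * u_basis(j + 1, x) for j, cj in enumerate(coeffs))


def finite_difference_solution(V, f, n=800):
    """Reference: -u'' + V u = f, u(0) = u(2 pi) = 0, by second-order differences."""
    h = TWO_PI / n
    xs = [i * h for i in range(1, n)]
    diag = [2.0 / h ** 2 + V(x) for x in xs]
    off = -1.0 / h ** 2
    rhs = [f(x) for x in xs]
    m = len(xs)
    cp, dp = [0.0] * m, [0.0] * m                              # Thomas algorithm
    cp[0], dp[0] = off / diag[0], rhs[0] / diag[0]
    for i in range(1, m):
        denom = diag[i] - off * cp[i - 1]
        cp[i] = off / denom
        dp[i] = (rhs[i] - off * dp[i - 1]) / denom
    sol = [0.0] * m
    sol[-1] = dp[-1]
    for i in range(m - 2, -1, -1):
        sol[i] = dp[i] - cp[i] * sol[i + 1]
    return xs, sol
```

For example, with `V = lambda x: 1.0 + math.cos(x)` and `f = lambda x: 1.0`, the function returned by `galerkin_solution(V, f)` should agree with `finite_difference_solution(V, f)` at the grid points to roughly two decimal places; the remaining discrepancy comes mainly from truncating to 40 modes. Since V ≥ 0, the truncated matrix Id + AVA is symmetric with eigenvalues at least 1, mirroring (4.104).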
This may seem a ‘round-about’ approach but it is rather typical of methods
from Functional Analysis. What we have done is to separate the two problems of
‘existence’ and ‘regularity’. We first get existence of what is often called a ‘weak
solution’ of the problem, in this case given by (4.111), which is in L2 (0, 2π) for
f ∈ L2 (0, 2π) and then show, given regularity of the right hand side f, that this is
actually a ‘classical solution’.
Even if we do not assume that V ≥ 0 we can see fairly directly what is hap-
pening.
Theorem 4.5. For any real-valued V ∈ C([0, 2π]), there is an orthonormal basis w_k of L²(0, 2π) consisting of twice continuously differentiable functions on [0, 2π], vanishing at the end points and satisfying −d²w_k/dx² + Vw_k = T_kw_k, where T_k → ∞ as k → ∞. The equation (4.73) has a (twice continuously differentiable) solution for given f ∈ C([0, 2π]) if and only if
$$ T_k = 0 \implies \int_{(0,2\pi)} fw_k = 0. \tag{4.112} $$

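Proof. For a real-valued V we can choose a constant c such that V + c ≥ 0. Then the eigenvalue equation we are trying to solve can be rewritten
$$ -\frac{d^2w}{dx^2} + Vw = Tw \iff -\frac{d^2w}{dx^2} + (V + c)w = (T + c)w. \tag{4.113} $$
Proposition 4.4 and its proof, applied with V + c in place of V, show that the solution operator A(Id + A(V + c)A)^{−1}A is compact, self-adjoint, positive and injective on L²(0, 2π). So, by the spectral theorem for compact self-adjoint operators, there is indeed an orthonormal basis of its eigenfunctions; these are solutions of the second equation, w_k as in the statement above, with positive eigenvalues T_k + c → ∞ with k.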
So, only the solvability of (4.73) remains to be checked. What we have shown above is that if f ∈ C([0, 2π]) then a twice continuously differentiable solution to
$$ -\frac{d^2w}{dx^2} + Vw = f,\quad w(0) = w(2\pi) = 0 \tag{4.114} $$
is precisely of the form w = Av where
$$ v \in L^2(0,2\pi),\quad (\mathrm{Id} + AVA)v = Af. \tag{4.115} $$
Going from (4.114) to (4.115) involves the properties of B = A², since (4.114) implies that
$$ w + A^2Vw = A^2f \implies w = Av \ \text{with } v \text{ as in (4.115)}. $$
Conversely, if v satisfies (4.115) then w = Av satisfies w + A²Vw = A²f, which implies that w has the correct regularity and satisfies (4.114).
Applying this to the case f = 0 shows that, for twice continuously differentiable functions on [0, 2π],
$$ -\frac{d^2w}{dx^2} + Vw = 0,\quad w(0) = w(2\pi) = 0 \iff w \in A\{v \in L^2(0,2\pi);\ (\mathrm{Id} + AVA)v = 0\}. \tag{4.116} $$
Since AVA is compact and self-adjoint, we see that
$$ (\mathrm{Id} + AVA)v = Af \ \text{has a solution in } L^2(0,2\pi) \iff Af \perp \{v' \in L^2(0,2\pi);\ (\mathrm{Id} + AVA)v' = 0\}. \tag{4.117} $$
However, since A is self-adjoint, this last condition is equivalent to f ⊥ A{v ∈ L²(0, 2π); (Id + AVA)v = 0}, which by the equivalence of (4.114) and (4.115) reduces precisely to (4.112). □

So, ultimately the solution of the differential equation (4.73) is just like the
solution of a finite dimensional problem for a self-adjoint matrix. There is a solution
if and only if the right side is orthogonal to the null space; it just requires a bit
more work. Lots of ‘elliptic’ problems turn out to be like this.
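To make the finite-dimensional analogy concrete, the following sketch uses a small hypothetical real symmetric matrix S with a one-dimensional null space, together with its hand-computed orthonormal eigenvectors. Expanding b in the eigenbasis, Sx = b can be solved exactly when the component of b along the null space vanishes, and has no solution otherwise; this mirrors the role of (4.112).

```python
import math

# Hypothetical example: a real symmetric 3x3 matrix with a one-dimensional null space.
S = [[1.0, 1.0, 0.0],
     [1.0, 1.0, 0.0],
     [0.0, 0.0, 2.0]]
R = 1 / math.sqrt(2)
# Orthonormal eigenvectors of S paired with their eigenvalues (computed by hand).
EIGEN = [((R, R, 0.0), 2.0), ((0.0, 0.0, 1.0), 2.0), ((R, -R, 0.0), 0.0)]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def matvec(m, x):
    return [dot(row, x) for row in m]


def solve_symmetric(b, tol=1e-12):
    """Return one solution of S x = b if b is orthogonal to ker S, otherwise None."""
    x = [0.0, 0.0, 0.0]
    for e, lam in EIGEN:
        coef = dot(b, e)
        if abs(lam) < tol:
            if abs(coef) > tol:
                return None  # b has a component along the null space: no solution
            continue         # any multiple of e could be added to the solution
        x = [xi + coef / lam * ei for xi, ei in zip(x, e)]
    return x
```

Here b = (1, 1, 4) is orthogonal to the null space spanned by (1, −1, 0) and gets the solution (0.5, 0.5, 2), while b = (1, 0, 0) is not and gets None.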
We can also say (a great deal) more about the eigenvalues T_k and eigenfunctions w_k; here and below we write q for the potential V. For instance, the derivative
$$ w_k'(0) \ne 0. \tag{4.118} $$
Indeed, were this to vanish then, since also w_k(0) = 0, w_k would be a solution of the Cauchy problem (4.50) for the second-order operator P = d²/dx² − q + T_k with ‘forcing term’ f = 0 and hence, by Theorem 4.4, must itself vanish on the interval, which is a contradiction.


From this in turn it follows that the (non-trivial) eigenspaces
$$ E_k(q) = \Big\{w \in C^2([0,2\pi]);\ -\frac{d^2w}{dx^2} + qw = T_kw,\ w(0) = w(2\pi) = 0\Big\} \tag{4.119} $$
5. HARMONIC OSCILLATOR 133

are exactly one-dimensional. Indeed, if w_k is one non-zero element, so satisfying (4.118), and w is another, then w − w'(0)w_k/w_k'(0) ∈ E_k(q) vanishes together with its derivative at 0, so it satisfies the Cauchy conditions at 0 and hence w = w'(0)w_k/w_k'(0).
Exercise 3. Show that the eigenfunctions normalized to have w_k'(0) = 1 are all real and that w_k has exactly k − 1 zeros in the interior of the interval.
You could try your hand at proving Borg’s Theorem – if q ∈ C([0, 2π]) and the eigenvalues T_k = k²/4 are the same as those for q = 0 then q = 0! This is the beginning of a large theory of the inverse problem – to what extent can one recover q from the knowledge of the T_k? In brief the answer is that one cannot do so in general. However q is determined if one knows the T_k and the ‘norming constants’ w_k'(2π)/w_k'(0).

5. Harmonic oscillator
As a second ‘serious’ application of our Hilbert space theory I want to discuss the harmonic oscillator and the corresponding Hermite basis for L²(R). Note that so far we have not found an explicit orthonormal basis on the whole real line, even though we know L²(R) to be separable, so we certainly know that such a basis exists. How to construct one explicitly and with some handy properties? One way is to simply orthonormalize – using Gram-Schmidt – some countable set with dense span. For instance consider the basic Gaussian function
$$ \exp\Big(-\frac{x^2}{2}\Big) \in L^2(\mathbb{R}). \tag{4.120} $$
This is so rapidly decreasing at infinity that the product with any polynomial is also square integrable:
$$ x^k\exp\Big(-\frac{x^2}{2}\Big) \in L^2(\mathbb{R}) \quad \forall\, k \in \mathbb{N}_0 = \{0, 1, 2, \dots\}. \tag{4.121} $$
Orthonormalizing this sequence gives an orthonormal basis, where completeness can be shown by an appropriate approximation technique but as usual is not so simple. This is in fact the Hermite basis as we will eventually show.
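Gram-Schmidt applied to (4.121) can be carried out exactly, because the inner products reduce to the Gaussian moments ∫ x^{2n} e^{−x²} dx = √π (2n)!/(4^n n!), the odd moments vanishing. The sketch below represents p(x)e^{−x²/2} by the coefficient list of p, divides every inner product by √π and keeps each orthogonalized polynomial monic rather than normalizing it (normalizing would introduce square roots). These representation choices and helper names are ours. The output x² − 1/2, x³ − 3x/2, x⁴ − 3x² + 3/4, ... are the monic Hermite polynomials.

```python
from fractions import Fraction
from math import factorial


def gaussian_moment(m):
    """int x^m exp(-x^2) dx divided by sqrt(pi): 0 for odd m, (2n)!/(4^n n!) for m = 2n."""
    if m % 2:
        return Fraction(0)
    n = m // 2
    return Fraction(factorial(2 * n), 4 ** n * factorial(n))


def inner(p, q):
    """L^2(R) inner product of p(x) e^{-x^2/2} and q(x) e^{-x^2/2}, divided by sqrt(pi).

    p and q are real coefficient lists, lowest degree first.
    """
    return sum(a * b * gaussian_moment(i + j) for i, a in enumerate(p) for j, b in enumerate(q))


def gram_schmidt(n):
    """Orthogonalize x^k e^{-x^2/2}, k = 0, ..., n-1, keeping each polynomial factor monic."""
    basis = []
    for k in range(n):
        p = [Fraction(0)] * k + [Fraction(1)]  # the monomial x^k
        for q in basis:
            c = inner(p, q) / inner(q, q)
            p = [a - c * (q[i] if i < len(q) else 0) for i, a in enumerate(p)]
        basis.append(p)
    return basis
```

Dividing the j-th output by its norm gives the orthonormal sequence; monic orthogonal polynomials for a given weight are unique, so these agree, up to the factor 2^j, with the polynomial factors of the functions Cr^j e^{−x²/2} constructed below.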
Rather than proceed directly we will work up to this by discussing the eigenfunctions of the harmonic oscillator
$$ P = -\frac{d^2}{dx^2} + x^2 \tag{4.122} $$
which we want to think of as an operator – although for the moment I will leave vague the question of what it operates on.
As you probably already know, and we will show later, it is straightforward
to show that P has a lot of eigenvectors using the ‘creation’ and ‘annihilation’
operators. We want to know a bit more than this and in particular I want to
apply the abstract discussion above to this case but first let me go through the
‘formal’ theory. There is nothing wrong here, just that we cannot easily conclude
the completeness of the eigenfunctions.
The first thing to observe is that the Gaussian is an eigenfunction of P:
$$ Pe^{-x^2/2} = -\frac{d}{dx}\big(-xe^{-x^2/2}\big) + x^2e^{-x^2/2} = -(x^2 - 1)e^{-x^2/2} + x^2e^{-x^2/2} = e^{-x^2/2} \tag{4.123} $$

with eigenvalue 1. It is an eigenfunction but not, for the moment, of a bounded operator on any Hilbert space – in this sense it is only a formal eigenfunction.
In this special case there is an essentially algebraic way to generate a whole sequence of eigenfunctions from the Gaussian. To do this, write
$$ Pu = \Big(-\frac{d}{dx} + x\Big)\Big(\frac{d}{dx} + x\Big)u + u = (\mathrm{Cr}\,\mathrm{An} + 1)u, \quad \mathrm{Cr} = -\frac{d}{dx} + x, \quad \mathrm{An} = \frac{d}{dx} + x, \tag{4.124} $$
again formally as operators. Then note that
$$ \mathrm{An}\,e^{-x^2/2} = 0 \tag{4.125} $$
which again proves (4.123). The two operators in (4.124) are the ‘creation’ operator and the ‘annihilation’ operator. They almost commute in the sense that
$$ [\mathrm{An}, \mathrm{Cr}]u = (\mathrm{An}\,\mathrm{Cr} - \mathrm{Cr}\,\mathrm{An})u = 2u \tag{4.126} $$
for, say, any twice continuously differentiable function u.
Now, set u_0 = e^{−x²/2}, which is the ‘ground state’, and consider u_1 = Cr u_0. From (4.126), (4.125) and (4.124),
$$ Pu_1 = (\mathrm{Cr}\,\mathrm{An}\,\mathrm{Cr} + \mathrm{Cr})u_0 = \mathrm{Cr}^2\mathrm{An}\,u_0 + 3\,\mathrm{Cr}\,u_0 = 3u_1. \tag{4.127} $$
Thus, u_1 is an eigenfunction with eigenvalue 3.
Lemma 4.6. For j ∈ N_0 = {0, 1, 2, . . . } the function u_j = Cr^j u_0 satisfies Pu_j = (2j + 1)u_j.
Proof. This follows by induction on j, where we know the result for j = 0 and j = 1. Then
$$ P\,\mathrm{Cr}\,u_j = (\mathrm{Cr}\,\mathrm{An} + 1)\mathrm{Cr}\,u_j = \mathrm{Cr}(P - 1)u_j + 3\,\mathrm{Cr}\,u_j = (2j + 3)u_{j+1}. \tag{4.128} $$

Again by induction we can check that u_j = (2^j x^j + q_j(x))e^{−x²/2}, where q_j is a polynomial of degree at most j − 2. Indeed this is true for j = 0 and j = 1 (where q_0 = q_1 ≡ 0) and then, since Cr(p e^{−x²/2}) = (2xp − p')e^{−x²/2} for any polynomial p,
$$ \mathrm{Cr}\,u_j = \big(2^{j+1}x^{j+1} + q_{j+1}(x)\big)e^{-x^2/2}, \quad q_{j+1} = 2xq_j - q_j' - j2^jx^{j-1}, \tag{4.129} $$
and q_{j+1} is a polynomial of degree at most j − 1 – one degree higher than q_j.
From this it follows, in fact, that the finite span of the u_j consists of all the products p(x)e^{−x²/2} where p(x) is any polynomial.
Now, all these functions are in L²(R) and we want to compute their norms. First, a standard integral computation¹ shows that
$$ \int_{\mathbb{R}}\big(e^{-x^2/2}\big)^2 = \int_{\mathbb{R}} e^{-x^2} = \sqrt{\pi}. \tag{4.130} $$
¹To compute the Gaussian integral, square it and write it as a double integral, then introduce polar coordinates:
$$ \Big(\int_{\mathbb{R}} e^{-x^2}dx\Big)^2 = \int_{\mathbb{R}^2} e^{-x^2-y^2}\,dx\,dy = \int_0^{\infty}\int_0^{2\pi} e^{-r^2}\,r\,d\theta\,dr = \pi\Big[-e^{-r^2}\Big]_0^{\infty} = \pi. $$
6. FOURIER TRANSFORM 135

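For j > 0, integration by parts (easily justified by taking the integral over [−R, R] and then letting R → ∞) gives
$$ \int_{\mathbb{R}}(\mathrm{Cr}^ju_0)^2 = \int_{\mathbb{R}}\mathrm{Cr}^ju_0(x)\,\mathrm{Cr}^ju_0(x)\,dx = \int_{\mathbb{R}}u_0\,\mathrm{An}^j\mathrm{Cr}^ju_0. \tag{4.131} $$
Now, from (4.126), we can move one factor of An through the j factors of Cr until it emerges and ‘kills’ u_0:
$$ \mathrm{An}\,\mathrm{Cr}^ju_0 = 2\,\mathrm{Cr}^{j-1}u_0 + \mathrm{Cr}\,\mathrm{An}\,\mathrm{Cr}^{j-1}u_0 = 4\,\mathrm{Cr}^{j-1}u_0 + \mathrm{Cr}^2\mathrm{An}\,\mathrm{Cr}^{j-2}u_0 = \cdots = 2j\,\mathrm{Cr}^{j-1}u_0. \tag{4.132} $$
So in fact,
$$ \int_{\mathbb{R}}(\mathrm{Cr}^ju_0)^2 = 2j\int_{\mathbb{R}}(\mathrm{Cr}^{j-1}u_0)^2 = 2^jj!\sqrt{\pi}. \tag{4.133} $$
A similar argument shows that
$$ \int_{\mathbb{R}}u_ku_j = \int_{\mathbb{R}}u_0\,\mathrm{An}^k\mathrm{Cr}^ju_0 = 0 \ \text{if } k \ne j. \tag{4.134} $$
Thus the functions
$$ e_j = 2^{-j/2}(j!)^{-1/2}\pi^{-1/4}\,\mathrm{Cr}^je^{-x^2/2} \tag{4.135} $$
form an orthonormal sequence in L²(R).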
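Lemma 4.6, (4.133) and (4.134) can be verified exactly for small j by computer algebra on the polynomial factor. In the sketch below (a representation of our own choosing) a function p(x)e^{−x²/2} is stored as the coefficient list of p; on such functions Cr acts by p ↦ 2xp − p' and P by p ↦ −p'' + 2xp' + p, which follows from differentiating the product directly. Inner products are divided by √π so that all arithmetic stays rational.

```python
from fractions import Fraction
from math import factorial

# A function p(x) e^{-x^2/2} is represented by the coefficient list of the polynomial p,
# lowest degree first; all arithmetic is exact (integers or Fractions).


def add(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]


def scale(c, p):
    return [c * a for a in p]


def times_x(p):
    return [0] + list(p)


def deriv(p):
    return [k * p[k] for k in range(1, len(p))]


def creation(p):
    """Cr (p e^{-x^2/2}) = (2 x p - p') e^{-x^2/2}, with Cr = -d/dx + x."""
    return add(scale(2, times_x(p)), scale(-1, deriv(p)))


def oscillator(p):
    """P (p e^{-x^2/2}) = (-p'' + 2 x p' + p) e^{-x^2/2}, with P = -d^2/dx^2 + x^2."""
    return add(add(scale(-1, deriv(deriv(p))), scale(2, times_x(deriv(p)))), p)


def inner_over_sqrt_pi(p, q):
    """(p e^{-x^2/2}, q e^{-x^2/2}) / sqrt(pi), using int x^(2n) e^{-x^2} dx = sqrt(pi) (2n)!/(4^n n!)."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:
                n = (i + j) // 2
                total += a * b * Fraction(factorial(2 * n), 4 ** n * factorial(n))
    return total


def u(j):
    """Polynomial factor of u_j = Cr^j u_0, where u_0 = e^{-x^2/2}."""
    p = [1]
    for _ in range(j):
        p = creation(p)
    return p
```

For instance `u(2)` is `[-2, 0, 4]`, i.e. u_2 = (4x² − 2)e^{−x²/2}, whose leading term 2²x² matches the form (2^j x^j + q_j)e^{−x²/2} found above.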


We would like to show this orthonormal sequence is complete. Rather than argue through approximation, we can guess that in some sense the operator
$$ \mathrm{An}\,\mathrm{Cr} = \Big(\frac{d}{dx} + x\Big)\Big(-\frac{d}{dx} + x\Big) = -\frac{d^2}{dx^2} + x^2 + 1 \tag{4.136} $$
should be invertible, so one approach is to use the ideas above of Friedrichs’ extension to construct its ‘inverse’ and show this really exists as a compact, self-adjoint operator on L²(R) and that its only eigenfunctions are the e_i in (4.135). Another, more indirect, approach is described below.

6. Fourier transform
The Fourier transform for functions on R is in a certain sense the limit of the definition of the coefficients of the Fourier series on an expanding interval, although that is not generally a good way to approach it. We know that if u ∈ L¹(R) and v ∈ C_∞(R) is a bounded continuous function then vu ∈ L¹(R) – this follows from our original definition by approximation. So if u ∈ L¹(R) the integral
$$ \hat u(\xi) = \int e^{-ix\xi}u(x)\,dx, \quad \xi \in \mathbb{R} \tag{4.137} $$
exists for each ξ ∈ R as a Lebesgue integral. Note that there are many different normalizations of the Fourier transform in use. This is the standard ‘analyst’s’ normalization.
Proposition 4.6. The Fourier transform, (4.137), defines a bounded linear map
$$ \mathcal{F}: L^1(\mathbb{R}) \ni u \longmapsto \hat u \in C_0(\mathbb{R}) \tag{4.138} $$
into the closed subspace C_0(R) ⊂ C_∞(R) of continuous functions which vanish at infinity (with respect to the supremum norm).

Proof. We know that the integral exists for each ξ and, from the basic properties of the Lebesgue integral,
$$ |\hat u(\xi)| \le \|u\|_{L^1}, \ \text{since}\ |e^{-ix\xi}u(x)| = |u(x)|. \tag{4.139} $$
To investigate its properties we restrict to u ∈ C_c(R), a compactly-supported continuous function, say with support in [−R, R]. Then the integral becomes a Riemann integral and the integrand is a continuous function of both variables. It follows that the Fourier transform is uniformly continuous since
$$ |\hat u(\xi) - \hat u(\xi')| \le \int_{|x|\le R}|e^{-ix\xi} - e^{-ix\xi'}||u(x)|\,dx \le 2R\sup|u|\sup_{|x|\le R}|e^{-ix\xi} - e^{-ix\xi'}| \tag{4.140} $$
with the right side small by the uniform continuity of continuous functions on compact sets. From (4.139), if u_n → u in L¹(R) with u_n ∈ C_c(R), it follows that û_n → û uniformly on R. Thus, as the uniform limit of uniformly continuous functions, the Fourier transform is uniformly continuous on R for any u ∈ L¹(R) (you can also see this from the continuity-in-the-mean of L¹ functions).
Now, we know that even the compactly-supported once continuously differentiable functions, forming C_c¹(R), are dense in L¹(R), so we can also consider (4.137) where u ∈ C_c¹(R). Then the integration by parts as follows is justified:
$$ \xi\hat u(\xi) = i\int\Big(\frac{de^{-ix\xi}}{dx}\Big)u(x)\,dx = -i\int e^{-ix\xi}\frac{du(x)}{dx}\,dx. \tag{4.141} $$
Since du/dx ∈ C_c(R) (by assumption), the estimate (4.139) now gives
$$ \sup_{\xi\in\mathbb{R}}|\xi\hat u(\xi)| \le 2R\sup_{x\in\mathbb{R}}\Big|\frac{du}{dx}\Big|. \tag{4.142} $$
This certainly implies the weaker statement that
$$ \lim_{|\xi|\to\infty}|\hat u(\xi)| = 0 \tag{4.143} $$
which is ‘vanishing at infinity’. Now we again use the density, this time of C_c¹(R), in L¹(R) and the uniform estimate (4.139), plus the fact that if a sequence of continuous functions on R converges uniformly on R and each element vanishes at infinity then the limit vanishes at infinity, to complete the proof of the Proposition.
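Proposition 4.6 applies to every u ∈ L¹(R), not only continuous ones. As an illustration (our own example), the indicator function of [−1, 1] has ‖u‖_{L¹} = 2 and û(ξ) = 2 sin ξ/ξ, which is continuous, bounded by 2 as in (4.139) and tends to 0 at infinity, although only like 1/|ξ| since u is not differentiable. The sketch compares a midpoint-rule evaluation of (4.137) (our choice of quadrature) with this formula.

```python
import cmath
import math


def fourier_transform(u, xi, a, b, n=20000):
    """hat u(xi) = int e^{-i x xi} u(x) dx for u vanishing outside [a, b]; midpoint rule."""
    h = (b - a) / n
    total = 0j
    for i in range(n):
        x = a + (i + 0.5) * h
        total += cmath.exp(-1j * x * xi) * u(x)
    return h * total


def indicator(x):
    """u = 1 on [-1, 1] and 0 elsewhere: in L^1(R) with norm 2, but not continuous."""
    return 1.0 if -1.0 <= x <= 1.0 else 0.0


def exact_transform(xi):
    """hat u(xi) = 2 sin(xi)/xi, with value 2 at xi = 0."""
    return 2.0 if xi == 0 else 2 * math.sin(xi) / xi
```

Comparing with (4.142), the slow 1/|ξ| decay here reflects the jumps of u; a C_c¹ function would give û(ξ) = O(1/|ξ|) with a constant controlled by sup |du/dx|.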


7. Fourier inversion
We could use the completeness of the orthonormal sequence of eigenfunctions for the harmonic oscillator discussed above to show that the Fourier transform extends by continuity from C_c(R) to define an isomorphism
$$ \mathcal{F}: L^2(\mathbb{R}) \longrightarrow L^2(\mathbb{R}) \tag{4.144} $$
with inverse given by the corresponding continuous extension of
$$ \mathcal{G}v(x) = (2\pi)^{-1}\int e^{ix\xi}v(\xi)\,d\xi. \tag{4.145} $$
Instead, we will give a direct proof of the Fourier inversion formula, via Schwartz space and an elegant argument due to Hörmander. Then we will use this to prove the completeness of the eigenfunctions we have found.
7. FOURIER INVERSION 137

We have shown above that the Fourier transform is defined as an integral if u ∈ L¹(R). Suppose that in addition we know that xu ∈ L¹(R). We can summarize the combined information as
$$ (1 + |x|)u \in L^1(\mathbb{R}). \tag{4.146} $$
Lemma 4.7. If u satisfies (4.146) then û is continuously differentiable and dû/dξ = F(−ixu) is bounded.
Proof. Consider the difference quotient for the Fourier transform:
$$ \frac{\hat u(\xi + s) - \hat u(\xi)}{s} = \int D(x,s)e^{-ix\xi}u(x), \quad D(x,s) = \frac{e^{-ixs} - 1}{s}. \tag{4.147} $$
We can use the standard proof of Taylor’s formula to write the difference quotient inside the integral as
$$ D(x,s) = -ix\int_0^1 e^{-itxs}\,dt \implies |D(x,s)| \le |x|. \tag{4.148} $$
It follows that as s → 0 (along a sequence if you prefer) D(x, s)e^{−ixξ}u(x) is bounded by the L¹(R) function |x||u(x)| and converges pointwise to −ie^{−ixξ}xu(x). Dominated convergence therefore shows that the integral converges, showing that the derivative exists and that
$$ \frac{d\hat u(\xi)}{d\xi} = \mathcal{F}(-ixu). \tag{4.149} $$
From the earlier results it follows that the derivative is continuous and bounded, proving the lemma. □
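Lemma 4.7 can be checked on an example where everything is explicit. The function u(x) = e^{−|x|} (a hypothetical choice) satisfies (4.146), and direct integration gives û(ξ) = 2/(1 + ξ²), so dû/dξ = −4ξ/(1 + ξ²)². The sketch below computes F(−ixu) by the midpoint rule on a truncated interval and compares it with both this formula and a centred difference quotient of û; the truncation length and step size are our own choices.

```python
import cmath
import math


def fourier_transform(g, xi, half_width=20.0, n=40000):
    """Midpoint rule for int e^{-i x xi} g(x) dx, truncated to [-half_width, half_width]."""
    h = 2 * half_width / n
    total = 0j
    for i in range(n):
        x = -half_width + (i + 0.5) * h
        total += cmath.exp(-1j * x * xi) * g(x)
    return h * total


def u(x):
    """Example function: (1 + |x|) u(x) is integrable."""
    return math.exp(-abs(x))


def u_hat(xi):
    """Closed form of the Fourier transform of u."""
    return 2.0 / (1.0 + xi * xi)


def lhs_derivative(xi, step=1e-5):
    """Centred difference quotient approximating d(hat u)/d xi."""
    return (u_hat(xi + step) - u_hat(xi - step)) / (2 * step)


def rhs_transform(xi):
    """F(-i x u)(xi), the right side of (4.149), computed numerically."""
    return fourier_transform(lambda x: -1j * x * u(x), xi)
```

Note that u itself is not differentiable at 0; smoothness of û comes from the decay of u, exactly as (4.150) says.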
Now, we can iterate this result and so conclude:
$$ (1 + |x|)^ku \in L^1(\mathbb{R})\ \forall\,k \implies \hat u \ \text{is infinitely differentiable with bounded derivatives and}\ \frac{d^k\hat u}{d\xi^k} = \mathcal{F}((-ix)^ku). \tag{4.150} $$
This result shows that from ‘decay’ of u we deduce smoothness of û. We can go the other way too. One way to ensure the assumption in (4.150) is to make the stronger assumption that
$$ x^ku \ \text{is bounded and continuous} \ \forall\,k. \tag{4.151} $$
Indeed, Dominated Convergence shows that if u is continuous and satisfies the bound
$$ |u(x)| \le (1 + |x|)^{-r},\quad r > 1, $$
then u ∈ L¹(R). So the integrability of x^ju follows from the bounds in (4.151) for k ≤ j + 2. This is throwing away information but simplifies things below.


In the opposite direction, suppose that u is continuously differentiable and satisfies the estimates, for some r > 1,
$$ |u(x)| + \Big|\frac{du(x)}{dx}\Big| \le C(1 + |x|)^{-r}. $$
Then consider
$$ \xi\hat u = i\int\frac{de^{-ix\xi}}{dx}u(x) = \lim_{R\to\infty} i\int_{-R}^{R}\frac{de^{-ix\xi}}{dx}u(x). \tag{4.152} $$

We may integrate by parts in this integral to get
$$ \xi\hat u = \lim_{R\to\infty}\Big(i\Big[e^{-ix\xi}u(x)\Big]_{-R}^{R} - i\int_{-R}^{R}e^{-ix\xi}\frac{du}{dx}\Big). \tag{4.153} $$
The decay of u shows that the first term vanishes in the limit and the second integral converges, so
$$ \xi\hat u = \mathcal{F}\Big(-i\frac{du}{dx}\Big). \tag{4.154} $$
Iterating this in turn we see that if u has continuous derivatives of all orders and, for all j,
$$ \Big|\frac{d^ju}{dx^j}\Big| \le C_j(1 + |x|)^{-r},\ r > 1, \quad\text{then the}\quad \xi^j\hat u = \mathcal{F}\Big((-i)^j\frac{d^ju}{dx^j}\Big) \tag{4.155} $$
are all bounded.
Laurent Schwartz defined a space which handily encapsulates these results.
Definition 4.4. Schwartz space, S(R), consists of all the infinitely differentiable functions u: R → C such that
$$ \|u\|_{j,k} = \sup\Big|x^j\frac{d^ku}{dx^k}\Big| < \infty \quad \forall\, j, k \ge 0. \tag{4.156} $$
This is clearly a linear space. In fact it is a complete metric space in a natural way. All the ‖·‖_{j,k} in (4.156) are norms on S(R), but none of them is stronger than the others. So there is no natural norm on S(R) with respect to which it is complete. In the problems below you can find some discussion of the fact that
$$ d(u,v) = \sum_{j,k\ge 0} 2^{-j-k}\frac{\|u - v\|_{j,k}}{1 + \|u - v\|_{j,k}} \tag{4.157} $$
is a complete metric. We will not use this here but it is the right way to understand what is going on.
Notice that there is some prejudice on the order of multiplication by x and differentiation in (4.156). This is only apparent, since these estimates (taken together) are equivalent to
$$ \sup\Big|\frac{d^k(x^ju)}{dx^k}\Big| < \infty \quad \forall\, j, k \ge 0. \tag{4.158} $$
To see the equivalence we can use induction over N, where the inductive statement is the equivalence of (4.156) and (4.158) for j + k ≤ N. Certainly this is true for N = 0, and to carry out the inductive step just differentiate out the product to see that
$$ \frac{d^k(x^ju)}{dx^k} = x^j\frac{d^ku}{dx^k} + \sum_{l+m<k+j} c_{l,m,k,j}\,x^m\frac{d^lu}{dx^l} $$
where one can be much more precise about the extra terms, but the important thing is that they all are lower order (in fact both degrees go down). If you want to be careful, you can of course prove this identity by induction too! The equivalence of (4.156) and (4.158) for N + 1 now follows from that for N.
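For the record, the extra terms can be written out exactly. By the Leibniz rule, and since the i-th derivative of x^j is j!/(j − i)! x^{j−i} for i ≤ j and vanishes for i > j,

```latex
\frac{d^k(x^j u)}{dx^k}
  = \sum_{i=0}^{\min(j,k)} \binom{k}{i}\frac{j!}{(j-i)!}\,x^{j-i}\,\frac{d^{k-i}u}{dx^{k-i}}
  = x^j\frac{d^k u}{dx^k}
  + \sum_{i=1}^{\min(j,k)} \binom{k}{i}\frac{j!}{(j-i)!}\,x^{j-i}\,\frac{d^{k-i}u}{dx^{k-i}},
```

so in the notation above the extra terms have m = j − i and l = k − i with i ≥ 1, hence l + m = j + k − 2i < j + k.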
Theorem 4.6. The Fourier transform restricts to a bijection on S(R) with inverse
$$ \mathcal{G}(v)(x) = \frac{1}{2\pi}\int e^{ix\xi}v(\xi)\,d\xi. \tag{4.159} $$

Proof. The proof (due to Hörmander, as I said above) will take a little while because we need to do some computation, but I hope you will see that it is quite clear and elementary.
First we need to check that F: S(R) → S(R), but this is what I just did the preparation for. Namely the estimates (4.156) imply that (4.155) applies to all the d^k(x^ju)/dx^k and so
$$ \xi^k\frac{d^j\hat u}{d\xi^j} \ \text{is continuous and bounded} \ \forall\, k, j \implies \hat u \in \mathcal{S}(\mathbb{R}). \tag{4.160} $$
This indeed is why Schwartz introduced this space.
So, what we want to show is that with G defined by (4.159), u = G(û) for all u ∈ S(R). Notice that there is only a sign change and a constant factor to get from F to G, so certainly G: S(R) → S(R). We start off with what looks like a small part of this. Namely we want to show that
$$ I(\hat u) = \int\hat u = 2\pi u(0). \tag{4.161} $$
Here, I: S(R) → C is just integration, so it is certainly well-defined. To prove (4.161) we need to use a version of Taylor’s formula and then do a little computation.
Lemma 4.8. If u ∈ S(R) then
$$ u(x) = u(0)\exp\Big(-\frac{x^2}{2}\Big) + xv(x), \quad v \in \mathcal{S}(\mathbb{R}). \tag{4.162} $$
Proof. Here I will leave it to you (look in the problems) to show that the Gaussian
$$ \exp\Big(-\frac{x^2}{2}\Big) \in \mathcal{S}(\mathbb{R}). \tag{4.163} $$
Observe then that the difference
$$ w(x) = u(x) - u(0)\exp\Big(-\frac{x^2}{2}\Big) \in \mathcal{S}(\mathbb{R}) \ \text{and}\ w(0) = 0. $$
This is clearly a necessary condition to see that w = xv with v ∈ S(R), and we can then see from the Fundamental Theorem of Calculus that
$$ w(x) = \int_0^x w'(y)\,dy = x\int_0^1 w'(tx)\,dt \implies v(x) = \int_0^1 w'(tx)\,dt = \frac{w(x)}{x} \ (x \ne 0). \tag{4.164} $$
From the first formula for v it follows that it is infinitely differentiable, and from the second formula the derivatives decay rapidly, since each derivative can be written, for x ≠ 0, in the form of a finite sum of terms p(x)(d^lw/dx^l)/x^N where the p’s are polynomials. The rapid decay of the derivatives of w therefore implies the rapid decay of the derivatives of v. So indeed we have proved Lemma 4.8. □
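Let me set γ(x) = exp(−x²/2) to simplify the notation. Taking the Fourier transform of each of the terms in (4.162) gives
$$ \hat u = u(0)\hat\gamma + i\frac{d\hat v}{d\xi}. \tag{4.165} $$
Since v̂ ∈ S(R),
$$ \int\frac{d\hat v}{d\xi} = \lim_{R\to\infty}\int_{-R}^{R}\frac{d\hat v}{d\xi} = \lim_{R\to\infty}\Big[\hat v(\xi)\Big]_{-R}^{R} = 0. \tag{4.166} $$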

So now we see that
$$ \int\hat u = c\,u(0), \quad c = \int\hat\gamma $$
being a constant that we still need to work out!
Lemma 4.9. For the Gaussian, γ(x) = exp(−x²/2),
$$ \hat\gamma(\xi) = \sqrt{2\pi}\,\gamma(\xi). \tag{4.167} $$
Proof. Certainly, γ̂ ∈ S(R) and, from the identities for derivatives above,
$$ \frac{d\hat\gamma}{d\xi} = -i\mathcal{F}(x\gamma), \quad \xi\hat\gamma = \mathcal{F}\Big(-i\frac{d\gamma}{dx}\Big). \tag{4.168} $$
Thus, γ̂ satisfies the same differential equation as γ:
$$ \frac{d\hat\gamma}{d\xi} + \xi\hat\gamma = -i\mathcal{F}\Big(\frac{d\gamma}{dx} + x\gamma\Big) = 0. $$
This equation we can solve, and so we conclude that γ̂ = c'γ where c' is also a constant that we need to compute. To do this observe that
$$ c' = \hat\gamma(0) = \int\gamma = \sqrt{2\pi} \tag{4.169} $$
which gives (4.167). The computation of the integral in (4.169) is a standard clever argument which you probably know. Namely take the square and work in polar coordinates in two variables:
$$ \Big(\int\gamma\Big)^2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2+y^2)/2}\,dx\,dy = \int_0^{2\pi}\int_0^{\infty} e^{-r^2/2}\,r\,dr\,d\theta = 2\pi\Big[-e^{-r^2/2}\Big]_0^{\infty} = 2\pi. \tag{4.170} $$
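Lemma 4.9 is easy to test numerically. The sketch below evaluates (4.137) for γ by the midpoint rule on a truncated interval (the interval and step are our own choices; the Gaussian is negligible beyond |x| = 12) and compares the result with √(2π)γ(ξ).

```python
import cmath
import math


def gaussian(x):
    """gamma(x) = exp(-x^2/2)."""
    return math.exp(-x * x / 2)


def fourier_transform(g, xi, half_width=12.0, n=4000):
    """Midpoint rule for int e^{-i x xi} g(x) dx, truncated to [-half_width, half_width]."""
    h = 2 * half_width / n
    return h * sum(cmath.exp(-1j * x * xi) * g(x)
                   for x in (-half_width + (i + 0.5) * h for i in range(n)))
```

For smooth, rapidly decreasing integrands like this one the midpoint rule is extremely accurate, so the agreement is essentially to rounding error; at ξ = 0 it reproduces ∫γ = √(2π) from (4.169).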

So, finally we need to get from (4.161) to the inversion formula. Changing variable in the Fourier transform we can see that for any y ∈ R, setting u_y(x) = u(x + y), which is in S(R) if u ∈ S(R),
$$ \mathcal{F}(u_y) = \int e^{-ix\xi}u_y(x)\,dx = \int e^{-i(s-y)\xi}u(s)\,ds = e^{iy\xi}\hat u. \tag{4.171} $$
Now, plugging u_y into (4.161) we see that
$$ \int\hat u_y = 2\pi u_y(0) = 2\pi u(y) = \int e^{iy\xi}\hat u(\xi)\,d\xi \implies u(y) = \mathcal{G}(\hat u)(y), \tag{4.172} $$
the Fourier inversion formula. So we have proved the Theorem. □
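The inversion formula can be checked on a function whose transform is known in closed form. For the hypothetical test function u(x) = exp(−(x − 1)²/2) = γ(x − 1), Lemma 4.9 and the shift rule (4.171) with y = −1 give û(ξ) = e^{−iξ}√(2π)e^{−ξ²/2}. The sketch below applies G from (4.159) to this û by the midpoint rule (our choice of quadrature) and compares with u.

```python
import cmath
import math

SQRT_2PI = math.sqrt(2 * math.pi)


def u(x):
    """Hypothetical test function in S(R): a Gaussian centred at x = 1."""
    return math.exp(-(x - 1) ** 2 / 2)


def u_hat(xi):
    """By (4.167) and (4.171) with y = -1: hat u(xi) = e^{-i xi} sqrt(2 pi) e^{-xi^2/2}."""
    return cmath.exp(-1j * xi) * SQRT_2PI * math.exp(-xi * xi / 2)


def inverse_transform(v, x, half_width=12.0, n=4000):
    """G(v)(x) = (2 pi)^{-1} int e^{i x xi} v(xi) d xi, midpoint rule on [-half_width, half_width]."""
    h = 2 * half_width / n
    total = sum(cmath.exp(1j * x * xi) * v(xi)
                for xi in (-half_width + (k + 0.5) * h for k in range(n)))
    return h * total / (2 * math.pi)
```

The imaginary part of the computed value should be negligible, as it must be since u is real.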

8. Convolution
There is a discussion of convolution later in the notes; I have inserted a new (but not very different) treatment here to cover the density of S(R) in L²(R) needed in the next section.
Consider two continuous functions of compact support u, v ∈ C_c(R). Their convolution is
$$ u * v(x) = \int u(x - y)v(y)\,dy = \int u(y)v(x - y)\,dy. \tag{4.173} $$
8. CONVOLUTION 141

The first integral is the definition; clearly it is a well-defined Riemann integral since the integrand is continuous as a function of y and vanishes whenever v(y) vanishes, so has compact support. In fact if both u and v vanish outside [−R, R] then u ∗ v = 0 outside [−2R, 2R].
From standard properties of the Riemann integral (or Dominated convergence
if you prefer!) it follows easily that u∗v is continuous. What we need to understand
is what happens if (at least) one of u or v is smoother. In fact we will want to take
a very smooth function, so I pause here to point out
Lemma 4.10. There exists a (‘bump’) function ψ : R −→ R which is infinitely
differentiable, i.e. has continuous derivatives of all orders, vanishes outside [−1, 1],
is strictly positive on (−1, 1) and has integral 1.
Proof. We start with an explicit function,
$$ \phi(x) = \begin{cases} e^{-1/x} & x > 0 \\ 0 & x \le 0. \end{cases} \tag{4.174} $$
The exponential function grows faster than any polynomial at +∞, since
$$ \exp(x) > \frac{x^k}{k!} \ \text{in } x > 0 \quad \forall\, k. \tag{4.175} $$
This can be seen directly from the Taylor series, which converges on the whole line (indeed in the whole complex plane),
$$ \exp(x) = \sum_{k\ge 0}\frac{x^k}{k!}. $$

From (4.175) we deduce that
$$ \lim_{x\downarrow 0}\frac{e^{-1/x}}{x^k} = \lim_{R\to\infty}\frac{R^k}{e^R} = 0 \quad \forall\, k \tag{4.176} $$
where we substitute R = 1/x and use the properties of exp. In particular φ in (4.174) is continuous across the origin, and so everywhere. We can compute the derivatives in x > 0 and these are of the form
$$ \frac{d^l\phi}{dx^l} = \frac{p_l(x)}{x^{2l}}e^{-1/x},\quad x > 0,\ p_l \ \text{a polynomial}. \tag{4.177} $$
As usual, do this by induction, since it is true for l = 0, and differentiating the formula for a given l one finds
$$ \frac{d^{l+1}\phi}{dx^{l+1}} = \Big(\frac{p_l(x)}{x^{2l+2}} - 2l\frac{p_l(x)}{x^{2l+1}} + \frac{p_l'(x)}{x^{2l}}\Big)e^{-1/x} \tag{4.178} $$
where the coefficient function is of the desired form p_{l+1}/x^{2l+2}.
Once we know (4.177) then we see from (4.176) that all these functions are continuous down to 0, where they vanish. From this it follows that φ in (4.174) is infinitely differentiable. For φ itself we can use the Fundamental Theorem of Calculus to write
$$ \phi(x) = \int_\epsilon^x U(t)\,dt + \phi(\epsilon), \quad x > \epsilon > 0. \tag{4.179} $$

Here U is the derivative in x > 0. Taking the limit as ε ↓ 0, both sides converge, and then we see that
$$ \phi(x) = \int_0^x U(t)\,dt. $$
From this it follows that φ is continuously differentiable across 0 and its derivative is U, the continuous extension of the derivative from x > 0. The same argument applies to successive derivatives, so indeed φ is infinitely differentiable.
From φ we can construct a function closer to the desired bump function, namely
$$ \Phi(x) = \phi(x + 1)\phi(1 - x). $$
The first factor vanishes when x ≤ −1 and is otherwise positive, while the second vanishes when x ≥ 1 but is otherwise positive, so the product is infinitely differentiable on R and positive on (−1, 1) but otherwise 0. Then we can normalize the integral to 1 by taking
$$ \psi(x) = \Phi(x)\Big/\int\Phi. \tag{4.180} $$


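The construction in Lemma 4.10 is explicit enough to evaluate. The following Python sketch (a numerical illustration only, not part of the proof) builds $\phi$, $\Phi$ and $\psi$ as in (4.174) and (4.180). The normalizing integral $\int\Phi$ is approximated with a composite Simpson rule; the helper names and the number of subintervals are assumptions. It checks the support, positivity and unit integral. Smoothness at $\pm 1$ is of course not something a finite computation can confirm.

```python
import math


def phi(x: float) -> float:
    """phi from (4.174): exp(-1/x) for x > 0 and 0 for x <= 0."""
    return math.exp(-1.0 / x) if x > 0 else 0.0


def Phi(x: float) -> float:
    """Unnormalized bump Phi(x) = phi(x + 1) * phi(1 - x), zero outside (-1, 1)."""
    return phi(x + 1.0) * phi(1.0 - x)


def simpson(f, a: float, b: float, n: int = 2000) -> float:
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


# Normalizing constant: Phi vanishes outside [-1, 1], so integrate there only.
PHI_INTEGRAL = simpson(Phi, -1.0, 1.0)


def psi(x: float) -> float:
    """The normalized bump function psi of (4.180)."""
    return Phi(x) / PHI_INTEGRAL


print("psi(0) =", psi(0.0))
print("integral of psi =", simpson(psi, -1.0, 1.0))
```

In floating point $\psi(x)$ already underflows to 0 slightly inside $(-1, 1)$, for instance at $x = 0.999$ where the factor $e^{-1000}$ appears, which reflects how flat $\phi$ is at 0.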
In particular, from Lemma 4.10 we conclude that the space $C_c^\infty(\mathbb R)$ of infinitely differentiable functions of compact support is not empty. Going back to convolution in (4.173), suppose now that $v$ is smooth. Then
$$u \in C_c(\mathbb R),\ v \in C_c^\infty(\mathbb R) \implies u * v \in C_c^\infty(\mathbb R). \tag{4.181}$$
As usual this follows from properties of the Riemann integral or by looking directly at the difference quotient
$$\frac{u*v(x+t) - u*v(x)}{t} = \int u(y)\,\frac{v(x+t-y) - v(x-y)}{t}\,dy.$$
As $t \to 0$, the difference quotient for $v$ converges uniformly (in $y$) to the derivative, hence the integral converges and the derivative of the convolution exists:
$$\frac{d}{dx}\,u*v(x) = u*\Big(\frac{dv}{dx}\Big)(x). \tag{4.182}$$
This result can be iterated immediately, showing that the convolution is smooth, and we already know that it has compact support.
Proposition 4.7. For any $u \in C_c(\mathbb R)$ there exists a sequence $u_n \in C_c^\infty(\mathbb R)$, with supports in a fixed compact set, such that $u_n \to u$ uniformly on $\mathbb R$.
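Proof. For each $\epsilon > 0$ consider the rescaled bump function
$$\psi_\epsilon(x) = \epsilon^{-1}\psi\Big(\frac{x}{\epsilon}\Big) \in C_c^\infty(\mathbb R). \tag{4.183}$$
In fact, $\psi_\epsilon$ vanishes outside the interval $(-\epsilon, \epsilon)$, is positive within this interval and has integral 1, which is what the factor of $\epsilon^{-1}$ ensures. Now set
$$u_\epsilon = u * \psi_\epsilon \in C_c^\infty(\mathbb R), \quad \epsilon > 0, \tag{4.184}$$
which lies in $C_c^\infty(\mathbb R)$ by what we have just seen. From the supports of these functions, $u_\epsilon$ vanishes outside $[-R-\epsilon, R+\epsilon]$ if $u$ vanishes outside $[-R, R]$. So only the convergence remains. To get this we use the fact that the integral of $\psi_\epsilon$ is equal to 1 to write
$$u_\epsilon(x) - u(x) = \int\big(u(x-y)\psi_\epsilon(y) - u(x)\psi_\epsilon(y)\big)\,dy. \tag{4.185}$$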
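Estimating the integral using the positivity of the bump function gives
$$|u_\epsilon(x) - u(x)| \le \int_{-\epsilon}^{\epsilon}|u(x-y) - u(x)|\,\psi_\epsilon(y)\,dy. \tag{4.186}$$
By the uniform continuity of a continuous function of compact support, given $\delta > 0$ there exists $\epsilon > 0$ such that
$$\sup_{y\in[-\epsilon,\epsilon]}|u(x-y) - u(x)| < \delta \quad \forall\, x \in \mathbb R.$$
So the uniform convergence follows:
$$\sup_x|u_\epsilon(x) - u(x)| \le \delta\int\psi_\epsilon = \delta. \tag{4.187}$$
Pass to a sequence $\epsilon_n \to 0$ if you wish. □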
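Proposition 4.7 can be watched numerically. The Python sketch below assumes the hat function $u(x) = \max(0, 1 - |x|)$ as the element of $C_c(\mathbb R)$ (an illustrative choice), mollifies it with $\psi_\epsilon$ from (4.183) and evaluates the convolution (4.184) with Simpson's rule; the quadrature, the grid and the helper names are assumptions. Since this $u$ is Lipschitz with constant 1, $|u(x-y) - u(x)| \le |y| \le \epsilon$ in (4.186), so one may take $\delta = \epsilon$ and the sup error should be at most $\epsilon$.

```python
import math


def phi(x):
    return math.exp(-1.0 / x) if x > 0 else 0.0


def simpson(f, a, b, n=200):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def bump(x):
    return phi(x + 1.0) * phi(1.0 - x)


BUMP_INTEGRAL = simpson(bump, -1.0, 1.0, 2000)


def psi_eps(y, eps):
    """Rescaled bump (4.183): vanishes outside (-eps, eps), integral 1."""
    return bump(y / eps) / (BUMP_INTEGRAL * eps)


def u(x):
    """Hat function: continuous, supported in [-1, 1], Lipschitz constant 1."""
    return max(0.0, 1.0 - abs(x))


def u_eps(x, eps):
    """u_eps = u * psi_eps from (4.184); the integrand vanishes for |y| > eps."""
    return simpson(lambda y: u(x - y) * psi_eps(y, eps), -eps, eps)


def sup_error(eps, npts=201):
    """Maximum of |u_eps - u| over a uniform grid on [-2, 2] (includes x = 0)."""
    xs = [-2.0 + 4.0 * k / (npts - 1) for k in range(npts)]
    return max(abs(u_eps(x, eps) - u(x)) for x in xs)


EPSILONS = (0.4, 0.2, 0.1, 0.05)
errors = [sup_error(eps) for eps in EPSILONS]
for eps, err in zip(EPSILONS, errors):
    print(f"eps = {eps:<5} sup |u_eps - u| = {err:.5f}")
```

The largest error sits at the corner $x = 0$, where a short computation gives $u(0) - u_\epsilon(0) = \epsilon\int|t|\psi(t)\,dt$, so the error shrinks linearly in $\epsilon$.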
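Corollary 4.1. The spaces $C_c^\infty(\mathbb R)$ and $\mathcal S(\mathbb R)$ are dense in $L^2(\mathbb R)$.
Uniform convergence of continuous functions with support in a fixed compact set implies $L^2$ convergence, and $C_c(\mathbb R)$ is dense in $L^2(\mathbb R)$, so the result follows from Proposition 4.7, since $C_c^\infty(\mathbb R) \subset \mathcal S(\mathbb R)$.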

9. Plancherel and Parseval


But which is which?
We proceed to show that $\mathcal F$ and $\mathcal G$, defined in (4.137) and (4.145), both extend to isomorphisms of $L^2(\mathbb R)$ which are inverses of each other. The main step is to show that
$$\int u(x)\hat v(x)\,dx = \int\hat u(\xi)v(\xi)\,d\xi, \quad u, v \in \mathcal S(\mathbb R). \tag{4.188}$$

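Since the integrals are rapidly convergent at infinity we may substitute the definition of the Fourier transform into (4.188), write the result out as a double integral and change the order of integration:
$$\begin{aligned}\int u(x)\hat v(x)\,dx &= \int u(x)\int e^{-ix\xi}v(\xi)\,d\xi\,dx\\ &= \int v(\xi)\int e^{-ix\xi}u(x)\,dx\,d\xi = \int\hat u(\xi)v(\xi)\,d\xi.\end{aligned} \tag{4.189}$$
Now, if $w \in \mathcal S(\mathbb R)$ we may replace $v(\xi)$ by $\overline{\hat w(\xi)}$, since it is another element of $\mathcal S(\mathbb R)$. By the Fourier inversion formula,
$$w(x) = (2\pi)^{-1}\int e^{ix\xi}\hat w(\xi)\,d\xi \implies \overline{w(x)} = (2\pi)^{-1}\int e^{-ix\xi}\,\overline{\hat w(\xi)}\,d\xi = (2\pi)^{-1}\hat v(x). \tag{4.190}$$
Substituting these into (4.188) gives Parseval's formula
$$\int u\,\overline{w} = \frac{1}{2\pi}\int\hat u\,\overline{\hat w}, \quad u, w \in \mathcal S(\mathbb R). \tag{4.191}$$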

Proposition 4.8. The Fourier transform $\mathcal F$ extends from $\mathcal S(\mathbb R)$ to an isomorphism on $L^2(\mathbb R)$ with $\frac{1}{\sqrt{2\pi}}\mathcal F$ an isometric isomorphism with adjoint, and inverse, $\sqrt{2\pi}\,\mathcal G$.
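Proof. Setting $u = w$ in (4.191) shows that
$$\|\mathcal F(u)\|_{L^2} = \sqrt{2\pi}\,\|u\|_{L^2} \tag{4.192}$$
for all $u \in \mathcal S(\mathbb R)$. The density of $\mathcal S(\mathbb R)$, established above, then implies that $\mathcal F$ extends by continuity to the whole of $L^2(\mathbb R)$. The same argument applies to $\mathcal G$, and the identities $\mathcal G\mathcal F = \mathcal F\mathcal G = \mathrm{Id}$ on $\mathcal S(\mathbb R)$ extend by continuity, so the extensions are inverse isomorphisms as indicated. □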
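A quick numerical check of (4.192): the Python sketch below assumes the shifted Gaussian $u(x) = \exp(-(x-1)^2/2)$ as a test function in $\mathcal S(\mathbb R)$ and uses the convention $\hat u(\xi) = \int e^{-ix\xi}u(x)\,dx$ of (4.189). The transform is approximated by a Riemann sum on a truncated grid; the step size and the truncations are assumptions, harmless here because of the Gaussian decay. The ratio $\|\hat u\|^2_{L^2}/\|u\|^2_{L^2}$ should come out as $2\pi$.

```python
import cmath
import math


def u(x):
    """Shifted Gaussian; its Fourier transform is complex-valued."""
    return math.exp(-((x - 1.0) ** 2) / 2.0)


H = 0.1
XS = [-15.0 + H * k for k in range(301)]    # x-grid covering [-15, 15]
XIS = [-12.0 + H * k for k in range(241)]   # xi-grid covering [-12, 12]


def fourier(xi):
    """Riemann-sum approximation to u_hat(xi) = integral of exp(-i x xi) u(x) dx."""
    return sum(cmath.exp(-1j * x * xi) * u(x) for x in XS) * H


u_hat = [fourier(xi) for xi in XIS]
norm_u_sq = sum(u(x) ** 2 for x in XS) * H
norm_u_hat_sq = sum(abs(v) ** 2 for v in u_hat) * H

print("||u_hat||^2 / ||u||^2 =", norm_u_hat_sq / norm_u_sq)
print("2 pi                  =", 2.0 * math.pi)
```

For this $u$ one also knows $\hat u(\xi) = \sqrt{2\pi}\,e^{-i\xi}e^{-\xi^2/2}$ exactly, which the Riemann sum reproduces to high accuracy.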

This isomorphism of $L^2(\mathbb R)$ has many implications. For instance, we would like to define the Sobolev space $H^1(\mathbb R)$ by the conditions that $u \in L^2(\mathbb R)$ and $\frac{du}{dx} \in L^2(\mathbb R)$, but to do this we would need to make sense of the derivative. However, we can ‘guess’ that, if it exists, the Fourier transform of $du/dx$ should be $i\xi\hat u(\xi)$. For a function in $L^2$, such as $\hat u$ given that $u \in L^2$, we do know what it means to require $\xi\hat u(\xi) \in L^2(\mathbb R)$. We can then define the Sobolev spaces of any positive, even non-integral, order by
$$H^r(\mathbb R) = \{u \in L^2(\mathbb R);\ |\xi|^r\hat u \in L^2(\mathbb R)\}. \tag{4.193}$$
Of course it would take us some time to investigate the properties of these spaces!

10. Weak and strong derivatives


In approaching the issue of the completeness of the eigenbasis for the harmonic oscillator more directly, rather than by the kernel method discussed above, we run into the issue of weak and strong solutions of differential equations. Suppose that $u \in L^2(\mathbb R)$; what does it mean to say that $\frac{du}{dx} \in L^2(\mathbb R)$? For instance, we will want to understand what the ‘possible solutions’ of
$$\mathrm{An}\,u = f, \quad u, f \in L^2(\mathbb R), \quad \mathrm{An} = \frac{d}{dx} + x \tag{4.194}$$
are. Of course, if we assume that $u$ is continuously differentiable then we know what this means, but we need to consider the possibilities of giving a meaning to (4.194) under more general conditions, without assuming too much regularity on $u$ (or any at all).
Notice that there is a difference between the two terms in $\mathrm{An}\,u = \frac{du}{dx} + xu$. If $u \in L^2(\mathbb R)$ we can assign a meaning to the second term, $xu$, since we know that $xu \in L^2_{\mathrm{loc}}(\mathbb R)$. This is not a normed space, but it is a perfectly good vector space, in which $L^2(\mathbb R)$ ‘sits’; if you want to be pedantic, it naturally injects into it. The point, however, is that we do know what the statement $xu \in L^2(\mathbb R)$ means, given that $u \in L^2(\mathbb R)$: it means that there exists $v \in L^2(\mathbb R)$ so that $xu = v$ in $L^2_{\mathrm{loc}}(\mathbb R)$. The derivative can actually be handled in a similar fashion using the Fourier transform, but I will not do that here.
Rather, consider the following three ‘$L^2$-based notions’ of derivative.

Definition 4.5. (1) We say that $u \in L^2(\mathbb R)$ has a Sobolev derivative if there exists a sequence $\phi_n \in C_c^1(\mathbb R)$ such that $\phi_n \to u$ in $L^2(\mathbb R)$ and $\phi_n' \to v$ in $L^2(\mathbb R)$, where $\phi_n' = \frac{d\phi_n}{dx}$ in the usual sense of course.
(2) We say that $u \in L^2(\mathbb R)$ has a strong derivative (in the $L^2$ sense) if the limit
$$\lim_{0\ne s\to 0}\frac{u(x+s) - u(x)}{s} = \tilde v \text{ exists in } L^2(\mathbb R). \tag{4.195}$$
(3) Thirdly, we say that $u \in L^2(\mathbb R)$ has a weak derivative in $L^2$ if there exists $w \in L^2(\mathbb R)$ such that
$$\Big(u, -\frac{df}{dx}\Big)_{L^2} = (w, f)_{L^2} \quad \forall\, f \in C_c^1(\mathbb R). \tag{4.196}$$
In all cases, we will see that it is justified to write $v = \tilde v = w = \frac{du}{dx}$, because these definitions turn out to be equivalent. Of course if $u \in C_c^1(\mathbb R)$ then $u$ is differentiable in each sense and the derivative is always $du/dx$; note that the integration by parts used to prove (4.196) is justified in that case. In fact we are most interested in the first and third of these definitions; the first two are both called ‘strong derivatives’.
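To make Definition 4.5 (3) concrete, here is a Python sketch. It assumes the hat function $u(x) = \max(0, 1 - |x|)$, which is not $C^1$ (it has corners at 0 and $\pm 1$), together with the candidate weak derivative $w = 1$ on $(-1, 0)$, $w = -1$ on $(0, 1)$ and $w = 0$ elsewhere. It checks (4.196) for one test function $f \in C_c^1(\mathbb R)$, a shifted quartic with a hypothetical shift 0.3 chosen so that neither side vanishes by symmetry. All functions are real, so the pairing is just $\int u\,g$. The integrals are split at the corners and computed with Simpson's rule (an implementation assumption).

```python
def u(x):
    """Hat function, continuous but not differentiable at 0 and +-1."""
    return max(0.0, 1.0 - abs(x))


def w(x):
    """Candidate weak derivative of u (its values at the corners are irrelevant)."""
    if -1.0 < x < 0.0:
        return 1.0
    if 0.0 < x < 1.0:
        return -1.0
    return 0.0


SHIFT = 0.3  # hypothetical shift: breaks the symmetry that would make both sides 0


def f(x):
    """Test function (1 - t^2)^2 for |t| < 1, t = x - SHIFT; it lies in C_c^1(R)."""
    t = x - SHIFT
    return (1.0 - t * t) ** 2 if abs(t) < 1.0 else 0.0


def df(x):
    t = x - SHIFT
    return -4.0 * t * (1.0 - t * t) if abs(t) < 1.0 else 0.0


def simpson(g, a, b, n=200):
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3.0


# supp f = [-0.7, 1.3]; split it at the corners of u so each piece is smooth.
PIECES = [(-0.7, 0.0), (0.0, 1.0), (1.0, 1.3)]

# Left side of (4.196): (u, -df/dx).
lhs = sum(simpson(lambda x: -u(x) * df(x), a, b) for a, b in PIECES)
# Right side (w, f): w is constant on each piece, so use its value at the midpoint.
rhs = sum(w((a + b) / 2.0) * simpson(f, a, b) for a, b in PIECES)

print("(u, -f') =", lhs)
print("(w, f)   =", rhs)
```

The two numbers agree to quadrature accuracy, as integration by parts on each smooth piece predicts.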
It is easy to see that the existence of a Sobolev derivative implies that it is also a weak derivative. Indeed, since $\phi_n$, the approximating sequence whose existence is the definition of the Sobolev derivative, is in $C_c^1(\mathbb R)$, the integration by parts implicit in (4.196) is valid and so for all $f \in C_c^1(\mathbb R)$,
$$\Big(\phi_n, -\frac{df}{dx}\Big)_{L^2} = (\phi_n', f)_{L^2}. \tag{4.197}$$
Since $\phi_n \to u$ in $L^2$ and $\phi_n' \to v$ in $L^2$, both sides of (4.197) converge to give the identity (4.196), with $w = v$.
Before proceeding to the rest of the equivalence of these definitions we need to do some preparation. First let us investigate a little the consequences of the existence of a Sobolev derivative.
Lemma 4.11. If $u \in L^2(\mathbb R)$ has a Sobolev derivative then $u \in C(\mathbb R)$ and there exists a uniquely defined element $w \in L^2(\mathbb R)$ such that
$$u(x) - u(y) = \int_y^x w(s)\,ds \quad \forall\, y \le x \in \mathbb R. \tag{4.198}$$

Proof. Suppose $u$ has a Sobolev derivative, determined by some approximating sequence $\phi_n$. Consider a general element $\psi \in C_c^1(\mathbb R)$. Then $\tilde\phi_n = \psi\phi_n$ is a sequence in $C_c^1(\mathbb R)$ and $\tilde\phi_n \to \psi u$ in $L^2$. Moreover, by the product rule for standard derivatives,
$$\frac{d}{dx}\tilde\phi_n = \psi'\phi_n + \psi\phi_n' \to \psi'u + \psi w \text{ in } L^2(\mathbb R). \tag{4.199}$$
Thus in fact $\psi u$ also has a Sobolev derivative, namely $\psi'u + \psi w$ if $w$ is the Sobolev derivative for $u$ given by $\phi_n$, which is to say that the product rule for derivatives holds under these conditions.
Now, the formula (4.198) comes from the Fundamental Theorem of Calculus, which in this case really does apply to $\tilde\phi_n$ and shows that
$$\psi(x)\phi_n(x) - \psi(y)\phi_n(y) = \int_y^x\frac{d\tilde\phi_n}{ds}(s)\,ds. \tag{4.200}$$
For any given $x = \bar x$ we can choose $\psi$ so that $\psi(\bar x) = 1$, and then we can take $y$ below the lower limit of the support of $\psi$, so $\psi(y) = 0$. It follows that for this choice of $\psi$,
$$\phi_n(\bar x) = \int_y^{\bar x}\big(\psi'(s)\phi_n(s) + \psi(s)\phi_n'(s)\big)\,ds. \tag{4.201}$$
Now, we can pass to the limit as $n \to \infty$: the right side converges for each fixed $\bar x$ (with $\psi$ fixed), since the integrand converges in $L^2$ and hence in $L^1$ on this compact interval. This shows that the limit of $\phi_n(\bar x)$ must exist for each fixed $\bar x$. In fact we can always choose $\psi$ to be constant near a particular point and apply this argument to see that
$$\phi_n(x) \to u(x) \text{ locally uniformly on } \mathbb R. \tag{4.202}$$
That is, the limit exists locally uniformly, hence represents a continuous function, but that continuous function must be equal to the original $u$ almost everywhere (since $\psi\phi_n \to \psi u$ in $L^2$).
Thus in fact we conclude that ‘$u \in C(\mathbb R)$’ (which really means that $u$ has a representative which is continuous). Not only that, but we get (4.198) from passing to the limit on both sides of
$$u(x) - u(y) = \lim_{n\to\infty}\big(\phi_n(x) - \phi_n(y)\big) = \lim_{n\to\infty}\int_y^x\phi_n'(s)\,ds = \int_y^x w(s)\,ds. \tag{4.203}$$
□
One immediate consequence of this is
$$\text{The Sobolev derivative is unique if it exists.} \tag{4.204}$$
Indeed, if $w_1$ and $w_2$ are both Sobolev derivatives then (4.198) holds for both of them, which means that $w_2 - w_1$ has vanishing integral on any finite interval, and we know that this implies that $w_2 = w_1$ a.e.
So at least for Sobolev derivatives we are now justified in writing
$$w = \frac{du}{dx} \tag{4.205}$$
since $w$ is unique and behaves like a derivative in the integral sense that (4.198) holds.

Lemma 4.12. If u has a Sobolev derivative then u has a strong derivative and
if u has a strong derivative then this is also a weak derivative.

Proof. If $u$ has a Sobolev derivative then (4.198) holds. We can use this to write the difference quotient as
$$\frac{u(x+s) - u(x)}{s} - w(x) = \frac1s\int_0^s\big(w(x+t) - w(x)\big)\,dt, \tag{4.206}$$
since the integral in the second term can be carried out. Using this formula twice, the square of the $L^2$ norm, which is finite, is
$$\Big\|\frac{u(x+s) - u(x)}{s} - w(x)\Big\|_{L^2}^2 = \frac{1}{s^2}\int\int_0^s\int_0^s\big(w(x+t) - w(x)\big)\overline{\big(w(x+t') - w(x)\big)}\,dt\,dt'\,dx. \tag{4.207}$$

There is a small issue of manupulating the integrals, but we can always ‘back off
a little’ and replace u by the approximating sequence φn and then everything is
fine – and we only have to check what happens at the end. Now, we can apply the
Cauchy-Schwarz inequality as a triple integral. The two factors turn out to be the
10. WEAK AND STRONG DERIVATIVES 147

same so we find that


s s
u(x + s) − u(x)
Z Z Z
1
(4.208) k − w(x)k2L2 ≤ 2 |w(x + t) − w(x)|2 dtdt0 dx
s s 0 0
1 s
Z Z
= |w(x + t) − w(x)|2 dxdt
s 0
since the integrand does not depend on t0 .
Now, something we checked long ago was that $L^2$ functions are ‘continuous in the mean’ in the sense that
$$\lim_{0\ne t\to 0}\int|w(x+t) - w(x)|^2\,dx = 0. \tag{4.209}$$
Applying this to (4.208) and then estimating the $t$ integral shows that
$$\frac{u(x+s) - u(x)}{s} - w(x) \to 0 \text{ in } L^2(\mathbb R) \text{ as } s \to 0. \tag{4.210}$$
By definition this means that $u$ has $w$ as a strong derivative. I leave it up to you to make sure that the manipulation of integrals is okay.
So, now suppose that $u$ has a strong derivative, $\tilde v$. Observe that if $f \in C_c^1(\mathbb R)$ then the limit defining the derivative
$$\lim_{0\ne s\to 0}\frac{f(x+s) - f(x)}{s} = f'(x) \tag{4.211}$$
is uniform. In fact this follows by writing down the Fundamental Theorem of Calculus, as in (4.198), again using the properties of Riemann integrals. Now, consider
$$\begin{aligned}\Big(u(x), \frac{f(x+s) - f(x)}{s}\Big)_{L^2} &= \frac1s\int u(x)\overline{f(x+s)}\,dx - \frac1s\int u(x)\overline{f(x)}\,dx\\ &= \Big(\frac{u(x-s) - u(x)}{s}, f(x)\Big)_{L^2},\end{aligned} \tag{4.212}$$
where we just need to replace $x$ by $x - s$ in the first integral. Now let $s \to 0$: the left side converges because the difference quotient of $f$ converges uniformly, with support in a fixed compact set, hence in $L^2$; the right side converges because of the assumed strong differentiability. As a result (noting that the parameter on the right is really $-s$)
$$\Big(u, \frac{df}{dx}\Big)_{L^2} = -(\tilde v, f)_{L^2} \quad \forall\, f \in C_c^1(\mathbb R), \tag{4.213}$$
which is weak differentiability with derivative $\tilde v$. □
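A small Python experiment illustrates Lemma 4.12. It again assumes the hat function $u(x) = \max(0, 1 - |x|)$ with weak derivative $w$ as above. The difference quotients do not converge to $w$ uniformly near the corners, yet they do converge in $L^2$. A direct computation for this particular $u$ gives $\|(u(\cdot+s) - u)/s - w\|^2_{L^2} = 2s$ for $0 < s < 1$, since the error is confined to three intervals of length $s$ at the corners. So the $L^2$ error should decay like $\sqrt{2s}$. The norm is estimated with a midpoint rule whose grid size is an assumption.

```python
import math


def u(x):
    return max(0.0, 1.0 - abs(x))


def w(x):
    """Weak derivative of the hat function (values at the corners irrelevant)."""
    if -1.0 < x < 0.0:
        return 1.0
    if 0.0 < x < 1.0:
        return -1.0
    return 0.0


def l2_error(s, n=100_000, a=-2.0, b=2.0):
    """Midpoint-rule estimate of || (u(x + s) - u(x)) / s - w(x) ||_{L^2}."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h
        diff = (u(x + s) - u(x)) / s - w(x)
        total += diff * diff
    return math.sqrt(total * h)


STEPS = (0.1, 0.01, 0.001)
errors = [l2_error(s) for s in STEPS]
for s, err in zip(STEPS, errors):
    print(f"s = {s:<6} L2 error = {err:.5f}   sqrt(2 s) = {math.sqrt(2 * s):.5f}")
```

The sup norm of the same error stays equal to 1 for every $s$ (look just to the left of $x = -1$), which is why the strong derivative is defined with an $L^2$ limit.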

So, at this point we know that Sobolev differentiability implies strong differentiability and that either of the strong ones implies the weak one. So it remains only to show that weak differentiability implies Sobolev differentiability, and then we can forget about the difference!
Before doing that, note again that a weak derivative, if it exists, is unique, since the difference of two would have to pair to zero in $L^2$ with all of $C_c^1(\mathbb R)$, which is dense. Similarly, if $u$ has a weak derivative then so does $\psi u$ for any real-valued $\psi \in C_c^1(\mathbb R)$, since we can just move $\psi$ around in the integrals and see that
$$\begin{aligned}\Big(\psi u, -\frac{df}{dx}\Big) &= \Big(u, -\psi\frac{df}{dx}\Big)\\ &= \Big(u, -\frac{d(\psi f)}{dx}\Big) + (u, \psi'f)\\ &= (w, \psi f) + (\psi'u, f) = (\psi w + \psi'u, f),\end{aligned} \tag{4.214}$$
which also proves that the product formula holds for weak derivatives.
So, let us consider $u \in L^2_c(\mathbb R)$ which does have a weak derivative. To show that it has a Sobolev derivative we need to construct a sequence $\phi_n$. We will do this by convolution.
Lemma 4.13. If $\mu \in C_c(\mathbb R)$ then for any $u \in L^2_c(\mathbb R)$,
$$\mu * u(x) = \int\mu(x-s)u(s)\,ds \in C_c(\mathbb R) \tag{4.215}$$
and if $\mu \in C_c^1(\mathbb R)$ then
$$\mu * u(x) \in C_c^1(\mathbb R), \quad \frac{d\,\mu*u}{dx} = \mu' * u(x). \tag{4.216}$$
It follows that if $\mu$ has more continuous derivatives, then so does $\mu * u$.

Proof. Since $u$ has compact support and is in $L^2$, it is in $L^1$, so the integral in (4.215) exists for each $x \in \mathbb R$ and also vanishes if $|x|$ is large enough, since the integrand vanishes when the supports become separate: for some $R$, $\mu(x-s)$ is supported in $|s - x| \le R$ and $u(s)$ in $|s| < R$, which are disjoint for $|x| > 2R$. It is also clear that $\mu * u$ is continuous, using the estimate (from uniform continuity of $\mu$)
$$|\mu*u(x') - \mu*u(x)| \le \sup_s|\mu(x-s) - \mu(x'-s)|\,\|u\|_{L^1}. \tag{4.217}$$
Similarly, with $x' = x + t$, the difference quotient can be written
$$\frac{\mu*u(x') - \mu*u(x)}{t} = \int\frac{\mu(x'-s) - \mu(x-s)}{t}\,u(s)\,ds \tag{4.218}$$
and the uniform convergence of the difference quotient of $\mu$ shows that
$$\frac{d\,\mu*u}{dx} = \mu' * u. \tag{4.219}$$
□

One of the key properties of these convolution integrals is that we can examine what happens when we ‘concentrate’ $\mu$. Replace the one $\mu$ by the family
$$\mu_\epsilon(x) = \epsilon^{-1}\mu\Big(\frac{x}{\epsilon}\Big), \quad \epsilon > 0. \tag{4.220}$$
The singular factor here is introduced so that $\int\mu_\epsilon$ is independent of $\epsilon > 0$:
$$\int\mu_\epsilon = \int\mu \quad \forall\, \epsilon > 0. \tag{4.221}$$
Note that since $\mu$ has compact support, say in $|x| \le R$, the support of $\mu_\epsilon$ is concentrated in $|x| \le \epsilon R$.
Lemma 4.14. If $u \in L^2_c(\mathbb R)$ and $0 \le \mu \in C_c^1(\mathbb R)$ then
$$\lim_{0\ne\epsilon\to 0}\mu_\epsilon * u = \Big(\int\mu\Big)u \text{ in } L^2(\mathbb R). \tag{4.222}$$
In fact there is no need to assume that $u$ has compact support for this to work.
Proof. First we can change the variable of integration in the definition of the convolution (replacing $s$ by $x - \epsilon s$) and write it instead as
$$\mu_\epsilon * u(x) = \int\mu(s)u(x - \epsilon s)\,ds. \tag{4.223}$$
Now, the rest is similar to one of the arguments above. First write out the difference we want to examine as
$$\mu_\epsilon * u(x) - \Big(\int\mu\Big)u(x) = \int_{|s|\le R}\mu(s)\big(u(x - \epsilon s) - u(x)\big)\,ds. \tag{4.224}$$
Write out the square of the absolute value using the formula twice and we find that
$$\int\Big|\mu_\epsilon * u(x) - \Big(\int\mu\Big)u(x)\Big|^2dx = \int\int_{|s|\le R}\int_{|t|\le R}\mu(s)\mu(t)\big(u(x-\epsilon s) - u(x)\big)\overline{\big(u(x-\epsilon t) - u(x)\big)}\,ds\,dt\,dx. \tag{4.225}$$
Now we can write the integrand as the product of two similar factors, one being
$$\mu(s)^{\frac12}\mu(t)^{\frac12}\big(u(x - \epsilon s) - u(x)\big), \tag{4.226}$$
using the non-negativity of $\mu$. Applying the Cauchy-Schwarz inequality to this we get two factors, which are again the same after relabelling variables, so
$$\int\Big|\mu_\epsilon * u(x) - \Big(\int\mu\Big)u(x)\Big|^2dx \le \int\int_{|s|\le R}\int_{|t|\le R}\mu(s)\mu(t)\,|u(x-\epsilon s) - u(x)|^2\,ds\,dt\,dx. \tag{4.227}$$
The integral in $x$ can be carried out first and, by continuity in the mean, $J_\epsilon(s) = \int|u(x - \epsilon s) - u(x)|^2\,dx \to 0$ as $\epsilon \to 0$, uniformly in $|s| \le R$. This leaves
$$\int\Big|\mu_\epsilon * u(x) - \Big(\int\mu\Big)u(x)\Big|^2dx \le \sup_{|s|\le R}J_\epsilon(s)\int_{|s|\le R}\int_{|t|\le R}\mu(s)\mu(t) = \Big(\int\mu\Big)^2\sup_{|s|\le R}J_\epsilon(s) \to 0. \tag{4.228}$$
□
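Lemma 4.14 needs no continuity of $u$, and the convergence it gives is in $L^2$, not uniform. The Python sketch below assumes $u$ is the indicator function of $[0, 1]$ (discontinuous, compactly supported) and $\mu$ is the normalized bump of Lemma 4.10, so $\int\mu = 1$. By (4.223), $\mu_\epsilon * u(x) = G(x/\epsilon) - G((x-1)/\epsilon)$ with $G(t) = \int_{-1}^t\mu$, which is what the code evaluates (with Simpson quadrature and a midpoint grid, both assumptions). The error lives on two intervals of length $2\epsilon$, so the $L^2$ error should scale like $\sqrt\epsilon$, halving when $\epsilon$ is divided by 4, while the sup error stays close to $1/2$ at the jumps.

```python
import math


def phi(x):
    return math.exp(-1.0 / x) if x > 0 else 0.0


def bump(x):
    return phi(x + 1.0) * phi(1.0 - x)


def simpson(f, a, b, n=50):
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


BUMP_INTEGRAL = simpson(bump, -1.0, 1.0, 2000)


def G(t):
    """Cumulative integral of the normalized bump mu = bump / BUMP_INTEGRAL."""
    if t <= -1.0:
        return 0.0
    if t >= 1.0:
        return 1.0
    return simpson(bump, -1.0, t) / BUMP_INTEGRAL


def u(x):
    """Indicator function of [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def mollified(x, eps):
    """mu_eps * u(x) = integral of mu(s) u(x - eps s) ds = G(x/eps) - G((x - 1)/eps)."""
    return G(x / eps) - G((x - 1.0) / eps)


def errors(eps, n=12_000, a=-1.0, b=2.0):
    """(L^2 error, sup error) of mu_eps * u - u, using midpoints of a uniform grid."""
    h = (b - a) / n
    sq, sup = 0.0, 0.0
    for k in range(n):
        x = a + (k + 0.5) * h
        d = abs(mollified(x, eps) - u(x))
        sq += d * d
        sup = max(sup, d)
    return math.sqrt(sq * h), sup


EPSILONS = (0.2, 0.05, 0.0125)
results = [errors(eps) for eps in EPSILONS]
for eps, (l2, sup) in zip(EPSILONS, results):
    print(f"eps = {eps:<7} L2 error = {l2:.4f}   sup error = {sup:.4f}")
```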

After all this preliminary work we are in a position to prove the remaining part of ‘weak = strong’.
Lemma 4.15. If $u \in L^2(\mathbb R)$ has $w$ as a weak $L^2$-derivative then $w$ is also the Sobolev derivative of $u$.
Proof. Let's assume first that $u$ has compact support, so we can use the discussion above. Then set $\phi_n = \mu_{1/n} * u$, where $\mu \in C_c^1(\mathbb R)$ is chosen to be non-negative and to have integral $\int\mu = 1$, and $\mu_\epsilon$ is defined in (4.220). Now from Lemma 4.14 it follows that $\phi_n \to u$ in $L^2(\mathbb R)$. Also, from Lemma 4.13, $\phi_n \in C_c^1(\mathbb R)$ has derivative given by (4.216). This formula can be written as a pairing in $L^2$:
$$(\mu_{1/n})' * u(x) = \Big(u(s), -\frac{d\mu_{1/n}(x-s)}{ds}\Big)_{L^2} = \big(w(s), \mu_{1/n}(x-s)\big)_{L^2} \tag{4.229}$$
using the definition of the weak derivative of $u$ (with $f(s) = \mu_{1/n}(x - s) \in C_c^1(\mathbb R)$). It therefore follows from Lemma 4.14, applied again, that
$$\phi_n' = \mu_{1/n} * w \to w \text{ in } L^2(\mathbb R). \tag{4.230}$$
Thus indeed, $\phi_n$ is an approximating sequence showing that $w$ is the Sobolev derivative of $u$.
In the general case that $u \in L^2(\mathbb R)$ has a weak derivative but is not necessarily compactly supported, consider a function $\gamma \in C_c^1(\mathbb R)$ with $\gamma(0) = 1$ and consider the sequence $v_m = \gamma(x/m)u(x)$ in $L^2(\mathbb R)$, each element of which has compact support. Moreover, $\gamma(x/m) \to 1$ for each $x$, so by Lebesgue dominated convergence $v_m \to u$ in $L^2(\mathbb R)$ as $m \to \infty$. As shown above, $v_m$ has weak derivative
$$\frac{d\gamma(x/m)}{dx}u + \gamma(x/m)w = \frac1m\gamma'(x/m)u + \gamma(x/m)w \to w$$
as $m \to \infty$, by the same argument applied to the second term and the fact that the first converges to 0 in $L^2(\mathbb R)$. Now, the approximating sequence $\mu_{1/n} * v_m$ discussed above converges to $v_m$, with its derivative converging to the weak derivative of $v_m$. Taking $n = N(m)$ sufficiently large for each $m$ ensures that $\phi_m = \mu_{1/N(m)} * v_m$ converges to $u$ and its sequence of derivatives converges to $w$ in $L^2$. Thus the weak derivative is again a Sobolev derivative. □

Finally, then, we see that the three definitions are equivalent, and we will freely denote the Sobolev/strong/weak derivative as $du/dx$ or $u'$.

11. Sobolev spaces


Now there are lots of applications of the Fourier transform which we do not have the time to get into. However, let me just indicate the definitions of Sobolev spaces and Schwartz space and how they are related to the Fourier transform.
First Sobolev spaces. We now see that $\mathcal F$ maps $L^2(\mathbb R)$ isomorphically onto $L^2(\mathbb R)$, and we can see from (??) for instance that it ‘turns differentiation in $x$ into multiplication by $\xi$’. Of course we do not know how to differentiate $L^2$ functions, so we have some problems making sense of this. One way, the usual mathematicians' trick, is to turn what we want into a definition.
Definition 4.6. The Sobolev spaces of order $s$, for any $s \in (0, \infty)$, are defined as subspaces of $L^2(\mathbb R)$:
$$H^s(\mathbb R) = \{u \in L^2(\mathbb R);\ (1 + |\xi|^2)^{\frac s2}\hat u \in L^2(\mathbb R)\}. \tag{4.231}$$
It is natural to identify $H^0(\mathbb R) = L^2(\mathbb R)$.
These Sobolev spaces, for each positive order $s$, are Hilbert spaces with the inner product and norm
$$(u, v)_{H^s} = \int(1 + |\xi|^2)^s\,\hat u(\xi)\overline{\hat v(\xi)}\,d\xi, \quad \|u\|_s = \|(1 + |\xi|^2)^{\frac s2}\hat u\|_{L^2}. \tag{4.232}$$

That they are pre-Hilbert spaces is clear enough. Completeness is also easy, given that we know the completeness of $L^2(\mathbb R)$. Namely, if $u_n$ is Cauchy in $H^s(\mathbb R)$ then it follows from the fact that
$$\|v\|_{L^2} \le C\|v\|_s \quad \forall\, v \in H^s(\mathbb R) \tag{4.233}$$
that $u_n$ is Cauchy in $L^2$ and also that $(1 + |\xi|^2)^{\frac s2}\hat u_n(\xi)$ is Cauchy in $L^2$. Both therefore converge in $L^2$, $u_n \to u$ and $(1 + |\xi|^2)^{\frac s2}\hat u_n \to g$, and the continuity of the Fourier transform (passing to an a.e. convergent subsequence) shows that $g = (1 + |\xi|^2)^{\frac s2}\hat u$, so $u \in H^s(\mathbb R)$ and $u_n \to u$ in $H^s$.
These spaces are examples of what is discussed above, where we have a dense inclusion of one Hilbert space in another, $H^s(\mathbb R) \longrightarrow L^2(\mathbb R)$. In this case the inclusion is not compact, but it does give rise to a bounded self-adjoint operator on $L^2(\mathbb R)$, $E_s : L^2(\mathbb R) \longrightarrow H^s(\mathbb R) \subset L^2(\mathbb R)$, such that
$$(u, v)_{L^2} = (E_su, E_sv)_{H^s}. \tag{4.234}$$
It is reasonable to denote this as $E_s = (1 + |D_x|^2)^{-\frac s2}$ since
$$u \in L^2(\mathbb R) \implies \widehat{E_su}(\xi) = (1 + |\xi|^2)^{-\frac s2}\hat u(\xi). \tag{4.235}$$
It is a form of ‘fractional integration’ which turns any $u \in L^2(\mathbb R)$ into $E_su \in H^s(\mathbb R)$.


Having defined these spaces, which get smaller as $s$ increases, it can be shown for instance that if $n \ge s$ is an integer then the set of $n$ times continuously differentiable functions on $\mathbb R$ which vanish outside a compact set is dense in $H^s$. This allows us to justify, by continuity, the following statement:
Proposition 4.9. The bounded linear map
$$\frac{d}{dx} : H^s(\mathbb R) \longrightarrow H^{s-1}(\mathbb R),\ s \ge 1, \quad v(x) = \frac{du}{dx} \iff \hat v(\xi) = i\xi\hat u(\xi) \tag{4.236}$$
is consistent with differentiation on $n$ times continuously differentiable functions of compact support, for any integer $n \ge s$.
In fact one can even get a ‘strong form’ of differentiation. The condition that $u \in H^1(\mathbb R)$, that $u \in L^2$ ‘has one derivative in $L^2$’, is actually equivalent, for $u \in L^2(\mathbb R)$, to the existence of the limit
$$\lim_{t\to 0}\frac{u(x+t) - u(x)}{t} = v \text{ in } L^2(\mathbb R), \tag{4.237}$$
and then $\hat v = i\xi\hat u$. Another way of looking at this is
$$u \in H^1(\mathbb R) \implies u : \mathbb R \longrightarrow \mathbb C \text{ is continuous and } u(x) - u(y) = \int_y^x v(t)\,dt,\ v \in L^2. \tag{4.238}$$
If such a $v \in L^2(\mathbb R)$ exists then it is unique, since the difference of two such functions would have to have integral zero over any finite interval, and we know (from one of the exercises) that this implies that the function vanishes a.e.
One of the more important results about Sobolev spaces (of which there are many) is the relationship between these ‘$L^2$ derivatives’ and ‘true derivatives’.
Theorem 4.7 (Sobolev embedding). If $n$ is an integer and $s > n + \frac12$ then
$$H^s(\mathbb R) \subset C^n_\infty(\mathbb R), \tag{4.239}$$
the space of $n$ times continuously differentiable functions with bounded derivatives up to order $n$ (which also vanish at infinity).
This is actually not so hard to prove; there are some hints in the exercises below.
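To see the scale (4.231) in action, take once more the hat function $u(x) = \max(0, 1 - |x|)$ (an illustrative assumption). Its transform is $\hat u(\xi) = 4\sin^2(\xi/2)/\xi^2$, which decays like $\xi^{-2}$, so $(1 + \xi^2)^{s/2}\hat u \in L^2$ exactly when $s < 3/2$. Thus $u \in H^1(\mathbb R)$ but $u \notin H^2(\mathbb R)$. This fits Theorem 4.7: $u$ is continuous (some $s > \frac12$ is available) but not $C^1$, which would need some $s > \frac32$. The Python sketch below computes truncated integrals $\int_{-L}^{L}(1 + \xi^2)^s|\hat u|^2\,d\xi$ with a midpoint rule (step size assumed). For $s = 1$ they should settle near $2\pi(\|u\|^2_{L^2} + \|u'\|^2_{L^2}) = 16\pi/3$, while for $s = 2$ they keep growing roughly linearly in $L$.

```python
import math


def u_hat(xi):
    """Fourier transform of the hat function: 4 sin^2(xi/2) / xi^2 (equal to 1 at 0)."""
    if xi == 0.0:
        return 1.0
    return 4.0 * math.sin(xi / 2.0) ** 2 / xi ** 2


def truncated_sobolev_integral(s, L, h=0.01):
    """Midpoint rule for the integral over [-L, L] of (1 + xi^2)^s |u_hat(xi)|^2."""
    n = int(round(L / h))
    total = 0.0
    for k in range(n):
        xi = (k + 0.5) * h                      # the integrand is even in xi
        total += (1.0 + xi * xi) ** s * u_hat(xi) ** 2
    return 2.0 * total * h


for s in (1.0, 2.0):
    values = [truncated_sobolev_integral(s, L) for L in (100.0, 1000.0)]
    print(f"s = {s}: L = 100 -> {values[0]:.4f},  L = 1000 -> {values[1]:.4f}")
print("16 pi / 3 =", 16.0 * math.pi / 3.0)
```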
These are not the only sort of spaces with ‘more regularity’ one can define and use. For instance one can try to treat $x$ and $\xi$ more symmetrically and define smaller spaces than the $H^s$ above by setting
$$H^s_{\mathrm{iso}}(\mathbb R) = \{u \in L^2(\mathbb R);\ (1 + |\xi|^2)^{\frac s2}\hat u \in L^2(\mathbb R),\ (1 + |x|^2)^{\frac s2}u \in L^2(\mathbb R)\}. \tag{4.240}$$
The ‘obvious’ inner product with respect to which these ‘isotropic’ Sobolev spaces $H^s_{\mathrm{iso}}(\mathbb R)$ are indeed Hilbert spaces is
$$(u, v)_{s,\mathrm{iso}} = \int_{\mathbb R}u\overline{v} + \int_{\mathbb R}|x|^{2s}u\overline{v} + \int_{\mathbb R}|\xi|^{2s}\hat u\overline{\hat v}, \tag{4.241}$$
which makes them look rather symmetric between $u$ and $\hat u$, and indeed
$$\mathcal F : H^s_{\mathrm{iso}}(\mathbb R) \longrightarrow H^s_{\mathrm{iso}}(\mathbb R) \text{ is an isomorphism } \forall\, s \ge 0. \tag{4.242}$$
At this point, by dint of a little, only moderately hard, work, it is possible to show that the harmonic oscillator extends by continuity to an isomorphism
$$H : H^{s+2}_{\mathrm{iso}}(\mathbb R) \longrightarrow H^s_{\mathrm{iso}}(\mathbb R) \quad \forall\, s \ge 2. \tag{4.243}$$
Finally in this general vein, I wanted to point out that Hilbert, and even Banach, spaces are not the end of the road! One very important space in relation to a direct treatment of the Fourier transform is the Schwartz space. The definition is reasonably simple. Namely, we denote Schwartz space by $\mathcal S(\mathbb R)$ and say that $u \in \mathcal S(\mathbb R)$ if and only if $u : \mathbb R \longrightarrow \mathbb C$ is continuously differentiable of all orders and, for every $n$,
$$\|u\|_n = \sum_{k+p\le n}\sup_{x\in\mathbb R}(1 + |x|)^k\Big|\frac{d^pu}{dx^p}\Big| < \infty. \tag{4.244}$$
All these inequalities just mean that all the derivatives of $u$ are ‘rapidly decreasing at ∞’, in the sense that they stay bounded when multiplied by any polynomial.
So in fact we know already that $\mathcal S(\mathbb R)$ is not empty, since the elements of the Hermite basis satisfy $e_j \in \mathcal S(\mathbb R)$ for all $j$. In fact it follows immediately from this that
$$\mathcal S(\mathbb R) \longrightarrow L^2(\mathbb R) \text{ is dense}. \tag{4.245}$$
If you want to try your hand at something a little challenging, see if you can check that
$$\mathcal S(\mathbb R) = \bigcap_{s>0}H^s_{\mathrm{iso}}(\mathbb R), \tag{4.246}$$
which uses the Sobolev embedding theorem above.


As you can see from the definition in (4.244), $\mathcal S(\mathbb R)$ is not likely to be a Banach space. Each of the $\|\cdot\|_n$ is a norm. However, $\mathcal S(\mathbb R)$ is pretty clearly not going to be complete with respect to any one of these, although it is complete with respect to all, countably many, norms. What does this mean? In fact $\mathcal S(\mathbb R)$ is a metric space with the metric
$$d(u, v) = \sum_n 2^{-n}\frac{\|u - v\|_n}{1 + \|u - v\|_n}, \tag{4.247}$$
as you can check. So the claim is that $\mathcal S(\mathbb R)$ is complete as a metric space; such a thing is called a Fréchet space.
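The metric (4.247) can be evaluated for a concrete element of $\mathcal S(\mathbb R)$. The Python sketch below assumes the Gaussian $u(x) = \exp(-x^2/2)$ and computes the norms (4.244) through the identity $\frac{d^pu}{dx^p} = (-1)^p\mathrm{He}_p(x)e^{-x^2/2}$, with $\mathrm{He}_p$ the probabilists' Hermite polynomials. Several choices are assumptions: each sup is approximated on a finite grid, the sum in (4.247) is started at $n = 0$, and it is truncated at $n = N$, so the omitted tail is at most $2^{-N}$. It shows $d(tu, 0)$ decreasing towards 0 as $t \to 0$, although the individual norms $\|u\|_n$ grow rapidly with $n$; every value of $d$ stays below $\sum_n 2^{-n} = 2$.

```python
import math

N = 12  # sum in (4.247) truncated at n = N; the omitted tail is at most 2**-N


def hermite_values(x, pmax):
    """Probabilists' Hermite polynomials He_0(x), ..., He_pmax(x), pmax >= 1."""
    he = [1.0, x]
    for p in range(1, pmax):
        he.append(x * he[p] - p * he[p - 1])
    return he


GRID = [-15.0 + 0.05 * i for i in range(601)]

# S[k][p] ~ sup_x (1 + |x|)^k |d^p u / dx^p| for u(x) = exp(-x^2 / 2), using
# d^p u / dx^p = (-1)^p He_p(x) exp(-x^2 / 2); the sup is only taken over GRID.
S = [[0.0] * (N + 1) for _ in range(N + 1)]
for x in GRID:
    gauss = math.exp(-x * x / 2.0)
    he = hermite_values(x, N)
    for k in range(N + 1):
        weight = (1.0 + abs(x)) ** k * gauss
        for p in range(N + 1 - k):
            S[k][p] = max(S[k][p], weight * abs(he[p]))


def seminorm(n, t=1.0):
    """||t u||_n from (4.244): the sum over k + p <= n of the tabulated sups."""
    return abs(t) * sum(S[k][p] for k in range(n + 1) for p in range(n + 1 - k))


def dist_to_zero(t):
    """Truncated metric (4.247): d(t u, 0) = sum_n 2^-n ||t u||_n / (1 + ||t u||_n)."""
    total = 0.0
    for n in range(N + 1):
        s = seminorm(n, t)
        total += 2.0 ** (-n) * s / (1.0 + s)
    return total


for t in (1.0, 1e-1, 1e-3, 1e-6):
    print(f"t = {t:g}: ||t u||_{N} = {seminorm(N, t):.3e}, d(t u, 0) = {dist_to_zero(t):.6f}")
```

The factor $s/(1+s)$ caps each term at $2^{-n}$, which is what lets a single number $d$ control all of the countably many norms at once.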
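What has this got to do with the Fourier transform? The point is that
$$\mathcal F : \mathcal S(\mathbb R) \longrightarrow \mathcal S(\mathbb R) \text{ is an isomorphism and } \mathcal F\Big(\frac{du}{dx}\Big) = i\xi\mathcal F(u),\quad \mathcal F(xu) = i\frac{d\mathcal F(u)}{d\xi}, \tag{4.248}$$
where this now makes sense. The dual space of $\mathcal S(\mathbb R)$, the space of continuous linear functionals on it, is the space, denoted $\mathcal S'(\mathbb R)$, of tempered distributions on $\mathbb R$.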

12. Schwartz distributions


We do not have time in this course to really discuss distributions. Still, it is a good idea for you to know what they are and why they are useful. Of course, to really appreciate their utility you need to read a bit more than I have here. First think a little about the Schwartz space $\mathcal S(\mathbb R)$ introduced above. The metric in (4.247) might seem rather mysterious, but it has the important property that each of the norms $\|\cdot\|_n$ defines a continuous function $\mathcal S(\mathbb R) \longrightarrow \mathbb R$ with respect to this metric topology. In fact a linear map $T : \mathcal S(\mathbb R) \longrightarrow \mathbb C$ is continuous if and only if
$$\exists\, N, C \text{ s.t. } |T\phi| \le C\|\phi\|_N \quad \forall\, \phi \in \mathcal S(\mathbb R). \tag{4.249}$$
So, the continuous linear functionals on $\mathcal S(\mathbb R)$ are just those which are continuous with respect to one of the norms.
These functionals are exactly the space of tempered distributions
$$\mathcal S'(\mathbb R) = \{T : \mathcal S(\mathbb R) \longrightarrow \mathbb C \text{ linear and continuous}\}. \tag{4.250}$$
The relationship to functions is that each $f \in L^2(\mathbb R)$ (or more generally each $f$ such that $(1 + |x|)^{-N}f \in L^1(\mathbb R)$ for some $N$) defines an element of $\mathcal S'(\mathbb R)$ by integration:
$$T_f : \mathcal S(\mathbb R) \ni \phi \longmapsto \int f(x)\phi(x)\,dx \in \mathbb C \implies T_f \in \mathcal S'(\mathbb R). \tag{4.251}$$
Indeed, for $f \in L^2(\mathbb R)$ this amounts to showing that $\|\phi\|_{L^2}$ is a continuous norm on $\mathcal S(\mathbb R)$ (so it must be bounded by a multiple of one of the $\|\phi\|_N$; which one?).
It is relatively straightforward to show that $L^2(\mathbb R) \ni f \longmapsto T_f \in \mathcal S'(\mathbb R)$ is injective: nothing is ‘lost’. So after a little more experience with distributions one comes to identify $f$ and $T_f$. Notice that this is just an extension of the behaviour of $L^2(\mathbb R)$ where (because we can drop the complex conjugate in the inner product) by Riesz’ Theorem we can identify (linearly) $L^2(\mathbb R)$ with its dual, exactly by the map $f \longmapsto T_f$.
Other elements of $\mathcal S'(\mathbb R)$ include the delta ‘function’ at the origin and even its ‘derivatives’, for each $j$,
$$\delta^{(j)} : \mathcal S(\mathbb R) \ni \phi \longmapsto (-1)^j\frac{d^j\phi}{dx^j}(0) \in \mathbb C. \tag{4.252}$$
In fact one of the main points about the space $\mathcal S'(\mathbb R)$ is that differentiation and multiplication by polynomials are well defined,
$$\frac{d}{dx} : \mathcal S'(\mathbb R) \longrightarrow \mathcal S'(\mathbb R), \quad \times x : \mathcal S'(\mathbb R) \longrightarrow \mathcal S'(\mathbb R), \tag{4.253}$$
in a way that is consistent with their actions under the identification $\mathcal S(\mathbb R) \ni \phi \longmapsto T_\phi \in \mathcal S'(\mathbb R)$. This property is enjoyed by other spaces of distributions, but the fundamental fact that the Fourier transform extends to
$$\mathcal F : \mathcal S'(\mathbb R) \longrightarrow \mathcal S'(\mathbb R) \text{ as an isomorphism} \tag{4.254}$$
is more characteristic of $\mathcal S'(\mathbb R)$.

13. Poisson summation formula


We have talked both about Fourier series and the Fourier transform. It is natural to ask: what is the connection between these? The Fourier series of a function in $L^2(0, 2\pi)$ we thought of as given by the Fourier-Bessel series with respect to the orthonormal basis
$$\frac{\exp(ikx)}{\sqrt{2\pi}}, \quad k \in \mathbb Z. \tag{4.255}$$
The interval here is just a particular choice; if the upper limit is changed to $T$ then the corresponding orthonormal basis of $L^2(0, T)$ is
$$\frac{\exp(2\pi ikx/T)}{\sqrt T}, \quad k \in \mathbb Z. \tag{4.256}$$
Sometimes the Fourier transform is thought of as the limit of the Fourier series expansion when $T \to \infty$. This is actually not such a nice limit, so unless you have to (or want to) do this, I recommend against it!
A more fundamental relationship between the two comes about as follows. We can think of $L^2(0, 2\pi)$ as ‘really’ being the $2\pi$-periodic functions restricted to this interval. Since the values at the end-points don't matter, this does give a bijection between $2\pi$-periodic, locally square-integrable functions on the line and $L^2(0, 2\pi)$. On the other hand we can also think of the periodic functions as being defined on the circle, $|z| = 1$ in $\mathbb C$, or identified with the values of $\theta \in \mathbb R$ modulo repeats:
$$\mathbb T = \mathbb R/2\pi\mathbb Z \ni \theta \longmapsto e^{i\theta} \in \mathbb C. \tag{4.257}$$
Let us denote by $C^\infty(\mathbb T)$ the space of infinitely differentiable, $2\pi$-periodic functions on the line; this is also the space of smooth functions on the circle, thought of as a manifold.
How can one construct such functions? There are plenty of examples, for instance the $\exp(ikx)$. Another way to construct examples is to sum over translations:
Lemma 4.16. The map
$$A : \mathcal S(\mathbb R) \ni f \longmapsto \sum_{k\in\mathbb Z}f(\cdot - 2\pi k) \in C^\infty(\mathbb T) \tag{4.258}$$
is surjective.
Proof. That the series in (4.258) converges uniformly on $[0, 2\pi]$ (or any bounded interval) is easy enough to see, since the rapid decay of elements of $\mathcal S(\mathbb R)$ shows that
$$|f(x)| \le C(1 + |x|)^{-2},\ x \in \mathbb R \implies |f(x - 2\pi k)| \le C'(1 + |k|)^{-2},\ x \in [0, 2\pi], \tag{4.259}$$
since $|x - 2\pi k| \ge |k|$ if $|k| \ge 2$ and $x \in [0, 2\pi]$. Clearly (4.259) implies uniform convergence of the series. Since the derivatives of $f$ are also in $\mathcal S(\mathbb R)$, the series obtained by term-by-term differentiation also converges uniformly, and by standard arguments the limit $Af$ is therefore infinitely differentiable, with
$$\frac{d^jAf}{dx^j} = A\frac{d^jf}{dx^j}. \tag{4.260}$$
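This shows that the map $A$, clearly linear, is well-defined. Now, how to see that it is surjective? Let's first prove a special case. Indeed, look for a function $\psi \in C_c^\infty(\mathbb R) \subset \mathcal S(\mathbb R)$ which is non-negative and such that $A\psi = 1$. We know that we can find $\phi \in C_c^\infty(\mathbb R)$, $\phi \ge 0$, with $\phi > 0$ on $[0, 2\pi]$. Then consider $A\phi \in C^\infty(\mathbb T)$. It must be strictly positive, $A\phi \ge \epsilon > 0$: it is a sum of non-negative terms, one of which is $\phi$, so $A\phi \ge \phi > 0$ on $[0, 2\pi]$, and $A\phi$ is periodic. So consider instead the function
$$\psi = \frac{\phi}{A\phi} \in C_c^\infty(\mathbb R), \tag{4.261}$$
where we think of $A\phi$ as $2\pi$-periodic on $\mathbb R$. In fact, using this periodicity we see that
$$A\psi \equiv 1. \tag{4.262}$$
So this shows that the constant function 1 is in the range of $A$. In general, just take $g \in C^\infty(\mathbb T)$, thought of as $2\pi$-periodic on the line; it follows that
$$f = Bg = \psi g \in C_c^\infty(\mathbb R) \subset \mathcal S(\mathbb R) \text{ satisfies } Af = g. \tag{4.263}$$
Indeed,
$$A(\psi g)(x) = \sum_k\psi(x - 2\pi k)g(x - 2\pi k) = g(x)\sum_k\psi(x - 2\pi k) = g(x), \tag{4.264}$$
using the periodicity of $g$. In fact $B$ is a right inverse for $A$,
$$AB = \mathrm{Id} \text{ on } C^\infty(\mathbb T). \tag{4.265}$$
□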
Question 2. What is the null space of A?
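Since $f \in \mathcal S(\mathbb R)$ and $Af \in C^\infty(\mathbb T) \subset L^2(0, 2\pi)$ with our identifications above, the question arises as to the relationship between the Fourier transform of $f$ and the Fourier series of $Af$.
Proposition 4.10 (Poisson summation formula). If $g = Af$, $g \in C^\infty(\mathbb T)$ and $f \in \mathcal S(\mathbb R)$, then the Fourier coefficients of $g$ are
$$c_k = \int_{[0,2\pi]}g\,e^{-ikx}\,dx = \hat f(k). \tag{4.266}$$
Proof. Just substitute in the formula for $g$ and, using uniform convergence, check that the sum of the integrals gives, after translation, the Fourier transform of $f$:
$$c_k = \sum_{j\in\mathbb Z}\int_0^{2\pi}f(x - 2\pi j)e^{-ikx}\,dx = \sum_{j\in\mathbb Z}\int_{-2\pi j}^{2\pi - 2\pi j}f(y)e^{-iky}\,dy = \int_{\mathbb R}f(y)e^{-iky}\,dy = \hat f(k),$$
using $e^{-ik(y + 2\pi j)} = e^{-iky}$. □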
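If we think of recovering $g$ from its Fourier series,
$$g(x) = \frac{1}{2\pi}\sum_{k\in\mathbb Z}c_ke^{ikx} = \frac{1}{2\pi}\sum_{k\in\mathbb Z}\hat f(k)e^{ikx}, \tag{4.267}$$
then in terms of the Fourier transform on $\mathcal S'(\mathbb R)$ alluded to above, this takes the rather elegant form
$$\frac{1}{2\pi}\mathcal F\Big(\sum_{k\in\mathbb Z}\delta(\cdot - k)\Big)(x) = \frac{1}{2\pi}\sum_{k\in\mathbb Z}e^{ikx} = \sum_{k\in\mathbb Z}\delta(x - 2\pi k). \tag{4.268}$$
The sums of translated Dirac deltas and oscillating exponentials all make sense in $\mathcal S'(\mathbb R)$.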
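A numerical check of Proposition 4.10 and (4.267) is easy for a Gaussian. The Python sketch below assumes $f(x) = \exp(-x^2/2)$, whose transform in the convention of (4.189) is $\hat f(\xi) = \sqrt{2\pi}\exp(-\xi^2/2)$. Both sums over $k$ are truncated at $|k| \le 20$, ample given the Gaussian decay (an assumption). The Fourier coefficients (4.266) are approximated by an equally spaced Riemann sum, which is extremely accurate for smooth periodic functions. The helper names are illustrative.

```python
import cmath
import math

K = 20  # truncation of the sums over k in Z


def f(x):
    return math.exp(-x * x / 2.0)


def f_hat(xi):
    """Fourier transform of f in the convention hat f(xi) = int exp(-i x xi) f(x) dx."""
    return math.sqrt(2.0 * math.pi) * math.exp(-xi * xi / 2.0)


def periodized(x):
    """g = A f from (4.258): the 2 pi-periodic sum of translates of f."""
    return sum(f(x - 2.0 * math.pi * k) for k in range(-K, K + 1))


def fourier_side(x):
    """Right side of (4.267): (1 / 2 pi) sum_k hat f(k) exp(i k x)."""
    total = sum(f_hat(k) * cmath.exp(1j * k * x) for k in range(-K, K + 1))
    return total / (2.0 * math.pi)


def coefficient(k, m=64):
    """c_k = int_0^{2 pi} g(x) exp(-i k x) dx, by an m-point periodic Riemann sum."""
    h = 2.0 * math.pi / m
    return sum(periodized(j * h) * cmath.exp(-1j * k * j * h) for j in range(m)) * h


for x in (0.0, 1.0, 2.5):
    print(x, periodized(x), fourier_side(x).real)
for k in range(4):
    print(k, coefficient(k).real, f_hat(k))
```

At $x = 0$ this is the classical identity $\sum_k e^{-2\pi^2k^2} = (2\pi)^{-1/2}\sum_k e^{-k^2/2}$, where the left side is essentially 1 and the right side needs several terms to get there.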
APPENDIX A

Problems for Chapter 1

1. For §1
Problem 1.1. In case you are a bit shaky on it, go through the basic theory of finite-dimensional vector spaces. Define a vector space $V$ to be finite-dimensional if there is an integer $N$ such that any $N$ elements of $V$ are linearly dependent: if $v_i \in V$ for $i = 1, \dots, N$, then there exist $a_i \in \mathbb K$, not all zero, such that
$$\sum_{i=1}^Na_iv_i = 0 \text{ in } V. \tag{A.1}$$
If $N$ is the smallest such integer, define the dimension of $V$ to be $\dim V = N - 1$ and show that a finite-dimensional vector space always has a basis, $e_i \in V$, $i = 1, \dots, \dim V$, such that any element of $V$ can be written uniquely as a linear combination
$$v = \sum_{i=1}^{\dim V}b_ie_i, \quad b_i \in \mathbb K. \tag{A.2}$$

Problem 1.2. Show from first principles that if V is a vector space (over R or
C) then for any set X the space of all maps
(A.3) F(X; V ) = {u : X −→ V }
is a vector space over the same field, with ‘pointwise operations’ (which you should
write down carefully).
Problem 1.3. Show that if V is a vector space and S ⊂ V is a subset which is closed under addition and scalar multiplication:
$$v_1, v_2 \in S,\ \lambda \in K \Longrightarrow v_1 + v_2 \in S \text{ and } \lambda v_1 \in S \tag{A.4}$$
then S is a vector space as well with operations ‘inherited from V’ (and called, of course, a subspace of V).
Problem 1.4. Recall that a map between vector spaces L : V −→ W is linear if L(v_1 + v_2) = Lv_1 + Lv_2 and L(λv) = λLv for all elements v_1, v_2, v ∈ V and all scalars λ. Show that given two finite-dimensional vector spaces V and W over the same field
(1) If dim V ≤ dim W then there is an injective linear map L : V −→ W.
(2) If dim V ≥ dim W then there is a surjective linear map L : V −→ W.
(3) If dim V = dim W then there is a linear isomorphism L : V −→ W, i.e. an injective and surjective linear map.
Problem 1.5. If S ⊂ V is a linear subspace of a vector space, show that the relation on V
$$v_1 \sim v_2 \Longleftrightarrow v_1 - v_2 \in S \tag{A.5}$$
is an equivalence relation and that the set of equivalence classes
$$[v] = \{w \in V;\ w - v \in S\},$$
usually denoted V/S, is a linear space in a natural way and that the projection map π : V −→ V/S, taking each v to [v], is linear.
Problem 1.6. Show that any two norms on a finite dimensional vector space
are equivalent.
Problem 1.7. Show that two norms on a vector space are equivalent if and
only if the topologies induced are the same – the sets open with respect to the
distance from one are open with respect to the distance coming from the other.
Problem 1.8. Write out a proof for each p with 1 ≤ p < ∞ that
$$l^p = \Big\{a : \mathbb N \longrightarrow \mathbb C;\ \sum_{j=1}^{\infty}|a_j|^p < \infty,\ a_j = a(j)\Big\}$$
is a normed space with the norm
$$\|a\|_p = \Big(\sum_{j=1}^{\infty}|a_j|^p\Big)^{1/p}.$$
This means writing out the proof that this is a linear space and that the three conditions required of a norm hold.
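Before writing the proof it can help to watch the triangle inequality (Minkowski's inequality) hold numerically, and fail for p < 1. A minimal sketch: finite lists stand in for finitely supported elements of l^p, and the helper names are ours.

```python
import random

def lp_norm(a, p):
    # (sum_j |a_j|^p)^(1/p) for a finitely supported sequence a
    return sum(abs(x) ** p for x in a) ** (1.0 / p)

def check_minkowski(p, trials=1000, length=10, seed=0):
    # Tests ||a + b||_p <= ||a||_p + ||b||_p on random complex sequences
    rng = random.Random(seed)
    for _ in range(trials):
        a = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(length)]
        b = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(length)]
        s = [x + y for x, y in zip(a, b)]
        if lp_norm(s, p) > lp_norm(a, p) + lp_norm(b, p) + 1e-12:  # rounding slack
            return False
    return True

print([check_minkowski(p) for p in (1, 1.5, 2, 3, 7)])
# For p = 1/2 the 'norm' of (1, 1) is 4, larger than the sum 1 + 1 of the parts:
print(lp_norm([1, 1], 0.5), lp_norm([1, 0], 0.5) + lp_norm([0, 1], 0.5))
```

The second line shows why the restriction p ≥ 1 is needed.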
Problem 1.9. Prove directly that each l^p as defined in Problem 1.8 is complete, i.e. it is a Banach space.
Problem 1.10. The space l^∞ consists of the bounded sequences
$$l^\infty = \{a : \mathbb N \longrightarrow \mathbb C;\ \sup_n|a_n| < \infty\},\quad \|a\|_\infty = \sup_n|a_n|. \tag{A.6}$$
Show that this is a non-separable Banach space.
Problem 1.11. Another closely related space consists of the sequences converging to 0:
$$c_0 = \{a : \mathbb N \longrightarrow \mathbb C;\ \lim_{n\to\infty}a_n = 0\},\quad \|a\|_\infty = \sup_n|a_n|. \tag{A.7}$$
Check that this is a separable Banach space and that it is a closed subspace of l^∞ (perhaps do it in the opposite order).
Problem 1.12. Consider the ‘unit sphere’ in l^p. This is the set of vectors of length 1:
$$S = \{a \in l^p;\ \|a\|_p = 1\}.$$
(1) Show that S is closed.
(2) Recall the sequential (so not the open covering) characterization of compactness of a set in a metric space (e.g. by checking in Rudin’s book).
(3) Show that S is not compact by considering the sequence in l^p with kth element the sequence which is all zeros except for a 1 in the kth slot. Note that the main problem is not to get yourself confused about sequences of sequences!
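The key computation in (3) is that the elements of this sequence all lie in S and are pairwise at the same distance 2^{1/p}, so no subsequence can be Cauchy. A small sketch; each element is truncated to a finite list long enough to hold the relevant slots, and the helper names are ours.

```python
def unit_vector(k, length):
    # The k-th element of the sequence in (3) (k is 1-based), truncated to
    # `length` entries; all later entries are zero anyway.
    return [1.0 if j == k else 0.0 for j in range(1, length + 1)]

def lp_distance(a, b, p):
    # ||a - b||_p for two finitely supported sequences of the same length
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

p, n = 3, 6
dists = {lp_distance(unit_vector(j, n), unit_vector(k, n), p)
         for j in range(1, n + 1) for k in range(1, n + 1) if j != k}
print(dists, 2 ** (1 / p))
```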
Problem 1.13. Show that the norm on any normed space is continuous.

Problem 1.14. Finish the proof of the completeness of the space B constructed
in the second proof of Theorem 1.1.
APPENDIX B

Problems for Chapter 4

1. Hill’s equation
As an extended exercise I suggest you follow the ideas of §4.4 but now for ‘Hill’s equation’, which is the same problem as (4.73) but with periodic boundary conditions:
$$-\frac{d^2u}{dx^2} + Vu = f \text{ on } (0,2\pi),\quad u(2\pi) = u(0),\quad \frac{du}{dx}(2\pi) = \frac{du}{dx}(0). \tag{B.1}$$
There are several ways to do this, but you cannot proceed in precisely the same way since for V = 0 the constants are solutions of (B.1) with f = 0 – so even if the system has a solution (which for some f it does not) this solution is not unique.
One way to proceed is to start from V = 1, say, and solve the problem explicitly. However the formulæ are not as simple as for the Dirichlet case.
So instead I will outline an approach starting from the solution of the Dirichlet problem. This allows you to see some important concepts – for instance the Maximum Principle. You should proceed to prove this sequence of claims!
(1) If V ≥ 0 (always real-valued in C([0, 2π])) then we know that the Dirichlet problem, (4.73), has a unique solution given in (4.111):
$$u = S_Vf,\quad S_V = A(\mathrm{Id} + AVA)^{-1}A. \tag{B.2}$$
Recall that the eigenfunctions of this operator are twice continuously differentiable eigenfunctions for the Dirichlet problem with eigenvalues T_k = λ_k^{−1}, where the λ_k are the eigenvalues of S_V.
(2) Prove the maximum principle in this case: if V > 0 and f ≥ 0 then u = S_V f ≥ 0. Hint:- If this were not true then there would be an interior minimum at which u(p) < 0, but at this point −(d²u/dx²)(p) ≤ 0 and V(p)u(p) < 0, which contradicts (4.73) since f(p) ≥ 0.
(3) Now, suppose u is a ‘classical’ (twice continuously differentiable) solution to (B.1) (with V > 0). Then set u_0 = u(0) = u(2π) and ũ = u − u_0 and observe that
$$-\frac{d^2\tilde u}{dx^2} + V\tilde u = f - u_0V \Longrightarrow \tilde u = S_Vf - u_0S_VV. \tag{B.3}$$
(4) Using the assumption that V > 0 show that
$$\frac{d}{dx}S_VV(0) > 0,\quad \frac{d}{dx}S_VV(2\pi) < 0. \tag{B.4}$$
Hint:- From the equation for S_V V observe that (d²/dx²)S_V V(0) < 0, so if (d/dx)S_V V(0) ≤ 0 then S_V V(x) < 0 for small x > 0, violating the Maximum Principle; similarly at 2π.

(5) Conclude from (B.3) that for V > 0 there is a unique solution to (B.1), which is of the form
$$\begin{aligned} u = T_Vf &= S_Vf + u_0 - u_0S_VV,\\ a\,u_0 &= \frac{d}{dx}S_Vf(0) - \frac{d}{dx}S_Vf(2\pi),\\ a &= \frac{d}{dx}S_VV(0) - \frac{d}{dx}S_VV(2\pi) > 0. \end{aligned} \tag{B.5}$$
(6) Show that T_V is an injective compact self-adjoint operator and that its eigenfunctions are twice continuously differentiable eigenfunctions for the periodic boundary problem. Hint:- Boundedness follows from the properties of S_V, as does compactness with a bit more effort. For self-adjointness integrate the equation by parts.
(7) Conclude the analogue of Theorem 4.5 for periodic boundary conditions,
i.e. Hill’s equation.
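Claims (2) and (4) can be previewed on a finite-difference model of the Dirichlet problem −w″ + Vw = g, w(0) = w(2π) = 0. The discretization, the grid size and the particular V and g below are our own choices and are not part of the notes. With V > 0 the tridiagonal matrix has positive diagonal, non-positive off-diagonal entries and is diagonally dominant (an M-matrix), which is a discrete form of the maximum principle; the sketch only illustrates the claims, it does not prove them.

```python
import math

def solve_dirichlet(V, g, n=400):
    # Finite-difference approximation of w = S_V g, i.e. -w'' + V w = g on (0, 2pi)
    # with w(0) = w(2pi) = 0, solved by the Thomas (tridiagonal) algorithm.
    h = 2 * math.pi / n
    xs = [i * h for i in range(n + 1)]
    diag = [2 / h ** 2 + V(xs[i]) for i in range(1, n)]  # interior unknowns
    off = -1 / h ** 2                                    # sub/super-diagonal entry
    rhs = [g(xs[i]) for i in range(1, n)]
    m = n - 1
    c, d = [0.0] * m, [0.0] * m
    c[0], d[0] = off / diag[0], rhs[0] / diag[0]
    for i in range(1, m):                                # forward elimination
        denom = diag[i] - off * c[i - 1]
        c[i] = off / denom
        d[i] = (rhs[i] - off * d[i - 1]) / denom
    w = [0.0] * m
    w[-1] = d[-1]
    for i in range(m - 2, -1, -1):                       # back substitution
        w[i] = d[i] - c[i] * w[i + 1]
    return xs, [0.0] + w + [0.0], h

def V(x):
    return 1 + 0.5 * math.cos(x)      # a hypothetical potential with V > 0

def f(x):
    return max(0.0, math.sin(3 * x))  # a hypothetical right-hand side with f >= 0

xs, u, h = solve_dirichlet(V, f)
print(min(u) >= 0)                    # discrete version of claim (2)

xs, w, h = solve_dirichlet(V, V)      # w approximates S_V V
slope0 = (w[1] - w[0]) / h            # one-sided slope at 0
slope2pi = (w[-1] - w[-2]) / h        # one-sided slope at 2pi
print(slope0 > 0, slope2pi < 0)       # discrete version of (B.4)
```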

2. Mehler’s formula and completeness


Starting from the ground state for the harmonic oscillator
$$P = -\frac{d^2}{dx^2} + x^2,\quad Pu_0 = u_0,\quad u_0 = e^{-x^2/2} \tag{B.6}$$
and using the creation and annihilation operators
$$\mathrm{An} = \frac{d}{dx} + x,\quad \mathrm{Cr} = -\frac{d}{dx} + x,\quad \mathrm{An}\,\mathrm{Cr} - \mathrm{Cr}\,\mathrm{An} = 2\,\mathrm{Id},\quad P = \mathrm{Cr}\,\mathrm{An} + \mathrm{Id} \tag{B.7}$$
we have constructed the higher eigenfunctions:
$$u_j = \mathrm{Cr}^ju_0 = p_j(x)u_0(x),\quad p_j(x) = 2^jx^j + \text{l.o.t.s},\quad Pu_j = (2j+1)u_j \tag{B.8}$$
and shown that these are orthogonal, u_j ⊥ u_k for j ≠ k, and so when normalized give an orthonormal system in L²(R):
$$e_j = \frac{u_j}{2^{j/2}(j!)^{1/2}\pi^{1/4}}. \tag{B.9}$$
Now, what we want to see is that these e_j form an orthonormal basis of L²(R),
meaning they are complete as an orthonormal sequence. There are various proofs
of this, but the only ‘simple’ ones I know involve the Fourier inversion formula and
I want to use the completeness to prove the Fourier inversion formula, so that will
not do. Instead I want to use a version of Mehler’s formula.
To show the completeness of the e_j’s it is enough to find a compact self-adjoint operator with these as eigenfunctions and no null space. It is the last part which is tricky. The first part is easy. Remembering that all the e_j are real, we can find an operator with the e_j’s as eigenfunctions with corresponding eigenvalues λ_j > 0 (say) by just defining
$$Au(x) = \sum_{j=0}^{\infty}\lambda_j(u,e_j)e_j(x) = \sum_{j=0}^{\infty}\lambda_je_j(x)\int e_j(y)u(y). \tag{B.10}$$

For this to be a compact operator we need λ_j → 0 as j → ∞, although for boundedness we just need the λ_j to be bounded. So, the problem with this is to show that A has no null space – which of course is just the completeness of the e_j since (assuming all the λ_j are positive)
$$Au = 0 \Longleftrightarrow u \perp e_j\ \forall\, j. \tag{B.11}$$
Nevertheless, this is essentially what we will do. The idea is to write A as an integral operator and then work with that. I will take λ_j = w^j where w ∈ (0, 1). The point is that we can find an explicit formula for
$$A_w(x,y) = \sum_{j=0}^{\infty}w^je_j(x)e_j(y) = A(w,x,y). \tag{B.12}$$
To find A(w, x, y) we will need to compute the Fourier transforms of the ej .


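Recall that
$$\mathcal F : L^1(\mathbb R) \longrightarrow C_\infty(\mathbb R),\quad \mathcal F(u) = \hat u,\quad \hat u(\xi) = \int e^{-ix\xi}u(x),\quad \sup|\hat u| \le \|u\|_{L^1}. \tag{B.13}$$
Lemma B.1. The Fourier transform of u_0 is
$$(\mathcal Fu_0)(\xi) = \sqrt{2\pi}\,u_0(\xi). \tag{B.14}$$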
Proof. Since u_0 is both continuous and Lebesgue integrable, the Fourier transform is the limit of a Riemann integral (since u_0 is even, the sign in the exponent does not matter)
$$\hat u_0(\xi) = \lim_{R\to\infty}\int_{-R}^{R}e^{i\xi x}u_0(x). \tag{B.15}$$

Now, for the Riemann integral we can differentiate under the integral sign with respect to the parameter ξ – since the integrand is continuously differentiable – and see that
$$\begin{aligned} \frac{d}{d\xi}\hat u_0(\xi) &= \lim_{R\to\infty}\int_{-R}^{R}ix\,e^{i\xi x}u_0(x)\\ &= \lim_{R\to\infty}i\int_{-R}^{R}e^{i\xi x}\Big(-\frac{d}{dx}u_0(x)\Big)\\ &= \lim_{R\to\infty}\Big(-i\int_{-R}^{R}\frac{d}{dx}\big(e^{i\xi x}u_0(x)\big)\Big) - \xi\lim_{R\to\infty}\int_{-R}^{R}e^{i\xi x}u_0(x)\\ &= -\xi\,\hat u_0(\xi). \end{aligned} \tag{B.16}$$
Here I have used the fact that An u_0 = 0 and the fact that the boundary terms in the integration by parts tend to zero rapidly with R. So this means that û_0 is annihilated by An:
$$\Big(\frac{d}{d\xi} + \xi\Big)\hat u_0(\xi) = 0. \tag{B.17}$$
Thus, it follows that û_0(ξ) = c exp(−ξ²/2), since these are the only functions annihilated by An. The constant is easy to compute, since
$$\hat u_0(0) = \int e^{-x^2/2}\,dx = \sqrt{2\pi}, \tag{B.18}$$
proving (B.14). □

We can use this formula, or if you prefer the argument to prove it, to show that
$$v = e^{-x^2/4} \Longrightarrow \hat v = 2\sqrt{\pi}\,e^{-\xi^2}. \tag{B.19}$$
Changing the names of the variables this just says
$$e^{-x^2} = \frac{1}{2\sqrt{\pi}}\int_{\mathbb R}e^{ixs - s^2/4}\,ds. \tag{B.20}$$
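Both (B.14) and (B.19) are special cases of the standard Gaussian integral ∫ e^{−ixξ}e^{−αx²}dx = √(π/α) e^{−ξ²/(4α)}, α > 0: α = 1/2 gives √(2π)e^{−ξ²/2} and α = 1/4 gives 2√π e^{−ξ²}. The short sketch below checks this numerically with a plain Riemann sum; the step size h, the cut-off L and the function names are our own choices.

```python
import math

def gaussian_ft_numeric(alpha, xi, h=0.01, L=40.0):
    # Riemann sum for the integral of cos(x xi) exp(-alpha x^2) over [-L, L];
    # the sine part integrates to zero because exp(-alpha x^2) is even.
    n = int(L / h)
    return h * sum(math.cos(k * h * xi) * math.exp(-alpha * (k * h) ** 2)
                   for k in range(-n, n + 1))

def gaussian_ft_exact(alpha, xi):
    return math.sqrt(math.pi / alpha) * math.exp(-xi * xi / (4 * alpha))

for xi in (0.0, 0.7, 2.0):
    # alpha = 1/2: u_0 = exp(-x^2/2), transform sqrt(2 pi) u_0(xi), as in (B.14)
    # alpha = 1/4: v = exp(-x^2/4), transform 2 sqrt(pi) exp(-xi^2), as in (B.19)
    print(xi, gaussian_ft_numeric(0.5, xi), gaussian_ft_exact(0.5, xi),
          gaussian_ft_numeric(0.25, xi), gaussian_ft_exact(0.25, xi))
```

For smooth, rapidly decaying integrands like these the Riemann sum is accurate essentially to rounding error.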
The definition of the u_j’s can be rewritten
$$u_j(x) = \Big(-\frac{d}{dx} + x\Big)^je^{-x^2/2} = e^{x^2/2}\Big(-\frac{d}{dx}\Big)^je^{-x^2} \tag{B.21}$$
as is easy to see inductively – the point being that e^{x²/2} is an integrating factor for the creation operator. Plugging this into (B.20) and carrying out the derivatives – which is legitimate since the integral is so strongly convergent – gives
$$u_j(x) = \frac{e^{x^2/2}}{2\sqrt{\pi}}\int_{\mathbb R}(-is)^je^{ixs - s^2/4}\,ds. \tag{B.22}$$
Now we can use this formula twice on the sum on the left in (B.12) and insert the normalizations in (B.9) to find that
$$\sum_{j=0}^{\infty}w^je_j(x)e_j(y) = \sum_{j=0}^{\infty}\frac{e^{x^2/2 + y^2/2}}{4\pi^{3/2}}\int_{\mathbb R^2}\frac{(-1)^jw^js^jt^j}{2^jj!}e^{isx + ity - s^2/4 - t^2/4}\,ds\,dt. \tag{B.23}$$
The crucial thing here is that we can sum the series to get an exponential; this allows us to finally conclude:
Lemma B.2. The identity (B.12) holds with
$$A(w,x,y) = \frac{1}{\sqrt{\pi}\sqrt{1-w^2}}\exp\Big(-\frac{1-w}{4(1+w)}(x+y)^2 - \frac{1+w}{4(1-w)}(x-y)^2\Big). \tag{B.24}$$
Proof. Summing the series in (B.23) we find that
$$A(w,x,y) = \frac{e^{x^2/2 + y^2/2}}{4\pi^{3/2}}\int_{\mathbb R^2}\exp\Big(-\frac12wst + isx + ity - \frac14s^2 - \frac14t^2\Big)\,ds\,dt. \tag{B.25}$$
Now, we can use the same formula as before for the Fourier transform of u_0 to evaluate these integrals explicitly. One way to do this is to make a change of variables by setting
$$\begin{gathered} s = (S+T)/\sqrt2,\quad t = (S-T)/\sqrt2 \Longrightarrow ds\,dt = dS\,dT,\\ -\frac12wst + isx + ity - \frac14s^2 - \frac14t^2 = iS\frac{x+y}{\sqrt2} - \frac14(1+w)S^2 + iT\frac{x-y}{\sqrt2} - \frac14(1-w)T^2. \end{gathered} \tag{B.26}$$
Note that the integrals in (B.25) are ‘improper’ (but rapidly convergent) Riemann integrals, so there is no problem with the change of variable formula. The formula for the Fourier transform of exp(−x²) can be used to conclude that
$$\begin{aligned} \int_{\mathbb R}\exp\Big(iS\frac{x+y}{\sqrt2} - \frac14(1+w)S^2\Big)\,dS &= \frac{2\sqrt\pi}{\sqrt{1+w}}\exp\Big(-\frac{(x+y)^2}{2(1+w)}\Big),\\ \int_{\mathbb R}\exp\Big(iT\frac{x-y}{\sqrt2} - \frac14(1-w)T^2\Big)\,dT &= \frac{2\sqrt\pi}{\sqrt{1-w}}\exp\Big(-\frac{(x-y)^2}{2(1-w)}\Big). \end{aligned} \tag{B.27}$$
Inserting these formulæ back into (B.25) gives
$$A(w,x,y) = \frac{1}{\sqrt\pi\sqrt{1-w^2}}\exp\Big(-\frac{(x+y)^2}{2(1+w)} - \frac{(x-y)^2}{2(1-w)} + \frac{x^2}{2} + \frac{y^2}{2}\Big) \tag{B.28}$$
which after a little adjustment gives (B.24). □
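Lemma B.2 is easy to test numerically. The e_j in (B.9) are the normalized Hermite functions: the p_j in (B.8) satisfy p_{j+1} = 2xp_j − p_j′, which is the recurrence of the physicists' Hermite polynomials, and the e_j then satisfy the three-term recurrence e_{j+1} = √(2/(j+1)) x e_j − √(j/(j+1)) e_{j−1}. The sketch truncates the series (B.12) and compares it with the closed form (B.24); the truncation order and the sample points are our choices.

```python
import math

def hermite_functions(x, n):
    # e_0, ..., e_{n-1} at x via the three-term recurrence for the normalized
    # Hermite functions, starting from e_0 = pi^{-1/4} exp(-x^2/2).
    e = [math.pi ** -0.25 * math.exp(-x * x / 2)]
    if n > 1:
        e.append(math.sqrt(2) * x * e[0])
    for j in range(1, n - 1):
        e.append(math.sqrt(2 / (j + 1)) * x * e[j] - math.sqrt(j / (j + 1)) * e[j - 1])
    return e

def mehler_series(w, x, y, n=80):
    # Truncation of (B.12): sum over j < n of w^j e_j(x) e_j(y)
    ex, ey = hermite_functions(x, n), hermite_functions(y, n)
    return sum(w ** j * a * b for j, (a, b) in enumerate(zip(ex, ey)))

def mehler_closed(w, x, y):
    # The closed form (B.24)
    pref = 1 / (math.sqrt(math.pi) * math.sqrt(1 - w * w))
    return pref * math.exp(-(1 - w) / (4 * (1 + w)) * (x + y) ** 2
                           - (1 + w) / (4 * (1 - w)) * (x - y) ** 2)

print(mehler_series(0.5, 0.3, -0.7), mehler_closed(0.5, 0.3, -0.7))
```

Since |e_j| ≤ π^{−1/4}, the tail of the series is bounded by a multiple of w^n, so a few hundred terms suffice even for w = 0.9.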
Now, this explicit representation of A_w as an integral operator allows us to show
Proposition B.1. For all real-valued f ∈ L²(R),
$$\sum_{j=0}^{\infty}|(f,e_j)|^2 = \|f\|^2_{L^2}. \tag{B.29}$$

Proof. By definition of A_w, (f, A_w f) = Σ_j w^j |(f, e_j)|², which increases to the sum below as w ↑ 1, so
$$\sum_{j=0}^{\infty}|(f,e_j)|^2 = \lim_{w\uparrow1}(f,A_wf) \tag{B.30}$$
and (B.29) reduces to
$$\lim_{w\uparrow1}(f,A_wf) = \|f\|^2_{L^2}. \tag{B.31}$$
To prove (B.31) we will make our work on the integral operators rather simpler by assuming first that f ∈ C(R) is continuous and vanishes outside some bounded interval, f(x) = 0 in |x| > R. Then we can write out the L² inner product as a double integral, which is a genuine (iterated) Riemann integral:
$$(f,A_wf) = \iint A(w,x,y)f(x)f(y)\,dy\,dx. \tag{B.32}$$
Here I have used the fact that f and A are real-valued.


Look at the formula for A in (B.24). The first thing to notice is the factor (1 − w²)^{−1/2}, which blows up as w → 1. On the other hand, the argument of the exponential has two terms: the first tends to 0 as w → 1 and the second becomes very large and negative, at least when x − y ≠ 0. Given the signs, we see that
$$\text{if } \epsilon > 0,\ X = \{(x,y);\ |x| \le R,\ |y| \le R,\ |x-y| \ge \epsilon\} \text{ then } \sup_X|A(w,x,y)| \to 0 \text{ as } w \to 1. \tag{B.33}$$
So, the part of the integral in (B.32) over |x − y| ≥ ε tends to zero as w → 1.

So, look at the other part, where |x − y| ≤ ε. By the (uniform) continuity of f, given δ > 0 there exists ε > 0 such that
$$|x-y| \le \epsilon \Longrightarrow |f(x) - f(y)| \le \delta. \tag{B.34}$$
Now we can divide (B.32) up into three pieces:-
$$\begin{aligned} (f,A_wf) &= \int_{S\cap\{|x-y|\ge\epsilon\}}A(w,x,y)f(x)f(y)\,dy\,dx\\ &\quad + \int_{S\cap\{|x-y|\le\epsilon\}}A(w,x,y)\big(f(x)-f(y)\big)f(y)\,dy\,dx\\ &\quad + \int_{S\cap\{|x-y|\le\epsilon\}}A(w,x,y)f(y)^2\,dy\,dx \end{aligned} \tag{B.35}$$
where S = [−R, R]².


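Look now at the third integral in (B.35) since it is the important one. We can change variable of integration from x to $t = \sqrt{\tfrac{1+w}{1-w}}\,(x-y)$. Since |x − y| ≤ ε, the new t variable runs over $|t| \le \epsilon\sqrt{\tfrac{1+w}{1-w}}$ and, since $dx = \sqrt{\tfrac{1-w}{1+w}}\,dt$, the integral becomes
$$\int_{S\cap\{|t|\le\epsilon\sqrt{(1+w)/(1-w)}\}}A\Big(w,\,y+t\sqrt{\tfrac{1-w}{1+w}},\,y\Big)\sqrt{\tfrac{1-w}{1+w}}\,f(y)^2\,dy\,dt,\quad\text{where}$$
$$A\Big(w,\,y+t\sqrt{\tfrac{1-w}{1+w}},\,y\Big)\sqrt{\tfrac{1-w}{1+w}} = \frac{1}{\sqrt\pi\,(1+w)}\exp\Big(-\frac{1-w}{4(1+w)}\Big(2y+t\sqrt{\tfrac{1-w}{1+w}}\Big)^2\Big)\exp\Big(-\frac{t^2}{4}\Big). \tag{B.36}$$
Here y is bounded; the first exponential factor tends to 1 and the t domain extends to (−∞, ∞) as w → 1, so it follows that for any ε > 0 the third term in (B.35) tends to
$$\|f\|^2_{L^2} \text{ as } w \to 1, \text{ since } \int e^{-t^2/4}\,dt = 2\sqrt\pi. \tag{B.37}$$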

Noting that A ≥ 0, the same argument shows that the second term is bounded by a constant multiple of δ. Now, we have already shown that, for each fixed ε > 0, the first term in (B.35) tends to zero as w → 1, so this proves (B.31): given some γ > 0, first choose δ, and hence ε > 0, so small that the second term is less than γ/2 for all w close to 1, and then let w ↑ 1 to see that the lim sup and lim inf as w ↑ 1 must lie in the range [‖f‖² − γ, ‖f‖² + γ]. Since this is true for all γ > 0 the limit exists and (B.29) follows under the assumption that f is continuous and vanishes outside some interval [−R, R].
This actually suffices to prove the completeness of the Hermite basis. In any case, the general case follows by continuity, since such continuous functions vanishing outside compact sets are dense in L²(R) and both sides of (B.29) are continuous in f ∈ L²(R). □

Now, (B.31) certainly implies that the ej form an orthonormal basis, which is
what we wanted to show – but hard work! It is done here in part to remind you
of how we did the Fourier series computation of the same sort and to suggest that
you might like to compare the two arguments.
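The limit (B.31) can be watched in a case where everything is explicit. For the rapidly decaying (though not compactly supported) choice f(x) = e^{−x²}, a Gaussian integral using (B.24) gives (f, A_w f) = 2√π/√(9 − w²), which increases to √(π/2) = ‖f‖²_{L²} as w ↑ 1. This closed form is our own computation, not taken from the notes; the sketch checks it against a direct Riemann-sum evaluation of the double integral (B.32) with the kernel (B.24). Grid and cut-off are our choices.

```python
import math

def kernel(w, x, y):
    # A(w, x, y) from (B.24)
    pref = 1 / (math.sqrt(math.pi) * math.sqrt(1 - w * w))
    return pref * math.exp(-(1 - w) / (4 * (1 + w)) * (x + y) ** 2
                           - (1 + w) / (4 * (1 - w)) * (x - y) ** 2)

def pairing_numeric(w, h=0.05, L=6.0):
    # Riemann sum for the double integral (B.32) with f(x) = exp(-x^2)
    m = int(round(L / h))
    pts = [k * h for k in range(-m, m + 1)]
    fv = [math.exp(-x * x) for x in pts]
    return h * h * sum(kernel(w, x, y) * fx * fy
                       for x, fx in zip(pts, fv) for y, fy in zip(pts, fv))

def pairing_closed(w):
    # (f, A_w f) for f = exp(-x^2), computed by hand from (B.24)
    return 2 * math.sqrt(math.pi) / math.sqrt(9 - w * w)

norm_sq = math.sqrt(math.pi / 2)  # ||exp(-x^2)||^2 in L^2(R)
for w in (0.0, 0.5, 0.9, 0.99):
    print(w, pairing_closed(w), norm_sq)
```

At w = 0 the value 2√π/3 is exactly |(f, e_0)|², the first term of the sum in (B.30).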

3. Friedrichs’ extension
Next I will discuss an abstract Hilbert space set-up which covers the treatment of the Dirichlet problem above and several other applications to differential equations and indeed to other problems. I am attributing this method to Friedrichs and he certainly had a hand in it.
Instead of just one Hilbert space we will consider two at the same time. First is
a ‘background’ space, H, a separable infinite-dimensional Hilbert space which you
can think of as being something like L2 (I) for some interval I. The inner product
on this I will denote (·, ·)H or maybe sometimes leave off the ‘H’ since this is the
basic space. Let me denote a second, separable infinite-dimensional, Hilbert space
as D, which maybe stands for ‘domain’ of some operator. So D comes with its own
inner product (·, ·)D where I will try to remember not to leave off the subscript.

The relationship between these two Hilbert spaces is given by a linear map
$$i : D \longrightarrow H. \tag{B.38}$$
This is denoted ‘i’ because it is supposed to be an ‘inclusion’. In particular I will always require that
$$i \text{ is injective.} \tag{B.39}$$
Since we will not want to have parts of H which are inaccessible, I will also assume that
$$i \text{ has dense range } i(D) \subset H. \tag{B.40}$$
In fact because of these two conditions it is quite safe to identify D with i(D) and think of each element of D as really being an element of H. The subspace ‘i(D) = D’ will not in general be closed in H (it is dense), unlike the subspaces we are used to thinking about; rather, it has its own inner product (·,·)_D. Naturally we will also suppose that i is continuous and, to avoid too many constants showing up, I will suppose that i has norm at most 1 so that
$$\|i(u)\|_H \le \|u\|_D. \tag{B.41}$$
If you are comfortable identifying i(D) with D this just means that the ‘D-norm’ on D is bigger than the H norm restricted to D. A bit later I will assume one more thing about i.
What can we do with this setup? Well, consider an arbitrary element f ∈ H. Then consider the linear map
$$T_f : D \ni u \longmapsto (i(u),f)_H \in \mathbb C, \tag{B.42}$$
where I have put in the identification i but will leave it out from now on, so just write T_f(u) = (u, f)_H. This is in fact a continuous linear functional on D since by Cauchy-Schwarz and then (B.41),
$$|T_f(u)| = |(u,f)_H| \le \|u\|_H\|f\|_H \le \|f\|_H\|u\|_D. \tag{B.43}$$
So, by the Riesz representation theorem – so using the assumed completeness of D (with respect to the D-norm of course) – there exists a unique element v ∈ D such that
$$(u,f)_H = (u,v)_D\quad \forall\, u \in D. \tag{B.44}$$
Thus, v only depends on f and always exists, so this defines a map
$$B : H \longrightarrow D,\quad Bf = v \text{ iff } (f,u)_H = (v,u)_D\ \forall\, u \in D, \tag{B.45}$$
where I have taken complex conjugates of both sides of (B.44).
Lemma B.3. The map B is a continuous linear map H −→ D and restricted to D is self-adjoint:
$$(Bw,u)_D = (w,Bu)_D\quad \forall\, u, w \in D. \tag{B.46}$$
The assumption that D ⊂ H is dense implies that B : H −→ D is injective.
Proof. The linearity follows from the uniqueness and the definition. Thus if f_i ∈ H and c_i ∈ C for i = 1, 2 then
$$\begin{aligned} (c_1f_1 + c_2f_2,u)_H &= c_1(f_1,u)_H + c_2(f_2,u)_H\\ &= c_1(Bf_1,u)_D + c_2(Bf_2,u)_D = (c_1Bf_1 + c_2Bf_2,u)_D\quad \forall\, u \in D \end{aligned} \tag{B.47}$$
shows that B(c_1f_1 + c_2f_2) = c_1Bf_1 + c_2Bf_2. Moreover from the estimate (B.43),
$$|(Bf,u)_D| \le \|f\|_H\|u\|_D \tag{B.48}$$
and setting u = Bf it follows that ‖Bf‖_D ≤ ‖f‖_H, which is the desired continuity.
To see the self-adjointness suppose that u, w ∈ D, and hence of course, since we are erasing i, u, w ∈ H. Then, from the definitions
$$(Bu,w)_D = (u,w)_H = \overline{(w,u)_H} = \overline{(Bw,u)_D} = (u,Bw)_D \tag{B.49}$$
so B is self-adjoint.
Finally observe that Bf = 0 implies that (Bf, u)_D = 0 for all u ∈ D and hence that (f, u)_H = 0, but since D is dense, this implies f = 0 so B is injective. □

To go a little further we will assume that the inclusion i is compact. Explicitly this means
$$u_n \rightharpoonup_D u \Longrightarrow u_n\,(= i(u_n)) \to_H u, \tag{B.50}$$
where the subscript denotes which space the convergence is in. Thus compactness means that a weakly convergent sequence in D is, or is mapped to, a strongly convergent sequence in H.
Lemma B.4. Under the assumptions (B.38), (B.39), (B.40), (B.41) and (B.50) on the inclusion of one Hilbert space into another, the operator B in (B.45) is compact as a self-adjoint operator on D and has only positive eigenvalues.
Proof. Suppose u_n ⇀ u is weakly convergent in D. Then, by assumption, it is strongly convergent in H. But B is continuous as a map from H to D so Bu_n → Bu in D and it follows that B is compact as an operator on D.
So, we know that D has an orthonormal basis of eigenvectors of B. None of the eigenvalues λ_j can be zero since B is injective. Moreover, from the definition, if Bu_j = λ_ju_j then
$$\|u_j\|_H^2 = (u_j,u_j)_H = (Bu_j,u_j)_D = \lambda_j\|u_j\|_D^2 \tag{B.51}$$
showing that λ_j > 0. □
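A finite-dimensional toy model may help to see what B is. Take, purely as an illustrative assumption, H = R² with the dot product and D the same space with (u, v)_D = (Mu, v)_H for a symmetric matrix M with M − Id positive semi-definite, so that (B.41) holds and i is the identity. Then (B.45) forces Bf = M^{−1}f, and Lemmas B.3 and B.4 become statements about M^{−1}. The matrix M and the helper names below are our choices.

```python
import math

# Hypothetical toy model: (u, v)_D = (M u, v)_H with M symmetric and M - Id >= 0.
M = [[2.0, 1.0], [1.0, 3.0]]

def matvec(A, u):
    return [A[0][0] * u[0] + A[0][1] * u[1], A[1][0] * u[0] + A[1][1] * u[1]]

def ip_H(u, v):
    return u[0] * v[0] + u[1] * v[1]

def ip_D(u, v):
    return ip_H(matvec(M, u), v)

DET = M[0][0] * M[1][1] - M[0][1] * M[1][0]
M_INV = [[M[1][1] / DET, -M[0][1] / DET], [-M[1][0] / DET, M[0][0] / DET]]

def B(f):
    # (B.45): Bf is the unique v with (f, u)_H = (v, u)_D for all u, i.e. v = M^{-1} f
    return matvec(M_INV, f)

# Eigenvalues mu of M solve mu^2 - tr(M) mu + det(M) = 0; those of B are 1/mu.
TR = M[0][0] + M[1][1]
mus = [(TR + s * math.sqrt(TR * TR - 4 * DET)) / 2 for s in (1, -1)]
eigvecs = [[M[0][1], mu - M[0][0]] for mu in mus]  # M v = mu v, valid as M[0][1] != 0
lams = [1 / mu for mu in mus]

for lam, v in zip(lams, eigvecs):
    # (B.51): ||v||_H^2 = lambda ||v||_D^2, with 0 < lambda <= 1
    print(lam, ip_H(v, v), lam * ip_D(v, v))
```

In infinite dimensions compactness of i is what replaces the automatic compactness of every operator here.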

Now, in view of this we can define another compact operator on D by
$$Au_j = \lambda_j^{1/2}u_j, \tag{B.52}$$
taking the positive square-roots. So of course A² = B. In fact A : H −→ D is also a bounded operator.
Lemma B.5. If u_j is an orthonormal basis of D of eigenvectors of B then f_j = λ_j^{−1/2}u_j is an orthonormal basis of H and A : D −→ D extends by continuity to an isometric isomorphism A : H −→ D.
Proof. The identity (B.51) extends to pairs of eigenvectors
$$(u_j,u_k)_H = (Bu_j,u_k)_D = \lambda_j\delta_{jk}, \tag{B.53}$$
which shows that the f_j form an orthonormal sequence in H. Their span is dense in D in the D-norm, hence also in the H-norm by (B.41), and D is dense in H, so this orthonormal set is complete. Thus A maps an orthonormal basis of H to an orthonormal basis of D (Af_j = u_j), so it is an isometric isomorphism. □

If you think about this a bit you will see that this is an abstract version of the
treatment of the ‘trivial’ Dirichlet problem above, except that I did not describe
the Hilbert space D concretely in that case.
There are various ways this can be extended. One thing to note is that the failure of injectivity, i.e. the loss of (B.39), is not so crucial. If i is not injective, then its null space is a closed subspace and we can take its orthocomplement in place of D. The result is the same except that the operator B is only defined on this orthocomplement.
An additional thing to observe is that the completeness of D, although used crucially above in the application of Riesz’ Representation theorem, is not really such a big issue either:
Proposition B.2. Suppose that D̃ is a pre-Hilbert space with inner product (·,·)_D and i : D̃ −→ H is a linear map into a Hilbert space. If this map is injective, has dense range and satisfies (B.41) in the sense that
$$\|i(u)\|_H \le \|u\|_D\quad \forall\, u \in \tilde D \tag{B.54}$$
then it extends by continuity to a map of the completion, D, of D̃, satisfying (B.39), (B.40) and (B.41), and if bounded sets in D̃ are mapped by i into precompact sets in H then (B.50) also holds.

Proof. We know that a completion exists, D̃ ⊂ D, with inner product restricting to the given one, and every element of D is then the limit of a Cauchy sequence in D̃. So we denote without ambiguity the inner product on D again as (·,·)_D. Since i is continuous with respect to the norm on D (and on H of course) it extends by continuity to the closure of D̃, namely D, as i(u) = lim_n i(u_n) if u_n is Cauchy in D̃ and hence converges in D; this uses the completeness of H since i(u_n) is Cauchy in H. The value of i(u) does not depend on the choice of approximating sequence, since if v_n → 0, i(v_n) → 0 by continuity. So, it follows that i : D −→ H exists, is linear and continuous and its norm is no larger than before, so (B.41) holds. □

The extended map may not be injective, i.e. it might happen that i(u_n) → 0 even though u_n → u ≠ 0.
The general discussion of the set up of Lemmas B.4 and B.5 can be continued further. Namely, having defined the operators B and A we can define a new positive-definite Hermitian form on H by
$$(u,v)_E = (Au,Av)_H,\quad u, v \in H, \tag{B.55}$$
with the same relationship as between (·,·)_H and (·,·)_D. Now, it follows directly from (B.41) and the isometry A : H −→ D, since ‖u‖_E = ‖Au‖_H ≤ ‖Au‖_D = ‖u‖_H, that
$$\|u\|_E \le \|u\|_H \tag{B.56}$$
so if we let E be the completion of H with respect to this new norm, then i : H −→ E is an injection with dense range and A extends to an isometric isomorphism A : E −→ H. Then if u_j is an orthonormal basis of H of eigenfunctions of A with eigenvalues τ_j > 0 it follows that u_j ∈ D and that the τ_ju_j form an orthonormal basis for D while the τ_j^{−1}u_j form an orthonormal basis for E.
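The relations between the three norms can be checked in a toy model. As an illustrative assumption, take H = R³ and (u, v)_D = (Mu, v)_H with M = diag(μ_j), μ_j ≥ 1. Then B = M^{−1}, A = M^{−1/2} and ‖u‖_E = ‖Au‖_H. The sketch checks that ‖u‖_E ≤ ‖u‖_H ≤ ‖u‖_D, that A : H → D and B : E → D preserve norms, and that for the eigenvectors of A (the standard basis vectors, with τ_j = μ_j^{−1/2}) the vectors τ_ju_j have D-norm 1 and τ_j^{−1}u_j have E-norm 1. The numbers are our choices.

```python
import math

# Hypothetical toy model: H = R^3, (u, v)_D = (M u, v)_H with M = diag(mu_j), mu_j >= 1.
mu = [1.5, 4.0, 9.0]

def norm_H(u):
    return math.sqrt(sum(x * x for x in u))

def norm_D(u):
    # ||u||_D^2 = (M u, u)_H
    return math.sqrt(sum(m * x * x for m, x in zip(mu, u)))

def A(u):
    # A = B^{1/2} = M^{-1/2}
    return [x / math.sqrt(m) for m, x in zip(mu, u)]

def norm_E(u):
    # (B.55): ||u||_E = ||A u||_H
    return norm_H(A(u))

u = [0.3, -2.0, 1.1]
print(norm_E(u) <= norm_H(u) <= norm_D(u))

for j, m in enumerate(mu):
    tau = 1 / math.sqrt(m)                        # eigenvalue of A on e_j
    e = [1.0 if k == j else 0.0 for k in range(len(mu))]
    print(norm_D([tau * x for x in e]), norm_E([x / tau for x in e]))  # both 1
```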

Lemma B.6. With E defined as above as the completion of H with respect to the inner product (B.55), B extends by continuity to an isometric isomorphism B : E −→ D.
Proof. Since B = A² on H this follows from the properties of the eigenbases above. □

The typical way that Friedrichs’ extension arises is that we are actually given
an explicit ‘operator’, a linear map P : D̃ −→ H such that (u, v)D = (u, P v)H
satisfies the conditions of Proposition B.2. Then P extends by continuity to an
isomorphism P : D −→ E which is precisely the inverse of B as in Lemma B.6. We
shall see examples of this below.

4. Dirichlet problem revisited


So, does the setup of the preceding section work for the Dirichlet problem? We take H = L²((0, 2π)). Then, and this really is Friedrichs’ extension, we take as a subspace D̃ ⊂ H the space of functions which are once continuously differentiable and vanish outside a compact subset of (0, 2π). This just means that there is some smaller interval, depending on the function, [δ, 2π − δ], δ > 0, on which we have a continuously differentiable function f with f(δ) = f′(δ) = f(2π − δ) = f′(2π − δ) = 0, and then we take it to be zero on (0, δ) and (2π − δ, 2π). There are lots of these; let’s call the space D̃ as above
$$\tilde D = \{u \in C[0,2\pi];\ u \text{ continuously differentiable on } [0,2\pi],\ u(x) = 0 \text{ in } [0,\delta]\cup[2\pi-\delta,2\pi] \text{ for some } \delta > 0\}. \tag{B.57}$$
Then our first claim is that
$$\tilde D \text{ is dense in } L^2(0,2\pi) \tag{B.58}$$
with respect to the norm on L² of course.
What inner product should we take on D̃? Well, we can just integrate formally by parts and set
$$(u,v)_D = \frac{1}{2\pi}\int_{[0,2\pi]}\frac{du}{dx}\overline{\frac{dv}{dx}}\,dx. \tag{B.59}$$
This is a pre-Hilbert inner product. To check all this note first that (u, u)_D = 0 implies du/dx = 0 by Riemann integration (since |du/dx|² is continuous) and since u(x) = 0 in x < δ for some δ > 0 it follows that u = 0. Thus (u, v)_D makes D̃ into a pre-Hilbert space, since it is a positive definite sesquilinear form. So, what about the completion? Observe that, the elements of D̃ being continuously differentiable, we can always integrate from x = 0 and see that
$$u(x) = \int_0^x\frac{du}{ds}(s)\,ds \tag{B.60}$$
as u(0) = 0. Now, to say that u_n ∈ D̃ is Cauchy is to say that the continuous functions v_n = du_n/dx are Cauchy in L²(0, 2π). Thus, from the completeness of L² we know that v_n → v ∈ L²(0, 2π). On the other hand (B.60) applies to each u_n so
$$|u_n(x) - u_m(x)| = \Big|\int_0^x(v_n(s) - v_m(s))\,ds\Big| \le \sqrt{2\pi}\,\|v_n - v_m\|_{L^2} \tag{B.61}$$
by applying Cauchy-Schwarz. Thus in fact the sequence u_n is uniformly Cauchy in C([0, 2π]) if u_n is Cauchy in D̃. From the completeness of the Banach space of continuous functions it follows that u_n → u in C([0, 2π]), so each element of the completion, D, ‘defines’ (read ‘is’) a continuous function:
$$u_n \to u \in D \Longrightarrow u \in C([0,2\pi]),\quad u(0) = u(2\pi) = 0, \tag{B.62}$$
where the Dirichlet condition follows by continuity from (B.61).
Thus we do indeed get an injection
$$D \ni u \longmapsto u \in L^2(0,2\pi), \tag{B.63}$$
where the injectivity follows from (B.60): if v = lim du_n/dx vanishes in L² then u = 0.
Now, you can go ahead and check that with these definitions, B and A are the
same operators as we constructed in the discussion of the Dirichlet problem.

5. Isotropic space
There are some functions which should be in the domain of P, namely the twice continuously differentiable functions on R with compact support, those which vanish outside a finite interval. Recall that there are actually a lot of these: they are dense in L²(R). Following what we did above for the Dirichlet problem set
$$\tilde D = \{u : \mathbb R \longrightarrow \mathbb C;\ \exists\, R \text{ s.t. } u = 0 \text{ in } |x| > R,\ u \text{ is twice continuously differentiable on } \mathbb R\}. \tag{B.64}$$
Now for such functions integration by parts on a large enough interval (depending on the functions) produces no boundary terms, so
$$(Pu,v)_{L^2} = \int_{\mathbb R}(Pu)\bar v = \int_{\mathbb R}\Big(\frac{du}{dx}\overline{\frac{dv}{dx}} + x^2u\bar v\Big) = (u,v)_{\mathrm{iso}} \tag{B.65}$$
is a positive definite hermitian form on D̃. Indeed the vanishing of ‖u‖_iso implies that ‖xu‖_{L²} = 0 and so u = 0 since u ∈ D̃ is continuous. The suffix ‘iso’ here stands for ‘isotropic’ and refers to the fact that xu and du/dx are essentially on the same footing here. Thus
$$(u,v)_{\mathrm{iso}} = \Big(\frac{du}{dx},\frac{dv}{dx}\Big)_{L^2} + (xu,xv)_{L^2}. \tag{B.66}$$
This may become a bit clearer later when we get to the Fourier transform.
Definition B.1. Let H¹_iso(R) be the completion of D̃ in (B.64) with respect to the inner product (·,·)_iso.
Proposition B.3. The inclusion map i : D̃ −→ L²(R) extends by continuity to i : H¹_iso −→ L²(R), which satisfies (B.38), (B.39), (B.40), (B.41) and (B.50) with D = H¹_iso and H = L²(R), and the derivative and multiplication maps define an injection
$$H^1_{\mathrm{iso}} \longrightarrow L^2(\mathbb R)\times L^2(\mathbb R). \tag{B.67}$$
Proof. Let us start with the last part, (B.67). The map here is supposed to be the continuous extension of the map
$$\tilde D \ni u \longmapsto \Big(\frac{du}{dx},xu\Big) \in L^2(\mathbb R)\times L^2(\mathbb R), \tag{B.68}$$
where du/dx and xu are both compactly supported continuous functions in this case. By definition of the inner product (·,·)_iso the norm is precisely
$$\|u\|^2_{\mathrm{iso}} = \Big\|\frac{du}{dx}\Big\|^2_{L^2} + \|xu\|^2_{L^2}, \tag{B.69}$$
so if u_n is Cauchy in D̃ with respect to ‖·‖_iso then the sequences du_n/dx and xu_n are Cauchy in L²(R). By the completeness of L² they converge, defining an element in L²(R) × L²(R) as in (B.67). Moreover the elements so defined only depend on the element of the completion that the Cauchy sequence defines. The resulting map (B.67) is clearly continuous.
Now, we need to show that the inclusion i extends to H¹_iso from D̃. This follows from another integration identity. Namely, for u ∈ D̃ the Fundamental Theorem of Calculus applied to
$$\frac{d}{dx}\big(u\,x\,\bar u\big) = |u|^2 + x\frac{du}{dx}\bar u + ux\overline{\frac{du}{dx}}$$
gives, using Cauchy-Schwarz and 2ab ≤ a² + b² for the last step,
$$\|u\|^2_{L^2} \le \int_{\mathbb R}\Big|\frac{du}{dx}x\bar u\Big| + \Big|ux\overline{\frac{du}{dx}}\Big| \le \|u\|^2_{\mathrm{iso}}. \tag{B.70}$$
Thus the inequality (B.41) holds for u ∈ D̃.
It follows that the inclusion map i : D̃ −→ L²(R) extends by continuity to H¹_iso, since if u_n ∈ D̃ is Cauchy in H¹_iso it is Cauchy in L²(R). It remains to check that i is injective and compact, since the range is already dense, as it contains D̃.
If u ∈ H¹_iso then to say i(u) = 0 (in L²(R)) is to say that for any u_n → u in H¹_iso, with u_n ∈ D̃, u_n → 0 in L²(R), and we need to show that this means u_n → 0 in H¹_iso to conclude that u = 0. To do so we use the map (B.67). If u_n ∈ D̃ converges in H¹_iso then it follows that the sequence (du_n/dx, xu_n) converges in L²(R) × L²(R). If v is a continuous function of compact support then (xu_n, v)_{L²} = (u_n, xv)_{L²} → (u, xv)_{L²}, so if i(u) = 0 the L² limit of xu_n is orthogonal to a dense subspace and hence xu_n → 0 as well. Similarly, using integration by parts, for v continuously differentiable with compact support the limit U of du_n/dx in L²(R) satisfies
$$(U,v)_{L^2} = \lim_n\int\frac{du_n}{dx}\bar v = -\lim_n\int u_n\overline{\frac{dv}{dx}} = -\Big(u,\frac{dv}{dx}\Big)_{L^2} = 0 \tag{B.71}$$
if i(u) = 0. It therefore follows that U = 0, so in fact u_n → 0 in H¹_iso and the injectivity of i follows. □
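The inequality (B.70), ‖u‖²_{L²} ≤ ‖u‖²_iso, can be checked numerically. This is a sketch: the test functions, the grid and the helper names are our choices, and the functions are rapidly decreasing rather than compactly supported, which does not affect the integration by parts behind (B.70). For u_0 = e^{−x²/2} and x e^{−x²/2} = u_1/2, (B.65) and (B.8) give ‖u‖²_iso = (Pu, u) = ‖u‖² and 3‖u‖² respectively, so the first case is one of equality.

```python
import math

def iso_check(u, du, h=0.001, L=12.0):
    # Returns (||u||_{L^2}^2, ||u||_iso^2) via Riemann sums on [-L, L];
    # du is the derivative of u, supplied in closed form.
    n = int(round(L / h))
    xs = [k * h for k in range(-n, n + 1)]
    l2 = h * sum(u(x) ** 2 for x in xs)
    iso = h * sum(du(x) ** 2 + (x * u(x)) ** 2 for x in xs)
    return l2, iso

tests = {
    "exp(-x^2/2)": (lambda x: math.exp(-x * x / 2),
                    lambda x: -x * math.exp(-x * x / 2)),
    "exp(-x^2)": (lambda x: math.exp(-x * x),
                  lambda x: -2 * x * math.exp(-x * x)),
    "x exp(-x^2/2)": (lambda x: x * math.exp(-x * x / 2),
                      lambda x: (1 - x * x) * math.exp(-x * x / 2)),
}
for name, (u, du) in tests.items():
    l2, iso = iso_check(u, du)
    print(name, l2, iso)
```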
We can see a little more about the metric on H¹_iso.
Lemma B.7. Elements of H¹_iso are continuous functions and convergence with respect to ‖·‖_iso implies uniform convergence on bounded intervals.
Proof. For elements of the dense subspace D̃, (twice) continuously differentiable and vanishing outside a bounded interval, the Fundamental Theorem of Calculus shows that
$$\begin{aligned} u(x) &= e^{x^2/2}\int_{-\infty}^x\frac{d}{dt}\big(e^{-t^2/2}u\big)\,dt = e^{x^2/2}\int_{-\infty}^xe^{-t^2/2}\Big(-tu + \frac{du}{dt}\Big)\,dt \Longrightarrow\\ |u(x)| &\le \sqrt2\,e^{x^2/2}\Big(\int_{-\infty}^xe^{-t^2}\,dt\Big)^{1/2}\|u\|_{\mathrm{iso}}, \end{aligned} \tag{B.72}$$
where the estimate comes from Cauchy-Schwarz applied to the integral, together with ‖−tu + du/dt‖_{L²} ≤ ‖tu‖_{L²} + ‖du/dt‖_{L²} ≤ √2 ‖u‖_iso. It follows that if u_n → u with respect to the isotropic norm then the sequence converges uniformly on bounded intervals with
$$\sup_{[-R,R]}|u(x)| \le C(R)\|u\|_{\mathrm{iso}}. \tag{B.73}$$


Now, to proceed further we either need to apply some ‘regularity theory’ or do a computation. I choose to do the latter here, although the former method (outlined below) is much more general. The idea is to show the following.
Lemma B.8. The linear map (P + 1) : C²_c(R) −→ C_c(R) is injective with range dense in L²(R), and if f ∈ L²(R) ∩ C(R) there is a sequence u_n ∈ C²_c(R) such that u_n → u in H¹_iso, u_n → u locally uniformly with its first two derivatives, and (P + 1)u_n → f in L²(R) and locally uniformly.
Proof. Why P + 1 and not P? The result is actually true for P but not so easy to show directly. The advantage of P + 1 is that it factorizes,
$$(P+1) = \mathrm{An}\,\mathrm{Cr} \text{ on } C^2_c(\mathbb R),$$
so we proceed to solve the equation (P + 1)u = f in two steps.
First, if f ∈ C_c(R) then using the natural integrating factor
$$v(x) = e^{-x^2/2}\int_{-\infty}^xe^{t^2/2}f(t)\,dt + ae^{-x^2/2} \text{ satisfies } \mathrm{An}\,v = f. \tag{B.74}$$
The integral here is not in general finite if f does not vanish in x < −R, which by assumption it does. Note that An e^{−x²/2} = 0. This solution is of the form
$$v \in C^1(\mathbb R),\quad v(x) = a_\pm e^{-x^2/2} \text{ in } \pm x > R, \tag{B.75}$$
where R depends on f and the constants can be different.
In the second step we need to solve away such terms – in general one cannot. However, we can always choose a in (B.74) so that
$$\int_{\mathbb R}e^{-x^2/2}v(x) = 0. \tag{B.76}$$
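Now consider
$$u(x) = -e^{x^2/2}\int_{-\infty}^{x} e^{-t^2/2}\, v(t)\,dt. \tag{B.77}$$
Here the integral does make sense because of the decay of $v$ from (B.75), and $u \in
C^2(\mathbb{R})$. The sign is the one for which $\mathrm{Cr}\,u = -u' + xu = v$, so that
$(P+1)u = \mathrm{An}\,\mathrm{Cr}\,u = \mathrm{An}\,v = f$. We need to understand how $u$ behaves as $x \to \pm\infty$. From the second part
of (B.75),
$$u(x) = -a_-\operatorname{erf}_-(x), \quad x < -R, \qquad \operatorname{erf}_-(x) = \int_{(-\infty,x]} e^{x^2/2 - t^2}\,dt, \tag{B.78}$$
which is, up to the factor $e^{x^2/2}$, an incomplete error function: the derivative of
$\int_{-\infty}^{x} e^{-t^2}\,dt$ is $e^{-x^2}$. In fact $\operatorname{erf}_-$ satisfies
$$|x\operatorname{erf}_-(x)| \le C e^{-x^2/2}, \quad x < -R. \tag{B.79}$$
In any case it is easy to get an estimate such as $C e^{-bx^2/2}$ as $x \to -\infty$ for any
$0 < b < 1$ by Cauchy-Schwarz.

As $x \to \infty$ we would generally expect the solution to be rapidly increasing,
but precisely because of (B.76) it is not. Indeed the vanishing of this integral means we can
rewrite (B.77) as an integral from $+\infty$:
$$u(x) = e^{x^2/2}\int_{[x,\infty)} e^{-t^2/2}\, v(t)\,dt, \tag{B.80}$$
and then the same analysis yields
$$u(x) = a_+\operatorname{erf}_+(x), \quad x > R, \qquad \operatorname{erf}_+(x) = \int_{[x,\infty)} e^{x^2/2 - t^2}\,dt. \tag{B.81}$$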
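The decay of $\operatorname{erf}_\pm$ can be made explicit. Writing $\int_{-\infty}^x e^{-t^2}\,dt = \tfrac{\sqrt\pi}{2}\operatorname{erfc}(-x)$, the standard inequality $\operatorname{erfc}(z) < e^{-z^2}/(z\sqrt\pi)$ for $z > 0$ gives $|x\operatorname{erf}_-(x)| < \tfrac12 e^{-x^2/2}$ for $x < 0$, a bound of the form (B.79) with $C = 1/2$; and $\operatorname{erf}_+(x) = \operatorname{erf}_-(-x)$. The sketch below evaluates the ratio with Python's `math.erfc`; the function names and sample points are our own choices.

```python
import math

SQRT_PI_2 = math.sqrt(math.pi) / 2  # int_0^inf e^{-t^2} dt

def erf_minus(x):
    """erf_-(x) = e^{x^2/2} int_{-inf}^x e^{-t^2} dt, expressed through math.erfc."""
    return math.exp(x * x / 2) * SQRT_PI_2 * math.erfc(-x)

def erf_plus(x):
    """erf_+(x) = e^{x^2/2} int_x^inf e^{-t^2} dt, expressed through math.erfc."""
    return math.exp(x * x / 2) * SQRT_PI_2 * math.erfc(x)

# Ratio |x erf_-(x)| / e^{-x^2/2}: it should stay below 1/2 and approach it.
for x in (-1.0, -2.0, -5.0, -10.0, -20.0):
    ratio = abs(x * erf_minus(x)) / math.exp(-x * x / 2)
    print(f"x = {x:6.1f}   ratio = {ratio:.6f}")
```

The sample points stop at $|x| = 20$ because `math.erfc` underflows to zero not far beyond that.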

So for any $f \in C_c(\mathbb{R})$ we have found a solution of $(P+1)u = f$ with $u$ satisfying
the rapid decay conditions (B.78) and (B.81). These are such that if $\chi \in C^2_c(\mathbb{R})$ has
$\chi(t) = 1$ in $|t| < 1$ then the sequence
$$u_n = \chi\!\left(\frac{x}{n}\right) u(x) \to u, \quad u_n' \to u', \quad u_n'' \to u'' \tag{B.82}$$
with convergence in $L^2(\mathbb{R})$ and uniformly in all cases, and even such that $x^2 u_n \to x^2 u$
uniformly and in $L^2(\mathbb{R})$.
This yields the first part of the Lemma since, if $f \in C_c(\mathbb{R})$ and $u$ is the solution
of $(P+1)u = f$ just constructed, then $(P+1)u_n \to f$ in $L^2$. So the closure in $L^2(\mathbb{R})$
of the range of $(P+1)$ on $C^2_c(\mathbb{R})$ includes $C_c(\mathbb{R})$, and so is certainly dense in $L^2(\mathbb{R})$.
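An end-to-end numerical sketch of the construction, under stated assumptions: the sample $f(t) = (1-t^2)^4$ on $[-1,1]$ is hypothetical, integrals are cumulative trapezoid sums on a grid over $[-6,6]$, and $\mathrm{An} = d/dx + x$, $\mathrm{Cr} = -d/dx + x$. It solves $\mathrm{An}\,v = f$, fixes $a$ by the orthogonality condition (B.76), solves $\mathrm{Cr}\,u = v$ by integrating from $-\infty$ for $x \le 0$ and from $+\infty$ for $x > 0$ (the two forms agree up to rounding because of the orthogonality condition), and then measures $(P+1)u - f$ with a centred second difference.

```python
import math

# Grid on [-L, L] (an assumption of this sketch: u is negligible beyond |x| = L).
L, N = 6.0, 6000
h = L / N
xs = [-L + k * h for k in range(2 * N + 1)]

def f(t):
    """Hypothetical right-hand side in C_c(R), supported in [-1, 1]."""
    return (1 - t * t) ** 4 if abs(t) <= 1 else 0.0

def cumtrapz(vals):
    """out[k] approximates the integral of the sampled function from xs[0] to xs[k]."""
    out = [0.0]
    for k in range(1, len(vals)):
        out.append(out[-1] + h * (vals[k - 1] + vals[k]) / 2)
    return out

# Step 1: An v = v' + x v = f, via the integrating factor e^{x^2/2}.
F = cumtrapz([math.exp(x * x / 2) * f(x) for x in xs])
w = [math.exp(-x * x) for x in xs]
# Choose a so that the discrete version of int e^{-x^2/2} v dx = 0 holds.
a = -cumtrapz([wk * Fk for wk, Fk in zip(w, F)])[-1] / cumtrapz(w)[-1]
v = [math.exp(-x * x / 2) * (Fk + a) for x, Fk in zip(xs, F)]

# Step 2: Cr u = -u' + x u = v.  Integrate from the left for x <= 0 and from the
# right for x > 0, so the factor e^{x^2/2} never multiplies `total`, which
# vanishes only up to rounding error.
G = cumtrapz([math.exp(-x * x / 2) * vk for x, vk in zip(xs, v)])
total = G[-1]
u = [-math.exp(x * x / 2) * Gk if x <= 0 else math.exp(x * x / 2) * (total - Gk)
     for x, Gk in zip(xs, G)]

def residual(k, m=10):
    """(P + 1)u - f at xs[k], with u'' from a centred difference of width m*h."""
    hd = m * h
    upp = (u[k + m] - 2 * u[k] + u[k - m]) / hd ** 2
    x = xs[k]
    return -upp + (x * x + 1) * u[k] - f(x)

# Largest residual over |x| <= 3, sampled every 50 grid points.
worst = max(abs(residual(k)) for k in range(N - 3000, N + 3001, 50))
```

The split at $x = 0$ mirrors the remark in the text that, without the orthogonality condition, the solution would generally grow rapidly as $x \to \infty$.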
The second part also follows from this construction. If $f \in L^2(\mathbb{R}) \cap C(\mathbb{R})$ then
the sequence
$$f_n = \chi\!\left(\frac{x}{n}\right) f(x) \in C_c(\mathbb{R}) \tag{B.83}$$
converges to $f$ both in $L^2(\mathbb{R})$ and locally uniformly. Consider the solution $u_n$ of
$(P+1)u_n = f_n$ constructed above. We want to show that $u_n \to u$ in $L^2$ and
locally uniformly with its first two derivatives. The decay in $u_n$ is enough to allow
integration by parts, to see that
$$\int_{\mathbb{R}} \big((P+1)u_n\big)\,\overline{u_n}\,dx = \|u_n\|_{\mathrm{iso}}^2 + \|u_n\|_{L^2}^2 = |(f_n, u_n)| \le \|f_n\|_{L^2}\,\|u_n\|_{L^2}. \tag{B.84}$$
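The identity behind (B.84) is integration by parts: for a sufficiently decaying real $u$, $\int_{\mathbb{R}}\big((P+1)u\big)\,u = \int |u'|^2 + \int x^2|u|^2 + \int |u|^2$, the boundary term $[u'u]$ vanishing at $\pm\infty$. The sketch checks this, and the Cauchy-Schwarz step, by Simpson quadrature for one hypothetical test function $u = (1 + 2x + x^3)e^{-x^2/2}$ that is not taken from the notes. How these terms make up $\|u\|_{\mathrm{iso}}^2$ is fixed by the definition of $H^1_{\mathrm{iso}}$ earlier in the notes, which the sketch does not assume.

```python
import math

def simpson(g, lo, hi, n=2400):
    """Composite Simpson rule for the integral of g over [lo, hi]; n even."""
    h = (hi - lo) / n
    s = g(lo) + g(hi)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * g(lo + k * h)
    return s * h / 3

# Hypothetical test function u = p(x) e^{-x^2/2} with p = 1 + 2x + x^3.
def p(x):
    return 1 + 2 * x + x ** 3

def dp(x):
    return 2 + 3 * x ** 2

def d2p(x):
    return 6 * x

def u(x):
    return p(x) * math.exp(-x * x / 2)

def du(x):
    return (dp(x) - x * p(x)) * math.exp(-x * x / 2)

def d2u(x):
    return (d2p(x) - 2 * x * dp(x) - p(x) + x * x * p(x)) * math.exp(-x * x / 2)

def Pp1u(x):
    """(P + 1)u = -u'' + x^2 u + u."""
    return -d2u(x) + x * x * u(x) + u(x)

L = 12.0  # e^{-x^2} < 1e-60 beyond |x| = 12, so truncating there is harmless
lhs = simpson(lambda x: Pp1u(x) * u(x), -L, L)
rhs = (simpson(lambda x: du(x) ** 2, -L, L)
       + simpson(lambda x: (x * u(x)) ** 2, -L, L)
       + simpson(lambda x: u(x) ** 2, -L, L))
f_norm = math.sqrt(simpson(lambda x: Pp1u(x) ** 2, -L, L))
u_norm = math.sqrt(simpson(lambda x: u(x) ** 2, -L, L))
```

Since `lhs` dominates $\|u\|_{L^2}^2$ and is at most `f_norm * u_norm`, one gets $\|u\|_{L^2} \le \|(P+1)u\|_{L^2}$, which is the kind of bound used to control the sequence.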
This shows that the sequence is bounded in $H^1_{\mathrm{iso}}$ and, applying the same estimate
to $u_n - u_m$, that it is Cauchy and hence convergent in $H^1_{\mathrm{iso}}$. This implies $u_n \to u$ in
$H^1_{\mathrm{iso}}$, and so both in $L^2(\mathbb{R})$ and, by (B.73), locally uniformly. The differential equation can be
written
$$u_n'' = x^2 u_n + u_n - f_n, \tag{B.85}$$
where the right side converges locally uniformly. It follows from a standard result
on uniform convergence of sequences of derivatives that in fact the uniform limit $u$
is twice continuously differentiable and that $u_n'' \to u''$ locally uniformly. So in
fact $(P+1)u = f$ and the last part of the Lemma is also proved. □
Bibliography

[3] B. S. Mitjagin, The homotopy structure of a linear group of a Banach space, Uspehi Mat.
Nauk 25 (1970), no. 5(155), 63–106. MR 0341523 (49 #6274a)
[4] W. Rudin, Principles of mathematical analysis, 3rd ed., McGraw Hill, 1976.
[5] George F. Simmons, Introduction to topology and modern analysis, Robert E. Krieger Pub-
lishing Co. Inc., Melbourne, Fla., 1983, Reprint of the 1963 original. MR 84b:54002

MIT OpenCourseWare
https://ocw.mit.edu

18.102 / 18.1021 Introduction to Functional Analysis


Spring 2021

For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms.
