Sheldon Axler
Linear Algebra
Done Right
Fourth Edition
Undergraduate Texts in Mathematics
Series Editors
Pamela Gorkin, Mathematics Department, Bucknell University, Lewisburg, PA, USA
Jessica Sidman, Mathematics and Statistics, Amherst College, Amherst, MA, USA
Advisory Board
Colin Adams, Williams College, Williamstown, MA, USA
Jayadev S. Athreya, University of Washington, Seattle, WA, USA
Nathan Kaplan, University of California, Irvine, CA, USA
Jill Pipher, Brown University, Providence, RI, USA
Jeremy Tyson, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Mathematics Subject Classification (2020): 15-01, 15A03, 15A04, 15A15, 15A18, 15A21
© Sheldon Axler 1996, 1997, 2015, 2024. This book is an open access publication.
Open Access This book is licensed under the terms of the Creative Commons Attribution-NonCommercial
4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncom-
mercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you
give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons
license and indicate if changes were made.
The images or other third party material in this book are included in the book’s Creative Commons
license, unless indicated otherwise in a credit line to the material. If material is not included in the book’s
Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the
permitted use, you will need to obtain permission directly from the copyright holder.
This work is subject to copyright. All commercial rights are reserved by the author(s), whether the whole
or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or
information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed. Regarding these commercial rights a non-exclusive
license has been granted to the publisher.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.
Cover equation: Formula for the 𝑛th Fibonacci number. Exercise 21 in Section 5D
uses linear algebra to derive this formula.
Contents
Acknowledgments xvii
Chapter 1
Vector Spaces 1
1A 𝐑𝑛 and 𝐂𝑛 2
Complex Numbers 2
Lists 5
𝐅𝑛 6
Digression on Fields 10
Exercises 1A 10
1B Definition of Vector Space 12
Exercises 1B 16
1C Subspaces 18
Sums of Subspaces 19
Direct Sums 21
Exercises 1C 24
Chapter 2
Finite-Dimensional Vector Spaces 27
2A Span and Linear Independence 28
Linear Combinations and Span 28
Linear Independence 31
Exercises 2A 37
2B Bases 39
Exercises 2B 42
2C Dimension 44
Exercises 2C 48
Chapter 3
Linear Maps 51
3A Vector Space of Linear Maps 52
Definition and Examples of Linear Maps 52
Algebraic Operations on ℒ(𝑉, 𝑊) 55
Exercises 3A 57
3B Null Spaces and Ranges 59
Null Space and Injectivity 59
Range and Surjectivity 61
Fundamental Theorem of Linear Maps 62
Exercises 3B 66
3C Matrices 69
Representing a Linear Map by a Matrix 69
Addition and Scalar Multiplication of Matrices 71
Matrix Multiplication 72
Column–Row Factorization and Rank of a Matrix 77
Exercises 3C 79
3D Invertibility and Isomorphisms 82
Invertible Linear Maps 82
Isomorphic Vector Spaces 86
Linear Maps Thought of as Matrix Multiplication 88
Change of Basis 90
Exercises 3D 93
3E Products and Quotients of Vector Spaces 96
Products of Vector Spaces 96
Quotient Spaces 98
Exercises 3E 103
3F Duality 105
Dual Space and Dual Map 105
Null Space and Range of Dual of Linear Map 109
Chapter 4
Polynomials 119
Zeros of Polynomials 122
Division Algorithm for Polynomials 123
Factorization of Polynomials over 𝐂 124
Factorization of Polynomials over 𝐑 127
Exercises 4 129
Chapter 5
Eigenvalues and Eigenvectors 132
5A Invariant Subspaces 133
Eigenvalues 133
Polynomials Applied to Operators 137
Exercises 5A 139
5B The Minimal Polynomial 143
Existence of Eigenvalues on Complex Vector Spaces 143
Eigenvalues and the Minimal Polynomial 144
Eigenvalues on Odd-Dimensional Real Vector Spaces 149
Exercises 5B 150
5C Upper-Triangular Matrices 154
Exercises 5C 160
5D Diagonalizable Operators 163
Diagonal Matrices 163
Conditions for Diagonalizability 165
Gershgorin Disk Theorem 170
Exercises 5D 172
5E Commuting Operators 175
Exercises 5E 179
Chapter 6
Inner Product Spaces 181
6A Inner Products and Norms 182
Inner Products 182
Norms 186
Exercises 6A 191
6B Orthonormal Bases 197
Orthonormal Lists and the Gram–Schmidt Procedure 197
Linear Functionals on Inner Product Spaces 204
Exercises 6B 207
6C Orthogonal Complements and Minimization Problems 211
Orthogonal Complements 211
Minimization Problems 217
Pseudoinverse 220
Exercises 6C 224
Chapter 7
Operators on Inner Product Spaces 227
7A Self-Adjoint and Normal Operators 228
Adjoints 228
Self-Adjoint Operators 233
Normal Operators 235
Exercises 7A 239
7B Spectral Theorem 243
Real Spectral Theorem 243
Complex Spectral Theorem 246
Exercises 7B 247
7C Positive Operators 251
Exercises 7C 255
7D Isometries, Unitary Operators, and Matrix Factorization 258
Isometries 258
Unitary Operators 260
QR Factorization 263
Cholesky Factorization 266
Exercises 7D 268
7E Singular Value Decomposition 270
Singular Values 270
SVD for Linear Maps and for Matrices 273
Exercises 7E 278
Chapter 8
Operators on Complex Vector Spaces 297
8A Generalized Eigenvectors and Nilpotent Operators 298
Null Spaces of Powers of an Operator 298
Generalized Eigenvectors 300
Nilpotent Operators 303
Exercises 8A 306
8B Generalized Eigenspace Decomposition 308
Generalized Eigenspaces 308
Multiplicity of an Eigenvalue 310
Block Diagonal Matrices 314
Exercises 8B 316
8C Consequences of Generalized Eigenspace Decomposition 319
Square Roots of Operators 319
Jordan Form 321
Exercises 8C 324
8D Trace: A Connection Between Matrices and Operators 326
Exercises 8D 330
Chapter 9
Multilinear Algebra and Determinants 332
9A Bilinear Forms and Quadratic Forms 333
Bilinear Forms 333
Symmetric Bilinear Forms 337
Quadratic Forms 341
Exercises 9A 344
Index 385
Preface for Students
You are probably about to begin your second exposure to linear algebra. Unlike
your first brush with the subject, which probably emphasized Euclidean spaces
and matrices, this encounter will focus on abstract vector spaces and linear maps.
These terms will be defined later, so don’t worry if you do not know what they
mean. This book starts from the beginning of the subject, assuming no knowledge
of linear algebra. The key point is that you are about to immerse yourself in
serious mathematics, with an emphasis on attaining a deep understanding of the
definitions, theorems, and proofs.
You cannot read mathematics the way you read a novel. If you zip through a
page in less than an hour, you are probably going too fast. When you encounter
the phrase “as you should verify”, you should indeed do the verification, which
will usually require some writing on your part. When steps are left out, you need
to supply the missing pieces. You should ponder and internalize each definition.
For each theorem, you should seek examples to show why each hypothesis is
necessary. Discussions with other students should help.
As a visual aid, definitions are in yellow boxes and theorems are in blue boxes
(in color versions of the book). Each theorem has an informal descriptive name.
Please check the website below for additional information about the book,
including a link to videos that are freely available to accompany the book.
Your suggestions, comments, and corrections are most welcome.
Best wishes for success and enjoyment in learning linear algebra!
Sheldon Axler
San Francisco State University
website: https://linear.axler.net
e-mail: [email protected]
Preface for Instructors
You are about to teach a course that will probably give students their second
exposure to linear algebra. During their first brush with the subject, your students
probably worked with Euclidean spaces and matrices. In contrast, this course will
emphasize abstract vector spaces and linear maps.
The title of this book deserves an explanation. Most linear algebra textbooks
use determinants to prove that every linear operator on a finite-dimensional com-
plex vector space has an eigenvalue. Determinants are difficult, nonintuitive,
and often defined without motivation. To prove the theorem about existence of
eigenvalues on complex vector spaces, most books must define determinants,
prove that a linear operator is not invertible if and only if its determinant equals 0,
and then define the characteristic polynomial. This tortuous (torturous?) path
gives students little feeling for why eigenvalues exist.
In contrast, the simple determinant-free proofs presented here (for example,
see 5.19) offer more insight. Once determinants have been moved to the end of
the book, a new route opens to the main goal of linear algebra—understanding
the structure of linear operators.
This book starts at the beginning of the subject, with no prerequisites other
than the usual demand for suitable mathematical maturity. A few examples
and exercises involve calculus concepts such as continuity, differentiation, and
integration. You can easily skip those examples and exercises if your students
have not had calculus. If your students have had calculus, then those examples and
exercises can enrich their experience by showing connections between different
parts of mathematics.
Even if your students have already seen some of the material in the first few
chapters, they may be unaccustomed to working exercises of the type presented
here, most of which require an understanding of proofs.
Here is a chapter-by-chapter summary of the highlights of the book:
• Chapter 1: Vector spaces are defined in this chapter, and their basic properties
are developed.
• Chapter 2: Linear independence, span, basis, and dimension are defined in this
chapter, which presents the basic theory of finite-dimensional vector spaces.
• Chapter 3: This chapter introduces linear maps. The key result here is the
fundamental theorem of linear maps: if 𝑇 is a linear map on 𝑉, then dim 𝑉 =
dim null 𝑇 + dim range 𝑇. Quotient spaces and duality are topics in this chapter
at a higher level of abstraction than most of the book; these topics can be
skipped (except that duality is needed for tensor products in Section 9D).
• Chapter 4: The part of the theory of polynomials that will be needed to un-
derstand linear operators is presented in this chapter. This chapter contains no
linear algebra. It can be covered quickly, especially if your students are already
familiar with these results.
• Chapter 5: The idea of studying a linear operator by restricting it to small sub-
spaces leads to eigenvectors in the early part of this chapter. The highlight of this
chapter is a simple proof that on complex vector spaces, eigenvalues always ex-
ist. This result is then used to show that each linear operator on a complex vector
space has an upper-triangular matrix with respect to some basis. The minimal
polynomial plays an important role here and later in the book. For example, this
chapter gives a characterization of the diagonalizable operators in terms of the
minimal polynomial. Section 5E can be skipped if you want to save some time.
• Chapter 6: Inner product spaces are defined in this chapter, and their basic
properties are developed along with tools such as orthonormal bases and the
Gram–Schmidt procedure. This chapter also shows how orthogonal projections
can be used to solve certain minimization problems. The pseudoinverse is then
introduced as a useful tool when the inverse does not exist. The material on
the pseudoinverse can be skipped if you want to save some time.
• Chapter 7: The spectral theorem, which characterizes the linear operators for
which there exists an orthonormal basis consisting of eigenvectors, is one of
the highlights of this book. The work in earlier chapters pays off here with espe-
cially simple proofs. This chapter also deals with positive operators, isometries,
unitary operators, matrix factorizations, and especially the singular value de-
composition, which leads to the polar decomposition and norms of linear maps.
• Chapter 8: This chapter shows that for each operator on a complex vector space,
there is a basis of the vector space consisting of generalized eigenvectors of the
operator. Then the generalized eigenspace decomposition describes a linear
operator on a complex vector space. The multiplicity of an eigenvalue is defined
as the dimension of the corresponding generalized eigenspace. These tools are
used to prove that every invertible linear operator on a complex vector space
has a square root. Then the chapter gives a proof that every linear operator on
a complex vector space can be put into Jordan form. The chapter concludes
with an investigation of the trace of operators.
• Chapter 9: This chapter begins by looking at bilinear forms and showing that the
vector space of bilinear forms is the direct sum of the subspaces of symmetric
bilinear forms and alternating bilinear forms. Then quadratic forms are diag-
onalized. Moving to multilinear forms, the chapter shows that the subspace of
alternating 𝑛-linear forms on an 𝑛-dimensional vector space has dimension one.
This result leads to a clean basis-free definition of the determinant of an opera-
tor. For complex vector spaces, the determinant turns out to equal the product of
the eigenvalues, with each eigenvalue included in the product as many times as
its multiplicity. The chapter concludes with an introduction to tensor products.
This book usually develops linear algebra simultaneously for real and complex
vector spaces by letting 𝐅 denote either the real or the complex numbers. If you and
your students prefer to think of 𝐅 as an arbitrary field, then see the comments at the
end of Section 1A. I prefer avoiding arbitrary fields at this level because they intro-
duce extra abstraction without leading to any new linear algebra. Also, students are
more comfortable thinking of polynomials as functions instead of the more formal
objects needed for polynomials with coefficients in finite fields. Finally, even if the
beginning part of the theory were developed with arbitrary fields, inner product
spaces would push consideration back to just real and complex vector spaces.
You probably cannot cover everything in this book in one semester. Going
through all the material in the first seven or eight chapters during a one-semester
course may require a rapid pace. If you must reach Chapter 9, then consider
skipping the material on quotient spaces in Section 3E, skipping Section 3F
on duality (unless you intend to cover tensor products in Section 9D), covering
Chapter 4 on polynomials in a half hour, skipping Section 5E on commuting
operators, and skipping the subsection in Section 6C on the pseudoinverse.
A goal more important than teaching any particular theorem is to develop in
students the ability to understand and manipulate the objects of linear algebra.
Mathematics can be learned only by doing. Fortunately, linear algebra has many
good homework exercises. When teaching this course, during each class I usually
assign as homework several of the exercises, due the next class. Going over the
homework might take up significant time in a typical class.
Some of the exercises are intended to lead curious students into important
topics beyond what might usually be included in a basic second course in linear
algebra.
Please check the website below for additional links and information about the
book. Your suggestions, comments, and corrections are most welcome.
Best wishes for teaching a successful linear algebra class!
Sheldon Axler
San Francisco State University
website: https://linear.axler.net
e-mail: [email protected]
Contact the author, or Springer if the author is not available, for permission for translations or other commercial reuse of the contents of this book.
Acknowledgments
I owe a huge intellectual debt to all the mathematicians who created linear algebra
over the past two centuries. The results in this book belong to the common heritage
of mathematics. A special case of a theorem may first have been proved long ago,
then sharpened and improved by many mathematicians in different time periods.
Bestowing proper credit on all contributors would be a difficult task that I have
not undertaken. In no case should the reader assume that any result presented
here represents my original contribution.
Many people helped make this a better book. The three previous editions of
this book were used as a textbook at over 375 universities and colleges around
the world. I received thousands of suggestions and comments from faculty and
students who used the book. Many of those suggestions led to improvements
in this edition. The manuscript for this fourth edition was class tested at 30
universities. I am extremely grateful for the useful feedback that I received from
faculty and students during this class testing.
The long list of people who should be thanked for their suggestions would
fill up many pages. Lists are boring to read. Thus to represent all contributors
to this edition, I will mention only Noel Hinton, a graduate student at Australian
National University, who sent me more suggestions and corrections for this fourth
edition than anyone else. To everyone who contributed suggestions, let me say
how truly grateful I am to all of you. Many many thanks!
I thank Springer for providing me with help when I needed it and for allowing
me the freedom to make the final decisions about the content and appearance
of this book. Special thanks to the two terrific mathematics editors at Springer
who worked with me on this project—Loretta Bartolini during the first half of
my work on the fourth edition, and Elizabeth Loew during the second half of my
work on the fourth edition. I am deeply indebted to David Kramer, who did a
magnificent job of copyediting and prevented me from making many mistakes.
Extra special thanks to my fantastic partner Carrie Heeter. Her understanding
and encouragement enabled me to work intensely on this new edition. Our won-
derful cat Moon, whose picture appears on the About the Author page, provided
sweet breaks throughout the writing process. Moon died suddenly due to a blood
clot as this book was being finished. We are grateful for five precious years with
him.
Sheldon Axler
Chapter 1
Vector Spaces
1A 𝐑𝑛 and 𝐂𝑛
Complex Numbers
You should already be familiar with basic properties of the set 𝐑 of real numbers.
Complex numbers were invented so that we can take square roots of negative
numbers. The idea is to assume we have a square root of −1, denoted by 𝑖, that
obeys the usual rules of arithmetic. Here are the formal definitions.
𝐂 = {𝑎 + 𝑏𝑖 ∶ 𝑎, 𝑏 ∈ 𝐑}.
• Addition and multiplication on 𝐂 are defined by
(𝑎 + 𝑏𝑖) + (𝑐 + 𝑑𝑖) = (𝑎 + 𝑐) + (𝑏 + 𝑑)𝑖,
(𝑎 + 𝑏𝑖)(𝑐 + 𝑑𝑖) = (𝑎𝑐 − 𝑏𝑑) + (𝑎𝑑 + 𝑏𝑐)𝑖;
here 𝑎, 𝑏, 𝑐, 𝑑 ∈ 𝐑 .
Our first result states that complex addition and complex multiplication have
the familiar properties that we expect.
commutativity
𝛼 + 𝛽 = 𝛽 + 𝛼 and 𝛼𝛽 = 𝛽𝛼 for all 𝛼, 𝛽 ∈ 𝐂 .
associativity
(𝛼 + 𝛽) + 𝜆 = 𝛼 + (𝛽 + 𝜆) and (𝛼𝛽)𝜆 = 𝛼(𝛽𝜆) for all 𝛼, 𝛽, 𝜆 ∈ 𝐂 .
identities
𝜆 + 0 = 𝜆 and 𝜆1 = 𝜆 for all 𝜆 ∈ 𝐂 .
additive inverse
For every 𝛼 ∈ 𝐂 , there exists a unique 𝛽 ∈ 𝐂 such that 𝛼 + 𝛽 = 0.
multiplicative inverse
For every 𝛼 ∈ 𝐂 with 𝛼 ≠ 0, there exists a unique 𝛽 ∈ 𝐂 such that 𝛼𝛽 = 1.
distributive property
𝜆(𝛼 + 𝛽) = 𝜆𝛼 + 𝜆𝛽 for all 𝜆, 𝛼, 𝛽 ∈ 𝐂 .
The properties above are proved using the familiar properties of real numbers
and the definitions of complex addition and multiplication. The next example
shows how commutativity of complex multiplication is proved. Proofs of the
other properties above are left as exercises.
Suppose 𝛼 = 𝑎 + 𝑏𝑖 and 𝛽 = 𝑐 + 𝑑𝑖, where 𝑎, 𝑏, 𝑐, 𝑑 ∈ 𝐑 . Then
𝛼𝛽 = (𝑎 + 𝑏𝑖)(𝑐 + 𝑑𝑖)
= (𝑎𝑐 − 𝑏𝑑) + (𝑎𝑑 + 𝑏𝑐)𝑖
and
𝛽𝛼 = (𝑐 + 𝑑𝑖)(𝑎 + 𝑏𝑖)
= (𝑐𝑎 − 𝑑𝑏) + (𝑐𝑏 + 𝑑𝑎)𝑖.
The equations above and the commutativity of multiplication and addition of real
numbers show that 𝛼𝛽 = 𝛽𝛼.
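For readers who like to experiment, here is a minimal Python sketch that represents a complex number 𝑎 + 𝑏𝑖 as the pair (a, b), implements the definitions of complex addition and multiplication given above, and spot-checks commutativity on a couple of sample values. It is an illustration on particular numbers, not a replacement for the proof.

```python
def add(z, w):
    # (a + bi) + (c + di) = (a + c) + (b + d)i
    a, b = z
    c, d = w
    return (a + c, b + d)

def mul(z, w):
    # (a + bi)(c + di) = (ac - bd) + (ad + bc)i
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)

alpha = (2.0, 3.0)    # 2 + 3i
beta = (-1.0, 4.0)    # -1 + 4i

print(mul(alpha, beta) == mul(beta, alpha))   # True: alpha*beta equals beta*alpha
print(add(alpha, beta) == add(beta, alpha))   # True: alpha+beta equals beta+alpha
```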
Suppose 𝛼, 𝛽 ∈ 𝐂 .
• Let −𝛼 denote the additive inverse of 𝛼. Thus −𝛼 is the unique complex
number such that
𝛼 + (−𝛼) = 0.
• Subtraction on 𝐂 is defined by
𝛽 − 𝛼 = 𝛽 + (−𝛼).
• For 𝛼 ≠ 0, let 1/𝛼 denote the multiplicative inverse of 𝛼. Thus 1/𝛼 is the unique complex number such that
𝛼(1/𝛼) = 1.
• For 𝛼 ≠ 0, division by 𝛼 is defined by
𝛽/𝛼 = 𝛽(1/𝛼).
So that we can conveniently make definitions and prove theorems that apply
to both real and complex numbers, we adopt the following notation.
1.6 notation: 𝐅
Throughout this book, 𝐅 stands for either 𝐑 or 𝐂 .
Lists
Before defining 𝐑𝑛 and 𝐂𝑛, we look at two important examples.
𝐅𝑛
To define the higher-dimensional analogues of 𝐑2 and 𝐑3, we will simply replace
𝐑 with 𝐅 (which equals 𝐑 or 𝐂 ) and replace the 2 or 3 with an arbitrary positive
integer.
1.10 notation: 𝑛
1.12 example: 𝐂4
𝐂4 = {(𝑧1 , 𝑧2 , 𝑧3 , 𝑧4 ) ∶ 𝑧1 , 𝑧2 , 𝑧3 , 𝑧4 ∈ 𝐂}.
If 𝑥, 𝑦 ∈ 𝐅𝑛, then 𝑥 + 𝑦 = 𝑦 + 𝑥.
𝑥 + 𝑦 = (𝑥1 , …, 𝑥𝑛 ) + (𝑦1 , …, 𝑦𝑛 )
= (𝑥1 + 𝑦1 , …, 𝑥𝑛 + 𝑦𝑛 )
= (𝑦1 + 𝑥1 , …, 𝑦𝑛 + 𝑥𝑛 )
= (𝑦1 , …, 𝑦𝑛 ) + (𝑥1 , …, 𝑥𝑛 )
= 𝑦 + 𝑥,
where the second and fourth equalities above hold because of the definition of
addition in 𝐅𝑛 and the third equality holds because of the usual commutativity of
addition in 𝐅.
1.15 notation: 0
0 = (0, …, 0).
Here we are using the symbol 0 in two different ways—on the left side of the
equation above, the symbol 0 denotes a list of length 𝑛, which is an element of 𝐅𝑛,
whereas on the right side, each 0 denotes a number. This potentially confusing
practice actually causes no problems because the context should always make
clear which 0 is intended.
If 𝑥 ∈ 𝐅𝑛, then 𝑥 + 0 = 𝑥.
Here the 0 above is the list defined in 1.15, not the number 0, because we have
not defined the sum of an element of 𝐅𝑛 (namely, 𝑥) and the number 0.
refer to it as a vector.
When we think of vectors in 𝐑2 as arrows, we can move an arrow parallel to itself (not changing its length or direction) and still think of it as the same vector. With that viewpoint, you will often gain better understanding by dispensing with the coordinate axes and the explicit coordinates and just thinking of the vector, as shown in the figure here. The two arrows shown here have the same length and same direction, so we think of them as the same vector.
A vector.
Whenever we use pictures in 𝐑2 or use the somewhat vague language of points and vectors, remember that these are just aids to our understanding, not substitutes for the actual mathematics that we will develop. Although we cannot draw good pictures in high-dimensional spaces, the elements of these spaces are as rigorously defined as elements of 𝐑2.
Mathematical models of the economy can have thousands of variables, say 𝑥1 , …, 𝑥5000 , which means that we must work in 𝐑5000 . Such a space cannot be dealt with geometrically. However, the algebraic approach works well. Thus our subject is called linear algebra.
For example, (2, −3, 17, 𝜋, √2) is an element of 𝐑5, and we may casually
refer to it as a point in 𝐑5 or a vector in 𝐑5 without worrying about whether the
geometry of 𝐑5 has any physical meaning.
Recall that we defined the sum of two elements of 𝐅𝑛 to be the element of 𝐅𝑛
obtained by adding corresponding coordinates; see 1.13. As we will now see,
addition has a simple geometric interpretation in the special case of 𝐑2.
Suppose we have two vectors 𝑢 and 𝑣 in 𝐑2 that we want to add. Move the vector 𝑣 parallel to itself so that its initial point coincides with the end point of the vector 𝑢, as shown here. The sum 𝑢 + 𝑣 then equals the vector whose initial point equals the initial point of 𝑢 and whose end point equals the end point of the vector 𝑣, as shown here.
The sum of two vectors.
In the next definition, the 0 on the right side of the displayed equation is the
list 0 ∈ 𝐅𝑛.
Scalar multiplication.
Digression on Fields
A field is a set containing at least two distinct elements called 0 and 1, along with
operations of addition and multiplication satisfying all properties listed in 1.3.
Thus 𝐑 and 𝐂 are fields, as is the set of rational numbers along with the usual
operations of addition and multiplication. Another example of a field is the set
{0, 1} with the usual operations of addition and multiplication except that 1 + 1 is
defined to equal 0.
In this book we will not deal with fields other than 𝐑 and 𝐂 . However, many
of the definitions, theorems, and proofs in linear algebra that work for the fields
𝐑 and 𝐂 also work without change for arbitrary fields. If you prefer to do so,
throughout much of this book (except for Chapters 6 and 7, which deal with inner
product spaces) you can think of 𝐅 as denoting an arbitrary field instead of 𝐑
or 𝐂 . For results (except in the inner product chapters) that have as a hypothesis
that 𝐅 is 𝐂 , you can probably replace that hypothesis with the hypothesis that 𝐅
is an algebraically closed field, which means that every nonconstant polynomial
with coefficients in 𝐅 has a zero. A few results, such as Exercise 13 in Section
1C, require the hypothesis on 𝐅 that 1 + 1 ≠ 0.
Exercises 1A
“Can you do addition?” the White Queen asked. “What’s one and one and one
and one and one and one and one and one and one and one?”
“I don’t know,” said Alice. “I lost count.”
1B Definition of Vector Space
Usually the choice of 𝐅 is either clear from the context or irrelevant. Thus we
often assume that 𝐅 is lurking in the background without specifically mentioning it.
With the usual operations of addition and scalar multiplication, 𝐅𝑛 is a vector space over 𝐅, as you should verify. The example of 𝐅𝑛 motivated our definition of vector space.
The simplest vector space is {0}, which contains only one point.
1.23 example: 𝐅∞
𝐅∞ denotes the set of all sequences of elements of 𝐅. Addition and scalar multiplication on 𝐅∞ are defined coordinatewise:
(𝑥1 , 𝑥2 , … ) + (𝑦1 , 𝑦2 , … ) = (𝑥1 + 𝑦1 , 𝑥2 + 𝑦2 , … ),
𝜆(𝑥1 , 𝑥2 , … ) = (𝜆𝑥1 , 𝜆𝑥2 , … ).
With these definitions, 𝐅∞ becomes a vector space over 𝐅, as you should verify.
The additive identity in this vector space is the sequence of all 0’s.
1.24 notation: 𝐅𝑆
Suppose 𝑆 is a set.
• 𝐅𝑆 denotes the set of functions from 𝑆 to 𝐅.
• For 𝑓, 𝑔 ∈ 𝐅𝑆, the sum 𝑓 + 𝑔 ∈ 𝐅𝑆 is the function defined by
( 𝑓 + 𝑔)(𝑥) = 𝑓 (𝑥) + 𝑔(𝑥)
for all 𝑥 ∈ 𝑆.
• For 𝜆 ∈ 𝐅 and 𝑓 ∈ 𝐅𝑆, the product 𝜆 𝑓 ∈ 𝐅𝑆 is the function defined by
(𝜆 𝑓 )(𝑥) = 𝜆 𝑓 (𝑥)
for all 𝑥 ∈ 𝑆.
You should verify all three bullet points in the next example.
• The additive identity of 𝐅𝑆 is the function 0 ∶ 𝑆 → 𝐅 defined by
0(𝑥) = 0
for all 𝑥 ∈ 𝑆.
• For 𝑓 ∈ 𝐅𝑆, the additive inverse of 𝑓 is the function − 𝑓 ∶ 𝑆 → 𝐅 defined by
(− 𝑓 )(𝑥) = − 𝑓 (𝑥)
for all 𝑥 ∈ 𝑆.
The vector space 𝐅𝑛 is a special case of the vector space 𝐅𝑆 because each (𝑥1 , …, 𝑥𝑛 ) ∈ 𝐅𝑛 can be thought of as a function 𝑥 from the set {1, 2, …, 𝑛} to 𝐅 by writing 𝑥(𝑘) instead of 𝑥𝑘 for the 𝑘 th coordinate of (𝑥1 , …, 𝑥𝑛 ). In other words, we can think of 𝐅𝑛 as 𝐅{1, 2, …, 𝑛}. Similarly, we can think of 𝐅∞ as 𝐅{1, 2, … }.
The elements of the vector space 𝐑[0, 1] are real-valued functions on [0, 1], not lists. In general, a vector space is an abstract entity whose elements might be lists, functions, or weird objects.
Soon we will see further examples of vector spaces, but first we need to develop
some of the elementary properties of vector spaces.
The definition of a vector space requires it to have an additive identity. The
next result states that this identity is unique.
Proof Suppose 0 and 0′ are both additive identities for some vector space 𝑉.
Then
0′ = 0′ + 0 = 0 + 0′ = 0,
where the first equality holds because 0 is an additive identity, the second equality
comes from commutativity, and the third equality holds because 0′ is an additive
identity. Thus 0′ = 0, proving that 𝑉 has only one additive identity.
Because additive inverses are unique, the following notation now makes sense.
Let 𝑣, 𝑤 ∈ 𝑉. Then
• −𝑣 denotes the additive inverse of 𝑣;
• 𝑤 − 𝑣 is defined to be 𝑤 + (−𝑣).
Almost all results in this book involve some vector space. To avoid having to
restate frequently that 𝑉 is a vector space, we now make the necessary declaration
once and for all.
1.29 notation: 𝑉
In the next result, 0 denotes a scalar (the number 0 ∈ 𝐅 ) on the left side of the
equation and a vector (the additive identity of 𝑉) on the right side of the equation.
Adding the additive inverse of 𝑎0 to both sides of the equation above gives 0 = 𝑎0,
as desired.
Now we show that if an element of 𝑉 is multiplied by the scalar −1, then the
result is the additive inverse of the element of 𝑉.
This equation says that (−1)𝑣, when added to 𝑣, gives 0. Thus (−1)𝑣 is the
additive inverse of 𝑣, as desired.
Exercises 1B
4 The empty set is not a vector space. The empty set fails to satisfy only one
of the requirements listed in the definition of a vector space (1.20). Which
one?
5 Show that in the definition of a vector space (1.20), the additive inverse
condition can be replaced with the condition that
0𝑣 = 0 for all 𝑣 ∈ 𝑉.
Here the 0 on the left side is the number 0, and the 0 on the right side is the
additive identity of 𝑉.
The phrase a “condition can be replaced” in a definition means that the
collection of objects satisfying the definition is unchanged if the original
condition is replaced with the new condition.
and
𝑡 + ∞ = ∞ + 𝑡 = ∞ + ∞ = ∞,
𝑡 + (−∞) = (−∞) + 𝑡 = (−∞) + (−∞) = −∞,
∞ + (−∞) = (−∞) + ∞ = 0.
for all 𝑢1 , 𝑣1 , 𝑢2 , 𝑣2 ∈ 𝑉.
• Complex scalar multiplication on 𝑉𝐂 is defined by
1C Subspaces
By considering subspaces, we can greatly expand our examples of vector spaces.
The next result gives the easiest way to check whether a subset of a vector space is a subspace.
Some people use the terminology linear subspace, which means the same as subspace.
The three conditions in the result above usually enable us to determine quickly
whether a given subset of 𝑉 is a subspace of 𝑉. You should verify all assertions
in the next example.
(a) If 𝑏 ∈ 𝐅, then
{(𝑥1 , 𝑥2 , 𝑥3 , 𝑥4 ) ∈ 𝐅4 ∶ 𝑥3 = 5𝑥4 + 𝑏}
is a subspace of 𝐅4 if and only if 𝑏 = 0.
(b) The set of continuous real-valued functions on the interval [0, 1] is a subspace
of 𝐑[0, 1].
(c) The set of differentiable real-valued functions on 𝐑 is a subspace of 𝐑𝐑.
(d) The set of differentiable real-valued functions 𝑓 on the interval (0, 3) such
that 𝑓 ′(2) = 𝑏 is a subspace of 𝐑(0, 3) if and only if 𝑏 = 0.
(e) The set of all sequences of complex numbers with limit 0 is a subspace of 𝐂∞.
Verifying some of the items above shows the linear structure underlying parts of calculus. For example, (b) above requires the result that the sum of two continuous functions is continuous. As another example, (d) above requires the result that for a constant 𝑐, the derivative of 𝑐 𝑓 equals 𝑐 times the derivative of 𝑓.
The set {0} is the smallest subspace of 𝑉, and 𝑉 itself is the largest subspace of 𝑉. The empty set is not a subspace of 𝑉 because a subspace must be a vector space and hence must contain at least one element, namely, an additive identity.
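As a quick computational companion to item (a) of the example above, the following Python sketch spot-checks closure under addition for the set {(𝑥1 , 𝑥2 , 𝑥3 , 𝑥4 ) ∈ 𝐅4 ∶ 𝑥3 = 5𝑥4 + 𝑏}, taking 𝐅 = 𝐑 and a few sample vectors. The vectors chosen here are only illustrative, and checking finitely many vectors proves nothing; the sketch merely shows where closure breaks down when 𝑏 ≠ 0.

```python
def in_set(x, b):
    # membership test for {(x1, x2, x3, x4) : x3 = 5*x4 + b}
    return x[2] == 5 * x[3] + b

def sum_stays_in_set(u, v, b):
    w = tuple(ui + vi for ui, vi in zip(u, v))
    return in_set(w, b)

# b = 0: the set is a subspace, and the sum of these two members stays in the set.
u, v = (1.0, 2.0, 5.0, 1.0), (0.0, 7.0, 10.0, 2.0)
print(in_set(u, 0), in_set(v, 0), sum_stays_in_set(u, v, 0))      # True True True

# b = 1: both vectors below are in the set, but their sum is not,
# so the set is not closed under addition and hence is not a subspace.
u1, v1 = (1.0, 2.0, 6.0, 1.0), (0.0, 7.0, 11.0, 2.0)
print(in_set(u1, 1), in_set(v1, 1), sum_stays_in_set(u1, v1, 1))  # True True False
```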
The subspaces of 𝐑2 are precisely {0}, all lines in 𝐑2 containing the origin,
and 𝐑2. The subspaces of 𝐑3 are precisely {0}, all lines in 𝐑3 containing the origin,
all planes in 𝐑3 containing the origin, and 𝐑3. To prove that all these objects are
indeed subspaces is straightforward—the hard part is to show that they are the
only subspaces of 𝐑2 and 𝐑3. That task will be easier after we introduce some
additional tools in the next chapter.
Sums of Subspaces
When dealing with vector spaces, we are usually interested only in subspaces, as opposed to arbitrary subsets. The notion of the sum of subspaces will be useful.
The union of subspaces is rarely a subspace (see Exercise 12), which is why we usually work with sums rather than unions.
Suppose 𝑉1 , …, 𝑉𝑚 are subspaces of 𝑉. The sum of 𝑉1 , …, 𝑉𝑚 , denoted by 𝑉1 + ⋯ + 𝑉𝑚 , is defined by
𝑉1 + ⋯ + 𝑉𝑚 = {𝑣1 + ⋯ + 𝑣𝑚 ∶ 𝑣1 ∈ 𝑉1 , …, 𝑣𝑚 ∈ 𝑉𝑚 }.
The next result states that the sum of subspaces is a subspace, and is in fact the
smallest subspace containing all the summands (which means that every subspace
containing all the summands also contains the sum).
Proof The reader can verify that 𝑉1 + ⋯ + 𝑉𝑚 contains the additive identity 0
and is closed under addition and scalar multiplication. Thus 1.34 implies that
𝑉1 + ⋯ + 𝑉𝑚 is a subspace of 𝑉.
The subspaces 𝑉1 , …, 𝑉𝑚 are all contained in 𝑉1 + ⋯ + 𝑉𝑚 (to see this, consider sums 𝑣1 + ⋯ + 𝑣𝑚 where all except one of the 𝑣𝑘 ’s are 0). Conversely, every subspace of 𝑉 containing 𝑉1 , …, 𝑉𝑚 contains 𝑉1 + ⋯ + 𝑉𝑚 (because subspaces must contain all finite sums of their elements). Thus 𝑉1 + ⋯ + 𝑉𝑚 is the smallest subspace of 𝑉 containing 𝑉1 , …, 𝑉𝑚 .
Sums of subspaces in the theory of vector spaces are analogous to unions of subsets in set theory. Given two subspaces of a vector space, the smallest subspace containing them is their sum. Analogously, given two subsets of a set, the smallest subset containing them is their union.
Direct Sums
Suppose 𝑉1 , …, 𝑉𝑚 are subspaces of 𝑉. Every element of 𝑉1 + ⋯ + 𝑉𝑚 can be
written in the form
𝑣1 + ⋯ + 𝑣𝑚 ,
where each 𝑣𝑘 ∈ 𝑉𝑘 . Of special interest are cases in which each vector in
𝑉1 + ⋯ + 𝑉𝑚 can be represented in the form above in only one way. This situation
is so important that it gets a special name (direct sum) and a special symbol (⊕).
𝑉2 = {(0, 0, 𝑧) ∈ 𝐅3 ∶ 𝑧 ∈ 𝐅},
𝑉3 = {(0, 𝑦, 𝑦) ∈ 𝐅3 ∶ 𝑦 ∈ 𝐅}.
where the first vector on the right side is in 𝑉1 , the second vector is in 𝑉2 , and the
third vector is in 𝑉3 .
However, 𝐅3 does not equal the direct sum of 𝑉1 , 𝑉2 , 𝑉3 , because the vector
(0, 0, 0) can be written in more than one way as a sum 𝑣1 + 𝑣2 + 𝑣3 , with each
𝑣𝑘 ∈ 𝑉𝑘 . Specifically, we have
and, of course,
(0, 0, 0) = (0, 0, 0) + (0, 0, 0) + (0, 0, 0),
where the first vector on the right side of each equation above is in 𝑉1 , the second
vector is in 𝑉2 , and the third vector is in 𝑉3 . Thus the sum 𝑉1 + 𝑉2 + 𝑉3 is not a
direct sum.
The definition of direct sum requires every vector in the sum to have a unique representation as an appropriate sum. The next result shows that when deciding whether a sum of subspaces is a direct sum, we only need to consider whether 0 can be uniquely written as an appropriate sum.
The symbol ⊕, which is a plus sign inside a circle, reminds us that we are dealing with a special type of sum of subspaces—each element in the direct sum can be represented in only one way as a sum of elements from the specified subspaces.
The next result gives a simple condition for testing whether a sum of two subspaces is a direct sum.
The symbol ⟺ used below means “if and only if ”; this symbol could also be read to mean “is equivalent to”.
The result above deals only with the case of two subspaces. When asking about a possible direct sum with more than two subspaces, it is not enough to test that each pair of the subspaces intersect only at 0. To see this, consider Example 1.44. In that nonexample of a direct sum, we have 𝑉1 ∩ 𝑉2 = 𝑉1 ∩ 𝑉3 = 𝑉2 ∩ 𝑉3 = {0}.
Sums of subspaces are analogous to unions of subsets. Similarly, direct sums of subspaces are analogous to disjoint unions of subsets. No two subspaces of a vector space can be disjoint, because both contain 0. So disjointness is replaced, at least in the case of two subspaces, with the requirement that the intersection equal {0}.
Exercises 1C
14 Suppose
𝑈 = {(𝑥, −𝑥, 2𝑥) ∈ 𝐅3 ∶ 𝑥 ∈ 𝐅} and 𝑊 = {(𝑥, 𝑥, 2𝑥) ∈ 𝐅3 ∶ 𝑥 ∈ 𝐅}.
Describe 𝑈 + 𝑊 using symbols, and also give a description of 𝑈 + 𝑊 that
uses no symbols.
15 Suppose 𝑈 is a subspace of 𝑉. What is 𝑈 + 𝑈?
16 Is the operation of addition on the subspaces of 𝑉 commutative? In other
words, if 𝑈 and 𝑊 are subspaces of 𝑉, is 𝑈 + 𝑊 = 𝑊 + 𝑈?
17 Is the operation of addition on the subspaces of 𝑉 associative? In other
words, if 𝑉1 , 𝑉2 , 𝑉3 are subspaces of 𝑉, is
(𝑉1 + 𝑉2 ) + 𝑉3 = 𝑉1 + (𝑉2 + 𝑉3 )?
then 𝑉1 = 𝑉2 .
20 Suppose
𝑈 = {(𝑥, 𝑥, 𝑦, 𝑦) ∈ 𝐅4 ∶ 𝑥, 𝑦 ∈ 𝐅}.
Find a subspace 𝑊 of 𝐅4 such that 𝐅4 = 𝑈 ⊕ 𝑊.
21 Suppose
𝑈 = {(𝑥, 𝑦, 𝑥 + 𝑦, 𝑥 − 𝑦, 2𝑥) ∈ 𝐅5 ∶ 𝑥, 𝑦 ∈ 𝐅}.
Find a subspace 𝑊 of 𝐅5 such that 𝐅5 = 𝑈 ⊕ 𝑊.
22 Suppose
𝑈 = {(𝑥, 𝑦, 𝑥 + 𝑦, 𝑥 − 𝑦, 2𝑥) ∈ 𝐅5 ∶ 𝑥, 𝑦 ∈ 𝐅}.
Find three subspaces 𝑊1 , 𝑊2 , 𝑊3 of 𝐅5, none of which equals {0}, such that
𝐅5 = 𝑈 ⊕ 𝑊1 ⊕ 𝑊2 ⊕ 𝑊3 .
𝑉 = 𝑉1 ⊕ 𝑈 and 𝑉 = 𝑉2 ⊕ 𝑈,
then 𝑉1 = 𝑉2 .
Hint: When trying to discover whether a conjecture in linear algebra is true
or false, it is often useful to start by experimenting in 𝐅2.
𝑓 (−𝑥) = 𝑓 (𝑥)
𝑓 (−𝑥) = − 𝑓 (𝑥)
Chapter 2
Finite-Dimensional Vector Spaces
In the last chapter we learned about vector spaces. Linear algebra focuses not
on arbitrary vector spaces, but on finite-dimensional vector spaces, which we
introduce in this chapter.
We begin this chapter by considering linear combinations of lists of vectors.
This leads us to the crucial concept of linear independence. The linear dependence
lemma will become one of our most useful tools.
A list of vectors in a vector space that is small enough to be linearly independent
and big enough so the linear combinations of the list fill up the vector space is
called a basis of the vector space. We will see that every basis of a vector space
has the same length, which will allow us to define the dimension of a vector space.
This chapter ends with a formula for the dimension of the sum of two subspaces.
• 𝐅 denotes 𝐑 or 𝐂 .
• 𝑉 denotes a vector space over 𝐅.
The main building of the Institute for Advanced Study, in Princeton, New Jersey.
Paul Halmos (1916–2006) wrote the first modern linear algebra book in this building.
Halmos’s linear algebra book was published in 1942 (second edition published in 1958).
The title of Halmos’s book was the same as the title of this chapter.
A linear combination of a list 𝑣1 , …, 𝑣𝑚 of vectors in 𝑉 is a vector of the form
𝑎1 𝑣1 + ⋯ + 𝑎𝑚 𝑣𝑚 ,
where 𝑎1 , …, 𝑎𝑚 ∈ 𝐅.
• (17, −4, 2) is a linear combination of (2, 1, −3), (1, −2, 4), which is a list of
length two of vectors in 𝐑3, because
(17, −4, 2) = 6(2, 1, −3) + 5(1, −2, 4).
• (17, −4, 5) is not a linear combination of (2, 1, −3), (1, −2, 4), which is a list
of length two of vectors in 𝐑3, because there do not exist numbers 𝑎1 , 𝑎2 ∈ 𝐅
such that
(17, −4, 5) = 𝑎1 (2, 1, −3) + 𝑎2 (1, −2, 4).
In other words, the system of equations
17 = 2𝑎1 + 𝑎2
−4 = 𝑎1 − 2𝑎2
5 = −3𝑎1 + 4𝑎2
has no solutions, as you can verify.
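A hedged computational check of the two bullet points above: using numpy (an assumption of this sketch, not something the text relies on), we can test whether a vector lies in the span of (2, 1, −3) and (1, −2, 4) by solving the corresponding least-squares problem and checking whether the residual vanishes.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, -2.0],
              [-3.0, 4.0]])   # columns are the vectors (2, 1, -3) and (1, -2, 4)

for target in [(17.0, -4.0, 2.0), (17.0, -4.0, 5.0)]:
    b = np.array(target)
    coeffs, _, _, _ = np.linalg.lstsq(A, b, rcond=None)
    in_span = np.allclose(A @ coeffs, b)
    print(target, "in span:", in_span, "best coefficients:", np.round(coeffs, 6))

# (17, -4, 2) is in the span, with coefficients 6 and 5;
# (17, -4, 5) is not, matching the discussion above.
```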
Some mathematicians use the term linear span, which means the same as
span.
The list
(1, 0, …, 0), (0, 1, 0, …, 0), …, (0, …, 0, 1)
spans 𝐅𝑛. Here the 𝑘 th vector in the list above has 1 in the 𝑘 th slot and 0 in all other
slots.
Suppose (𝑥1 , …, 𝑥𝑛 ) ∈ 𝐅𝑛. Then
(𝑥1 , …, 𝑥𝑛 ) = 𝑥1 (1, 0, …, 0) + 𝑥2 (0, 1, 0, …, 0) + ⋯ + 𝑥𝑛 (0, …, 0, 1),
which shows that (𝑥1 , …, 𝑥𝑛 ) is in the span of the list displayed above.
Example 2.8 above shows that 𝐅𝑛 is a finite-dimensional vector space for every positive integer 𝑛.
Recall that by definition every list has finite length.
The definition of a polynomial is no doubt already familiar to you.
𝑝(𝑧) = 𝑎0 + 𝑎1 𝑧 + 𝑎2 𝑧2 + ⋯ + 𝑎𝑚 𝑧𝑚
for all 𝑧 ∈ 𝐅.
• 𝒫(𝐅) is the set of all polynomials with coefficients in 𝐅.
𝑝(𝑧) = 𝑎0 + 𝑎1 𝑧 + ⋯ + 𝑎𝑚 𝑧𝑚.
In the next definition, we use the convention that −∞ < 𝑚, which means that
the polynomial 0 is in 𝒫𝑚 (𝐅).
For 𝑚 a nonnegative integer, 𝒫𝑚 (𝐅) denotes the set of all polynomials with
coefficients in 𝐅 and degree at most 𝑚.
Linear Independence
Suppose 𝑣1 , …, 𝑣𝑚 ∈ 𝑉 and 𝑣 ∈ span(𝑣1 , …, 𝑣𝑚 ). By the definition of span, there
exist 𝑎1 , …, 𝑎𝑚 ∈ 𝐅 such that
𝑣 = 𝑎1 𝑣1 + ⋯ + 𝑎𝑚 𝑣𝑚 .
Consider the question of whether the choice of scalars in the equation above is
unique. Suppose 𝑐1 , …, 𝑐𝑚 is another set of scalars such that
𝑣 = 𝑐1 𝑣1 + ⋯ + 𝑐𝑚 𝑣𝑚 .
Subtracting the last two equations, we have
0 = (𝑎1 − 𝑐1 )𝑣1 + ⋯ + (𝑎𝑚 − 𝑐𝑚 )𝑣𝑚 .
• A list 𝑣1 , …, 𝑣𝑚 of vectors in 𝑉 is called linearly independent if the only choice of 𝑎1 , …, 𝑎𝑚 ∈ 𝐅 that makes
𝑎1 𝑣1 + ⋯ + 𝑎𝑚 𝑣𝑚 = 0
is 𝑎1 = ⋯ = 𝑎𝑚 = 0.
• The empty list ( ) is also declared to be linearly independent.
If some vectors are removed from a linearly independent list, the remaining
list is also linearly independent, as you should verify.
The linear dependence lemma implies that there exists 𝑘 ∈ {1, 2, 3, 4} such that the 𝑘 th
vector in this list is a linear combination of the previous vectors in the list. Let’s
see how to find the smallest value of 𝑘 that works.
Taking 𝑘 = 1 in the linear dependence lemma works if and only if the first
vector in the list equals 0. Because (1, 2, 3) is not the 0 vector, we cannot take
𝑘 = 1 for this list.
Taking 𝑘 = 2 in the linear dependence lemma works if and only if the second
vector in the list is a scalar multiple of the first vector. However, there does not
exist 𝑐 ∈ 𝐑 such that (6, 5, 4) = 𝑐(1, 2, 3). Thus we cannot take 𝑘 = 2 for this list.
Taking 𝑘 = 3 in the linear dependence lemma works if and only if the third
vector in the list is a linear combination of the first two vectors. Thus for the list
in this example, we want to know whether there exist 𝑎, 𝑏 ∈ 𝐑 such that
(15, 16, 17) = 𝑎(1, 2, 3) + 𝑏(6, 5, 4).
The equation above is equivalent to a system of three linear equations in the two
unknowns 𝑎, 𝑏. Using Gaussian elimination or appropriate software, we find that
𝑎 = 3, 𝑏 = 2 is a solution of the equation above, as you can verify. Thus for the
list in this example, taking 𝑘 = 3 is the smallest value of 𝑘 that works in the linear
dependence lemma.
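For this particular list, the Gaussian elimination step mentioned above can be reproduced with a few lines of numpy (a sketch on this one example; the library call is an assumption, not something the text depends on).

```python
import numpy as np

# Solve (15, 16, 17) = a*(1, 2, 3) + b*(6, 5, 4) for a and b.
A = np.column_stack([(1.0, 2.0, 3.0), (6.0, 5.0, 4.0)])
y = np.array([15.0, 16.0, 17.0])

coeffs, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(coeffs, 6))          # [3. 2.]
print(np.allclose(A @ coeffs, y))   # True, so a = 3, b = 2 works
```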
Step 1
Let 𝐵 be the list 𝑤1 , …, 𝑤𝑛 , which spans 𝑉. Adjoining 𝑢1 at the beginning of
this list produces a linearly dependent list (because 𝑢1 can be written as a linear
combination of 𝑤1 , …, 𝑤𝑛 ). In other words, the list
𝑢1 , 𝑤1 , …, 𝑤𝑛
is linearly dependent.
Thus by the linear dependence lemma (2.19), one of the vectors in the list above
is a linear combination of the previous vectors in the list. We know that 𝑢1 ≠ 0
because the list 𝑢1 , …, 𝑢𝑚 is linearly independent. Thus 𝑢1 is not in the span
of the previous vectors in the list above (because 𝑢1 is not in {0}, which is the
span of the empty list). Hence the linear dependence lemma implies that we
can remove one of the 𝑤’s so that the new list 𝐵 (of length 𝑛) consisting of 𝑢1
and the remaining 𝑤’s spans 𝑉.
Step k, for k = 2, …, m
The list 𝐵 (of length 𝑛) from step 𝑘 − 1 spans 𝑉. In particular, 𝑢𝑘 is in the span of
the list 𝐵. Thus the list of length (𝑛 + 1) obtained by adjoining 𝑢𝑘 to 𝐵, placing
it just after 𝑢1 , …, 𝑢𝑘 − 1 , is linearly dependent. By the linear dependence lemma
(2.19), one of the vectors in this list is in the span of the previous ones, and
because 𝑢1 , …, 𝑢𝑘 is linearly independent, this vector cannot be one of the 𝑢’s.
Hence there still must be at least one remaining 𝑤 at this step. We can remove
from our new list (after adjoining 𝑢𝑘 in the proper place) a 𝑤 that is a linear
combination of the previous vectors in the list, so that the new list 𝐵 (of length
𝑛) consisting of 𝑢1 , …, 𝑢𝑘 and the remaining 𝑤’s spans 𝑉.
After step 𝑚, we have added all the 𝑢’s and the process stops. At each step
as we add a 𝑢 to 𝐵, the linear dependence lemma implies that there is some 𝑤 to
remove. Thus there are at least as many 𝑤’s as 𝑢’s.
The next two examples show how the result above can be used to show, without
any computations, that certain lists are not linearly independent and that certain
lists do not span a given vector space.
Step 1
If 𝑈 = {0}, then 𝑈 is finite-dimensional and we are done. If 𝑈 ≠ {0}, then
choose a nonzero vector 𝑢1 ∈ 𝑈.
Step k
If 𝑈 = span(𝑢1 , …, 𝑢𝑘 − 1 ), then 𝑈 is finite-dimensional and we are done. If
𝑈 ≠ span(𝑢1 , …, 𝑢𝑘 − 1 ), then choose a vector 𝑢𝑘 ∈ 𝑈 such that
𝑢𝑘 ∉ span(𝑢1 , …, 𝑢𝑘 − 1 ).
After each step, as long as the process continues, we have constructed a list
of vectors such that no vector in this list is in the span of the previous vectors.
Thus after each step we have constructed a linearly independent list, by the linear
dependence lemma (2.19). This linearly independent list cannot be longer than
any spanning list of 𝑉 (by 2.22). Thus the process eventually terminates, which
means that 𝑈 is finite-dimensional.
Exercises 2A
𝑣1 − 𝑣2 , 𝑣2 − 𝑣3 , 𝑣3 − 𝑣4 , 𝑣4
also spans 𝑉.
3 Suppose 𝑣1 , …, 𝑣𝑚 is a list of vectors in 𝑉. For 𝑘 ∈ {1, …, 𝑚}, let
𝑤𝑘 = 𝑣1 + ⋯ + 𝑣𝑘 .
𝑣1 − 𝑣2 , 𝑣2 − 𝑣3 , 𝑣3 − 𝑣4 , 𝑣4
5𝑣1 − 4𝑣2 , 𝑣2 , 𝑣3 , …, 𝑣𝑚
is linearly independent.
𝑤𝑘 = 𝑣1 + ⋯ + 𝑣𝑘 .
Show that the list 𝑣1 , …, 𝑣𝑚 is linearly independent if and only if the list
𝑤1 , …, 𝑤𝑚 is linearly independent.
15 Explain why there does not exist a list of six polynomials that is linearly
independent in 𝒫4 (𝐅).
16 Explain why no list of four polynomials spans 𝒫4 (𝐅).
17 Prove that 𝑉 is infinite-dimensional if and only if there is a sequence 𝑣1 , 𝑣2 , …
of vectors in 𝑉 such that 𝑣1 , …, 𝑣𝑚 is linearly independent for every positive
integer 𝑚.
18 Prove that 𝐅∞ is infinite-dimensional.
19 Prove that the real vector space of all continuous real-valued functions on
the interval [0, 1] is infinite-dimensional.
20 Suppose 𝑝0 , 𝑝1 , …, 𝑝𝑚 are polynomials in 𝒫𝑚 (𝐅) such that 𝑝𝑘 (2) = 0 for each
𝑘 ∈ {0, …, 𝑚}. Prove that 𝑝0 , 𝑝1 , …, 𝑝𝑚 is not linearly independent in 𝒫𝑚 (𝐅).
2B Bases
In the previous section, we discussed linearly independent lists and we also
discussed spanning lists. Now we bring these concepts together by considering
lists that have both properties.
(g) The list 1, 𝑧, …, 𝑧𝑚 is a basis of 𝒫𝑚 (𝐅), called the standard basis of 𝒫𝑚 (𝐅).
In addition to the standard basis, 𝐅𝑛 has many other bases. For example,
(7, 5), (−4, 9) and (1, 2), (3, 5)
are both bases of 𝐅2.
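A short numerical check, assuming 𝐅 = 𝐑 and using numpy, that each of the two lists just mentioned really is a basis of 𝐅2: a list of two vectors in 𝐑2 is a basis exactly when the 2-by-2 matrix whose columns are those vectors has rank 2.

```python
import numpy as np

for pair in [((7.0, 5.0), (-4.0, 9.0)), ((1.0, 2.0), (3.0, 5.0))]:
    M = np.column_stack(pair)               # 2x2 matrix with the two vectors as columns
    print(pair, "is a basis of R^2:", np.linalg.matrix_rank(M) == 2)
# Both lines print True.
```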
The next result helps explain why bases are useful. Recall that “uniquely”
means “in only one way”.
A list 𝑣1 , …, 𝑣𝑛 of vectors in 𝑉 is a basis of 𝑉 if and only if every 𝑣 ∈ 𝑉 can be written uniquely in the form
2.29 𝑣 = 𝑎1 𝑣1 + ⋯ + 𝑎𝑛 𝑣𝑛 ,
where 𝑎1 , …, 𝑎𝑛 ∈ 𝐅.
A spanning list in a vector space may not be a basis because it is not linearly
independent. Our next result says that given any spanning list, some (possibly
none) of the vectors in it can be discarded so that the remaining list is linearly
independent and still spans the vector space.
As an example in the vector space 𝐅2, if the procedure in the proof below is
applied to the list (1, 2), (3, 6), (4, 7), (5, 9), then the second and fourth vectors
will be removed. This leaves (1, 2), (4, 7), which is a basis of 𝐅2.
Every spanning list in a vector space can be reduced to a basis of the vector
space.
Stop the process after step 𝑛, getting a list 𝐵. This list 𝐵 spans 𝑉 because
our original list spanned 𝑉 and we have discarded only vectors that were already
in the span of the previous vectors. The process ensures that no vector in 𝐵 is
in the span of the previous ones. Thus 𝐵 is linearly independent, by the linear
dependence lemma (2.19). Hence 𝐵 is a basis of 𝑉.
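The procedure in the proof above is easy to mechanize. The sketch below (assuming 𝐅 = 𝐑 and using numpy rank computations as a stand-in for the span test) walks through a spanning list and discards each vector that is already in the span of the vectors kept so far; applied to the list (1, 2), (3, 6), (4, 7), (5, 9) mentioned before the theorem, it keeps (1, 2), (4, 7).

```python
import numpy as np

def reduce_to_basis(vectors):
    kept = []
    for v in vectors:
        candidate = kept + [v]
        M = np.column_stack(candidate)
        # keep v exactly when it is not in the span of the vectors kept so far,
        # i.e. when adjoining v keeps the columns linearly independent
        if np.linalg.matrix_rank(M) == len(candidate):
            kept.append(v)
    return kept

spanning_list = [(1.0, 2.0), (3.0, 6.0), (4.0, 7.0), (5.0, 9.0)]
print(reduce_to_basis(spanning_list))   # [(1.0, 2.0), (4.0, 7.0)]
```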
Our next result is in some sense a dual of 2.30, which said that every spanning
list can be reduced to a basis. Now we show that given any linearly independent list,
we can adjoin some additional vectors (this includes the possibility of adjoining
no additional vectors) so that the extended list is still linearly independent but
also spans the space.
As an application of the result above, we now show that every subspace of a finite-dimensional vector space can be paired with another subspace to form a direct sum of the whole space.
Using the same ideas but more advanced tools, the next result can be proved without the hypothesis that 𝑉 is finite-dimensional.
To prove the first equation above, suppose 𝑣 ∈ 𝑉. Then, because the list
𝑢1 , …, 𝑢𝑚 , 𝑤1 , …, 𝑤𝑛 spans 𝑉, there exist 𝑎1 , …, 𝑎𝑚 , 𝑏1 , …, 𝑏𝑛 ∈ 𝐅 such that
𝑣 = (𝑎1 𝑢1 + ⋯ + 𝑎𝑚 𝑢𝑚 ) + (𝑏1 𝑤1 + ⋯ + 𝑏𝑛 𝑤𝑛 ).
The first sum in parentheses above is a vector in 𝑈 and the second is a vector in 𝑊, so 𝑣 ∈ 𝑈 + 𝑊. Thus 𝑉 = 𝑈 + 𝑊.
To show that 𝑈 ∩ 𝑊 = {0}, suppose 𝑣 ∈ 𝑈 ∩ 𝑊. Then there exist scalars 𝑎1 , …, 𝑎𝑚 , 𝑏1 , …, 𝑏𝑛 ∈ 𝐅 such that
𝑣 = 𝑎1 𝑢1 + ⋯ + 𝑎𝑚 𝑢𝑚 = 𝑏1 𝑤1 + ⋯ + 𝑏𝑛 𝑤𝑛 .
Thus
𝑎1 𝑢1 + ⋯ + 𝑎𝑚 𝑢𝑚 − 𝑏1 𝑤1 − ⋯ − 𝑏𝑛 𝑤𝑛 = 0.
Because 𝑢1 , …, 𝑢𝑚 , 𝑤1 , …, 𝑤𝑛 is linearly independent, this implies that
𝑎1 = ⋯ = 𝑎𝑚 = 𝑏1 = ⋯ = 𝑏𝑛 = 0.
Thus 𝑣 = 0, which shows that 𝑈 ∩ 𝑊 = {0}. Hence 𝑉 = 𝑈 ⊕ 𝑊, as desired.
Exercises 2B
Find a basis of 𝑈.
(b) Extend the basis in (a) to a basis of 𝐑5.
(c) Find a subspace 𝑊 of 𝐑5 such that 𝐑5 = 𝑈 ⊕ 𝑊.
Find a basis of 𝑈.
(b) Extend the basis in (a) to a basis of 𝐂5.
(c) Find a subspace 𝑊 of 𝐂5 such that 𝐂5 = 𝑈 ⊕ 𝑊.
5 Suppose 𝑉 is finite-dimensional and 𝑈, 𝑊 are subspaces of 𝑉 such that
𝑉 = 𝑈 + 𝑊. Prove that there exists a basis of 𝑉 consisting of vectors in
𝑈 ∪ 𝑊.
𝑣1 + 𝑣2 , 𝑣2 + 𝑣3 , 𝑣3 + 𝑣4 , 𝑣4
is also a basis of 𝑉.
8 Prove or give a counterexample: If 𝑣1 , 𝑣2 , 𝑣3 , 𝑣4 is a basis of 𝑉 and 𝑈 is a
subspace of 𝑉 such that 𝑣1 , 𝑣2 ∈ 𝑈 and 𝑣3 ∉ 𝑈 and 𝑣4 ∉ 𝑈, then 𝑣1 , 𝑣2 is a
basis of 𝑈.
9 Suppose 𝑣1 , …, 𝑣𝑚 is a list of vectors in 𝑉. For 𝑘 ∈ {1, …, 𝑚}, let
𝑤𝑘 = 𝑣1 + ⋯ + 𝑣𝑘 .
𝑢1 , …, 𝑢𝑚 , 𝑤1 , …, 𝑤𝑛
is a basis of 𝑉.
11 Suppose 𝑉 is a real vector space. Show that if 𝑣1 , …, 𝑣𝑛 is a basis of 𝑉 (as a
real vector space), then 𝑣1 , …, 𝑣𝑛 is also a basis of the complexification 𝑉𝐂
(as a complex vector space).
See Exercise 8 in Section 1B for the definition of the complexification 𝑉𝐂 .
2C Dimension
Although we have been discussing finite-dimensional vector spaces, we have not
yet defined the dimension of such an object. How should dimension be defined?
A reasonable definition should force the dimension of 𝐅𝑛 to equal 𝑛. Notice that
the standard basis
of 𝐅𝑛 has length 𝑛. Thus we are tempted to define the dimension as the length of
a basis. However, a finite-dimensional vector space in general has many different
bases, and our attempted definition makes sense only if all bases in a given vector
space have the same length. Fortunately that turns out to be the case, as we now
show.
Any two bases of a finite-dimensional vector space have the same length.
Now that we know that any two bases of a finite-dimensional vector space
have the same length, we can formally define the dimension of such spaces.
To check that a list of vectors in 𝑉 is a basis of 𝑉, we must, according to the definition, show that the list in question satisfies two properties: it must be linearly independent and it must span 𝑉. The next two results show that if the list in question has the right length, then we only need to check that it satisfies one of the two required properties. First we prove that every linearly independent list of the right length is a basis.
The real vector space 𝐑2 has dimension two; the complex vector space 𝐂 has dimension one. As sets, 𝐑2 can be identified with 𝐂 (and addition is the same on both spaces, as is scalar multiplication by real numbers). Thus when we talk about the dimension of a vector space, the role played by the choice of 𝐅 cannot be neglected.
To find a basis of 𝑈, first note that each of the polynomials 1, (𝑥 − 5)2, and (𝑥 − 5)3
is in 𝑈.
Suppose 𝑎, 𝑏, 𝑐 ∈ 𝐑 and
𝑎 + 𝑏(𝑥 − 5)2 + 𝑐(𝑥 − 5)3 = 0
for every 𝑥 ∈ 𝐑 . Without explicitly expanding the left side of the equation above,
we can see that the left side has a 𝑐𝑥3 term. Because the right side has no 𝑥3
term, this implies that 𝑐 = 0. Because 𝑐 = 0, we see that the left side has a 𝑏𝑥2
term, which implies that 𝑏 = 0. Because 𝑏 = 𝑐 = 0, we can also conclude that
𝑎 = 0. Thus the equation above implies that 𝑎 = 𝑏 = 𝑐 = 0. Hence the list
1, (𝑥 − 5)2, (𝑥 − 5)3 is linearly independent in 𝑈. Thus 3 ≤ dim 𝑈.
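The linear independence argument above can be cross-checked computationally: expand the three polynomials into coefficient vectors and confirm that the resulting coefficient matrix has rank 3. The sketch below uses numpy's polynomial utilities, an assumption of this illustration rather than anything used in the text.

```python
import numpy as np
from numpy.polynomial import polynomial as P

one = np.array([1.0])               # coefficients of the constant polynomial 1
sq = P.polypow([-5.0, 1.0], 2)      # coefficients of (x - 5)^2
cube = P.polypow([-5.0, 1.0], 3)    # coefficients of (x - 5)^3

M = np.zeros((3, 4))                # each row holds coefficients up to degree 3
for row, coeffs in enumerate([one, sq, cube]):
    M[row, :len(coeffs)] = coeffs

print(np.linalg.matrix_rank(M))     # 3, so 1, (x - 5)^2, (x - 5)^3 are linearly independent
```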
The next result gives a formula for the dimension of the sum of two subspaces
of a finite-dimensional vector space. This formula is analogous to a familiar
counting formula: the number of elements in the union of two finite sets equals
the number of elements in the first set, plus the number of elements in the second
set, minus the number of elements in the intersection of the two sets.
For 𝑆 a finite set, let #𝑆 denote the number of elements of 𝑆. The table below
compares finite sets with finite-dimensional vector spaces, showing the analogy
between #𝑆 (for sets) and dim 𝑉 (for vector spaces), as well as the analogy between
unions of subsets (in the context of sets) and sums of subspaces (in the context of
vector spaces).
sets | vector spaces
𝑆 is a finite set | 𝑉 is a finite-dimensional vector space
#𝑆 | dim 𝑉
for subsets 𝑆1 , 𝑆2 of 𝑆, the union 𝑆1 ∪ 𝑆2 is the smallest subset of 𝑆 containing 𝑆1 and 𝑆2 | for subspaces 𝑉1 , 𝑉2 of 𝑉, the sum 𝑉1 + 𝑉2 is the smallest subspace of 𝑉 containing 𝑉1 and 𝑉2
#(𝑆1 ∪ 𝑆2 ) = #𝑆1 + #𝑆2 − #(𝑆1 ∩ 𝑆2 ) | dim(𝑉1 + 𝑉2 ) = dim 𝑉1 + dim 𝑉2 − dim(𝑉1 ∩ 𝑉2 )
#(𝑆1 ∪ 𝑆2 ) = #𝑆1 + #𝑆2 ⟺ 𝑆1 ∩ 𝑆2 = ∅ | dim(𝑉1 + 𝑉2 ) = dim 𝑉1 + dim 𝑉2 ⟺ 𝑉1 ∩ 𝑉2 = {0}
𝑆1 ∪ ⋯ ∪ 𝑆𝑚 is a disjoint union ⟺ #(𝑆1 ∪ ⋯ ∪ 𝑆𝑚 ) = #𝑆1 + ⋯ + #𝑆𝑚 | 𝑉1 + ⋯ + 𝑉𝑚 is a direct sum ⟺ dim(𝑉1 + ⋯ + 𝑉𝑚 ) = dim 𝑉1 + ⋯ + dim 𝑉𝑚
The last row above focuses on the analogy between disjoint unions (for sets)
and direct sums (for vector spaces). The proof of the result in the last box above
will be given in 3.94.
You should be able to find results about sets that correspond, via analogy, to
the results about vector spaces in Exercises 12 through 18.
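Here is a small numerical illustration of the formula dim(𝑉1 + 𝑉2 ) = dim 𝑉1 + dim 𝑉2 − dim(𝑉1 ∩ 𝑉2 ), assuming 𝐅 = 𝐑 and using numpy: the dimension of a sum of subspaces is the rank of the combined spanning vectors, and the dimension of the intersection can then be recovered from the formula. The particular subspaces of 𝐑4 below are chosen only for illustration.

```python
import numpy as np

# V1 = span(e1, e2) and V2 = span(e2, e3) as subspaces of R^4 (columns are spanning vectors).
B1 = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]])
B2 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])

dim_V1 = np.linalg.matrix_rank(B1)                     # 2
dim_V2 = np.linalg.matrix_rank(B2)                     # 2
dim_sum = np.linalg.matrix_rank(np.hstack([B1, B2]))   # 3, since V1 + V2 = span(e1, e2, e3)

print(dim_V1, dim_V2, dim_sum)
print("dim of intersection:", dim_V1 + dim_V2 - dim_sum)   # 1, namely span(e2)
```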
Exercises 2C
1 Show that the subspaces of 𝐑2 are precisely {0}, all lines in 𝐑2 containing
the origin, and 𝐑2.
2 Show that the subspaces of 𝐑3 are precisely {0}, all lines in 𝐑3 containing
the origin, all planes in 𝐑3 containing the origin, and 𝐑3.
3 (a) Let 𝑈 = {𝑝 ∈ 𝒫4 (𝐅) ∶ 𝑝(6) = 0}. Find a basis of 𝑈.
(b) Extend the basis in (a) to a basis of 𝒫4 (𝐅).
(c) Find a subspace 𝑊 of 𝒫4 (𝐅) such that 𝒫4 (𝐅) = 𝑈 ⊕ 𝑊.
4 (a) Let 𝑈 = {𝑝 ∈ 𝒫4 (𝐑) ∶ 𝑝″(6) = 0}. Find a basis of 𝑈.
(b) Extend the basis in (a) to a basis of 𝒫4 (𝐑).
(c) Find a subspace 𝑊 of 𝒫4 (𝐑) such that 𝒫4 (𝐑) = 𝑈 ⊕ 𝑊.
5 (a) Let 𝑈 = {𝑝 ∈ 𝒫4 (𝐅) ∶ 𝑝(2) = 𝑝(5)}. Find a basis of 𝑈.
(b) Extend the basis in (a) to a basis of 𝒫4 (𝐅).
(c) Find a subspace 𝑊 of 𝒫4 (𝐅) such that 𝒫4 (𝐅) = 𝑈 ⊕ 𝑊.
𝑉 = 𝑉1 ⊕ ⋯ ⊕ 𝑉𝑛 .
19 Explain why you might guess, motivated by analogy with the formula for
the number of elements in the union of three finite sets, that if 𝑉1 , 𝑉2 , 𝑉3 are
subspaces of a finite-dimensional vector space, then
dim(𝑉1 + 𝑉2 + 𝑉3 )
= dim 𝑉1 + dim 𝑉2 + dim 𝑉3
− dim(𝑉1 ∩ 𝑉2 ) − dim(𝑉1 ∩ 𝑉3 ) − dim(𝑉2 ∩ 𝑉3 )
+ dim(𝑉1 ∩ 𝑉2 ∩ 𝑉3 ).
dim(𝑉1 + 𝑉2 + 𝑉3 )
I at once gave up my former occupations, set down natural history and all its
progeny as a deformed and abortive creation, and entertained the greatest disdain
for a would-be science which could never even step within the threshold of real
knowledge. In this mood I betook myself to the mathematics and the branches of
study appertaining to that science as being built upon secure foundations, and so
worthy of my consideration.
Chapter 3
Linear Maps
So far our attention has focused on vector spaces. No one gets excited about
vector spaces. The interesting part of linear algebra is the subject to which we
now turn—linear maps.
We will frequently use the powerful fundamental theorem of linear maps,
which states that the dimension of the domain of a linear map equals the dimension
of the subspace that gets sent to 0 plus the dimension of the range. This will imply
the striking result that a linear map from a finite-dimensional vector space to itself
is one-to-one if and only if its range is the whole space.
A major concept that we will introduce in this chapter is the matrix associated
with a linear map and with a basis of the domain space and a basis of the target
space. This correspondence between linear maps and matrices provides much
insight into key aspects of linear algebra.
This chapter concludes by introducing product, quotient, and dual spaces.
In this chapter we will need additional vector spaces, which we call 𝑈 and 𝑊,
in addition to 𝑉. Thus our standing assumptions are now as follows.
• 𝐅 denotes 𝐑 or 𝐂 .
• 𝑈, 𝑉, and 𝑊 denote vector spaces over 𝐅.
3.1 definition: linear map
A linear map from 𝑉 to 𝑊 is a function 𝑇 ∶ 𝑉 → 𝑊 with the following properties.
additivity
𝑇(𝑢 + 𝑣) = 𝑇𝑢 + 𝑇𝑣 for all 𝑢, 𝑣 ∈ 𝑉.
homogeneity
𝑇(𝜆𝑣) = 𝜆(𝑇𝑣) for all 𝜆 ∈ 𝐅 and all 𝑣 ∈ 𝑉.
Note that for linear maps we often use the notation 𝑇𝑣 as well as the usual function notation 𝑇(𝑣).
Some mathematicians use the phrase linear transformation, which means the same as linear map.
Let’s look at some examples of linear maps. Make sure you verify that each
of the functions defined in the next example is indeed a linear map:
differentiation
Define 𝐷 ∈ ℒ(𝒫(𝐑)) by
𝐷𝑝 = 𝑝′.
The assertion that this function is a linear map is another way of stating a basic
result about differentiation: ( 𝑓 + 𝑔)′ = 𝑓 ′ + 𝑔′ and (𝜆 𝑓 )′ = 𝜆 𝑓 ′ whenever 𝑓, 𝑔 are
differentiable and 𝜆 is a constant.
integration
Define 𝑇 ∈ ℒ(𝒫(𝐑), 𝐑) by
𝑇𝑝 = ∫₀¹ 𝑝.
The assertion that this function is linear is another way of stating a basic result
about integration: the integral of the sum of two functions equals the sum of the
integrals, and the integral of a constant times a function equals the constant times
the integral of the function.
multiplication by 𝑥2
Define a linear map 𝑇 ∈ ℒ(𝒫(𝐑)) by
(𝑇𝑝)(𝑥) = 𝑥2 𝑝(𝑥)
for each 𝑥 ∈ 𝐑 .
backward shift
Recall that 𝐅∞ denotes the vector space of all sequences of elements of 𝐅. Define
a linear map 𝑇 ∈ ℒ(𝐅∞ ) by
𝑇(𝑥1 , 𝑥2 , 𝑥3 , … ) = (𝑥2 , 𝑥3 , … ).
from 𝐑3 to 𝐑2
Define a linear map 𝑇 ∈ ℒ(𝐑3, 𝐑2 ) by
𝑇(𝑥, 𝑦, 𝑧) = (2𝑥 − 𝑦 + 3𝑧, 7𝑥 + 5𝑦 − 6𝑧).
from 𝐅𝑛 to 𝐅𝑚
To generalize the previous example, let 𝑚 and 𝑛 be positive integers, let 𝐴𝑗, 𝑘 ∈ 𝐅
for each 𝑗 = 1, …, 𝑚 and each 𝑘 = 1, …, 𝑛, and define a linear map 𝑇 ∈ ℒ(𝐅𝑛, 𝐅𝑚 )
by
𝑇(𝑥1 , …, 𝑥𝑛 ) = (𝐴1, 1 𝑥1 + ⋯ + 𝐴1, 𝑛 𝑥𝑛 , …, 𝐴𝑚, 1 𝑥1 + ⋯ + 𝐴𝑚, 𝑛 𝑥𝑛 ).
Actually every linear map from 𝐅𝑛 to 𝐅𝑚 is of this form.
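As an illustration of this form (a sketch, not part of the text; it assumes NumPy), the map 𝑇(𝑥, 𝑦, 𝑧) = (2𝑥 − 𝑦 + 3𝑧, 7𝑥 + 5𝑦 − 6𝑧) from the previous example corresponds to the array of scalars 𝐴𝑗, 𝑘 below, and additivity and homogeneity can be spot-checked on particular vectors.

    import numpy as np

    A = np.array([[2, -1, 3],
                  [7, 5, -6]])           # A[j, k] plays the role of A_{j,k}

    def T(x):
        # T(x_1, ..., x_n) = (A_{1,1} x_1 + ... + A_{1,n} x_n, ..., A_{m,1} x_1 + ... + A_{m,n} x_n)
        return A @ x

    u = np.array([1.0, 2.0, -1.0])
    v = np.array([0.5, -3.0, 4.0])
    lam = 2.5

    assert np.allclose(T(u + v), T(u) + T(v))   # additivity
    assert np.allclose(T(lam * u), lam * T(u))  # homogeneity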
composition
Fix a polynomial 𝑞 ∈ 𝒫(𝐑). Define a linear map 𝑇 ∈ ℒ(𝒫(𝐑)) by
(𝑇𝑝)(𝑥) = 𝑝(𝑞(𝑥)).
The existence part of the next result means that we can find a linear map that
takes on whatever values we wish on the vectors in a basis. The uniqueness part
of the next result means that a linear map is completely determined by its values
on a basis.
3.4 linear map lemma
Suppose 𝑣1 , …, 𝑣𝑛 is a basis of 𝑉 and 𝑤1 , …, 𝑤𝑛 ∈ 𝑊. Then there exists a unique linear map 𝑇 ∶ 𝑉 → 𝑊 such that
𝑇𝑣𝑘 = 𝑤𝑘
for each 𝑘 = 1, …, 𝑛.
Proof First we show the existence of a linear map 𝑇 with the desired property.
Define 𝑇 ∶ 𝑉 → 𝑊 by
𝑇(𝑐1 𝑣1 + ⋯ + 𝑐𝑛 𝑣𝑛 ) = 𝑐1 𝑤1 + ⋯ + 𝑐𝑛 𝑤𝑛 ,
where 𝑐1 , …, 𝑐𝑛 are arbitrary elements of 𝐅.
Suppose 𝑆, 𝑇 ∈ ℒ(𝑉, 𝑊) and 𝜆 ∈ 𝐅. The sum 𝑆 + 𝑇 and the product 𝜆𝑇 are the linear maps from 𝑉 to 𝑊 defined by
(𝑆 + 𝑇)(𝑣) = 𝑆𝑣 + 𝑇𝑣 and (𝜆𝑇)(𝑣) = 𝜆(𝑇𝑣)
for all 𝑣 ∈ 𝑉.
You should verify that 𝑆 + 𝑇 and 𝜆𝑇 as defined above are indeed linear maps. In other words, if 𝑆, 𝑇 ∈ ℒ(𝑉, 𝑊) and 𝜆 ∈ 𝐅, then 𝑆 + 𝑇 ∈ ℒ(𝑉, 𝑊) and 𝜆𝑇 ∈ ℒ(𝑉, 𝑊).
Linear maps are pervasive throughout mathematics. However, they are not as ubiquitous as imagined by people who seem to think cos is a linear map from 𝐑 to 𝐑 when they incorrectly write that cos(𝑥 + 𝑦) equals cos 𝑥 + cos 𝑦 and that cos 2𝑥 equals 2 cos 𝑥.
Because we took the trouble to define addition and scalar multiplication on ℒ(𝑉, 𝑊), the next result should not be a surprise.
The routine proof of the result above is left to the reader. Note that the additive
identity of ℒ(𝑉, 𝑊) is the zero linear map defined in Example 3.3.
Usually it makes no sense to multiply together two elements of a vector space,
but for some pairs of linear maps a useful product exists, as in the next definition.
Thus 𝑆𝑇 is just the usual composition 𝑆 ∘ 𝑇 of two functions, but when both
functions are linear, we usually write 𝑆𝑇 instead of 𝑆 ∘ 𝑇. The product notation
𝑆𝑇 helps make the distributive properties (see next result) seem natural.
Note that 𝑆𝑇 is defined only when 𝑇 maps into the domain of 𝑆. You should
verify that 𝑆𝑇 is indeed a linear map from 𝑈 to 𝑊 whenever 𝑇 ∈ ℒ(𝑈, 𝑉) and
𝑆 ∈ ℒ(𝑉, 𝑊).
associativity
(𝑇1 𝑇2 )𝑇3 = 𝑇1 (𝑇2 𝑇3 ) whenever 𝑇1 , 𝑇2 , and 𝑇3 are linear maps such that
the products make sense (meaning 𝑇3 maps into the domain of 𝑇2 , and 𝑇2
maps into the domain of 𝑇1 ).
identity
𝑇𝐼 = 𝐼𝑇 = 𝑇 whenever 𝑇 ∈ ℒ(𝑉, 𝑊); here the first 𝐼 is the identity operator
on 𝑉, and the second 𝐼 is the identity operator on 𝑊.
distributive properties
(𝑆1 + 𝑆2 )𝑇 = 𝑆1 𝑇 + 𝑆2 𝑇 and 𝑆(𝑇1 + 𝑇2 ) = 𝑆𝑇1 + 𝑆𝑇2 whenever
𝑇, 𝑇1 , 𝑇2 ∈ ℒ(𝑈, 𝑉) and 𝑆, 𝑆1 , 𝑆2 ∈ ℒ(𝑉, 𝑊).
Proof Suppose 𝑇 is a linear map. Then
𝑇(0) = 𝑇(0 + 0) = 𝑇(0) + 𝑇(0).
Add the additive inverse of 𝑇(0) to each side of the equation above to conclude that 𝑇(0) = 0.
Suppose 𝑏, 𝑚 ∈ 𝐑 . The function 𝑓 ∶ 𝐑 → 𝐑 defined by
𝑓 (𝑥) = 𝑚𝑥 + 𝑏
is a linear map if and only if 𝑏 = 0 (use 3.10). Thus the linear functions of high school algebra are not the same as linear maps in the context of linear algebra.
Exercises 3A
1 Suppose 𝑏, 𝑐 ∈ 𝐑 . Define 𝑇 ∶ 𝐑3 → 𝐑2 by
𝑇(𝑥, 𝑦, 𝑧) = (2𝑥 − 4𝑦 + 3𝑧 + 𝑏, 6𝑥 + 𝑐𝑥𝑦𝑧).
Show that 𝑇 is linear if and only if 𝑏 = 𝑐 = 0.
2 Suppose 𝑏, 𝑐 ∈ 𝐑 . Define 𝑇 ∶ 𝒫(𝐑) → 𝐑2 by
𝑇𝑝 = (3𝑝(4) + 5𝑝′(6) + 𝑏𝑝(1)𝑝(2), ∫₋₁² 𝑥3 𝑝(𝑥) 𝑑𝑥 + 𝑐 sin 𝑝(0)).
Show that 𝑇 is linear if and only if 𝑏 = 𝑐 = 0.
3 Suppose that 𝑇 ∈ ℒ(𝐅𝑛, 𝐅𝑚 ). Show that there exist scalars 𝐴𝑗, 𝑘 ∈ 𝐅 for
𝑗 = 1, …, 𝑚 and 𝑘 = 1, …, 𝑛 such that
𝑇(𝑥1 , …, 𝑥𝑛 ) = (𝐴1, 1 𝑥1 + ⋯ + 𝐴1, 𝑛 𝑥𝑛 , …, 𝐴𝑚, 1 𝑥1 + ⋯ + 𝐴𝑚, 𝑛 𝑥𝑛 )
for every (𝑥1 , …, 𝑥𝑛 ) ∈ 𝐅𝑛.
This exercise shows that the linear map 𝑇 has the form promised in the
second to last item of Example 3.3.
3B Null Spaces and Ranges
For 𝑇 ∈ ℒ(𝑉, 𝑊), the null space of 𝑇, denoted by null 𝑇, is the subset of 𝑉
consisting of those vectors that 𝑇 maps to 0:
null 𝑇 = {𝑣 ∈ 𝑉 ∶ 𝑇𝑣 = 0}.
𝑇(𝑥1 , 𝑥2 , 𝑥3 , … ) = (𝑥2 , 𝑥3 , … ).
The next result shows that the null space of each linear map is a subspace of
the domain. In particular, 0 is in the null space of every linear map.
As we will soon see, for a linear map the next definition is closely connected
to the null space.
We could rephrase the definition above to say that 𝑇 is injective if 𝑢 ≠ 𝑣 implies that 𝑇𝑢 ≠ 𝑇𝑣. Thus 𝑇 is injective if and only if it maps distinct inputs to distinct outputs.
The term one-to-one means the same as injective.
The next result says that we can check whether a linear map is injective
by checking whether 0 is the only vector that gets mapped to 0. As a simple
application of this result, we see that of the linear maps whose null spaces we
computed in 3.12, only multiplication by 𝑥2 is injective (except that the zero map
is injective in the special case 𝑉 = {0}).
For 𝑇 ∈ ℒ(𝑉, 𝑊), the range of 𝑇 is the subset of 𝑊 consisting of those vectors that are equal to 𝑇𝑣 for some 𝑣 ∈ 𝑉:
range 𝑇 = {𝑇𝑣 ∶ 𝑣 ∈ 𝑉}.
The next result shows that the range of each linear map is a subspace of the
vector space into which it is being mapped.
Proof Suppose 𝑇 ∈ ℒ(𝑉, 𝑊). Then 𝑇(0) = 0 (by 3.10), which implies that
0 ∈ range 𝑇.
If 𝑤1 , 𝑤2 ∈ range 𝑇, then there exist 𝑣1 , 𝑣2 ∈ 𝑉 such that 𝑇𝑣1 = 𝑤1 and
𝑇𝑣2 = 𝑤2 . Thus
𝑇(𝑣1 + 𝑣2 ) = 𝑇𝑣1 + 𝑇𝑣2 = 𝑤1 + 𝑤2 .
Hence 𝑤1 + 𝑤2 ∈ range 𝑇. Thus range 𝑇 is closed under addition.
If 𝑤 ∈ range 𝑇 and 𝜆 ∈ 𝐅, then there exists 𝑣 ∈ 𝑉 such that 𝑇𝑣 = 𝑤. Thus
𝑇(𝜆𝑣) = 𝜆𝑇𝑣 = 𝜆𝑤.
Hence 𝜆𝑤 ∈ range 𝑇, so range 𝑇 is closed under scalar multiplication. Thus range 𝑇 is a subspace of 𝑊 (by 1.34).
To illustrate the definition above, note that of the ranges we computed in 3.17,
only the differentiation map is surjective (except that the zero map is surjective in
the special case 𝑊 = {0}).
Whether a linear map is surjective depends on what we are thinking of as the vector space into which it maps.
Some people use the term onto, which means the same as surjective.
𝑐1 𝑇𝑣1 + ⋯ + 𝑐𝑛 𝑇𝑣𝑛 = 0.
Then
𝑇(𝑐1 𝑣1 + ⋯ + 𝑐𝑛 𝑣𝑛 ) = 0.
Hence
𝑐1 𝑣1 + ⋯ + 𝑐𝑛 𝑣𝑛 ∈ null 𝑇.
Because 𝑢1 , …, 𝑢𝑚 spans null 𝑇, we can write
𝑐1 𝑣1 + ⋯ + 𝑐𝑛 𝑣𝑛 = 𝑑1 𝑢1 + ⋯ + 𝑑𝑚 𝑢𝑚 ,
where the 𝑑’s are in 𝐅. This equation implies that all the 𝑐’s (and 𝑑’s) are 0
(because 𝑢1 , …, 𝑢𝑚 , 𝑣1 , …, 𝑣𝑛 is linearly independent). Thus 𝑇𝑣1 , …, 𝑇𝑣𝑛 is linearly
independent and hence is a basis of range 𝑇, as desired.
Now we can show that no linear map from a finite-dimensional vector space
to a “smaller” vector space can be injective, where “smaller” is measured by
dimension.
where the first line above comes from the fundamental theorem of linear maps
(3.21) and the second line follows from 2.37. The inequality above states that
dim null 𝑇 > 0. This means that null 𝑇 contains vectors other than 0. Thus 𝑇 is
not injective (by 3.15).
Because dim 𝐅4 > dim 𝐅3, we can use 3.22 to assert that 𝑇 is not injective, without
doing any calculations.
The next result shows that no linear map from a finite-dimensional vector
space to a “bigger” vector space can be surjective, where “bigger” is measured by
dimension.
As we will soon see, 3.22 and 3.24 have important consequences in the theory
of linear equations. The idea is to express questions about systems of linear
equations in terms of linear maps. Let’s begin by rephrasing in terms of linear
maps the question of whether a homogeneous system of linear equations has a
nonzero solution.
Homogeneous, in this context, means that the constant term on the right side of each equation below is 0.
Fix positive integers 𝑚 and 𝑛, and let 𝐴𝑗, 𝑘 ∈ 𝐅 for 𝑗 = 1, …, 𝑚 and 𝑘 = 1, …, 𝑛. Consider the homogeneous system of linear equations
𝑛
∑ 𝐴1, 𝑘 𝑥𝑘 = 0
𝑘=1
⋮
𝑛
∑ 𝐴𝑚, 𝑘 𝑥𝑘 = 0.
𝑘=1
Clearly 𝑥1 = ⋯ = 𝑥𝑛 = 0 is a solution of the system of equations above; the
question here is whether any other solutions exist.
Define 𝑇 ∶ 𝐅𝑛 → 𝐅𝑚 by
𝑛 𝑛
3.25 𝑇(𝑥1 , …, 𝑥𝑛 ) = ( ∑ 𝐴1, 𝑘 𝑥𝑘 , …, ∑ 𝐴𝑚, 𝑘 𝑥𝑘 ).
𝑘=1 𝑘=1
The equation 𝑇(𝑥1 , …, 𝑥𝑛 ) = 0 (the 0 here is the additive identity in 𝐅𝑚, namely,
the list of length 𝑚 of all 0’s) is the same as the homogeneous system of linear
equations above.
Thus we want to know if null 𝑇 is strictly bigger than {0}, which is equivalent
to 𝑇 not being injective (by 3.15). The next result gives an important condition
for ensuring that 𝑇 is not injective.
Proof Use the notation and result from the discussion above. Thus 𝑇 is a linear
map from 𝐅𝑛 to 𝐅𝑚, and we have a homogeneous system of 𝑚 linear equations
with 𝑛 variables 𝑥1 , …, 𝑥𝑛 . From 3.22 we see that 𝑇 is not injective if 𝑛 > 𝑚.
Proof Use the notation and result from the example above. Thus 𝑇 is a linear
map from 𝐅𝑛 to 𝐅𝑚, and we have a system of 𝑚 equations with 𝑛 variables 𝑥1 , …, 𝑥𝑛 .
From 3.24 we see that 𝑇 is not surjective if 𝑛 < 𝑚.
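A numerical sketch of the first of these two results (not from the text; it assumes NumPy, and the particular coefficient matrix is hypothetical): with two homogeneous equations in three variables, the rank of the coefficient matrix is at most two, so the fundamental theorem of linear maps forces the null space to have dimension at least one, and nonzero solutions exist.

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])           # 2 equations, 3 unknowns

    n = A.shape[1]
    dim_null = n - np.linalg.matrix_rank(A)   # dim null T = n - dim range T
    print(dim_null)                           # 1, so nonzero solutions exist

    # one such nonzero solution
    x = np.array([1.0, -2.0, 1.0])
    assert np.allclose(A @ x, 0)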
Exercises 3B
1 Give an example of a linear map 𝑇 with dim null 𝑇 = 3 and dim range 𝑇 = 2.
2 Suppose 𝑆, 𝑇 ∈ ℒ(𝑉) are such that range 𝑆 ⊆ null 𝑇. Prove that (𝑆𝑇)2 = 0.
3 Suppose 𝑣1 , …, 𝑣𝑚 is a list of vectors in 𝑉. Define 𝑇 ∈ ℒ(𝐅𝑚, 𝑉) by
𝑇(𝑧1 , …, 𝑧𝑚 ) = 𝑧1 𝑣1 + ⋯ + 𝑧𝑚 𝑣𝑚 .
24 (a) Suppose dim 𝑉 = 5 and 𝑆, 𝑇 ∈ ℒ(𝑉) are such that 𝑆𝑇 = 0. Prove that
dim range 𝑇𝑆 ≤ 2.
(b) Give an example of 𝑆, 𝑇 ∈ ℒ(𝐅5 ) with 𝑆𝑇 = 0 and dim range 𝑇𝑆 = 2.
25 Suppose that 𝑊 is finite-dimensional and 𝑆, 𝑇 ∈ ℒ(𝑉, 𝑊). Prove that
null 𝑆 ⊆ null 𝑇 if and only if there exists 𝐸 ∈ ℒ(𝑊) such that 𝑇 = 𝐸𝑆.
26 Suppose that 𝑉 is finite-dimensional and 𝑆, 𝑇 ∈ ℒ(𝑉, 𝑊). Prove that
range 𝑆 ⊆ range 𝑇 if and only if there exists 𝐸 ∈ ℒ(𝑉) such that 𝑆 = 𝑇𝐸.
27 Suppose 𝑃 ∈ ℒ(𝑉) and 𝑃2 = 𝑃. Prove that 𝑉 = null 𝑃 ⊕ range 𝑃.
28 Suppose 𝐷 ∈ ℒ(𝒫(𝐑)) is such that deg 𝐷𝑝 = (deg 𝑝) − 1 for every non-
constant polynomial 𝑝 ∈ 𝒫(𝐑). Prove that 𝐷 is surjective.
The notation 𝐷 is used above to remind you of the differentiation map that
sends a polynomial 𝑝 to 𝑝′.
33 Suppose that 𝑉 and 𝑊 are real vector spaces and 𝑇 ∈ ℒ(𝑉, 𝑊). Define
𝑇𝐂 ∶ 𝑉𝐂 → 𝑊𝐂 by
𝑇𝐂 (𝑢 + 𝑖𝑣) = 𝑇𝑢 + 𝑖𝑇𝑣
for all 𝑢, 𝑣 ∈ 𝑉.
(a) Show that 𝑇𝐂 is a (complex) linear map from 𝑉𝐂 to 𝑊𝐂 .
(b) Show that 𝑇𝐂 is injective if and only if 𝑇 is injective.
(c) Show that range 𝑇𝐂 = 𝑊𝐂 if and only if range 𝑇 = 𝑊.
See Exercise 8 in Section 1B for the definition of the complexification 𝑉𝐂 .
The linear map 𝑇𝐂 is called the complexification of the linear map 𝑇.
3C Matrices
Representing a Linear Map by a Matrix
We know that if 𝑣1 , …, 𝑣𝑛 is a basis of 𝑉 and 𝑇 ∶ 𝑉 → 𝑊 is linear, then the values
of 𝑇𝑣1 , …, 𝑇𝑣𝑛 determine the values of 𝑇 on arbitrary vectors in 𝑉—see the linear
map lemma (3.4). As we will soon see, matrices provide an efficient method of
recording the values of the 𝑇𝑣𝑘 ’s in terms of a basis of 𝑊.
Suppose 𝑇 ∈ ℒ(𝑉, 𝑊), 𝑣1 , …, 𝑣𝑛 is a basis of 𝑉, and 𝑤1 , …, 𝑤𝑚 is a basis of 𝑊. The matrix of 𝑇 with respect to these bases is the 𝑚-by-𝑛 matrix ℳ(𝑇) whose entries 𝐴𝑗, 𝑘 are defined by
𝑇𝑣𝑘 = 𝐴1, 𝑘 𝑤1 + ⋯ + 𝐴𝑚, 𝑘 𝑤𝑚 .
If the bases 𝑣1 , …, 𝑣𝑛 and 𝑤1 , …, 𝑤𝑚 are not clear from the context, then the notation ℳ(𝑇, (𝑣1 , …, 𝑣𝑛 ), (𝑤1 , …, 𝑤𝑚 )) is used.
              𝑣1   ⋯   𝑣𝑘   ⋯   𝑣𝑛
         𝑤1  ⎛        𝐴1, 𝑘        ⎞
ℳ(𝑇) =   ⋮   ⎜          ⋮          ⎟
         𝑤𝑚  ⎝        𝐴𝑚, 𝑘        ⎠
In the matrix above only the 𝑘 th column is shown. Thus the second index of each displayed entry of the matrix above is 𝑘. The 𝑘 th column of ℳ(𝑇) consists of the scalars needed to write 𝑇𝑣𝑘 as a linear combination of 𝑤1 , …, 𝑤𝑚 .
Suppose 𝑇 ∈ ℒ(𝐅2, 𝐅3 ) is defined by 𝑇(𝑥, 𝑦) = (𝑥 + 3𝑦, 2𝑥 + 5𝑦, 7𝑥 + 9𝑦). Because 𝑇(1, 0) = (1, 2, 7) and 𝑇(0, 1) = (3, 5, 9), the matrix of 𝑇 with respect to the standard bases is the 3-by-2 matrix below:
         ⎛ 1  3 ⎞
ℳ(𝑇) =  ⎜ 2  5 ⎟ .
         ⎝ 7  9 ⎠
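A sketch (not part of the text; it assumes NumPy) of how such a matrix is assembled: apply 𝑇 to each standard basis vector of the domain and use the resulting coordinates, with respect to the standard basis of the target space, as a column. The function below reproduces the values 𝑇(1, 0) = (1, 2, 7) and 𝑇(0, 1) = (3, 5, 9) stated above.

    import numpy as np

    def T(x, y):
        # determined by T(1, 0) = (1, 2, 7) and T(0, 1) = (3, 5, 9)
        return np.array([x + 3 * y, 2 * x + 5 * y, 7 * x + 9 * y])

    # the columns of M(T) are T applied to the standard basis vectors of F^2
    M = np.column_stack([T(1, 0), T(0, 1)])
    print(M)
    # [[1 3]
    #  [2 5]
    #  [7 9]]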
When working with 𝒫𝑚 (𝐅), use the standard basis 1, 𝑥, 𝑥2, …, 𝑥𝑚 unless the
context indicates otherwise.
In the next result, the assumption is that the same bases are used for all three
linear maps 𝑆 + 𝑇, 𝑆, and 𝑇.
The verification of the result above follows from the definitions and is left to
the reader.
Still assuming that we have some bases in mind, is the matrix of a scalar times
a linear map equal to the scalar times the matrix of the linear map? Again, the
question does not yet make sense because we have not defined scalar multiplication
on matrices. Fortunately, the natural definition again has the right properties.
2 ⎛ 3   1 ⎞ + ⎛ 4  2 ⎞ = ⎛ 6    2 ⎞ + ⎛ 4  2 ⎞ = ⎛ 10   4 ⎞
  ⎝ −1  5 ⎠   ⎝ 1  6 ⎠   ⎝ −2  10 ⎠   ⎝ 1  6 ⎠   ⎝ −1  16 ⎠
In the next result, the assumption is that the same bases are used for both the
linear maps 𝜆𝑇 and 𝑇.
For 𝑚 and 𝑛 positive integers, the set of all 𝑚-by-𝑛 matrices with entries in 𝐅
is denoted by 𝐅𝑚, 𝑛.
Proof The verification that 𝐅𝑚, 𝑛 is a vector space is left to the reader. Note that
the additive identity of 𝐅𝑚, 𝑛 is the 𝑚-by-𝑛 matrix all of whose entries equal 0.
The reader should also verify that the list of distinct 𝑚-by-𝑛 matrices that have
0 in all entries except for a 1 in one entry is a basis of 𝐅𝑚, 𝑛. There are 𝑚𝑛 such
matrices, so the dimension of 𝐅𝑚, 𝑛 equals 𝑚𝑛.
Matrix Multiplication
Suppose, as previously, that 𝑣1 , …, 𝑣𝑛 is a basis of 𝑉 and 𝑤1 , …, 𝑤𝑚 is a basis of 𝑊.
Suppose also that 𝑢1 , …, 𝑢𝑝 is a basis of 𝑈.
Consider linear maps 𝑇 ∶ 𝑈 → 𝑉 and 𝑆 ∶ 𝑉 → 𝑊. The composition 𝑆𝑇 is a
linear map from 𝑈 to 𝑊. Does ℳ(𝑆𝑇) equal ℳ(𝑆)ℳ(𝑇)? This question does
not yet make sense because we have not defined the product of two matrices. We
will choose a definition of matrix multiplication that forces this question to have
a positive answer. Let’s see how to do this.
Thus ℳ(𝑆𝑇) is the 𝑚-by-𝑝 matrix whose entry in row 𝑗, column 𝑘, equals
𝑛
∑ 𝐴𝑗, 𝑟 𝐵𝑟, 𝑘 .
𝑟=1
Now we see how to define matrix multiplication so that the desired equation
ℳ(𝑆𝑇) = ℳ(𝑆)ℳ(𝑇) holds.
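The entry formula displayed above can be checked against a library implementation. The sketch below is not from the text; it assumes NumPy, and the two matrices are hypothetical, chosen only so that the number of columns of the first equals the number of rows of the second.

    import numpy as np

    A = np.array([[3.0, 1.0, 4.0],
                  [-1.0, 5.0, 2.0]])     # m-by-n with m = 2, n = 3
    B = np.array([[2.0, 0.0],
                  [1.0, -3.0],
                  [4.0, 1.0]])           # n-by-p with p = 2

    m, n = A.shape
    p = B.shape[1]
    C = np.zeros((m, p))
    for j in range(m):
        for k in range(p):
            # entry in row j, column k of AB is the sum over r of A[j, r] * B[r, k]
            C[j, k] = sum(A[j, r] * B[r, k] for r in range(n))

    assert np.allclose(C, A @ B)         # agrees with NumPy's matrix product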
Note that we define the product of two matrices only when the number of columns of the first matrix equals the number of rows of the second matrix.
You may have learned this definition of matrix multiplication in an earlier course, although you may not have seen this motivation for it.
In the next result, we assume that the same basis of 𝑉 is used in considering
𝑇 ∈ ℒ(𝑈, 𝑉) and 𝑆 ∈ ℒ(𝑉, 𝑊), the same basis of 𝑊 is used in considering
𝑆 ∈ ℒ(𝑉, 𝑊) and 𝑆𝑇 ∈ ℒ(𝑈, 𝑊), and the same basis of 𝑈 is used in considering
𝑇 ∈ ℒ(𝑈, 𝑉) and 𝑆𝑇 ∈ ℒ(𝑈, 𝑊).
The proof of the result above is the calculation that was done as motivation
before the definition of matrix multiplication.
In the next piece of notation, note that as usual the first index refers to a row
and the second index refers to a column, with a vertically centered dot used as a
placeholder.
3.45 example: 𝐴𝑗, ⋅ equals 𝑗th row of 𝐴 and 𝐴⋅, 𝑘 equals 𝑘 th column of 𝐴
The notation 𝐴2, ⋅ denotes the second row of 𝐴 and 𝐴⋅, 2 denotes the second column of 𝐴. Thus if
𝐴 = ⎛ 8  4  5 ⎞
    ⎝ 1  9  7 ⎠ ,
then
𝐴2, ⋅ = ( 1  9  7 )  and  𝐴⋅, 2 = ⎛ 4 ⎞
                                  ⎝ 9 ⎠ .
The product of a 1-by-𝑛 matrix and an 𝑛-by-1 matrix is a 1-by-1 matrix. How-
ever, we will frequently identify a 1-by-1 matrix with its entry. For example,
( 3  4 ) ⎛ 6 ⎞ = ( 26 ).
         ⎝ 2 ⎠
The next result gives yet another way to think of matrix multiplication. In the
result below, (𝐴𝐵)⋅, 𝑘 is column 𝑘 of the 𝑚-by-𝑝 matrix 𝐴𝐵. Thus (𝐴𝐵)⋅, 𝑘 is an
𝑚-by-1 matrix. Also, 𝐴𝐵⋅, 𝑘 is an 𝑚-by-1 matrix because it is the product of an
𝑚-by-𝑛 matrix and an 𝑛-by-1 matrix. Thus the two sides of the equation in the
result below have the same size, making it reasonable that they might be equal.
Proof As discussed above, (𝐴𝐵)⋅, 𝑘 and 𝐴𝐵⋅, 𝑘 are both 𝑚-by-1 matrices. If 1 ≤
𝑗 ≤ 𝑚, then the entry in row 𝑗 of (𝐴𝐵)⋅, 𝑘 is the left side of 3.47 and the entry in
row 𝑗 of 𝐴𝐵⋅, 𝑘 is the right side of 3.47. Thus (𝐴𝐵)⋅, 𝑘 = 𝐴𝐵⋅, 𝑘 .
Our next result will give another way of thinking about the product of an
𝑚-by-𝑛 matrix and an 𝑛-by-1 matrix, motivated by the next example.
Suppose 𝐴 is an 𝑚-by-𝑛 matrix and 𝑏 is the 𝑛-by-1 matrix whose entries are 𝑏1 , …, 𝑏𝑛 . Then
𝐴𝑏 = 𝑏1 𝐴⋅, 1 + ⋯ + 𝑏𝑛 𝐴⋅, 𝑛 .
Proof If 𝑘 ∈ {1, …, 𝑚}, then the definition of matrix multiplication implies that
the entry in row 𝑘 of the 𝑚-by-1 matrix 𝐴𝑏 is
𝐴𝑘, 1 𝑏1 + ⋯ + 𝐴𝑘, 𝑛 𝑏𝑛 .
The entry in row 𝑘 of 𝑏1 𝐴⋅, 1 + ⋯ + 𝑏𝑛 𝐴⋅, 𝑛 also equals the number displayed above.
Because 𝐴𝑏 and 𝑏1 𝐴⋅, 1 + ⋯ + 𝑏𝑛 𝐴⋅, 𝑛 have the same entry in row 𝑘 for each
𝑘 ∈ {1, …, 𝑚}, we conclude that 𝐴𝑏 = 𝑏1 𝐴⋅, 1 + ⋯ + 𝑏𝑛 𝐴⋅, 𝑛 .
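A quick numerical check of this result (not from the text; it assumes NumPy, and uses the matrix 𝐴 of the earlier example together with a hypothetical column 𝑏):

    import numpy as np

    A = np.array([[8.0, 4.0, 5.0],
                  [1.0, 9.0, 7.0]])
    b = np.array([2.0, -1.0, 3.0])

    # b_1 A_{.,1} + ... + b_n A_{.,n}
    combo = sum(b[k] * A[:, k] for k in range(A.shape[1]))

    assert np.allclose(A @ b, combo)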
Our two previous results focus on the columns of a matrix. Analogous results
hold for the rows of a matrix. Specifically, see Exercises 8 and 9, which can be
proved using appropriate modifications of the proofs of 3.48 and 3.50.
The next result is the main tool used in the next subsection to prove the
column–row factorization (3.56) and to prove that the column rank of a matrix
equals the row rank (3.57). To be consistent with the notation often used with the
column–row factorization, including in the next subsection, the matrices in the
next result are called 𝐶 and 𝑅 instead of 𝐴 and 𝐵.
Proof Suppose 𝑘 ∈ {1, …, 𝑛}. Then column 𝑘 of 𝐶𝑅 equals 𝐶𝑅⋅, 𝑘 (by 3.48),
which equals the linear combination of the columns of 𝐶 with coefficients coming
from 𝑅⋅, 𝑘 (by 3.50). Thus (a) holds.
To prove (b), follow the pattern of the proof of (a) but use rows instead of
columns and use Exercises 8 and 9 instead of 3.48 and 3.50.
the other. Thus the span of this list of length four has dimension at least two. The
span of this list of vectors in 𝐅2, 1 cannot have dimension larger than two because
dim 𝐅2, 1 = 2. Thus the span of this list has dimension two, which means that the
column rank of 𝐴 is two.
The row rank of 𝐴 is the dimension of
span(( 4 7 1 8 ), ( 3 5 2 9 ))
in 𝐅1, 4. Neither of the two vectors listed above in 𝐅1, 4 is a scalar multiple of the
other. Thus the span of this list of length two has dimension two, which means
that the row rank of 𝐴 is two.
Proof Each column of 𝐴 is an 𝑚-by-1 matrix. The list 𝐴⋅, 1 , …, 𝐴⋅, 𝑛 of columns
of 𝐴 can be reduced to a basis of the span of the columns of 𝐴 (by 2.30). This
basis has length 𝑐, by the definition of the column rank. The 𝑐 columns in this
basis can be put together to form an 𝑚-by-𝑐 matrix 𝐶.
If 𝑘 ∈ {1, …, 𝑛}, then column 𝑘 of 𝐴 is a linear combination of the columns
of 𝐶. Make the coefficients of this linear combination into column 𝑘 of a 𝑐-by-𝑛
matrix that we call 𝑅. Then 𝐴 = 𝐶𝑅, as follows from 3.51(a).
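A sketch (not from the text; it assumes NumPy) of the construction in this proof, applied to the matrix of Example 3.53: the columns of 𝐴 are reduced to a basis of their span to form 𝐶, and the coefficients expressing each column of 𝐴 in terms of the columns of 𝐶 form 𝑅.

    import numpy as np

    A = np.array([[4.0, 7.0, 1.0, 8.0],
                  [3.0, 5.0, 2.0, 9.0]])

    # reduce the list of columns of A to a basis of its span: keep each
    # column that increases the rank of the columns kept so far
    kept = []
    for k in range(A.shape[1]):
        candidate = np.column_stack(kept + [A[:, k]])
        if np.linalg.matrix_rank(candidate) > len(kept):
            kept.append(A[:, k])
    C = np.column_stack(kept)                    # m-by-c, c = column rank of A

    # column k of R holds the coefficients writing column k of A
    # as a linear combination of the columns of C
    R, *_ = np.linalg.lstsq(C, A, rcond=None)    # c-by-n

    assert np.allclose(A, C @ R)
    print(C.shape, R.shape)                      # (2, 2) (2, 4)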
In Example 3.53, the column rank and row rank turned out to equal each other.
The next result states that this happens for all matrices.
Suppose 𝐴 ∈ 𝐅𝑚, 𝑛. Then the column rank of 𝐴 equals the row rank of 𝐴.
Because the column rank equals the row rank, the last result allows us to
dispense with the terms “column rank” and “row rank” and just use the simpler
term “rank”.
See 3.133 and Exercise 8 in Section 7A for alternative proofs that the column
rank equals the row rank.
Exercises 3C
1 Suppose 𝑇 ∈ ℒ(𝑉, 𝑊). Show that with respect to each choice of bases of 𝑉
and 𝑊, the matrix of 𝑇 has at least dim range 𝑇 nonzero entries.
2 Suppose 𝑉 and 𝑊 are finite-dimensional and 𝑇 ∈ ℒ(𝑉, 𝑊). Prove that
dim range 𝑇 = 1 if and only if there exist a basis of 𝑉 and a basis of 𝑊 such
that with respect to these bases, all entries of ℳ(𝑇) equal 1.
3 Suppose 𝑣1 , …, 𝑣𝑛 is a basis of 𝑉 and 𝑤1 , …, 𝑤𝑚 is a basis of 𝑊.
(a) Show that if 𝑆, 𝑇 ∈ ℒ(𝑉, 𝑊), then ℳ(𝑆 + 𝑇) = ℳ(𝑆) + ℳ(𝑇).
(b) Show that if 𝜆 ∈ 𝐅 and 𝑇 ∈ ℒ(𝑉, 𝑊), then ℳ(𝜆𝑇) = 𝜆ℳ(𝑇).
This exercise asks you to verify 3.35 and 3.38.
⎛ 1  0  0  0 ⎞
⎜ 0  1  0  0 ⎟ .
⎝ 0  0  1  0 ⎠
Compare with Example 3.33. The next exercise generalizes this exercise.
5 Suppose 𝑉 and 𝑊 are finite-dimensional and 𝑇 ∈ ℒ(𝑉, 𝑊). Prove that there
exist a basis of 𝑉 and a basis of 𝑊 such that with respect to these bases, all
entries of ℳ(𝑇) are 0 except that the entries in row 𝑘, column 𝑘, equal 1 if
1 ≤ 𝑘 ≤ dim range 𝑇.
Try to find a clean proof that illustrates the following quote from Emil Artin:
“It is my experience that proofs involving matrices can be shortened by 50%
if one throws the matrices out.”
(𝐴𝐶)t = 𝐶 t 𝐴t.
This exercise shows that the transpose of the product of two matrices is the
product of the transposes in the opposite order.
3D Invertibility and Isomorphisms

A linear map 𝑇 ∈ ℒ(𝑉, 𝑊) is called invertible if there exists a linear map 𝑆 ∈ ℒ(𝑊, 𝑉) such that 𝑆𝑇 equals the identity operator on 𝑉 and 𝑇𝑆 equals the identity operator on 𝑊; such a linear map 𝑆 is called an inverse of 𝑇.
The definition above mentions “an inverse”. However, the next result shows
that we can change this terminology to “the inverse”.
Now that we know that the inverse is unique, we can give it a notation.
3.61 notation: 𝑇 −1
The next result shows that a linear map is invertible if and only if it is one-to-
one and onto.
Proof Suppose 𝑇 ∈ ℒ(𝑉, 𝑊). We need to show that 𝑇 is invertible if and only
if it is injective and surjective.
First suppose 𝑇 is invertible. To show that 𝑇 is injective, suppose 𝑢, 𝑣 ∈ 𝑉
and 𝑇𝑢 = 𝑇𝑣. Then
𝑢 = 𝑇 −1 (𝑇𝑢) = 𝑇 −1 (𝑇𝑣) = 𝑣,
so 𝑢 = 𝑣. Hence 𝑇 is injective.
We are still assuming that 𝑇 is invertible. Now we want to prove that 𝑇 is
surjective. To do this, let 𝑤 ∈ 𝑊. Then 𝑤 = 𝑇(𝑇 −1 𝑤), which shows that 𝑤 is
in the range of 𝑇. Thus range 𝑇 = 𝑊. Hence 𝑇 is surjective, completing this
direction of the proof.
Now suppose 𝑇 is injective and surjective. We want to prove that 𝑇 is invertible.
For each 𝑤 ∈ 𝑊, define 𝑆(𝑤) to be the unique element of 𝑉 such that 𝑇(𝑆(𝑤)) = 𝑤
(the existence and uniqueness of such an element follow from the surjectivity and
injectivity of 𝑇). The definition of 𝑆 implies that 𝑇 ∘ 𝑆 equals the identity operator
on 𝑊.
To prove that 𝑆 ∘ 𝑇 equals the identity operator on 𝑉, let 𝑣 ∈ 𝑉. Then
𝑇((𝑆 ∘ 𝑇)𝑣) = (𝑇 ∘ 𝑆)(𝑇𝑣) = 𝐼(𝑇𝑣) = 𝑇𝑣.
Thus 𝜆𝑆(𝑤) is the unique element of 𝑉 that 𝑇 maps to 𝜆𝑤. By the definition of 𝑆,
this implies that 𝑆(𝜆𝑤) = 𝜆𝑆(𝑤). Hence 𝑆 is linear, as desired.
For a linear map from a vector space to itself, you might wonder whether
injectivity alone, or surjectivity alone, is enough to imply invertibility. On infinite-
dimensional vector spaces, neither condition alone implies invertibility, as illus-
trated by the next example, which uses two familiar linear maps from Example 3.3.
• The multiplication by 𝑥2 linear map from 𝒫(𝐑) to 𝒫(𝐑) (see 3.3) is injective
but it is not invertible because it is not surjective (the polynomial 1 is not in
the range).
• The backward shift linear map from 𝐅∞ to 𝐅∞ (see 3.3) is surjective but it is
not invertible because it is not injective [the vector (1, 0, 0, 0, … ) is in the null
space].
In view of the example above, the next result is remarkable—it states that for
a linear map from a finite-dimensional vector space to a vector space of the same
dimension, either injectivity or surjectivity alone implies the other condition.
Note that the hypothesis below that dim 𝑉 = dim 𝑊 is automatically satisfied in
the important special case where 𝑉 is finite-dimensional and 𝑊 = 𝑉.
The next example illustrates the power of the previous result. Although it is
possible to prove the result in the example below without using linear algebra, the
proof using linear algebra is cleaner and easier.
Two finite-dimensional vector spaces over 𝐅 are isomorphic if and only if they
have the same dimension.
The previous result implies that each finite-dimensional vector space 𝑉 is isomorphic to 𝐅𝑛, where 𝑛 = dim 𝑉. For example, if 𝑚 is a nonnegative integer, then 𝒫𝑚 (𝐅) is isomorphic to 𝐅𝑚 + 1.
Every finite-dimensional vector space is isomorphic to some 𝐅𝑛. Thus why not just study 𝐅𝑛 instead of more general vector spaces? To answer this question, note that an investigation of 𝐅𝑛 would soon lead to other vector spaces. For example, we would encounter the null space and range of linear maps. Although each of these vector spaces is isomorphic to some 𝐅𝑚, thinking of them that way often adds complexity but no new insight.
Recall that the notation 𝐅𝑚, 𝑛 denotes the vector space of 𝑚-by-𝑛 matrices with entries in 𝐅. If 𝑣1 , …, 𝑣𝑛 is a basis of 𝑉 and 𝑤1 , …, 𝑤𝑚 is a basis of 𝑊, then for each 𝑇 ∈ ℒ(𝑉, 𝑊), we have a matrix ℳ(𝑇) ∈ 𝐅𝑚, 𝑛. Thus once bases have been fixed for 𝑉 and 𝑊, ℳ becomes a function from ℒ(𝑉, 𝑊) to 𝐅𝑚, 𝑛. Notice that 3.35 and 3.38 show that ℳ is a linear map. This linear map is actually an isomorphism, as we now show.
Now we can determine the dimension of the vector space of linear maps from
one finite-dimensional vector space to another.
Proof The desired result follows from 3.71, 3.70, and 3.40.
𝑣 = 𝑏1 𝑣1 + ⋯ + 𝑏𝑛 𝑣𝑛 .
Proof The desired result follows immediately from the definitions of ℳ(𝑇) and
ℳ(𝑇𝑣𝑘 ).
The next result shows how the notions of the matrix of a linear map, the matrix
of a vector, and matrix multiplication fit together.
ℳ(𝑇𝑣) = ℳ(𝑇)ℳ(𝑣).
Each 𝑚-by-𝑛 matrix 𝐴 induces a linear map from 𝐅𝑛, 1 to 𝐅𝑚, 1, namely the
matrix multiplication function that takes 𝑥 ∈ 𝐅𝑛, 1 to 𝐴𝑥 ∈ 𝐅𝑚, 1. The result above
can be used to think of every linear map (from a finite-dimensional vector space
to another finite-dimensional vector space) as a matrix multiplication map after
suitable relabeling via the isomorphisms given by ℳ. Specifically, if 𝑇 ∈ ℒ(𝑉, 𝑊)
and we identify 𝑣 ∈ 𝑉 with ℳ(𝑣) ∈ 𝐅𝑛, 1, then the result above says that we can
identify 𝑇𝑣 with ℳ(𝑇)ℳ(𝑣).
Because the result above allows us to think (via isomorphisms) of each linear
map as multiplication on 𝐅𝑛, 1 by some matrix 𝐴, keep in mind that the specific
matrix 𝐴 depends not only on the linear map but also on the choice of bases. One
of the themes of many of the most important results in later chapters will be the
choice of a basis that makes the matrix 𝐴 as simple as possible.
In this book, we concentrate on linear maps rather than on matrices. However,
sometimes thinking of linear maps as matrices (or thinking of matrices as linear
maps) gives important insights that we will find useful.
Notice that no bases are in sight in the statement of the next result. Although
ℳ(𝑇) in the next result depends on a choice of bases of 𝑉 and 𝑊, the next result
shows that the column rank of ℳ(𝑇) is the same for all such choices (because
range 𝑇 does not depend on a choice of basis).
Suppose 𝑉 and 𝑊 are finite-dimensional and 𝑇 ∈ ℒ(𝑉, 𝑊). Then dim range 𝑇
equals the column rank of ℳ(𝑇).
Change of Basis
In Section 3C we defined the matrix
ℳ(𝑇, (𝑣1 , …, 𝑣𝑛 ), (𝑤1 , …, 𝑤𝑚 ))
of a linear map 𝑇 from 𝑉 to a possibly different vector space 𝑊, where 𝑣1 , …, 𝑣𝑛
is a basis of 𝑉 and 𝑤1 , …, 𝑤𝑚 is a basis of 𝑊. For linear maps from a vector space
to itself, we usually use the same basis for both the domain vector space and the
target vector space. When using a single basis in both capacities, we often write
the basis only once. In other words, if 𝑇 ∈ ℒ(𝑉) and 𝑣1 , …, 𝑣𝑛 is a basis of 𝑉,
then the notation ℳ(𝑇, (𝑣1 , …, 𝑣𝑛 )) is defined by the equation
ℳ(𝑇, (𝑣1 , …, 𝑣𝑛 )) = ℳ(𝑇, (𝑣1 , …, 𝑣𝑛 ), (𝑣1 , …, 𝑣𝑛 )).
If the basis 𝑣1 , …, 𝑣𝑛 is clear from the context, then we can write just ℳ(𝑇).
with 1’s on the diagonal (the entries where the row number equals the column
number) and 0’s elsewhere is called the identity matrix and is denoted by 𝐼.
In the definition above, the 0 in the lower left corner of the matrix indicates that
all entries below the diagonal are 0, and the 0 in the upper right corner indicates
that all entries above the diagonal are 0.
With respect to each basis of 𝑉, the matrix of the identity operator 𝐼 ∈ ℒ(𝑉)
is the identity matrix 𝐼. Note that the symbol 𝐼 is used to denote both the identity
operator and the identity matrix. The context indicates which meaning of 𝐼 is
intended. For example, consider the equation ℳ(𝐼) = 𝐼; on the left side 𝐼 denotes
the identity operator, and on the right side 𝐼 denotes the identity matrix.
If 𝐴 is a square matrix (with entries in 𝐅, as usual) of the same size as 𝐼, then
𝐴𝐼 = 𝐼𝐴 = 𝐴, as you should verify.
The same proof as used in 3.60 shows that if 𝐴 is an invertible square matrix, then there is a unique matrix 𝐵 such that 𝐴𝐵 = 𝐵𝐴 = 𝐼 (and thus the notation 𝐵 = 𝐴−1 is justified).
Some mathematicians use the terms nonsingular and singular, which mean the same as invertible and noninvertible.
If 𝐴 is an invertible matrix, then (𝐴−1 )−1 = 𝐴 because
𝐴−1 𝐴 = 𝐴𝐴−1 = 𝐼.
Also, if 𝐴 and 𝐶 are invertible square matrices of the same size, then 𝐴𝐶 is
invertible and (𝐴𝐶)−1 = 𝐶−1 𝐴−1 because
(𝐴𝐶)(𝐶−1 𝐴−1 ) = 𝐴(𝐶𝐶−1 )𝐴−1
= 𝐴𝐼𝐴−1
= 𝐴𝐴−1
= 𝐼,
and similarly (𝐶−1 𝐴−1 )(𝐴𝐶) = 𝐼.
The next result deals with the matrix of the identity operator 𝐼 with respect
to two different bases. Note that the 𝑘 th column of ℳ(𝐼, (𝑢1 , …, 𝑢𝑛 ), (𝑣1 , …, 𝑣𝑛 ))
consists of the scalars needed to write 𝑢𝑘 as a linear combination of the basis
𝑣1 , …, 𝑣𝑛 .
In the statement of the next result, 𝐼 denotes the identity operator from 𝑉 to 𝑉.
In the proof, 𝐼 also denotes the 𝑛-by-𝑛 identity matrix.
𝐴 = 𝐶−1 𝐵𝐶.
Substituting the equation above into 3.85 gives the equation 𝐴 = 𝐶−1 𝐵𝐶.
𝑣1 , …, 𝑣𝑛 .
Exercises 3D
3E Products and Quotients of Vector Spaces

Suppose 𝑉1 , …, 𝑉𝑚 are vector spaces over 𝐅.
• The product 𝑉1 × ⋯ × 𝑉𝑚 is defined by
𝑉1 × ⋯ × 𝑉𝑚 = {(𝑣1 , …, 𝑣𝑚 ) ∶ 𝑣1 ∈ 𝑉1 , …, 𝑣𝑚 ∈ 𝑉𝑚 }.
• Addition on 𝑉1 × ⋯ × 𝑉𝑚 is defined by
(𝑢1 , …, 𝑢𝑚 ) + (𝑣1 , …, 𝑣𝑚 ) = (𝑢1 + 𝑣1 , …, 𝑢𝑚 + 𝑣𝑚 ).
• Scalar multiplication on 𝑉1 × ⋯ × 𝑉𝑚 is defined by
𝜆(𝑣1 , …, 𝑣𝑚 ) = (𝜆𝑣1 , …, 𝜆𝑣𝑚 ).
For example, in 𝒫2 (𝐑) × 𝐑3 we have 2(5 − 6𝑥 + 4𝑥2, (3, 8, 7)) = (10 − 12𝑥 + 8𝑥2, (6, 16, 14)).
The next result should be interpreted to mean that the product of vector spaces
is a vector space with the operations of addition and scalar multiplication as
defined by 3.87.
The proof of the result above is left to the reader. Note that the additive identity
of 𝑉1 × ⋯ × 𝑉𝑚 is (0, …, 0), where the 0 in the 𝑘 th slot is the additive identity of 𝑉𝑘 .
The additive inverse of (𝑣1 , …, 𝑣𝑚 ) ∈ 𝑉1 × ⋯ × 𝑉𝑚 is (−𝑣1 , …, −𝑣𝑚 ).
(𝑥1 , 𝑥2 , 𝑥3 , 𝑥4 , 𝑥5 ),
where 𝑥1 , 𝑥2 , 𝑥3 , 𝑥4 , 𝑥5 ∈ 𝐑 .
Although elements of 𝐑2 × 𝐑3 and 𝐑5 look similar, they are not the same kind
of object. Elements of 𝐑2 × 𝐑3 are lists of length two (with the first item itself a
list of length two and the second item a list of length three), and elements of 𝐑5
are lists of length five. Thus 𝐑2 × 𝐑3 does not equal 𝐑5.
The linear map
((𝑥1 , 𝑥2 ), (𝑥3 , 𝑥4 , 𝑥5 )) ↦ (𝑥1 , 𝑥2 , 𝑥3 , 𝑥4 , 𝑥5 )
is an isomorphism of the vector space 𝐑2 × 𝐑3 onto the vector space 𝐑5. Thus these two vector spaces are isomorphic, although they are not equal.
This isomorphism is so natural that we should think of it as a relabeling. Some people informally say that 𝐑2 × 𝐑3 equals 𝐑5, which is not technically correct but which captures the spirit of identification via relabeling.
The next example illustrates the idea that we will use in the proof of 3.92.
The list above is linearly independent and it spans 𝒫2 (𝐑) × 𝐑2. Thus it is a basis
of 𝒫2 (𝐑) × 𝐑2.
Proof Choose a basis of each 𝑉𝑘 . For each basis vector of each 𝑉𝑘 , consider the
element of 𝑉1 × ⋯ × 𝑉𝑚 that equals the basis vector in the 𝑘 th slot and 0 in the other
slots. The list of all such vectors is linearly independent and spans 𝑉1 × ⋯ × 𝑉𝑚 .
Thus it is a basis of 𝑉1 × ⋯ × 𝑉𝑚 . The length of this basis is dim 𝑉1 + ⋯ + dim 𝑉𝑚 ,
as desired.
98 Chapter 3 Linear Maps
Γ(𝑣1 , …, 𝑣𝑚 ) = 𝑣1 + ⋯ + 𝑣𝑚 .
Proof By 3.15, Γ is injective if and only if the only way to write 0 as a sum
𝑣1 + ⋯ + 𝑣𝑚 , where each 𝑣𝑘 is in 𝑉𝑘 , is by taking each 𝑣𝑘 equal to 0. Thus 1.45
shows that Γ is injective if and only if 𝑉1 + ⋯ + 𝑉𝑚 is a direct sum, as desired.
Quotient Spaces
We begin our approach to quotient spaces by defining the sum of a vector and a
subset.
3.95 notation: 𝑣 + 𝑈
𝑣 + 𝑈 = {𝑣 + 𝑢 ∶ 𝑢 ∈ 𝑈}.
(17, 20) + 𝑈
Suppose 𝑈 is a subspace of 𝑉. Then the quotient space 𝑉/𝑈 is the set of all
translates of 𝑈. Thus
𝑉/𝑈 = {𝑣 + 𝑈 ∶ 𝑣 ∈ 𝑉}.
• If 𝑈 = {(𝑥, 2𝑥) ∈ 𝐑2 ∶ 𝑥 ∈ 𝐑}, then 𝐑2/𝑈 is the set of all lines in 𝐑2 that have
slope 2.
• If 𝑈 is a line in 𝐑3 containing the origin, then 𝐑3/𝑈 is the set of all lines in 𝐑3
parallel to 𝑈.
• If 𝑈 is a plane in 𝐑3 containing the origin, then 𝐑3/𝑈 is the set of all planes in
𝐑3 parallel to 𝑈.
Our next goal is to make 𝑉/𝑈 into a vector space. To do this, we will need the
next result.
𝑣 − 𝑤 ∈ 𝑈 ⟺ 𝑣 + 𝑈 = 𝑤 + 𝑈 ⟺ (𝑣 + 𝑈) ∩ (𝑤 + 𝑈) ≠ ∅.
𝑣 + 𝑢 = 𝑤 + ((𝑣 − 𝑤) + 𝑢) ∈ 𝑤 + 𝑈.
𝑣 + 𝑢1 = 𝑤 + 𝑢2 .
(𝑣 + 𝑈) + (𝑤 + 𝑈) = (𝑣 + 𝑤) + 𝑈
𝜆(𝑣 + 𝑈) = (𝜆𝑣) + 𝑈
As part of the proof of the next result, we will show that the definitions above
make sense.
Proof The potential problem with the definitions above of addition and scalar
multiplication on 𝑉/𝑈 is that the representation of a translate of 𝑈 is not unique.
Specifically, suppose 𝑣1 , 𝑣2 , 𝑤1 , 𝑤2 ∈ 𝑉 are such that
𝑣1 + 𝑈 = 𝑣2 + 𝑈 and 𝑤1 + 𝑈 = 𝑤2 + 𝑈.
To show that the definition of addition on 𝑉/𝑈 given above makes sense, we must
show that (𝑣1 + 𝑤1 ) + 𝑈 = (𝑣2 + 𝑤2 ) + 𝑈.
By 3.101, we have
𝑣1 − 𝑣 2 ∈ 𝑈 and 𝑤1 − 𝑤2 ∈ 𝑈.
Because 𝑈 is a subspace of 𝑉 and thus is closed under addition, this implies that
(𝑣1 − 𝑣2 ) + (𝑤1 − 𝑤2 ) ∈ 𝑈. Thus (𝑣1 + 𝑤1 ) − (𝑣2 + 𝑤2 ) ∈ 𝑈. Using 3.101 again,
we see that
(𝑣1 + 𝑤1 ) + 𝑈 = (𝑣2 + 𝑤2 ) + 𝑈,
as desired. Thus the definition of addition on 𝑉/𝑈 makes sense.
Similarly, suppose 𝜆 ∈ 𝐅. We are still assuming that 𝑣1 + 𝑈 = 𝑣2 + 𝑈.
Because 𝑈 is a subspace of 𝑉 and thus is closed under scalar multiplication, we
have 𝜆(𝑣1 − 𝑣2 ) ∈ 𝑈. Thus 𝜆𝑣1 − 𝜆𝑣2 ∈ 𝑈. Hence 3.101 implies that
(𝜆𝑣1 ) + 𝑈 = (𝜆𝑣2 ) + 𝑈.
The reader should verify that 𝜋 is indeed a linear map. Although 𝜋 depends
on 𝑈 as well as 𝑉, these spaces are left out of the notation because they should be
clear from the context.
3.106 notation: 𝑇̃
Suppose 𝑇 ∈ ℒ(𝑉, 𝑊). Define 𝑇̃ ∶ 𝑉/(null 𝑇) → 𝑊 by
𝑇̃ (𝑣 + null 𝑇) = 𝑇𝑣.
To show that the definition of 𝑇̃ makes sense, suppose 𝑢, 𝑣 ∈ 𝑉 are such that 𝑢 + null 𝑇 = 𝑣 + null 𝑇. By 3.101, we have 𝑢 − 𝑣 ∈ null 𝑇. Thus 𝑇(𝑢 − 𝑣) = 0. Hence 𝑇𝑢 = 𝑇𝑣. Thus the definition of 𝑇̃ indeed makes sense. The routine verification that 𝑇̃ is a linear map from 𝑉/(null 𝑇) to 𝑊 is left to the reader.
The next result shows that we can think of 𝑇̃ as a modified version of 𝑇, with a domain that produces a one-to-one map.
Suppose 𝑇 ∈ ℒ(𝑉, 𝑊). Then
(a) 𝑇̃ ∘ 𝜋 = 𝑇;
(b) 𝑇̃ is injective;
(c) range 𝑇̃ = range 𝑇;
(d) 𝑉/(null 𝑇) and range 𝑇 are isomorphic vector spaces.
Proof
(a) If 𝑣 ∈ 𝑉, then (𝑇̃ ∘ 𝜋)(𝑣) = 𝑇̃ (𝜋(𝑣)) = 𝑇̃ (𝑣 + null 𝑇) = 𝑇𝑣, as desired.
(b) Suppose 𝑣 ∈ 𝑉 and 𝑇̃ (𝑣 + null 𝑇) = 0. Then 𝑇𝑣 = 0. Thus 𝑣 ∈ null 𝑇. Hence 3.101 implies that 𝑣 + null 𝑇 = 0 + null 𝑇. This implies that null 𝑇̃ = {0 + null 𝑇}. Hence 𝑇̃ is injective, as desired.
Exercises 3E
𝐴 = {(𝑥, 𝑦, 𝑧) ∈ 𝐑3 ∶ 2𝑥 + 3𝑦 + 5𝑧 = 𝑐}.
12 Suppose 𝑣1 , …, 𝑣𝑚 ∈ 𝑉. Let
3F Duality
Dual Space and Dual Map
Linear maps into the scalar field 𝐅 play a special role in linear algebra, and thus
they get a special name.
The vector space ℒ(𝑉, 𝐅) also gets a special name and special notation.
The dual space of 𝑉, denoted by 𝑉 ′, is the vector space of all linear functionals
on 𝑉. In other words, 𝑉 ′ = ℒ(𝑉, 𝐅).
dim 𝑉 ′ = dim 𝑉.
In the following definition, the linear map lemma (3.4) implies that each 𝜑𝑗 is
well defined.
Suppose 𝑣1 , …, 𝑣𝑛 is a basis of 𝑉. The dual basis of 𝑣1 , …, 𝑣𝑛 is the list 𝜑1 , …, 𝜑𝑛 of elements of 𝑉 ′, where each 𝜑𝑗 is the linear functional on 𝑉 such that
𝜑𝑗 (𝑣𝑘 ) = { 1 if 𝑘 = 𝑗,
           { 0 if 𝑘 ≠ 𝑗.
For example, the dual basis of the standard basis 𝑒1 , …, 𝑒𝑛 of 𝐅𝑛 is 𝜑1 , …, 𝜑𝑛 , where
𝜑𝑗 (𝑥1 , …, 𝑥𝑛 ) = 𝑥𝑗
for each (𝑥1 , …, 𝑥𝑛 ) ∈ 𝐅𝑛.
The next result shows that the dual basis of a basis of 𝑉 consists of the linear
functionals on 𝑉 that give the coefficients for expressing a vector in 𝑉 as a linear
combination of the basis vectors.
𝑣 = 𝜑1 (𝑣)𝑣1 + ⋯ + 𝜑𝑛 (𝑣)𝑣𝑛
for each 𝑣 ∈ 𝑉.
3.115 𝑣 = 𝑐1 𝑣1 + ⋯ + 𝑐𝑛 𝑣𝑛 .
If 𝑗 ∈ {1, …, 𝑛}, then applying 𝜑𝑗 to both sides of the equation above gives
𝜑𝑗 (𝑣) = 𝑐𝑗 .
Substituting the values for 𝑐1 , …, 𝑐𝑛 given by the equation above into 3.115 shows
that 𝑣 = 𝜑1 (𝑣)𝑣1 + ⋯ + 𝜑𝑛 (𝑣)𝑣𝑛 .
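A concrete sketch (not from the text; it assumes NumPy): if the basis 𝑣1 , …, 𝑣𝑛 of 𝐑𝑛 is given by the columns of an invertible matrix 𝐵, then the dual basis functionals are given by the rows of 𝐵−1, and the identity above says that the numbers 𝜑𝑗 (𝑣) are the coordinates of 𝑣 with respect to the basis. The particular basis below is hypothetical.

    import numpy as np

    B = np.array([[1.0, 1.0, 0.0],        # columns of B are v_1, v_2, v_3
                  [0.0, 1.0, 1.0],
                  [1.0, 0.0, 1.0]])
    B_inv = np.linalg.inv(B)

    def phi(j, v):
        # j-th dual basis functional: row j of B^{-1} applied to v
        return B_inv[j] @ v

    # phi_j(v_k) equals 1 if j = k and 0 otherwise
    for j in range(3):
        for k in range(3):
            assert np.isclose(phi(j, B[:, k]), 1.0 if j == k else 0.0)

    # v = phi_1(v) v_1 + ... + phi_n(v) v_n
    v = np.array([2.0, -1.0, 5.0])
    recombined = sum(phi(j, v) * B[:, j] for j in range(3))
    assert np.allclose(v, recombined)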
The next result shows that the dual basis is indeed a basis of the dual space.
Thus the terminology “dual basis” is justified.
3.117 𝑎1 𝜑1 + ⋯ + 𝑎𝑛 𝜑𝑛 = 0.
Now
(𝑎1 𝜑1 + ⋯ + 𝑎𝑛 𝜑𝑛 )(𝑣𝑘 ) = 𝑎𝑘
for each 𝑘 = 1, …, 𝑛. Thus 3.117 shows that 𝑎1 = ⋯ = 𝑎𝑛 = 0. Hence 𝜑1 , …, 𝜑𝑛
is linearly independent.
Because 𝜑1 , …, 𝜑𝑛 is a linearly independent list in 𝑉 ′ whose length equals
dim 𝑉 ′ (by 3.111), we can conclude that 𝜑1 , …, 𝜑𝑛 is a basis of 𝑉 ′ (see 2.38).
Suppose 𝑇 ∈ ℒ(𝑉, 𝑊). The dual map of 𝑇 is the linear map 𝑇 ′ ∈ ℒ(𝑊 ′, 𝑉 ′ )
defined for each 𝜑 ∈ 𝑊 ′ by
𝑇 ′(𝜑) = 𝜑 ∘ 𝑇.
• If 𝜆 ∈ 𝐅 and 𝜑 ∈ 𝑊 ′, then
𝑇 ′(𝜆𝜑) = (𝜆𝜑) ∘ 𝑇 = 𝜆(𝜑 ∘ 𝑇) = 𝜆 𝑇 ′(𝜑).
The prime notation appears with two unrelated meanings in the next example:
𝐷′ denotes the dual of the linear map 𝐷, and 𝑝′ denotes the derivative of a
polynomial 𝑝.
(𝐷′(𝜑))(𝑝) = (𝜑 ∘ 𝐷)(𝑝)
= 𝜑(𝐷𝑝)
= 𝜑(𝑝′ )
1
= ∫ 𝑝′
0
= 𝑝(1) − 𝑝(0).
Thus 𝐷′(𝜑) is the linear functional on 𝒫(𝐑) taking 𝑝 to 𝑝(1) − 𝑝(0).
In the next result, (a) and (b) imply that the function that takes 𝑇 to 𝑇 ′ is a
linear map from ℒ(𝑉, 𝑊) to ℒ(𝑊 ′, 𝑉 ′ ).
In (c) below, note the reversal of order from 𝑆𝑇 on the left to 𝑇 ′ 𝑆′ on the right.
Proof The proofs of (a) and (b) are left to the reader.
To prove (c), suppose 𝜑 ∈ 𝑈 ′. Then
(𝑆𝑇)′(𝜑) = 𝜑 ∘ (𝑆𝑇) = (𝜑 ∘ 𝑆) ∘ 𝑇 = 𝑇 ′(𝜑 ∘ 𝑆) = 𝑇 ′ (𝑆′(𝜑)) = (𝑇 ′ 𝑆′ )(𝜑),
where the first, third, and fourth equalities above hold because of the definition of the dual map, the second equality holds because composition of functions is associative, and the last equality follows from the definition of composition. The equation above shows that (𝑆𝑇)′(𝜑) = (𝑇 ′ 𝑆′ )(𝜑) for all 𝜑 ∈ 𝑈 ′. Thus (𝑆𝑇)′ = 𝑇 ′ 𝑆′.
Some books use the notation 𝑉 ∗ and 𝑇 ∗ for duality instead of 𝑉 ′ and 𝑇 ′. However, here we reserve the notation 𝑇 ∗ for the adjoint, which will be introduced when we study linear maps on inner product spaces in Chapter 7.
Proof Note that 0 ∈ 𝑈 0 (here 0 is the zero linear functional on 𝑉) because the zero linear functional applied to every vector in 𝑈 equals 0.
Suppose 𝜑, 𝜓 ∈ 𝑈 0. Thus 𝜑, 𝜓 ∈ 𝑉 ′ and 𝜑(𝑢) = 𝜓(𝑢) = 0 for every 𝑢 ∈ 𝑈.
If 𝑢 ∈ 𝑈, then
(𝜑 + 𝜓)(𝑢) = 𝜑(𝑢) + 𝜓(𝑢) = 0 + 0 = 0.
Thus 𝜑 + 𝜓 ∈ 𝑈 0.
Similarly, 𝑈 0 is closed under scalar multiplication. Thus 1.34 implies that 𝑈 0
is a subspace of 𝑉 ′.
The next result shows that dim 𝑈 0 is the difference of dim 𝑉 and dim 𝑈. For
example, this shows that if 𝑈 is a two-dimensional subspace of 𝐑5, then 𝑈 0 is a three-dimensional subspace of (𝐑5 )′, as in Example 3.123.
The next result can be proved following the pattern of Example 3.123: choose
a basis 𝑢1 , …, 𝑢𝑚 of 𝑈, extend to a basis 𝑢1 , …, 𝑢𝑚 , …, 𝑢𝑛 of 𝑉, let 𝜑1 , …, 𝜑𝑚 , …, 𝜑𝑛
be the dual basis of 𝑉 ′, and then show that 𝜑𝑚 + 1 , …, 𝜑𝑛 is a basis of 𝑈 0, which
implies the desired result. You should construct the proof just outlined, even
though a slicker proof is presented here.
Proof Let 𝑖 ∈ ℒ(𝑈, 𝑉) be the inclusion map defined by 𝑖(𝑢) = 𝑢 for each 𝑢 ∈ 𝑈.
Thus 𝑖′ is a linear map from 𝑉 ′ to 𝑈 ′. The fundamental theorem of linear maps
(3.21) applied to 𝑖′ shows that
However, null 𝑖′ = 𝑈 0 (as can be seen by thinking about the definitions) and
dim 𝑉 ′ = dim 𝑉 (by 3.111), so we can rewrite the equation above as
and then 3.126 becomes the equation dim 𝑈 + dim 𝑈 0 = dim 𝑉, as desired.
The next result can be a useful tool to show that a subspace is as big as
possible—see (a)—or to show that a subspace is as small as possible—see (b).
3.127 condition for the annihilator to equal {0} or the whole space
The proof of (a) in the next result does not use the hypothesis that 𝑉 and 𝑊
are finite-dimensional.
Proof
(a) First suppose 𝜑 ∈ null 𝑇 ′. Thus 0 = 𝑇 ′(𝜑) = 𝜑 ∘ 𝑇. Hence
0 = (𝜑 ∘ 𝑇)(𝑣) = 𝜑(𝑇𝑣) for every 𝑣 ∈ 𝑉.
Thus 𝜑 ∈ (range 𝑇)0. This implies that null 𝑇 ′ ⊆ (range 𝑇)0.
To prove the inclusion in the opposite direction, now suppose 𝜑 ∈ (range 𝑇)0.
Thus 𝜑(𝑇𝑣) = 0 for every vector 𝑣 ∈ 𝑉. Hence 0 = 𝜑 ∘ 𝑇 = 𝑇 ′(𝜑). In other
words, 𝜑 ∈ null 𝑇 ′, which shows that (range 𝑇)0 ⊆ null 𝑇 ′, completing the
proof of (a).
(b) We have
dim null 𝑇 ′ = dim(range 𝑇)0
= dim 𝑊 − dim range 𝑇
= dim 𝑊 − (dim 𝑉 − dim null 𝑇)
= dim null 𝑇 + dim 𝑊 − dim 𝑉,
where the first equality comes from (a), the second equality comes from
3.125, and the third equality comes from the fundamental theorem of linear
maps (3.21).
The next result can be useful because sometimes it is easier to verify that 𝑇 ′
is injective than to show directly that 𝑇 is surjective.
𝑇 is surjective ⟺ 𝑇 ′ is injective.
Proof We have
𝑇 ∈ ℒ(𝑉, 𝑊) is surjective ⟺ range 𝑇 = 𝑊
⟺ (range 𝑇)0 = {0}
⟺ null 𝑇 ′ = {0}
⟺ 𝑇 ′ is injective,
where the second equivalence comes from 3.127(a) and the third equivalence
comes from 3.128(a).
Proof
(a) We have
dim range 𝑇 ′ = dim 𝑊 ′ − dim null 𝑇 ′
= dim 𝑊 − dim(range 𝑇)0
= dim range 𝑇,
where the first equality comes from 3.21, the second equality comes from
3.111 and 3.128(a), and the third equality comes from 3.125.
(b) First suppose 𝜑 ∈ range 𝑇 ′. Thus there exists 𝜓 ∈ 𝑊 ′ such that 𝜑 = 𝑇 ′(𝜓).
If 𝑣 ∈ null 𝑇, then
𝜑(𝑣) = (𝑇 ′(𝜓))𝑣 = (𝜓 ∘ 𝑇)(𝑣) = 𝜓(𝑇𝑣) = 𝜓(0) = 0.
Hence 𝜑 ∈ (null 𝑇)0. This implies that range 𝑇 ′ ⊆ (null 𝑇)0.
We will complete the proof by showing that range 𝑇 ′ and (null 𝑇)0 have the
same dimension. To do this, note that
dim range 𝑇 ′ = dim range 𝑇
= dim 𝑉 − dim null 𝑇
= dim(null 𝑇)0,
where the first equality comes from (a), the second equality comes from 3.21,
and the third equality comes from 3.125.
𝑇 is injective ⟺ 𝑇 ′ is surjective.
Proof We have
𝑇 is injective ⟺ null 𝑇 = {0}
⟺ (null 𝑇)0 = 𝑉 ′
⟺ range 𝑇 ′ = 𝑉 ′,
where the second equivalence follows from 3.127(b) and the third equivalence
follows from 3.130(b).
= 𝐶 𝑘, 𝑗 .
We also have
(𝜓𝑗 ∘ 𝑇)(𝑣𝑘 ) = 𝜓𝑗 (𝑇𝑣𝑘 )
𝑚
= 𝜓𝑗 ( ∑ 𝐴𝑟, 𝑘 𝑤𝑟 )
𝑟=1
𝑚
= ∑ 𝐴𝑟, 𝑘 𝜓𝑗 (𝑤𝑟 )
𝑟=1
= 𝐴𝑗, 𝑘 .
Comparing the last line of the last two sets of equations, we have 𝐶𝑘, 𝑗 = 𝐴𝑗, 𝑘 .
Thus 𝐶 = 𝐴t. In other words, ℳ(𝑇 ′ ) = (ℳ(𝑇))t, as desired.
Now we use duality to give an alternative proof that the column rank of a
matrix equals the row rank of the matrix. This result was previously proved using
different tools—see 3.57.
Suppose 𝐴 ∈ 𝐅𝑚, 𝑛. Then the column rank of 𝐴 equals the row rank of 𝐴.
See Exercise 8 in Section 7A for another alternative proof of the result above.
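As a quick numerical illustration of this result (not from the text; it assumes NumPy, and the random matrix is hypothetical), the rank of a matrix and the rank of its transpose agree, because the row rank of 𝐴 is the column rank of the transpose of 𝐴.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 8))
    assert np.linalg.matrix_rank(A) == np.linalg.matrix_rank(A.T)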
Exercises 3F
𝑇𝑣 = 𝜑1 (𝑣)𝑤1 + ⋯ + 𝜑𝑚 (𝑣)𝑤𝑚 ,
𝜑𝑘 (𝑝) = 𝑝(𝑘) (0)/𝑘! .
Here 𝑝(𝑘) denotes the 𝑘 th derivative of 𝑝, with the understanding that the 0th
derivative of 𝑝 is 𝑝.
13 Show that the dual map of the identity operator on 𝑉 is the identity operator
on 𝑉 ′.
14 Define 𝑇 ∶ 𝐑3 → 𝐑2 by
𝑇(𝑥, 𝑦, 𝑧) = (4𝑥 + 5𝑦 + 6𝑧, 7𝑥 + 8𝑦 + 9𝑧).
for each 𝑥 ∈ 𝐑 .
(a) Suppose 𝜑 ∈ 𝒫(𝐑)′ is defined by 𝜑(𝑝) = 𝑝′(4). Describe the linear
functional 𝑇 ′(𝜑) on 𝒫(𝐑).
(b) Suppose 𝜑 ∈ 𝒫(𝐑)′ is defined by 𝜑(𝑝) = ∫01 𝑝. Evaluate (𝑇 ′(𝜑))(𝑥3 ).
(Λ𝑣)(𝜑) = 𝜑(𝑣)
Chapter 4
Polynomials

• 𝐅 denotes 𝐑 or 𝐂 .
𝑧 = Re 𝑧 + (Im 𝑧)𝑖.
Suppose 𝑧 ∈ 𝐂 .
• The complex conjugate of 𝑧 ∈ 𝐂 , denoted by 𝑧̄ , is defined by
𝑧̄ = Re 𝑧 − (Im 𝑧)𝑖.
• The absolute value of a complex number 𝑧, denoted by |𝑧|, is defined by
|𝑧| = √((Re 𝑧)2 + (Im 𝑧)2 ).
4.3 example: real and imaginary part, complex conjugate, absolute value
Suppose 𝑧 = 3 + 2𝑖. Then
• Re 𝑧 = 3 and Im 𝑧 = 2;
• 𝑧̄ = 3 − 2𝑖;
• |𝑧| = √32 + 22 = √13.
Geometric interpretation of triangle inequality: the length of each side of a triangle is less than or equal to the sum of the lengths of the two other sides.
Proof Except for the last item above, the routine verifications of the assertions above are left to the reader. To verify the triangle inequality, we have
|𝑤 + 𝑧|2 = (𝑤 + 𝑧)(𝑤̄ + 𝑧̄ )
         = 𝑤𝑤̄ + 𝑧𝑧̄ + 𝑤𝑧̄ + 𝑧𝑤̄
         = |𝑤|2 + |𝑧|2 + 𝑤𝑧̄ + 𝑧𝑤̄
         = |𝑤|2 + |𝑧|2 + 2 Re(𝑤𝑧̄ )
         ≤ |𝑤|2 + |𝑧|2 + 2∣𝑤𝑧̄ ∣
         = |𝑤|2 + |𝑧|2 + 2|𝑤| |𝑧|
         = (|𝑤| + |𝑧|)2.
Taking square roots now gives the desired inequality |𝑤 + 𝑧| ≤ |𝑤| + |𝑧|.
See Exercise 2 for the reverse triangle inequality.
Zeros of Polynomials
Recall that a function 𝑝 ∶ 𝐅 → 𝐅 is called a polynomial of degree 𝑚 if there exist
𝑎0 , …, 𝑎𝑚 ∈ 𝐅 with 𝑎𝑚 ≠ 0 such that
𝑝(𝑧) = 𝑎0 + 𝑎1 𝑧 + ⋯ + 𝑎𝑚 𝑧𝑚
for all 𝑧 ∈ 𝐅. A polynomial could have more than one degree if the representation
of 𝑝 in the form above were not unique. Our first task is to show that this cannot
happen.
The solutions to the equation 𝑝(𝑧) = 0 play a crucial role in the study of a
polynomial 𝑝 ∈ 𝒫(𝐅). Thus these solutions have a special name.
𝑝(𝜆) = 0.
The next result is the key tool that we will use to show that the degree of a
polynomial is unique.
𝑝(𝑧) = (𝑧 − 𝜆)𝑞(𝑧)
for every 𝑧 ∈ 𝐅.
Now we can prove that polynomials do not have too many zeros.
The result above implies that the coefficients of a polynomial are uniquely
determined (because if a polynomial had two different sets of coefficients, then
subtracting the two representations of the polynomial would give a polynomial
with some nonzero coefficients but infinitely many zeros). In particular, the degree
of a polynomial is uniquely defined.
Recall that the degree of the 0 polynomial is defined to be −∞. When necessary, use the expected arithmetic with −∞. For example, −∞ < 𝑚 and −∞ + 𝑚 = −∞ for every integer 𝑚.
The 0 polynomial is declared to have degree −∞ so that exceptions are not needed for various reasonable results such as deg(𝑝𝑞) = deg 𝑝 + deg 𝑞.
Proof Let 𝑛 = deg 𝑝 and let 𝑚 = deg 𝑠. If 𝑛 < 𝑚, then take 𝑞 = 0 and 𝑟 = 𝑝 to
get the desired equation 𝑝 = 𝑠𝑞 + 𝑟 with deg 𝑟 < deg 𝑠. Thus we now assume that
𝑛 ≥ 𝑚.
The list
4.10 1, 𝑧, …, 𝑧𝑚 − 1 , 𝑠, 𝑧𝑠, …, 𝑧𝑛 − 𝑚 𝑠
is linearly independent in 𝒫𝑛 (𝐅) because each polynomial in this list has a different
degree. Also, the list 4.10 has length 𝑛 + 1, which equals dim 𝒫𝑛 (𝐅). Hence 4.10
is a basis of 𝒫𝑛 (𝐅) [by 2.38].
Because 𝑝 ∈ 𝒫𝑛 (𝐅) and 4.10 is a basis of 𝒫𝑛 (𝐅), there exist unique constants
𝑎0 , 𝑎1 , …, 𝑎𝑚 − 1 ∈ 𝐅 and 𝑏0 , 𝑏1 , …, 𝑏𝑛 − 𝑚 ∈ 𝐅 such that
4.11 𝑝 = 𝑎0 + 𝑎1 𝑧 + ⋯ + 𝑎𝑚 − 1 𝑧𝑚 − 1 + 𝑏0 𝑠 + 𝑏1 𝑧𝑠 + ⋯ + 𝑏𝑛 − 𝑚 𝑧𝑛 − 𝑚 𝑠
= (𝑎0 + 𝑎1 𝑧 + ⋯ + 𝑎𝑚 − 1 𝑧𝑚 − 1 ) + 𝑠(𝑏0 + 𝑏1 𝑧 + ⋯ + 𝑏𝑛 − 𝑚 𝑧𝑛 − 𝑚 ).
Take 𝑟 to be the first expression in parentheses above and 𝑞 to be the expression in the second pair of parentheses. Then 𝑝 = 𝑠𝑞 + 𝑟 and deg 𝑟 < deg 𝑠, as desired.
other proofs of the fundamental theorem of algebra. The proof using Liouville’s
theorem is particularly nice if you are comfortable with analytic functions. All
proofs of the fundamental theorem of algebra need to use some analysis, because
the result is not true if 𝐂 is replaced, for example, with the set of numbers of the
form 𝑐 + 𝑑𝑖 where 𝑐, 𝑑 are rational numbers.
Proof De Moivre’s theorem, which you can prove using induction on 𝑘 and the
addition formulas for cosine and sine, states that if 𝑘 is a positive integer and
𝜃 ∈ 𝐑 , then
(cos 𝜃 + 𝑖 sin 𝜃)𝑘 = cos 𝑘𝜃 + 𝑖 sin 𝑘𝜃.
Suppose 𝑤 ∈ 𝐂 and 𝑘 is a positive integer. Using polar coordinates, we know
that there exist 𝑟 ≥ 0 and 𝜃 ∈ 𝐑 such that
𝑟(cos 𝜃 + 𝑖 sin 𝜃) = 𝑤.
De Moivre’s theorem implies that
(𝑟1/𝑘 (cos(𝜃/𝑘) + 𝑖 sin(𝜃/𝑘)))𝑘 = 𝑤.
Thus every complex number has a 𝑘 th root, a fact that we will soon use.
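A sketch of the 𝑘 th-root construction just described (not from the text; it assumes Python's standard cmath module): write 𝑤 in polar form, take the 𝑘 th root of the modulus, divide the angle by 𝑘, and check that raising the result to the 𝑘 th power recovers 𝑤.

    import cmath

    def kth_root(w, k):
        r, theta = cmath.polar(w)   # w = r (cos theta + i sin theta)
        # De Moivre: (r^(1/k) (cos(theta/k) + i sin(theta/k)))^k = w
        return (r ** (1.0 / k)) * cmath.rect(1.0, theta / k)

    w = 3 - 4j
    z = kth_root(w, 5)
    assert abs(z ** 5 - w) < 1e-9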
Suppose 𝑝 is a nonconstant polynomial with complex coefficients and highest-
order nonzero term 𝑐𝑚 𝑧𝑚. Then |𝑝(𝑧)| → ∞ as |𝑧| → ∞ (because |𝑝(𝑧)|/∣𝑧𝑚 ∣ → |𝑐𝑚 |
as |𝑧| → ∞). Thus the continuous function 𝑧 ↦ |𝑝(𝑧)| has a global minimum at
some point 𝜁 ∈ 𝐂 . To show that 𝑝(𝜁 ) = 0, suppose that 𝑝(𝜁 ) ≠ 0.
Define a new polynomial 𝑞 by
𝑞(𝑧) = 𝑝(𝑧 + 𝜁 )/𝑝(𝜁 ).
The function 𝑧 ↦ |𝑞(𝑧)| has a global minimum value of 1 at 𝑧 = 0. Write
𝑞(𝑧) = 1 + 𝑎𝑘 𝑧𝑘 + ⋯ + 𝑎𝑚 𝑧𝑚,
where 𝑘 is the smallest positive integer such that the coefficient of 𝑧𝑘 is nonzero;
in other words, 𝑎𝑘 ≠ 0.
Let 𝛽 ∈ 𝐂 be such that 𝛽𝑘 = −1/𝑎𝑘 . There is a constant 𝑐 > 1 such that if 𝑡 ∈ (0, 1), then
|𝑞(𝑡𝛽)| ≤ ∣1 + 𝑎𝑘 𝑡𝑘 𝛽𝑘 ∣ + 𝑡𝑘 + 1 𝑐
= 1 − 𝑡𝑘 (1 − 𝑡𝑐).
Thus taking 𝑡 to be 1/(2𝑐) in the inequality above, we have |𝑞(𝑡𝛽)| < 1, which
contradicts the assumption that the global minimum of 𝑧 ↦ |𝑞(𝑧)| is 1. This
contradiction implies that 𝑝(𝜁 ) = 0, showing that 𝑝 has a zero, as desired.
where 𝑐, 𝜆1 , …, 𝜆𝑚 ∈ 𝐂 .
Proof Let
𝑝(𝑧) = 𝑎0 + 𝑎1 𝑧 + ⋯ + 𝑎𝑚 𝑧𝑚,
where 𝑎0 , …, 𝑎𝑚 are real numbers. Suppose 𝜆 ∈ 𝐂 is a zero of 𝑝. Then
𝑎0 + 𝑎1 𝜆 + ⋯ + 𝑎𝑚 𝜆𝑚 = 0.
Take the complex conjugate of both sides of the equation above, getting
𝑎0 + 𝑎1 𝜆̄ + ⋯ + 𝑎𝑚 𝜆̄ 𝑚 = 0,
where we have used basic properties of the complex conjugate (see 4.4) and the fact that the coefficients 𝑎0 , …, 𝑎𝑚 are real. The equation above shows that 𝜆̄ is a zero of 𝑝.
𝑥2 + 𝑏𝑥 + 𝑐 = (𝑥 − 𝜆1 )(𝑥 − 𝜆2 )
Conversely, now suppose 𝑏2 ≥ 4𝑐. Then there is a real number 𝑑 such that 𝑑2 = 𝑏2/4 − 𝑐. From the displayed equation above, we have
𝑥2 + 𝑏𝑥 + 𝑐 = (𝑥 + 𝑏/2)2 − 𝑑2
            = (𝑥 + 𝑏/2 + 𝑑)(𝑥 + 𝑏/2 − 𝑑),
which gives the desired factorization.
Proof First we will prove that the desired factorization exists, and after that we
will prove the uniqueness.
Think of 𝑝 as an element of 𝒫(𝐂). If all (complex) zeros of 𝑝 are real, then
we have the desired factorization by 4.13. Thus suppose 𝑝 has a zero 𝜆 ∈ 𝐂 with
𝜆 ∉ 𝐑 . By 4.14, 𝜆̄ is also a zero of 𝑝. Thus we can write
𝑝(𝑥) = (𝑥 − 𝜆)(𝑥 − 𝜆̄ )𝑞(𝑥) = (𝑥2 − 2(Re 𝜆)𝑥 + |𝜆|2 )𝑞(𝑥)
for some polynomial 𝑞 ∈ 𝒫(𝐂) of degree two less than the degree of 𝑝. If we
can prove that 𝑞 has real coefficients, then using induction on the degree of 𝑝
completes the proof of the existence part of this result.
To prove that 𝑞 has real coefficients, we solve the equation above for 𝑞, getting
𝑞(𝑥) = 𝑝(𝑥) / (𝑥2 − 2(Re 𝜆)𝑥 + |𝜆|2 )
for all 𝑥 ∈ 𝐑 . The equation above implies that 𝑞(𝑥) ∈ 𝐑 for all 𝑥 ∈ 𝐑 . Writing
𝑞(𝑥) = 𝑎0 + 𝑎1 𝑥 + ⋯ + 𝑎𝑛 − 2 𝑥𝑛 − 2,
where 𝑎0 , …, 𝑎𝑛 − 2 ∈ 𝐂 , we have 0 = Im 𝑞(𝑥) = (Im 𝑎0 ) + (Im 𝑎1 )𝑥 + ⋯ + (Im 𝑎𝑛 − 2 )𝑥𝑛 − 2 for all 𝑥 ∈ 𝐑 . This implies that Im 𝑎0 , …, Im 𝑎𝑛 − 2 all equal 0 (by 4.8). Thus all
coefficients of 𝑞 are real, as desired. Hence the desired factorization exists.
Now we turn to the question of uniqueness of our factorization. A factor of 𝑝
of the form 𝑥2 + 𝑏𝑘 𝑥 + 𝑐𝑘 with 𝑏𝑘2 < 4𝑐𝑘 can be uniquely written as (𝑥 − 𝜆𝑘 )(𝑥 − 𝜆̄ 𝑘 )
with 𝜆𝑘 ∈ 𝐂 . A moment’s thought shows that two different factorizations of 𝑝 as
an element of 𝒫(𝐑) would lead to two different factorizations of 𝑝 as an element
of 𝒫(𝐂), contradicting 4.13.
Exercises 4
8 Suppose 𝑝 ∈ 𝒫(𝐂) has degree 𝑚. Prove that 𝑝 has 𝑚 distinct zeros if and
only if 𝑝 and its derivative 𝑝′ have no zeros in common.
9 Prove that every polynomial of odd degree with real coefficients has a real
zero.
10 For 𝑝 ∈ 𝒫(𝐑), define 𝑇𝑝 ∶ 𝐑 → 𝐑 by
⎧ 𝑝(𝑥) − 𝑝(3)
{
{ if 𝑥 ≠ 3,
(𝑇𝑝)(𝑥) = ⎨ 𝑥 − 3
{
{𝑝′(3)
⎩ if 𝑥 = 3
for each 𝑥 ∈ 𝐑 . Show that 𝑇𝑝 ∈ 𝒫(𝐑) for every polynomial 𝑝 ∈ 𝒫(𝐑) and
also show that 𝑇 ∶ 𝒫(𝐑) → 𝒫(𝐑) is a linear map.
11 Suppose 𝑝 ∈ 𝒫(𝐂). Define 𝑞 ∶ 𝐂 → 𝐂 by letting 𝑞(𝑧) equal 𝑝(𝑧) times the complex conjugate of 𝑝(𝑧̄ ).
Prove that 𝑞 is a polynomial with real coefficients.
𝑟𝑝 + 𝑠𝑞 = 1.
𝑇(𝑟, 𝑠) = 𝑟𝑝 + 𝑠𝑞.
Chapter 5
Eigenvalues and Eigenvectors
Linear maps from one vector space to another vector space were the objects of
study in Chapter 3. Now we begin our investigation of operators, which are linear
maps from a vector space to itself. Their study constitutes the most important
part of linear algebra.
To learn about an operator, we might try restricting it to a smaller subspace.
Asking for that restriction to be an operator will lead us to the notion of invariant
subspaces. Each one-dimensional invariant subspace arises from a vector that
the operator maps into a scalar multiple of the vector. This path will lead us to
eigenvectors and eigenvalues.
We will then prove one of the most important results in linear algebra: every
operator on a finite-dimensional nonzero complex vector space has an eigenvalue.
This result will allow us to show that for each operator on a finite-dimensional
complex vector space, there is a basis of the vector space with respect to which
the matrix of the operator has at least almost half its entries equal to 0.
• 𝐅 denotes 𝐑 or 𝐂 .
• 𝑉 denotes a vector space over 𝐅.
5A Invariant Subspaces
Eigenvalues
Must an operator 𝑇 ∈ ℒ(𝑉) have any invariant subspaces other than {0}
and 𝑉? Later we will see that this question has an affirmative answer if 𝑉 is
finite-dimensional and dim 𝑉 > 1 (for 𝐅 = 𝐂 ) or dim 𝑉 > 2 (for 𝐅 = 𝐑); see
5.19 and Exercise 29 in Section 5B.
The previous example noted that null 𝑇 and range 𝑇 are invariant under 𝑇.
However, these subspaces do not necessarily provide easy answers to the question
above about the existence of invariant subspaces other than {0} and 𝑉, because
null 𝑇 may equal {0} and range 𝑇 may equal 𝑉 (this happens when 𝑇 is invertible).
We will return later to a deeper study of invariant subspaces. Now we turn to
an investigation of the simplest possible nontrivial invariant subspaces—invariant
subspaces of dimension one.
Take any 𝑣 ∈ 𝑉 with 𝑣 ≠ 0 and let 𝑈 equal the set of all scalar multiples of 𝑣:
𝑈 = {𝜆𝑣 ∶ 𝜆 ∈ 𝐅} = span(𝑣).
for (𝑥, 𝑦, 𝑧) ∈ 𝐅3. Then 𝑇(3, 1, −1) = (18, 6, −6) = 6(3, 1, −1). Thus 6 is an
eigenvalue of 𝑇.
The equivalences in the next result, along with many deep results in linear
algebra, are valid only in the context of finite-dimensional vector spaces.
Proof Conditions (a) and (b) are equivalent because the equation 𝑇𝑣 = 𝜆𝑣
is equivalent to the equation (𝑇 − 𝜆𝐼)𝑣 = 0. Conditions (b), (c), and (d) are
equivalent by 3.65.
Now 𝑧 cannot equal 0 [otherwise 5.10 implies that 𝑤 = 0; we are looking for
solutions to 5.10 such that (𝑤, 𝑧) is not the 0 vector], so the equation above
leads to the equation
−1 = 𝜆2.
The solutions to this equation are 𝜆 = 𝑖 and 𝜆 = −𝑖.
You can verify that 𝑖 and −𝑖 are eigenvalues of 𝑇. Indeed, the eigenvectors
corresponding to the eigenvalue 𝑖 are the vectors of the form (𝑤, −𝑤𝑖), with
𝑤 ∈ 𝐂 and 𝑤 ≠ 0. Furthermore, the eigenvectors corresponding to the
eigenvalue −𝑖 are the vectors of the form (𝑤, 𝑤𝑖), with 𝑤 ∈ 𝐂 and 𝑤 ≠ 0.
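The eigenvalues found above can be confirmed numerically. The sketch below is not from the text; it assumes NumPy and uses the matrix, with respect to the standard basis of 𝐂2, of the operator (𝑤, 𝑧) ↦ (−𝑧, 𝑤), which is the operator consistent with the eigenvectors described above.

    import numpy as np

    M = np.array([[0.0, -1.0],
                  [1.0, 0.0]])    # matrix of the operator w.r.t. the standard basis

    eigenvalues, _ = np.linalg.eig(M)
    assert np.allclose(sorted(eigenvalues, key=lambda z: z.imag), [-1j, 1j])

    # (w, -w i) is an eigenvector corresponding to the eigenvalue i
    w = 2.0 + 1.0j
    v = np.array([w, -w * 1j])
    assert np.allclose(M @ v, 1j * v)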
In the next proof, we again use the equivalence
𝑇𝑣 = 𝜆𝑣 ⟺ (𝑇 − 𝜆𝐼)𝑣 = 0.
Proof Suppose the desired result is false. Then there exists a smallest positive
integer 𝑚 such that there exists a linearly dependent list 𝑣1 , …, 𝑣𝑚 of eigenvectors
of 𝑇 corresponding to distinct eigenvalues 𝜆1 , …, 𝜆𝑚 of 𝑇 (note that 𝑚 ≥ 2 because
an eigenvector is, by definition, nonzero). Thus there exist 𝑎1 , …, 𝑎𝑚 ∈ 𝐅, none of
which are 0 (because of the minimality of 𝑚), such that
𝑎1 𝑣1 + ⋯ + 𝑎𝑚 𝑣𝑚 = 0.
Apply 𝑇 − 𝜆𝑚 𝐼 to both sides of the equation above, getting
𝑎1 (𝜆1 − 𝜆𝑚 )𝑣1 + ⋯ + 𝑎𝑚 − 1 (𝜆𝑚 − 1 − 𝜆𝑚 )𝑣𝑚 − 1 = 0.
Because the eigenvalues 𝜆1 , …, 𝜆𝑚 are distinct, none of the coefficients above
equal 0. Thus 𝑣1 , …, 𝑣𝑚 − 1 is a linearly dependent list of 𝑚 − 1 eigenvectors of 𝑇
corresponding to distinct eigenvalues, contradicting the minimality of 𝑚. This
contradiction completes the proof.
The result above leads to a short proof of the result below, which puts an upper
bound on the number of distinct eigenvalues that an operator can have.
5.12 operator cannot have more eigenvalues than dimension of vector space
5.13 notation: 𝑇 𝑚
where 𝑚 and 𝑛 are arbitrary integers if 𝑇 is invertible and are nonnegative integers
if 𝑇 is not invertible.
Having defined powers of an operator, we can now define what it means to
apply a polynomial to an operator.
𝑝(𝑧) = 𝑎0 + 𝑎1 𝑧 + 𝑎2 𝑧2 + ⋯ + 𝑎𝑚 𝑧𝑚
𝑝(𝑇) = 𝑎0 𝐼 + 𝑎1 𝑇 + 𝑎2 𝑇 2 + ⋯ + 𝑎𝑚 𝑇 𝑚.
This is a new use of the symbol 𝑝 because we are applying 𝑝 to operators, not
just elements of 𝐅. The idea here is that to evaluate 𝑝(𝑇), we simply replace 𝑧 with
𝑇 in the expression defining 𝑝. Note that the constant term 𝑎0 in 𝑝(𝑧) becomes the
operator 𝑎0 𝐼 (which is a reasonable choice because 𝑎0 = 𝑎0 𝑧0 and thus we should
replace 𝑎0 with 𝑎0 𝑇 0, which equals 𝑎0 𝐼).
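As a concrete illustration of this definition (a small sketch in Python, not part of the text; the 2-by-2 matrix below is an arbitrary example), 𝑝(𝑇) can be computed by accumulating powers of the matrix of 𝑇 with respect to some basis:

```python
import numpy as np

def poly_of_operator(coeffs, T):
    """Return p(T) = a0 I + a1 T + ... + am T^m, where coeffs = [a0, a1, ..., am]."""
    n = T.shape[0]
    result = np.zeros((n, n))
    power = np.eye(n)            # T^0 = I
    for a in coeffs:
        result = result + a * power
        power = power @ T        # next power of T
    return result

# p(z) = 3 - 6z + z^5 applied to an arbitrary 2-by-2 matrix
T = np.array([[2.0, 1.0],
              [0.0, 3.0]])
print(poly_of_operator([3, -6, 0, 0, 0, 1], T))
```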
If we fix an operator 𝑇 ∈ ℒ(𝑉), then the function from 𝒫(𝐅) to ℒ(𝑉) given
by 𝑝 ↦ 𝑝(𝑇) is linear, as you should verify.
(𝑝𝑞)(𝑧) = 𝑝(𝑧)𝑞(𝑧)
for all 𝑧 ∈ 𝐅.
Proof
(a) Suppose 𝑝(𝑧) = ∑_{𝑗=0}^{𝑚} 𝑎𝑗 𝑧^𝑗 and 𝑞(𝑧) = ∑_{𝑘=0}^{𝑛} 𝑏𝑘 𝑧^𝑘 for all 𝑧 ∈ 𝐅. Then
(𝑝𝑞)(𝑧) = ∑_{𝑗=0}^{𝑚} ∑_{𝑘=0}^{𝑛} 𝑎𝑗 𝑏𝑘 𝑧^{𝑗+𝑘}.
Thus
(𝑝𝑞)(𝑇) = ∑_{𝑗=0}^{𝑚} ∑_{𝑘=0}^{𝑛} 𝑎𝑗 𝑏𝑘 𝑇^{𝑗+𝑘}
= (∑_{𝑗=0}^{𝑚} 𝑎𝑗 𝑇^𝑗)(∑_{𝑘=0}^{𝑛} 𝑏𝑘 𝑇^𝑘)
= 𝑝(𝑇)𝑞(𝑇).
We observed earlier that if 𝑇 ∈ ℒ(𝑉), then the subspaces null 𝑇 and range 𝑇
are invariant under 𝑇 (see 5.4). Now we show that the null space and the range of
every polynomial of 𝑇 are also invariant under 𝑇.
Suppose 𝑇 ∈ ℒ(𝑉) and 𝑝 ∈ 𝒫(𝐅). Then null 𝑝(𝑇) and range 𝑝(𝑇) are
invariant under 𝑇.
Exercises 5A
22 Suppose 𝑇 ∈ ℒ(𝑉) and there exist nonzero vectors 𝑢 and 𝑤 in 𝑉 such that
𝑇𝑢 = 3𝑤 and 𝑇𝑤 = 3𝑢.
Prove that 3 or −3 is an eigenvalue of 𝑇.
23 Suppose 𝑉 is finite-dimensional and 𝑆, 𝑇 ∈ ℒ(𝑉). Prove that 𝑆𝑇 and 𝑇𝑆
have the same eigenvalues.
24 Suppose 𝐴 is an 𝑛-by-𝑛 matrix with entries in 𝐅. Define 𝑇 ∈ ℒ(𝐅𝑛 ) by
𝑇𝑥 = 𝐴𝑥, where elements of 𝐅𝑛 are thought of as 𝑛-by-1 column vectors.
(a) Suppose the sum of the entries in each row of 𝐴 equals 1. Prove that 1
is an eigenvalue of 𝑇.
(b) Suppose the sum of the entries in each column of 𝐴 equals 1. Prove that
1 is an eigenvalue of 𝑇.
The proof above makes crucial use of the fundamental theorem of algebra.
The comment following Exercise 16 helps explain why the fundamental theorem
of algebra is so tightly connected to the result above.
The hypothesis in the result above that 𝐅 = 𝐂 cannot be replaced with the
hypothesis that 𝐅 = 𝐑 , as shown by Example 5.9. The next example shows that
the finite-dimensional hypothesis in the result above also cannot be deleted.
Thus 3𝑒1 − 6𝑇𝑒1 = −𝑇 5 𝑒1 . The list 𝑒1 , 𝑇𝑒1 , 𝑇 2 𝑒1 , 𝑇 3 𝑒1 , 𝑇 4 𝑒1 , which equals the list
𝑒1 , 𝑒2 , 𝑒3 , 𝑒4 , 𝑒5 , is linearly independent, so no other linear combination of this list
equals −𝑇 5 𝑒1 . Hence the minimal polynomial of 𝑇 is 3 − 6𝑧 + 𝑧5.
𝑝(𝑧) = (𝑧 − 𝜆)𝑞(𝑧),
𝑝(𝑇)𝑣 = 𝑝(𝜆)𝑣.
A nonzero polynomial has at most as many distinct zeros as its degree (see 4.8).
Thus (a) of the previous result, along with the result that the minimal polynomial
of an operator on 𝑉 has degree at most dim 𝑉, gives an alternative proof of 5.12,
which states that an operator on 𝑉 has at most dim 𝑉 distinct eigenvalues.
Every monic polynomial is the minimal polynomial of some operator, as
shown by Exercise 16, which generalizes Example 5.26. Thus 5.27(a) shows that
finding exact expressions for the eigenvalues of an operator is equivalent to the
problem of finding exact expressions for the zeros of a polynomial (and thus is
not possible for some operators).
The matrix of 𝑇 with respect to the standard basis of 𝐂5 is the 5-by-5 matrix in
Example 5.26. As we showed in that example, the minimal polynomial of 𝑇 is
the polynomial
3 − 6𝑧 + 𝑧5.
No zero of the polynomial above can be expressed using rational numbers,
roots of rational numbers, and the usual rules of arithmetic (a proof of this would
take us considerably beyond linear algebra). Because the zeros of the polynomial
above are the eigenvalues of 𝑇 [by 5.27(a)], we cannot find an exact expression
for any eigenvalue of 𝑇 in any familiar form.
Numeric techniques, which we will not discuss here, show that the zeros of the
polynomial above, and thus the eigenvalues of 𝑇, are approximately the following
five complex numbers:
Note that the two nonreal zeros of this polynomial are complex conjugates of
each other, as we expect for a polynomial with real coefficients (see 4.14).
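These approximate zeros can be reproduced with standard numerical software. Here is a short sketch (assuming numpy; an illustration, not part of the text):

```python
import numpy as np

# zeros of 3 - 6z + z^5; np.roots takes coefficients starting with the highest power
zeros = np.roots([1, 0, 0, 0, -6, 3])
for z in sorted(zeros, key=lambda w: w.real):
    print(np.round(z, 6))

# each zero should satisfy the polynomial equation up to roundoff
assert all(abs(3 - 6*z + z**5) < 1e-8 for z in zeros)
```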
The next result completely characterizes the polynomials that when applied to
an operator give the 0 operator.
See Exercise 25 for a result about quotient operators that is analogous to the
result above.
The next result shows that the constant term of the minimal polynomial of an
operator determines whether the operator is invertible.
Proof Recall that null(𝑇 2 + 𝑏𝑇 + 𝑐𝐼) is invariant under 𝑇 (by 5.18). By replacing
𝑉 with null(𝑇 2 + 𝑏𝑇 + 𝑐𝐼) and replacing 𝑇 with 𝑇 restricted to null(𝑇 2 + 𝑏𝑇 + 𝑐𝐼),
we can assume that 𝑇 2 + 𝑏𝑇 + 𝑐𝐼 = 0; we now need to prove that dim 𝑉 is even.
Suppose 𝜆 ∈ 𝐑 and 𝑣 ∈ 𝑉 are such that 𝑇𝑣 = 𝜆𝑣. Then
0 = (𝑇² + 𝑏𝑇 + 𝑐𝐼)𝑣 = (𝜆² + 𝑏𝜆 + 𝑐)𝑣 = ((𝜆 + 𝑏/2)² + 𝑐 − 𝑏²/4)𝑣.
The term in large parentheses above is a positive number. Thus the equation above
implies that 𝑣 = 0. Hence we have shown that 𝑇 has no eigenvectors.
Let 𝑈 be a subspace of 𝑉 that is invariant under 𝑇 and has the largest dimension
among all subspaces of 𝑉 that are invariant under 𝑇 and have even dimension. If
𝑈 = 𝑉, then we are done; otherwise assume there exists 𝑤 ∈ 𝑉 such that 𝑤 ∉ 𝑈.
Let 𝑊 = span(𝑤, 𝑇𝑤). Then 𝑊 is invariant under 𝑇 because 𝑇(𝑇𝑤) =
−𝑏𝑇𝑤 − 𝑐𝑤. Furthermore, dim 𝑊 = 2 because otherwise 𝑤 would be an eigen-
vector of 𝑇. Now
dim(𝑈 + 𝑊) = dim 𝑈 + dim 𝑊 − dim(𝑈 ∩ 𝑊) = dim 𝑈 + 2,
where 𝑈 ∩ 𝑊 = {0} because otherwise 𝑈 ∩ 𝑊 would be a one-dimensional
subspace of 𝑉 that is invariant under 𝑇 (impossible because 𝑇 has no eigenvectors).
Because 𝑈 + 𝑊 is invariant under 𝑇, the equation above shows that there exists
a subspace of 𝑉 invariant under 𝑇 of even dimension larger than dim 𝑈. Thus the
assumption that 𝑈 ≠ 𝑉 was incorrect. Hence 𝑉 has even dimension.
The next result states that on odd-dimensional vector spaces, every operator
has an eigenvalue. We already know this result for finite-dimensional complex
vector spaces (without the odd hypothesis). Thus in the proof below, we will
assume that 𝐅 = 𝐑 .
Exercises 5B
9 Suppose 𝑇 ∈ ℒ(𝑉) is such that with respect to some basis of 𝑉, all entries
of the matrix of 𝑇 are rational numbers. Explain why all coefficients of the
minimal polynomial of 𝑇 are rational numbers.
10 Suppose 𝑉 is finite-dimensional, 𝑇 ∈ ℒ(𝑉), and 𝑣 ∈ 𝑉. Prove that
span(𝑣, 𝑇𝑣, …, 𝑇^𝑚 𝑣) = span(𝑣, 𝑇𝑣, …, 𝑇^{dim 𝑉 − 1} 𝑣)
for all integers 𝑚 ≥ dim 𝑉 − 1.
11 Suppose 𝑉 is a two-dimensional vector space, 𝑇 ∈ ℒ(𝑉), and the matrix of
𝑇 with respect to some basis of 𝑉 is
⎛ 𝑎  𝑐 ⎞
⎝ 𝑏  𝑑 ⎠.
(a) Show that 𝑇² − (𝑎 + 𝑑)𝑇 + (𝑎𝑑 − 𝑏𝑐)𝐼 = 0.
(b) Show that the minimal polynomial of 𝑇 equals
𝑧 − 𝑎  if 𝑏 = 𝑐 = 0 and 𝑎 = 𝑑,
𝑧² − (𝑎 + 𝑑)𝑧 + (𝑎𝑑 − 𝑏𝑐)  otherwise.
𝑎0 + 𝑎1 𝑧 + ⋯ + 𝑎_{𝑛−1} 𝑧^{𝑛−1} + 𝑧^𝑛.
The matrix above is called the companion matrix of the polynomial above.
This exercise shows that every monic polynomial is the minimal polynomial
of some operator. Hence a formula or an algorithm that could produce
exact eigenvalues for each operator on each 𝐅𝑛 could then produce exact
zeros for each polynomial [by 5.27(a)]. Thus there is no such formula or
algorithm. However, efficient numeric methods exist for obtaining very good
approximations for the eigenvalues of an operator.
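As a numerical sketch of this connection (assuming numpy; the companion-matrix layout below is the standard one, which may differ cosmetically from the matrix in the exercise), the eigenvalues of the companion matrix of a monic polynomial are the zeros of that polynomial:

```python
import numpy as np

def companion(a):
    """Companion matrix of a[0] + a[1] z + ... + a[n-1] z^(n-1) + z^n."""
    n = len(a)
    C = np.zeros((n, n))
    C[1:, :-1] = np.eye(n - 1)    # ones on the subdiagonal
    C[:, -1] = -np.asarray(a)     # last column is -a0, ..., -a_{n-1}
    return C

a = [3, -6, 0, 0, 0]              # the polynomial 3 - 6z + z^5 from Example 5.26
C = companion(a)
print(np.sort_complex(np.linalg.eigvals(C)))            # eigenvalues of the operator
print(np.sort_complex(np.roots([1, 0, 0, 0, -6, 3])))   # zeros of the polynomial (same values)
```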
5C Upper-Triangular Matrices
In Chapter 3 we defined the matrix of a linear map from a finite-dimensional vector
space to another finite-dimensional vector space. That matrix depends on a choice
of basis of each of the two vector spaces. Now that we are studying operators,
which map a vector space to itself, the emphasis is on using only one basis.
The notation ℳ(𝑇, (𝑣1 , …, 𝑣𝑛 )) is used if the basis is not clear from the context.
Operators have square matrices (meaning that the number of rows equals the
number of columns), rather than the more general rectangular matrices that we
considered earlier for linear maps.
If 𝑇 is an operator on 𝐅𝑛 and no basis is specified, assume that the basis in question is the standard one (where the 𝑘th basis vector is 1 in the 𝑘th slot and 0 in all other slots). You can then think of the 𝑘th column of ℳ(𝑇) as 𝑇 applied to the 𝑘th basis vector, where we identify 𝑛-by-1 column vectors with elements of 𝐅𝑛.
The 𝑘th column of the matrix ℳ(𝑇) is formed from the coefficients used to write 𝑇𝑣𝑘 as a linear combination of the basis 𝑣1 , …, 𝑣𝑛 .
The diagonal of a square matrix consists of the entries on the line from the
upper left corner to the bottom right corner.
A square matrix is called upper triangular if all entries below the diagonal
are 0.
⎛ 𝜆1      ∗ ⎞
⎜    ⋱     ⎟ ;
⎝ 0      𝜆𝑛 ⎠
the 0 in the matrix above indicates that all entries below the diagonal in this 𝑛-by-𝑛 matrix equal 0. Upper-triangular matrices can be considered reasonably simple—if 𝑛 is large, then at least almost half the entries in an 𝑛-by-𝑛 upper-triangular matrix are 0.
We often use ∗ to denote matrix entries that we do not know or that are irrelevant to the questions being discussed.
Proof First suppose (a) holds. To prove that (b) holds, suppose 𝑘 ∈ {1, …, 𝑛}. If
𝑗 ∈ {1, …, 𝑛}, then
𝑇𝑣𝑗 ∈ span(𝑣1 , …, 𝑣𝑗 )
because the matrix of 𝑇 with respect to 𝑣1 , …, 𝑣𝑛 is upper triangular. Because
span(𝑣1 , …, 𝑣𝑗 ) ⊆ span(𝑣1 , …, 𝑣𝑘 ) if 𝑗 ≤ 𝑘, we see that
𝑇𝑣𝑗 ∈ span(𝑣1 , …, 𝑣𝑘 )
for each 𝑗 ∈ {1, …, 𝑘}. Thus span(𝑣1 , …, 𝑣𝑘 ) is invariant under 𝑇, completing the
proof that (a) implies (b).
Now suppose (b) holds, so span(𝑣1 , …, 𝑣𝑘 ) is invariant under 𝑇 for each
𝑘 = 1, …, 𝑛. In particular, 𝑇𝑣𝑘 ∈ span(𝑣1 , …, 𝑣𝑘 ) for each 𝑘 = 1, …, 𝑛. Thus
(b) implies (c).
Now suppose (c) holds, so 𝑇𝑣𝑘 ∈ span(𝑣1 , …, 𝑣𝑘 ) for each 𝑘 = 1, …, 𝑛. This
means that when writing each 𝑇𝑣𝑘 as a linear combination of the basis vectors
𝑣1 , …, 𝑣𝑛 , we need to use only the vectors 𝑣1 , …, 𝑣𝑘 . Hence all entries under the
diagonal of ℳ(𝑇) are 0. Thus ℳ(𝑇) is an upper-triangular matrix, completing
the proof that (c) implies (a).
We have shown that (a) ⟹ (b) ⟹ (c) ⟹ (a), which shows that (a), (b),
and (c) are equivalent.
The next result tells us that if 𝑇 ∈ ℒ(𝑉) and with respect to some basis of 𝑉
we have
        ⎛ 𝜆1      ∗ ⎞
ℳ(𝑇) = ⎜    ⋱     ⎟ ,
        ⎝ 0      𝜆𝑛 ⎠
then 𝑇 satisfies a simple equation depending on 𝜆1 , …, 𝜆𝑛 .
Suppose 𝑇 ∈ ℒ(𝑉) and 𝑉 has a basis with respect to which 𝑇 has an upper-
triangular matrix with diagonal entries 𝜆1 , …, 𝜆𝑛 . Then
(𝑇 − 𝜆1 𝐼)⋯(𝑇 − 𝜆𝑛 𝐼) = 0.
Proof First suppose 𝑇 has an upper-triangular matrix with respect to some basis
of 𝑉. Let 𝛼1 , …, 𝛼𝑛 denote the diagonal entries of that matrix. Define a polynomial
𝑞 ∈ 𝒫(𝐅) by
𝑞(𝑧) = (𝑧 − 𝛼1 )⋯(𝑧 − 𝛼𝑛 ).
Then 𝑞(𝑇) = 0, by 5.40. Hence 𝑞 is a polynomial multiple of the minimal polyno-
mial of 𝑇, by 5.29. Thus the minimal polynomial of 𝑇 equals (𝑧 − 𝜆1 )⋯(𝑧 − 𝜆𝑚 )
for some 𝜆1 , …, 𝜆𝑚 ∈ 𝐅 with {𝜆1 , …, 𝜆𝑚 } ⊆ {𝛼1 , …, 𝛼𝑛 }.
To prove the implication in the other direction, now suppose the minimal
polynomial of 𝑇 equals (𝑧 − 𝜆1 )⋯(𝑧 − 𝜆𝑚 ) for some 𝜆1 , …, 𝜆𝑚 ∈ 𝐅. We will use
induction on 𝑚. To get started, if 𝑚 = 1 then 𝑧 − 𝜆1 is the minimal polynomial of
𝑇, which implies that 𝑇 = 𝜆1 𝐼, which implies that the matrix of 𝑇 (with respect
to any basis of 𝑉) is upper triangular.
Now suppose 𝑚 > 1 and the desired result holds for all smaller positive
integers. Let
𝑈 = range(𝑇 − 𝜆𝑚 𝐼).
Then 𝑈 is invariant under 𝑇 [this is a special case of 5.18 with 𝑝(𝑧) = 𝑧 − 𝜆𝑚 ].
Thus 𝑇|𝑈 is an operator on 𝑈.
If 𝑢 ∈ 𝑈, then 𝑢 = (𝑇 − 𝜆𝑚 𝐼)𝑣 for some 𝑣 ∈ 𝑉 and
(𝑇 − 𝜆1 𝐼)⋯(𝑇 − 𝜆_{𝑚−1} 𝐼)𝑢 = (𝑇 − 𝜆1 𝐼)⋯(𝑇 − 𝜆𝑚 𝐼)𝑣 = 0.
From 5.45 and 5.46, we conclude (using 5.39) that 𝑇 has an upper-triangular
matrix with respect to the basis 𝑢1 , …, 𝑢𝑀 , 𝑣1 , …, 𝑣𝑁 of 𝑉, as desired.
The set of numbers {𝜆1 , …, 𝜆𝑚 } from the previous result equals the set of
eigenvalues of 𝑇 (because the set of zeros of the minimal polynomial of 𝑇 equals
the set of eigenvalues of 𝑇, by 5.27), although the list 𝜆1 , …, 𝜆𝑚 in the previous
result may contain repetitions.
In Chapter 8 we will improve even the wonderful result below; see 8.37 and
8.46.
Proof The desired result follows immediately from 5.44 and the second version
of the fundamental theorem of algebra (see 4.13).
For an extension of the result above to two operators 𝑆 and 𝑇 such that
𝑆𝑇 = 𝑇𝑆,
see 5.80. Also, for an extension to more than two operators, see Exercise 9(b) in
Section 5E.
Caution: If an operator 𝑇 ∈ ℒ(𝑉) has an upper-triangular matrix with respect
to some basis 𝑣1 , …, 𝑣𝑛 of 𝑉, then the eigenvalues of 𝑇 are exactly the entries on
the diagonal of ℳ(𝑇), as shown by 5.41, and furthermore 𝑣1 is an eigenvector of
𝑇. However, 𝑣2 , …, 𝑣𝑛 need not be eigenvectors of 𝑇. Indeed, a basis vector 𝑣𝑘 is
an eigenvector of 𝑇 if and only if all entries in the 𝑘 th column of the matrix of 𝑇
are 0, except possibly the 𝑘 th entry.
You may recall from a previous course that every matrix of numbers can be changed to a matrix in what is called row echelon form. If one begins with a square matrix, the matrix in row echelon form will be an upper-triangular matrix. Do not confuse this upper-triangular matrix with the upper-triangular matrix of an operator with respect to some basis whose existence is proclaimed by 5.47 (if 𝐅 = 𝐂 )—there is no connection between these upper-triangular matrices.
The row echelon form of the matrix of an operator does not give us a list of the eigenvalues of the operator. In contrast, an upper-triangular matrix with respect to some basis gives us a list of all the eigenvalues of the operator. However, there is no method for computing exactly such an upper-triangular matrix, even though 5.47 guarantees its existence if 𝐅 = 𝐂 .
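Numerically, an upper-triangular matrix of this kind can be approximated by a Schur decomposition. The sketch below (assuming scipy and numpy; an illustration, not a construction from the text) produces an invertible (in fact unitary) matrix 𝐴 such that 𝐴⁻¹𝐵𝐴 is upper triangular, in the spirit of Exercise 9 below.

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))            # a square matrix, viewed as having complex entries

# complex Schur decomposition: B = Z T Z*, with Z unitary and T upper triangular
T, Z = schur(B, output='complex')
upper = Z.conj().T @ B @ Z                 # with A = Z, this is A^{-1} B A, since A^{-1} = A*

print(np.allclose(upper, np.triu(upper)))  # True: entries below the diagonal vanish
print(np.diag(upper))                      # the eigenvalues of B appear on the diagonal
```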
Exercises 5C
9 Suppose 𝐵 is a square matrix with complex entries. Prove that there exists
an invertible square matrix 𝐴 with complex entries such that 𝐴−1 𝐵𝐴 is an
upper-triangular matrix.
5D Diagonalizable Operators
Diagonal Matrices
For 𝜆 ∈ 𝐅, we will find it convenient to have a name and a notation for the set
of vectors that an operator 𝑇 maps to 𝜆 times the vector.
For 𝑇 ∈ ℒ(𝑉) and 𝜆 ∈ 𝐅, the set 𝐸(𝜆, 𝑇) is a subspace of 𝑉 because the null
space of each linear map on 𝑉 is a subspace of 𝑉. The definitions imply that 𝜆 is
an eigenvalue of 𝑇 if and only if 𝐸(𝜆, 𝑇) ≠ {0}.
𝐸(𝜆1 , 𝑇) + ⋯ + 𝐸(𝜆𝑚 , 𝑇)
Hence conditions (b), (c), and (d) of 5.55 fail (of course, because these conditions
are equivalent, it is sufficient to check that only one of them fails). Thus condition
(a) of 5.55 also fails. Hence 𝑇 is not diagonalizable, regardless of whether 𝐅 = 𝐑
or 𝐅 = 𝐂 .
The next result shows that if an operator has as many distinct eigenvalues as
the dimension of its domain, then the operator is diagonalizable.
In later chapters we will find additional conditions that imply that certain
operators are diagonalizable. For example, see the real spectral theorem (7.29)
and the complex spectral theorem (7.31).
The result above gives a sufficient condition for an operator to be diagonal-
izable. However, this condition is not necessary. For example, the operator 𝑇
on 𝐅3 defined by 𝑇(𝑥, 𝑦, 𝑧) = (6𝑥, 6𝑦, 7𝑧) has only two eigenvalues (6 and 7) and
dim 𝐅3 = 3, but 𝑇 is diagonalizable (by the standard basis of 𝐅3 ).
The next example illustrates the importance of diagonalization, which can be used to compute high powers of an operator, taking advantage of the equation 𝑇^𝑘 𝑣 = 𝜆^𝑘 𝑣 if 𝑣 is an eigenvector of 𝑇 with eigenvalue 𝜆.
For a spectacular application of these techniques, see Exercise 21, which shows how to use diagonalization to find an exact formula for the 𝑛th term of the Fibonacci sequence.
𝑇(𝑥, 𝑦, 𝑧) = 𝜆(𝑥, 𝑦, 𝑧)
for 𝜆 = 2, then for 𝜆 = 5, and then for 𝜆 = 8. Solving these simple equations
shows that for 𝜆 = 2 we have an eigenvector (1, 0, 0), for 𝜆 = 5 we have an
eigenvector (1, 3, 0), and for 𝜆 = 8 we have an eigenvector (1, 6, 6).
Thus (1, 0, 0), (1, 3, 0), (1, 6, 6) is a basis of 𝐅3 consisting of eigenvectors of 𝑇,
and with respect to this basis the matrix of 𝑇 is the diagonal matrix
⎛ 2  0  0 ⎞
⎜ 0  5  0 ⎟ .
⎝ 0  0  8 ⎠
To compute 𝑇^100 (0, 0, 1), for example, write (0, 0, 1) as a linear combination of our basis of eigenvectors:
(0, 0, 1) = (1/6)(1, 0, 0) − (1/3)(1, 3, 0) + (1/6)(1, 6, 6).
Now apply 𝑇^100 to both sides, using 𝑇^100 𝑣 = 𝜆^100 𝑣 for each eigenvector 𝑣 above:
𝑇^100 (0, 0, 1) = (1/6)(𝑇^100 (1, 0, 0)) − (1/3)(𝑇^100 (1, 3, 0)) + (1/6)(𝑇^100 (1, 6, 6))
= (1/6)(2^100)(1, 0, 0) − (1/3)(5^100)(1, 3, 0) + (1/6)(8^100)(1, 6, 6).
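This computation can be checked directly. The sketch below (plain Python; the matrix used for the brute-force check is reconstructed from the eigenvalues and eigenvectors listed above, so treat it as an assumption rather than a quotation of the example's definition) compares the eigenvalue formula with one hundred matrix multiplications:

```python
from fractions import Fraction

# eigenpairs of T and the coefficients of (0,0,1) in the eigenvector basis
eigs = [(2, (1, 0, 0), Fraction(1, 6)),
        (5, (1, 3, 0), Fraction(-1, 3)),
        (8, (1, 6, 6), Fraction(1, 6))]

# T^100 (0,0,1) = sum of c * lambda^100 * v over the eigenpairs
result = [sum(c * lam**100 * v[i] for lam, v, c in eigs) for i in range(3)]
print(result)

# brute force: the matrix below has the eigenpairs listed above (an assumed reconstruction of T)
A = [[2, 1, 0],
     [0, 5, 3],
     [0, 0, 8]]
x = [0, 0, 1]
for _ in range(100):
    x = [sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]
print(x == result)    # True
```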
Because there are 𝑛 choices for 𝑗 in the definition above, 𝑇 has 𝑛 Gershgorin
disks. If 𝐅 = 𝐂 , then for each 𝑗 ∈ {1, …, 𝑛}, the corresponding Gershgorin disk
is a closed disk in 𝐂 centered at 𝐴𝑗, 𝑗 , which is the 𝑗th entry on the diagonal of 𝐴.
The radius of this closed disk is the sum of the absolute values of the entries in
row 𝑗 of 𝐴, excluding the diagonal entry. If 𝐅 = 𝐑 , then the Gershgorin disks are
closed intervals in 𝐑 .
In the special case that the square matrix 𝐴 above is a diagonal matrix, each
Gershgorin disk consists of a single point that is a diagonal entry of 𝐴 (and
each eigenvalue of 𝑇 is one of those points, as required by the next result). One
consequence of our next result is that if the nondiagonal entries of 𝐴 are small,
then each eigenvalue of 𝑇 is near a diagonal entry of 𝐴.
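The containment described here is easy to check numerically on examples (a sketch assuming numpy, not part of the text): every eigenvalue lies in at least one Gershgorin disk.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))

centers = np.diag(A)                                  # the diagonal entries A_{j,j}
radii = np.abs(A).sum(axis=1) - np.abs(centers)       # row sums of |A_{j,k}|, excluding the diagonal

for lam in np.linalg.eigvals(A):
    print(lam, np.any(np.abs(lam - centers) <= radii))   # each line should end with True
```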
Using 5.68, we see that the coefficient of 𝑣𝑗 on the left side of 5.69 equals 𝜆𝑐𝑗 ,
which must equal the coefficient of 𝑣𝑗 on the right side of 5.70. In other words,
𝜆𝑐𝑗 = ∑_{𝑘=1}^{𝑛} 𝐴𝑗,𝑘 𝑐𝑘 .
Subtract 𝐴𝑗, 𝑗 𝑐𝑗 from each side of the equation above and then divide both sides
by 𝑐𝑗 to get
|𝜆 − 𝐴𝑗,𝑗 | = ∣ ∑_{𝑘=1, 𝑘≠𝑗}^{𝑛} 𝐴𝑗,𝑘 (𝑐𝑘 /𝑐𝑗 ) ∣ ≤ ∑_{𝑘=1, 𝑘≠𝑗}^{𝑛} |𝐴𝑗,𝑘 |.
Exercises 5D
for every 𝜆 ∈ 𝐂 .
6 Suppose 𝑇 ∈ ℒ(𝐅5 ) and dim 𝐸(8, 𝑇) = 4. Prove that 𝑇 − 2𝐼 or 𝑇 − 6𝐼 is
invertible.
7 Suppose 𝑇 ∈ ℒ(𝑉) is invertible. Prove that
𝐸(𝜆, 𝑇) = 𝐸(1/𝜆, 𝑇^{−1})
11 Find 𝑇 ∈ ℒ(𝐂3 ) such that 6 and 7 are eigenvalues of 𝑇 and such that 𝑇 does
not have a diagonal matrix with respect to any basis of 𝐂3.
12 Suppose 𝑇 ∈ ℒ(𝐂3 ) is such that 6 and 7 are eigenvalues of 𝑇. Furthermore,
suppose 𝑇 does not have a diagonal matrix with respect to any basis of 𝐂3.
Prove that there exists (𝑧1 , 𝑧2 , 𝑧3 ) ∈ 𝐂3 such that
23 Suppose the definition of the Gershgorin disks is changed so that the radius of
the 𝑘 th disk is the sum of the absolute values of the entries in column (instead
of row) 𝑘 of 𝐴, excluding the diagonal entry. Show that the Gershgorin disk
theorem (5.67) still holds with this changed definition.
5E Commuting Operators
For example, if 𝑇 is an operator and 𝑝, 𝑞 ∈ 𝒫(𝐅), then 𝑝(𝑇) and 𝑞(𝑇) commute
[see 5.17(b)].
As another example, if 𝐼 is the identity operator on 𝑉, then 𝐼 commutes with
every operator on 𝑉.
where the indices 𝑗 and 𝑘 take on all nonnegative integer values such that 𝑗 + 𝑘 ≤ 𝑚,
each 𝑎𝑗, 𝑘 is in 𝐑 , and 𝑥𝑗 𝑦𝑘 denotes the function on 𝐑2 defined by (𝑥, 𝑦) ↦ 𝑥𝑗 𝑦𝑘.
Define operators 𝐷𝑥 , 𝐷𝑦 ∈ ℒ(𝒫𝑚(𝐑2 )) by
𝐷𝑥 𝑝 = 𝜕𝑝/𝜕𝑥 = ∑_{𝑗+𝑘≤𝑚} 𝑗 𝑎𝑗,𝑘 𝑥^{𝑗−1} 𝑦^𝑘  and  𝐷𝑦 𝑝 = 𝜕𝑝/𝜕𝑦 = ∑_{𝑗+𝑘≤𝑚} 𝑘 𝑎𝑗,𝑘 𝑥^𝑗 𝑦^{𝑘−1},
The next result shows that two operators commute if and only if their matrices
(with respect to the same basis) commute.
Proof We have
𝑆 and 𝑇 commute ⟺ 𝑆𝑇 = 𝑇𝑆
⟺ ℳ(𝑆𝑇) = ℳ(𝑇𝑆)
⟺ ℳ(𝑆)ℳ(𝑇) = ℳ(𝑇)ℳ(𝑆)
⟺ ℳ(𝑆) and ℳ(𝑇) commute,
as desired.
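For instance (a numerical sketch assuming numpy, not part of the text), the matrices of 𝑝(𝑇) and 𝑞(𝑇) always commute, while two matrices chosen at random typically do not:

```python
import numpy as np

rng = np.random.default_rng(2)
T = rng.standard_normal((4, 4))
I = np.eye(4)

p_of_T = 2 * I - 3 * T + T @ T        # p(z) = 2 - 3z + z^2
q_of_T = I + T @ T @ T                # q(z) = 1 + z^3
print(np.allclose(p_of_T @ q_of_T, q_of_T @ p_of_T))   # True: p(T) and q(T) commute

S = rng.standard_normal((4, 4))
print(np.allclose(S @ T, T @ S))      # almost surely False for a random S
```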
The next result shows that if two operators commute, then every eigenspace
for one operator is invariant under the other operator. This result, which we will
use several times, is one of the main reasons why a pair of commuting operators
behaves better than a pair of operators that does not commute.
The equation above shows that 𝑇𝑣 ∈ 𝐸(𝜆, 𝑆). Thus 𝐸(𝜆, 𝑆) is invariant under 𝑇.
Two diagonalizable operators on the same vector space have diagonal matrices
with respect to the same basis if and only if the two operators commute.
Proof First suppose 𝑆, 𝑇 ∈ ℒ(𝑉) have diagonal matrices with respect to the
same basis. The product of two diagonal matrices of the same size is the diagonal
matrix obtained by multiplying the corresponding elements of the two diagonals.
Thus any two diagonal matrices of the same size commute. Thus 𝑆 and 𝑇 commute,
by 5.74.
To prove the implication in the other direction, now suppose that 𝑆, 𝑇 ∈ ℒ(𝑉)
are diagonalizable operators that commute. Let 𝜆1 , …, 𝜆𝑚 denote the distinct
eigenvalues of 𝑆. Because 𝑆 is diagonalizable, 5.55(c) shows that
5.77 𝑉 = 𝐸(𝜆1 , 𝑆) ⊕ ⋯ ⊕ 𝐸(𝜆𝑚 , 𝑆).
For each 𝑘 = 1, …, 𝑚, the subspace 𝐸(𝜆𝑘 , 𝑆) is invariant under 𝑇 (by 5.75).
Because 𝑇 is diagonalizable, 5.65 implies that 𝑇|𝐸( 𝜆𝑘, 𝑆) is diagonalizable for
each 𝑘. Hence for each 𝑘 = 1, …, 𝑚, there is a basis of 𝐸(𝜆𝑘 , 𝑆) consisting of
eigenvectors of 𝑇. Putting these bases together gives a basis of 𝑉 (because of
5.77), with each vector in this basis being an eigenvector of both 𝑆 and 𝑇. Thus 𝑆
and 𝑇 both have diagonal matrices with respect to this basis, as desired.
See Exercise 2 for an extension of the result above to more than two operators.
Suppose 𝑉 is a finite-dimensional nonzero complex vector space. Then every
operator on 𝑉 has an eigenvector (see 5.19). The next result shows that if two
operators on 𝑉 commute, then there is a vector in 𝑉 that is an eigenvector for both
operators (but the two commuting operators might not have a common eigenvalue).
For an extension of the next result to more than two operators, see Exercise 9(a).
The next result extends 5.47 (the existence of a basis that gives an upper-
triangular matrix) to two commuting operators.
Proof Let 𝑛 = dim 𝑉. We will use induction on 𝑛. The desired result holds if
𝑛 = 1 because all 1-by-1 matrices are upper triangular. Now suppose 𝑛 > 1 and
the desired result holds for all complex vector spaces whose dimension is 𝑛 − 1.
Let 𝑣1 be any common eigenvector of 𝑆 and 𝑇 (using 5.78). Hence 𝑆𝑣1 ∈
span(𝑣1 ) and 𝑇𝑣1 ∈ span(𝑣1 ). Let 𝑊 be a subspace of 𝑉 such that
𝑉 = span(𝑣1 ) ⊕ 𝑊;
see 2.33 for the existence of 𝑊. Define a linear map 𝑃 ∶ 𝑉 → 𝑊 by
𝑃(𝑎𝑣1 + 𝑤) = 𝑤
for each 𝑎 ∈ 𝐂 and each 𝑤 ∈ 𝑊. Define 𝑆̂, 𝑇̂ ∈ ℒ(𝑊) by
𝑆̂𝑤 = 𝑃(𝑆𝑤) and 𝑇̂𝑤 = 𝑃(𝑇𝑤)
for each 𝑤 ∈ 𝑊. To apply our induction hypothesis to 𝑆̂ and 𝑇̂ , we must first show
that these two operators on 𝑊 commute. To do this, suppose 𝑤 ∈ 𝑊. Then there
exists 𝑎 ∈ 𝐂 such that
(𝑆̂𝑇)𝑤
̂ ̂
= 𝑆(𝑃(𝑇𝑤)) ̂
= 𝑆(𝑇𝑤 − 𝑎𝑣 ) = 𝑃(𝑆(𝑇𝑤 − 𝑎𝑣 )) = 𝑃((𝑆𝑇)𝑤),
1 1
Exercise 9(b) extends the result above to more than two operators.
Proof There is a basis of 𝑉 with respect to which both 𝑆 and 𝑇 have upper-
triangular matrices (by 5.80). With respect to that basis,
ℳ(𝑆 + 𝑇) = ℳ(𝑆) + ℳ(𝑇) and ℳ(𝑆𝑇) = ℳ(𝑆)ℳ(𝑇),
as stated in 3.35 and 3.43.
The definition of matrix addition shows that each entry on the diagonal of
ℳ(𝑆 + 𝑇) equals the sum of the corresponding entries on the diagonals of ℳ(𝑆)
and ℳ(𝑇). Similarly, because ℳ(𝑆) and ℳ(𝑇) are upper-triangular matrices,
the definition of matrix multiplication shows that each entry on the diagonal of
ℳ(𝑆𝑇) equals the product of the corresponding entries on the diagonals of ℳ(𝑆)
and ℳ(𝑇). Furthermore, ℳ(𝑆 + 𝑇) and ℳ(𝑆𝑇) are upper-triangular matrices
(see Exercise 2 in Section 5B).
Every entry on the diagonal of ℳ(𝑆) is an eigenvalue of 𝑆, and every entry
on the diagonal of ℳ(𝑇) is an eigenvalue of 𝑇 (by 5.41). Every eigenvalue
of 𝑆 + 𝑇 is on the diagonal of ℳ(𝑆 + 𝑇), and every eigenvalue of 𝑆𝑇 is on
the diagonal of ℳ(𝑆𝑇) (these assertions follow from 5.41). Putting all this
together, we conclude that every eigenvalue of 𝑆 + 𝑇 is an eigenvalue of 𝑆 plus
an eigenvalue of 𝑇, and every eigenvalue of 𝑆𝑇 is an eigenvalue of 𝑆 times an
eigenvalue of 𝑇.
Exercises 5E
• 𝐅 denotes 𝐑 or 𝐂 .
• 𝑉 and 𝑊 denote vector spaces over 𝐅.
Matthew Petroff CC BY-SA
The George Peabody Library, now part of Johns Hopkins University, opened while
James Sylvester (1814–1897) was the university’s first mathematics professor. Sylvester’s
publications include the first use of the word matrix in mathematics.
The norm is not linear on 𝐑𝑛. To inject linearity into the discussion, we
introduce the dot product.
𝑥 ⋅ 𝑦 = 𝑥1 𝑦1 + ⋯ + 𝑥𝑛 𝑦𝑛 ,
An inner product is a generalization of the dot product. At this point you may
be tempted to guess that an inner product is defined by abstracting the properties
of the dot product discussed in the last paragraph. For real vector spaces, that
guess is correct. However, so that we can make a definition that will be useful
for both real and complex vector spaces, we need to examine the complex case
before making the definition.
definiteness
⟨𝑣, 𝑣⟩ = 0 if and only if 𝑣 = 0.
conjugate symmetry
⟨𝑢, 𝑣⟩ equals the complex conjugate of ⟨𝑣, 𝑢⟩ for all 𝑢, 𝑣 ∈ 𝑉.
Every real number equals its complex conjugate. Thus if we are dealing with a real vector space, then in the last condition above we can dispense with the complex conjugate and simply state that ⟨𝑢, 𝑣⟩ = ⟨𝑣, 𝑢⟩ for all 𝑢, 𝑣 ∈ 𝑉.
Most mathematicians define inner products as above, but many physicists use a definition that requires homogeneity in the second slot instead of the first slot.
The most important example of an inner product space is 𝐅𝑛 with the Euclidean
inner product given by (a) in the example above. When 𝐅𝑛 is referred to as an
inner product space, you should assume that the inner product is the Euclidean
inner product unless explicitly told otherwise.
So that we do not have to keep repeating the hypothesis that 𝑉 and 𝑊 are inner
product spaces, we make the following assumption.
6.5 notation: 𝑉, 𝑊
For the rest of this chapter and the next chapter, 𝑉 and 𝑊 denote inner product
spaces over 𝐅.
Note the slight abuse of language here. An inner product space is a vector
space along with an inner product on that vector space. When we say that a vector
space 𝑉 is an inner product space, we are also thinking that an inner product on
𝑉 is lurking nearby or is clear from the context (or is the Euclidean inner product
if the vector space is 𝐅𝑛 ).
(a) For each fixed 𝑣 ∈ 𝑉, the function that takes 𝑢 ∈ 𝑉 to ⟨𝑢, 𝑣⟩ is a linear
map from 𝑉 to 𝐅.
(b) ⟨0, 𝑣⟩ = 0 for every 𝑣 ∈ 𝑉.
(c) ⟨𝑣, 0⟩ = 0 for every 𝑣 ∈ 𝑉.
(d) ⟨𝑢, 𝑣 + 𝑤⟩ = ⟨𝑢, 𝑣⟩ + ⟨𝑢, 𝑤⟩ for all 𝑢, 𝑣, 𝑤 ∈ 𝑉.
(e) ⟨𝑢, 𝜆𝑣⟩ = 𝜆̄⟨𝑢, 𝑣⟩ for all 𝜆 ∈ 𝐅 and all 𝑢, 𝑣 ∈ 𝑉, where 𝜆̄ denotes the complex conjugate of 𝜆.
Proof
(a) For 𝑣 ∈ 𝑉, the linearity of 𝑢 ↦ ⟨𝑢, 𝑣⟩ follows from the conditions of additivity
and homogeneity in the first slot in the definition of an inner product.
(b) Every linear map takes 0 to 0. Thus (b) follows from (a).
(c) If 𝑣 ∈ 𝑉, then the conjugate symmetry property in the definition of an inner product and (b) show that ⟨𝑣, 0⟩ equals the complex conjugate of ⟨0, 𝑣⟩, which equals 0.
(d) Suppose 𝑢, 𝑣, 𝑤 ∈ 𝑉. By conjugate symmetry and additivity in the first slot, ⟨𝑢, 𝑣 + 𝑤⟩ equals the complex conjugate of ⟨𝑣 + 𝑤, 𝑢⟩ = ⟨𝑣, 𝑢⟩ + ⟨𝑤, 𝑢⟩. Taking the complex conjugate of each term and using conjugate symmetry again gives
⟨𝑢, 𝑣 + 𝑤⟩ = ⟨𝑢, 𝑣⟩ + ⟨𝑢, 𝑤⟩.
Norms
Our motivation for defining inner products came initially from the norms of
vectors on 𝐑2 and 𝐑3. Now we see that each inner product determines a norm.
(b) For 𝑓 in the vector space of continuous real-valued functions on [−1, 1] and
with inner product given as in 6.3(c), we have
‖ 𝑓 ‖ = √(∫_{−1}^{1} 𝑓 ²).
Suppose 𝑣 ∈ 𝑉.
(a) ‖𝑣‖ = 0 if and only if 𝑣 = 0.
(b) ‖𝜆𝑣‖ = |𝜆| ‖𝑣‖ for all 𝜆 ∈ 𝐅.
Proof
(a) The desired result holds because ⟨𝑣, 𝑣⟩ = 0 if and only if 𝑣 = 0.
(b) Suppose 𝜆 ∈ 𝐅. Then
‖𝜆𝑣‖² = ⟨𝜆𝑣, 𝜆𝑣⟩ = 𝜆⟨𝑣, 𝜆𝑣⟩ = 𝜆𝜆̄⟨𝑣, 𝑣⟩ = |𝜆|²‖𝑣‖².
The proof of (b) in the result above illustrates a general principle: working
with norms squared is usually easier than working directly with norms.
In the definition above, the order of the two vectors does not matter, because ⟨𝑢, 𝑣⟩ = 0 if and only if ⟨𝑣, 𝑢⟩ = 0. Instead of saying 𝑢 and 𝑣 are orthogonal, sometimes we say 𝑢 is orthogonal to 𝑣.
The word orthogonal comes from the Greek word orthogonios, which means right-angled.
Exercise 15 asks you to prove that if 𝑢, 𝑣 are nonzero vectors in 𝐑2, then
⟨𝑢, 𝑣⟩ = ‖𝑢‖ ‖𝑣‖ cos 𝜃,
where 𝜃 is the angle between 𝑢 and 𝑣 (thinking of 𝑢 and 𝑣 as arrows with initial
point at the origin). Thus two nonzero vectors in 𝐑2 are orthogonal (with respect
to the Euclidean inner product) if and only if the cosine of the angle between
them is 0, which happens if and only if the vectors are perpendicular in the usual
sense of plane geometry. Thus you can think of the word orthogonal as a fancy
word meaning perpendicular.
We begin our study of orthogonality with an easy result.
Proof
(a) Recall that 6.6(b) states that ⟨0, 𝑣⟩ = 0 for every 𝑣 ∈ 𝑉.
(b) If 𝑣 ∈ 𝑉 and ⟨𝑣, 𝑣⟩ = 0, then 𝑣 = 0 (by definition of inner product).
For the special case 𝑉 = 𝐑2, the next theorem was known over 3,500 years ago
in Babylonia and then rediscovered and proved over 2,500 years ago in Greece.
Of course, the proof below is not the original proof.
An orthogonal decomposition:
𝑢 expressed as a scalar multiple of 𝑣 plus a vector orthogonal to 𝑣.
𝑢 = 𝑐𝑣 + (𝑢 − 𝑐𝑣).
The equation above shows that we should choose 𝑐 to be ⟨𝑢, 𝑣⟩/‖𝑣‖2. Making this
choice of 𝑐, we can write
𝑢 = (⟨𝑢, 𝑣⟩/‖𝑣‖²) 𝑣 + (𝑢 − (⟨𝑢, 𝑣⟩/‖𝑣‖²) 𝑣).
As you should verify, the equation displayed above explicitly writes 𝑢 as a scalar
multiple of 𝑣 plus a vector orthogonal to 𝑣. Thus we have proved the following
key result.
Suppose 𝑢, 𝑣 ∈ 𝑉, with 𝑣 ≠ 0. Set 𝑐 = ⟨𝑢, 𝑣⟩/‖𝑣‖² and 𝑤 = 𝑢 − (⟨𝑢, 𝑣⟩/‖𝑣‖²) 𝑣. Then
𝑢 = 𝑐𝑣 + 𝑤 and ⟨𝑤, 𝑣⟩ = 0.
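A quick numerical check of 6.13 in the Euclidean inner product space 𝐑⁵ (a sketch assuming numpy, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(3)
u, v = rng.standard_normal(5), rng.standard_normal(5)

c = np.dot(u, v) / np.dot(v, v)       # c = <u, v> / ||v||^2
w = u - c * v

print(np.allclose(u, c * v + w))      # u = c v + w
print(np.isclose(np.dot(w, v), 0.0))  # <w, v> = 0
```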
The orthogonal decomposition 6.13 will be used in the proof of the Cauchy–
Schwarz inequality, which is our next result and is one of the most important
inequalities in mathematics.
Suppose 𝑢, 𝑣 ∈ 𝑉. Then
|⟨𝑢, 𝑣⟩| ≤ ‖𝑢‖ ‖𝑣‖.
This inequality is an equality if and only if one of 𝑢, 𝑣 is a scalar multiple of
the other.
Proof If 𝑣 = 0, then both sides of the desired inequality equal 0. Thus we can
assume that 𝑣 ≠ 0. Consider the orthogonal decomposition
𝑢 = (⟨𝑢, 𝑣⟩/‖𝑣‖²) 𝑣 + 𝑤
given by 6.13, where 𝑤 is orthogonal to 𝑣. By the Pythagorean theorem,
‖𝑢‖² = ‖(⟨𝑢, 𝑣⟩/‖𝑣‖²) 𝑣‖² + ‖𝑤‖²
= |⟨𝑢, 𝑣⟩|²/‖𝑣‖² + ‖𝑤‖²
6.15 ≥ |⟨𝑢, 𝑣⟩|²/‖𝑣‖².
Multiplying both sides of this inequality by ‖𝑣‖2 and then taking square roots
gives the desired inequality.
The proof in the paragraph above shows that the Cauchy–Schwarz inequality is an equality if and only if 6.15 is an equality. This happens if and only if 𝑤 = 0. But 𝑤 = 0 if and only if 𝑢 is a multiple of 𝑣 (see 6.13). Thus the Cauchy–Schwarz inequality is an equality if and only if 𝑢 is a scalar multiple of 𝑣 or 𝑣 is a scalar multiple of 𝑢 (or both; the phrasing has been chosen to cover cases in which either 𝑢 or 𝑣 equals 0).
Augustin-Louis Cauchy (1789–1857) proved 6.16(a) in 1821. In 1859, Cauchy's student Viktor Bunyakovsky (1804–1889) proved integral inequalities like the one in 6.16(b). A few decades later, similar discoveries by Hermann Schwarz (1843–1921) attracted more attention and led to the name of this inequality.
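A numerical illustration of the Cauchy–Schwarz inequality and its equality case (a sketch assuming numpy, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(4)
u, v = rng.standard_normal(6), rng.standard_normal(6)

print(abs(np.dot(u, v)) <= np.linalg.norm(u) * np.linalg.norm(v))   # True

# equality when one vector is a scalar multiple of the other
print(np.isclose(abs(np.dot(u, 3 * u)), np.linalg.norm(u) * np.linalg.norm(3 * u)))
```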
(a) If 𝑥1 , …, 𝑥𝑛 , 𝑦1 , …, 𝑦𝑛 ∈ 𝐑 , then
Suppose 𝑢, 𝑣 ∈ 𝑉. Then
‖𝑢 + 𝑣‖ ≤ ‖𝑢‖ + ‖𝑣‖.
Proof We have
‖𝑢 + 𝑣‖2 = ⟨𝑢 + 𝑣, 𝑢 + 𝑣⟩
= ⟨𝑢, 𝑢⟩ + ⟨𝑣, 𝑣⟩ + ⟨𝑢, 𝑣⟩ + ⟨𝑣, 𝑢⟩
= ⟨𝑢, 𝑢⟩ + ⟨𝑣, 𝑣⟩ + ⟨𝑢, 𝑣⟩ + \overline{⟨𝑢, 𝑣⟩}
= ‖𝑢‖2 + ‖𝑣‖2 + 2 Re⟨𝑢, 𝑣⟩
6.18 ≤ ‖𝑢‖2 + ‖𝑣‖2 + 2∣⟨𝑢, 𝑣⟩∣
6.19 ≤ ‖𝑢‖2 + ‖𝑣‖2 + 2‖𝑢‖ ‖𝑣‖
2
= (‖𝑢‖ + ‖𝑣‖) ,
where 6.19 follows from the Cauchy–Schwarz inequality (6.14). Taking square
roots of both sides of the inequality above gives the desired inequality.
The proof above shows that the triangle inequality is an equality if and only if
we have equality in 6.18 and 6.19. Thus we have equality in the triangle inequality
if and only if
6.20 ⟨𝑢, 𝑣⟩ = ‖𝑢‖ ‖𝑣‖.
If one of 𝑢, 𝑣 is a nonnegative real multiple of the other, then 6.20 holds. Con-
versely, suppose 6.20 holds. Then the condition for equality in the Cauchy–
Schwarz inequality (6.14) implies that one of 𝑢, 𝑣 is a scalar multiple of the other.
This scalar must be a nonnegative real number, by 6.20, completing the proof.
Suppose 𝑢, 𝑣 ∈ 𝑉. Then
Proof We have
‖𝑢 + 𝑣‖2 + ‖𝑢 − 𝑣‖2 = ⟨𝑢 + 𝑣, 𝑢 + 𝑣⟩ + ⟨𝑢 − 𝑣, 𝑢 − 𝑣⟩
as desired.
Exercises 6A
for all 𝑢, 𝑣 ∈ 𝑉. Show that ⟨⋅, ⋅⟩1 is an inner product on 𝑉 if and only if 𝑆 is
injective.
3 (a) Show that the function taking an ordered pair ((𝑥1 , 𝑥2 ), (𝑦1 , 𝑦2 )) of
elements of 𝐑2 to |𝑥1 𝑦1 | + |𝑥2 𝑦2 | is not an inner product on 𝐑2.
(b) Show that the function taking an ordered pair ((𝑥1 , 𝑥2 , 𝑥3 ), (𝑦1 , 𝑦2 , 𝑦3 ))
of elements of 𝐑3 to 𝑥1 𝑦1 + 𝑥3 𝑦3 is not an inner product on 𝐑3.
4 Suppose 𝑇 ∈ ℒ(𝑉) is such that ‖𝑇𝑣‖ ≤ ‖𝑣‖ for every 𝑣 ∈ 𝑉. Prove that
𝑇 − √2 𝐼 is injective.
13 Show that the square of an average is less than or equal to the average of the
squares. More precisely, show that if 𝑎1 , …, 𝑎𝑛 ∈ 𝐑 , then the square of the
average of 𝑎1 , …, 𝑎𝑛 is less than or equal to the average of 𝑎12, …, 𝑎𝑛2.
14 Suppose 𝑣 ∈ 𝑉 and 𝑣 ≠ 0. Prove that 𝑣/‖𝑣‖ is the unique closest element on
the unit sphere of 𝑉 to 𝑣. More precisely, prove that if 𝑢 ∈ 𝑉 and ‖𝑢‖ = 1,
then
∥𝑣 − 𝑣/‖𝑣‖∥ ≤ ‖𝑣 − 𝑢‖,
with equality only if 𝑢 = 𝑣/‖𝑣‖.
15 Suppose 𝑢, 𝑣 are nonzero vectors in 𝐑2. Prove that
16 The angle between two vectors (thought of as arrows with initial point at
the origin) in 𝐑2 or 𝐑3 can be defined geometrically. However, geometry is
not as clear in 𝐑𝑛 for 𝑛 > 3. Thus the angle between two nonzero vectors
𝑥, 𝑦 ∈ 𝐑𝑛 is defined to be
arccos(⟨𝑥, 𝑦⟩/(‖𝑥‖ ‖𝑦‖)),
where the motivation for this definition comes from Exercise 15. Explain
why the Cauchy–Schwarz inequality is needed to show that this definition
makes sense.
17 Prove that
(∑_{𝑘=1}^{𝑛} 𝑎𝑘 𝑏𝑘 )² ≤ (∑_{𝑘=1}^{𝑛} 𝑘𝑎𝑘 ²)(∑_{𝑘=1}^{𝑛} 𝑏𝑘 ²/𝑘)
for all real numbers 𝑎1 , …, 𝑎𝑛 and 𝑏1 , …, 𝑏𝑛 .
18 (a) Suppose 𝑓 ∶ [1, ∞) → [0, ∞) is continuous. Show that
(∫_{1}^{∞} 𝑓 )² ≤ ∫_{1}^{∞} 𝑥² ( 𝑓 (𝑥))² 𝑑𝑥.
where ℳ(𝑇)𝑗, 𝑘 denotes the entry in row 𝑗, column 𝑘 of the matrix of 𝑇 with
respect to the basis 𝑣1 , …, 𝑣𝑛 .
20 Prove that if 𝑢, 𝑣 ∈ 𝑉, then ∣ ‖𝑢‖ − ‖𝑣‖ ∣ ≤ ‖𝑢 − 𝑣‖.
The inequality above is called the reverse triangle inequality. For the
reverse triangle inequality when 𝑉 = 𝐂 , see Exercise 2 in Chapter 4.
such that ‖𝑢‖ = 0 if and only if 𝑢 = 0, ‖𝛼𝑢‖ = |𝛼|‖𝑢‖ for all 𝛼 ∈ 𝐅 and all
𝑢 ∈ 𝑈, and ‖𝑢 + 𝑣‖ ≤ ‖𝑢‖ + ‖𝑣‖ for all 𝑢, 𝑣 ∈ 𝑈. Prove that a norm satisfying
the parallelogram equality comes from an inner product (in other words,
show that if ‖ ⋅ ‖ is a norm on 𝑈 satisfying the parallelogram equality, then
there is an inner product ⟨⋅, ⋅⟩ on 𝑈 such that ‖𝑢‖ = ⟨𝑢, 𝑢⟩^{1/2} for all 𝑢 ∈ 𝑈).
29 Suppose 𝑉1 , …, 𝑉𝑚 are inner product spaces. Show that the equation
⟨(𝑢1 , …, 𝑢𝑚 ), (𝑣1 , …, 𝑣𝑚 )⟩ = ⟨𝑢1 , 𝑣1 ⟩ + ⋯ + ⟨𝑢𝑚 , 𝑣𝑚 ⟩
(a) Show that ⟨⋅, ⋅⟩𝐂 makes 𝑉𝐂 into a complex inner product space.
(b) Show that if 𝑢, 𝑣 ∈ 𝑉, then
⟨𝑢, 𝑣⟩𝐂 = ⟨𝑢, 𝑣⟩ and ‖𝑢 + 𝑖𝑣‖𝐂2 = ‖𝑢‖2 + ‖𝑣‖2.
See Exercise 8 in Section 1B for the definition of the complexification 𝑉𝐂 .
6B Orthonormal Bases
Orthonormal Lists and the Gram–Schmidt Procedure
The orthonormal list above is often used for modeling periodic phenomena,
such as tides.
(e) Suppose we make 𝒫2 (𝐑) into an inner product space using the inner product
given by
1
⟨𝑝, 𝑞⟩ = ∫ 𝑝𝑞
−1
for all 𝑝, 𝑞 ∈ 𝒫2 (𝐑). The standard basis 1, 𝑥, 𝑥2 of 𝒫2 (𝐑) is not an orthonor-
mal list because the vectors in that list do not have norm 1. Dividing each
vector by its norm gives the list 1/√2, √(3/2) 𝑥, √(5/2) 𝑥², in which each vector
has norm 1, and the second vector is orthogonal to the first and third vectors.
However, the first and third vectors are not orthogonal. Thus this is not an
orthonormal list. Soon we will see how to construct an orthonormal list from
the standard basis 1, 𝑥, 𝑥2 (see Example 6.34).
Orthonormal lists are particularly easy to work with, as illustrated by the next
result.
for all 𝑎1 , …, 𝑎𝑚 ∈ 𝐅.
Proof Because each 𝑒𝑘 has norm 1, this follows from repeated applications of
the Pythagorean theorem (6.12).
The next definition introduces one of the most useful concepts in the study of
inner product spaces.
Similarly, the other three vectors in the list above also have norm 1.
Note that
⟨(1/2, 1/2, 1/2, 1/2), (1/2, 1/2, −1/2, −1/2)⟩ = (1/2)(1/2) + (1/2)(1/2) + (1/2)(−1/2) + (1/2)(−1/2) = 0.
Similarly, the inner product of any two distinct vectors in the list above also
equals 0.
Thus the list above is orthonormal. Because we have an orthonormal list of
length four in the four-dimensional vector space 𝐅4, this list is an orthonormal
basis of 𝐅4 (by 6.28).
Notice how the next result makes each inner product space of dimension 𝑛 behave like 𝐅𝑛, with the role of the coordinates of a vector in 𝐅𝑛 played by ⟨𝑣, 𝑒1 ⟩, …, ⟨𝑣, 𝑒𝑛 ⟩.
The formula below for ‖𝑣‖ is called Parseval's identity. It was published in 1799 in the context of Fourier series.
𝑣 = 𝑎 1 𝑒1 + ⋯ + 𝑎 𝑛 𝑒𝑛 .
7(1/2, 1/2, 1/2, 1/2) − 4(1/2, 1/2, −1/2, −1/2) + (1/2, −1/2, −1/2, 1/2) + 2(−1/2, 1/2, −1/2, 1/2).
span(𝑣1 , …, 𝑣𝑘 ) = span(𝑒1 , …, 𝑒𝑘 )
for each 𝑘 = 1, …, 𝑚.
= (1/(‖ 𝑓𝑘 ‖ ‖ 𝑓𝑗 ‖)) ⟨𝑣𝑘 − (⟨𝑣𝑘 , 𝑓1 ⟩/‖ 𝑓1 ‖²) 𝑓1 − ⋯ − (⟨𝑣𝑘 , 𝑓_{𝑘−1} ⟩/‖ 𝑓_{𝑘−1} ‖²) 𝑓_{𝑘−1} , 𝑓𝑗 ⟩
= (1/(‖ 𝑓𝑘 ‖ ‖ 𝑓𝑗 ‖)) (⟨𝑣𝑘 , 𝑓𝑗 ⟩ − ⟨𝑣𝑘 , 𝑓𝑗 ⟩)
= 0.
Thus 𝑒1 , …, 𝑒𝑘 is an orthonormal list.
From the definition of 𝑒𝑘 given in 6.32, we see that 𝑣𝑘 ∈ span(𝑒1 , …, 𝑒𝑘 ).
Combining this information with 6.33 shows that
span(𝑣1 , …, 𝑣𝑘 ) ⊆ span(𝑒1 , …, 𝑒𝑘 ).
Both lists above are linearly independent (the 𝑣’s by hypothesis, and the 𝑒’s by
orthonormality and 6.25). Thus both subspaces above have dimension 𝑘, and
hence they are equal, completing the induction step and thus completing the
proof.
√(1/2), √(3/2) 𝑥, √(45/8) (𝑥² − 1/3).
The orthonormal list above has length three, which is the dimension of 𝒫2 (𝐑).
Hence this orthonormal list is an orthonormal basis of 𝒫2 (𝐑) [by 6.28].
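The orthonormal list in Example 6.34 can be reproduced symbolically. The sketch below (assuming sympy; an illustration, not part of the text) runs the Gram–Schmidt procedure on 1, 𝑥, 𝑥² with the inner product ⟨𝑝, 𝑞⟩ = ∫_{−1}^{1} 𝑝𝑞:

```python
import sympy as sp

x = sp.symbols('x')

def inner(p, q):
    return sp.integrate(p * q, (x, -1, 1))

def gram_schmidt(vs):
    es = []
    for v in vs:
        f = v - sum(inner(v, e) * e for e in es)        # subtract projections onto earlier e's
        es.append(sp.simplify(f / sp.sqrt(inner(f, f))))
    return es

print(gram_schmidt([1, x, x**2]))
# the result is equivalent to 1/sqrt(2), sqrt(3/2)*x, sqrt(45/8)*(x**2 - 1/3)
```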
Now we can answer the question about the existence of orthonormal bases.
Sometimes we need to know not only that an orthonormal basis exists, but also
that every orthonormal list can be extended to an orthonormal basis. In the next
corollary, the Gram–Schmidt procedure shows that such an extension is always
possible.
⎛ ∗      ∗ ⎞
⎜    ⋱    ⎟ ,
⎝ 0      ∗ ⎠
where the 0 in the matrix above indicates that all entries below the diagonal
equal 0, and asterisks are used to denote entries on and above the diagonal.
In the last chapter, we gave a necessary and sufficient condition for an operator
to have an upper-triangular matrix with respect to some basis (see 5.44). Now that
we are dealing with inner product spaces, we would like to know whether there
exists an orthonormal basis with respect to which we have an upper-triangular
matrix. The next result shows that the condition for an operator to have an upper-
triangular matrix with respect to some orthonormal basis is the same as the
condition to have an upper-triangular matrix with respect to an arbitrary basis.
For complex vector spaces, the next result is an important application of the result above. See Exercise 20 for a version of Schur's theorem that applies simultaneously to more than one operator.
Issai Schur (1875–1941) published a proof of the next result in 1909.
Proof The desired result follows from the second version of the fundamental
theorem of algebra (4.13) and 6.37.
is a linear functional on 𝐅3. We could write this linear functional in the form
𝜑(𝑧) = ⟨𝑧, 𝑤⟩
If 𝑣 ∈ 𝑉, then the map that sends 𝑢 to ⟨𝑢, 𝑣⟩ is a linear functional on 𝑉. The next result states that every linear functional on 𝑉 is of this form. For example, we can take 𝑣 = (2, −5, 1) in Example 6.40.
The next result is named in honor of Frigyes Riesz (1880–1956), who proved several theorems early in the twentieth century that look very much like the result below.
Suppose we make the vector space 𝒫5 (𝐑) into an inner product space by
defining ⟨𝑝, 𝑞⟩ = ∫_{−1}^{1} 𝑝𝑞. Let 𝜑 be as in Example 6.41. It is not obvious that there
exists 𝑞 ∈ 𝒫5 (𝐑) such that
∫_{−1}^{1} 𝑝(𝑡)(cos(𝜋𝑡)) 𝑑𝑡 = ⟨𝑝, 𝑞⟩
for every 𝑝 ∈ 𝒫5 (𝐑) [we cannot take 𝑞(𝑡) = cos(𝜋𝑡) because that choice of 𝑞 is
not an element of 𝒫5 (𝐑)]. The next result tells us the somewhat surprising result
that there indeed exists a polynomial 𝑞 ∈ 𝒫5 (𝐑) such that the equation above
holds for all 𝑝 ∈ 𝒫5 (𝐑).
𝜑(𝑢) = ⟨𝑢, 𝑣⟩
for every 𝑢 ∈ 𝑉.
Proof First we show that there exists a vector 𝑣 ∈ 𝑉 such that 𝜑(𝑢) = ⟨𝑢, 𝑣⟩ for
every 𝑢 ∈ 𝑉. Let 𝑒1 , …, 𝑒𝑛 be an orthonormal basis of 𝑉. Then
𝜑(𝑢) = 𝜑(⟨𝑢, 𝑒1 ⟩𝑒1 + ⋯ + ⟨𝑢, 𝑒𝑛 ⟩𝑒𝑛 )
= ⟨𝑢, 𝑒1 ⟩𝜑(𝑒1 ) + ⋯ + ⟨𝑢, 𝑒𝑛 ⟩𝜑(𝑒𝑛 )
for every polynomial 𝑝 ∈ 𝒫2 (𝐑). To do this, we make 𝒫2 (𝐑) into an inner product
space by defining ⟨𝑝, 𝑞⟩ to be the right side of the equation above for 𝑝, 𝑞 ∈ 𝒫2 (𝐑).
Note that the left side of the equation above does not equal the inner product
in 𝒫2 (𝐑) of 𝑝 and the function 𝑡 ↦ cos(𝜋𝑡) because this last function is not a
polynomial.
Define a linear functional 𝜑 on 𝒫2 (𝐑) by letting
𝜑(𝑝) = ∫_{−1}^{1} 𝑝(𝑡)(cos(𝜋𝑡)) 𝑑𝑡
for each 𝑝 ∈ 𝒫2 (𝐑). Now use the orthonormal basis from Example 6.34 and
apply formula 6.43 from the proof of the Riesz representation theorem to see that
if 𝑝 ∈ 𝒫2 (𝐑), then 𝜑(𝑝) = ⟨𝑝, 𝑞⟩, where
𝑞(𝑥) = (∫_{−1}^{1} √(1/2) cos(𝜋𝑡) 𝑑𝑡) √(1/2)
+ (∫_{−1}^{1} √(3/2) 𝑡 cos(𝜋𝑡) 𝑑𝑡) √(3/2) 𝑥
+ (∫_{−1}^{1} √(45/8) (𝑡² − 1/3) cos(𝜋𝑡) 𝑑𝑡) √(45/8) (𝑥² − 1/3).
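The three integrals above can be evaluated symbolically, and the resulting 𝑞 indeed represents 𝜑 on 𝒫2 (𝐑). A sketch assuming sympy (an illustration, not part of the text):

```python
import sympy as sp

t, x = sp.symbols('t x')
e = [sp.sqrt(sp.Rational(1, 2)),
     sp.sqrt(sp.Rational(3, 2)) * t,
     sp.sqrt(sp.Rational(45, 8)) * (t**2 - sp.Rational(1, 3))]     # orthonormal basis from 6.34

# q = sum over k of ( integral of e_k(t) cos(pi t) dt ) * e_k(x), as in formula 6.43
q = sp.simplify(sum(sp.integrate(ek * sp.cos(sp.pi * t), (t, -1, 1)) * ek.subs(t, x) for ek in e))
print(q)     # a polynomial of degree 2 in x, with coefficients involving 1/pi^2

# check that <p, q> = phi(p) for a sample p in P_2(R), say p(t) = t^2
p = t**2
phi_p = sp.integrate(p * sp.cos(sp.pi * t), (t, -1, 1))
inner_pq = sp.integrate(p * q.subs(x, t), (t, -1, 1))
print(sp.simplify(phi_p - inner_pq))      # 0
```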
Exercises 6B
Prove that
𝑎0 ²/2 + ∑_{𝑘=1}^{∞} (𝑎𝑘 ² + 𝑏𝑘 ²) ≤ ∫_{−𝜋}^{𝜋} 𝑓 ².
16 Suppose 𝑉 is finite-dimensional. Suppose ⟨⋅, ⋅⟩1 , ⟨⋅, ⋅⟩2 are inner products on
𝑉 with corresponding norms ‖ ⋅ ‖1 and ‖ ⋅ ‖2 . Prove that there exists a positive
number 𝑐 such that ‖𝑣‖1 ≤ 𝑐‖𝑣‖2 for every 𝑣 ∈ 𝑉.
17 Suppose 𝐅 = 𝐂 and 𝑉 is finite-dimensional. Prove that if 𝑇 is an operator
on 𝑉 such that 1 is the only eigenvalue of 𝑇 and ‖𝑇𝑣‖ ≤ ‖𝑣‖ for all 𝑣 ∈ 𝑉,
then 𝑇 is the identity operator.
18 Suppose 𝑢1 , …, 𝑢𝑚 is a linearly independent list in 𝑉. Show that there exists
𝑣 ∈ 𝑉 such that ⟨𝑢𝑘 , 𝑣⟩ = 1 for all 𝑘 ∈ {1, …, 𝑚}.
for all 𝑓, 𝑔 ∈ 𝐶[−1, 1]. Let 𝜑 be the linear functional on 𝐶[−1, 1] defined
by 𝜑( 𝑓 ) = 𝑓 (0). Show that there does not exist 𝑔 ∈ 𝐶[−1, 1] such that
𝜑( 𝑓 ) = ⟨ 𝑓, 𝑔⟩
Law professor Richard Friedman presenting a case before the U.S. Supreme
Court in 2010:
Mr. Friedman: I think that issue is entirely orthogonal to the issue here
because the Commonwealth is acknowledging—
Chief Justice Roberts: I’m sorry. Entirely what?
Mr. Friedman: Orthogonal. Right angle. Unrelated. Irrelevant.
Chief Justice Roberts: Oh.
Justice Scalia: What was that adjective? I liked that.
Mr. Friedman: Orthogonal.
Chief Justice Roberts: Orthogonal.
Mr. Friedman: Right, right.
Justice Scalia: Orthogonal, ooh. (Laughter.)
Justice Kennedy: I knew this case presented us a problem. (Laughter.)
Proof
(a) Suppose 𝑈 is a subset of 𝑉. Then ⟨𝑢, 0⟩ = 0 for every 𝑢 ∈ 𝑈; thus 0 ∈ 𝑈 ⟂.
Suppose 𝑣, 𝑤 ∈ 𝑈 ⟂. If 𝑢 ∈ 𝑈, then
Recall that if 𝑈 and 𝑊 are subspaces of 𝑉, then 𝑉 is the direct sum of 𝑈 and
𝑊 (written 𝑉 = 𝑈 ⊕ 𝑊) if each element of 𝑉 can be written in exactly one way
as a vector in 𝑈 plus a vector in 𝑊 (see 1.41). Furthermore, this happens if and
only if 𝑉 = 𝑈 + 𝑊 and 𝑈 ∩ 𝑊 = {0} (see 1.46).
The next result shows that every finite-dimensional subspace of 𝑉 leads to a
natural direct sum decomposition of 𝑉. See Exercise 16 for an example showing
that the result below can fail without the hypothesis that the subspace 𝑈 is finite-
dimensional.
𝑉 = 𝑈 ⊕ 𝑈 ⟂.
𝑉 = 𝑈 + 𝑈 ⟂.
We have
6.50 𝑣 = (⟨𝑣, 𝑒1 ⟩𝑒1 + ⋯ + ⟨𝑣, 𝑒𝑚 ⟩𝑒𝑚 ) + (𝑣 − ⟨𝑣, 𝑒1 ⟩𝑒1 − ⋯ − ⟨𝑣, 𝑒𝑚 ⟩𝑒𝑚 ),
where 𝑢 denotes the first expression in parentheses and 𝑤 denotes the second.
Let 𝑢 and 𝑤 be defined as in the equation above (as was done in the proof of 6.26).
Because each 𝑒𝑘 ∈ 𝑈, we see that 𝑢 ∈ 𝑈. Because 𝑒1 , …, 𝑒𝑚 is an orthonormal
list, for each 𝑘 = 1, …, 𝑚 we have
⟨𝑤, 𝑒𝑘 ⟩ = ⟨𝑣, 𝑒𝑘 ⟩ − ⟨𝑣, 𝑒𝑘 ⟩
= 0.
Thus 𝑤 is orthogonal to every vector in span(𝑒1 , …, 𝑒𝑚 ), which shows that 𝑤 ∈ 𝑈 ⟂.
Hence we have written 𝑣 = 𝑢 + 𝑤, where 𝑢 ∈ 𝑈 and 𝑤 ∈ 𝑈 ⟂, completing the
proof that 𝑉 = 𝑈 + 𝑈 ⟂.
From 6.48(d), we know that 𝑈 ∩ 𝑈 ⟂ = {0}. This and the equation 𝑉 = 𝑈 + 𝑈 ⟂ imply that 𝑉 = 𝑈 ⊕ 𝑈 ⟂ (see 1.46).
Proof The formula for dim 𝑈 ⟂ follows immediately from 6.49 and 3.94.
𝑈 ⟂ = {0} ⟺ 𝑈 = 𝑉.
Proof
(a) To show that 𝑃𝑈 is a linear map on 𝑉, suppose 𝑣1 , 𝑣2 ∈ 𝑉. Write
𝑣1 = 𝑢1 + 𝑤1 and 𝑣2 = 𝑢2 + 𝑤2
with 𝑢1 , 𝑢2 ∈ 𝑈 and 𝑤1 , 𝑤2 ∈ 𝑈 ⟂. Thus 𝑃𝑈 𝑣1 = 𝑢1 and 𝑃𝑈 𝑣2 = 𝑢2 . Now
𝑣1 + 𝑣2 = (𝑢1 + 𝑢2 ) + (𝑤1 + 𝑤2 ),
𝜑𝑣 (𝑢) = ⟨𝑢, 𝑣⟩
See Exercise 13 for yet another proof of the Riesz representation theorem.
Minimization Problems
The following problem often arises: Given a subspace 𝑈 of 𝑉 and a point 𝑣 ∈ 𝑉, find a point 𝑢 ∈ 𝑈 such that ‖𝑣 − 𝑢‖ is as small as possible. The next result shows that 𝑢 = 𝑃𝑈 𝑣 is the unique solution of this minimization problem.
The remarkable simplicity of the solution to this minimization problem has led to many important applications of inner product spaces outside of pure mathematics.
‖𝑣 − 𝑃𝑈 𝑣‖ ≤ ‖𝑣 − 𝑢‖.
Proof We have
= ‖𝑣 − 𝑢‖2,
is as small as possible.
Let 𝐶[−𝜋, 𝜋] denote the real inner product space of continuous real-valued
functions on [−𝜋, 𝜋] with inner product
6.64 ⟨ 𝑓, 𝑔⟩ = ∫_{−𝜋}^{𝜋} 𝑓 𝑔.
Let 𝑣 ∈ 𝐶[−𝜋, 𝜋] be the function defined by 𝑣(𝑥) = sin 𝑥. Let 𝑈 denote the
subspace of 𝐶[−𝜋, 𝜋] consisting of the polynomials with real coefficients and of
degree at most 5. Our problem can now be reformulated as follows:
Find 𝑢 ∈ 𝑈 such that ‖𝑣 − 𝑢‖ is as small as possible.
A computer that can integrate is useful here.
To compute the solution to our approximation problem, first apply the Gram–Schmidt procedure (using the inner product given by 6.64) to the basis 1, 𝑥, 𝑥², 𝑥³, 𝑥⁴, 𝑥⁵ of 𝑈, producing an orthonormal basis 𝑒1 , 𝑒2 , 𝑒3 , 𝑒4 , 𝑒5 , 𝑒6 of 𝑈. Then, again using the inner product given by 6.64, compute 𝑃𝑈 𝑣 using 6.57(i) (with 𝑚 = 6). Doing this computation shows
that 𝑃𝑈 𝑣 is the function 𝑢 defined by
6.65 𝑢(𝑥) = 0.987862𝑥 − 0.155271𝑥3 + 0.00564312𝑥5,
where the 𝜋’s that appear in the exact answer have been replaced with a good
decimal approximation. By 6.61, the polynomial 𝑢 above is the best approximation
to the sine function on [−𝜋, 𝜋] using polynomials of degree at most 5 (here “best
approximation” means in the sense of minimizing ∫_{−𝜋}^{𝜋} |sin 𝑥 − 𝑢(𝑥)|² 𝑑𝑥).
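The coefficients in 6.65 can be reproduced by computing the orthogonal projection 𝑃𝑈 𝑣 directly. The sketch below (assuming sympy; an illustration, not the book's computation) solves the normal equations for the projection instead of running Gram–Schmidt; both routes give the same 𝑃𝑈 𝑣.

```python
import sympy as sp

x = sp.symbols('x')
basis = [x**k for k in range(6)]                  # 1, x, ..., x^5, a basis of U

# Gram matrix G[j,k] = <x^j, x^k> and right side b[j] = <x^j, sin x>, using the inner product 6.64
G = sp.Matrix(6, 6, lambda j, k: sp.integrate(basis[j] * basis[k], (x, -sp.pi, sp.pi)))
b = sp.Matrix(6, 1, lambda j, k: sp.integrate(basis[j] * sp.sin(x), (x, -sp.pi, sp.pi)))

c = G.LUsolve(b)                                  # coefficients of P_U v in the basis 1, x, ..., x^5
u = sp.expand(sum(c[k] * x**k for k in range(6)))
print(u)             # exact answer, with powers of pi in the coefficients
print(u.evalf(6))    # approximately 0.987862 x - 0.155271 x^3 + 0.00564312 x^5, matching 6.65
```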
To see how good this approximation is, the next figure shows the graphs of
both the sine function and our approximation 𝑢 given by 6.65 over the interval
[−𝜋, 𝜋].
Our approximation 6.65 is so accurate that the two graphs are almost identical—
our eyes may see only one graph! Here the red graph is placed almost exactly
over the blue graph. If you are viewing this on an electronic device, enlarge the
picture above by 400% near 𝜋 or −𝜋 to see a small gap between the two graphs.
Another well-known approximation to the sine function by a polynomial of
degree 5 is given by the Taylor polynomial 𝑝 defined by
6.66 𝑝(𝑥) = 𝑥 − 𝑥³/3! + 𝑥⁵/5!.
To see how good this approximation is, the next picture shows the graphs of both
the sine function and the Taylor polynomial 𝑝 over the interval [−𝜋, 𝜋].
Pseudoinverse
Suppose 𝑇 ∈ ℒ(𝑉, 𝑊) and 𝑏 ∈ 𝑊. Consider the problem of finding 𝑥 ∈ 𝑉 such
that
𝑇𝑥 = 𝑏.
For example, if 𝑉 = 𝐅𝑛 and 𝑊 = 𝐅𝑚, then the equation above could represent a
system of 𝑚 linear equations in 𝑛 unknowns.
If 𝑇 is invertible, then the unique solution to the equation above is 𝑥 = 𝑇 −1 𝑏.
However, if 𝑇 is not invertible, then for some 𝑏 ∈ 𝑊 there may not exist any
solutions of the equation above, and for some 𝑏 ∈ 𝑊 there may exist infinitely
many solutions of the equation above.
If 𝑇 is not invertible, then we can still try to do as well as possible with the
equation above. For example, if the equation above has no solutions, then instead
of solving the equation 𝑇𝑥 − 𝑏 = 0, we can try to find 𝑥 ∈ 𝑉 such that ‖𝑇𝑥 − 𝑏‖
is as small as possible. As another example, if the equation above has infinitely
many solutions 𝑥 ∈ 𝑉, then among all those solutions we can try to find one such
that ‖𝑥‖ is as small as possible.
The pseudoinverse will provide the tool to solve the equation above as well
as possible, even when 𝑇 is not invertible. We need the next result to define the
pseudoinverse.
In the next two proofs, we will use without further comment the result that if
𝑉 is finite-dimensional and 𝑇 ∈ ℒ(𝑉, 𝑊), then null 𝑇, (null 𝑇)⟂, and range 𝑇 are
all finite-dimensional.
Proof Suppose that 𝑣 ∈ (null 𝑇)⟂ and 𝑇|(null 𝑇)⟂ 𝑣 = 0. Hence 𝑇𝑣 = 0 and
thus 𝑣 ∈ (null 𝑇) ∩ (null 𝑇)⟂, which implies that 𝑣 = 0 [by 6.48(d)]. Hence
null 𝑇|(null 𝑇)⟂ = {0}, which implies that 𝑇|(null 𝑇)⟂ is injective, as desired.
Clearly range 𝑇|(null 𝑇)⟂ ⊆ range 𝑇. To prove the inclusion in the other direction,
suppose 𝑤 ∈ range 𝑇. Hence there exists 𝑣 ∈ 𝑉 such that 𝑤 = 𝑇𝑣. There exist
𝑢 ∈ null 𝑇 and 𝑥 ∈ (null 𝑇)⟂ such that 𝑣 = 𝑢 + 𝑥 (by 6.49). Now
𝑇|(null 𝑇)⟂ 𝑥 = 𝑇𝑥 = 𝑇𝑣 − 𝑇𝑢 = 𝑤 − 0 = 𝑤,
which shows that 𝑤 ∈ range 𝑇|(null 𝑇)⟂ . Hence range 𝑇 ⊆ range 𝑇|(null 𝑇)⟂ , complet-
ing the proof that range 𝑇|(null 𝑇)⟂ = range 𝑇.
for each 𝑤 ∈ 𝑊.
Proof
(a) Suppose 𝑇 is invertible. Then (null 𝑇)⟂ = 𝑉 and range 𝑇 = 𝑊. Thus
𝑇|(null 𝑇)⟂ = 𝑇 and 𝑃range 𝑇 is the identity operator on 𝑊. Hence 𝑇 † = 𝑇 −1.
(b) Suppose 𝑤 ∈ range 𝑇. Thus
𝑇𝑇 † 𝑤 = 𝑇(𝑇|(null 𝑇)⟂ )−1 𝑤 = 𝑤 = 𝑃range 𝑇 𝑤.
Proof
This linear map is neither injective nor surjective, but we can compute its pseudo-
inverse. To do this, first note that range 𝑇 = {(𝑥, 𝑦, 0) ∶ 𝑥, 𝑦 ∈ 𝐅}. Thus
𝑃range 𝑇 (𝑥, 𝑦, 𝑧) = (𝑥, 𝑦, 0)
Because the list (−1, 1, 0, 0), (−1, 0, 1, −2) is linearly independent, this list is a
basis of null 𝑇.
Now suppose (𝑥, 𝑦, 𝑧) ∈ 𝐅3. Then
6.72 𝑇 † (𝑥, 𝑦, 𝑧) = (𝑇|(null 𝑇)⟂ )−1 𝑃range 𝑇 (𝑥, 𝑦, 𝑧) = (𝑇|(null 𝑇)⟂ )−1 (𝑥, 𝑦, 0).
The right side of the equation above is the vector (𝑎, 𝑏, 𝑐, 𝑑) ∈ 𝐅4 such that
𝑇(𝑎, 𝑏, 𝑐, 𝑑) = (𝑥, 𝑦, 0) and (𝑎, 𝑏, 𝑐, 𝑑) ∈ (null 𝑇)⟂. In other words, 𝑎, 𝑏, 𝑐, 𝑑 must
satisfy the following equations:
𝑎+𝑏+𝑐=𝑥
2𝑐 + 𝑑 = 𝑦
−𝑎 + 𝑏 = 0
−𝑎 + 𝑐 − 2𝑑 = 0,
where the first two equations are equivalent to the equation 𝑇(𝑎, 𝑏, 𝑐, 𝑑) = (𝑥, 𝑦, 0)
and the last two equations come from the condition for (𝑎, 𝑏, 𝑐, 𝑑) to be orthogo-
nal to each of the basis vectors (−1, 1, 0, 0), (−1, 0, 1, −2) in this basis of null 𝑇.
Thinking of 𝑥 and 𝑦 as constants and 𝑎, 𝑏, 𝑐, 𝑑 as unknowns, we can solve the
system above of four equations in four unknowns, getting
𝑎 = (5𝑥 − 2𝑦)/11, 𝑏 = (5𝑥 − 2𝑦)/11, 𝑐 = (𝑥 + 4𝑦)/11, 𝑑 = (−2𝑥 + 3𝑦)/11.
Hence 6.72 tells us that
𝑇 †(𝑥, 𝑦, 𝑧) = (1/11)(5𝑥 − 2𝑦, 5𝑥 − 2𝑦, 𝑥 + 4𝑦, −2𝑥 + 3𝑦).
The formula above for 𝑇 † shows that 𝑇𝑇 † (𝑥, 𝑦, 𝑧) = (𝑥, 𝑦, 0) for all (𝑥, 𝑦, 𝑧) ∈ 𝐅3,
which illustrates the equation 𝑇𝑇 † = 𝑃range 𝑇 from 6.69(b).
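This formula agrees with what numerical software produces for the Moore–Penrose pseudoinverse. A sketch assuming numpy (the matrix of 𝑇 below is inferred from the equations in this example, so treat it as an assumption):

```python
import numpy as np

# matrix of T, where T(a, b, c, d) = (a + b + c, 2c + d, 0), inferred from the equations above
A = np.array([[1.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 2.0, 1.0],
              [0.0, 0.0, 0.0, 0.0]])

A_dagger = np.linalg.pinv(A)
print(np.round(11 * A_dagger))          # rows give 11a, 11b, 11c, 11d as combinations of x, y, z

print(np.allclose(A @ A_dagger, np.diag([1.0, 1.0, 0.0])))   # T T^dagger = P_(range T)
print(np.allclose(A @ A_dagger @ A, A), np.allclose(A_dagger @ A @ A_dagger, A_dagger))
```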
Exercises 6C
‖𝑃𝑣‖ ≤ ‖𝑣‖
𝜑𝑣 (𝑢) = ⟨𝑢, 𝑣⟩
for all 𝑢 ∈ 𝑉.
(a) Show that 𝑣 ↦ 𝜑𝑣 is an injective linear map from 𝑉 to 𝑉 ′.
(b) Use (a) and a dimension-counting argument to show that 𝑣 ↦ 𝜑𝑣 is an
isomorphism from 𝑉 onto 𝑉 ′.
The purpose of this exercise is to give an alternative proof of the Riesz
representation theorem (6.42 and 6.58) when 𝐅 = 𝐑 . Thus you should not
use the Riesz representation theorem as a tool in your solution.
15 In 𝐑4 , let
𝑈 = span((1, 1, 0, 0), (1, 1, 1, 2)).
Find 𝑢 ∈ 𝑈 such that ‖𝑢 − (1, 2, 3, 4)‖ is as small as possible.
16 Suppose 𝐶[−1, 1] is the vector space of continuous real-valued functions
on the interval [−1, 1] with inner product given by
⟨ 𝑓, 𝑔⟩ = ∫_{−1}^{1} 𝑓 𝑔
17 Find 𝑝 ∈ 𝒫3 (𝐑) such that 𝑝(0) = 0, 𝑝′(0) = 0, and ∫_{0}^{1} |2 + 3𝑥 − 𝑝(𝑥)|² 𝑑𝑥 is as small as possible.
18 Find 𝑝 ∈ 𝒫5 (𝐑) that makes ∫_{−𝜋}^{𝜋} |sin 𝑥 − 𝑝(𝑥)|² 𝑑𝑥 as small as possible.
The polynomial 6.65 is an excellent approximation to the answer to this
exercise, but here you are asked to find the exact solution, which involves
powers of 𝜋. A computer that can perform symbolic integration should
help.
𝑇(𝑎, 𝑏, 𝑐) = (𝑎 + 𝑏 + 𝑐, 2𝑏 + 3𝑐).
𝑇𝑇 † 𝑇 = 𝑇 and 𝑇 † 𝑇𝑇 † = 𝑇 †.
Both formulas above clearly hold if 𝑇 is invertible because in that case we
can replace 𝑇 † with 𝑇 −1.
Chapter 7
Operators on Inner Product Spaces
The deepest results related to inner product spaces deal with the subject to which
we now turn—linear maps and operators on inner product spaces. As we will see,
good theorems can be proved by exploiting properties of the adjoint.
The hugely important spectral theorem will provide a complete description
of self-adjoint operators on real inner product spaces and of normal operators
on complex inner product spaces. We will then use the spectral theorem to help
understand positive operators and unitary operators, which will lead to unitary
matrices and matrix factorizations. The spectral theorem will also lead to the
popular singular value decomposition, which will lead to the polar decomposition.
The most important results in the rest of this book are valid only in finite
dimensions. Thus from now on we assume that 𝑉 and 𝑊 are finite-dimensional.
• 𝐅 denotes 𝐑 or 𝐂 .
• 𝑉 and 𝑊 are nonzero finite-dimensional inner product spaces over 𝐅.
Petar Milošević CC BY-SA
Market square in Lviv, a city that has had several names and has been in several
countries because of changing international borders. From 1772 until 1918, the city was
in Austria and was called Lemberg. Between World War I and World War II, the city was
in Poland and was called Lwów. During this time, mathematicians in Lwów, particularly
Stefan Banach (1892–1945) and his colleagues, developed the basic results of modern
functional analysis, using tools of analysis to study infinite-dimensional vector spaces.
Since the end of World War II, Lviv has been in Ukraine, which was part of the
Soviet Union until Ukraine became an independent country in 1991.
The word adjoint has another meaning in linear algebra. In case you encounter the second meaning elsewhere, be warned that the two meanings for adjoint are unrelated to each other.
To see why the definition above makes sense, suppose 𝑇 ∈ ℒ(𝑉, 𝑊). Fix 𝑤 ∈ 𝑊. Consider the linear functional
𝑣 ↦ ⟨𝑇𝑣, 𝑤⟩
on 𝑉 that maps 𝑣 ∈ 𝑉 to ⟨𝑇𝑣, 𝑤⟩; this linear functional depends on 𝑇 and 𝑤. By the Riesz representation theorem (6.42),
there exists a unique vector in 𝑉 such that this linear functional is given by taking
the inner product with it. We call this unique vector 𝑇 ∗ 𝑤. In other words, 𝑇 ∗ 𝑤 is
the unique vector in 𝑉 such that
⟨𝑇𝑣, 𝑤⟩ = ⟨𝑣, 𝑇 ∗ 𝑤⟩
for every 𝑣 ∈ 𝑉.
In the equation above, the inner product on the left takes place in 𝑊 and the
inner product on the right takes place in 𝑉. However, we use the same notation
⟨⋅, ⋅⟩ for both inner products.
The equation above and the definition of the adjoint imply that
In the two examples above, 𝑇 ∗ turned out to be not just a function from 𝑊 to 𝑉 but a linear map from 𝑊 to 𝑉. This behavior is true in general, as shown by the next result.
The two examples above and the proof below use a common technique for computing 𝑇 ∗ : start with a formula for ⟨𝑇𝑣, 𝑤⟩, then manipulate it to get just 𝑣 in the first slot; the entry in the second slot will then be 𝑇 ∗ 𝑤.
The next result shows the relationship between the null space and the range of
a linear map and its adjoint.
As we will soon see, the next definition is intimately connected to the matrix
of the adjoint of a linear map.
The next result shows how to compute the matrix of 𝑇 ∗ from the matrix of 𝑇. Caution: With respect to nonorthonormal bases, the matrix of 𝑇 ∗ does not necessarily equal the conjugate transpose of the matrix of 𝑇.
The adjoint of a linear map does not depend on a choice of basis. Thus we frequently emphasize adjoints of linear maps instead of transposes or conjugate transposes of matrices.
Proof In this proof, we will write ℳ(𝑇) and ℳ(𝑇 ∗ ) instead of the longer
expressions ℳ(𝑇, (𝑒1 , …, 𝑒𝑛 ), ( 𝑓1 , …, 𝑓𝑚 )) and ℳ(𝑇 ∗, ( 𝑓1 , …, 𝑓𝑚 ), (𝑒1 , …, 𝑒𝑛 )).
Recall that we obtain the 𝑘 th column of ℳ(𝑇) by writing 𝑇𝑒𝑘 as a linear
combination of the 𝑓𝑗 ’s; the scalars used in this linear combination then become
the 𝑘 th column of ℳ(𝑇). Because 𝑓1 , …, 𝑓𝑚 is an orthonormal basis of 𝑊, we
know how to write 𝑇𝑒𝑘 as a linear combination of the 𝑓𝑗 ’s [see 6.30(a)]:
𝑇𝑒𝑘 = ⟨𝑇𝑒𝑘 , 𝑓1 ⟩ 𝑓1 + ⋯ + ⟨𝑇𝑒𝑘 , 𝑓𝑚 ⟩ 𝑓𝑚 .
Thus
the entry in row 𝑗, column 𝑘, of ℳ(𝑇) is ⟨𝑇𝑒𝑘 , 𝑓𝑗 ⟩.
In the statement above, replace 𝑇 with 𝑇 ∗ and interchange 𝑒1 , …, 𝑒𝑛 and
𝑓1, …, 𝑓𝑚. This shows that the entry in row 𝑗, column 𝑘, of ℳ(𝑇∗) is ⟨𝑇∗𝑓𝑘, 𝑒𝑗⟩, which equals ⟨𝑓𝑘, 𝑇𝑒𝑗⟩, which equals \overline{⟨𝑇𝑒𝑗, 𝑓𝑘⟩}, which equals the complex conjugate of the entry in row 𝑘, column 𝑗, of ℳ(𝑇). Thus ℳ(𝑇∗) = (ℳ(𝑇))∗.
Self-Adjoint Operators
Now we switch our attention to operators on inner product spaces. Instead of
considering linear maps from 𝑉 to 𝑊, we will focus on linear maps from 𝑉 to 𝑉;
recall that such linear maps are called operators.
A good analogy to keep in mind is that the adjoint on ℒ(𝑉) plays a role similar
to that of the complex conjugate on 𝐂 . A complex number 𝑧 is real if and only if
𝑧 = 𝑧; thus a self-adjoint operator (𝑇 = 𝑇 ∗ ) is analogous to a real number.
An operator 𝑇 ∈ ℒ(𝑉) is self-adjoint if and only if

⟨𝑇𝑣, 𝑤⟩ = ⟨𝑣, 𝑇𝑤⟩

for all 𝑣, 𝑤 ∈ 𝑉.

We will see that the analogy discussed above is reflected in some important properties of self-adjoint operators, beginning with eigenvalues in the next result. If 𝐅 = 𝐑, then by definition every eigenvalue is real, so the next result is interesting only when 𝐅 = 𝐂.
The next result is false for real inner product spaces. As an example, consider
the operator 𝑇 ∈ ℒ(𝐑2 ) that is a counterclockwise rotation of 90∘ around the
origin; thus 𝑇(𝑥, 𝑦) = (−𝑦, 𝑥). Notice that 𝑇𝑣 is orthogonal to 𝑣 for every 𝑣 ∈ 𝐑2,
even though 𝑇 ≠ 0.
Proof If 𝑢, 𝑤 ∈ 𝑉, then

⟨𝑇𝑢, 𝑤⟩ = (⟨𝑇(𝑢 + 𝑤), 𝑢 + 𝑤⟩ − ⟨𝑇(𝑢 − 𝑤), 𝑢 − 𝑤⟩)/4
          + 𝑖 (⟨𝑇(𝑢 + 𝑖𝑤), 𝑢 + 𝑖𝑤⟩ − ⟨𝑇(𝑢 − 𝑖𝑤), 𝑢 − 𝑖𝑤⟩)/4,
as can be verified by computing the right side. Note that each term on the right
side is of the form ⟨𝑇𝑣, 𝑣⟩ for appropriate 𝑣 ∈ 𝑉.
Now suppose ⟨𝑇𝑣, 𝑣⟩ = 0 for every 𝑣 ∈ 𝑉. Then the equation above implies
that ⟨𝑇𝑢, 𝑤⟩ = 0 for all 𝑢, 𝑤 ∈ 𝑉, which then implies that 𝑇𝑢 = 0 for every 𝑢 ∈ 𝑉
(take 𝑤 = 𝑇𝑢). Hence 𝑇 = 0, as desired.
The next result is false for real inner product spaces, as shown by considering any operator on a real inner product space that is not self-adjoint.

The next result provides another good example of how self-adjoint operators behave like real numbers.
Proof If 𝑣 ∈ 𝑉, then

7.15   ⟨𝑇∗𝑣, 𝑣⟩ = \overline{⟨𝑣, 𝑇∗𝑣⟩} = \overline{⟨𝑇𝑣, 𝑣⟩}.

Now

𝑇 is self-adjoint ⟺ 𝑇 − 𝑇∗ = 0
                  ⟺ ⟨(𝑇 − 𝑇∗)𝑣, 𝑣⟩ = 0 for every 𝑣 ∈ 𝑉
                  ⟺ ⟨𝑇𝑣, 𝑣⟩ − \overline{⟨𝑇𝑣, 𝑣⟩} = 0 for every 𝑣 ∈ 𝑉
                  ⟺ ⟨𝑇𝑣, 𝑣⟩ ∈ 𝐑 for every 𝑣 ∈ 𝑉,

where the second equivalence follows from 7.13 as applied to 𝑇 − 𝑇∗ and the
third equivalence follows from 7.15.
Proof We have already proved this (without the hypothesis that 𝑇 is self-adjoint)
when 𝑉 is a complex inner product space (see 7.13). Thus we can assume that 𝑉
is a real inner product space. If 𝑢, 𝑤 ∈ 𝑉, then
7.17   ⟨𝑇𝑢, 𝑤⟩ = (⟨𝑇(𝑢 + 𝑤), 𝑢 + 𝑤⟩ − ⟨𝑇(𝑢 − 𝑤), 𝑢 − 𝑤⟩)/4,
as can be proved by computing the right side using the equation
⟨𝑇𝑤, 𝑢⟩ = ⟨𝑤, 𝑇𝑢⟩ = ⟨𝑇𝑢, 𝑤⟩,
where the first equality holds because 𝑇 is self-adjoint and the second equality
holds because we are working in a real inner product space.
Now suppose ⟨𝑇𝑣, 𝑣⟩ = 0 for every 𝑣 ∈ 𝑉. Because each term on the right
side of 7.17 is of the form ⟨𝑇𝑣, 𝑣⟩ for appropriate 𝑣, this implies that ⟨𝑇𝑢, 𝑤⟩ = 0
for all 𝑢, 𝑤 ∈ 𝑉. This implies that 𝑇𝑢 = 0 for every 𝑢 ∈ 𝑉 (take 𝑤 = 𝑇𝑢). Hence
𝑇 = 0, as desired.
Normal Operators
This operator 𝑇 is not self-adjoint because the entry in row 2, column 1 (which
equals 3) does not equal the complex conjugate of the entry in row 1, column 2
(which equals −3).
The matrix of 𝑇𝑇∗ equals

\begin{pmatrix} 2 & -3 \\ 3 & 2 \end{pmatrix} \begin{pmatrix} 2 & 3 \\ -3 & 2 \end{pmatrix},  which equals  \begin{pmatrix} 13 & 0 \\ 0 & 13 \end{pmatrix}.
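As a quick sanity check of the computation above (my own sketch, not part of the text), NumPy confirms that this matrix commutes with its conjugate transpose even though it is not self-adjoint:

```python
import numpy as np

# Matrix of T from the example above (real entries, so the conjugate
# transpose is just the transpose).
M = np.array([[2.0, -3.0],
              [3.0,  2.0]])
M_star = M.conj().T

print(np.allclose(M @ M_star, M_star @ M))   # True: T is normal
print(np.allclose(M, M_star))                # False: T is not self-adjoint
print(M @ M_star)                            # 13 times the identity matrix
```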
In the next section we will see why normal operators are worthy of special
attention. The next result provides a useful characterization of normal operators.
Proof We have
𝑇 is normal ⟺ 𝑇 ∗ 𝑇 − 𝑇𝑇 ∗ = 0
⟺ ⟨(𝑇 ∗ 𝑇 − 𝑇𝑇 ∗ )𝑣, 𝑣⟩ = 0 for every 𝑣 ∈ 𝑉
⟺ ⟨𝑇 ∗ 𝑇𝑣, 𝑣⟩ = ⟨𝑇𝑇 ∗ 𝑣, 𝑣⟩ for every 𝑣 ∈ 𝑉
⟺ ⟨𝑇𝑣, 𝑇𝑣⟩ = ⟨𝑇 ∗ 𝑣, 𝑇 ∗ 𝑣⟩ for every 𝑣 ∈ 𝑉
where we used 7.16 to establish the second equivalence (note that the operator
𝑇 ∗ 𝑇 − 𝑇𝑇 ∗ is self-adjoint).
The next result presents several consequences of the result above. Compare
(e) of the next result to Exercise 3. That exercise states that the eigenvalues of
the adjoint of each operator are equal (as a set) to the complex conjugates of
the eigenvalues of the operator. The exercise says nothing about eigenvectors,
because an operator and its adjoint may have different eigenvectors. However,
(e) of the next result implies that a normal operator and its adjoint have the same
eigenvectors.
Proof
(a) Suppose 𝑣 ∈ 𝑉. Then

𝑣 ∈ null 𝑇 ⟺ ‖𝑇𝑣‖ = 0 ⟺ ‖𝑇∗𝑣‖ = 0 ⟺ 𝑣 ∈ null 𝑇∗,

where the middle equivalence above follows from 7.20. Thus null 𝑇 = null 𝑇∗.
(b) We have
range 𝑇 = (null 𝑇 ∗ )⟂ = (null 𝑇)⟂ = range 𝑇 ∗,
where the first equality comes from 7.6(d), the second equality comes from
(a) in this result, and the third equality comes from 7.6(b).
(c) We have

𝑉 = null 𝑇 ⊕ (null 𝑇)⟂ = null 𝑇 ⊕ range 𝑇∗ = null 𝑇 ⊕ range 𝑇,

where the first equality comes from 6.49, the second equality comes from
7.6(b), and the third equality comes from (b) in this result.
(d) Suppose 𝜆 ∈ 𝐅. Then

(𝑇 − 𝜆𝐼)(𝑇 − 𝜆𝐼)∗ = (𝑇 − 𝜆𝐼)(𝑇∗ − \overline{𝜆}𝐼)
                  = 𝑇𝑇∗ − \overline{𝜆}𝑇 − 𝜆𝑇∗ + |𝜆|²𝐼
                  = 𝑇∗𝑇 − \overline{𝜆}𝑇 − 𝜆𝑇∗ + |𝜆|²𝐼
                  = (𝑇∗ − \overline{𝜆}𝐼)(𝑇 − 𝜆𝐼)
                  = (𝑇 − 𝜆𝐼)∗(𝑇 − 𝜆𝐼).

Thus 𝑇 − 𝜆𝐼 commutes with its adjoint, so 𝑇 − 𝜆𝐼 is normal, proving (d).
Because every self-adjoint operator is normal, the next result applies in partic-
ular to self-adjoint operators.
As stated here, the next result makes sense only when 𝐅 = 𝐂 . However, see
Exercise 12 for a version that makes sense when 𝐅 = 𝐂 and when 𝐅 = 𝐑 .
Suppose 𝐅 = 𝐂 and 𝑇 ∈ ℒ(𝑉). Under the analogy between ℒ(𝑉) and 𝐂 ,
with the adjoint on ℒ(𝑉) playing a similar role to that of the complex conjugate on
𝐂 , the operators 𝐴 and 𝐵 as defined by 7.24 correspond to the real and imaginary
parts of 𝑇. Thus the informal title of the result below should make sense.
Exercises 7A
‖𝑇𝑒1‖² + ⋯ + ‖𝑇𝑒𝑛‖² = ‖𝑇∗𝑓1‖² + ⋯ + ‖𝑇∗𝑓𝑚‖².
The numbers ‖𝑇𝑒1 ‖2, …, ‖𝑇𝑒𝑛 ‖2 in the equation above depend on the ortho-
normal basis 𝑒1 , …, 𝑒𝑛 , but the right side of the equation does not depend on
𝑒1 , …, 𝑒𝑛 . Thus the equation above shows that the sum on the left side does
not depend on which orthonormal basis 𝑒1 , …, 𝑒𝑛 is used.
for all 𝑣 ∈ 𝑉.
𝐵∗ = −𝐵.
Suppose that 𝑇 ∈ ℒ(𝑉). Prove that 𝑇 is normal if and only if there exist
commuting operators 𝐴 and 𝐵 such that 𝐴 is self-adjoint, 𝐵 is a skew operator,
and 𝑇 = 𝐴 + 𝐵.
13 Suppose 𝐅 = 𝐑 . Define 𝒜 ∈ ℒ(ℒ(𝑉)) by 𝒜𝑇 = 𝑇 ∗ for all 𝑇 ∈ ℒ(𝑉).
(a) Find all eigenvalues of 𝒜.
(b) Find the minimal polynomial of 𝒜.
16 Suppose 𝐅 = 𝐑 .
(a) Show that the set of self-adjoint operators on 𝑉 is a subspace of ℒ(𝑉).
(b) What is the dimension of the subspace of ℒ(𝑉) in (a) [in terms of
dim 𝑉]?
𝑉 = span(1, cos 𝑥, cos 2𝑥, …, cos 𝑛𝑥, sin 𝑥, sin 2𝑥, …, sin 𝑛𝑥).
𝑇 ′(𝜑𝑤 ) = 𝜑𝑇 ∗𝑤
7B Spectral Theorem
Recall that a diagonal matrix is a square matrix that is 0 everywhere except
possibly on the diagonal. Recall that an operator on 𝑉 is called diagonalizable if
the operator has a diagonal matrix with respect to some basis of 𝑉. Recall also
that this happens if and only if there is a basis of 𝑉 consisting of eigenvectors of
the operator (see 5.55).
The nicest operators on 𝑉 are those for which there is an orthonormal basis
of 𝑉 with respect to which the operator has a diagonal matrix. These are precisely
the operators 𝑇 ∈ ℒ(𝑉) such that there is an orthonormal basis of 𝑉 consisting
of eigenvectors of 𝑇. Our goal in this section is to prove the spectral theorem,
which characterizes these operators as the self-adjoint operators when 𝐅 = 𝐑 and
as the normal operators when 𝐅 = 𝐂 .
The spectral theorem is probably the most useful tool in the study of operators
on inner product spaces. Its extension to certain infinite-dimensional inner product
spaces (see, for example, Section 10D of the author’s book Measure, Integration
& Real Analysis) plays a key role in functional analysis.
Because the conclusion of the spectral theorem depends on 𝐅, we will break
the spectral theorem into two pieces, called the real spectral theorem and the
complex spectral theorem.
Suppose 𝑇 ∈ ℒ(𝑉) is self-adjoint and 𝑏, 𝑐 ∈ 𝐑 are such that 𝑏2 < 4𝑐. Then
𝑇 2 + 𝑏𝑇 + 𝑐𝐼
is an invertible operator.
Proof Suppose 𝑣 ∈ 𝑉 and 𝑣 ≠ 0. Then

⟨(𝑇² + 𝑏𝑇 + 𝑐𝐼)𝑣, 𝑣⟩ = ⟨𝑇²𝑣, 𝑣⟩ + 𝑏⟨𝑇𝑣, 𝑣⟩ + 𝑐⟨𝑣, 𝑣⟩
                     = ⟨𝑇𝑣, 𝑇𝑣⟩ + 𝑏⟨𝑇𝑣, 𝑣⟩ + 𝑐‖𝑣‖²
                     ≥ ‖𝑇𝑣‖² − |𝑏| ‖𝑇𝑣‖ ‖𝑣‖ + 𝑐‖𝑣‖²
                     = (‖𝑇𝑣‖ − |𝑏| ‖𝑣‖/2)² + (𝑐 − 𝑏²/4)‖𝑣‖²
                     > 0,
where the third line above holds by the Cauchy–Schwarz inequality (6.14). The
last inequality implies that (𝑇 2 + 𝑏𝑇 + 𝑐𝐼)𝑣 ≠ 0. Thus 𝑇 2 + 𝑏𝑇 + 𝑐𝐼 is injective,
which implies that it is invertible (see 3.65).
The next result will be a key tool in our proof of the real spectral theorem.
Proof First suppose 𝐅 = 𝐂 . The zeros of the minimal polynomial of 𝑇 are the
eigenvalues of 𝑇 [by 5.27(a)]. All eigenvalues of 𝑇 are real (by 7.12). Thus the
second version of the fundamental theorem of algebra (see 6.69) tells us that the
minimal polynomial of 𝑇 has the desired form.
Now suppose 𝐅 = 𝐑 . By the factorization of a polynomial over 𝐑 (see 4.16)
there exist 𝜆1 , …, 𝜆𝑚 ∈ 𝐑 and 𝑏1 , …, 𝑏𝑁 , 𝑐1 , …, 𝑐𝑁 ∈ 𝐑 with 𝑏𝑘2 < 4𝑐𝑘 for each 𝑘
such that the minimal polynomial of 𝑇 equals
7.28 (𝑧 − 𝜆1 )⋯(𝑧 − 𝜆𝑚 )(𝑧2 + 𝑏1 𝑧 + 𝑐1 )⋯(𝑧2 + 𝑏𝑁 𝑧 + 𝑐𝑁 );
here either 𝑚 or 𝑁 might equal 0, meaning that there are no terms of the corre-
sponding form. Now
(𝑇 − 𝜆1 𝐼)⋯(𝑇 − 𝜆𝑚 𝐼)(𝑇 2 + 𝑏1 𝑇 + 𝑐1 𝐼)⋯(𝑇 2 + 𝑏𝑁 𝑇 + 𝑐𝑁 𝐼) = 0.
If 𝑁 > 0, then we could multiply both sides of the equation above on the right by
the inverse of 𝑇 2 + 𝑏𝑁 𝑇 + 𝑐𝑁 𝐼 (which is an invertible operator by 7.26) to obtain a
polynomial expression of 𝑇 that equals 0. The corresponding polynomial would
have degree two less than the degree of 7.28, violating the minimality of the
degree of the polynomial with this property. Thus we must have 𝑁 = 0, which
means that the minimal polynomial in 7.28 has the form (𝑧 − 𝜆1 )⋯(𝑧 − 𝜆𝑚 ), as
desired.
The result above along with 5.27(a) implies that every self-adjoint operator
has an eigenvalue. In fact, as we will see in the next result, self-adjoint operators
have enough eigenvectors to form a basis.
The next result, which gives a complete description of the self-adjoint operators
on a real inner product space, is one of the major theorems in linear algebra.
Proof First suppose (a) holds, so 𝑇 is self-adjoint. Our results on minimal poly-
nomials, specifically 6.37 and 7.27, imply that 𝑇 has an upper-triangular matrix
with respect to some orthonormal basis of 𝑉. With respect to this orthonormal
basis, the matrix of 𝑇 ∗ is the transpose of the matrix of 𝑇. However, 𝑇 ∗ = 𝑇.
Thus the transpose of the matrix of 𝑇 equals the matrix of 𝑇. Because the matrix
of 𝑇 is upper-triangular, this means that all entries of the matrix above and below
the diagonal are 0. Hence the matrix of 𝑇 is a diagonal matrix with respect to the
orthonormal basis. Thus (a) implies (b).
Conversely, now suppose (b) holds, so 𝑇 has a diagonal matrix with respect to
some orthonormal basis of 𝑉. That diagonal matrix equals its transpose. Thus
with respect to that basis, the matrix of 𝑇 ∗ equals the matrix of 𝑇. Hence 𝑇 ∗ = 𝑇,
proving that (b) implies (a).
The equivalence of (b) and (c) follows from the definitions [or see the proof
that (a) and (b) are equivalent in 5.55].
Proof First suppose (a) holds, so 𝑇 is normal. By Schur’s theorem (6.38), there is
an orthonormal basis 𝑒1 , …, 𝑒𝑛 of 𝑉 with respect to which 𝑇 has an upper-triangular
matrix. Thus we can write
7.32   ℳ(𝑇, (𝑒1, …, 𝑒𝑛)) = \begin{pmatrix} 𝑎1,1 & ⋯ & 𝑎1,𝑛 \\ & ⋱ & ⋮ \\ 0 & & 𝑎𝑛,𝑛 \end{pmatrix}.
We will show that this matrix is actually a diagonal matrix.
We see from the matrix above that
‖𝑇𝑒1‖² = |𝑎1,1|²,
‖𝑇∗𝑒1‖² = |𝑎1,1|² + |𝑎1,2|² + ⋯ + |𝑎1,𝑛|².
Because 𝑇 is normal, ‖𝑇𝑒1 ‖ = ∥𝑇 ∗ 𝑒1 ∥ (see 7.20). Thus the two equations above
imply that all entries in the first row of the matrix in 7.32, except possibly the first
entry 𝑎1, 1 , equal 0.
Now 7.32 implies
‖𝑇𝑒2‖² = |𝑎2,2|²
(because 𝑎1,2 = 0, as we showed in the paragraph above) and
‖𝑇∗𝑒2‖² = |𝑎2,2|² + |𝑎2,3|² + ⋯ + |𝑎2,𝑛|².
Because 𝑇 is normal, ‖𝑇𝑒2 ‖ = ∥𝑇 ∗ 𝑒2 ∥. Thus the two equations above imply that
all entries in the second row of the matrix in 7.32, except possibly the diagonal
entry 𝑎2, 2 , equal 0.
Continuing in this fashion, we see that all nondiagonal entries in the matrix
7.32 equal 0. Thus (b) holds, completing the proof that (a) implies (b).
Now suppose (b) holds, so 𝑇 has a diagonal matrix with respect to some
orthonormal basis of 𝑉. The matrix of 𝑇 ∗ (with respect to the same basis) is
obtained by taking the conjugate transpose of the matrix of 𝑇; hence 𝑇 ∗ also has a
diagonal matrix. Any two diagonal matrices commute; thus 𝑇 commutes with 𝑇 ∗,
which means that 𝑇 is normal. In other words, (a) holds, completing the proof
that (b) implies (a).
The equivalence of (b) and (c) follows from the definitions (also see 5.55).
See Exercises 13 and 20 for alternative proofs that (a) implies (b) in the
previous result.
Exercises 14 and 15 interpret the real spectral theorem and the complex
spectral theorem by expressing the domain space as an orthogonal direct sum of
eigenspaces.
See Exercise 16 for a version of the complex spectral theorem that applies
simultaneously to more than one operator.
The main conclusion of the complex spectral theorem is that every normal
operator on a complex finite-dimensional inner product space is diagonalizable
by an orthonormal basis, as illustrated by the next example.
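The example itself is not reproduced in this excerpt. As a stand-in, here is a small NumPy sketch (my own; the matrix is the normal, non-self-adjoint operator from Section 7A) showing a normal matrix with distinct eigenvalues being diagonalized by a unitary matrix of eigenvectors. For repeated eigenvalues one would have to orthonormalize within each eigenspace, which np.linalg.eig does not do automatically.

```python
import numpy as np

# A normal operator on C^2 that is not self-adjoint (same matrix as in 7A).
A = np.array([[2.0, -3.0],
              [3.0,  2.0]])

eigenvalues, U = np.linalg.eig(A)   # columns of U are unit eigenvectors

# A is normal with distinct eigenvalues 2 + 3i and 2 - 3i, so its unit
# eigenvectors are automatically orthonormal and U is unitary.
print(np.allclose(U.conj().T @ U, np.eye(2)))                  # True
print(np.allclose(U.conj().T @ A @ U, np.diag(eigenvalues)))   # True
```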
Exercises 7B
13 Without using the complex spectral theorem, use the version of Schur’s
theorem that applies to two commuting operators (take ℰ = {𝑇, 𝑇 ∗ } in
Exercise 20 in Section 6B) to give a different proof that if 𝐅 = 𝐂 and
𝑇 ∈ ℒ(𝑉) is normal, then 𝑇 has a diagonal matrix with respect to some
orthonormal basis of 𝑉.
14 Suppose 𝐅 = 𝐑 and 𝑇 ∈ ℒ(𝑉). Prove that 𝑇 is self-adjoint if and only
if all pairs of eigenvectors corresponding to distinct eigenvalues of 𝑇 are
orthogonal and 𝑉 = 𝐸(𝜆1 , 𝑇) ⊕ ⋯ ⊕ 𝐸(𝜆𝑚 , 𝑇), where 𝜆1 , …, 𝜆𝑚 denote the
distinct eigenvalues of 𝑇.
15 Suppose 𝐅 = 𝐂 and 𝑇 ∈ ℒ(𝑉). Prove that 𝑇 is normal if and only if all pairs
of eigenvectors corresponding to distinct eigenvalues of 𝑇 are orthogonal
and 𝑉 = 𝐸(𝜆1 , 𝑇) ⊕ ⋯ ⊕ 𝐸(𝜆𝑚 , 𝑇), where 𝜆1 , …, 𝜆𝑚 denote the distinct
eigenvalues of 𝑇.
16 Suppose 𝐅 = 𝐂 and ℰ ⊆ ℒ(𝑉). Prove that there is an orthonormal basis
of 𝑉 with respect to which every element of ℰ has a diagonal matrix if and
only if 𝑆 and 𝑇 are commuting normal operators for all 𝑆, 𝑇 ∈ ℰ.
This exercise extends the complex spectral theorem to the context of a
collection of commuting normal operators.
is not invertible.
This exercise shows that the hypothesis that 𝑇 is self-adjoint cannot be
deleted in 7.26, even for real vector spaces.
22 Give an example of an operator 𝑇 ∈ ℒ(𝐂3 ) such that 2 and 3 are the only
eigenvalues of 𝑇 and 𝑇 2 − 5𝑇 + 6𝐼 ≠ 0.
23 Suppose 𝑇 ∈ ℒ(𝑉) is self-adjoint, 𝜆 ∈ 𝐅, and 𝜖 > 0. Suppose there exists
𝑣 ∈ 𝑉 such that ‖𝑣‖ = 1 and
7C Positive Operators
for all 𝑣 ∈ 𝑉.
Proof We will prove that (a) ⇒ (b) ⇒ (c) ⇒ (d) ⇒ (e) ⇒ (f) ⇒ (a).
First suppose (a) holds, so that 𝑇 is positive, which implies that 𝑇 is self-adjoint
(by definition of positive operator). To prove the other condition in (b), suppose
𝜆 is an eigenvalue of 𝑇. Let 𝑣 be an eigenvector of 𝑇 corresponding to 𝜆. Then
0 ≤ ⟨𝑇𝑣, 𝑣⟩ = ⟨𝜆𝑣, 𝑣⟩ = 𝜆⟨𝑣, 𝑣⟩.
Thus 𝜆 is a nonnegative number. Hence (b) holds, showing that (a) implies (b).
Now suppose (b) holds, so that 𝑇 is self-adjoint and all eigenvalues of 𝑇 are
nonnegative. By the spectral theorem (7.29 and 7.31), there is an orthonormal
basis 𝑒1 , …, 𝑒𝑛 of 𝑉 consisting of eigenvectors of 𝑇. Let 𝜆1 , …, 𝜆𝑛 be the eigenval-
ues of 𝑇 corresponding to 𝑒1 , …, 𝑒𝑛 ; thus each 𝜆𝑘 is a nonnegative number. The
matrix of 𝑇 with respect to 𝑒1 , …, 𝑒𝑛 is the diagonal matrix with 𝜆1 , …, 𝜆𝑛 on the
diagonal, which shows that (b) implies (c).
Now suppose (c) holds. Suppose 𝑒1 , …, 𝑒𝑛 is an orthonormal basis of 𝑉 such
that the matrix of 𝑇 with respect to this basis is a diagonal matrix with nonnegative
numbers 𝜆1 , …, 𝜆𝑛 on the diagonal. The linear map lemma (3.4) implies that
there exists 𝑅 ∈ ℒ(𝑉) such that
𝑅𝑒𝑘 = √ 𝜆𝑘 𝑒𝑘
for each 𝑘 = 1, …, 𝑛. As you should verify, 𝑅 is a positive operator. Furthermore,
𝑅2 𝑒𝑘 = 𝜆𝑘 𝑒𝑘 = 𝑇𝑒𝑘 for each 𝑘, which implies that 𝑅2 = 𝑇. Thus 𝑅 is a positive
square root of 𝑇. Hence (d) holds, which shows that (c) implies (d).
Every positive operator is self-adjoint (by definition of positive operator).
Thus (d) implies (e).
Now suppose (e) holds, meaning that there exists a self-adjoint operator 𝑅 on
𝑉 such that 𝑇 = 𝑅2. Then 𝑇 = 𝑅∗ 𝑅 (because 𝑅∗ = 𝑅). Hence (e) implies (f).
Finally, suppose (f) holds. Let 𝑅 ∈ ℒ(𝑉) be such that 𝑇 = 𝑅∗ 𝑅. Then
𝑇∗ = (𝑅∗𝑅)∗ = 𝑅∗(𝑅∗)∗ = 𝑅∗𝑅 = 𝑇. Hence 𝑇 is self-adjoint. To complete the
proof that (a) holds, note that
⟨𝑇𝑣, 𝑣⟩ = ⟨𝑅∗ 𝑅𝑣, 𝑣⟩ = ⟨𝑅𝑣, 𝑅𝑣⟩ ≥ 0
for every 𝑣 ∈ 𝑉. Thus 𝑇 is positive, showing that (f) implies (a).
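A hedged numerical sketch of the square-root construction used in the proof that (c) implies (d): diagonalize a positive matrix with eigh, take square roots of the (nonnegative) eigenvalues, and reassemble. The particular matrix 𝑇 below is an arbitrary example built as 𝑅∗𝑅.

```python
import numpy as np

rng = np.random.default_rng(1)

# A positive operator, built as R* R for a random R (condition (f) above).
R = rng.standard_normal((3, 3))
T = R.T @ R

# Spectral decomposition of the self-adjoint matrix T.
eigenvalues, Q = np.linalg.eigh(T)       # real eigenvalues, orthonormal columns

# Positive square root: same eigenvectors, square roots of the eigenvalues.
sqrt_T = Q @ np.diag(np.sqrt(np.clip(eigenvalues, 0, None))) @ Q.T

print(np.allclose(sqrt_T @ sqrt_T, T))                # True
print(np.all(np.linalg.eigvalsh(sqrt_T) >= -1e-12))   # True: sqrt_T is positive
```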
Every nonnegative number has a unique nonnegative square root. The next
result shows that positive operators enjoy a similar property.
7.39 each positive operator has only one positive square root
Thus
𝑅𝑣 = ∑_{𝑘 : 𝜆𝑘 = 𝜆} 𝑎𝑘 √𝜆 𝑒𝑘 = √𝜆 𝑣,
as desired.
The notation defined below makes sense thanks to the result above.
7.40 notation: √𝑇
The statement of the next result does not involve a square root, but the clean
proof makes nice use of the square root of a positive operator.
Proof We have
0 = ⟨𝑇𝑣, 𝑣⟩ = ⟨√𝑇 √𝑇 𝑣, 𝑣⟩ = ⟨√𝑇 𝑣, √𝑇 𝑣⟩ = ∥√𝑇 𝑣∥².
Exercises 7C
1 Suppose 𝑇 ∈ ℒ(𝑉). Prove that if both 𝑇 and −𝑇 are positive operators, then
𝑇 = 0.
𝑇𝑣 = 𝑤 and 𝑇𝑤 = 𝑣.
Prove that 𝑣 = 𝑤.
11 Suppose 𝑇 is a positive operator on 𝑉 and 𝑈 is a subspace of 𝑉 invariant
under 𝑇. Prove that 𝑇|𝑈 ∈ ℒ(𝑈) is a positive operator on 𝑈.
12 Suppose 𝑇 ∈ ℒ(𝑉) is a positive operator. Prove that 𝑇 𝑘 is a positive operator
for every positive integer 𝑘.
‖𝑆𝑣‖ = ‖𝑣‖
If 𝑆 ∈ ℒ(𝑉, 𝑊) is an isometry and 𝑣 ∈ 𝑉 is such that 𝑆𝑣 = 0, then
‖𝑣‖ = ‖𝑆𝑣‖ = ‖0‖ = 0,
which implies that 𝑣 = 0. Thus every isometry is injective.

The Greek word isos means equal; the Greek word metron means measure. Thus isometry literally means equal measure.
The next result gives conditions equivalent to being an isometry. The equiv-
alence of (a) and (c) shows that a linear map is an isometry if and only if it
preserves inner products. The equivalence of (a) and (d) shows that a linear map
is an isometry if and only if it maps some orthonormal basis to an orthonormal list.
Thus the isometries given by Example 7.45 include all isometries. Furthermore,
a linear map is an isometry if and only if it maps every orthonormal basis to an
orthonormal list [because whether or not (a) holds does not depend on the basis
𝑒1 , …, 𝑒𝑛 ].
The equivalence of (a) and (e) in the next result shows that a linear map is an
isometry if and only if the columns of its matrix (with respect to any orthonormal
bases) form an orthonormal list. Here we are identifying the columns of an 𝑚-by-𝑛
matrix with elements of 𝐅𝑚 and then using the Euclidean inner product on 𝐅𝑚 .
See Exercises 1 and 11 for additional conditions that are equivalent to being
an isometry.
Unitary Operators
In this subsection, we confine our attention to linear maps from a vector space to
itself. In other words, we will be working with operators.
The next result (7.53) lists several conditions that are equivalent to being a
unitary operator. All the conditions equivalent to being an isometry in 7.49 should
be added to this list. The extra conditions in 7.53 arise because of limiting the
context to linear maps from a vector space to itself. For example, 7.49 shows that
a linear map 𝑆 ∈ ℒ(𝑉, 𝑊) is an isometry if and only if 𝑆∗ 𝑆 = 𝐼, while 7.53 shows
that an operator 𝑆 ∈ ℒ(𝑉) is a unitary operator if and only if 𝑆∗ 𝑆 = 𝑆𝑆∗ = 𝐼.
Another difference is that 7.49(d) mentions an orthonormal list, while 7.53(d)
mentions an orthonormal basis. Also, 7.49(e) mentions the columns of ℳ(𝑇),
while 7.53(e) mentions the rows of ℳ(𝑇). Furthermore, ℳ(𝑇) in 7.49(e) is with
respect to an orthonormal basis of 𝑉 and an orthonormal basis of 𝑊, while ℳ(𝑇)
in 7.53(e) is with respect to a single basis of 𝑉 doing double duty.
𝑆∗ 𝑆 = 𝐼
by the equivalence of (a) and (b) in 7.49. Multiply both sides of this equation by
𝑆−1 on the right, getting 𝑆∗ = 𝑆−1 . Thus 𝑆𝑆∗ = 𝑆𝑆−1 = 𝐼, as desired, proving
that (a) implies (b).
The definitions of invertible and inverse show that (b) implies (c).
Now suppose (c) holds, so 𝑆 is invertible and 𝑆−1 = 𝑆∗. Thus 𝑆∗ 𝑆 = 𝐼. Hence
𝑆𝑒1 , …, 𝑆𝑒𝑛 is an orthonormal list in 𝑉, by the equivalence of (b) and (d) in 7.49.
The length of this list equals dim 𝑉. Thus 𝑆𝑒1 , …, 𝑆𝑒𝑛 is an orthonormal basis of 𝑉,
proving that (c) implies (d).
Now suppose (d) holds, so 𝑆𝑒1 , …, 𝑆𝑒𝑛 is an orthonormal basis of 𝑉. The
equivalence of (a) and (d) in 7.49 shows that 𝑆 is a unitary operator. Thus
(𝑆∗)∗𝑆∗ = 𝑆𝑆∗ = 𝐼,
where the last equation holds because we have already shown that (a) implies (b) in
this result. The equation above and the equivalence of (a) and (b) in 7.49 show that
𝑆∗ is an isometry. Thus the columns of ℳ(𝑆∗, (𝑒1 , …, 𝑒𝑛 )) form an orthonormal ba-
sis of 𝐅𝑛 [by the equivalence of (a) and (e) of 7.49]. The rows of ℳ(𝑆, (𝑒1 , …, 𝑒𝑛 ))
are the complex conjugates of the columns of ℳ(𝑆∗, (𝑒1 , …, 𝑒𝑛 )). Thus the rows
of ℳ(𝑆, (𝑒1 , …, 𝑒𝑛 )) form an orthonormal basis of 𝐅𝑛 , proving that (d) implies (e).
Now suppose (e) holds. Thus the columns of ℳ(𝑆∗, (𝑒1 , …, 𝑒𝑛 )) form an
orthonormal basis of 𝐅𝑛 . The equivalence of (a) and (e) in 7.49 shows that 𝑆∗ is
an isometry, proving that (e) implies (f).
Now suppose (f) holds, so 𝑆∗ is a unitary operator. The chain of implications
we have already proved in this result shows that (a) implies (f). Applying this
result to 𝑆∗ shows that (𝑆∗ )∗ is a unitary operator, proving that (f) implies (a).
We have shown that (a) ⇒ (b) ⇒ (c) ⇒ (d) ⇒ (e) ⇒ (f) ⇒ (a), completing the
proof.
Recall our analogy between 𝐂 and ℒ(𝑉). Under this analogy, a complex
number 𝑧 corresponds to an operator 𝑆 ∈ ℒ(𝑉), and 𝑧 corresponds to 𝑆∗. The
real numbers (𝑧 = 𝑧) correspond to the self-adjoint operators (𝑆 = 𝑆∗ ), and the
nonnegative numbers correspond to the (badly named) positive operators.
Another distinguished subset of 𝐂 is the unit circle, which consists of the
complex numbers 𝑧 such that |𝑧| = 1. The condition |𝑧| = 1 is equivalent to the
condition 𝑧𝑧 = 1. Under our analogy, this corresponds to the condition 𝑆∗ 𝑆 = 𝐼,
which is equivalent to 𝑆 being a unitary operator. Hence the analogy shows that
the unit circle in 𝐂 corresponds to the set of unitary operators. In the next two
results, this analogy appears in the eigenvalues of unitary operators. Also see
Exercise 15 for another example of this analogy.
Proof Suppose (a) holds, so 𝑆 is a unitary operator. The equivalence of (a) and
(b) in 7.53 shows that 𝑆 is normal. Thus the complex spectral theorem (7.31)
shows that there is an orthonormal basis 𝑒1 , …, 𝑒𝑛 of 𝑉 consisting of eigenvectors
of 𝑆. Every eigenvalue of 𝑆 has absolute value 1 (by 7.54), completing the proof
that (a) implies (b).
Now suppose (b) holds. Let 𝑒1 , …, 𝑒𝑛 be an orthonormal basis of 𝑉 consisting
of eigenvectors of 𝑆 whose corresponding eigenvalues 𝜆1 , …, 𝜆𝑛 all have absolute
value 1. Then 𝑆𝑒1 , …, 𝑆𝑒𝑛 is also an orthonormal basis of 𝑉 because
⟨𝑆𝑒𝑗, 𝑆𝑒𝑘⟩ = ⟨𝜆𝑗𝑒𝑗, 𝜆𝑘𝑒𝑘⟩ = 𝜆𝑗 \overline{𝜆𝑘} ⟨𝑒𝑗, 𝑒𝑘⟩ = \begin{cases} 0 & \text{if } 𝑗 ≠ 𝑘, \\ 1 & \text{if } 𝑗 = 𝑘 \end{cases}
for all 𝑗, 𝑘 = 1, …, 𝑛. Thus the equivalence of (a) and (d) in 7.53 shows that 𝑆 is
unitary, proving that (b) implies (a).
QR Factorization
In this subsection, we shift our attention from operators to matrices. This switch
should give you good practice in identifying an operator with a square matrix
(after picking a basis of the vector space on which the operator is defined). You
should also become more comfortable with translating concepts and results back
and forth between the context of operators and the context of square matrices.
When starting with 𝑛-by-𝑛 matrices instead of operators, unless otherwise
specified assume that the associated operators live on 𝐅𝑛 (with the Euclidean inner
product) and that their matrices are computed with respect to the standard basis
of 𝐅𝑛 .
We begin by making the following definition, transferring the notion of a
unitary operator to a unitary matrix.
The QR factorization stated and proved below is the main tool in the widely
used QR algorithm (not discussed here) for finding good approximations to
eigenvalues and eigenvectors of square matrices. In the result below, if the matrix
𝐴 is in 𝐅𝑛, 𝑛, then the matrices 𝑄 and 𝑅 are also in 𝐅𝑛, 𝑛.
7.58 QR factorization
𝐴 = 𝑄𝑅.
where 𝑅𝑗, 𝑘 denotes the entry in row 𝑗, column 𝑘 of 𝑅. If 𝑗 > 𝑘, then 𝑒𝑗 is orthogonal
to span(𝑒1 , …, 𝑒𝑘 ) and hence 𝑒𝑗 is orthogonal to 𝑣𝑘 (by 7.59). In other words, if
𝑗 > 𝑘 then ⟨𝑣𝑘 , 𝑒𝑗 ⟩ = 0. Thus 𝑅 is an upper-triangular matrix.
Let 𝑄 be the unitary matrix whose columns are 𝑒1 , …, 𝑒𝑛 . If 𝑘 ∈ {1, …, 𝑛},
then the 𝑘 th column of 𝑄𝑅 equals a linear combination of the columns of 𝑄, with
the coefficients for the linear combination coming from the 𝑘 th column of 𝑅—see
3.51(a). Hence the 𝑘 th column of 𝑄𝑅 equals
⟨𝑣𝑘 , 𝑒1 ⟩𝑒1 + ⋯ + ⟨𝑣𝑘 , 𝑒𝑘 ⟩𝑒𝑘 ,
The proof of the QR factorization shows that the columns of the unitary matrix
can be computed by applying the Gram–Schmidt procedure to the columns of the
matrix to be factored. The next example illustrates the computation of the QR
factorization based on the proof that we just completed.
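The worked example does not appear in this excerpt. In its place, here is a minimal sketch (my own, assuming the columns of 𝐴 are linearly independent) that follows the proof: apply the Gram–Schmidt procedure to the columns of 𝐴 to get the columns of 𝑄, and record the coefficients ⟨𝑣𝑘, 𝑒𝑗⟩ in 𝑅.

```python
import numpy as np

def qr_by_gram_schmidt(A):
    """QR factorization following the proof of 7.58: Gram-Schmidt applied to the
    columns of A (assumed linearly independent); R[j, k] = <v_k, e_j> for j <= k."""
    n = A.shape[1]
    Q = np.zeros_like(A, dtype=float)
    R = np.zeros((n, n))
    for k in range(n):
        v = A[:, k].astype(float).copy()
        for j in range(k):
            R[j, k] = Q[:, j] @ A[:, k]      # coefficient <v_k, e_j>
            v -= R[j, k] * Q[:, j]
        R[k, k] = np.linalg.norm(v)
        Q[:, k] = v / R[k, k]
    return Q, R

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
Q, R = qr_by_gram_schmidt(A)
print(np.allclose(Q @ R, A))               # True
print(np.allclose(Q.T @ Q, np.eye(3)))     # True: columns of Q are orthonormal
print(np.allclose(R, np.triu(R)))          # True: R is upper triangular
```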
The QR factorization will be the major tool used in the proof of the Cholesky
factorization (7.63) in the next subsection. For another nice application of the QR
factorization, see the proof of Hadamard’s inequality (9.66).
𝑅𝑥 = 𝑄∗ 𝑏.
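The isolated equation 𝑅𝑥 = 𝑄∗𝑏 above is the remnant of a discussion, not reproduced in full here, of using the QR factorization to solve 𝐴𝑥 = 𝑏: writing 𝐴 = 𝑄𝑅 and using 𝑄∗𝑄 = 𝐼 turns the system into 𝑅𝑥 = 𝑄∗𝑏, which can be solved by back substitution because 𝑅 is upper triangular. A hedged sketch (the matrix and right side are arbitrary):

```python
import numpy as np

def back_substitute(R, y):
    """Solve Rx = y for an invertible upper-triangular R by back substitution."""
    n = len(y)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))            # assumed invertible
b = rng.standard_normal(4)

Q, R = np.linalg.qr(A)                     # A = QR, Q unitary, R upper triangular
x = back_substitute(R, Q.conj().T @ b)     # Ax = b  becomes  Rx = Q*b

print(np.allclose(A @ x, b))               # True
```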
Cholesky Factorization
We begin this subsection with a characterization of positive invertible operators
in terms of inner products.
The next definition transfers the result above to the language of matrices. Here
we are using the usual Euclidean inner product on 𝐅𝑛 and identifying elements of
𝐅𝑛 with 𝑛-by-1 column vectors.
⟨𝐵𝑥, 𝑥⟩ > 0
𝐵 = 𝑅∗ 𝑅.
In the first paragraph of the proof above, we could have chosen 𝐴 to be the
unique positive definite matrix that is a square root of 𝐵 (see 7.39). However,
the proof was presented with the more general choice of 𝐴 because for specific
positive definite matrices 𝐵, it may be easier to find a different choice of 𝐴.
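A hedged numerical sketch of the Cholesky factorization 𝐵 = 𝑅∗𝑅 (7.63): NumPy's cholesky routine returns a lower-triangular 𝐿 with 𝐵 = 𝐿𝐿∗, so one may take 𝑅 = 𝐿∗. The positive definite matrix below is an illustrative choice, not one from the text.

```python
import numpy as np

# A positive definite matrix, constructed as A* A for an invertible A.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])
B = A.T @ A

L = np.linalg.cholesky(B)     # lower triangular, B = L L*
R = L.conj().T                # upper triangular with positive diagonal, B = R* R

print(np.allclose(R.conj().T @ R, B))   # True
print(np.allclose(R, np.triu(R)))       # True: R is upper triangular
print(np.all(np.diag(R) > 0))           # True
```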
Exercises 7D
3 (a) Show that the product of two unitary operators on 𝑉 is a unitary operator.
(b) Show that the inverse of a unitary operator on 𝑉 is a unitary operator.
This exercise shows that the set of unitary operators on 𝑉 is a group, where
the group operation is the usual product of two operators.
7.64 properties of 𝑇 ∗ 𝑇
Proof
(a) We have
(𝑇∗𝑇)∗ = 𝑇∗(𝑇∗)∗ = 𝑇∗𝑇.
Thus 𝑇 ∗ 𝑇 is self-adjoint.
If 𝑣 ∈ 𝑉, then
⟨(𝑇 ∗ 𝑇)𝑣, 𝑣⟩ = ⟨𝑇 ∗ (𝑇𝑣), 𝑣⟩ = ⟨𝑇𝑣, 𝑇𝑣⟩ = ‖𝑇𝑣‖2 ≥ 0.
Suppose 𝑇 ∈ ℒ(𝑉, 𝑊). The singular values of 𝑇 are the nonnegative square
roots of the eigenvalues of 𝑇 ∗ 𝑇, listed in decreasing order, each included as
many times as the dimension of the corresponding eigenspace of 𝑇 ∗ 𝑇.
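A minimal numerical check of this definition (my own sketch, with an arbitrary matrix): the singular values reported by NumPy agree with the nonnegative square roots of the eigenvalues of 𝐴∗𝐴, listed in decreasing order. Note that np.linalg.svd returns only min(𝑀, 𝑛) values, whereas the definition above produces dim 𝑉 of them, the extras being 0.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 5))                        # a linear map F^5 -> F^3

singular_values = np.linalg.svd(A, compute_uv=False)   # decreasing order, 3 values

eigs = np.linalg.eigvalsh(A.conj().T @ A)              # 5 eigenvalues, increasing
sqrt_eigs = np.sqrt(np.clip(eigs, 0, None))[::-1]      # decreasing order

print(np.allclose(singular_values, sqrt_eigs[:3]))     # True
print(np.allclose(sqrt_eigs[3:], 0))                   # True: the remaining
                                                       # singular values are 0
```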
Proof The linear map 𝑇 is injective if and only if null 𝑇 = {0}, which happens
if and only if null 𝑇 ∗ 𝑇 = {0} [by 7.64(b)], which happens if and only if 0 is not
an eigenvalue of 𝑇 ∗ 𝑇, which happens if and only if 0 is not a singular value of 𝑇,
completing the proof of (a).
The spectral theorem applied to 𝑇 ∗ 𝑇 shows that dim range 𝑇 ∗ 𝑇 equals the num-
ber of positive eigenvalues of 𝑇 ∗ 𝑇 (counting repetitions). Thus 7.64(c) implies
that dim range 𝑇 equals the number of positive singular values of 𝑇, proving (b).
Use (b) and 2.39 to show that (c) holds.
Proof We have
𝑆 is an isometry ⟺ 𝑆∗ 𝑆 = 𝐼
⟺ all eigenvalues of 𝑆∗ 𝑆 equal 1
⟺ all singular values of 𝑆 equal 1,
where the first equivalence comes from 7.49 and the second equivalence comes
from the spectral theorem (7.29 or 7.31) applied to the self-adjoint operator 𝑆∗ 𝑆.
for every 𝑣 ∈ 𝑉.
Proof Let 𝑠1 , …, 𝑠𝑛 denote the singular values of 𝑇 (thus 𝑛 = dim 𝑉). Because
𝑇 ∗ 𝑇 is a positive operator [see 7.64(a)], the spectral theorem implies that there
exists an orthonormal basis 𝑒1 , …, 𝑒𝑛 of 𝑉 with
7.72   𝑇∗𝑇𝑒𝑘 = 𝑠𝑘² 𝑒𝑘
for each 𝑘 = 1, …, 𝑛.
For each 𝑘 = 1, …, 𝑚, let
7.73   𝑓𝑘 = 𝑇𝑒𝑘 / 𝑠𝑘 .
If 𝑗, 𝑘 ∈ {1, …, 𝑚}, then
⟨𝑓𝑗, 𝑓𝑘⟩ = (1/(𝑠𝑗𝑠𝑘)) ⟨𝑇𝑒𝑗, 𝑇𝑒𝑘⟩ = (1/(𝑠𝑗𝑠𝑘)) ⟨𝑒𝑗, 𝑇∗𝑇𝑒𝑘⟩ = (𝑠𝑘/𝑠𝑗) ⟨𝑒𝑗, 𝑒𝑘⟩ = \begin{cases} 0 & \text{if } 𝑗 ≠ 𝑘, \\ 1 & \text{if } 𝑗 = 𝑘. \end{cases}
Thus 𝑓1 , …, 𝑓𝑚 is an orthonormal list in 𝑊.
If 𝑘 ∈ {1, …, 𝑛} and 𝑘 > 𝑚, then 𝑠𝑘 = 0 and hence 𝑇 ∗ 𝑇𝑒𝑘 = 0 (by 7.72), which
implies that 𝑇𝑒𝑘 = 0 [by 7.64(b)].
Suppose 𝑣 ∈ 𝑉. Then
𝑇𝑣 = 𝑇(⟨𝑣, 𝑒1 ⟩𝑒1 + ⋯ + ⟨𝑣, 𝑒𝑛 ⟩𝑒𝑛 )
= ⟨𝑣, 𝑒1 ⟩𝑇𝑒1 + ⋯ + ⟨𝑣, 𝑒𝑚 ⟩𝑇𝑒𝑚
= 𝑠1 ⟨𝑣, 𝑒1 ⟩ 𝑓1 + ⋯ + 𝑠𝑚 ⟨𝑣, 𝑒𝑚 ⟩ 𝑓𝑚 ,
where the last index in the first line switched from 𝑛 to 𝑚 in the second line
because 𝑇𝑒𝑘 = 0 if 𝑘 > 𝑚 (as noted in the paragraph above) and the third line
follows from 7.73. The equation above is our desired result.
The table below compares the spectral theorem (7.29 and 7.31) with the
singular value decomposition (7.70).
spectral theorem:
• describes only self-adjoint operators (when 𝐅 = 𝐑) or normal operators (when 𝐅 = 𝐂)
• produces a single orthonormal basis
• different proofs depending on whether 𝐅 = 𝐑 or 𝐅 = 𝐂

singular value decomposition:
• describes arbitrary linear maps from an inner product space to a possibly different inner product space
• produces two orthonormal lists, one for the domain space and one for the range space, that are not necessarily the same even when the range space equals the domain space
• same proof works regardless of whether 𝐅 = 𝐑 or 𝐅 = 𝐂
The singular value decomposition gives us a new way to understand the adjoint
and the inverse of a linear map. Specifically, the next result shows that given a
singular value decomposition of a linear map 𝑇 ∈ ℒ(𝑉, 𝑊), we can obtain the
adjoint of 𝑇 simply by interchanging the roles of the 𝑒’s and the 𝑓 ’s (see 7.77).
Similarly, we can obtain the pseudoinverse 𝑇 † (see 6.68) of 𝑇 by interchanging
the roles of the 𝑒’s and the 𝑓 ’s and replacing each positive singular value 𝑠𝑘 of 𝑇
with 1/𝑠𝑘 (see 7.78).
and
⟨𝑤, 𝑓1 ⟩ ⟨𝑤, 𝑓𝑚 ⟩
7.78 𝑇†𝑤 = 𝑒1 + ⋯ + 𝑒𝑚
𝑠1 𝑠𝑚
for every 𝑤 ∈ 𝑊.
= 𝑠1 ⟨𝑣, 𝑒1 ⟩⟨ 𝑓1 , 𝑤⟩ + ⋯ + 𝑠𝑚 ⟨𝑣, 𝑒𝑚 ⟩⟨ 𝑓𝑚 , 𝑤⟩
= ⟨𝑤, 𝑓1 ⟩ 𝑓1 + ⋯ + ⟨𝑤, 𝑓𝑚 ⟩ 𝑓𝑚
= 𝑃range 𝑇 𝑤,
where the second line holds because 7.76 implies that 𝑇𝑒𝑘 = 𝑠𝑘 𝑓𝑘 if 𝑘 = 1, …, 𝑚,
and the last line above holds because 7.76 implies that 𝑓1 , …, 𝑓𝑚 spans range 𝑇 and
thus is an orthonormal basis of range 𝑇 [and hence 6.57(i) applies]. The equation
above, the observation that 𝑣 ∈ (null 𝑇)⟂ [see Exercise 8(b)], and the definition
of 𝑇 † 𝑤 (see 6.68) show that 𝑣 = 𝑇 † 𝑤, proving 7.78.
𝑇𝑣 = 5⟨𝑣, 𝑒1 ⟩ 𝑓1 + √2⟨𝑣, 𝑒2 ⟩ 𝑓2
for all 𝑣 ∈ 𝐅4 .
An orthonormal basis of 𝐸(25, 𝑇∗𝑇) is the vector (0, 0, 0, 1); an orthonormal
basis of 𝐸(2, 𝑇∗𝑇) is the vector (1/√2, 1/√2, 0, 0). Thus, following the proof of 7.70,
we take

𝑒1 = (0, 0, 0, 1)   and   𝑒2 = (1/√2, 1/√2, 0, 0)

and

𝑓1 = 𝑇𝑒1/5 = (−1, 0, 0)   and   𝑓2 = 𝑇𝑒2/√2 = (0, 0, 1).
The next result translates the singular value decomposition from the context
of linear maps to the context of matrices. Specifically, the following result gives
a factorization of an arbitrary matrix as the product of three nice matrices. The
proof gives an explicit construction of these three matrices in terms of the singular
value decomposition.
In the next result, the phrase “orthogonal columns” should be interpreted to
mean that the columns are orthogonal with respect to the standard Euclidean inner
product.
Proof Let 𝑇 ∶ 𝐅𝑛 → 𝐅𝑀 be the linear map whose matrix with respect to the
standard bases equals 𝐴. Then dim range 𝑇 = 𝑚 (by 3.78). Let
7.81 𝑇𝑣 = 𝑠1 ⟨𝑣, 𝑒1 ⟩ 𝑓1 + ⋯ + 𝑠𝑚 ⟨𝑣, 𝑒𝑚 ⟩ 𝑓𝑚
Thus 𝐴𝐶 = 𝐵𝐷.
Multiply both sides of this last equation by 𝐶∗ (the conjugate transpose of 𝐶)
on the right to get
𝐴𝐶𝐶∗ = 𝐵𝐷𝐶∗.
Note that the rows of 𝐶∗ are the complex conjugates of 𝑒1 , …, 𝑒𝑚 . Thus if
𝑘 ∈ {1, …, 𝑚}, then the definition of matrix multiplication shows that 𝐶∗ 𝑒𝑘 = 𝑢𝑘 ;
hence 𝐶𝐶∗ 𝑒𝑘 = 𝑒𝑘 . Thus 𝐴𝐶𝐶∗ 𝑣 = 𝐴𝑣 for all 𝑣 ∈ span(𝑒1 , …, 𝑒𝑚 ).
If 𝑣 ∈ (span(𝑒1 , …, 𝑒𝑚 ))⟂ , then 𝐴𝑣 = 0 (as follows from 7.81) and 𝐶∗ 𝑣 = 0
(as follows from the definition of matrix multiplication). Hence 𝐴𝐶𝐶∗ 𝑣 = 𝐴𝑣 for
all 𝑣 ∈ (span(𝑒1 , …, 𝑒𝑚 ))⟂ .
Because 𝐴𝐶𝐶∗ and 𝐴 agree on span(𝑒1 , …, 𝑒𝑚 ) and on (span(𝑒1 , …, 𝑒𝑚 ))⟂ , we
conclude that 𝐴𝐶𝐶∗ = 𝐴. Thus the displayed equation above becomes
𝐴 = 𝐵𝐷𝐶∗,
as desired.
Note that the matrix 𝐴 in the result above has 𝑀𝑛 entries. In comparison, the
matrices 𝐵, 𝐷, and 𝐶 above have a total of
𝑚(𝑀 + 𝑚 + 𝑛)
entries. Thus if 𝑀 and 𝑛 are large numbers and the rank 𝑚 is considerably less
than 𝑀 and 𝑛, then the number of entries that must be stored on a computer to
represent 𝐴 is considerably less than 𝑀𝑛.
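A hedged sketch of that storage comparison, with made-up sizes: the matrix below is built to have rank 𝑚 = 5, and the three factors of its trimmed singular value decomposition together hold 𝑚(𝑀 + 𝑚 + 𝑛) numbers, far fewer than the 𝑀𝑛 entries of 𝐴.

```python
import numpy as np

rng = np.random.default_rng(4)
M, n, m = 200, 150, 5
A = rng.standard_normal((M, m)) @ rng.standard_normal((m, n))   # rank m

U, s, Vh = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-10 * s[0]))              # numerical rank; here r == m
B, D, C = U[:, :r], np.diag(s[:r]), Vh[:r, :].conj().T

print(np.allclose(B @ D @ C.conj().T, A))      # True: A = B D C*
print(B.shape, D.shape, C.shape)               # (200, 5) (5, 5) (150, 5)
print(M * n, "entries in A versus", r * (M + r + n), "entries in B, D, C")
```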
Exercises 7E
1 Suppose 𝑇 ∈ ℒ(𝑉, 𝑊). Show that 𝑇 = 0 if and only if all singular values of
𝑇 are 0.
for every 𝑣 ∈ 𝑉.
(a) Prove that 𝑓1 , …, 𝑓𝑚 is an orthonormal basis of range 𝑇.
(b) Prove that 𝑒1 , …, 𝑒𝑚 is an orthonormal basis of (null 𝑇)⟂ .
(c) Prove that 𝑠1 , …, 𝑠𝑚 are the positive singular values of 𝑇.
(d) Prove that if 𝑘 ∈ {1, …, 𝑚}, then 𝑒𝑘 is an eigenvector of 𝑇 ∗ 𝑇 with corre-
sponding eigenvalue 𝑠𝑘².
(e) Prove that
𝑇𝑇∗𝑤 = 𝑠1²⟨𝑤, 𝑓1⟩𝑓1 + ⋯ + 𝑠𝑚²⟨𝑤, 𝑓𝑚⟩𝑓𝑚
for all 𝑤 ∈ 𝑊.
9 Suppose 𝑇 ∈ ℒ(𝑉, 𝑊). Show that 𝑇 and 𝑇 ∗ have the same positive singular
values.
10 Suppose 𝑇 ∈ ℒ(𝑉, 𝑊) has singular values 𝑠1 , …, 𝑠𝑛 . Prove that if 𝑇 is an
invertible linear map, then 𝑇 −1 has singular values
1/𝑠𝑛 , …, 1/𝑠1 .
11 Suppose that 𝑇 ∈ ℒ(𝑉, 𝑊) and 𝑣1 , …, 𝑣𝑛 is an orthonormal basis of 𝑉. Let
𝑠1 , …, 𝑠𝑛 denote the singular values of 𝑇.
(a) Prove that ‖𝑇𝑣1‖² + ⋯ + ‖𝑇𝑣𝑛‖² = 𝑠1² + ⋯ + 𝑠𝑛².
(b) Prove that if 𝑊 = 𝑉 and 𝑇 is a positive operator, then
⟨𝑇𝑣1 , 𝑣1 ⟩ + ⋯ + ⟨𝑇𝑣𝑛 , 𝑣𝑛 ⟩ = 𝑠1 + ⋯ + 𝑠𝑛 .
See the comment after Exercise 5 in Section 7A.
14 Suppose 𝑇 ∈ ℒ(𝑉, 𝑊). Let 𝑠𝑛 denote the smallest singular value of 𝑇. Prove
that 𝑠𝑛 ‖𝑣‖ ≤ ‖𝑇𝑣‖ for every 𝑣 ∈ 𝑉.
15 Suppose 𝑇 ∈ ℒ(𝑉) and 𝑠1 ≥ ⋯ ≥ 𝑠𝑛 are the singular values of 𝑇. Prove
that if 𝜆 is an eigenvalue of 𝑇, then 𝑠1 ≥ |𝜆| ≥ 𝑠𝑛 .
Matrices unfold
Singular values gleam like stars
Order in chaos shines
‖𝑇𝑣‖ ≤ 𝑠1 ‖𝑣‖
for all 𝑣 ∈ 𝑉.
For a lower bound on ‖𝑇𝑣‖, look at Exercise 14 in Section 7E.

Proof Let 𝑠1, …, 𝑠𝑚 denote the positive singular values of 𝑇, and let 𝑒1, …, 𝑒𝑚 be an orthonormal list in 𝑉 and 𝑓1, …, 𝑓𝑚 be an orthonormal list in 𝑊 that provide a singular value decomposition of 𝑇. Thus
7.83 𝑇𝑣 = 𝑠1 ⟨𝑣, 𝑒1 ⟩ 𝑓1 + ⋯ + 𝑠𝑚 ⟨𝑣, 𝑒𝑚 ⟩ 𝑓𝑚
for all 𝑣 ∈ 𝑉. Hence if 𝑣 ∈ 𝑉 then
‖𝑇𝑣‖² = 𝑠1²|⟨𝑣, 𝑒1⟩|² + ⋯ + 𝑠𝑚²|⟨𝑣, 𝑒𝑚⟩|²
      ≤ 𝑠1²(|⟨𝑣, 𝑒1⟩|² + ⋯ + |⟨𝑣, 𝑒𝑚⟩|²)
      ≤ 𝑠1²‖𝑣‖²,
where the last inequality follows from Bessel’s inequality (6.26). Taking square
roots of both sides of the inequality above shows that ‖𝑇𝑣‖ ≤ 𝑠1 ‖𝑣‖, as desired.
Proof
(a) Because ‖𝑇𝑣‖ ≥ 0 for every 𝑣 ∈ 𝑉, the definition of ‖𝑇‖ implies that ‖𝑇‖ ≥ 0.
(b) Suppose ‖𝑇‖ = 0. Thus 𝑇𝑣 = 0 for all 𝑣 ∈ 𝑉 with ‖𝑣‖ ≤ 1. If 𝑢 ∈ 𝑉 with
𝑢 ≠ 0, then
𝑇𝑢 = ‖𝑢‖ 𝑇(𝑢/‖𝑢‖) = 0,
where the last equality holds because 𝑢/‖𝑢‖ has norm 1. Because 𝑇𝑢 = 0 for
all 𝑢 ∈ 𝑉, we have 𝑇 = 0.
Conversely, if 𝑇 = 0 then 𝑇𝑣 = 0 for all 𝑣 ∈ 𝑉 and hence ‖𝑇‖ = 0.
(c) Suppose 𝜆 ∈ 𝐅. Then
‖𝜆𝑇‖ = max{‖𝜆𝑇𝑣‖ ∶ 𝑣 ∈ 𝑉 and ‖𝑣‖ ≤ 1}
= |𝜆| max{‖𝑇𝑣‖ ∶ 𝑣 ∈ 𝑉 and ‖𝑣‖ ≤ 1}
= |𝜆| ‖𝑇‖.
(d) Suppose 𝑆 ∈ ℒ(𝑉, 𝑊). The definition of ‖𝑆 + 𝑇‖ implies that there exists
𝑣 ∈ 𝑉 such that ‖𝑣‖ ≤ 1 and ‖𝑆 + 𝑇‖ = ∥(𝑆 + 𝑇)𝑣∥. Now
‖𝑆 + 𝑇‖ = ∥(𝑆 + 𝑇)𝑣∥ = ‖𝑆𝑣 + 𝑇𝑣‖ ≤ ‖𝑆𝑣‖ + ‖𝑇𝑣‖ ≤ ‖𝑆‖ + ‖𝑇‖,
completing the proof of (d).
For 𝑆, 𝑇 ∈ ℒ(𝑉, 𝑊), the quantity ‖𝑆 − 𝑇‖ is often called the distance between
𝑆 and 𝑇. Informally, think of the condition that ‖𝑆 − 𝑇‖ is a small number as
meaning that 𝑆 and 𝑇 are close together. For example, Exercise 9 asserts that for
every 𝑇 ∈ ℒ(𝑉), there is an invertible operator as close to 𝑇 as we wish.
Proof
(a) See 7.85.
(b) Let 𝑣 ∈ 𝑉 be such that 0 < ‖𝑣‖ ≤ 1. Let 𝑢 = 𝑣/‖𝑣‖. Then
‖𝑢‖ = ∥𝑣/‖𝑣‖∥ = 1   and   ‖𝑇𝑢‖ = ∥𝑇(𝑣/‖𝑣‖)∥ = ‖𝑇𝑣‖/‖𝑣‖ ≥ ‖𝑇𝑣‖.
Thus when finding the maximum of ‖𝑇𝑣‖ with ‖𝑣‖ ≤ 1, we can restrict
attention to vectors in 𝑉 with norm 1, proving (b).
(c) Suppose 𝑣 ∈ 𝑉 and 𝑣 ≠ 0. Then the definition of ‖𝑇‖ implies that
∥𝑇(𝑣/‖𝑣‖)∥ ≤ ‖𝑇‖,
which implies that
7.89 ‖𝑇𝑣‖ ≤ ‖𝑇‖ ‖𝑣‖.
Now suppose 𝑐 ≥ 0 and ‖𝑇𝑣‖ ≤ 𝑐‖𝑣‖ for all 𝑣 ∈ 𝑉. This implies that
‖𝑇𝑣‖ ≤ 𝑐
for all 𝑣 ∈ 𝑉 with ‖𝑣‖ ≤ 1. Taking the maximum of the left side of the
inequality above over all 𝑣 ∈ 𝑉 with ‖𝑣‖ ≤ 1 shows that ‖𝑇‖ ≤ 𝑐. Thus ‖𝑇‖ is
the smallest number 𝑐 such that ‖𝑇𝑣‖ ≤ 𝑐‖𝑣‖ for all 𝑣 ∈ 𝑉.
When working with norms of linear maps, you will probably frequently use
the inequality 7.89.
For computing an approximation of the norm of a linear map 𝑇 given the
matrix of 𝑇 with respect to some orthonormal bases, 7.88(a) is likely to be most
useful. The matrix of 𝑇 ∗ 𝑇 is quickly computable from matrix multiplication.
Then a computer can be asked to find an approximation for the largest eigenvalue
of 𝑇 ∗ 𝑇 (excellent numeric algorithms exist for this purpose). Then taking the
square root and using 7.88(a) gives an approximation for the norm of 𝑇 (which
usually cannot be computed exactly).
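A sketch of the computation just described (mine, with an arbitrary matrix): the norm of 𝑇 is its largest singular value, which is the square root of the largest eigenvalue of 𝑇∗𝑇, and the three computations below agree.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 6))

op_norm = np.linalg.norm(A, 2)                     # norm of the linear map
largest_sv = np.linalg.svd(A, compute_uv=False)[0]
largest_eig = np.linalg.eigvalsh(A.T @ A)[-1]      # largest eigenvalue of A* A

print(np.isclose(op_norm, largest_sv))             # True
print(np.isclose(op_norm, np.sqrt(largest_eig)))   # True
```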
You may want to construct an alternative proof of the result above using
Exercise 9 in Section 7E, which asserts that a linear map and its adjoint have the
same positive singular values.
Furthermore, if
𝑇𝑣 = 𝑠1 ⟨𝑣, 𝑒1 ⟩ 𝑓1 + ⋯ + 𝑠𝑚 ⟨𝑣, 𝑒𝑚 ⟩ 𝑓𝑚
is a singular value decomposition of 𝑇 and 𝑇𝑘 ∈ ℒ(𝑉, 𝑊) is defined by
𝑇𝑘 𝑣 = 𝑠1 ⟨𝑣, 𝑒1 ⟩ 𝑓1 + ⋯ + 𝑠𝑘 ⟨𝑣, 𝑒𝑘 ⟩ 𝑓𝑘
Proof If 𝑣 ∈ 𝑉 then
∥(𝑇 − 𝑇𝑘)𝑣∥² = ∥𝑠𝑘+1⟨𝑣, 𝑒𝑘+1⟩𝑓𝑘+1 + ⋯ + 𝑠𝑚⟨𝑣, 𝑒𝑚⟩𝑓𝑚∥²
            = 𝑠𝑘+1²|⟨𝑣, 𝑒𝑘+1⟩|² + ⋯ + 𝑠𝑚²|⟨𝑣, 𝑒𝑚⟩|²
            ≤ 𝑠𝑘+1²(|⟨𝑣, 𝑒𝑘+1⟩|² + ⋯ + |⟨𝑣, 𝑒𝑚⟩|²)
            ≤ 𝑠𝑘+1²‖𝑣‖².
Thus ‖𝑇 − 𝑇𝑘 ‖ ≤ 𝑠𝑘 + 1 . The equation (𝑇 − 𝑇𝑘 )𝑒𝑘 + 1 = 𝑠𝑘 + 1 𝑓𝑘 + 1 now shows that
‖𝑇 − 𝑇𝑘 ‖ = 𝑠𝑘 + 1 .
Suppose 𝑆 ∈ ℒ(𝑉, 𝑊) and dim range 𝑆 ≤ 𝑘. Thus 𝑆𝑒1 , …, 𝑆𝑒𝑘 + 1 , which is a
list of length 𝑘 + 1, is linearly dependent. Hence there exist 𝑎1 , …, 𝑎𝑘 + 1 ∈ 𝐅, not
all 0, such that
𝑎1 𝑆𝑒1 + ⋯ + 𝑎𝑘 + 1 𝑆𝑒𝑘 + 1 = 0.
Now 𝑎1 𝑒1 + ⋯ + 𝑎𝑘 + 1 𝑒𝑘 + 1 ≠ 0 because 𝑎1 , …, 𝑎𝑘 + 1 are not all 0. We have
∥(𝑇 − 𝑆)(𝑎1𝑒1 + ⋯ + 𝑎𝑘+1𝑒𝑘+1)∥² = ∥𝑇(𝑎1𝑒1 + ⋯ + 𝑎𝑘+1𝑒𝑘+1)∥²
                                = ‖𝑠1𝑎1𝑓1 + ⋯ + 𝑠𝑘+1𝑎𝑘+1𝑓𝑘+1‖²
                                = 𝑠1²|𝑎1|² + ⋯ + 𝑠𝑘+1²|𝑎𝑘+1|²
                                ≥ 𝑠𝑘+1²(|𝑎1|² + ⋯ + |𝑎𝑘+1|²)
                                = 𝑠𝑘+1²‖𝑎1𝑒1 + ⋯ + 𝑎𝑘+1𝑒𝑘+1‖².
Because 𝑎1 𝑒1 + ⋯ + 𝑎𝑘 + 1 𝑒𝑘 + 1 ≠ 0, the inequality above implies that
‖𝑇 − 𝑆‖ ≥ 𝑠𝑘 + 1 .
Thus 𝑆 = 𝑇𝑘 minimizes ‖𝑇 − 𝑆‖ among 𝑆 ∈ ℒ(𝑉, 𝑊) with dim range 𝑆 ≤ 𝑘.
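A hedged numerical illustration of this best-approximation property (the matrix and the competitor 𝑆 are arbitrary): truncating the singular value decomposition after 𝑘 terms gives 𝑇𝑘, the error ‖𝑇 − 𝑇𝑘‖ equals 𝑠𝑘+1, and a random map of rank at most 𝑘 does no better.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((6, 6))
k = 2

U, s, Vh = np.linalg.svd(A)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]     # T_k: keep the first k terms

# The approximation error is exactly the next singular value s_{k+1} (= s[k] here).
print(np.isclose(np.linalg.norm(A - A_k, 2), s[k]))    # True

# A random map of rank at most k does at least as badly.
S = rng.standard_normal((6, k)) @ rng.standard_normal((k, 6))
print(np.linalg.norm(A - S, 2) >= s[k] - 1e-12)        # True
```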
For other examples of the use of the singular value decomposition in best
approximation, see Exercise 22, which finds a subspace of given dimension on
which the restriction of a linear map is as small as possible, and Exercise 27,
which finds a unitary operator that is as close as possible to a given operator.
Polar Decomposition
Recall our discussion before 7.54 of the analogy between complex numbers 𝑧
with |𝑧| = 1 and unitary operators. Continuing with this analogy, note that every
complex number 𝑧 except 0 can be written in the form
𝑧 = (𝑧/|𝑧|)|𝑧| = (𝑧/|𝑧|)√(𝑧\overline{𝑧}),
where the first factor, namely, 𝑧/|𝑧|, has absolute value 1.
Our analogy leads us to guess that every operator 𝑇 ∈ ℒ(𝑉) can be written as
a unitary operator times √𝑇 ∗ 𝑇. That guess is indeed correct. The corresponding
result is called the polar decomposition, which gives a beautiful description of an
arbitrary operator on 𝑉.
Note that if 𝑇 ∈ ℒ(𝑉), then 𝑇 ∗ 𝑇 is a positive operator [as was shown in
7.64(a)]. Thus the operator √𝑇 ∗ 𝑇 makes sense and is well defined as a positive
operator on 𝑉.
The polar decomposition that we are about to state and prove says that every
operator on 𝑉 is the product of a unitary operator and a positive operator. Thus
we can write an arbitrary operator on 𝑉 as the product of two nice operators,
each of which comes from a class that we can completely describe and that we
understand reasonably well. The unitary operators are described by 7.55 if 𝐅 = 𝐂 ;
the positive operators are described by the real and complex spectral theorems
(7.29 and 7.31).
Specifically, consider the case 𝐅 = 𝐂 , and suppose
𝑇 = 𝑆√𝑇 ∗ 𝑇
The polar decomposition below is valid on both real and complex inner product
spaces and for all operators on those spaces.
Suppose 𝑇 ∈ ℒ(𝑉). Then there exists a unitary operator 𝑆 ∈ ℒ(𝑉) such that
𝑇 = 𝑆√𝑇 ∗ 𝑇.
= ‖𝑣‖2.
Thus 𝑆 is a unitary operator.
Applying 𝑇 ∗ to both sides of 7.94 and then using the formula for 𝑇 ∗ given by
7.77 shows that
𝑇∗𝑇𝑣 = 𝑠1²⟨𝑣, 𝑒1⟩𝑒1 + ⋯ + 𝑠𝑚²⟨𝑣, 𝑒𝑚⟩𝑒𝑚
for every 𝑣 ∈ 𝑉. Thus if 𝑣 ∈ 𝑉, then
√(𝑇∗𝑇) 𝑣 = 𝑠1⟨𝑣, 𝑒1⟩𝑒1 + ⋯ + 𝑠𝑚⟨𝑣, 𝑒𝑚⟩𝑒𝑚
because the operator that sends 𝑣 to the right side of the equation above is a
positive operator whose square equals 𝑇 ∗ 𝑇. Now
𝑆√𝑇 ∗ 𝑇𝑣 = 𝑆(𝑠1 ⟨𝑣, 𝑒1 ⟩𝑒1 + ⋯ + 𝑠𝑚 ⟨𝑣, 𝑒𝑚 ⟩𝑒𝑚 )
= 𝑠1 ⟨𝑣, 𝑒1 ⟩ 𝑓1 + ⋯ + 𝑠𝑚 ⟨𝑣, 𝑒𝑚 ⟩ 𝑓𝑚
= 𝑇𝑣,
where the last equation follows from 7.94.
Exercise 27 shows that the unitary operator 𝑆 produced in the proof above is
as close as a unitary operator can be to 𝑇.
Alternative proofs of the polar decomposition directly use the spectral theorem,
avoiding the singular value decomposition. However, the proof above seems
cleaner than those alternative proofs.
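A sketch of carrying out the polar decomposition numerically via the singular value decomposition, as in the proof above (the matrix is arbitrary): if 𝑇 = 𝑈𝛴𝑉∗ then 𝑆 = 𝑈𝑉∗ is unitary and √(𝑇∗𝑇) = 𝑉𝛴𝑉∗.

```python
import numpy as np

rng = np.random.default_rng(7)
T = rng.standard_normal((4, 4))

U, sigma, Vh = np.linalg.svd(T)
S = U @ Vh                                          # unitary factor
sqrt_TstarT = Vh.conj().T @ np.diag(sigma) @ Vh     # positive factor sqrt(T* T)

print(np.allclose(S @ sqrt_TstarT, T))                          # True
print(np.allclose(S.conj().T @ S, np.eye(4)))                   # True: S is unitary
print(np.allclose(sqrt_TstarT @ sqrt_TstarT, T.conj().T @ T))   # True
```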
The ellipsoid notation 𝐸(𝑠1 𝑓1 , …, 𝑠𝑛 𝑓𝑛 ) does not explicitly include the inner
product space 𝑉, even though the definition above depends on 𝑉. However, the in-
ner product space 𝑉 should be clear from the context and also from the requirement
that 𝑓1 , …, 𝑓𝑛 be an orthonormal basis of 𝑉.
The ellipsoid 𝐸(4𝑓1, 3𝑓2, 2𝑓3) in 𝐑3, where 𝑓1, 𝑓2, 𝑓3 is the standard basis of 𝐑3.
We now use the previous result to show that invertible operators take all
ellipsoids, not just the ball of radius 1, to ellipsoids.
Proof Because 𝑇 is invertible, the list 𝑇𝑣1 , …, 𝑇𝑣𝑛 is a basis of 𝑉. The linearity
of 𝑇 implies that
𝑇(𝑢 + 𝑎1 𝑣1 + ⋯ + 𝑎𝑛 𝑣𝑛 ) = 𝑇𝑢 + 𝑎1 𝑇𝑣1 + ⋯ + 𝑎𝑛 𝑇𝑣𝑛
for all 𝑎1 , …, 𝑎𝑛 ∈ (0, 1). Thus 𝑇(𝑢 + 𝑃(𝑣1 , …, 𝑣𝑛 )) = 𝑇𝑢 + 𝑃(𝑇𝑣1 , …, 𝑇𝑣𝑛 ).
Note that in the special case of 𝐑2 each box is a rectangle, but the terminology
box can be used in all dimensions.
The box (1, 0) + 𝑃(√2 𝑒1, √2 𝑒2), where 𝑒1 = (1/√2, 1/√2) and 𝑒2 = (−1/√2, 1/√2).
The box 𝑃(𝑒1, 2𝑒2, 𝑒3), where 𝑒1, 𝑒2, 𝑒3 is the standard basis of 𝐑3.
𝑇𝑣 = 𝑠1 ⟨𝑣, 𝑒1 ⟩ 𝑓1 + ⋯ + 𝑠𝑛 ⟨𝑣, 𝑒𝑛 ⟩ 𝑓𝑛 ,
volume(𝑢 + 𝑃(𝑟1 𝑒1 , …, 𝑟𝑛 𝑒𝑛 )) = 𝑟1 × ⋯ × 𝑟𝑛 .
The definition above agrees with the familiar formulas for the area (which we
are calling the volume) of a rectangle in 𝐑2 and for the volume of a box in 𝐑3 . For
example, the first box in Example 7.106 has two-dimensional volume (or area) 2
because the defining edges of that box have length √2 and √2. The second box
in Example 7.106 has three-dimensional volume 2 because the defining edges of
that box have length 1, 2, and 1.
In the example above, 𝑇 maps boxes with respect to the basis 𝑒1 , 𝑒2 to boxes
with respect to the same basis; thus we can see how 𝑇 changes volume. In general,
an operator maps boxes to parallelepipeds that are not boxes. However, if we
choose the right basis (coming from the singular value decomposition!), then
boxes with respect to that basis get mapped to boxes with respect to a possibly
different basis, as shown in 7.107. This observation leads to a natural proof of
the following result.
Exercises 7F
9 Suppose 𝑇 ∈ ℒ(𝑉). Prove that for every 𝜖 > 0, there exists an invertible
operator 𝑆 ∈ ℒ(𝑉) such that 0 < ‖𝑇 − 𝑆‖ < 𝜖.
10 Suppose dim 𝑉 > 1 and 𝑇 ∈ ℒ(𝑉) is not invertible. Prove that for every
𝜖 > 0, there exists 𝑆 ∈ ℒ(𝑉) such that 0 < ‖𝑇 − 𝑆‖ < 𝜖 and 𝑆 is not
invertible.
11 Suppose 𝐅 = 𝐂 and 𝑇 ∈ ℒ(𝑉). Prove that for every 𝜖 > 0 there exists a
diagonalizable operator 𝑆 ∈ ℒ(𝑉) such that 0 < ‖𝑇 − 𝑆‖ < 𝜖.
15 Define 𝑇 ∈ ℒ(𝐅3 ) by
𝑇(𝑧1 , 𝑧2 , 𝑧3 ) = (𝑧3 , 2𝑧1 , 3𝑧2 ).
In this chapter we delve deeper into the structure of operators, with most of the
attention on complex vector spaces. Some of the results in this chapter apply to
both real and complex vector spaces; thus we do not make a standing assumption
that 𝐅 = 𝐂 . Also, an inner product does not help with this material, so we return
to the general setting of a finite-dimensional vector space.
Even on a finite-dimensional complex vector space, an operator may not have
enough eigenvectors to form a basis of the vector space. Thus we will consider the
closely related objects called generalized eigenvectors. We will see that for each
operator on a finite-dimensional complex vector space, there is a basis of the vector
space consisting of generalized eigenvectors of the operator. The generalized
eigenspace decomposition then provides a good description of arbitrary operators
on a finite-dimensional complex vector space.
Nilpotent operators, which are operators that when raised to some power
equal 0, have an important role in these investigations. Nilpotent operators provide
a key tool in our proof that every invertible operator on a finite-dimensional
complex vector space has a square root and in our approach to Jordan form.
This chapter concludes by defining the trace and proving its key properties.
• 𝐅 denotes 𝐑 or 𝐂 .
• 𝑉 denotes a finite-dimensional nonzero vector space over 𝐅.
David Iliff CC BY-SA
The Long Room of the Old Library at the University of Dublin, where William Hamilton
(1805–1865) was a student and then a faculty member. Hamilton proved a special case
of what we now call the Cayley–Hamilton theorem in 1853.
For similar results about decreasing sequences of ranges, see Exercises 6, 7, and 8.

The following result states that if two consecutive terms in the sequence of subspaces above are equal, then all later terms in the sequence are equal.
null 𝑇 𝑚 = null 𝑇 𝑚 + 1.
Then
null 𝑇 𝑚 = null 𝑇 𝑚 + 1 = null 𝑇 𝑚 + 2 = null 𝑇 𝑚 + 3 = ⋯ .
null 𝑇 𝑚 + 𝑘 = null 𝑇 𝑚 + 𝑘 + 1.
𝑇 𝑚 + 1 (𝑇 𝑘 𝑣) = 𝑇 𝑚 + 𝑘 + 1 𝑣 = 0.
Hence
𝑇 𝑘 𝑣 ∈ null 𝑇 𝑚 + 1 = null 𝑇 𝑚.
Thus 𝑇 𝑚 + 𝑘 𝑣 = 𝑇 𝑚 (𝑇 𝑘 𝑣) = 0, which means that 𝑣 ∈ null 𝑇 𝑚 + 𝑘. This implies that
null 𝑇 𝑚 + 𝑘 + 1 ⊆ null 𝑇 𝑚 + 𝑘, completing the proof.
The result above raises the question of whether there exists a nonnegative
integer 𝑚 such that null 𝑇 𝑚 = null 𝑇 𝑚 + 1. The next result shows that this equality
holds at least when 𝑚 equals the dimension of the vector space on which 𝑇
operates.
Proof We only need to prove that null 𝑇 dim 𝑉 = null 𝑇 dim 𝑉 + 1 (by 8.2). Suppose
this is not true. Then, by 8.1 and 8.2, we have

{0} = null 𝑇^0 ⊊ null 𝑇^1 ⊊ ⋯ ⊊ null 𝑇^(dim 𝑉) ⊊ null 𝑇^(dim 𝑉 + 1),

where the symbol ⊊ means "contained in but not equal to". At each of the
strict inclusions in the chain above, the dimension increases by at least 1. Thus
dim null 𝑇 dim 𝑉 + 1 ≥ dim 𝑉 + 1, a contradiction because a subspace of 𝑉 cannot
have a larger dimension than dim 𝑉.
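A hedged numerical illustration of 8.2 and 8.3 (the matrix below is my own example, a nilpotent block together with an invertible part): the dimensions of null 𝑇ᵏ increase and then stabilize, and they have stabilized by the time 𝑘 reaches dim 𝑉. Here dim null 𝑇ᵏ is computed as dim 𝑉 minus the rank of 𝑇ᵏ.

```python
import numpy as np

# A 3-by-3 nilpotent block together with an invertible 1-by-1 block.
T = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 2.0]])
dim_V = 4

for k in range(1, 6):
    Tk = np.linalg.matrix_power(T, k)
    print(k, dim_V - np.linalg.matrix_rank(Tk))   # dim null T^k
# prints 1, 2, 3, 3, 3: once two consecutive dimensions agree the sequence
# stops growing, and it has stabilized by the time k reaches dim V.
```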
It is not true that 𝑉 = null 𝑇 ⊕ range 𝑇 for every 𝑇 ∈ ℒ(𝑉). However, the
next result can be a useful substitute.
where the first equality above comes from 3.94 and the second equality comes
from the fundamental theorem of linear maps (3.21). The equation above implies
that null 𝑇 𝑛 ⊕ range 𝑇 𝑛 = 𝑉 (see 2.39), as desired.
Generalized Eigenvectors
Some operators do not have enough eigenvectors to lead to good descriptions of
their behavior. Thus in this subsection we introduce the concept of generalized
eigenvectors, which will play a major role in our description of the structure of an
operator.
To understand why we need more than eigenvectors, let’s examine the question
of describing an operator by decomposing its domain into invariant subspaces. Fix
𝑇 ∈ ℒ(𝑉). We seek to describe 𝑇 by finding a “nice” direct sum decomposition
𝑉 = 𝑉1 ⊕ ⋯ ⊕ 𝑉𝑚 ,
where each 𝑉𝑘 is a subspace of 𝑉 invariant under 𝑇. The simplest possible nonzero
invariant subspaces are one-dimensional. A decomposition as above in which
each 𝑉𝑘 is a one-dimensional subspace of 𝑉 invariant under 𝑇 is possible if and
only if 𝑉 has a basis consisting of eigenvectors of 𝑇 (see 5.55). This happens if
and only if 𝑉 has an eigenspace decomposition
8.7 𝑉 = 𝐸(𝜆1 , 𝑇) ⊕ ⋯ ⊕ 𝐸(𝜆𝑚 , 𝑇),
where 𝜆1 , …, 𝜆𝑚 are the distinct eigenvalues of 𝑇 (see 5.55).
The spectral theorem in the previous chapter shows that if 𝑉 is an inner product
space, then a decomposition of the form 8.7 holds for every self-adjoint operator
if 𝐅 = 𝐑 and for every normal operator if 𝐅 = 𝐂 because operators of those types
have enough eigenvectors to form a basis of 𝑉 (see 7.29 and 7.31).
However, a decomposition of the form 8.7 may not hold for more general
operators, even on a complex vector space. An example was given by the operator
in 5.57, which does not have enough eigenvectors for 8.7 to hold. Generalized
eigenvectors and generalized eigenspaces, which we now introduce, will remedy
this situation.
(𝑇 − 𝜆𝐼)𝑘 𝑣 = 0
Proof Let 𝑛 = dim 𝑉. We will use induction on 𝑛. To get started, note that
the desired result holds if 𝑛 = 1 because then every nonzero vector in 𝑉 is an
eigenvector of 𝑇.
This step is where we use the hypothesis that 𝐅 = 𝐂, because if 𝐅 = 𝐑 then 𝑇 may not have any eigenvalues.

Now suppose 𝑛 > 1 and the desired result holds for all smaller values of dim 𝑉. Let 𝜆 be an eigenvalue of 𝑇.
Applying 8.4 to 𝑇 − 𝜆𝐼 shows that

𝑉 = null(𝑇 − 𝜆𝐼)𝑛 ⊕ range(𝑇 − 𝜆𝐼)𝑛.
Furthermore, range(𝑇 − 𝜆𝐼)𝑛 is invariant under 𝑇 [by 5.18 with 𝑝(𝑧) = (𝑧 − 𝜆)𝑛 ].
Let 𝑆 ∈ ℒ(range(𝑇 − 𝜆𝐼)𝑛 ) equal 𝑇 restricted to range(𝑇 − 𝜆𝐼)𝑛. Our induction
hypothesis applied to the operator 𝑆 implies that there is a basis of range(𝑇 − 𝜆𝐼)𝑛
consisting of generalized eigenvectors of 𝑆, which of course are generalized
eigenvectors of 𝑇. Adjoining that basis of range(𝑇−𝜆𝐼)𝑛 to a basis of null(𝑇−𝜆𝐼)𝑛
gives a basis of 𝑉 consisting of generalized eigenvectors of 𝑇.
If 𝐅 = 𝐑 and dim 𝑉 > 1, then some operators on 𝑉 have the property that
there exists a basis of 𝑉 consisting of generalized eigenvectors of the operator,
and (unlike what happens when 𝐅 = 𝐂 ) other operators do not have this property.
See Exercise 11 for a necessary and sufficient condition that determines whether
an operator has this property.
302 Chapter 8 Operators on Complex Vector Spaces
where 𝑏0 = 1 and the values of the other binomial coefficients 𝑏𝑘 do not matter.
Apply the operator (𝑇 − 𝛼𝐼)𝑚 − 1 to both sides of the equation above, getting
0 = (𝛼 − 𝜆)𝑛 (𝑇 − 𝛼𝐼)𝑚 − 1 𝑣.
Because (𝑇 − 𝛼𝐼)𝑚 − 1 𝑣 ≠ 0, the equation above implies that 𝛼 = 𝜆, as desired.
Proof Suppose the desired result is false. Then there exists a smallest positive
integer 𝑚 such that there exists a linearly dependent list 𝑣1 , …, 𝑣𝑚 of generalized
eigenvectors of 𝑇 corresponding to distinct eigenvalues 𝜆1 , …, 𝜆𝑚 of 𝑇 (note that
𝑚 ≥ 2 because a generalized eigenvector is, by definition, nonzero). Thus there
exist 𝑎1 , …, 𝑎𝑚 ∈ 𝐅, none of which are 0 (because of the minimality of 𝑚), such
that
𝑎1 𝑣1 + ⋯ + 𝑎𝑚 𝑣𝑚 = 0.
Let 𝑛 = dim 𝑉. Apply (𝑇 − 𝜆𝑚 𝐼)𝑛 to both sides of the equation above, getting
8.13 𝑎1 (𝑇 − 𝜆𝑚 𝐼)𝑛 𝑣1 + ⋯ + 𝑎𝑚 − 1 (𝑇 − 𝜆𝑚 𝐼)𝑛 𝑣𝑚 − 1 = 0.
Suppose 𝑘 ∈ {1, …, 𝑚 − 1}. Then
(𝑇 − 𝜆𝑚 𝐼)𝑛 𝑣𝑘 ≠ 0
because otherwise 𝑣𝑘 would be a generalized eigenvector of 𝑇 corresponding to
the distinct eigenvalues 𝜆𝑘 and 𝜆𝑚 , which would contradict 8.11. However,
(𝑇 − 𝜆𝑘 𝐼)𝑛 ((𝑇 − 𝜆𝑚 𝐼)𝑛 𝑣𝑘 ) = (𝑇 − 𝜆𝑚 𝐼)𝑛 ((𝑇 − 𝜆𝑘 𝐼)𝑛 𝑣𝑘 ) = 0.
Thus the last two displayed equations show that (𝑇 − 𝜆𝑚 𝐼)𝑛 𝑣𝑘 is a generalized
eigenvector of 𝑇 corresponding to the eigenvalue 𝜆𝑘 . Hence
(𝑇 − 𝜆𝑚 𝐼)𝑛 𝑣1 , …, (𝑇 − 𝜆𝑚 𝐼)𝑛 𝑣𝑚 − 1
is a linearly dependent list (by 8.13) of 𝑚 − 1 generalized eigenvectors correspond-
ing to distinct eigenvalues, contradicting the minimality of 𝑚. This contradiction
completes the proof.
Nilpotent Operators
𝑇(𝑧1 , 𝑧2 , 𝑧3 , 𝑧4 ) = (0, 0, 𝑧1 , 𝑧2 )
is nilpotent because 𝑇 2 = 0.
(b) The operator on 𝐅3 whose matrix (with respect to the standard basis) is
\begin{pmatrix} -3 & 9 & 0 \\ -7 & 9 & 6 \\ 4 & 0 & -6 \end{pmatrix}
is nilpotent, as can be shown by cubing the matrix above to get the zero matrix (a quick numerical check appears after this example).
(c) The operator of differentiation on 𝒫𝑚 (𝐑) is nilpotent because the (𝑚 + 1)th
derivative of every polynomial of degree at most 𝑚 equals 0. Note that on
this space of dimension 𝑚 + 1, we need to raise the nilpotent operator to the
power 𝑚 + 1 to get the 0 operator.
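As promised in (b), a quick numerical check (mine, not the book's): cubing that matrix with NumPy gives the zero matrix, while its square does not vanish.

```python
import numpy as np

N = np.array([[-3.0, 9.0,  0.0],
              [-7.0, 9.0,  6.0],
              [ 4.0, 0.0, -6.0]])

print(np.allclose(np.linalg.matrix_power(N, 3), 0))   # True:  N^3 = 0
print(np.allclose(np.linalg.matrix_power(N, 2), 0))   # False: N^2 != 0
```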
The Latin word nil means nothing or zero; the Latin word potens means having power. Thus nilpotent literally means having a power that is zero.

The next result shows that when raising a nilpotent operator to a power, we never need to use a power higher than the dimension of the space. For a slightly stronger result, see Exercise 18.
Suppose 𝑇 ∈ ℒ(𝑉).
(a) If 𝑇 is nilpotent, then 0 is an eigenvalue of 𝑇 and 𝑇 has no other
eigenvalues.
(b) If 𝐅 = 𝐂 and 0 is the only eigenvalue of 𝑇, then 𝑇 is nilpotent.
Proof
(a) To prove (a), suppose 𝑇 is nilpotent. Hence there is a positive integer 𝑚 such
that 𝑇 𝑚 = 0. This implies that 𝑇 is not injective. Thus 0 is an eigenvalue
of 𝑇.
\begin{pmatrix} 0 & & ∗ \\ & ⋱ & \\ 0 & & 0 \end{pmatrix},
Proof Suppose (a) holds, so 𝑇 is nilpotent. Thus there exists a positive integer
𝑛 such that 𝑇 𝑛 = 0. Now 5.29 implies that 𝑧𝑛 is a polynomial multiple of the
minimal polynomial of 𝑇. Thus the minimal polynomial of 𝑇 is 𝑧𝑚 for some
positive integer 𝑚, proving that (a) implies (b).
Now suppose (b) holds, so the minimal polynomial of 𝑇 is 𝑧𝑚 for some positive
integer 𝑚. This implies, by 5.27(a), that 0 (which is the only zero of 𝑧𝑚 ) is the
only eigenvalue of 𝑇. This further implies, by 5.44, that there is a basis of 𝑉 with
respect to which the matrix of 𝑇 is upper triangular. This also implies, by 5.41,
that all entries on the diagonal of this matrix are 0, proving that (b) implies (c).
Now suppose (c) holds. Then 5.40 implies that 𝑇 dim 𝑉 = 0. Thus 𝑇 is nilpotent,
proving that (c) implies (a).
Exercises 8A
1 Suppose 𝑇 ∈ ℒ(𝑉). Prove that if dim null 𝑇 4 = 8 and dim null 𝑇 6 = 9, then
dim null 𝑇 𝑚 = 9 for all integers 𝑚 ≥ 5.
2 Suppose 𝑇 ∈ ℒ(𝑉), 𝑚 is a positive integer, 𝑣 ∈ 𝑉, and 𝑇 𝑚 − 1 𝑣 ≠ 0 but
𝑇 𝑚 𝑣 = 0. Prove that 𝑣, 𝑇𝑣, 𝑇 2 𝑣, …, 𝑇 𝑚 − 1 𝑣 is linearly independent.
The result in this exercise is used in the proof of 8.45.
24 For each item in Example 8.15, find a basis of the domain vector space such
that the matrix of the nilpotent operator with respect to that basis has the
upper-triangular form promised by 8.18(c).
25 Suppose that 𝑉 is an inner product space and 𝑇 ∈ ℒ(𝑉) is nilpotent. Show
that there is an orthonormal basis of 𝑉 with respect to which the matrix of 𝑇
has the upper-triangular form promised by 8.18(c).
Proof Suppose 𝑣 ∈ null(𝑇 − 𝜆𝐼)dim 𝑉. The definitions imply 𝑣 ∈ 𝐺(𝜆, 𝑇). Thus
𝐺(𝜆, 𝑇) ⊇ null(𝑇 − 𝜆𝐼)dim 𝑉.
Conversely, suppose 𝑣 ∈ 𝐺(𝜆, 𝑇). Thus there is a positive integer 𝑘 such
that 𝑣 ∈ null(𝑇 − 𝜆𝐼)𝑘. From 8.1 and 8.3 (with 𝑇 − 𝜆𝐼 replacing 𝑇), we get
𝑣 ∈ null(𝑇 − 𝜆𝐼)dim 𝑉. Thus 𝐺(𝜆, 𝑇) ⊆ null(𝑇 − 𝜆𝐼)dim 𝑉, completing the proof.
In Example 8.10, we saw that the eigenvalues of 𝑇 are 0 and 5, and we found
the corresponding sets of generalized eigenvectors. Taking the union of those sets
with {0}, we have
In Example 8.21, the domain space 𝐂3 is the direct sum of the generalized
eigenspaces of the operator 𝑇 in that example. Our next result shows that this
behavior holds in general. Specifically, the following major result shows that if
𝐅 = 𝐂 and 𝑇 ∈ ℒ(𝑉), then 𝑉 is the direct sum of the generalized eigenspaces
of 𝑇, each of which is invariant under 𝑇 and on which 𝑇 is a nilpotent operator
plus a scalar multiple of the identity. Thus the next result achieves our goal of
decomposing 𝑉 into invariant subspaces on which 𝑇 has a known behavior.
As we will see, the proof follows from putting together what we have learned
about generalized eigenspaces and then using our result that for each operator
𝑇 ∈ ℒ(𝑉), there exists a basis of 𝑉 consisting of generalized eigenvectors of 𝑇.
Proof
(a) Suppose 𝑘 ∈ {1, …, 𝑚}. Then 8.20 shows that
𝐺(𝜆𝑘 , 𝑇) = null(𝑇 − 𝜆𝑘 𝐼)dim 𝑉.
Thus 5.18, with 𝑝(𝑧) = (𝑧− 𝜆𝑘 )dim 𝑉, implies that 𝐺(𝜆𝑘 , 𝑇) is invariant under 𝑇,
proving (a).
(b) Suppose 𝑘 ∈ {1, …, 𝑚}. If 𝑣 ∈ 𝐺(𝜆𝑘 , 𝑇), then (𝑇 − 𝜆𝑘 𝐼)dim 𝑉 𝑣 = 0 (by 8.20).
Thus ((𝑇 − 𝜆𝑘 𝐼)|𝐺( 𝜆𝑘, 𝑇) )dim 𝑉 = 0. Hence (𝑇 − 𝜆𝑘 𝐼)|𝐺( 𝜆𝑘, 𝑇) is nilpotent,
proving (b).
(c) To show that 𝐺(𝜆1 , 𝑇) + ⋯ + 𝐺(𝜆𝑚 , 𝑇) is a direct sum, suppose
𝑣1 + ⋯ + 𝑣𝑚 = 0,
where each 𝑣𝑘 is in 𝐺(𝜆𝑘 , 𝑇). Because generalized eigenvectors of 𝑇 cor-
responding to distinct eigenvalues are linearly independent (by 8.12), this
implies that each 𝑣𝑘 equals 0. Thus 𝐺(𝜆1 , 𝑇) + ⋯ + 𝐺(𝜆𝑚 , 𝑇) is a direct sum
(by 1.45).
Finally, each vector in 𝑉 can be written as a finite sum of generalized eigen-
vectors of 𝑇 (by 8.9). Thus
𝑉 = 𝐺(𝜆1 , 𝑇) ⊕ ⋯ ⊕ 𝐺(𝜆𝑚 , 𝑇),
proving (c).
Multiplicity of an Eigenvalue
If 𝑉 is a complex vector space and 𝑇 ∈ ℒ(𝑉), then the decomposition of 𝑉 pro-
vided by the generalized eigenspace decomposition (8.22) can be a powerful tool.
The dimensions of the subspaces involved in this decomposition are sufficiently
important to get a name, which is given in the next definition.
The second bullet point above holds because 𝐺(𝜆, 𝑇) = null(𝑇 − 𝜆𝐼)dim 𝑉 (see
8.20).
Thus the eigenvalue 6 has multiplicity 2 and the eigenvalue 7 has multiplicity 1. The direct sum 𝐂3 = 𝐺(6, 𝑇) ⊕ 𝐺(7, 𝑇) is the generalized eigenspace decomposition promised by 8.22. A basis of 𝐂3 consisting of generalized eigenvectors of 𝑇, as promised by 8.9, is (1, 0, 0), (0, 1, 0), (10, 2, 1). There does not exist a basis of 𝐂3 consisting of eigenvectors of this operator.
In this example, the multiplicity of each eigenvalue equals the number of times that eigenvalue appears on the diagonal of an upper-triangular matrix representing the operator. This behavior always happens, as we will see in 8.31.
Proof The desired result follows from the generalized eigenspace decomposition
(8.22) and the formula for the dimension of a direct sum (see 3.94).
The terms algebraic multiplicity and geometric multiplicity are used in some
books. In case you encounter this terminology, be aware that the algebraic multi-
plicity is the same as the multiplicity defined here and the geometric multiplicity
is the dimension of the corresponding eigenspace. In other words, if 𝑇 ∈ ℒ(𝑉)
and 𝜆 is an eigenvalue of 𝑇, then
Note that as defined above, the algebraic multiplicity also has a geometric meaning
as the dimension of a certain null space. The definition of multiplicity given here
is cleaner than the traditional definition that involves determinants; 9.62 implies
that these definitions are equivalent.
If 𝑉 is an inner product space, 𝑇 ∈ ℒ(𝑉) is normal, and 𝜆 is an eigenvalue
of 𝑇, then the algebraic multiplicity of 𝜆 equals the geometric multiplicity of 𝜆,
as can be seen from applying Exercise 27 in Section 7A to the normal operator
𝑇 − 𝜆𝐼. As a special case, the singular values of 𝑆 ∈ ℒ(𝑉, 𝑊) (here 𝑉 and 𝑊 are
both finite-dimensional inner product spaces) depend on the multiplicities (either
algebraic or geometric) of the eigenvalues of the self-adjoint operator 𝑆∗ 𝑆.
The next definition associates a monic polynomial with each operator on a
finite-dimensional complex vector space.
Proof Our result about the sum of the multiplicities (8.25) implies (a). The
definition of the characteristic polynomial implies (b).
Most texts define the characteristic polynomial using determinants (the two
definitions are equivalent by 9.62). The approach taken here, which is considerably
simpler, leads to the following nice proof of the Cayley–Hamilton theorem.
Proof Let 𝜆1 , …, 𝜆𝑚 be the distinct eigenvalues of 𝑇, and let 𝑑𝑘 = dim 𝐺(𝜆𝑘 , 𝑇).
For each 𝑘 ∈ {1, …, 𝑚}, we know that (𝑇 − 𝜆𝑘 𝐼)|𝐺( 𝜆𝑘, 𝑇) is nilpotent. Thus we have
(𝑇 − 𝜆𝑘 𝐼)𝑑𝑘 |𝐺( 𝜆𝑘, 𝑇) = 0
(by 8.16) for each 𝑘 ∈ {1, …, 𝑚}.
Arthur Cayley (1821–1895) published three mathematics papers before completing his undergraduate degree.
The generalized eigenspace decomposition (8.22) states that every vector in 𝑉 is a sum of vectors in
𝐺(𝜆1 , 𝑇), …, 𝐺(𝜆𝑚 , 𝑇). Thus to prove that 𝑞(𝑇) = 0, we only need to show
that 𝑞(𝑇)|𝐺( 𝜆𝑘, 𝑇) = 0 for each 𝑘.
Fix 𝑘 ∈ {1, …, 𝑚}. We have
𝑞(𝑇) = (𝑇 − 𝜆1 𝐼)𝑑1 ⋯(𝑇 − 𝜆𝑚 𝐼)𝑑𝑚.
The operators on the right side of the equation above all commute, so we can
move the factor (𝑇 − 𝜆𝑘 𝐼)𝑑𝑘 to be the last term in the expression on the right.
Because (𝑇 − 𝜆𝑘 𝐼)𝑑𝑘 |𝐺( 𝜆𝑘, 𝑇) = 0, we have 𝑞(𝑇)|𝐺( 𝜆𝑘, 𝑇) = 0, as desired.
The next result implies that if the minimal polynomial of an operator 𝑇 ∈ ℒ(𝑉)
has degree dim 𝑉 (as happens almost always—see the paragraphs following 5.24),
then the characteristic polynomial of 𝑇 equals the minimal polynomial of 𝑇.
Proof The desired result follows immediately from the Cayley–Hamilton theo-
rem (8.29) and 5.29.
Now we can prove that the result suggested by Example 8.24 holds for all
operators on finite-dimensional complex vector spaces.
for each eigenvalue 𝜆 of 𝑇. The sum of the multiplicities 𝑚 𝜆 over all eigenvalues
𝜆 of 𝑇 equals 𝑛, the dimension of 𝑉 (by 8.25). The sum of the numbers 𝑑 𝜆 over
all eigenvalues 𝜆 of 𝑇 also equals 𝑛, because the diagonal of 𝐴 has length 𝑛.
Thus summing both sides of 8.34 over all eigenvalues 𝜆 of 𝑇 produces an
equality. Hence 8.34 must actually be an equality for each eigenvalue 𝜆 of 𝑇.
Thus the multiplicity of 𝜆 as an eigenvalue of 𝑇 equals the number of times that
𝜆 appears on the diagonal of 𝐴, as desired.
Proof Each (𝑇 − 𝜆𝑘 𝐼)|𝐺( 𝜆𝑘 , 𝑇) is nilpotent (see 8.22). For each 𝑘, choose a basis
of 𝐺(𝜆𝑘 , 𝑇), which is a vector space of dimension 𝑑𝑘 , such that the matrix of
(𝑇 − 𝜆𝑘 𝐼)|𝐺( 𝜆𝑘, 𝑇) with respect to this basis is as in 8.18(c). Thus with respect to
this basis, the matrix of 𝑇|𝐺( 𝜆𝑘, 𝑇) , which equals (𝑇 − 𝜆𝑘 𝐼)|𝐺( 𝜆𝑘, 𝑇) + 𝜆𝑘 𝐼|𝐺( 𝜆𝑘, 𝑇) ,
looks like the desired form shown above for 𝐴𝑘 .
The generalized eigenspace decomposition (8.22) shows that putting together
the bases of the 𝐺(𝜆𝑘 , 𝑇)’s chosen above gives a basis of 𝑉. The matrix of 𝑇 with
respect to this basis has the desired form.
Exercises 8B
4 Suppose dim 𝑉 ≥ 2 and 𝑇 ∈ ℒ(𝑉) is such that null 𝑇 dim 𝑉 − 2 ≠ null 𝑇 dim 𝑉 − 1.
Prove that 𝑇 has at most two distinct eigenvalues.
5 Suppose 𝑇 ∈ ℒ(𝑉) and 3 and 8 are eigenvalues of 𝑇. Let 𝑛 = dim 𝑉. Prove
that 𝑉 = (null 𝑇 𝑛 − 2 ) ⊕ (range 𝑇 𝑛 − 2 ).
6 Suppose 𝑇 ∈ ℒ(𝑉) and 𝜆 is an eigenvalue of 𝑇. Explain why the exponent
of 𝑧 − 𝜆 in the factorization of the minimal polynomial of 𝑇 is the smallest
positive integer 𝑚 such that (𝑇 − 𝜆𝐼)𝑚 |𝐺( 𝜆, 𝑇) = 0.
7 Suppose 𝑇 ∈ ℒ(𝑉) and 𝜆 is an eigenvalue of 𝑇 with multiplicity 𝑑. Prove
that 𝐺(𝜆, 𝑇) = null(𝑇 − 𝜆𝐼)𝑑.
If 𝑑 < dim 𝑉, then this exercise improves 8.20.
𝑉 = 𝑉1 ⊕ ⋯ ⊕ 𝑉𝑚 .
8.40   √(1 + 𝑥) = 1 + 𝑎1 𝑥 + 𝑎2 𝑥^2 + ⋯.
𝐼 + 𝑎1 𝑇 + 𝑎2 𝑇 2 + ⋯ + 𝑎𝑚 − 1 𝑇 𝑚 − 1.
Having made this guess, we can try to choose 𝑎1 , 𝑎2 , …, 𝑎𝑚 − 1 such that the operator
above has its square equal to 𝐼 + 𝑇. Now
(𝐼 + 𝑎1 𝑇 + 𝑎2 𝑇^2 + 𝑎3 𝑇^3 + ⋯ + 𝑎𝑚−1 𝑇^{𝑚−1} )^2
= 𝐼 + 2𝑎1 𝑇 + (2𝑎2 + 𝑎1^2 )𝑇^2 + (2𝑎3 + 2𝑎1 𝑎2 )𝑇^3 + ⋯
+ (2𝑎𝑚−1 + terms involving 𝑎1 , …, 𝑎𝑚−2 )𝑇^{𝑚−1}.
We want the right side of the equation above to equal 𝐼 + 𝑇. Hence choose 𝑎1
such that 2𝑎1 = 1 (thus 𝑎1 = 1/2). Next, choose 𝑎2 such that 2𝑎2 + 𝑎12 = 0 (thus
𝑎2 = −1/8). Then choose 𝑎3 such that the coefficient of 𝑇 3 on the right side of
the equation above equals 0 (thus 𝑎3 = 1/16). Continue in this fashion for each
𝑘 = 4, …, 𝑚 − 1, at each step solving for 𝑎𝑘 so that the coefficient of 𝑇 𝑘 on the right
side of the equation above equals 0. Actually we do not care about the explicit
formula for the 𝑎𝑘 ’s. We only need to know that some choice of the 𝑎𝑘 ’s gives a
square root of 𝐼 + 𝑇.
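As a concrete illustration of the construction above, the following sketch (Python with NumPy; the particular 4-by-4 nilpotent matrix 𝑁 is a made-up example, not one from the text) builds 𝐼 + 𝑎1 𝑁 + 𝑎2 𝑁^2 + 𝑎3 𝑁^3 using the coefficients of the binomial series for √(1 + 𝑥) and checks that its square equals 𝐼 + 𝑁:

    import numpy as np

    # A nilpotent operator on F^4: 1's on the line above the diagonal, so N**4 == 0.
    N = np.zeros((4, 4))
    N[0, 1] = N[1, 2] = N[2, 3] = 1.0

    # R = I + a_1 N + a_2 N^2 + a_3 N^3, where a_k is the coefficient of x^k in sqrt(1 + x);
    # because N**4 == 0, higher powers of N contribute nothing.
    R = np.zeros((4, 4))
    c = 1.0                                # the binomial coefficient binom(1/2, k), starting at k = 0
    for k in range(4):
        R += c * np.linalg.matrix_power(N, k)
        c *= (0.5 - k) / (k + 1)           # gives a_1 = 1/2, a_2 = -1/8, a_3 = 1/16, as in the text

    print(np.allclose(R @ R, np.eye(4) + N))   # True: R is a square root of I + N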
The previous lemma is valid on real and complex vector spaces. However, the
result below holds only on complex vector spaces. For example, the operator of
multiplication by −1 on the one-dimensional real vector space 𝐑 has no square
root.
For the proof below, we need to know that every 𝑧 ∈ 𝐂 has a square root in 𝐂. To show this, write 𝑧 = 𝑟𝑒^{𝑖𝜃} with 𝑟 ≥ 0; then √𝑟 𝑒^{𝑖𝜃/2} is a square root of 𝑧.
𝑣 = 𝑢 1 + ⋯ + 𝑢𝑚 ,
By imitating the techniques in this subsection, you should be able to prove that
if 𝑉 is a complex vector space and 𝑇 ∈ ℒ(𝑉) is invertible, then 𝑇 has a 𝑘 th root
for every positive integer 𝑘.
Jordan Form
We know that if 𝑉 is a complex vector space, then for every 𝑇 ∈ ℒ(𝑉) there is a
basis of 𝑉 with respect to which 𝑇 has a nice upper-triangular matrix (see 8.37).
In this subsection we will see that we can do even better—there is a basis of 𝑉
with respect to which the matrix of 𝑇 contains 0’s everywhere except possibly on
the diagonal and the line directly above the diagonal.
We begin by looking at two examples of nilpotent operators.
The next example of a nilpotent operator has more complicated behavior than
the example above.
Our next goal is to show that every nilpotent operator 𝑇 ∈ ℒ(𝑉) behaves
similarly to the operator in the previous example. Specifically, there is a finite
collection of vectors 𝑣1 , …, 𝑣𝑛 ∈ 𝑉 such that there is a basis of 𝑉 consisting of
the vectors of the form 𝑇 𝑗 𝑣𝑘 , as 𝑘 varies from 1 to 𝑛 and 𝑗 varies (in reverse order)
from 0 to the largest nonnegative integer 𝑚𝑘 such that 𝑇 𝑚𝑘 𝑣𝑘 ≠ 0. With respect to
this basis, the matrix of 𝑇 looks like the matrix in the previous example. More
specifically, 𝑇 has a block diagonal matrix with respect to this basis, with each
block a square matrix that is 0 everywhere except on the line above the diagonal.
In the next definition, the diagonal of each 𝐴𝑘 is filled with some eigenvalue
𝜆𝑘 of 𝑇, the line directly above the diagonal of 𝐴𝑘 is filled with 1’s, and all other
entries in 𝐴𝑘 are 0 (to understand why each 𝜆𝑘 is an eigenvalue of 𝑇, see 5.41).
The 𝜆𝑘 ’s need not be distinct. Also, 𝐴𝑘 may be a 1-by-1 matrix (𝜆𝑘 ) containing
just an eigenvalue of 𝑇. If each 𝜆𝑘 is 0, then the next definition captures the
behavior described in the paragraph above (recall that if 𝑇 is nilpotent, then 0 is
the only eigenvalue of 𝑇).
Proof We will prove this result by induction on dim 𝑉. To get started, note that
the desired result holds if dim 𝑉 = 1 (because in that case, the only nilpotent
operator is the 0 operator). Now assume that dim 𝑉 > 1 and that the desired result
holds on all vector spaces of smaller dimension.
Let 𝑚 be the smallest positive integer such that 𝑇 𝑚 = 0. Thus there exists
𝑢 ∈ 𝑉 such that 𝑇 𝑚 − 1 𝑢 ≠ 0. Let
𝑈 = span(𝑢, 𝑇𝑢, …, 𝑇 𝑚 − 1 𝑢).
The list 𝑢, 𝑇𝑢, …, 𝑇 𝑚 − 1 𝑢 is linearly independent (see Exercise 2 in Section 8A).
If 𝑈 = 𝑉, then writing this list in reverse order gives a Jordan basis for 𝑇 and we
are done. Thus we can assume that 𝑈 ≠ 𝑉.
Note that 𝑈 is invariant under 𝑇. By our induction hypothesis, there is a basis
of 𝑈 that is a Jordan basis for 𝑇|𝑈 . The strategy of our proof is that we will find a
subspace 𝑊 of 𝑉 such that 𝑊 is also invariant under 𝑇 and 𝑉 = 𝑈 ⊕ 𝑊. Again
by our induction hypothesis, there will be a basis of 𝑊 that is a Jordan basis for
𝑇|𝑊 . Putting together the Jordan bases for 𝑇|𝑈 and 𝑇|𝑊 , we will have a Jordan
basis for 𝑇.
Let 𝜑 ∈ 𝑉 ′ be such that 𝜑(𝑇 𝑚 − 1 𝑢) ≠ 0. Let
𝑊 = {𝑣 ∈ 𝑉 ∶ 𝜑(𝑇 𝑘 𝑣) = 0 for each 𝑘 = 0, …, 𝑚 − 1}.
where each (𝑇 − 𝜆𝑘 𝐼)|𝐺( 𝜆𝑘, 𝑇) is nilpotent (see 8.22). Thus 8.45 implies that some
basis of each 𝐺(𝜆𝑘 , 𝑇) is a Jordan basis for (𝑇 − 𝜆𝑘 𝐼)|𝐺( 𝜆𝑘, 𝑇) . Put these bases
together to get a basis of 𝑉 that is a Jordan basis for 𝑇.
Exercises 8C
and
𝑇 𝑚1 + 1 𝑣1 = ⋯ = 𝑇 𝑚𝑛 + 1 𝑣𝑛 = 0,
then 𝑇 𝑚1 𝑣1 , …, 𝑇 𝑚𝑛 𝑣𝑛 is a basis of null 𝑇.
This exercise shows that 𝑛 = dim null 𝑇. Thus the positive integer 𝑛 that
appears above depends only on 𝑇 and not on the specific Jordan basis
chosen for 𝑇.
14 Suppose 𝐅 = 𝐂 and 𝑇 ∈ ℒ(𝑉). Prove that there does not exist a direct sum
decomposition of 𝑉 into two nonzero subspaces invariant under 𝑇 if and
only if the minimal polynomial of 𝑇 is of the form (𝑧 − 𝜆)dim 𝑉 for some
𝜆 ∈ 𝐂.
Matrix multiplication is not commutative, but the next result shows that the
order of matrix multiplication does not matter to the trace.
tr(𝐴𝐵) = tr(𝐵𝐴).
Proof Suppose
      ⎛ 𝐴1,1  ⋯  𝐴1,𝑛 ⎞              ⎛ 𝐵1,1  ⋯  𝐵1,𝑚 ⎞
𝐴 = ⎜   ⋮          ⋮   ⎟ ,     𝐵 = ⎜   ⋮          ⋮   ⎟ .
      ⎝ 𝐴𝑚,1  ⋯  𝐴𝑚,𝑛 ⎠              ⎝ 𝐵𝑛,1  ⋯  𝐵𝑛,𝑚 ⎠
The 𝑗th term on the diagonal of the 𝑚-by-𝑚 matrix 𝐴𝐵 equals ∑_{𝑘=1}^{𝑛} 𝐴𝑗,𝑘 𝐵𝑘,𝑗 . Thus
tr(𝐴𝐵) = ∑_{𝑗=1}^{𝑚} ∑_{𝑘=1}^{𝑛} 𝐴𝑗,𝑘 𝐵𝑘,𝑗
       = ∑_{𝑘=1}^{𝑛} ∑_{𝑗=1}^{𝑚} 𝐵𝑘,𝑗 𝐴𝑗,𝑘
       = ∑_{𝑘=1}^{𝑛} (𝑘th term on the diagonal of the 𝑛-by-𝑛 matrix 𝐵𝐴)
       = tr(𝐵𝐴),
as desired.
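A quick numerical spot check of this result, in Python with NumPy (the sizes 3 and 5 are arbitrary choices for this sketch):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 5))    # a 3-by-5 matrix
    B = rng.standard_normal((5, 3))    # a 5-by-3 matrix

    # AB is 3-by-3 and BA is 5-by-5, yet the two traces agree.
    print(np.isclose(np.trace(A @ B), np.trace(B @ A)))   # True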
Proof Let 𝐴 = ℳ(𝑇, (𝑢1 , …, 𝑢𝑛 )) and 𝐵 = ℳ(𝑇, (𝑣1 , …, 𝑣𝑛 )). The change-of-
basis formula tells us that there exists an invertible 𝑛-by-𝑛 matrix 𝐶 such that
𝐴 = 𝐶−1 𝐵𝐶 (see 3.84). Thus
tr 𝐴 = tr((𝐶−1 𝐵)𝐶)
= tr(𝐶(𝐶−1 𝐵))
= tr((𝐶𝐶−1 )𝐵)
= tr 𝐵,
where the second line comes from 8.49.
The trace has a close connection with the characteristic polynomial. Suppose
𝐅 = 𝐂 , 𝑇 ∈ ℒ(𝑉), and 𝜆1 , …, 𝜆𝑛 are the eigenvalues of 𝑇, with each eigenvalue
included as many times as its multiplicity. Then by definition (see 8.26), the
characteristic polynomial of 𝑇 equals
(𝑧 − 𝜆1 )⋯(𝑧 − 𝜆𝑛 ).
Expanding the polynomial above, we can write the characteristic polynomial of 𝑇
in the form
𝑧𝑛 − (𝜆1 + ⋯ + 𝜆𝑛 )𝑧𝑛 − 1 + ⋯ + (−1)𝑛 (𝜆1 ⋯𝜆𝑛 ).
The expression above immediately leads to the next result. Also see 9.65,
which does not require the hypothesis that 𝐅 = 𝐂 .
The next result gives a nice formula for the trace of an operator on an inner
product space.
tr 𝑇 = ⟨𝑇𝑒1 , 𝑒1 ⟩ + ⋯ + ⟨𝑇𝑒𝑛 , 𝑒𝑛 ⟩.
Proof The desired formula follows from the observation that the entry in row 𝑘,
column 𝑘 of ℳ(𝑇, (𝑒1 , …, 𝑒𝑛 )) equals ⟨𝑇𝑒𝑘 , 𝑒𝑘 ⟩ [use 6.30(a) with 𝑣 = 𝑇𝑒𝑘 ].
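For a numerical illustration in Python with NumPy (here 𝑉 is taken to be 𝐑^4 with the standard inner product, and the orthonormal basis comes from a QR factorization of a random matrix; both choices are ours):

    import numpy as np

    rng = np.random.default_rng(1)
    T = rng.standard_normal((4, 4))                     # matrix of an operator on R^4 (standard basis)
    Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))    # columns of Q: an orthonormal basis e_1, ..., e_4

    s = sum(np.dot(T @ Q[:, k], Q[:, k]) for k in range(4))   # <T e_1, e_1> + ... + <T e_4, e_4>
    print(np.isclose(s, np.trace(T)))                   # True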
tr(𝑆𝑇) = tr(𝑇𝑆)
Proof Choose a basis of 𝑉. All matrices of operators in this proof will be with
respect to that basis. Suppose 𝑆, 𝑇 ∈ ℒ(𝑉).
If 𝜆 ∈ 𝐅, then
tr(𝜆𝑇) = tr ℳ(𝜆𝑇) = tr(𝜆ℳ(𝑇)) = 𝜆 tr ℳ(𝑇) = 𝜆 tr 𝑇,
where the first and last equalities come from the definition of the trace of an
operator, the second equality comes from 3.38, and the third equality follows
from the definition of the trace of a square matrix.
Also,
tr(𝑆 + 𝑇) = tr ℳ(𝑆 + 𝑇) = tr(ℳ(𝑆) + ℳ(𝑇)) = tr ℳ(𝑆) + tr ℳ(𝑇) = tr 𝑆 + tr 𝑇,
where the first and last equalities come from the definition of the trace of an
operator, the second equality comes from 3.35, and the third equality follows
from the definition of the trace of a square matrix. The two paragraphs above
show that tr ∶ ℒ(𝑉) → 𝐅 is a linear functional on ℒ(𝑉).
Furthermore,
tr(𝑆𝑇) = tr ℳ(𝑆𝑇) = tr(ℳ(𝑆)ℳ(𝑇)) = tr(ℳ(𝑇)ℳ(𝑆)) = tr ℳ(𝑇𝑆) = tr(𝑇𝑆),
where the second and fourth equalities come from 3.43 and the crucial third
equality comes from 8.49.
The equation tr(𝑆𝑇) = tr(𝑇𝑆) leads to our next result, which does not hold on infinite-dimensional vector spaces (see Exercise 13). However, additional hypotheses on 𝑆, 𝑇, and 𝑉 lead to an infinite-dimensional generalization of the result below, with important applications to quantum theory.
The statement of the next result does not involve traces, but the short proof uses traces. When something like this happens in mathematics, then usually a good definition lurks in the background.
Exercises 8D
10 Prove that the trace is the only linear functional 𝜏 ∶ ℒ(𝑉) → 𝐅 such that
𝜏(𝑆𝑇) = 𝜏(𝑇𝑆)
for all 𝑆, 𝑇 ∈ ℒ(𝑉) and 𝜏(𝐼) = dim 𝑉.
Hint: Suppose that 𝑣1 , …, 𝑣𝑛 is a basis of 𝑉. For 𝑗, 𝑘 ∈ {1, …, 𝑛}, define
𝑃𝑗, 𝑘 ∈ ℒ(𝑉) by 𝑃𝑗, 𝑘 (𝑎1 𝑣1 + ⋯ + 𝑎𝑛 𝑣𝑛 ) = 𝑎𝑘 𝑣𝑗 . Prove that
𝜏(𝑃𝑗,𝑘 ) = 1 if 𝑗 = 𝑘 and 𝜏(𝑃𝑗,𝑘 ) = 0 if 𝑗 ≠ 𝑘.
Then for 𝑇 ∈ ℒ(𝑉), use the equation 𝑇 = ∑_{𝑘=1}^{𝑛} ∑_{𝑗=1}^{𝑛} ℳ(𝑇)𝑗,𝑘 𝑃𝑗,𝑘 to
show that 𝜏(𝑇) = tr 𝑇.
11 Suppose 𝑉 and 𝑊 are inner product spaces and 𝑇 ∈ ℒ(𝑉, 𝑊). Prove that if
𝑒1 , …, 𝑒𝑛 is an orthonormal basis of 𝑉 and 𝑓1 , …, 𝑓𝑚 is an orthonormal basis
of 𝑊, then
tr(𝑇 ∗ 𝑇) = ∑_{𝑘=1}^{𝑛} ∑_{𝑗=1}^{𝑚} |⟨𝑇𝑒𝑘 , 𝑓𝑗 ⟩|^2.
The numbers ⟨𝑇𝑒𝑘 , 𝑓𝑗 ⟩ are the entries of the matrix of 𝑇 with respect to the
orthonormal bases 𝑒1 , …, 𝑒𝑛 and 𝑓1 , …, 𝑓𝑚 . These numbers depend on the
bases, but tr(𝑇 ∗ 𝑇) does not depend on a choice of bases. Thus this exercise
shows that the sum of the squares of the absolute values of the matrix entries
does not depend on which orthonormal bases are used.
• 𝐅 denotes 𝐑 or 𝐂 .
• 𝑉 and 𝑊 denote finite-dimensional nonzero vector spaces over 𝐅.
Matthew Petroff CC BY-SA
The Mathematical Institute at the University of Göttingen. This building opened in 1930,
when Emmy Noether (1882–1935) had already been a research mathematician and
faculty member at the university for 15 years (the first eight years without salary).
Noether was fired by the Nazi government in 1933. By then Noether and her
collaborators had created many of the foundations of modern algebra, including an
abstract algebra viewpoint that contributed to the development of linear algebra.
For example, if 𝑉 is a real inner product space, then the function that takes an ordered pair (𝑢, 𝑣) ∈ 𝑉 × 𝑉 to ⟨𝑢, 𝑣⟩ is a bilinear form on 𝑉. If 𝑉 is a nonzero complex inner product space, then this function is not a bilinear form because the inner product is not linear in the second slot (complex scalars come out of the second slot as their complex conjugates).
Recall that the term linear functional, used in the definition above, means a linear function that maps into the scalar field 𝐅. Thus the term bilinear functional would be more consistent terminology than bilinear form, which unfortunately has become standard.
If 𝐅 = 𝐑 , then a bilinear form differs from an inner product in that an inner
product requires symmetry [meaning that 𝛽(𝑣, 𝑤) = 𝛽(𝑤, 𝑣) for all 𝑣, 𝑤 ∈ 𝑉]
and positive definiteness [meaning that 𝛽(𝑣, 𝑣) > 0 for all 𝑣 ∈ 𝑉\{0}], but these
properties are not required for a bilinear form.
• The function 𝛽 ∶ 𝐅3 × 𝐅3 → 𝐅 defined by
𝛽((𝑥1 , 𝑥2 , 𝑥3 ), (𝑦1 , 𝑦2 , 𝑦3 )) = 𝑥1 𝑦2 − 5𝑥2 𝑦3 + 2𝑥3 𝑦1
is a bilinear form on 𝐅3 .
• Suppose 𝐴 is an 𝑛-by-𝑛 matrix with 𝐴𝑗, 𝑘 ∈ 𝐅 in row 𝑗, column 𝑘. Define a
bilinear form 𝛽𝐴 on 𝐅𝑛 by
𝛽𝐴 ((𝑥1 , …, 𝑥𝑛 ), (𝑦1 , …, 𝑦𝑛 )) = ∑_{𝑘=1}^{𝑛} ∑_{𝑗=1}^{𝑛} 𝐴𝑗,𝑘 𝑥𝑗 𝑦𝑘 .
The first bullet point is a special case of this bullet point with 𝑛 = 3 and
      ⎛ 0  1   0 ⎞
𝐴 = ⎜ 0  0  −5 ⎟ .
      ⎝ 2  0   0 ⎠
• Suppose 𝑉 is a real inner product space and 𝑇 ∈ ℒ(𝑉). Then the function
𝛽 ∶ 𝑉 × 𝑉 → 𝐑 defined by
𝛽(𝑢, 𝑣) = ⟨𝑢, 𝑇𝑣⟩
is a bilinear form on 𝑉.
• If 𝑛 is a positive integer, then the function 𝛽 ∶ 𝒫𝑛 (𝐑) × 𝒫𝑛 (𝐑) → 𝐑 defined by
𝛽(𝑝, 𝑞) = 𝑝(2) ⋅ 𝑞′ (3)
is a bilinear form on 𝒫𝑛 (𝐑).
• Suppose 𝜑, 𝜏 ∈ 𝑉 ′. Then the function 𝛽 ∶ 𝑉 × 𝑉 → 𝐅 defined by
𝛽(𝑢, 𝑣) = 𝜑(𝑢) ⋅ 𝜏(𝑣)
is a bilinear form on 𝑉.
• More generally, suppose that 𝜑1 , …, 𝜑𝑛 , 𝜏1 , …, 𝜏𝑛 ∈ 𝑉 ′. Then the function
𝛽 ∶ 𝑉 × 𝑉 → 𝐅 defined by
𝛽(𝑢, 𝑣) = 𝜑1 (𝑢) ⋅ 𝜏1 (𝑣) + ⋯ + 𝜑𝑛 (𝑢) ⋅ 𝜏𝑛 (𝑣)
is a bilinear form on 𝑉.
A bilinear form on 𝑉 is a function from 𝑉 × 𝑉 to 𝐅. Because 𝑉 × 𝑉 is a vector
space, this raises the question of whether a bilinear form can also be a linear map
from 𝑉× 𝑉 to 𝐅. Note that none of the bilinear forms in 9.2 are linear maps except
in some special cases in which the bilinear form is the zero map. Exercise 3 shows
that a bilinear form 𝛽 on 𝑉 is a linear map on 𝑉 × 𝑉 only if 𝛽 = 0.
ℳ(𝛽)𝑗, 𝑘 = 𝛽(𝑒𝑗 , 𝑒𝑘 ).
If the basis 𝑒1 , …, 𝑒𝑛 is not clear from the context, then the notation
ℳ(𝛽, (𝑒1 , …, 𝑒𝑛 )) is used.
Recall that 𝐅𝑛, 𝑛 denotes the vector space of 𝑛-by-𝑛 matrices with entries in 𝐅
and that dim 𝐅𝑛, 𝑛 = 𝑛2 (see 3.39 and 3.40).
Proof The map 𝛽 ↦ ℳ(𝛽) is clearly a linear map of 𝑉 (2) into 𝐅𝑛, 𝑛.
For 𝐴 ∈ 𝐅𝑛, 𝑛, define a bilinear form 𝛽𝐴 on 𝑉 by
𝛽𝐴 (𝑥1 𝑒1 + ⋯ + 𝑥𝑛 𝑒𝑛 , 𝑦1 𝑒1 + ⋯ + 𝑦𝑛 𝑒𝑛 ) = ∑_{𝑘=1}^{𝑛} ∑_{𝑗=1}^{𝑛} 𝐴𝑗,𝑘 𝑥𝑗 𝑦𝑘
(if 𝑉 = 𝐅𝑛 and 𝑒1 , …, 𝑒𝑛 is the standard basis, this 𝛽𝐴 is the same as the bilinear form 𝛽𝐴 in the second bullet point of Example 9.2).
The linear map 𝛽 ↦ ℳ(𝛽) from 𝑉 (2) to 𝐅𝑛, 𝑛 and the linear map 𝐴 ↦ 𝛽𝐴 from 𝐅𝑛, 𝑛 to 𝑉 (2) are inverses of each other because 𝛽ℳ(𝛽) = 𝛽 for all 𝛽 ∈ 𝑉 (2) and ℳ(𝛽𝐴 ) = 𝐴 for all 𝐴 ∈ 𝐅𝑛, 𝑛, as you should verify.
Thus ℳ(𝛼) = ℳ(𝛽)ℳ(𝑇). The proof that ℳ(𝜌) = ℳ(𝑇)t ℳ(𝛽) is similar.
The result below shows how the matrix of a bilinear form changes if we change
the basis. The formula in the result below should be compared to the change-
of-basis formula for the matrix of an operator (see 3.84). The two formulas are
similar, except that the transpose 𝐶 t appears in the formula below and the inverse
𝐶−1 appears in the change-of-basis formula for the matrix of an operator.
𝐴 = 𝐶 t 𝐵𝐶.
Proof The linear map lemma (3.4) tells us that there exists an operator 𝑇 ∈ ℒ(𝑉)
such that 𝑇 𝑓𝑘 = 𝑒𝑘 for each 𝑘 = 1, …, 𝑛. The definition of the matrix of an operator
with respect to a basis implies that
ℳ(𝑇, ( 𝑓1 , …, 𝑓𝑛 )) = 𝐶.
𝐴 = ℳ(𝜌, ( 𝑓1 , …, 𝑓𝑛 ))
= 𝐶 t ℳ(𝛼, ( 𝑓1 , …, 𝑓𝑛 ))
= 𝐶 t 𝐵𝐶,
where the second and third lines each follow from 9.6.
Now the change-of-basis formula 9.7 asserts that 𝐴 = 𝐶 t 𝐵𝐶, which you can verify
with matrix multiplication using the matrices above.
𝜌(𝑢, 𝑤) = 𝜌(𝑤, 𝑢)
An operator on 𝑉 may have a symmetric matrix with respect to some but not all
bases of 𝑉. In contrast, the next result shows that a bilinear form on 𝑉 has a sym-
metric matrix with respect to either all bases of 𝑉 or with respect to no bases of 𝑉.
= 𝜌(𝑤, 𝑢),
where the third line holds because ℳ(𝜌) is a symmetric matrix. The equation
above shows that 𝜌 is a symmetric bilinear form, proving that (c) implies (a).
At this point, we have proved that (a), (b), (c) are equivalent. Because every
diagonal matrix is symmetric, (d) implies (c). To complete the proof, we will
show that (a) implies (d) by induction on 𝑛 = dim 𝑉.
If 𝑛 = 1, then (a) implies (d) because every 1-by-1 matrix is diagonal. Now
suppose 𝑛 > 1 and the implication (a) ⟹ (d) holds for one less dimension.
Suppose (a) holds, so 𝜌 is a symmetric bilinear form. If 𝜌 = 0, then the matrix of
𝜌 with respect to every basis of 𝑉 is the zero matrix, which is a diagonal matrix.
Hence we can assume that 𝜌 ≠ 0, which means there exist 𝑢, 𝑤 ∈ 𝑉 such that
𝜌(𝑢, 𝑤) ≠ 0. Now
2𝜌(𝑢, 𝑤) = 𝜌(𝑢 + 𝑤, 𝑢 + 𝑤) − 𝜌(𝑢, 𝑢) − 𝜌(𝑤, 𝑤).
Because the left side of the equation above is nonzero, the three terms on the right cannot all equal 0. Hence there exists 𝑣 ∈ 𝑉 such that 𝜌(𝑣, 𝑣) ≠ 0.
Let 𝑈 = {𝑢 ∈ 𝑉 ∶ 𝜌(𝑢, 𝑣) = 0}. Thus 𝑈 is the null space of the linear
functional 𝑢 ↦ 𝜌(𝑢, 𝑣) on 𝑉. This linear functional is not the zero linear functional
because 𝑣 ∉ 𝑈. Thus dim 𝑈 = 𝑛 − 1. By our induction hypothesis, there is a
basis 𝑒1 , …, 𝑒𝑛 − 1 of 𝑈 such that the symmetric bilinear form 𝜌|𝑈 × 𝑈 has a diagonal
matrix with respect to this basis.
Because 𝑣 ∉ 𝑈, the list 𝑒1 , …, 𝑒𝑛 − 1 , 𝑣 is a basis of 𝑉. Suppose 𝑘 ∈ {1, …, 𝑛−1}.
Then 𝜌(𝑒𝑘 , 𝑣) = 0 by the construction of 𝑈. Because 𝜌 is symmetric, we also
have 𝜌(𝑣, 𝑒𝑘 ) = 0. Thus the matrix of 𝜌 with respect to 𝑒1 , …, 𝑒𝑛 − 1 , 𝑣 is a diagonal
matrix, completing the proof that (a) implies (d).
The previous result states that every symmetric bilinear form has a diagonal
matrix with respect to some basis. If our vector space happens to be a real inner
product space, then the next result shows that every symmetric bilinear form has
a diagonal matrix with respect to some orthonormal basis. Note that the inner
product here is unrelated to the bilinear form.
𝛼(𝑣, 𝑣) = 0
The next result shows that a bilinear form is alternating if and only if switching
the order of the two inputs multiplies the output by −1.
𝛼(𝑢, 𝑤) = −𝛼(𝑤, 𝑢)
for all 𝑢, 𝑤 ∈ 𝑉.
Now we show that the vector space of bilinear forms on 𝑉 is the direct sum of
the symmetric bilinear forms on 𝑉 and the alternating bilinear forms on 𝑉.
9.17   𝑉^(2) = 𝑉sym^(2) ⊕ 𝑉alt^(2)
Proof The definition of symmetric bilinear form implies that the sum of any two symmetric bilinear forms on 𝑉 is a symmetric bilinear form on 𝑉, and any scalar multiple of a symmetric bilinear form on 𝑉 is a symmetric bilinear form on 𝑉. Thus 𝑉sym^(2) is a subspace of 𝑉^(2). Similarly, the verification that 𝑉alt^(2) is a subspace of 𝑉^(2) is straightforward.
Quadratic Forms
The quadratic form in the example above is typical of quadratic forms on 𝐅𝑛,
as shown in the next result.
Proof First suppose 𝑞 is a quadratic form on 𝐅𝑛. Thus there exists a bilinear form
𝛽 on 𝐅𝑛 such that 𝑞 = 𝑞𝛽 . Let 𝐴 be the matrix of 𝛽 with respect to the standard
basis of 𝐅𝑛. Then for all (𝑥1 , …, 𝑥𝑛 ) ∈ 𝐅𝑛, we have the desired equation
𝑞(𝑥1 , …, 𝑥𝑛 ) = 𝛽((𝑥1 , …, 𝑥𝑛 ), (𝑥1 , …, 𝑥𝑛 )) = ∑_{𝑘=1}^{𝑛} ∑_{𝑗=1}^{𝑛} 𝐴𝑗,𝑘 𝑥𝑗 𝑥𝑘 .
Conversely, suppose there exist numbers 𝐴𝑗,𝑘 ∈ 𝐅 for 𝑗, 𝑘 ∈ {1, …, 𝑛} such that
𝑞(𝑥1 , …, 𝑥𝑛 ) = ∑_{𝑘=1}^{𝑛} ∑_{𝑗=1}^{𝑘} 𝐴𝑗,𝑘 𝑥𝑗 𝑥𝑘
for all (𝑥1 , …, 𝑥𝑛 ) ∈ 𝐅𝑛. Define a bilinear form 𝛽 on 𝐅𝑛 by
𝛽((𝑥1 , …, 𝑥𝑛 ), (𝑦1 , …, 𝑦𝑛 )) = ∑_{𝑘=1}^{𝑛} ∑_{𝑗=1}^{𝑘} 𝐴𝑗,𝑘 𝑥𝑗 𝑦𝑘 .
Then 𝑞 = 𝑞𝛽 , as desired.
Proof First suppose (a) holds, so 𝑞 is a quadratic form. Hence there exists a
bilinear form 𝛽 such that 𝑞 = 𝑞𝛽 . By 9.17, there exist a symmetric bilinear form 𝜌
on 𝑉 and an alternating bilinear form 𝛼 on 𝑉 such that 𝛽 = 𝜌 + 𝛼. Now
𝑞 = 𝑞𝛽 = 𝑞𝜌 + 𝑞𝛼 = 𝑞𝜌 .
If 𝜌′ ∈ 𝑉sym^(2) also satisfies 𝑞𝜌′ = 𝑞, then 𝑞_{𝜌′−𝜌} = 0; thus 𝜌′ − 𝜌 ∈ 𝑉sym^(2) ∩ 𝑉alt^(2), which implies that 𝜌′ = 𝜌 (by 9.17). This completes the proof that (a) implies (b).
Now suppose (b) holds, so there exists a symmetric bilinear form 𝜌 on 𝑉 such
that 𝑞 = 𝑞𝜌 . If 𝜆 ∈ 𝐅 and 𝑣 ∈ 𝑉 then
𝑞(𝜆𝑣) = 𝜌(𝜆𝑣, 𝜆𝑣) = 𝜆𝜌(𝑣, 𝜆𝑣) = 𝜆2 𝜌(𝑣, 𝑣) = 𝜆2 𝑞(𝑣),
showing that the first part of (c) holds.
If 𝑢, 𝑤 ∈ 𝑉, then
𝑞(𝑢 + 𝑤) − 𝑞(𝑢) − 𝑞(𝑤) = 𝜌(𝑢 + 𝑤, 𝑢 + 𝑤) − 𝜌(𝑢, 𝑢) − 𝜌(𝑤, 𝑤) = 2𝜌(𝑢, 𝑤).
Thus the function (𝑢, 𝑤) ↦ 𝑞(𝑢 + 𝑤)−𝑞(𝑢)−𝑞(𝑤) equals 2𝜌, which is a symmetric
bilinear form on 𝑉, completing the proof that (b) implies (c).
Clearly (c) implies (d).
Now suppose (d) holds. Let 𝜌 be the symmetric bilinear form on 𝑉 defined by
𝜌(𝑢, 𝑤) = (𝑞(𝑢 + 𝑤) − 𝑞(𝑢) − 𝑞(𝑤))/2 .
If 𝑣 ∈ 𝑉, then
𝜌(𝑣, 𝑣) = (𝑞(2𝑣) − 𝑞(𝑣) − 𝑞(𝑣))/2 = (4𝑞(𝑣) − 2𝑞(𝑣))/2 = 𝑞(𝑣).
Thus 𝑞 = 𝑞𝜌 , completing the proof that (d) implies (a).
The next result states that for each quadratic form we can choose a basis such
that the quadratic form looks like a weighted sum of squares of the coordinates,
meaning that there are no cross terms of the form 𝑥𝑗 𝑥𝑘 with 𝑗 ≠ 𝑘.
Specifically, suppose 𝑞 is a quadratic form on 𝑉 and 𝑛 = dim 𝑉. Then:
(a) There exist a basis 𝑒1 , …, 𝑒𝑛 of 𝑉 and 𝜆1 , …, 𝜆𝑛 ∈ 𝐅 such that
𝑞(𝑥1 𝑒1 + ⋯ + 𝑥𝑛 𝑒𝑛 ) = 𝜆1 𝑥1^2 + ⋯ + 𝜆𝑛 𝑥𝑛^2
for all 𝑥1 , …, 𝑥𝑛 ∈ 𝐅.
(b) If 𝐅 = 𝐑 and 𝑉 is an inner product space, then the basis in (a) can be
chosen to be an orthonormal basis of 𝑉.
Proof
(a) There exists a symmetric bilinear form 𝜌 on 𝑉 such that 𝑞 = 𝑞𝜌 (by 9.21). Now
there exists a basis 𝑒1 , …, 𝑒𝑛 of 𝑉 such that ℳ(𝜌, (𝑒1 , …, 𝑒𝑛 )) is a diagonal
matrix (by 9.12). Let 𝜆1 , …, 𝜆𝑛 denote the entries on the diagonal of this
matrix. Thus
𝜌(𝑒𝑗 , 𝑒𝑘 ) = 𝜆𝑗 if 𝑗 = 𝑘 and 𝜌(𝑒𝑗 , 𝑒𝑘 ) = 0 if 𝑗 ≠ 𝑘
for all 𝑗, 𝑘 ∈ {1, …, 𝑛}. If 𝑥1 , …, 𝑥𝑛 ∈ 𝐅, then
𝑞(𝑥1 𝑒1 + ⋯ + 𝑥𝑛 𝑒𝑛 ) = 𝜌(𝑥1 𝑒1 + ⋯ + 𝑥𝑛 𝑒𝑛 , 𝑥1 𝑒1 + ⋯ + 𝑥𝑛 𝑒𝑛 )
= ∑_{𝑘=1}^{𝑛} ∑_{𝑗=1}^{𝑛} 𝑥𝑗 𝑥𝑘 𝜌(𝑒𝑗 , 𝑒𝑘 )
= 𝜆1 𝑥12 + ⋯ + 𝜆𝑛 𝑥𝑛2,
as desired.
(b) Suppose 𝐅 = 𝐑 and 𝑉 is an inner product space. Then 9.13 tells us that the
basis in (a) can be chosen to be an orthonormal basis of 𝑉.
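For a quadratic form on 𝐑^3 with the standard inner product, part (b) can be illustrated numerically: by the real spectral theorem, an eigendecomposition of the symmetric matrix of 𝜌 supplies an orthonormal basis with respect to which 𝑞 is a weighted sum of squares of the coordinates. A sketch in Python with NumPy (the matrix and the vector below are random examples):

    import numpy as np

    rng = np.random.default_rng(2)
    M = rng.standard_normal((3, 3))
    A = (M + M.T) / 2                     # matrix of a symmetric bilinear form rho on R^3

    lam, Q = np.linalg.eigh(A)            # columns of Q: an orthonormal basis diagonalizing rho

    x = rng.standard_normal(3)            # a vector, written in the standard basis
    c = Q.T @ x                           # its coordinates with respect to the new basis

    # q(x) = x^T A x equals lambda_1 c_1^2 + lambda_2 c_2^2 + lambda_3 c_3^2
    print(np.isclose(x @ A @ x, np.sum(lam * c**2)))    # True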
Exercises 9A
Define 𝜌 ∶ 𝑉 × 𝑉 → 𝐑 by
𝜌(𝑝, 𝑞) = ∫_0^1 𝑝 𝑞″ .
9.24 definition: 𝑉^𝑚
𝑉^𝑚 = 𝑉 × ⋯ × 𝑉   (𝑚 times).
𝑣 ↦ 𝛽(𝑢1 , …, 𝑢𝑘 − 1 , 𝑣, 𝑢𝑘 + 1 , …, 𝑢𝑚 )
Then 𝛽 ∈ 𝑉 (4).
• Define 𝛽 ∶ (ℒ(𝑉))^𝑚 → 𝐅 by
𝛼(𝑣1 , …, 𝑣𝑚 ) = 0.
= 0.
The next result states that if 𝑚 > dim 𝑉, then there are no alternating 𝑚-linear
forms on 𝑉 other than the function on 𝑉 𝑚 that is identically 0.
Use the multilinear properties of 𝛼 to expand the right side of the equation above
(as in the proof of 9.16) to get
𝛼(𝑣2 , 𝑣1 , 𝑣3 , …, 𝑣𝑚 ) = −𝛼(𝑣1 , 𝑣2 , 𝑣3 , …, 𝑣𝑚 ).
Similarly, swapping the vectors in any two slots of 𝛼(𝑣1 , …, 𝑣𝑚 ) changes the
value of 𝛼 by a factor of −1.
More generally, we see that if we do an odd number of swaps, then the value of 𝛼
changes by a factor of −1, and if we do an even number of swaps, then the value
of 𝛼 does not change.
To deal with arbitrary multiple swaps, we need a bit of information about
permutations.
sign(𝑗1 , …, 𝑗𝑚 ) = (−1)𝑁,
Hence the sign of a permutation equals 1 if the natural order has been changed
an even number of times and equals −1 if the natural order has been changed an
odd number of times.
Our use of permutations now leads in a natural way to the following beautiful
formula for alternating 𝑛-linear forms on an 𝑛-dimensional vector space.
Then
= ∑_{𝑗1=1}^{𝑛} ⋯ ∑_{𝑗𝑛=1}^{𝑛} 𝑏𝑗1,1 ⋯ 𝑏𝑗𝑛,𝑛 𝛼(𝑒𝑗1 , …, 𝑒𝑗𝑛 )
where the third line holds because 𝛼(𝑒𝑗1 , …, 𝑒𝑗𝑛 ) = 0 if 𝑗1 , …, 𝑗𝑛 are not distinct
integers, and the last line holds by 9.35.
The following result will be the key to our definition of the determinant in the
next section.
9.37   dim 𝑉alt^(dim 𝑉) = 1
The vector space 𝑉alt^(dim 𝑉) has dimension one.
Proof Let 𝑛 = dim 𝑉. Suppose 𝛼 and 𝛼′ are alternating 𝑛-linear forms on 𝑉 with
𝛼 ≠ 0. Let 𝑒1 , …, 𝑒𝑛 be such that 𝛼(𝑒1 , …, 𝑒𝑛 ) ≠ 0. There exists 𝑐 ∈ 𝐅 such that
𝛼′ (𝑒1 , …, 𝑒𝑛 ) = 𝑐𝛼(𝑒1 , …, 𝑒𝑛 ).
Furthermore, 9.28 implies that 𝑒1 , …, 𝑒𝑛 is linearly independent and thus is a basis
of 𝑉.
Suppose 𝑣1 , …, 𝑣𝑛 ∈ 𝑉. Let 𝑏𝑗, 𝑘 be as in 9.36 for 𝑗, 𝑘 = 1, …, 𝑛. Then
𝛼′ (𝑣1 , …, 𝑣𝑛 ) = 𝛼′ (𝑒1 , …, 𝑒𝑛 ) ∑ (sign(𝑗1 , …, 𝑗𝑛 ))𝑏𝑗1, 1 ⋯𝑏𝑗𝑛, 𝑛
(𝑗1 , …, 𝑗𝑛 ) ∈ perm 𝑛
= 𝑐𝛼(𝑣1 , …, 𝑣𝑛 ),
where the first and last lines above come from 9.36. The equation above implies
that 𝛼′ = 𝑐𝛼. Thus 𝛼′, 𝛼 is not a linearly independent list, which implies that dim 𝑉alt^(𝑛) ≤ 1.
To complete the proof, we only need to show that there exists a nonzero alternating 𝑛-linear form 𝛼 on 𝑉 (thus eliminating the possibility that dim 𝑉alt^(𝑛) equals 0).
Earlier we showed that the value of an alternating multilinear form applied to a linearly dependent list is 0; see 9.28. The next result provides a converse of 9.28 for 𝑛-linear multilinear forms when 𝑛 = dim 𝑉. In the following result, the statement that 𝛼 is nonzero means (as usual for a function) that 𝛼 is not the function on 𝑉 𝑛 that is identically 0.
The formula 9.38 used in the last proof to construct a nonzero alternating 𝑛-linear form came from the formula in 9.36, and that formula arose naturally from the properties of an alternating multilinear form.
𝛼(𝑒1 , …, 𝑒𝑛 ) ≠ 0
Exercises 9B
{(𝑣1 , 𝑣2 , 𝑣3 , 𝑣4 ) ∈ 𝑉 4 ∶ 𝛼(𝑣1 , 𝑣2 , 𝑣3 , 𝑣4 ) = 0}
is a subspace of 𝑉 4.
9C Determinants
Defining the Determinant
The next definition will lead us to a clean, beautiful, basis-free definition of the
determinant of an operator.
9.40 definition: 𝛼𝑇
𝛼𝑇 = (det 𝑇) 𝛼 for all 𝛼 ∈ 𝑉alt^(dim 𝑉).
Our next task is to define and give a formula for the determinant of a square
matrix. To do this, we associate with each square matrix an operator and then
define the determinant of the matrix to be the determinant of the associated
operator.
Proof Apply 9.36 with 𝑉 = 𝐅𝑛 and 𝑒1 , …, 𝑒𝑛 the standard basis of 𝐅𝑛 and 𝛼 the
alternating 𝑛-linear form on 𝐅𝑛 that takes 𝑣1 , …, 𝑣𝑛 to det ( 𝑣1 ⋯ 𝑣𝑛 ) [see
9.45]. If each 𝑣𝑘 is the 𝑘 th column of 𝐴, then each 𝑏𝑗, 𝑘 in 9.36 equals 𝐴𝑗, 𝑘 . Finally,
𝛼(𝑒1 , …, 𝑒𝑛 ) = det ( 𝑒1 ⋯ 𝑒𝑛 ) = det 𝐼 = 1.
Thus the formula in 9.36 becomes the formula stated in this result.
The sum in the formula in 9.46 contains 𝑛! terms. Because 𝑛! grows rapidly as
𝑛 increases, the formula in 9.46 is not a viable method to evaluate determinants
even for moderately sized 𝑛. For example, 10! is over three million, and 100! is approximately 10^158, leading to a sum that the fastest computer cannot evaluate.
We will soon see some results that lead to faster evaluations of determinants than
direct use of the sum in 9.46.
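To make the size of that sum concrete, here is a direct (and deliberately slow) implementation of the formula in 9.46 in Python with NumPy, compared against np.linalg.det on a small random matrix; the helper names are ours, not the book's:

    import numpy as np
    from itertools import permutations

    def perm_sign(p):
        # (-1) raised to the number of pairs that appear out of natural order
        n = len(p)
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
        return -1 if inversions % 2 else 1

    def det_by_permutations(A):
        # the n!-term sum in 9.46: sum over permutations (j_1, ..., j_n) of
        # sign(j_1, ..., j_n) * A[j_1, 1] * ... * A[j_n, n]
        n = A.shape[0]
        return sum(perm_sign(p) * np.prod([A[p[k], k] for k in range(n)])
                   for p in permutations(range(n)))

    rng = np.random.default_rng(3)
    A = rng.standard_normal((5, 5))
    print(np.isclose(det_by_permutations(A), np.linalg.det(A)))   # True, and already 5! = 120 terms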
Proof If (𝑗1 , …, 𝑗𝑛 ) ∈ perm 𝑛 with (𝑗1 , …, 𝑗𝑛 ) ≠ (1, …, 𝑛), then 𝑗𝑘 > 𝑘 for some
𝑘 ∈ {1, …, 𝑛}, which implies that 𝐴𝑗𝑘, 𝑘 = 0. Thus the only permutation that
can make a nonzero contribution to the sum in 9.46 is the permutation (1, …, 𝑛).
Because 𝐴𝑘, 𝑘 = 𝜆𝑘 for each 𝑘 = 1, …, 𝑛, this implies that det 𝐴 = 𝜆1 ⋯𝜆𝑛 .
Properties of Determinants
Our definition of the determinant leads to the following magical proof that the
determinant is multiplicative.
Proof
(a) Let 𝑛 = dim 𝑉. Suppose 𝛼 ∈ 𝑉alt^(𝑛) and 𝑣1 , …, 𝑣𝑛 ∈ 𝑉. Then
𝛼𝑆𝑇 (𝑣1 , …, 𝑣𝑛 ) = 𝛼(𝑆𝑇𝑣1 , …, 𝑆𝑇𝑣𝑛 )
= (det 𝑆)𝛼(𝑇𝑣1 , …, 𝑇𝑣𝑛 )
= (det 𝑆)(det 𝑇)𝛼(𝑣1 , …, 𝑣𝑛 ),
where the first equation follows from the definition of 𝛼𝑆𝑇 , the second equation
follows from the definition of det 𝑆, and the third equation follows from the
definition of det 𝑇. The equation above implies that det(𝑆𝑇) = (det 𝑆)(det 𝑇).
(b) Let 𝑆, 𝑇 ∈ ℒ(𝐅𝑛 ) be such that ℳ(𝑆) = 𝐴 and ℳ(𝑇) = 𝐵, where all matrices
of operators in this proof are with respect to the standard basis of 𝐅𝑛. Then
ℳ(𝑆𝑇) = ℳ(𝑆)ℳ(𝑇) = 𝐴𝐵 (see 3.43). Thus
det(𝐴𝐵) = det(𝑆𝑇) = (det 𝑆)(det 𝑇) = (det 𝐴)(det 𝐵),
where the second equality comes from the result in (a).
= 𝛼(𝑇𝑆𝑤1 , …, 𝑇𝑆𝑤𝑛 )
= 𝛼𝑇 (𝑆𝑤1 , …, 𝑆𝑤𝑛 )
= (det 𝑇)𝜏(𝑤1 , …, 𝑤𝑛 ).
The equation above and the definition of the determinant of the operator 𝑆−1 𝑇𝑆
imply that det(𝑆−1 𝑇𝑆) = det 𝑇.
For the special case in which 𝑉 = 𝐅𝑛 and 𝑒1 , …, 𝑒𝑛 is the standard basis of 𝐅𝑛,
the next result is true by the definition of the determinant of a matrix. The left
side of the equation in the next result does not depend on a choice of basis, which
means that the right side is independent of the choice of basis.
where the first line comes from 9.52, the second line comes from the definition of
the determinant of a matrix, and the third line follows from 9.54.
The next result gives a more intuitive way to think about determinants than the
definition or the formula in 9.46. We could make the characterization in the result
below the definition of the determinant of an operator on a finite-dimensional
complex vector space, with the current definition then becoming a consequence
of that definition.
Suppose 𝐅 = 𝐂 and 𝑇 ∈ ℒ(𝑉). Then det 𝑇 equals the product of the eigen-
values of 𝑇, with each eigenvalue included as many times as its multiplicity.
As the next result shows, the determinant interacts nicely with the transpose of
a square matrix, with the dual of an operator, and with the adjoint of an operator
on an inner product space.
det(𝑇 ∗ ) = \overline{det 𝑇}.
Proof
(a) Let 𝑛 be a positive integer. Define 𝛼 ∶ (𝐅𝑛 )𝑛 → 𝐅 by
𝛼(( 𝑣1 ⋯ 𝑣𝑛 )) = det (( 𝑣1 ⋯ 𝑣𝑛 )^t )
for all 𝑣1 , …, 𝑣𝑛 ∈ 𝐅𝑛. The formula in 9.46 for the determinant of a matrix
shows that 𝛼 is an 𝑛-linear form on 𝐅𝑛.
Suppose 𝑣1 , …, 𝑣𝑛 ∈ 𝐅𝑛 and 𝑣𝑗 = 𝑣𝑘 for some 𝑗 ≠ 𝑘. If 𝐵 is an 𝑛-by-𝑛 matrix, then ( 𝑣1 ⋯ 𝑣𝑛 )^t 𝐵 cannot equal the identity matrix because row 𝑗 and row 𝑘 of ( 𝑣1 ⋯ 𝑣𝑛 )^t 𝐵 are equal. Thus ( 𝑣1 ⋯ 𝑣𝑛 )^t is not invertible, which implies that 𝛼(( 𝑣1 ⋯ 𝑣𝑛 )) = 0. Hence 𝛼 is an alternating 𝑛-linear form on 𝐅𝑛.
Note that 𝛼 applied to the standard basis of 𝐅𝑛 equals 1. Because the vector
space of alternating 𝑛-linear forms on 𝐅𝑛 has dimension one (by 9.37), this
implies that 𝛼 is the determinant function. Thus (a) holds.
(b) The equation det 𝑇 ′ = det 𝑇 follows from (a) and 9.53 and 3.132.
(c) Pick an orthonormal basis of 𝑉. The matrix of 𝑇 ∗ with respect to that basis is
the conjugate transpose of the matrix of 𝑇 with respect to that basis (by 7.9).
Thus 9.53, 9.46, and (a) imply that det(𝑇 ∗ ) = \overline{det 𝑇}.
(a) If either two columns or two rows of a square matrix are equal, then the
determinant of the matrix equals 0.
(b) Suppose 𝐴 is a square matrix and 𝐵 is the matrix obtained from 𝐴 by
swapping either two columns or two rows. Then det 𝐴 = − det 𝐵.
(c) If one column or one row of a square matrix is multiplied by a scalar, then
the value of the determinant is multiplied by the same scalar.
(d) If a scalar multiple of one column of a square matrix is added to another column, then the value of the determinant is unchanged.
(e) If a scalar multiple of one row of a square matrix is added to another row, then the value of the determinant is unchanged.
Proof All the assertions in this result follow from the result that the maps 𝑣1 , …, 𝑣𝑛 ↦ det ( 𝑣1 ⋯ 𝑣𝑛 ) and 𝑣1 , …, 𝑣𝑛 ↦ det (( 𝑣1 ⋯ 𝑣𝑛 )^t ) are both alternating 𝑛-linear forms on 𝐅𝑛 [see 9.45 and 9.56(a)].
For example, to prove (d) suppose 𝑣1 , …, 𝑣𝑛 ∈ 𝐅𝑛 and 𝑐 ∈ 𝐅. Then
det( 𝑣1 + 𝑐𝑣2 𝑣2 ⋯ 𝑣𝑛 )
= det ( 𝑣1 𝑣2 ⋯ 𝑣𝑛 ) + 𝑐 det ( 𝑣2 𝑣2 𝑣3 ⋯ 𝑣𝑛 )
= det ( 𝑣1 𝑣2 ⋯ 𝑣 𝑛 ),
where the first equation follows from the multilinearity property and the second
equation follows from the alternating property. The equation above shows that
adding a multiple of the second column to the first column does not change the
value of the determinant. The same conclusion holds for any two columns. Thus
(d) holds.
The proof of (e) follows from (d) and from 9.56(a). The proofs of (a), (b), and
(c) use similar tools and are left to the reader.
For matrices whose entries are concrete numbers, the result above leads to a
much faster way to evaluate the determinant than direct application of the formula
in 9.46. Specifically, apply the Gaussian elimination procedure of swapping
rows [by 9.48(b), this changes the determinant by a factor of −1], multiplying
a row by a nonzero constant [by 9.48(c), this changes the determinant by the
same constant], and adding a multiple of one row to another row [by 9.48(e), this
does not change the determinant] to produce an upper-triangular matrix, whose
determinant is the product of the diagonal entries (by 9.48). If your software keeps
track of the number of row swaps and of the constants used when multiplying a
row by a constant, then the determinant of the original matrix can be computed.
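A sketch of that procedure in Python with NumPy follows; the partial pivoting and the bookkeeping of row swaps are our implementation choices. Only row swaps and adding multiples of one row to another are used, so the determinant is the signed product of the final diagonal entries:

    import numpy as np

    def det_by_elimination(A):
        A = A.astype(float).copy()
        n = A.shape[0]
        swaps = 0
        for k in range(n):
            pivot = k + np.argmax(np.abs(A[k:, k]))    # choose the largest available pivot in column k
            if A[pivot, k] == 0:
                return 0.0                             # the matrix is not invertible
            if pivot != k:
                A[[k, pivot]] = A[[pivot, k]]          # row swap: determinant changes sign [9.48(b)]
                swaps += 1
            for i in range(k + 1, n):
                A[i] -= (A[i, k] / A[k, k]) * A[k]     # add a multiple of row k: determinant unchanged [9.48(e)]
        return (-1) ** swaps * np.prod(np.diag(A))     # upper triangular: product of the diagonal entries

    rng = np.random.default_rng(4)
    A = rng.standard_normal((6, 6))
    print(np.isclose(det_by_elimination(A), np.linalg.det(A)))   # True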
Because a number 𝜆 ∈ 𝐅 is an eigenvalue of an operator 𝑇 ∈ ℒ(𝑉) if and
only if det(𝜆𝐼 − 𝑇) = 0 (by 9.51), you may be tempted to think that one way
to find eigenvalues quickly is to choose a basis of 𝑉, let 𝐴 = ℳ(𝑇), evaluate
det(𝜆𝐼 − 𝐴), and then solve the equation det(𝜆𝐼 − 𝐴) = 0 for 𝜆. However, that
procedure is rarely efficient, except when dim 𝑉 = 2 (or when dim 𝑉 equals 3 or
4 if you are willing to use the cubic or quartic formulas). One problem is that the
procedure described in the paragraph above for evaluating a determinant does not
work when the matrix includes a symbol (such as the 𝜆 in 𝜆𝐼 − 𝐴). This problem
arises because decisions need to be made in the Gaussian elimination procedure
about whether certain quantities equal 0, and those decisions become complicated
in expressions involving a symbol 𝜆.
Recall that an operator on a finite-dimensional inner product space is unitary
if it preserves norms (see 7.51 and the paragraph following it). Every eigenvalue
of a unitary operator has absolute value 1 (by 7.54). Thus the product of the
eigenvalues of a unitary operator has absolute value 1. Hence (at least in the case
𝐅 = 𝐂 ) the determinant of a unitary operator has absolute value 1 (by 9.55). The
next result gives a proof that works without the assumption that 𝐅 = 𝐂 .
Proof By the spectral theorem (7.29 or 7.31), 𝑉 has an orthonormal basis con-
sisting of eigenvectors of 𝑇. Thus by the last bullet point of 9.42, det 𝑇 equals a
product of the eigenvalues of 𝑇, possibly with repetitions. Each eigenvalue of 𝑇 is
a nonnegative number (by 7.38). Thus we conclude that det 𝑇 ≥ 0.
Suppose 𝑉 is an inner product space and 𝑇 ∈ ℒ(𝑉). Recall that the list of
nonnegative square roots of the eigenvalues of 𝑇 ∗ 𝑇 (each included as many times
as its multiplicity) is called the list of singular values of 𝑇 (see Section 7E).
Proof We have
|det 𝑇|^2 = (\overline{det 𝑇})(det 𝑇) = (det(𝑇 ∗ ))(det 𝑇) = det(𝑇 ∗ 𝑇),
where the middle equality comes from 9.56(c) and the last equality comes from
9.49(a). Taking square roots of both sides of the equation above shows that
|det 𝑇| = √det(𝑇 ∗ 𝑇).
Let 𝑠1 , …, 𝑠𝑛 denote the list of singular values of 𝑇. Thus 𝑠12, …, 𝑠𝑛2 is the
list of eigenvalues of 𝑇 ∗ 𝑇 (with appropriate repetitions), corresponding to an
orthonormal basis of 𝑉 consisting of eigenvectors of 𝑇 ∗ 𝑇. Hence the last bullet
point of 9.42 implies that
det(𝑇 ∗ 𝑇) = 𝑠12 ⋯𝑠𝑛2.
Thus |det 𝑇| = 𝑠1 ⋯𝑠𝑛 , as desired.
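A numerical check of this formula in Python with NumPy (the complex 4-by-4 matrix below is a random example):

    import numpy as np

    rng = np.random.default_rng(5)
    T = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

    s = np.linalg.svd(T, compute_uv=False)                    # the singular values of T
    print(np.isclose(abs(np.linalg.det(T)), np.prod(s)))      # True: |det T| equals s_1 ... s_n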
𝑧 ↦ det(𝑧𝐼 − 𝑇)
Proof If 𝐅 = 𝐂 , then the equation 𝑞(𝑇) = 0 follows from 9.62 and 8.29.
Now suppose 𝐅 = 𝐑 . Fix a basis of 𝑉, and let 𝐴 be the matrix of 𝑇 with
respect to this basis. Let 𝑆 be the operator on 𝐂dim 𝑉 such that the matrix of 𝑆
(with respect to the standard basis of 𝐂dim 𝑉 ) is 𝐴. For all 𝑧 ∈ 𝐑 we have
𝑞(𝑧) = det(𝑧𝐼 − 𝑇) = det(𝑧𝐼 − 𝐴) = det(𝑧𝐼 − 𝑆).
Thus 𝑞 is the characteristic polynomial of 𝑆. The case 𝐅 = 𝐂 (first sentence of
this proof) now implies that 0 = 𝑞(𝑆) = 𝑞(𝐴) = 𝑞(𝑇).
Proof The constant term of a polynomial function of 𝑧 is the value of the poly-
nomial when 𝑧 = 0. Thus the constant term of the characteristic polynomial of 𝑇
equals det(−𝑇), which equals (−1)𝑛 det 𝑇 (by the third bullet point of 9.42).
Fix a basis of 𝑉, and let 𝐴 be the matrix of 𝑇 with respect to this basis. The
matrix of 𝑧𝐼 − 𝑇 with respect to this basis is 𝑧𝐼 − 𝐴. The term coming from the
identity permutation {1, …, 𝑛} in the formula 9.46 for det(𝑧𝐼 − 𝐴) is
(𝑧 − 𝐴1, 1 )⋯(𝑧 − 𝐴𝑛, 𝑛 ).
The coefficient of 𝑧𝑛 − 1 in the expression above is −(𝐴1, 1 + ⋯ + 𝐴𝑛, 𝑛 ), which equals
− tr 𝑇. The terms in the formula for det(𝑧𝐼 − 𝐴) coming from other elements of
perm 𝑛 contain at most 𝑛 − 2 factors of the form 𝑧 − 𝐴𝑘, 𝑘 and thus do not contribute
to the coefficient of 𝑧𝑛 − 1 in the characteristic polynomial of 𝑇.
In the result below, think of the columns of the 𝑛-by-𝑛 matrix 𝐴 as elements of 𝐅𝑛. The norms appearing below then arise from the standard inner product on 𝐅𝑛. Recall that the notation 𝑅⋅, 𝑘 in the proof below means the 𝑘th column of the matrix 𝑅 (as was defined in 3.44).
The next result was proved by Jacques Hadamard (1865–1963) in 1893.
Proof If 𝐴 is not invertible, then det 𝐴 = 0 and hence the desired inequality
holds in this case.
Thus assume that 𝐴 is invertible. The QR factorization (7.58) tells us that
there exist a unitary matrix 𝑄 and an upper-triangular matrix 𝑅 whose diagonal
contains only positive numbers such that 𝐴 = 𝑄𝑅. We have
|det 𝐴| = |det 𝑄| |det 𝑅|
        = |det 𝑅|
        = ∏_{𝑘=1}^{𝑛} 𝑅𝑘,𝑘
        ≤ ∏_{𝑘=1}^{𝑛} ‖𝑅⋅,𝑘 ‖
        = ∏_{𝑘=1}^{𝑛} ‖𝑄𝑅⋅,𝑘 ‖
        = ∏_{𝑘=1}^{𝑛} ‖𝑣𝑘 ‖,
where the first line comes from 9.49(b), the second line comes from 9.58, the
third line comes from 9.48, and the fifth line holds because 𝑄 is an isometry.
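Here is a quick numerical check of the inequality in Python with NumPy (a random real 5-by-5 matrix; the column norms come from the standard inner product on 𝐑^5):

    import numpy as np

    rng = np.random.default_rng(6)
    A = rng.standard_normal((5, 5))

    column_norms = np.linalg.norm(A, axis=0)                  # ||v_1||, ..., ||v_5|| for the columns of A
    print(abs(np.linalg.det(A)) <= np.prod(column_norms))     # True (Hadamard's inequality)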
The matrix in the next result is called the Vandermonde matrix. Vandermonde
matrices have important applications in polynomial interpolation, the discrete
Fourier transform, and other areas of mathematics. The proof of the next result is
a nice illustration of the power of switching between matrices and linear maps.
det ⎛ 1   𝛽1   𝛽1^2   ⋯   𝛽1^{𝑛−1} ⎞
    ⎜ 1   𝛽2   𝛽2^2   ⋯   𝛽2^{𝑛−1} ⎟
    ⎜ ⋮    ⋮     ⋮            ⋮     ⎟   =   ∏_{1 ≤ 𝑗 < 𝑘 ≤ 𝑛} (𝛽𝑘 − 𝛽𝑗 ).
    ⎝ 1   𝛽𝑛   𝛽𝑛^2   ⋯   𝛽𝑛^{𝑛−1} ⎠
Now det 𝐴 = det 𝐶 = ∏_{1 ≤ 𝑗 < 𝑘 ≤ 𝑛} (𝛽𝑘 − 𝛽𝑗 ), where we have used 9.56(a) and 9.48.
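A numerical check of the Vandermonde formula in Python with NumPy (the numbers 𝛽1 , …, 𝛽4 below are an arbitrary example):

    import numpy as np
    from itertools import combinations

    beta = np.array([2.0, -1.0, 0.5, 3.0])                   # beta_1, ..., beta_n (distinct)
    n = len(beta)

    V = np.array([[b**k for k in range(n)] for b in beta])   # the Vandermonde matrix above

    product = np.prod([beta[k] - beta[j] for j, k in combinations(range(n), 2)])
    print(np.isclose(np.linalg.det(V), product))             # True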
Exercises 9C
Prove that 𝑝 is a polynomial of degree dim 𝑉 and that the coefficient of 𝑧dim 𝑉
in this polynomial is det 𝑆.
13 Suppose 𝐅 = 𝐂 , 𝑇 ∈ ℒ(𝑉), and 𝑛 = dim 𝑉 > 2. Let 𝜆1 , …, 𝜆𝑛 denote
the eigenvalues of 𝑇, with each eigenvalue included as many times as its
multiplicity.
(a) Find a formula for the coefficient of 𝑧𝑛 − 2 in the characteristic polynomial
of 𝑇 in terms of 𝜆1 , …, 𝜆𝑛 .
(b) Find a formula for the coefficient of 𝑧 in the characteristic polynomial
of 𝑇 in terms of 𝜆1 , …, 𝜆𝑛 .
|det 𝑇| = √det(𝑇 ∗ 𝑇)
20 Suppose 𝐴 is an 𝑛-by-𝑛 matrix, and suppose 𝑐 is such that |𝐴𝑗, 𝑘 | ≤ 𝑐 for all
𝑗, 𝑘 ∈ {1, …, 𝑛}. Prove that
|det 𝐴| ≤ 𝑐^𝑛 𝑛^{𝑛/2}.
The formula for the determinant of a matrix (9.46) shows that |det 𝐴| ≤ 𝑐^𝑛 𝑛!. However, the estimate given by this exercise is much better. For example, if 𝑐 = 1 and 𝑛 = 100, then 𝑐^𝑛 𝑛! ≈ 10^158, but the estimate given by this exercise is the much smaller number 10^100. If 𝑛 is an integer power of 2, then the
inequality above is sharp and cannot be improved.
for all 𝐴, 𝐵 ∈ 𝐂𝑛, 𝑛 and 𝛿(𝐴) equals the product of the diagonal entries of 𝐴
for each diagonal matrix 𝐴 ∈ 𝐂𝑛, 𝑛 . Prove that
𝛿(𝐴) = det 𝐴
I find that in my own elementary lectures, I have, for pedagogical reasons, pushed
determinants more and more into the background. Too often I have had the expe-
rience that, while the students acquired facility with the formulas, which are so
useful in abbreviating long expressions, they often failed to gain familiarity with
their meaning, and skill in manipulation prevented the student from going into all
the details of the subject and so gaining a mastery.
9D Tensor Products
Tensor Product of Two Vector Spaces
The motivation for our next topic comes from wanting to form the product of
a vector 𝑣 ∈ 𝑉 and a vector 𝑤 ∈ 𝑊. This product will be denoted by 𝑣 ⊗ 𝑤,
pronounced “𝑣 tensor 𝑤”, and will be an element of some new vector space called
𝑉 ⊗ 𝑊 (also pronounced “𝑉 tensor 𝑊 ”).
We already have a vector space 𝑉 × 𝑊 (see Section 3E), called the product of
𝑉 and 𝑊. However, 𝑉 × 𝑊 will not serve our purposes here because it does not
provide a natural way to multiply an element of 𝑉 by an element of 𝑊. We would
like our tensor product to satisfy some of the usual properties of multiplication.
For example, we would like the distributive property to be satisfied, meaning that
if 𝑣1 , 𝑣2 , 𝑣 ∈ 𝑉 and 𝑤1 , 𝑤2 , 𝑤 ∈ 𝑊, then
(𝑣1 + 𝑣2 ) ⊗ 𝑤 = 𝑣1 ⊗ 𝑤 + 𝑣2 ⊗ 𝑤 and 𝑣 ⊗ (𝑤1 + 𝑤2 ) = 𝑣 ⊗ 𝑤1 + 𝑣 ⊗ 𝑤2 .
We would also like scalar multiplication to interact well with this new multiplication, meaning that
𝜆(𝑣 ⊗ 𝑤) = (𝜆𝑣) ⊗ 𝑤 = 𝑣 ⊗ (𝜆𝑤)
for all 𝜆 ∈ 𝐅, 𝑣 ∈ 𝑉, and 𝑤 ∈ 𝑊.
To produce ⊗ in TEX, type \otimes.
Furthermore, it would be nice if each basis of 𝑉 when combined with each
basis of 𝑊 produced a basis of 𝑉 ⊗ 𝑊. Specifically, if 𝑒1 , …, 𝑒𝑚 is a basis of 𝑉
and 𝑓1 , …, 𝑓𝑛 is a basis of 𝑊, then we would like a list (in any order) consisting
of 𝑒𝑗 ⊗ 𝑓𝑘 , as 𝑗 ranges from 1 to 𝑚 and 𝑘 ranges from 1 to 𝑛, to be a basis of
𝑉⊗ 𝑊. This implies that dim(𝑉⊗ 𝑊) should equal (dim 𝑉)(dim 𝑊). Recall that
dim(𝑉 × 𝑊) = dim 𝑉 + dim 𝑊 (see 3.92), which shows that the product 𝑉 × 𝑊
will not serve our purposes here.
To produce a vector space whose dimension is (dim 𝑉)(dim 𝑊) in a natural
fashion from 𝑉 and 𝑊, we look at the vector space of bilinear functionals, as
defined below.
for 𝑎1 , …, 𝑎𝑚 , 𝑏1 , …, 𝑏𝑛 ∈ 𝐅.
The linear map 𝛽 ↦ ℳ(𝛽) from ℬ(𝑉, 𝑊) to 𝐅𝑚, 𝑛 and the linear map 𝐶 ↦ 𝛽𝐶
from 𝐅𝑚, 𝑛 to ℬ(𝑉, 𝑊) are inverses of each other because 𝛽ℳ(𝛽) = 𝛽 for all
𝛽 ∈ ℬ(𝑉, 𝑊) and ℳ(𝛽𝐶 ) = 𝐶 for all 𝐶 ∈ 𝐅𝑚, 𝑛, as you should verify.
Thus both maps are isomorphisms and the two spaces that they connect have the
same dimension. Hence dim ℬ(𝑉, 𝑊) = dim 𝐅𝑚, 𝑛 = 𝑚𝑛 = (dim 𝑉)(dim 𝑊).
We can quickly prove that the definition of 𝑉⊗𝑊 gives it the desired dimension.
Proof Because a vector space and its dual have the same dimension (by 3.111),
we have dim 𝑉 ′ = dim 𝑉 and dim 𝑊 ′ = dim 𝑊. Thus 9.70 tells us that the
dimension of ℬ(𝑉 ′ , 𝑊 ′ ) equals (dim 𝑉)(dim 𝑊).
and
𝜆(𝑣 ⊗ 𝑤) = (𝜆𝑣) ⊗ 𝑤 = 𝑣 ⊗ (𝜆𝑤).
Lists are, by definition, ordered. The order matters when, for example, we
form the matrix of an operator with respect to a basis. For lists in this section
with two indices, such as {𝑒𝑗 ⊗ 𝑓𝑘 }𝑗 = 1, …, 𝑚; 𝑘 = 1, …, 𝑛 in the next result, the ordering
does not matter and we do not specify it—just choose any convenient ordering.
The linear independence of elements of 𝑉 ⊗ 𝑊 in (a) of the result below
captures the idea that there are no relationships among vectors in 𝑉 ⊗ 𝑊 other
than the relationships that come from bilinearity of the tensor product (see 9.73)
and the relationships that may be present due to linear dependence of a list of
vectors in 𝑉 or a list of vectors in 𝑊.
9.74 basis of 𝑉 ⊗ 𝑊
{𝑒𝑗 ⊗ 𝑓𝑘 }𝑗 = 1, …, 𝑚; 𝑘 = 1, …, 𝑛
⎛ 𝑣1 𝑤1  ⋯  𝑣1 𝑤𝑛 ⎞
⎜   ⋮           ⋮   ⎟ .
⎝ 𝑣𝑚 𝑤1  ⋯  𝑣𝑚 𝑤𝑛 ⎠
See Exercises 5 and 6 for practice in using the identification from the example
above.
We now define bilinear maps, which differ from bilinear functionals in that
the target space can be an arbitrary vector space rather than just the scalar field.
𝑇 # (𝑣, 𝑤) = 𝑇(𝑣 ⊗ 𝑤)
Γ̂(𝑣 ⊗ 𝑤) = Γ̂( ∑_{𝑘=1}^{𝑛} ∑_{𝑗=1}^{𝑚} 𝑎𝑗 𝑏𝑘 (𝑒𝑗 ⊗ 𝑓𝑘 ) )
= ∑_{𝑘=1}^{𝑛} ∑_{𝑗=1}^{𝑚} 𝑎𝑗 𝑏𝑘 Γ̂(𝑒𝑗 ⊗ 𝑓𝑘 )
= ∑_{𝑘=1}^{𝑛} ∑_{𝑗=1}^{𝑚} 𝑎𝑗 𝑏𝑘 Γ(𝑒𝑗 , 𝑓𝑘 )
= Γ(𝑣, 𝑤),
as desired, where the second line holds because Γ̂ is linear, the third line holds by
the definition of Γ̂, and the fourth line holds because Γ is bilinear.
The uniqueness of the linear map Γ̂ satisfying Γ(𝑣 ̂ ⊗ 𝑤) = Γ(𝑣, 𝑤) follows
from 9.74(b), completing the proof of (a).
To prove (b), define a function 𝑇 # ∶ 𝑉× 𝑊 → 𝑈 by 𝑇 # (𝑣, 𝑤) = 𝑇(𝑣 ⊗ 𝑤) for all
(𝑣, 𝑤) ∈ 𝑉 × 𝑊. The bilinearity of the tensor product (see 9.73) and the linearity
of 𝑇 imply that 𝑇 # is bilinear.
Clearly the choice of 𝑇 # that satisfies the conditions is unique.
To prove 9.79(a), we could not just define Γ̂(𝑣 ⊗ 𝑤) = Γ(𝑣, 𝑤) for all 𝑣 ∈ 𝑉 and 𝑤 ∈ 𝑊 (and then extend Γ̂ linearly to all of 𝑉 ⊗ 𝑊) because elements of 𝑉 ⊗ 𝑊 do not have unique representations as finite sums of elements of the form 𝑣 ⊗ 𝑤. Our proof used a basis of 𝑉 and a basis of 𝑊 to get around this problem.
Although our construction of Γ̂ in the proof of 9.79(a) depended on a basis of 𝑉 and a basis of 𝑊, the equation Γ̂(𝑣 ⊗ 𝑤) = Γ(𝑣, 𝑤) that holds for all 𝑣 ∈ 𝑉 and 𝑤 ∈ 𝑊 shows that Γ̂ does not depend on the choice of bases for 𝑉 and 𝑊.
Suppose 𝑉 and 𝑊 are inner product spaces. Then there is a unique inner
product on 𝑉 ⊗ 𝑊 such that
⟨𝑣 ⊗ 𝑤, 𝑢 ⊗ 𝑥⟩ = ⟨𝑣, 𝑢⟩⟨𝑤, 𝑥⟩
9.82 definition: inner product on tensor product of two inner product spaces
⟨𝑣 ⊗ 𝑤, 𝑢 ⊗ 𝑥⟩ = ⟨𝑣, 𝑢⟩⟨𝑤, 𝑥⟩
Take 𝑢 = 𝑣 and 𝑥 = 𝑤 in the equation above and then take square roots to
show that
‖𝑣 ⊗ 𝑤‖ = ‖𝑣‖ ‖𝑤‖
for all 𝑣 ∈ 𝑉 and all 𝑤 ∈ 𝑊.
The construction of the inner product in the proof of 9.80 depended on an
orthonormal basis 𝑒1 , …, 𝑒𝑚 of 𝑉 and an orthonormal basis 𝑓1 , …, 𝑓𝑛 of 𝑊. Formula
9.81 implies that {𝑒𝑗 ⊗ 𝑓𝑘 }𝑗 = 1, …, 𝑚; 𝑘 = 1, …, 𝑛 is a doubly indexed orthonormal list in
𝑉⊗ 𝑊 and hence is an orthonormal basis of 𝑉⊗ 𝑊 [by 9.74(b)]. The importance
of the next result arises because the orthonormal bases used there can be different
from the orthonormal bases used to define the inner product in 9.80. Although
the notation for the bases is the same in the proof of 9.80 and in the result below,
think of them as two different sets of orthonormal bases.
{𝑒𝑗 ⊗ 𝑓𝑘 }𝑗 = 1, …, 𝑚; 𝑘 = 1, …, 𝑛
is an orthonormal basis of 𝑉 ⊗ 𝑊.
9.84 notation: 𝑉1 , …, 𝑉𝑚
For the rest of this subsection, 𝑚 denotes an integer greater than 1 and
𝑉1 , …, 𝑉𝑚 denote finite-dimensional vector spaces.
Now we can define the tensor product of multiple vector spaces and the tensor
product of elements of those vector spaces. The following definition is completely
analogous to our previous definition (9.71) in the case 𝑚 = 2.
The next result can be proved by following the pattern of the proof of the
analogous result when 𝑚 = 2 (see 9.72).
9.90 basis of 𝑉1 ⊗ ⋯ ⊗ 𝑉𝑚
is a basis of 𝑉1 ⊗ ⋯ ⊗ 𝑉𝑚 .
The next result can be proved by following the pattern of the proof of 9.79.
𝑇 # (𝑣1 , …, 𝑣𝑚 ) = 𝑇(𝑣1 ⊗ ⋯ ⊗ 𝑣𝑚 )
See Exercises 12 and 13 for tensor products of multiple inner product spaces.
Exercises 9D
𝑣1 ⊗ 𝑤1 + 𝑣2 ⊗ 𝑤2 + 𝑣3 ⊗ 𝑤3 = 0
𝑣1 ⊗ 𝑤1 + ⋯ + 𝑣𝑚 ⊗ 𝑤𝑚 = 0.
Prove that 𝑤1 = ⋯ = 𝑤𝑚 = 0.
4 Suppose dim 𝑉 > 1 and dim 𝑊 > 1. Prove that
{𝑣 ⊗ 𝑤 ∶ (𝑣, 𝑤) ∈ 𝑉 × 𝑊}
is not a subspace of 𝑉 ⊗ 𝑊.
This exercise implies that if dim 𝑉 > 1 and dim 𝑊 > 1, then
{𝑣 ⊗ 𝑤 ∶ (𝑣, 𝑤) ∈ 𝑉 × 𝑊} ≠ 𝑉 ⊗ 𝑊.
{𝑣1 ⊗ 𝑤1 + 𝑣2 ⊗ 𝑤2 ∶ 𝑣1 , 𝑣2 ∈ 𝑉 and 𝑤1 , 𝑤2 ∈ 𝑊} ≠ 𝑉 ⊗ 𝑊.
𝑣1 ⊗ 𝑤1 + ⋯ + 𝑣𝑚 ⊗ 𝑤𝑚 = 0.
is an orthonormal basis of 𝑉1 ⊗ ⋯ ⊗ 𝑉𝑚 .
Photo Credits
• page v: Photos by Carrie Heeter and Bishnu Sarangi. Public domain image.
• page 1: Original painting by Pierre Louis Dumesnil; 1884 copy by Nils
Forsberg. Public domain image downloaded on 29 March 2022 from
https://commons.wikimedia.org/wiki/File:René_Descartes_i_samtal_med_Sveriges_drottning,_Kristina.jpg.
• This book was typeset in LuaLATEX by the author, who wrote the LATEX code to
implement the book’s design.
• The LATEX software used for this book was written by Leslie Lamport. The TEX
software, which forms the base for LATEX, was written by Donald Knuth.
• The main text font in this book is the Open Type Format version of TEX Gyre
Termes, a font based on Times, which was designed by Stanley Morison and
Victor Lardent for the British newspaper The Times in 1931.
• The main math font in this book is the Open Type Format version of TEX Gyre
Pagella Math, a font based on Palatino, which was designed by Hermann Zapf.
• The sans serif font used for page headings and some other design elements is
the Open Type Format version of TEX Gyre Heros, a font based on Helvetica,
which was designed by Max Miedinger and Eduard Hoffmann.
• The LuaLATEX packages fontspec and unicode-math, both written by Will
Robertson, were used to manage fonts.
• The LATEX package fontsize, written by Ivan Valbusa, was used to gracefully
change the main fonts to 10.5 point size.
• The figures in the book were produced by Mathematica, using Mathematica
code written by the author. Mathematica was created by Stephen Wolfram.
The Mathematica package MaTeX, written by Szabolcs Horvát, was used to
place LATEX-generated labels in the Mathematica figures.
• The LATEX package graphicx, written by David Carlisle and Sebastian Rahtz,
was used to include photos and figures.
• The LATEX package multicol, written by Frank Mittelbach, was used to get
around LATEX’s limitation that two-column format must start on a new page
(needed for the Symbol Index and the Index).
• The LATEX packages TikZ, written by Till Tantau, and tcolorbox, written by
Thomas Sturm, were used to produce the definition boxes and result boxes.
• The LATEX package color, written by David Carlisle, was used to add appropriate
color to various design elements.
• The LATEX package wrapfig, written by Donald Arseneau, was used to wrap
text around the comment boxes.
• The LATEX package microtype, written by Robert Schlicht, was used to reduce
hyphenation and produce more pleasing right justification.