Addition and scalar multiplication of vectors in a vector space satisfy the following
properties, all of which hold for all vectors (u, v, w) in the vector space, and all
scalars (a, b):
(i) commutativity of vector addition: u + v = v + u,
(ii) associativity of vector addition: u + (v + w) = (u + v) + w,
(iii) distributivity of scalar multiplication: a(u+v) = au+av and (a+b)u = au+bu,
(iv) associativity of scalar multiplication: (ab)u = a(bu),
(v) there is a zero vector: a special vector 0 such that 0 + v = v + 0 = v for all
vectors v,
(vi) special property of the scalar 1: 1u = u.
If a vector in Rn or Cn is written as a letter, then it will be in bold face (e.g., v).
Sometimes we will consider other vector spaces. For example, the set P of all
polynomial functions is a vector space, as is the set of all functions from R to R, or
(e.g.) from [0, 1] to R, with [af + bg](x) defined in each case to be af (x) + bg(x).
Another example, which we shall look at in more detail shortly, is the set D of all
functions f : R → R that have derivatives of all orders, i.e., such that f′, f′′, f′′′, . . .
are all defined.
Another vector space that we shall encounter later is the set ℓ² of infinite sequences
(a0, a1, . . . ) of real numbers such that ∑_{i=0}^∞ ai² < ∞, i.e., the sum of the squares
is finite. Addition and scalar multiplication in ℓ² are defined by: (a0, a1, . . . ) +
(b0, b1, . . . ) = (a0 + b0, a1 + b1, . . . ) and r(a0, a1, . . . ) = (ra0, ra1, . . . ), for r a real
number and (a0, a1, . . . ) and (b0, b1, . . . ) in ℓ².
For example, consider the subset S = {(1, 0, 0, 0)t , (2, 1, 0, 0)t , (3, 2, 1, 0)t , (4, 3, 2, 1)t }
of R4 . Suppose that there are real numbers a1 , . . . , a4 such that 0 = (0, 0, 0, 0)t =
a1 (1, 0, 0, 0)t +a2 (2, 1, 0, 0)t +a3 (3, 2, 1, 0)t +a4 (4, 3, 2, 1)t . This equation is equivalent
to the system:
a1 + 2a2 + 3a3 + 4a4 = 0
a2 + 2a3 + 3a4 = 0
a3 + 2a4 = 0
a4 = 0.
In any solution, a4 = 0. Now a3 + 2a4 = 0 implies that a3 = 0, and so on: the only
solution is a1 = a2 = a3 = a4 = 0. Therefore the set S is linearly independent.
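A computation like this is easy to check by machine. The following is a minimal sketch, assuming numpy is available; the columns of the array are the four vectors of S:

    import numpy as np

    # Columns are the vectors of S; the set is linearly independent exactly when
    # the only solution of A a = 0 is a = 0, i.e. when the matrix has full rank.
    A = np.array([[1, 2, 3, 4],
                  [0, 1, 2, 3],
                  [0, 0, 1, 2],
                  [0, 0, 0, 1]], dtype=float)
    print(np.linalg.matrix_rank(A))   # 4, so S is linearly independent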
The linear span of a finite set S = {v1 , v2 , . . . , vk } of vectors in a real vector space
V is the subset
span(S) = { ∑_{i=1}^k ai vi | ai ∈ R }.
For the converse, suppose that A = {v1 , . . . , vk } is not linearly independent, mean-
ing that there are scalars a1 , . . . , ak , not all zero, such that a1 v1 + · · · + ak vk = 0.
Without loss of generality, assume that a1 ≠ 0. Dividing by a1 gives that v1 =
−(a2/a1)v2 − · · · − (ak/a1)vk, implying that v1 is in span(A \ {v1}). □
1.4 Subspaces
In the vector space R3 , the set {(1, 0, 0)t , (0, 1, 0)t } does not span R3 . But the set
U = {(x, y, z)t ∈ R3 | z = 0} is also a real vector space, and these two vectors do
form a basis for this smaller space U .
The n × n identity matrix is the n × n matrix In with 0s off the diagonal and 1s on
the diagonal. Sometimes it is more convenient to write simply I in place of In for
the identity matrix.
A matrix B satisfying (i)-(iii) is said to be in reduced row echelon form, and the
process of converting a matrix to such a form by means of row operations is known
as Gaussian elimination.
For example, starting with the matrix
[ 1 2 3 ]
[ 3 2 1 ],
through row operations we can obtain the matrix
[ 1 0 −1 ]
[ 0 1  2 ],
which is in reduced row echelon form.
Gaussian elimination gives a method for solving a system of n linear equations with
m variables:
a11 x1 + · · · + a1m xm = b1
a21 x1 + · · · + a2m xm = b2
···
an1 x1 + · · · + anm xm = bn .
This can be written as a matrix equation:
[ a11 . . . a1m ] [ x1 ]   [ b1 ]
[ a21 . . . a2m ] [ x2 ]   [ b2 ]
[  ..        .. ] [ .. ] = [ .. ].
[ an1 . . . anm ] [ xm ]   [ bn ]
The method is to write down the augmented matrix: the n × m matrix followed by
the last n × 1 column:
[ a11 . . . a1m | b1 ]
[ a21 . . . a2m | b2 ]
[  ..        .. | .. ]
[ an1 . . . anm | bn ]
and then perform row operations on the augmented matrix, until the original matrix
is in reduced row echelon form. Then we can read off the solutions, if any, from the
result. A solution will not exist if, in the reduced row echelon form, there is a row
whose entries in the coefficient part are all zero but whose entry in the final column is non-zero.
For example, consider the equations 3x1 + 2x3 = 6 and −x1 + x2 + x3 = 5. We get
the augmented matrix
[ 3  0 2 | 6 ]
[ −1 1 1 | 5 ].
Gaussian elimination yields the augmented matrix
[ 1 0 2/3 | 2 ]
[ 0 1 5/3 | 7 ].
Looking just at the first two columns, which are the columns containing leading 1s,
we can read off one solution: x1 = 2, x2 = 7 and x3 = 0. Additional solutions
are provided by allowing x3 to be any real number (x3 is called a free variable) and
setting x1 = 2 − (2/3)x3 and x2 = 7 − (5/3)x3.
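A sketch of the same computation, assuming sympy is available; rref() returns the reduced row echelon form together with the indices of the columns containing leading 1s:

    from sympy import Matrix

    # Augmented matrix for 3x1 + 2x3 = 6 and -x1 + x2 + x3 = 5
    aug = Matrix([[3, 0, 2, 6],
                  [-1, 1, 1, 5]])
    R, pivots = aug.rref()
    print(R)        # Matrix([[1, 0, 2/3, 2], [0, 1, 5/3, 7]])
    print(pivots)   # (0, 1): the first two columns contain leading 1s, so x3 is free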
2.2 Dimension
Theorem 2.2.1: If V is a finite-dimensional vector space, and B is a basis of V
with n elements, then any set of n + 1 elements of V is linearly dependent.
Proof: Suppose B = {v1 , . . . , vn } is a basis of V with n elements, and let C =
{w1, . . . , wn+1} be any set of n + 1 elements. As B is a basis, we can write
wj = ∑_{i=1}^n aij vi for each j.
Consider solving the system of equations ∑_{j=1}^{n+1} aij xj = 0 for i = 1, 2, . . . , n.
As there are more unknowns than equations (i.e., as a matrix equation, there are
more columns than rows in the matrix), when we perform Gaussian elimination we
get at least one column without a leading 1, in other words we get at least one free
variable. This means that there is a solution (x̂1 , . . . , x̂n+1 ) to the equations with
not all the x̂j equal to 0.
As ∑_{j=1}^{n+1} aij x̂j = 0 for all i, it follows also that (∑_{j=1}^{n+1} aij x̂j) vi = 0 for all i, and
therefore
0 = ∑_{i=1}^n ∑_{j=1}^{n+1} aij x̂j vi = ∑_{j=1}^{n+1} x̂j ∑_{i=1}^n aij vi = ∑_{j=1}^{n+1} x̂j wj.
As not all the x̂j are zero, this shows that the set C is linearly dependent, as claimed. □
Corollary 2.2.2: Any two bases of a finite-dimensional vector space have the same
size.
The common size of any basis of a finite-dimensional vector space V is called the
dimension of V , and written dim(V ).
Switching row i with row j is the same as multiplication on the left by the n × n
matrix B obtained from the identity matrix by swapping rows i and j: its entries bpq
are all zero except that bpp = 1 for p ∉ {i, j} and bij = bji = 1.
Adding a times the jth row to the ith row is the same as multiplication on the left
by the n × n matrix with 1s on the diagonal, the entry a in the (i, j) position, and
all other entries 0.
Note that each elementary matrix has an inverse that is also an elementary matrix.
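A small sketch, assuming sympy is available, of a row operation performed by left-multiplication with an elementary matrix, and of its inverse being elementary of the same type:

    from sympy import Matrix

    # Elementary matrix adding 5 times row 2 to row 1 of a 3 x 3 matrix
    E = Matrix([[1, 5, 0],
                [0, 1, 0],
                [0, 0, 1]])
    M = Matrix([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])
    print(E * M)      # same result as the row operation R1 + 5*R2 -> R1 applied to M
    print(E.inv())    # identity with -5 in the (1, 2) position: again an elementary matrix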
You should already be familiar with the notion of the determinant det(A) of a matrix
A, how to calculate it, and in particular that a square matrix A is invertible if and
only if det(A) is not zero, which in turn is the case if and only if the only solution
to Av = 0 is v = 0. Recall that a matrix is singular if its determinant is zero, and
non-singular otherwise. We use the ideas above to prove an additional property of
determinants: det(AB) = det(A) det(B). The key step is the claim that det(EM) = det(E) det(M)
whenever E is an elementary matrix. Writing invertible matrices A and B as products of
elementary matrices, A = A1 · · · Aj and B = B1 · · · Bk, the claim gives
det(AB) = det(A1 · · · Aj B1 · · · Bk) = det(A1) · · · det(Aj) det(B1) · · · det(Bk) = det(A) det(B)
(and if either A or B is singular, both sides are 0).
To prove the claim, we must consider the three types of row operation in turn.
(1) If E represents the multiplication of a row by λ ≠ 0, the determinant of E is λ
and the determinant of EM is λ times the determinant of M.
(2) If E represents the switching of two rows, the determinant of E is −1 and the
determinant of EM is minus the determinant of M.
(3) If E represents adding a multiple of one row to another row, the determinant of
E is 1, and the determinant of EM is the same as the determinant of M. □
2.4 Inverses
The determination of whether an n × n matrix has an inverse, and its calculation
(if it exists) can be done in several ways. When the dimension n is large, the easiest
way is through Gaussian elimination.
Given an n × n matrix A that may or may not have an inverse, place the identity
matrix In alongside it. If A is invertible, there are elementary matrices B1 , B2 , . . . , Bk
corresponding to row operations such that B1 B2 · · · Bk A = In , meaning that the
product B1 B2 · · · Bk is equal to A−1 . Applying the same row operations to In as to
A transforms In into B1 B2 · · · Bk In = B1 B2 · · · Bk , the inverse of A.
0 2 −5
Example: We find the inverse of A = −1 0 1 .
3 −1 2
Starting with the augmented matrix (A|In ), we perform row operations until the
first matrix is reduced to In :
[ 0  2 −5 | 1 0 0 ]
[ −1 0  1 | 0 1 0 ]
[ 3 −1  2 | 0 0 1 ]
R1 ↔ R2, −R1 → R1:
[ 1  0 −1 | 0 −1 0 ]
[ 0  2 −5 | 1  0 0 ]
[ 3 −1  2 | 0  0 1 ]
R3 − 3R1 → R3:
[ 1  0 −1 | 0 −1 0 ]
[ 0  2 −5 | 1  0 0 ]
[ 0 −1  5 | 0  3 1 ]
R2 + R3 → R2:
[ 1  0 −1 | 0 −1 0 ]
[ 0  1  0 | 1  3 1 ]
[ 0 −1  5 | 0  3 1 ]
R2 + R3 → R3:
[ 1 0 −1 | 0 −1 0 ]
[ 0 1  0 | 1  3 1 ]
[ 0 0  5 | 1  6 2 ]
R3/5 → R3:
[ 1 0 −1 | 0   −1   0   ]
[ 0 1  0 | 1    3   1   ]
[ 0 0  1 | 1/5  6/5 2/5 ]
R1 + R3 → R1:
[ 1 0 0 | 1/5 1/5 2/5 ]
[ 0 1 0 | 1   3   1   ]
[ 0 0 1 | 1/5 6/5 2/5 ]
The final matrix on the right is A−1 . Whenever we calculate an inverse, it is easy
to check whether the answer is correct:
[ 1/5 1/5 2/5 ] [ 0  2 −5 ]   [ 1 0 0 ]
[ 1   3   1   ] [ −1 0  1 ] = [ 0 1 0 ].
[ 1/5 6/5 2/5 ] [ 3 −1  2 ]   [ 0 0 1 ]
So indeed A−1 A = In .
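This check is easy to automate; a minimal sketch, assuming sympy is available:

    from sympy import Matrix, Rational, eye

    A = Matrix([[0, 2, -5], [-1, 0, 1], [3, -1, 2]])
    A_inv = Matrix([[Rational(1, 5), Rational(1, 5), Rational(2, 5)],
                    [1, 3, 1],
                    [Rational(1, 5), Rational(6, 5), Rational(2, 5)]])
    print(A_inv * A == eye(3))   # True
    print(A.inv() == A_inv)      # True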
Theorem 2.5.1: If W (x) is not zero for some value x, then the functions f1 , . . . , fn
are linearly independent.
Proof: Suppose that f1 , f2 , . . . , fn are linearly dependent, which means that there
are scalars a1 , a2 , . . . , an , not all zero, such that a1 f1 + a2 f2 + · · · + an fn is the zero
function. Since the derivative of the zero function is also the zero function we have,
for all real x,
[ f1(x)        f2(x)        . . .  fn(x)        ] [ a1 ]   [ 0 ]
[ f1′(x)       f2′(x)       . . .  fn′(x)       ] [ a2 ]   [ 0 ]
[   ..           ..                  ..         ] [ .. ] = [ .. ].
[ f1^(n−1)(x)  f2^(n−1)(x)  . . .  fn^(n−1)(x)  ] [ an ]   [ 0 ]
As the vector (a1, . . . , an)t is non-zero, this implies that the determinant W(x) of
the matrix is zero for all choices of x. □
The other direction of Theorem 2.5.1 doesn’t hold: it is possible for W (x) to be zero
for all x ∈ R, and for the functions still to be linearly independent.
Every column of the matrix A_T^{C,B} corresponds to a member of the basis B; the jth
column is the result of the application of T to the jth member of the ordered basis
B, written in terms of the basis C. Every row corresponds to a member of the basis
C.
For example, if B = (u1, u2) is an ordered basis for the space U, and C = (v1, v2, v3)
is an ordered basis for the space V, T(u1) = 2v1 + 3v2 and T(u2) = v2 − v3, then
A_T^{C,B} = [ 2  0 ]
            [ 3  1 ].
            [ 0 −1 ]
T2(T1(uj)) = ∑_{k=1}^l αkj T2(vk) = ∑_{k=1}^l αkj ∑_{i=1}^n βik wi = ∑_{i=1}^n (∑_{k=1}^l βik αkj) wi = ∑_{i=1}^n γij wi,
where the γij are the entries of the matrix product A_{T2}^{D,C} A_{T1}^{C,B}. □
Example: Let B = ((1, 1, 0)t, (0, −1, 2)t, (−1, 0, 3)t) and C = ((2, −1)t, (1, 2)t).
Then
MB = [ 1  0 −1 ]        MC = [  2 1 ]
     [ 1 −1  0 ]             [ −1 2 ].
     [ 0  2  3 ]
Returning to the original problem, we have the matrix AT, which represents the linear
transformation T with respect to the standard ordered bases, and Theorem 3.2.2 tells us
that A_T^{C,B} = MC^{−1} AT MB.
The formula of Theorem 3.2.2 also tells us how to get the matrix AT from the matrix
A_T^{C,B}. Because MB and MC are invertible, we can multiply the above formula on
the left by MC and on the right by MB^{−1} to get
AT = MC A_T^{C,B} MB^{−1}.
This also tells us how to go from A_T^{C,B} to A_T^{C′,B′} for some other alternative ordered
bases B′ and C′:
A_T^{C′,B′} = MC′^{−1} AT MB′ = MC′^{−1} MC A_T^{C,B} MB^{−1} MB′.
Let's work out an example. Let
AT = [ 1  0 2 ]
     [ 0 −1 1 ].
Let B = ((1, 1, 0)t, (1, 0, 1)t, (0, 1, 0)t) and C = ((2, 1)t, (1, 2)t).
Now
MB = [ 1 1 0 ]        MC = [ 2 1 ]        MC^{−1} = [ 2/3  −1/3 ]
     [ 1 0 1 ]             [ 1 2 ]                  [ −1/3  2/3 ];
     [ 0 1 0 ]
A_T^{C,B} = MC^{−1} AT MB = [ 2/3  −1/3 ] [ 1  0 2 ] [ 1 1 0 ]   [ 1   5/3   1/3 ]
                            [ −1/3  2/3 ] [ 0 −1 1 ] [ 1 0 1 ] = [ −1 −1/3  −2/3 ].
                                                     [ 0 1 0 ]
As a check, we calculate
AT (1, 1, 0)t = [ 1  0 2 ] [ 1 ]   [ 1  ]
                [ 0 −1 1 ] [ 1 ] = [ −1 ],
                           [ 0 ]
and we see that u1 = (1, 1, 0)t is mapped by T to (1, −1)t. This can be re-written as
(2, 1)t − (1, 2)t, i.e. 1 · (2, 1)t + (−1) · (1, 2)t, and indeed the first column
of A_T^{C,B} is (1, −1)t.
Similarly, the vector u2 = (1, 0, 1)t is mapped to (3, 1)t = (5/3)(2, 1)t − (1/3)(1, 2)t, and the
vector u3 = (0, 1, 0)t is mapped to (0, −1)t = (1/3)(2, 1)t − (2/3)(1, 2)t.
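The change-of-basis computation above can be reproduced directly; a sketch, assuming sympy is available:

    from sympy import Matrix

    A_T = Matrix([[1, 0, 2], [0, -1, 1]])            # T with respect to the standard bases
    M_B = Matrix([[1, 1, 0], [1, 0, 1], [0, 1, 0]])  # basis B written into columns
    M_C = Matrix([[2, 1], [1, 2]])                   # basis C written into columns

    A_T_CB = M_C.inv() * A_T * M_B
    print(A_T_CB)                  # Matrix([[1, 5/3, 1/3], [-1, -1/3, -2/3]])
    # Check the first column: its C-coordinates give T(u1) = (1, -1)^t
    print(M_C * A_T_CB[:, 0])      # Matrix([[1], [-1]])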
3.3 Diagonalisation
For instance, we might want A_T^{B,B} to be a diagonal matrix.
Recall that an eigenvalue of an n × n matrix A is a scalar λ such that there exists
some non-zero vector v with Av = λv. The vector v is called an eigenvector
corresponding to the eigenvalue λ.
A scalar λ is an eigenvalue of the matrix A if and only if x = λ is a root of the
characteristic polynomial det(xI − A). This is easy to prove: det(λI − A) = 0 if and
only if there is some non-zero vector v such that (λI − A)v = 0, i.e., λv − Av = 0,
and such a vector is an eigenvector corresponding to eigenvalue λ.
Given an n×n real matrix A, you should know a method for finding a real eigenvalue
– if one exists – and a corresponding eigenvector, and also to diagonalise the matrix
with these eigenvectors, if it is possible. The basic idea is to write A = P DP^{−1},
where D is a diagonal matrix whose entries are the eigenvalues, and the columns of
P are the corresponding eigenvectors. In order for P to be invertible, we need its
columns to form an ordered basis of Rn . (We shall return to this topic later in the
course, to see what can be done if this is not possible, i.e., in the case where the
n × n matrix A does not have a set of n linearly independent eigenvectors.)
What we are really doing here is a change of basis. The linear transformation
represented by A with respect to the standard ordered basis of Rn is represented by
D with respect to the (ordered) basis of eigenvectors we found.
3.4 Similarity
Two n × n matrices A and B are similar if there exists an invertible matrix P such
that P AP −1 = B.
Similarity is an equivalence relation:
(a) every matrix A is similar to itself, as IAI = A;
(b) if A is similar to B, then there is an invertible matrix P so that P AP −1 = B –
then we have P −1 BP = A, so B is similar to A;
(c) if A is similar to B, and B is similar to C, then there are invertible matrices P
and Q with P AP −1 = B and QBQ−1 = C, so QP AP −1 Q−1 = QBQ−1 = C, and
therefore A is similar to C.
We can restate what we have done in the previous section in terms of similarity.
If A and B are similar then they can represent the same linear transformation, with
respect to a change of basis represented by the matrix P . Conversely, if A and
B can represent the same linear transformation, then there will be a P such that
P AP −1 = B, and so A and B are similar.
Define the trace of a matrix to be the sum of the diagonal entries. It is an exercise
to show that if f(x) = x^n + a_{n−1}x^{n−1} + · · · + a0 is the characteristic polynomial of the
n × n matrix A, then −a_{n−1} is equal to the trace of A. Therefore similar matrices
have the same trace.
In the same way, we see that similar matrices have the same determinant. Also, if
λ is an eigenvalue of A, and B is similar to A, then λ is an eigenvalue of B.
N (A) = {u ∈ Rm | Au = 0}.
Theorem 4.1.1: Both the image and the kernel of a linear transformation are
subspaces.
Image: Suppose v1 and v2 are in im(T ), with v1 = T (u1 ) and v2 = T (u2 ). For
scalars a and b, T (au1 + bu2 ) = aT (u1 ) + bT (u2 ) = av1 + bv2 , so av1 + bv2 ∈ im(T ).
□
For an n×m matrix A, this result also tells us that the range R(A) of A is a subspace
of Rn , and the nullspace N (A) is a subspace of Rm . The range R(A) is spanned
by the set {Ae1 , . . . , Aem }, which are just the column vectors of A. In other words,
R(A) is the linear span of the column vectors of A, also known as the column space
of A.
The next important result relates the dimensions of the kernel and the image of a
linear transformation.
In terms of matrices, the result above says that, for an n×m matrix A, the dimension
of R(A) (the rank r(A) of A) plus the dimension of N (A) (the nullity n(A) of A)
equals m.
The image of T is the linear span of the set {T(u1), . . . , T(um)}. Indeed, each
vector T(ui) is certainly in im(T), while any vector u ∈ U can be written as u =
x1 u1 + · · · + xm um, and so T(u) = x1 T(u1) + · · · + xm T(um).
A basis of im(T) can be found from a subset of columns of A_T^{C,B} that is linearly
independent and spans the same space. To find the kernel, one must solve the system
of linear equations given by T (x1 u1 + · · · + xm um ) = x1 T (u1 ) + · · · + xm T (um ) = 0.
Fortunately, one way to find a basis of the image follows from the same technique
for finding a basis of the kernel.
A basis of im(T) is {T(uj) | j is not a free variable}. Why? Let k be a free variable.
Since −uk + ∑_{j∈{1,...,m}\F} αj uj is in ker(T) for some scalars αj, we can write
T(uk) = ∑_{j∈{1,...,m}\F} αj T(uj), and therefore {T(uj) | j is not a free variable} spans
the image. By Theorem 3.2.2, the dimension of the image is m minus the number
of free variables, which is the number of variables that are not free, and therefore
this set is also a basis of the image.
Gaussian elimination also shows how to extend a basis of ker(T ) to a basis of U . The
set C = {uj | j is not a free variable}, when added to a basis B of ker(T ), will result
in a basis of U. Since the size of these two sets adds up to m = dim(U), it suffices to
show that this set is linearly independent. Suppose that ∑_{j∈C} aj uj + ∑_{w∈B} bw w = 0.
By applying the transformation T, we get ∑_{j∈C} aj T(uj) = 0. But, since we showed
already that the T(uj), for the non-free variables j, form a basis of the image of T,
it follows that all the aj are zero. We are left with ∑_{w∈B} bw w = 0. As B is a basis
of ker(T), we conclude that all the bw are also zero.
are in the linear span of these first two, so the two vectors span the range of AT .
As for the kernel, both the third and the fourth variables are free variables. To
find one basis vector of ker(T ), set the coefficient x3 for e3 equal to −1 and the
coefficient x4 for e4 equal to 0: to find the other two coefficients, we solve the two
equations x1 + (−1)(−2) = 0 and x2 + (−1)3 = 0, to get the vector (−2, 3, −1, 0)t in
ker(T ). Similarly, setting x4 = −1 and x3 = 0 yields the vector (2, −2, 0, −1)t in the
kernel. These two vectors (−2, 3, −1, 0)t and (2, −2, 0, −1)t form a basis of ker(T ).
The vectors (1, 0, 0, 0)t and (0, 1, 0, 0)t corresponding to the non-free variables will
extend any basis of the kernel to a basis of R4 .
Placing the vectors (2, 3, 1)t and (1, 4, −1)t into rows, we get the matrix
[ 2 3  1 ]
[ 1 4 −1 ].
Swap the rows, and subtract twice the first row from the second row, to get
[ 1  4 −1 ]
[ 0 −5  3 ].
Continue to get the matrix
[ 1 0  7/5 ]
[ 0 1 −3/5 ].
As the third column does not have a leading 1, the vector (0, 0, 1)t will extend any
basis of the image to a basis of R3.
Both the row rank and the column rank of a matrix in reduced row echelon form are equal to the number of leading 1s.
This tells us that the row rank and column rank of a matrix in reduced row echelon
form are equal, and therefore that the row rank and column rank of any matrix are
equal. The common value of the row rank and the column rank of a matrix is called
simply the rank of the matrix.
Moreover, if we start with a matrix A, and row-reduce it to a matrix B in reduced
row echelon form, then the non-zero rows of B form a basis for the row space of A.
Also, the columns of B containing leading 1s correspond to columns of A that form
a basis of the column space R(A) of A.
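A sketch with sympy, using the 2 × 3 matrix from the example above:

    from sympy import Matrix

    A = Matrix([[2, 3, 1], [1, 4, -1]])

    R, pivots = A.rref()
    print(R)                            # Matrix([[1, 0, 7/5], [0, 1, -3/5]])
    print(A.rank())                     # 2: the common value of row rank and column rank
    print([A[:, j] for j in pivots])    # columns of A forming a basis of the column space
    print(A.nullspace())                # a basis of N(A); its size is 3 - rank = 1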
• The function is linear in the first variable: for each fixed v, the function taking
u to u · v is linear, i.e., (au1 + bu2) · v = a(u1 · v) + b(u2 · v) for all vectors u1, u2
and all real numbers a, b.
It turns out that this particular set of conditions (a) is possessed by a number of
important examples of functions from V × V to R, where V is a real vector space,
and (b) has a large number of useful consequences and applications. So we consider
a general function satisfying these three conditions, and start by giving it a name.
Definition: An inner product on a real vector space V is a function ⟨·, ·⟩ that
associates a real number ⟨u, v⟩ to each pair of vectors u, v ∈ V , satisfying the
following three conditions.
• The function is linear in the first variable: for each fixed v in V , the function
taking u to ⟨u, v⟩ is linear, i.e., ⟨au1 + bu2, v⟩ = a⟨u1, v⟩ + b⟨u2, v⟩ for all
u1, u2 ∈ V and all real numbers a, b.
• The function is symmetric: ⟨u, v⟩ = ⟨v, u⟩ for all u, v ∈ V .
• The function is positive: ⟨u, u⟩ > 0 for every non-zero u ∈ V .
Definition: A real vector space V equipped with an inner product is called a real
inner product space.
So Rn , equipped with the dot product, is a real inner product space. Let us see some
other examples of inner products, starting with a different inner product on R2 .
Example 2: Consider the function ⟨·, ·⟩ from R2 × R2 to R defined by ⟨u, v⟩ =
u1 v1 + 2u2 v2 + u2 v1 + u1 v2.
It is not hard to check that this function also has all the three properties listed
above, and so is an inner product on R2. To see positivity, note that ⟨u, u⟩ =
u1² + 2u2² + 2u1 u2 = (u1 + u2)² + u2², which is always non-negative, and can only be
zero if u1 + u2 = 0 = u2, which only happens for u = 0.
By its definition, the function ⟨·, ·⟩ is symmetric. It is also positive, because
⟨f, f⟩ = ∫_{−1}^{1} (f(x))² dx, which is always non-negative, and is only zero – for a polynomial
function f – if f is the zero function.
Example 4: Recall that ℓ² is the set of infinite sequences (a1, a2, . . . ) of real numbers
such that ∑_{i=1}^∞ ai² < ∞. For a = (a1, . . . ) and b = (b1, . . . ) in ℓ², define ⟨a, b⟩ to
be ∑_{i=1}^∞ ai bi. It is easy to check that this defines an inner product on ℓ²; this is a
generalisation of the standard dot product in Rn.
A function from V × V to R which is linear in both its first and second variables is
called a bilinear form on V .
(2) Linearity in the first and second variables implies that ⟨u, 0⟩ = ⟨0, u⟩ = 0. So
in particular ⟨0, 0⟩ = 0 in any real inner product space. In the positivity condition,
we are careful to demand only that ⟨u, u⟩ > 0 for all non-zero u.
For the rest of this subsection, we focus on the case where V is finite-dimensional,
with a basis B = {w1 , . . . , wn }.
An inner product on V is determined by the values ⟨wi, wj⟩ of the inner product
on pairs of basis elements. Indeed, take any vectors x = x1 w1 + · · · + xn wn and
y = y1 w1 + · · · + yn wn in V. By linearity in the first variable,
⟨x, y⟩ = ⟨x1 w1 + · · · + xn wn, y⟩
can be decomposed as
x1⟨w1, y⟩ + · · · + xn⟨wn, y⟩,
and then, by linearity in the second variable, this can be broken down to the double
sum
∑_{i=1}^n ∑_{j=1}^n xi yj ⟨wi, wj⟩.
On the other hand, if we can write ⟨x, y⟩ = xt Ay for some matrix A, then the form
⟨·, ·⟩ satisfies both linearity conditions. For instance, ⟨ax + bx′, y⟩ = (ax + bx′)t Ay =
a xt Ay + b x′t Ay = a⟨x, y⟩ + b⟨x′, y⟩.
To say that the function ⟨·, ·⟩ defined by ⟨x, y⟩ = xt Ay is symmetric means that
xt Ay = yt Ax for all x and y in V. As the 1 × 1 matrix yt Ax is equal to its transpose
xt At y, the function is symmetric if and only if xt Ay = xt At y for all x and y – and
this is equivalent to saying that A = At, i.e., A is symmetric.
A real symmetric matrix A such that xt Ax > 0 for all non-zero x is called positive
definite. Thus ⟨x, y⟩ = xt Ay defines an inner product on Rn if and only if the matrix
A is positive definite. We shall return to this property later in the course.
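A small numerical check, assuming numpy is available; the matrix A below is the symmetric matrix behind Example 2, so x^t A x should be positive for every non-zero x and the eigenvalues of A should be positive:

    import numpy as np

    A = np.array([[1.0, 1.0],
                  [1.0, 2.0]])    # <u, v> = u1 v1 + 2 u2 v2 + u2 v1 + u1 v2 = u^t A v

    x = np.array([1.0, -2.0])
    print(x @ A @ x)                  # 5.0 > 0, in line with (x1 + x2)^2 + x2^2
    print(np.linalg.eigvalsh(A))      # both eigenvalues positive, so A is positive definite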
The inner product also measures the degree to which one vector has a “component”
in the direction of another vector. Let v be a non-zero vector in Rn . As av will
represent the same direction, for any scalar a ∈ R, we may choose instead the vector
v/||v||, which is a unit vector, as ||v/||v|| || = ||v||/||v|| = 1. From now on, we assume that the
vector v is a unit vector. Choose now any vector u ∈ Rn. The inner product ⟨u, v⟩
measures the component of u in the direction of v. For example, if u is perpendicular
to v, then u has no component in the direction of v, and indeed ⟨u, v⟩ = 0. The
unit vector v has component 1 in its own direction, and indeed ⟨v, v⟩ = ||v||² = 1.
Definition: The angle θ between two vectors x and y in a real inner product space
is the unique value 0 ≤ θ ≤ π such that cos θ = ⟨x, y⟩/(||x|| · ||y||).
If x and y are unit vectors, the angle between them is arccos⟨x, y⟩.
This definition of angle is chosen to coincide with the well known result in Rn, with
the dot product, that ||x − y||² = ||x||² + ||y||² − 2||x|| · ||y|| cos θ, where θ is the
angle between x and y.
Proof: Let α be a real variable, and consider the square of the norm of x + αy, i.e.,
||x + αy||² = ⟨x + αy, x + αy⟩, which is at least zero by positivity. Using linearity
and symmetry, we write this as ⟨x + αy, x + αy⟩ = ⟨x, x⟩ + 2α⟨x, y⟩ + α²⟨y, y⟩ ≥ 0.
The above equation holds for all real α. Whenever a ≠ 0, and at² + bt + c ≥ 0
for all choices of t ∈ R, it must be that 4ac ≥ b² (since otherwise there would be
two distinct real solutions of the quadratic equation, implying that there are some
choices of t for which at² + bt + c < 0). Setting c = ⟨x, x⟩, b = 2⟨x, y⟩ and a = ⟨y, y⟩,
this yields ⟨x, x⟩⟨y, y⟩ ≥ |⟨x, y⟩|², giving ||x||²||y||² ≥ |⟨x, y⟩|². □
Theorem 5.2.3: The Triangle Inequality For any pair x, y of vectors in a real
inner product space, ||x + y|| ≤ ||x|| + ||y||.
Thus an orthonormal set A in a real inner product space is always linearly indepen-
dent; if also it spans V , it is called an orthonormal basis of V .
Now normalise this vector, setting vk+1 = v̂k+1/||v̂k+1||. The set {v1, . . . , vk, vk+1}
is also an orthonormal set in U. Continue in this way until we have a basis of U.
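A sketch of the procedure, assuming sympy is available; GramSchmidt with orthonormal=True normalises each new vector as it is produced (the three vectors below are just an illustrative choice, not taken from the notes):

    from sympy import Matrix, GramSchmidt

    vectors = [Matrix([1, 1, 0]), Matrix([1, 0, 1]), Matrix([0, 1, 1])]
    basis = GramSchmidt(vectors, orthonormal=True)
    # Check orthonormality: dot products are 1 on the diagonal and 0 off it
    print([[(basis[i].T * basis[j])[0] for j in range(3)] for i in range(3)])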
If the inner product is defined by the standard dot product in Rm , then there
is another way to find a vector v perpendicular to a collection of other vectors
u1, . . . , uk: create the k × m matrix A by placing the vectors u1, . . . , uk into the
rows of A, and solve the equation Av = 0. A vector v ∈ Rm that satisfies this equation
will be perpendicular to each of the vectors u1 , . . . , uk , as the matrix multiplication
Av is performed by taking the dot product of v with each of the u1 , . . . , uk , and
each result must be zero. The solution set will be the set of all vectors v such that
v ⊥ ui for all i = 1, 2, . . . , k.
This method, however, has its limitations. It won’t work if the inner product is not
the dot product. Furthermore, if the inner product is the standard dot product,
but the vector v is required to belong to some particular vector subspace, then one
must solve for both membership in the kernel (of the transformation defined by the
above matrix A) and membership in the desired vector subspace.
Find an orthonormal basis of ker(T ) (with respect to the standard Euclidean dot
product), and extend it to an orthonormal basis of R4 .
First we find a basis for the kernel of T . We start by row reducing the given matrix
to
[ 1 −1 0 −4 ]                [ 1 0 1 −2 ]
[ 0  1 1  2 ]   and then     [ 0 1 1  2 ].
[ 0  1 1  2 ]                [ 0 0 0  0 ]
We can now read off a basis {(1, 1, −1, 0)t , (−2, 2, 0, −1)t } for the nullspace of the
matrix, i.e., for the kernel of T , but this basis is not orthonormal. Because the norm
of the second vector is simpler, namely 3, we take this vector (−2, 2, 0, −1)t first,
and divide by the norm to obtain v1 = (−2/3, 2/3, 0, −1/3)t, a unit vector. Now
we have the good fortune that (1, 1, −1, 0)t · (−2/3, 2/3, 0, −1/3)t = 0, so the second
vector v2 can be taken as (1/√3)(1, 1, −1, 0)t.
Now take another row vector v4∗ = (0, 1, 1, 2)t; we know this is orthogonal to v1 and
v2, but it is not orthogonal to v3. Using the Gram-Schmidt process, we obtain:
v̂4 = (0, 1, 1, 2)t − ((0, 1, 1, 2)t · (√6/6)(1, 0, 1, −2)t) (√6/6)(1, 0, 1, −2)t
   = (0, 1, 1, 2)t − (−3)(1/6)(1, 0, 1, −2)t = (0, 1, 1, 2)t + (1/2)(1, 0, 1, −2)t = (1/2, 1, 3/2, 1)t.
Finally, we normalise v̂4 to get v4 = (1/(3√2))(1, 2, 3, 2)t. The set {v1, v2, v3, v4} is an
orthonormal basis of R4.
Notice that the vectors written in the columns form an orthonormal basis. The same
is true of the vectors written into the rows.
Theorem 6.1.1: Consider the inner product space Rn , with the dot product, and
let T : Rn → Rn be a linear transformation represented by the matrix AT with
respect to the standard basis. The following are equivalent:
(1) AT is orthogonal;
(2) T maps the standard basis of Rn to an orthonormal basis of Rn , meaning that
the columns of AT form an orthonormal basis of Rn ;
(3) T preserves distance, i.e., ||T(v)|| = ||v|| for all v ∈ Rn;
(4) T preserves the dot product, meaning T (u) · T (v) = u · v for all u, v ∈ Rn .
Proof:
(1) and (2) are equivalent: For every 1 ≤ i ≤ n, set vi = T (ei ) = AT ei . Then
vi appears as the ith column of AT , and therefore also as the ith row of AtT .
Writing each vi as vi = (a1i, . . . , ani)t, we see that AtT AT = I if and only if a1i a1j +
· · · + ani anj = vi · vj is equal to 0 whenever i ≠ j, while a1i² + · · · + ani² = vi · vi is equal to 1.
These conditions are equivalent to saying that {v1, . . . , vn} is an orthonormal basis.
(3) and (4) are equivalent: We know that ||x|| = √(x · x). Also, one can check
that
x · y = (1/4)||x + y||² − (1/4)||x − y||²
for all x, y ∈ Rn. This means that the dot product and the norm can each be written
in terms of the other, and therefore preserving one is equivalent to preserving the
other.
(2) and (4) are equivalent: The inner product is determined entirely by what it
does to pairs of basis elements. Therefore, preserving the inner product is equivalent
to requiring that vi · vj = ei · ej for all pairs i and j, and this is equivalent to (2). □
6.2 Closure
Theorem 6.2.1: Let A and B be n × n orthogonal matrices. Then:
(1) A−1 is orthogonal,
(2) AB is orthogonal,
(3) det(A) is equal to either 1 or −1.
Properties (1) and (2) above are called “closure” properties: the class of n × n
orthogonal matrices is closed under taking inverses, and taking products. We shall
see similar results for other interesting classes of matrices later.
The orthogonal matrices with determinant equal to 1 are called orientation-preserving
and the orthogonal matrices with determinant −1 are called orientation-reversing.
If the angle between (1, 0)t and v2 is θ + π/2, then v2 = (cos(θ + π/2), sin(θ + π/2))t =
(− sin θ, cos θ)t, and the transformation that takes e1 to v1 and e2 to v2 is the
rotation by the angle θ, represented by the matrix
A = [ cos θ  − sin θ ]
    [ sin θ    cos θ ],
and the determinant of A is cos²θ + sin²θ = 1.
If the angle between (1, 0)t and v2 is θ − π/2, then v2 = (cos(θ − π/2), sin(θ − π/2))t =
(sin θ, − cos θ)t, and the transformation that takes e1 to v1 and e2 to v2 is represented
by the matrix
A = [ cos θ    sin θ ]
    [ sin θ  − cos θ ],
with determinant − cos²θ − sin²θ = −1.
The transformation represented by A is a reflection about the line through the origin
and the vector u = (cos(θ/2), sin(θ/2))t.
Conversely, because any rotation or reflection of R2 that fixes the origin defines a
linear transformation that is distance-preserving, its corresponding matrix is orthog-
onal.
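A short numerical illustration of the two cases, assuming numpy is available: a rotation (determinant 1) and a reflection (determinant −1), both orthogonal:

    import numpy as np

    theta = 0.7   # any angle
    R = np.array([[np.cos(theta), -np.sin(theta)],    # rotation by theta
                  [np.sin(theta),  np.cos(theta)]])
    F = np.array([[np.cos(theta),  np.sin(theta)],    # reflection in the line at angle theta/2
                  [np.sin(theta), -np.cos(theta)]])

    for A in (R, F):
        print(np.allclose(A.T @ A, np.eye(2)), np.linalg.det(A))
    # prints True for both, with determinant approximately 1 for R and -1 for F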
6.4 Dimension 3
What happens in R3 ? Surprisingly, the structure of orthogonal linear transforma-
tions in R3 is almost identical to that of R2 .
The general method is to write down the matrix representing the transformation
with respect to a convenient orthonormal ordered basis, and then change basis.
One may check that indeed (2, 0, 1)t is mapped to (−2, 0, 1)t by A, while (0, 1, 0)t
and (1, 0, −2)t are mapped to themselves.
Using the trace allows an easier calculation of the angle of rotation of a 3 × 3
orthogonal matrix, without even finding the axis of rotation! One needs to know
only whether the matrix is orientation-reversing or orientation-preserving.
An orientation-preserving orthogonal matrix A, representing a rotation about some
axis through angle θ, is similar to the matrix B that represents a rotation about
the axis (0, 0, 1)t through angle θ, and we know that
B = [ cos θ  − sin θ  0 ]
    [ sin θ    cos θ  0 ].
    [ 0        0      1 ]
The trace of B is 1 + 2 cos θ, and therefore the trace of the similar matrix A is also
1 + 2 cos θ.
For instance, the orientation-preserving orthogonal matrix
C = (1/9) [ 8 −1  4 ]
          [ 1 −8 −4 ]
          [ 4  4 −7 ]
above has trace −7/9: if it represents a rotation through an angle θ, then 1 + 2 cos θ = −7/9,
so cos θ = −8/9, in agreement with our earlier calculation.
Similarly, if A is an orientation-reversing orthogonal matrix, with angle of rotation θ,
then it is similar to
[ cos θ  − sin θ   0 ]
[ sin θ    cos θ   0 ],
[ 0        0      −1 ]
and so its trace will equal −1 + 2 cos θ.
column, and so
For example, the matrix C in the previous section represented a rotation about
(4, 0, 1)t , and it mapped (0, 1, 0)t to 19 (−1, −8, 4)t . We have already seen that the
angle of rotation is arccos(−8/9) – we take this to mean the number between 0 and
+π whose cosine is −8/9. We calculate
det [ 4 0 −1 ]
    [ 0 1 −8 ]  =  det [ 4 −1 ]  >  0,
    [ 1 0  4 ]         [ 1  4 ]
so the rotation is anticlockwise.
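The whole analysis of the matrix C can be reproduced numerically; a sketch, assuming numpy is available:

    import numpy as np

    C = np.array([[8, -1, 4],
                  [1, -8, -4],
                  [4, 4, -7]]) / 9.0

    print(np.allclose(C.T @ C, np.eye(3)))   # True: C is orthogonal
    print(np.linalg.det(C))                  # approximately 1: orientation-preserving

    # Angle of rotation from the trace: trace = 1 + 2 cos(theta)
    print((np.trace(C) - 1) / 2)             # -0.888... = -8/9

    # The axis is an eigenvector for the eigenvalue 1
    vals, vecs = np.linalg.eig(C)
    axis = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    print(axis / axis[np.argmax(np.abs(axis))])   # proportional to (4, 0, 1)^t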
For any non-zero complex number c = a + bi, the modulus |c| is positive, and the
complex number c̄/|c|² is a multiplicative inverse for c.
When performing arithmetic with complex numbers, never leave i in the lower part
of a fraction as a final answer. If necessary, multiply top and bottom of the fraction
by the conjugate of what is on the bottom. For example, if the number (1 + i)/(−3 + i)
appears, then multiply this fraction by (−3 − i)/(−3 − i) to get
(−3 − i)(1 + i)/((−3 − i)(−3 + i)) = (−2 − 4i)/10 = (−1 − 2i)/5.
As one can take the conjugate of a complex number, so one can take the conjugate
of a vector in Cn. If v = (c1, . . . , cn)t, then v̄ = (c̄1, . . . , c̄n)t.
Also one can take the conjugate of a matrix: if A is a complex n × m matrix with
entries aij, then the conjugate matrix Ā has entries āij.
The complex conjugate of a product of complex numbers c and d satisfies the rule
that the conjugate of cd is c̄ d̄. It follows that, for complex matrices C and D, the
conjugate of CD is C̄ D̄.
We can calculate the determinant of Ā using the same calculation as for det(A),
except that at every step of the calculation we have the complex conjugate. So
det(Ā) is the complex conjugate of det(A).
• The dot product is linear in the first variable: for all w, z1 , z2 in Cn , and all
complex numbers α1 , α2 , we have
(α1 z1 + α2 z2 ) · w = α1 (z1 · w) + α2 (z2 · w).
Notice the differences between the real and complex cases. Another key difference
comes when we look at linearity properties in the second variable. We see that, for
all z, w1 , w2 in Cn , and all complex numbers α1 , α2 ,
z · (α1 w1 + α2 w2) = the conjugate of (α1 w1 + α2 w2) · z
= the conjugate of α1 (w1 · z) + α2 (w2 · z)
= ᾱ1 (the conjugate of w1 · z) + ᾱ2 (the conjugate of w2 · z)
= ᾱ1 (z · w1) + ᾱ2 (z · w2).
We say that the dot product is conjugate linear in the second variable.
• The function is linear in the first variable: for all w, z1, z2 in V , and all complex
numbers α1, α2, we have
⟨α1 z1 + α2 z2, w⟩ = α1⟨z1, w⟩ + α2⟨z2, w⟩.
A complex vector space equipped with a complex inner product is called a complex
inner product space.
From any complex inner product, we can define a norm for a vector: ||v|| = √⟨v, v⟩.
Example 1: We have already seen that the complex dot product ⟨u, v⟩ = u · v :=
u1 v̄1 + · · · + un v̄n is an inner product on Cn. This is sometimes called the Euclidean
inner product on Cn. In matrix form, we can write
u · v = ( u1 · · · un ) I ( v̄1, . . . , v̄n )t = ut v̄,
where I is the n × n identity matrix.
The square of the norm of a vector v ∈ Cn, with respect to the dot product, is given
by v · v. If v = (v1, . . . , vn)t, then we have
||v||² = v1 v̄1 + · · · + vn v̄n = |v1|² + · · · + |vn|².
Therefore ||v|| = √(|v1|² + · · · + |vn|²).
Example 2: v1 = (1/2)(1, i, 1 + i)t, v2 = (1/2)(i, −1, 1 − i)t. The norm of each vector is 1,
and their dot product is (1/4)(−i − i + (1 + i)(1 + i)) = (1/4)(−2i + 1 − 1 + 2i) = 0.
There is no essential difference between the proof of the following result and that of
Theorem 5.3.1.
As with real inner product spaces, the Gram-Schmidt procedure can be performed
with complex inner product spaces. The slight difference is that the new vector v∗_{k+1}
should appear in the first position, with the other previous vectors v1, v2, . . . , vk in
the second position. The inner product ⟨u, v⟩ represents the component of u in the
direction of v.
Gaussian elimination works for systems of linear equations over C just as over R. For
example, consider the augmented matrix
[ 1 −i −1 | 1   ]
[ 0  1  i | 0   ]
[ 0  i  1 | 1−i ].
Add i times the second row to the first row and −i times the second row to the
third row:
[ 1 0 −2 | 1   ]
[ 0 1  i | 0   ]
[ 0 0  2 | 1−i ].
Add the third row to the first row, and then divide the third row by 2:
[ 1 0 0 | 2−i       ]
[ 0 1 i | 0         ]
[ 0 0 1 | (1 − i)/2 ].
Finally, subtract i times the third row from the second row:
[ 1 0 0 | 2−i        ]
[ 0 1 0 | (−1 − i)/2 ]
[ 0 0 1 | (1 − i)/2  ].
We read off the solution x1 = 2 − i, x2 = (−1 − i)/2, x3 = (1 − i)/2.
Example: Find a basis for the nullspace of the matrix
A = [ 1 1−i  i   ]
    [ 0  1   2−i ],
    [ 0 1+i  3+i ]
and extend it to an orthonormal basis of C3.
The pairing of the n × n matrix A and its adjoint A∗ has a special relationship to
the complex inner product. This relationship is expressed in the following result.
Theorem 8.1.1: For every pair x, y of vectors of Cn , and any n×n complex matrix
A, we have (Ax) · y = x · (A∗ y).
Proof: Writing the dot product as u · v = ut v̄, we have (Ax) · y = (Ax)t ȳ = xt At ȳ. On
the other hand, x · (A∗ y) = xt times the conjugate of A∗ y; since A∗ = Āt, the conjugate
of A∗ y is At ȳ, and so x · (A∗ y) = xt At ȳ as well. The two expressions are therefore equal. □
Theorem 8.2.1: Consider the space Cn with the standard dot product. Let A be
an invertible n × n complex matrix. Then the following are equivalent:
(1) A is unitary;
(2) the columns of A form an orthonormal basis of Cn ;
(3) the transformation defined by A preserves the inner product, i.e., (Au) · (Av) =
u · v, for all u, v ∈ Cn .
(4) the transformation defined by A preserves distance: ||Av|| = ||v|| for all v ∈ Cn .
Proof:
(1) and (2) are equivalent: For every 1 ≤ i ≤ n, let vi = Aei be the ith column
of A. Expressing vi as vi = (a1i, . . . , ani)t, so that v̄i = (ā1i, . . . , āni)t, we have A∗A = I
if and only if ā1i a1j + · · · + āni anj = vj · vi is equal to 0 whenever i ≠ j, and otherwise
a1i ā1i + · · · + ani āni = vi · vi is equal to 1. These are precisely the conditions for the
vi to form an orthonormal basis.
(2) and (3) are equivalent: The inner product is determined entirely by what it
does to pairs of basis elements. Therefore, preserving the inner product is equivalent
to (Aei ) · (Aej ) = ei · ej for all pairs i and j, and this is equivalent to (2).
(3) and (4) are equivalent: The norm of a vector is defined in terms of the inner
product: ||v|| = √(v · v). So, if A preserves inner products, it also preserves norms.
On the other hand, Exercise 30 shows that the inner product of two vectors can
be written as a function of the norms of certain associated vectors. Therefore, if A
preserves norms, then it also preserves inner products. 2
If A is a complex matrix, then det(A) is a complex number. Part (3) above says
that, if A is unitary, then this complex number has modulus 1.
Proof: (i) Suppose A is invertible and Hermitian. Then (A−1 )∗ = (A∗ )−1 = A−1 ,
so A−1 is Hermitian.
(ii) Let u be any eigenvector of A corresponding to an eigenvalue λ, so u ≠ 0
and Au = λu. The inner product (Au) · u is equal to λ||u||², as (Au) · u =
(λu) · u = λ(u · u) = λ||u||². But, since A∗ = A, Theorem 8.1.1 tells us that
(Au) · u = u · (A∗u) = u · (Au) = u · (λu) = λ̄||u||². Since u is non-zero, we conclude
that λ = λ̄, so λ is real. □
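A quick numerical illustration of part (ii), assuming numpy is available; the matrix below is a small Hermitian matrix chosen for illustration (it is not from the notes):

    import numpy as np

    A = np.array([[2, 1 - 1j],
                  [1 + 1j, 3]])

    print(np.allclose(A, A.conj().T))   # True: A equals its conjugate transpose, so A is Hermitian
    print(np.linalg.eigvals(A))         # eigenvalues 1 and 4, real up to rounding error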
For a sesquilinear form ⟨·, ·⟩ to define a complex inner product, it must be Hermitian
and positive. For the form to be positive means that ⟨x, x⟩ > 0 for all vectors x ≠ 0.
(1) Let x be a variable, form the matrix xI − A, and take its determinant. This
will be a polynomial f (x) in x, of degree n. This polynomial f (x) is called the
characteristic polynomial of the matrix A.
(2) Find the roots of the characteristic polynomial f (x), which are the eigenvalues
of A.
(3) For each eigenvalue λ of A, find a basis of the nullspace of λI − A (the eigenspace
corresponding to eigenvalue λ). Because the determinant of each λI − A is zero,
each eigenspace will be a subspace of dimension at least one, and will contain an
eigenvector.
(4) If the sum of the dimensions of all the eigenspaces equals n, the dimension of
the vector space, then we have found a basis of Rn whose elements are eigenvectors.
Writing the basis into the columns of a matrix S, we have A = SDS −1 and D =
S −1 AS, for a diagonal matrix D that contains the eigenvalues of A on the diagonal.
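A sketch of steps (1)–(4) with sympy, on a small 2 × 2 matrix chosen purely for illustration:

    from sympy import Matrix

    A = Matrix([[4, 1], [2, 3]])

    print(A.charpoly().as_expr())   # lambda**2 - 7*lambda + 10 = (lambda - 2)(lambda - 5)
    print(A.eigenvects())           # each eigenvalue with a basis of its eigenspace

    S, D = A.diagonalize()          # columns of S are eigenvectors, D is diagonal
    print(D)                        # diagonal matrix with the eigenvalues 2 and 5
    print(S * D * S.inv() == A)     # True: A = S D S^{-1}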
Our aim is to examine where this process may fail, and what we can do (partly) to
remedy this.
The first place where the process could fail is in step (2). The problem is that the
characteristic polynomial det(xI − A) may have no real roots.
Over the complex numbers, any polynomial f(x) of degree n can be factorised as
f(x) = a(x − r1)(x − r2) · · · (x − rn),
such that the ri are complex numbers and a is also a complex number, which is the
coefficient of the leading term x^n of the polynomial. If f(x) = det(xI − A) is the
characteristic polynomial of a matrix A, then a = 1.
We do the same for the other eigenvalue cos θ − i sin θ, obtaining a one-dimensional
nullspace spanned by (1, i)t.
Notice that these two eigenvectors (1, −i)t and (1, i)t corresponding to the two eigen-
values cos θ + i sin θ and cos θ − i sin θ, respectively, are orthogonal. For some purposes
it is useful to normalise them, so as to get a unitary change of basis matrix
S = (√2/2) [ 1  1 ]        with inverse    S−1 = (√2/2) [ 1  i ]
           [ −i i ]                                     [ 1 −i ].
Example: Consider
A = [ 5 5 −5 ]
    [ 3 3 −5 ].
    [ 4 0 −2 ]
We calculate
det(xI − A) = det [ x−5  −5    5  ]
                  [ −3   x−3   5  ]  =  x³ − 6x² + 4x + 40.
                  [ −4    0   x+2 ]
Because the characteristic polynomial f (x) = x3 − 6x2 + 4x + 40 is real and of odd
degree, there must be a real root, and we look for one.
If f(x) has an integer root, it must be a divisor of 40. It's not hard to see that
f(x) is positive for all positive x. We try f(−1) = 29, and then f(−2) = 0, so we
have found one root. Factoring (x + 2) out gives f(x) = (x + 2)(x² − 8x + 20). The
other roots of f are the roots of the quadratic, which are 4 + √(64 − 80)/2 = 4 + 2i and
4 − √(64 − 80)/2 = 4 − 2i. These are our eigenvalues.
In general, if f(x) is a real polynomial, and c is a root of f(x) with Im(c) ≠ 0, then
c̄ is also a root. To see this, take the complex conjugate of the entire equation
f(c) = 0. The real coefficients in f remain untouched by conjugation, and the conjugate
of cⁿ is (c̄)ⁿ for each n. So conjugating the equation f(c) = 0 gives f(c̄) = 0. As our
matrix A has only real entries, it has a real characteristic polynomial, and it follows
that 4 + 2i is a root if and only if 4 − 2i is a root.
We could now search for an eigenvector associated with 4 − 2i using the same method,
but we have already done the work. Letting v = (3 + i, 1 + i, 2)t and λ = 4 + 2i,
we have (A − λI)v = 0. Taking the complex conjugate of this equation yields
(Ā − λ̄I)v̄ = 0. But, since A is a real matrix, Ā = A, and hence v̄ is an eigenvector
corresponding to the eigenvalue λ̄. In this example, we conclude that (3 − i, 1 − i, 2)t
is an eigenvector corresponding to 4 − 2i. We now have the change of basis matrix
S = [ 3+i  3−i  0 ]
    [ 1+i  1−i  1 ].
    [ 2    2    1 ]
The following is the calculation to find the inverse of S:
[ 3+i  3−i  0 | 1 0 0 ]
[ 1+i  1−i  1 | 0 1 0 ]
[ 2    2    1 | 0 0 1 ]
R1 − R2 → R1:
[ 2    2   −1 | 1 −1 0 ]
[ 1+i  1−i  1 | 0  1 0 ]
[ 2    2    1 | 0  0 1 ]
(1 − i)R2 → R2:
[ 2   2   −1  | 1  −1   0 ]
[ 2  −2i  1−i | 0  1−i  0 ]
[ 2   2    1  | 0   0   1 ]
R2 − R1 → R2, R3 − R1 → R3:
[ 2   2     −1  | 1   −1   0 ]
[ 0  −2−2i  2−i | −1  2−i  0 ]
[ 0   0      2  | −1   1   1 ]
(1 − i)R2 → R2, 2R1 + R2 → R1, R3/2 → R3:
[ 4   0  −1−3i | 1+i   −1−3i  0   ]
[ 0  −4   1−3i | −1+i   1−3i  0   ]
[ 0   0    1   | −1/2   1/2   1/2 ]
R1 + (1 + 3i)R3 → R1, R2 + (−1 + 3i)R3 → R2:
[ 4   0  0 | (1−i)/2    (−1−3i)/2   (1+3i)/2  ]
[ 0  −4  0 | (−1−i)/2   (1−3i)/2    (−1+3i)/2 ]
[ 0   0  1 | −1/2        1/2         1/2      ]
R1/4 → R1, −R2/4 → R2:
[ 1 0 0 | (1−i)/8   (−1−3i)/8  (1+3i)/8 ]
[ 0 1 0 | (1+i)/8   (−1+3i)/8  (1−3i)/8 ].
[ 0 0 1 | −1/2       1/2        1/2     ]
We have the diagonalisation:
S−1 A S = (1/8) [ 1−i  −1−3i  1+3i ] [ 5 5 −5 ] [ 3+i  3−i  0 ]
                [ 1+i  −1+3i  1−3i ] [ 3 3 −5 ] [ 1+i  1−i  1 ]
                [ −4    4      4   ] [ 4 0 −2 ] [ 2    2    1 ]
        = [ 4+2i   0    0 ]
          [ 0    4−2i   0 ].
          [ 0      0   −2 ]
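The diagonalisation can be verified directly; a sketch, assuming sympy is available:

    from sympy import Matrix, I, simplify

    A = Matrix([[5, 5, -5], [3, 3, -5], [4, 0, -2]])
    S = Matrix([[3 + I, 3 - I, 0],
                [1 + I, 1 - I, 1],
                [2,     2,     1]])

    D = (S.inv() * A * S).applyfunc(simplify)
    print(D)    # diagonal matrix with entries 4 + 2*I, 4 - 2*I and -2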
Proof: Let the Ui be as in Theorem 10.2.2, and let Bi be a basis for Ui , for each i.
By part (ii) of the Theorem, the union of the Bi forms a basis for Cn . Each element
of Bi is in the nullspace of (A − λi I)mi .
Now f(A) = (A − λ1 I)^{m1} · · · (A − λi I)^{mi} · · · (A − λk I)^{mk}, and the factors (A − λj I)
all commute, so f(A) is of the form C(A − λi I)^{mi} for some matrix C. Therefore
f(A)v = 0 for each v in the basis Bi.
Thus the matrix f(A) takes each basis element to 0, which implies that f(A) is the
zero matrix. □
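The Cayley-Hamilton Theorem is easy to check on the matrix from the earlier example; a sketch, assuming sympy is available:

    from sympy import Matrix, eye, zeros

    A = Matrix([[5, 5, -5], [3, 3, -5], [4, 0, -2]])
    print(A.charpoly().as_expr())                            # lambda**3 - 6*lambda**2 + 4*lambda + 40
    print(A**3 - 6*A**2 + 4*A + 40*eye(3) == zeros(3, 3))    # True: f(A) is the zero matrix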
We shall explore the theoretical consequences of this later. There is a fairly simple
method, broadly following the theoretical discussion earlier, that puts a matrix into
upper triangular form. See Ostaszewski for details. We shall not study this method:
instead we’ll look at a method that accomplishes even more.
Usually the process of finding the basis for the Jordan normal form is fairly easy.
Only when some subspace corresponding to a single eigenvalue has dimension at
least 4 might the process start to become complicated. In the practical examples that
follow, we’ll restrict ourselves to matrices where none of the serious complications
can occur; in the Appendix, we’ll discuss a more complex case in the abstract.
Example: Let A be the matrix
A = [ 3 −1 −4  7 ]
    [ 1  1 −3  5 ]
    [ 0  1 −1 −1 ].
    [ 0  0  0  2 ]
We put A into Jordan normal form.
First, we find the determinant of
A − xI = [ 3−x  −1    −4    7  ]
         [ 1    1−x   −3    5  ]
         [ 0    1    −1−x  −1  ].
         [ 0    0     0    2−x ]
Because the last row has zeros except for the 2 − x, it is best to start with this row: we
see that det(A − xI) is equal to (2 − x) times
det [ 3−x  −1    −4 ]
    [ 1    1−x   −3 ]  =  (3 − x)(x² + 2) − (1 + x) − 4 = −x³ + 3x² − 3x + 1 = −(x − 1)³,
    [ 0    1   −1−x ]
so the eigenvalues of A are 2, with algebraic multiplicity 1, and 1, with algebraic multiplicity 3.
The eigenvalue 2, with algebraic multiplicity 1, is not a problem; there is an eigen-
vector corresponding to this eigenvalue, which is found to be v4 = (8, 7, 2, 1)t.
The nullspace of (A − I)³ must be three-dimensional. We calculate:
(A − I)² = [ 2 −1 −4  7 ] [ 2 −1 −4  7 ]   [ 3 −6 3 20 ]
           [ 1  0 −3  5 ] [ 1  0 −3  5 ]   [ 2 −4 2 15 ]
           [ 0  1 −2 −1 ] [ 0  1 −2 −1 ] = [ 1 −2 1  6 ];
           [ 0  0  0  1 ] [ 0  0  0  1 ]   [ 0  0 0  1 ]
(A − I)³ = [ 3 −6 3 20 ] [ 2 −1 −4  7 ]   [ 0 0 0 8 ]
           [ 2 −4 2 15 ] [ 1  0 −3  5 ]   [ 0 0 0 7 ]
           [ 1 −2 1  6 ] [ 0  1 −2 −1 ] = [ 0 0 0 2 ].
           [ 0  0 0  1 ] [ 0  0  0  1 ]   [ 0 0 0 1 ]
The nullspace of (A − I)3 is spanned by the standard vectors e1 , e2 , and e3 . The
nullspace of (A−I)2 is spanned by {(1, 0, −1, 0)t , (0, 1, 2, 0)t }. Take any vector in the
nullspace of (A − I)3 but not in the nullspace of (A − I)2 – a convenient choice is the
vector e3 = (0, 0, 1, 0)t – and call it v3 . Find the vector (A−I)v3 = (−4, −3, −2, 0)t ,
and call it v2 . This vector must lie in the nullspace of (A − I)2 but not in the
nullspace of A − I. Now find (A − I)v2 = (3, 2, 1, 0)t , and call this vector v1 . Notice
that A takes v3 to v3 + v2 , takes v2 to v2 + v1 , and takes v1 to v1 . Along with
v4 = (8, 7, 2, 1)t , an eigenvector for the eigenvalue 2, we have our basis for putting
A into Jordan normal form:
S = [ 3 −4 0 8 ]        S−1 = [ 3 −4 0 4 ]
    [ 2 −3 0 7 ]              [ 2 −3 0 5 ]
    [ 1 −2 1 2 ]              [ 1 −2 1 4 ];
    [ 0  0 0 1 ]              [ 0  0 0 1 ]
S−1 A S = [ 3 −4 0 4 ] [ 3 −1 −4  7 ] [ 3 −4 0 8 ]
          [ 2 −3 0 5 ] [ 1  1 −3  5 ] [ 2 −3 0 7 ]
          [ 1 −2 1 4 ] [ 0  1 −1 −1 ] [ 1 −2 1 2 ]
          [ 0  0 0 1 ] [ 0  0  0  2 ] [ 0  0 0 1 ]
        = [ 5 −7 0 9 ] [ 3 −4 0 8 ]   [ 1 1 0 0 ]
          [ 3 −5 1 9 ] [ 2 −3 0 7 ]   [ 0 1 1 0 ]
          [ 1 −2 1 4 ] [ 1 −2 1 2 ] = [ 0 0 1 0 ].
          [ 0  0 0 2 ] [ 0  0 0 1 ]   [ 0 0 0 2 ]
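The same Jordan normal form can be obtained with sympy (the change-of-basis matrix it chooses may differ from the S above, but the Jordan form agrees up to the ordering of the blocks); a sketch:

    from sympy import Matrix

    A = Matrix([[3, -1, -4, 7],
                [1,  1, -3, 5],
                [0,  1, -1, -1],
                [0,  0,  0, 2]])

    P, J = A.jordan_form()          # A = P J P^{-1}
    print(J)                        # one 3 x 3 Jordan block for eigenvalue 1, one 1 x 1 block for 2
    print(P.inv() * A * P == J)     # True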
How does one know the Jordan normal form corresponding to an eigenvalue λ, i.e.,
how do we tell how many chains of basis elements there are of each length, given
the dimensions of the nullspaces of (A − λI)n ?
We work out an example that shows the general method. Suppose we know that
the algebraic multiplicity is 9 and the dimensions of the nullspaces are as follows:
the nullspace of A − λI has dimension 4, that of (A − λI)² has dimension 6, that of
(A − λI)³ has dimension 8, and that of (A − λI)⁴ has dimension 9. We build the bottom
level of a diagram with one point for each dimension of the nullspace of A − λI,
◦ ◦ ◦ ◦
and then build the second level so that the total number of points is 6,
◦ ◦
◦ ◦ ◦ ◦
and continue in this way until we have built up a pyramid of nine points:
◦
◦ ◦
◦ ◦
◦ ◦ ◦ ◦
We have four towers of points, the first tower being of height 4, the second of
height 3, and the last two both of height 1. First we search for a vector v in the
nullspace of (A − λI)⁴ that is not in the nullspace of (A − λI)³, and apply A − λI to it
repeatedly until reaching the last vector (A − λI)³v in the chain before 0 = (A − λI)⁴v.
Next we search for a vector w in the nullspace of (A − λI)³ that is both not in the
nullspace of (A − λI)² and linearly independent from the set of all the previous vectors
(A − λI)v, (A − λI)²v, (A − λI)³v found in the nullspace of (A − λI)³. Next we find a
non-zero vector q in the nullspace of A − λI that is not in the linear span of the vectors
(A − λI)³v and (A − λI)²w. Lastly, we find a non-zero vector r in the nullspace of
A − λI that is not in span({(A − λI)³v, (A − λI)²w, q}). The vectors v, w, q, r,
and the applications of A − λI to these vectors, will form the basis for the Jordan
normal form. The result will be the following matrix in Jordan normal form:
[ λ 1 0 0 0 0 0 0 0 ]
[ 0 λ 1 0 0 0 0 0 0 ]
[ 0 0 λ 1 0 0 0 0 0 ]
[ 0 0 0 λ 0 0 0 0 0 ]
[ 0 0 0 0 λ 1 0 0 0 ]
[ 0 0 0 0 0 λ 1 0 0 ]
[ 0 0 0 0 0 0 λ 0 0 ]
[ 0 0 0 0 0 0 0 λ 0 ]
[ 0 0 0 0 0 0 0 0 λ ]
Let y(t) be a complex valued function of a real variable t. A simple type of differential
equation is y′(t) = ay(t), where a is a constant. This equation is solved by y(t) =
ce^{at}, as y′(t) = ace^{at}. Indeed, all solutions look like this, since if y is a solution
then the derivative of y(t)e^{−at} is (by the product rule) y′(t)e^{−at} − ay(t)e^{−at} = (y′(t) −
ay(t))e^{−at} = 0. The only functions g whose derivatives are zero everywhere are
constant functions, and therefore with c being that constant from y(t)e^{−at} = c we
have y(t) = ce^{at}. The value of y(0) determines the value of c, as e^0 = 1 and so
y(0) = c.
The above system of equations is called homogeneous; the derivative of each function
is in the linear span of itself and the other functions.
One can place these equations in matrix form and solve the following equations:
[ y1′ ]   [ a11 a12 . . . a1n ] [ y1 ]
[ y2′ ]   [ a21 a22 . . . a2n ] [ y2 ]
[ ..  ] = [ ..               ] [ .. ].
[ yn′ ]   [ an1 an2 . . . ann ] [ yn ]
The idea of a matrix equation is not accidental. Taking the derivative is a linear
transformation defined on the subspace of all functions that have derivatives.
If taking the derivative took one of these n functions to a function outside of the
linear span of these n functions, then the resulting system of equations would be
non-homogeneous. Solving non-homogeneous systems of equations can be difficult
or easy. However, once one particular solution is found, solving the associated homogeneous
system gives all the remaining solutions.
We know that, for any n × n matrix A, there will be some invertible matrix S with
S −1 AS = E, for some matrix E in Jordan normal form. For now, we assume that E
is diagonal, and later we will show how to solve the equations when E is in Jordan
normal form.
We can now write down expressions for the functions fk, namely fk(t) = ck e^{bkk t},
where ck is some constant. Then we find the functions y1, . . . , yn according to the
formula
[ y1 ]     [ f1 ]
[ y2 ]     [ f2 ]
[ .. ] = S [ .. ].
[ yn ]     [ fn ]
We illustrate this method by using it to solve the (hopefully familiar) second order
differential equation d²y/dt² + y = 0. As described above, this can be reformulated as
y1 = dy/dt, y0 = y, and
[ y0′ ]   [ 0  1 ] [ y0 ]
[ y1′ ] = [ −1 0 ] [ y1 ].
Set
M = [ 0  1 ]
    [ −1 0 ].
The determinant of
[ −x  1 ]
[ −1 −x ]
is x² + 1. So the eigenvalues of M are i and −i. We find the nullspace of
M − iI = [ −i  1 ]
         [ −1 −i ].
Recognising that the rows are multiples of each other (they have to be, since M − iI is
singular so the rows are linearly dependent), we have that v1 = (1, i)t is in the nullspace
of M − iI, and is an eigenvector of M for the eigenvalue i. Likewise v2 = (1, −i)t is an
eigenvector of M for the eigenvalue −i. Setting
S = [ 1  1 ]
    [ i −i ],
we have (y0, y1)t = S(f0, f1)t, and
[ f0′ ]       [ y0′ ]       [ 0  1 ] [ y0 ]       [ 0  1 ]   [ f0 ]   [ i  0 ] [ f0 ]
[ f1′ ] = S−1 [ y1′ ] = S−1 [ −1 0 ] [ y1 ] = S−1 [ −1 0 ] S [ f1 ] = [ 0 −i ] [ f1 ]
and
[ y0 ]   [ 1  1 ] [ f0 ]
[ y1 ] = [ i −i ] [ f1 ].
The solution is f0 = c0 e^{it} and f1 = c1 e^{−it}, and so we get the solution y =
y0 = c0 e^{it} + c1 e^{−it}. The first derivative is ic0 e^{it} − ic1 e^{−it} and, by taking the second
derivative, we get d²y/dt² = −c0 e^{it} − c1 e^{−it}, which is indeed the negative of y. Putting
c0 = c1 = 1/2 gives the familiar solution y(t) = cos t, and putting c0 = −i/2 and
c1 = i/2 gives the solution y(t) = sin t.
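The same recipe can be carried out numerically; a sketch assuming numpy is available, using the initial condition y(0) = 1, y′(0) = 0, for which the solution should be cos t:

    import numpy as np

    M = np.array([[0.0, 1.0],
                  [-1.0, 0.0]])         # y'' + y = 0 as (y0, y1)' = M (y0, y1)
    y_init = np.array([1.0, 0.0])       # y(0) = 1, y'(0) = 0

    vals, S = np.linalg.eig(M)          # eigenvalues i and -i, eigenvectors in the columns of S
    c = np.linalg.solve(S, y_init)      # coefficients of the initial condition in the eigenbasis

    t = 1.3
    y_t = (S @ (c * np.exp(vals * t)))[0]
    print(np.real(y_t), np.cos(t))      # the two numbers agree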
Example: Consider the system
[ y1′ ]   [ 1 0  1 ] [ y1 ]
[ y2′ ] = [ 1 1 −3 ] [ y2 ].
[ y3′ ]   [ 0 1  4 ] [ y3 ]
We set A to be the matrix above, and calculate its characteristic polynomial, which
is (x − 2)3 . By the Cayley-Hamilton Theorem, (A − 2I)3 is the zero matrix and
therefore every vector is in the nullspace of (A − 2I)3 .
We calculate:
A − 2I = [ −1  0  1 ]
         [ 1  −1 −3 ];
         [ 0   1  2 ]
(A − 2I)² = [ −1  0  1 ] [ −1  0  1 ]   [ 1   1   1 ]
            [ 1  −1 −3 ] [ 1  −1 −3 ] = [ −2 −2  −2 ];
            [ 0   1  2 ] [ 0   1  2 ]   [ 1   1   1 ]
(A − 2I)³ = [ −1  0  1 ] [ 1   1   1 ]   [ 0 0 0 ]
            [ 1  −1 −3 ] [ −2 −2  −2 ] = [ 0 0 0 ].
            [ 0   1  2 ] [ 1   1   1 ]   [ 0 0 0 ]
We see that A − 2I has a one-dimensional nullspace, (A − 2I)² has a two-dimensional
nullspace, and (A − 2I)³ has a three-dimensional nullspace. Any vector v in the
nullspace of (A − 2I)³ that is not in the nullspace of (A − 2I)² will generate a
sequence of three vectors v, (A − 2I)v, and (A − 2I)²v that span the nullspace of
(A − 2I)³. This implies that A is similar to the matrix
[ 2 1 0 ]
[ 0 2 1 ]
[ 0 0 2 ]
in Jordan normal form.
To find the change-of-basis matrix, we take the vector v3 = (0, 0, 1)t , which is in the
nullspace of (A − 2I)3 , but not in the nullspace of (A − 2I)2 . The image of v3 under
A − 2I is v2 := (1, −3, 2)t . The image of v2 under A − 2I is v1 := (1, −2, 1)t . And
of course A − 2I applied to v1 gives 0. Therefore, if we define the matrix S to be
S = [ 1   1  0 ]
    [ −2 −3  0 ],
    [ 1   2  1 ]
then we have that S−1 AS is equal to
[ 2 1 0 ]
[ 0 2 1 ].
[ 0 0 2 ]
We therefore set
[ y1 ]   [ 1   1  0 ] [ f1 ]
[ y2 ] = [ −2 −3  0 ] [ f2 ].
[ y3 ]   [ 1   2  1 ] [ f3 ]
We must solve
[ f1′ ]   [ 2 1 0 ] [ f1 ]
[ f2′ ] = [ 0 2 1 ] [ f2 ].
[ f3′ ]   [ 0 0 2 ] [ f3 ]
The rows translate to f3′ = 2f3, f2′ = 2f2 + f3, f1′ = 2f1 + f2. This leads to the
solution
f3 = c3 e^{2t}
f2 = c2 e^{2t} + c3 t e^{2t}
f1 = c1 e^{2t} + c2 t e^{2t} + c3 (t²/2) e^{2t}.
gk = ck e^{λt} + ck−1 t e^{λt} + ck−2 (t²/2) e^{λt} + · · · + c1 (t^{k−1}/(k − 1)!) e^{λt}.
Proof: Certainly this is true for the function g1. Assume it is true for all the
functions g1, . . . , gk. The function g = gk+1 must satisfy g′ = λg + gk. Multiply by
e^{−λt} on both sides to get
e^{−λt} g′ − λe^{−λt} g = e^{−λt} gk.
Notice that the left side of the equation is the derivative of e^{−λt} g, so that e^{−λt} g is
equal to the indefinite integral of the function e^{−λt} gk. We have assumed that gk is
equal to ck e^{λt} + ck−1 t e^{λt} + ck−2 (t²/2) e^{λt} + · · · + c1 (t^{k−1}/(k − 1)!) e^{λt}, and so
e^{−λt} gk = ck + ck−1 t + ck−2 t²/2 + · · · + c1 t^{k−1}/(k − 1)!. The indefinite integral of
the function e^{−λt} gk is ck+1 + ck t + ck−1 t²/2 + ck−2 t³/6 + · · · + c1 t^k/k!. Multiplying
by e^{λt} gives the conclusion. □
We consider a simple model of how ‘populations’ change over time. For example,
we may want to predict how many animals of a certain species will be alive in a few
years time, or how rapidly a progressive disease will spread throughout a given pop-
ulation. The model is called age-specific because it is sophisticated enough to take
into account the fact that the likelihood of an animal reproducing (or succumbing
to a progressive disease) is dependent on its age.
The main assumption is that, since it is the females that give birth, they are more
essential to the propagation of a species than the males. With regard to animals, the
number of males in the population is not so critical to the population dynamics, as one
male can be the father of a very large number of offspring. The same logic does not hold
as strongly for human society or for a few exceptional animal species.
Suppose that the maximum age of a female of the species is L. We shall divide the
interval [0, L] into n age classes,
[0, L/n), [L/n, 2L/n), . . . , [(n − 1)L/n, L].
We are interested in the number of females in each age class and how this grows
with time. We assume that the population in each of these age classes is measured
at discrete time intervals of L/n; that is, at times
t0 = 0, t1 = L/n, t2 = 2L/n, . . . , tk = kL/n, . . . .
Now, for 1 ≤ i ≤ n, let xi^{(k)} be the population in the ith age class Ci as measured
at time tk = kL/n, where
Ci = [(i − 1)L/n, iL/n).
We consider two demographic parameters which determine how these age-specific
populations change:
• ai, the average number of daughters born to a single female during the time she
is a member of the ith age class Ci;
• bi, the fraction of females in the ith age class Ci that survive to become members
of the (i + 1)th age class Ci+1.
We assume that ai ≥ 0 for all i and 0 < bi ≤ 1 for i < n. If, for any i, we find that
ai > 0, we call the age class Ci fertile.
It is of course difficult to state precisely an upper bound for the age that can be
attained by any individual. For our purposes in analysing population dynamics, it
is more important that we have a well defined upper bound for fertility. In either
case, for humans a bound of 200 years would suffice.
Example: Suppose that the maximum age of a female of a certain species is 45 years
and that we consider the three age classes [0, 15), [15, 30), [30, 45]. If the demographic
parameters defined above are a1 = 0, a2 = 4, a3 = 3, b1 = 1/2 and b2 = 1/4, then the
Leslie matrix is
L = [ 0   4   3 ]
    [ 1/2 0   0 ].
    [ 0   1/4 0 ]
Now, if the initial population in each age class is 1000, then x(0) = (1000, 1000, 1000)t,
which implies that after one time period (i.e. 15 years), the population in each age
class will be given by
x(1) = L x(0) = [ 0   4   3 ] [ 1000 ]   [ 7000 ]
                [ 1/2 0   0 ] [ 1000 ] = [ 500  ],
                [ 0   1/4 0 ] [ 1000 ]   [ 250  ]
after two time periods (i.e. 30 years), the population in each age class will be given
by
x(2) = L x(1) = [ 0   4   3 ] [ 7000 ]   [ 2750 ]
                [ 1/2 0   0 ] [ 500  ] = [ 3500 ],
                [ 0   1/4 0 ] [ 250  ]   [ 125  ]
and after three time periods (i.e. 45 years), the population in each age class will be
given by
x(3) = L x(2) = [ 0   4   3 ] [ 2750 ]   [ 14375 ]
                [ 1/2 0   0 ] [ 3500 ] = [ 1375  ].
                [ 0   1/4 0 ] [ 125  ]   [ 875   ]
It is natural to ask what the long-term population distribution will be, and to find
this, we need eigenvalues and eigenvectors.
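A numerical sketch, assuming numpy is available, reproducing the iteration above and computing the eigenvalues of L:

    import numpy as np

    L = np.array([[0.0, 4.0, 3.0],
                  [0.5, 0.0, 0.0],
                  [0.0, 0.25, 0.0]])
    x = np.array([1000.0, 1000.0, 1000.0])

    for k in range(3):
        x = L @ x
    print(x)                           # [14375.  1375.   875.]

    vals, vecs = np.linalg.eig(L)
    lam1 = max(vals, key=abs)          # the dominant eigenvalue, approximately 1.5
    print(lam1.real)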
If L can be written as L = P E P^{−1}, with E diagonal (or in Jordan normal form), then
L^k = (P E P^{−1})^k = (P E P^{−1})(P E P^{−1}) · · · (P E P^{−1}) = P E^k P^{−1},
with k factors in the product. This drastically simplifies the task of finding the
population distribution in the kth time period.
The result above shows that, if the matrix A has some eigenvalue of algebraic mul-
tiplicity 1, with a norm greater than the norms of all the other eigenvalues, then, as
long as the initial population distribution has a non-zero coefficient for the corresponding
eigenvector, the long term behaviour of A^k will be dominated by this eigenvalue. This
follows from the fact that the binomial coefficient (k choose l) is at most k^l, and that
if λ > 1 then, for any fixed l, lim_{k→∞} k^l/λ^k = 0.
That is exactly what happens with Leslie matrices.
We know that the eigenvalues of the Leslie matrix are the solutions of the equation
f(x) = 0, and so given this fact, it can be seen (by dividing by x^n) that
f(x) = 0 ⟺ g(x) = 1,
where g(x) is given by
g(x) = a1/x + a2 b1/x² + · · · + an b1 b2 · · · bn−1/x^n.
The function g(x) has the following three properties:
• g(x) is a decreasing function for x > 0, with g′(x) < 0 for all x > 0,
• as x → 0+ , g(x) → ∞,
• as x → ∞, g(x) → 0.
Consequently, we can conclude that there is a unique positive real solution of the
equation g(x) = 1. That is, L has a unique positive real eigenvalue, which we call
λ1. Furthermore, this eigenvalue λ1 is of algebraic multiplicity 1, i.e. (x − λ1) is a factor of f(x), but (x − λ1)^2 is not. Indeed, if (x − λ1)^2 were a factor, then the derivative f′(λ1) would be zero. Expressing g(x) in terms of f(x) and using the quotient rule, one finds that f(λ1) = 0 and f′(λ1) = 0 together would imply that g′(λ1) = 0, contradicting the fact that g′(x) is strictly negative for positive x.
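Because g is continuous and strictly decreasing on (0, ∞), the unique solution of g(x) = 1 can also be found numerically by bisection. A small sketch for the three-class example above, assuming numpy-free plain Python; the function g and the bracketing interval are specific to that example.

    def g(x, a=(0.0, 4.0, 3.0), b=(0.5, 0.25)):
        # g(x) = a1/x + a2*b1/x^2 + a3*b1*b2/x^3 for the example Leslie matrix
        return a[0]/x + a[1]*b[0]/x**2 + a[2]*b[0]*b[1]/x**3

    lo, hi = 0.1, 10.0          # g(lo) > 1 > g(hi), since g is decreasing
    for _ in range(60):
        mid = (lo + hi) / 2
        if g(mid) > 1:
            lo = mid
        else:
            hi = mid
    print((lo + hi) / 2)        # approximately 1.5, the positive eigenvalue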
Given any eigenvalue λ of the Leslie matrix L, we can write down the equation a
corresponding eigenvector v satisfies:
    [ a1  a2  . . .  an−2  an−1  an ] [ v1   ]   [ λv1   ]
    [ b1   0  . . .   0     0     0 ] [ v2   ]   [ λv2   ]
    [ ..  ..          ..    ..   .. ] [ ..   ] = [ ..    ] .
    [ 0    0  . . .  bn−2   0     0 ] [ vn−1 ]   [ λvn−1 ]
    [ 0    0  . . .   0    bn−1   0 ] [ vn   ]   [ λvn   ]
Solving from the bottom up, we can express all the vi in terms of vn : vn−1 =
(λ/bn−1 )vn , vn−2 = (λ/bn−2 )vn−1 = (λ2 /bn−2 bn−1 )vn , and so on until
v1 = (λn−1 /b1 b2 · · · bn−1 )vn . The top equation is sure to be satisfied, as λ is an
eigenvalue and so we know there is some solution to the equation above. Dividing through by the value of v1, we can write down the eigenvector v corresponding to λ as
    v = (1, b1/λ, b1 b2/λ^2, . . . , b1 b2 · · · bn−1/λ^{n−1})t.
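As a quick numerical sanity check of this formula (a sketch only, assuming numpy, and using the positive eigenvalue of the example matrix found by the bisection sketch above), one can build v from λ and the bi and confirm that Lv = λv:

    import numpy as np

    a = [0.0, 4.0, 3.0]          # fertility rates of the example
    b = [0.5, 0.25]              # survival rates
    lam = 1.5                    # positive eigenvalue of the example matrix

    L = np.array([a, [b[0], 0, 0], [0, b[1], 0]])
    v = np.array([1.0, b[0]/lam, b[0]*b[1]/lam**2])   # formula from the notes
    print(np.allclose(L @ v, lam * v))                # True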
Theorem 12.2.2: If two successive entries ai and ai+1 of the Leslie matrix L are
both non-zero, and the b1 , . . . , bi are all non-zero then for any eigenvalue λ of L other
than λ1 , |λ| < λ1 .
In other words, if there are two successive fertile age classes with a positive proba-
bility of reaching both of these age classes, then the eigenvalue λ1 is dominant.
So, let us suppose that there are two successive fertile classes reached with positive
probability. Let v1 , as above, be the eigenvector that corresponds to λ1 . Let E be
the matrix in Jordan normal form such that L = P EP −1 and
P = (v1 v2 · · · vn). Then, since x(k) = L^k x(0) = P E^k P^{-1} x(0), we have, for every k,
    x(k)/λ1^k = (1/λ1^k) P E^k P^{-1} x(0).
But, since λ1 is dominant, we know that |λ/λ1 | < 1 for all other eigenvalues λ, and
so
    lim_{k→∞} (1/λ1^k) E^k = [ 1  0  . . .  0 ]
                             [ 0  0  . . .  0 ]
                             [ ..          .. ] .
                             [ 0  0  . . .  0 ]
Consequently, we find that
    lim_{k→∞} x(k)/λ1^k = lim_{k→∞} (1/λ1^k) P E^k P^{-1} x(0) = c v1,
where the constant c is the first entry of the vector P^{-1} x(0), in other words the coefficient of the eigenvector v1. Thus, for large values of k, we have the approximation x(k) ≈ cλ1^k v1, provided that the initial population has a positive coefficient c for the eigenvector v1. If so, the proportion of the population lying in each age class is, in the long run, constant, and the population in each age class grows approximately by a factor of λ1 every time period (i.e. every L/n years).
Of course, if we start with only females beyond child-bearing age, then the population dynamics ignore the eigenvalue λ1 and the population moves quickly toward zero. Furthermore, if we allowed starting values that are not non-negative real numbers, then the long term behaviour might fail to be governed by the eigenvalue λ1 and multiples of its eigenvector v1; but then the model would no longer describe a real population.
12.3 Example
Returning to the earlier example we find that the characteristic polynomial of the
Leslie matrix
        [ 0    4    3 ]
    L = [ 1/2  0    0 ] ,
        [ 0    1/4  0 ]
is given by f(λ) = λ^3 − 2λ − 3/8. As L has two successive positive entries in its top row, we have two successive fertile classes. Consequently, we expect a dominant eigenvalue, and this turns out to be the root λ1 = 3/2. Using the formula above, the eigenvector v1 corresponding to this eigenvalue is
    v1 = (1, b1/λ1, b1 b2/λ1^2)t = (1, (1/2)/(3/2), (1/2)(1/4)/(3/2)^2)t = (1, 1/3, 1/18)t.
Thus, for large k, our approximation gives
    x(k) ≈ c (3/2)^k (1, 1/3, 1/18)t,
and so this tells us that the proportion of the population lying in each age class
is, in the long run, constant and given by the ratio 1 : 1/3 : 1/18. From this we
also deduce that x(k) ≈ (3/2) x(k−1), which tells us that the population in each age class
grows approximately by a factor of 3/2 (i.e. increases by 50%) every time period,
which in this case is every 15 years.
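A short numerical check of these conclusions, given as a sketch assuming numpy; it is not part of the original example.

    import numpy as np

    L = np.array([[0.0, 4.0, 3.0],
                  [0.5, 0.0, 0.0],
                  [0.0, 0.25, 0.0]])
    x = np.array([1000.0, 1000.0, 1000.0])
    for _ in range(50):            # iterate 50 time periods (750 years)
        prev, x = x, L @ x

    print(x / x[0])                # approaches (1, 1/3, 1/18)
    print(x / prev)                # each entry approaches 3/2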
This may be useful, because we can write down e^{tJ} explicitly. Again, we need only evaluate e^{tJ} when J is a Jordan block. See Exercise 56.
Theorem 13.1.1 For any n × n matrix A, there is a unitary matrix S such that
S −1 AS = S ∗ AS is upper triangular.
Proof: Let B∗ = (v1∗, . . . , vn∗) be an ordered basis such that MB∗^{-1} A MB∗ is upper triangular. Perform the Gram-Schmidt procedure on the vectors v1∗, . . . , vn∗ to obtain an ordered orthonormal basis B = (v1, . . . , vn) so that, for i = 1, . . . , n, span({v1, . . . , vi}) is equal to span({v1∗, . . . , vi∗}). As the matrix MB∗^{-1} A MB∗ is upper triangular, the vector Avi∗ is in span({v1∗, . . . , vi∗}), and so the vector Avi is in span({v1, . . . , vi}), for i = 1, . . . , n. Thus MB^{-1} A MB is also upper triangular. As B
is an orthonormal basis, MB is a unitary matrix, and that completes the proof. 2
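In numerical libraries this unitary triangularisation is known as the Schur decomposition. A brief illustration, given as a sketch assuming numpy and scipy; the matrix A below is an arbitrary example.

    import numpy as np
    from scipy.linalg import schur

    A = np.array([[1.0, 2.0, 0.0],
                  [0.0, 3.0, 1.0],
                  [4.0, 0.0, 2.0]])

    T, S = schur(A, output='complex')   # A = S T S*, with S unitary, T upper triangular
    print(np.allclose(S @ T @ S.conj().T, A))         # True
    print(np.allclose(S.conj().T @ S, np.eye(3)))     # True: S is unitary
    print(np.allclose(np.tril(T, -1), 0))             # True: T is upper triangular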
Lemma 13.2.1: An upper triangular matrix is normal if and only if it is diagonal.
Proof: One direction is easy, namely that if A is diagonal then AA∗ = A∗A and hence A is normal.
Suppose that A is normal and upper triangular, so aij = 0 whenever i > j. The condition that AA∗ = A∗A says that, for all choices of i and j, Σ_k aik (ajk)∗ = Σ_k (aki)∗ akj, where ∗ denotes complex conjugation. We need to show that aik = 0 whenever k > i.
Restricting to i = j, we have Σ_k |aik|^2 = Σ_k |aki|^2 for all i. Since aij = 0 for all j < i, this can be re-written as |aii|^2 + Σ_{k>i} |aik|^2 = |aii|^2 + Σ_{k<i} |aki|^2.
Choosing i = 1, we have |a11|^2 + Σ_{k>1} |a1k|^2 = |a11|^2, and therefore a1k = 0 for all k > 1. Next choose i = 2: we have |a22|^2 + Σ_{k>2} |a2k|^2 = |a22|^2 + |a12|^2 = |a22|^2, as we have already established that a12 = 0. We conclude that a2k = 0 for all k > 2.
We can continue in this way, showing at every stage i that aik = 0 for all k > i. 2
The main theorem of this section states that, for an n × n complex matrix A, the following are equivalent: (i) A is normal; (ii) A is unitarily diagonalisable; (iii) there is an orthonormal basis of Cn consisting of eigenvectors of A.
Proof:
(ii) ⇔ (iii): To say that S −1 AS is a diagonal matrix means that the columns of
S form a basis of eigenvectors of A, and to say that S is unitary means that these
columns form an orthonormal set. So to say that A is unitarily diagonalisable means
that there is an orthonormal basis of eigenvectors of A.
(ii) ⇒ (i) If a matrix A is unitarily diagonalisable, then there is a unitary matrix S
with S ∗ AS = D, for some diagonal matrix D. Since A = SDS ∗ and A∗ = SD∗ S ∗ ,
we can write AA∗ = SDS ∗ SD∗ S ∗ = SDD∗ S ∗ . Similarly A∗ A = SD∗ S ∗ SDS ∗ =
SD∗ DS ∗ . But these are equal, since DD∗ = D∗ D for any diagonal matrix D.
(i) ⇒ (ii) Suppose that A is normal. By Theorem 13.1.1, there is a unitary matrix
S such that E = S −1 AS = S ∗ AS is an upper triangular matrix. We claim that
E is also normal: we have E ∗ = S ∗ A∗ S and so EE ∗ = S ∗ ASS ∗ A∗ S = S ∗ AA∗ S =
S ∗ A∗ AS = S ∗ A∗ SS ∗ AS = E ∗ E. By Lemma 13.2.1, it follows that E is diagonal,
and so E = S ∗ AS is a unitary diagonalisation of A. 2
Furthermore, unitary matrices are normal, and hence also unitarily diagonalisable.
Real orthogonal matrices are also unitary matrices, but most are not diagonalisable
with real eigenvalues. What we know is that real orthogonal matrices have orthonor-
mal bases of eigenvectors, which in general are complex vectors corresponding to complex eigenvalues of modulus 1.
Theorem 13.3.2: If f(x) = an x^n + · · · + a1 x + a0 is a polynomial whose roots are all real numbers, then all the roots are positive if and only if the coefficients ai alternate between being positive and negative.
Proof: If all the roots are negative, then there are positive reals b1, . . . , bn such that f(x) = an(x + b1)(x + b2) · · · (x + bn); expanding this out tells us that all the coefficients ai have the same sign as an.
On the other hand, if the ai are either all positive or all negative, then either f (x) > 0
for all non-negative x or f (x) < 0 for all non-negative x, and in either case there
are no non-negative roots. This establishes the claim.
The polynomial f (x) has only positive roots if and only if f (−x) has only neg-
ative roots. By the above claim, f (−x) has only negative roots if and only if
f (−x) = (−1)n an xn + (−1)n−1 an−1 xn−1 + · · · − a1 x + a0 has either only positive
coefficients (−1)i ai , or only negative coefficients (−1)i ai , and this is equivalent to
the ai alternating between being positive and negative. 2
Notice that the result does not hold unless we assume that all the roots are real.
The polynomials x2 + x + 1 and x2 − x + 1 both have two roots that are complex
and not real.
Theorem 13.3.3:
(i) The characteristic polynomial of a Hermitian matrix has only real coefficients.
(ii) A Hermitian matrix is positive definite if and only if its characteristic polynomial
has coefficients alternating in sign.
Proof: (i) Because a Hermitian matrix A has only real eigenvalues, its characteristic
polynomial f (x) = det(xI −A), with leading term 1, takes the form (x−r1 )m1 · · · (x−
rk )mk where r1 , . . . , rk are real numbers. Expanding out gives a polynomial with real
coefficients.
(ii) By Theorem 13.3.2, the characteristic polynomial f (x) has coefficients alternat-
ing in sign if and only if all the roots r1 , . . . , rk are positive. By Theorem 13.3.1, all
roots (eigenvalues) are positive if and only if the matrix is positive definite. 2
Examples: Set
         [  5  −1   3 ]         [  5     −1   3 + i ]
    A1 = [ −1   2  −2 ] ;  A2 = [ −1      2    −2   ] .
         [  3  −2   3 ]         [ 3 − i  −2     3   ]
Both A1 and A2 are Hermitian. We want to discover if they are also positive definite.
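One way to check this numerically (a sketch assuming numpy, rather than the hand computation intended in the notes) is to compute the eigenvalues of each Hermitian matrix and test whether they are all positive:

    import numpy as np

    A1 = np.array([[5, -1, 3],
                   [-1, 2, -2],
                   [3, -2, 3]], dtype=complex)
    A2 = np.array([[5, -1, 3 + 1j],
                   [-1, 2, -2],
                   [3 - 1j, -2, 3]])

    for name, A in [("A1", A1), ("A2", A2)]:
        eigs = np.linalg.eigvalsh(A)     # real eigenvalues of a Hermitian matrix
        print(name, eigs, "positive definite:", bool(np.all(eigs > 0)))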
Proof: Since A∗ and AA∗ represent linear transformations defined on the same
vector space Cm , to show that AA∗ and A∗ have the same rank it suffices to show
that the nullspace of A∗ is the same as the nullspace of AA∗ . This is also enough
for the rest of the theorem, as the rank of A is equal to the rank of A∗ .
One direction is easy: the nullspace of A∗ is contained in the nullspace of AA∗ , since
if A∗ v = 0 then also A(A∗ v) = A 0 = 0.
This means that the eigenvectors Av1, . . . , Avk of AA∗ are also orthogonal. The normalised vectors (1/√λ1)Av1, . . . , (1/√λk)Avk form an orthonormal set of eigenvectors of AA∗.
Theorem 13.5.1: The norm of any matrix A is equal to its largest singular value.
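Theorem 13.5.1 is easy to illustrate numerically; the following is a sketch assuming numpy, where the matrix norm in question is the operator 2-norm.

    import numpy as np

    A = np.array([[1.0, 2.0, 0.0],
                  [0.0, 1.0, 3.0]])

    singular_values = np.linalg.svd(A, compute_uv=False)
    print(singular_values.max())         # largest singular value
    print(np.linalg.norm(A, 2))          # matrix 2-norm; the two agree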
α1 v1 + · · · + αm vm = 0.
If all the αi for i > ` are zero, then we have a contradiction to the assumption that {v1, . . . , v`} is a basis of U1. So the vector v = α`+1 v`+1 + · · · + αm vm is a non-zero element of U2. The vector v is also an element of U1, as it can be written as −α1 v1 − · · · − α` v`.
Therefore v is in U1 ∩ U2 , and so it can be written as β1 v1 + · · · + βk vk , for some
scalars βi . This now gives us two different representations of the vector v in U2 as
a linear combination of the elements of {v1 , . . . , vk , v`+1 , . . . , vm }, contradicting the
assumption that this is a basis of U2 . 2
Example: Consider the two subspaces U1 = span({(0, 1, 0)t , (0, 1, 1)t }), U2 =
span({(1, 2, 3)t , (2, 0, 4)t }) of V = R3 . It’s easy to check that the first three of
these four vectors are linearly independent, so U1 + U2 = R3 . The formula in Theo-
rem 14.1.1 tells us that U1 ∩ U2 will have dimension 2 + 2 − 3 = 1. To find a vector
in the intersection, note that U1 is the set of all vectors (x, y, z)t with x = 0. The
vector 2(1, 2, 3)t − (2, 0, 4)t = (0, 4, 2)t is therefore in U1 ∩ U2. So U1 ∩ U2 is the set of multiples of (0, 4, 2)t.
Conversely, suppose that, for every u ∈ U1 + U2 , there is only one way to write
u ∈ U1 + U2 as u1 + u2 , with u1 ∈ U1 and u2 ∈ U2 . If u is a vector in U1 ∩ U2 , then
u + 0 and 0 + u are two expressions of u as a sum of a vector in U1 and a vector in
U2 , so these must be the same expressions, i.e., u = 0. 2
Theorem 14.2.1 does not hold in all infinite-dimensional vector spaces, as we will
show later.
Proof: Take a vector z in R(A)⊥ . This means that, for every x in Rm , we have
(Ax) · z = x · (At z) = 0. But, since this is true for all x in Rm , including x = At z,
the positivity property implies that At z = 0, which means that z ∈ N (At ).
On the other hand, suppose that z ∈ N (At ). We reverse the process. Let x be any
vector in Rm . As At z = 0, we have 0 = x · (At z) = (Ax) · z. Because x was chosen
arbitrarily, it follows that z is in R(A)⊥ . 2
We have been using Corollary 14.2.3 already, to find vectors orthogonal to the
nullspace of a matrix A. These are the row vectors of A, or alternatively the row
vectors of a matrix obtained from A by row reduction. The row vectors of the ma-
trix are the column vectors of the transpose of the matrix, and therefore they span
R(At ).
With complex matrices, we must be more careful, since we have to take the conju-
gates of the row vectors to find the orthogonal complement of the nullspace.
Example: Let V be R4 and let U = span({(1, 2, −1, 1)t , (1, 1, 0, −1)t }). We want
to find U ⊥ . The easiest way is to find the nullspace of the matrix whose rows are
(1, 2, −1, 1) and (1, 1, 0, −1). This matrix
    [ 1  2  −1   1 ]                  [ 0  −1  1  −2 ]
    [ 1  1   0  −1 ]  row reduces to  [ 1   1  0  −1 ] .
Its nullspace, which is U ⊥ , is spanned by (1, −1, −1, 0)t and (1, 0, 2, 1)t . Thus U ⊥ =
span({(1, −1, −1, 0)t , (1, 0, 2, 1)t }).
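A quick numerical confirmation, given as a sketch assuming numpy, that the two spanning vectors found for U⊥ really are orthogonal to the generators of U:

    import numpy as np

    u1, u2 = np.array([1, 2, -1, 1]), np.array([1, 1, 0, -1])      # spanning U
    w1, w2 = np.array([1, -1, -1, 0]), np.array([1, 0, 2, 1])      # spanning U-perp

    print([int(np.dot(u, w)) for u in (u1, u2) for w in (w1, w2)])  # [0, 0, 0, 0]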
14.3 Projections
A projection is a linear transformation P from a (real) vector space V to itself such
that P (v) = v for all v in im(P ).
An n × n matrix A is idempotent if A2 = A.
Proof: Suppose P is a projection from V to itself, and let u be any vector in V . Now
v = P (u) is a vector in im(P ), so P (P (u)) = P (v) = v = P (u). As P 2 (u) = P (u)
for every u in V , we have P 2 = P .
We now show that we can reverse the process: if V is the direct sum of two subspaces,
then there is a unique projection with one of the subspaces as its image and the other
as its kernel. The construction of the projection is simple and natural. Given two
subspaces U1 and U2 of a vector space V such that V = U1 ⊕ U2 , we define a function
P : V → U1 as follows: if v is written uniquely as u + w, with u ∈ U1 and w ∈ U2 ,
then we set P (v) = u.
Proof: First we show that P is a linear transformation. Let x and y be two vectors
in V , and let a, b be scalars. We need to show that P (ax + by) = aP (x) + bP (y).
Write x uniquely as ux + wx and y uniquely as uy + wy, with ux, uy ∈ U1 and wx, wy ∈ U2. Then ax + by = (aux + buy) + (awx + bwy), with aux + buy ∈ U1 and awx + bwy ∈ U2, so P(ax + by) = aux + buy = aP(x) + bP(y).
From the definition, we have that P (u) = u for all u ∈ U1 , so im(P ) contains U1 .
Also, if v is in U2 , then the unique way to write v as a sum of a vector in U1 and
a vector in U2 is as 0 + v, and hence P (v) = 0. So ker(P ) contains U2 . As the
dimension of im(P ) and the dimension of ker(P ) sum to the dimension of V , which
is dim(U1 ) + dim(U2 ), it now follows that im(P ) = U1 and ker(P ) = U2 .
Now suppose T is any projection on V such that U1 is the image of T and U2 is the
kernel of T . Letting v = u + w be any vector in V , with u ∈ U1 and w ∈ U2 , we
have T (v) = T (u) + T (w) = u + 0 = u, and therefore T = P . 2
The projection P with image U1 and kernel U2 is called the projection of V onto U1
parallel to U2 .
Given bases of U1 and U2 with V = U1 ⊕ U2 , one can define explicitly the matrix
that represents the projection onto U1 parallel to U2 .
Example: Let the vector space be R4 , set U = span({(0, 1, −1, 1)t , (1, 0, −2, 0)t }),
and W = span({(0, 0, 1, 0)t , (0, 0, 0, 1)t }). Let P be the projection onto U parallel
to W . We want to find the matrix AP representing P . We need the change of basis
matrix MB corresponding to the basis
B = {(0, 1, −1, 1)t , (1, 0, −2, 0)t , (0, 0, 1, 0)t , (0, 0, 0, 1)t }, which is:
         [  0   1  0  0 ]              [ 0   1  0  0 ]
    MB = [  1   0  0  0 ] ;  MB^{-1} = [ 1   0  0  0 ] .
         [ −1  −2  1  0 ]              [ 2   1  1  0 ]
         [  1   0  0  1 ]              [ 0  −1  0  1 ]
As
    [ 1  0  0  0 ]
    [ 0  1  0  0 ]
    [ 0  0  0  0 ]
    [ 0  0  0  0 ]
is the matrix representing P with respect to the basis B, we have that AP is equal to
    [  0   1  0  0 ] [ 1  0  0  0 ] [ 0   1  0  0 ]   [  1   0  0  0 ]
    [  1   0  0  0 ] [ 0  1  0  0 ] [ 1   0  0  0 ]   [  0   1  0  0 ]
    [ −1  −2  1  0 ] [ 0  0  0  0 ] [ 2   1  1  0 ] = [ −2  −1  0  0 ] .
    [  1   0  0  1 ] [ 0  0  0  0 ] [ 0  −1  0  1 ]   [  0   1  0  0 ]
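The same computation can be carried out numerically; the following is a sketch assuming numpy, and it also checks that AP is idempotent, as a projection matrix must be.

    import numpy as np

    MB = np.array([[0, 1, 0, 0],
                   [1, 0, 0, 0],
                   [-1, -2, 1, 0],
                   [1, 0, 0, 1]], dtype=float)    # columns: the basis B
    D = np.diag([1.0, 1.0, 0.0, 0.0])             # P in the basis B

    AP = MB @ D @ np.linalg.inv(MB)
    print(AP)                                     # [[1,0,0,0],[0,1,0,0],[-2,-1,0,0],[0,1,0,0]]
    print(np.allclose(AP @ AP, AP))               # True: AP is idempotent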
Theorem 15.1.1: An n × n real matrix A is normal with all eigenvalues real if and
only if it is symmetric.
Theorem 15.1.1 implies that all matrices representing reflections in Rn are sym-
metric, since a reflection is represented by an orthogonal matrix, which is also a
normal matrix, and it has 1 and −1 for eigenvalues. However, not all orthogonal and symmetric matrices represent reflections: for instance, the matrix
    [ −1   0 ]
    [  0  −1 ]
represents the rotation by the angle π.
For the converse, suppose that A is symmetric and idempotent. Because it is idem-
potent, A is a projection. As A is real and symmetric, it is Hermitian and therefore
normal. Let v1 , . . . , vn be an orthonormal set of eigenvectors for A, and order them
so that v1 , . . . , vk correspond to eigenvalue 1 and vk+1 , . . . , vn correspond to eigen-
value 0. Let U = span({v1 , . . . , vk }) and W = span({vk+1 , . . . , vn }). We now see
that U = R(A), W = N (A), U and W are orthogonal to each other, and the vector
space Rn is the direct sum of U and W . So A represents the orthogonal projection
onto U . 2
In the last lecture, we found the matrix A representing the projection onto
span({(0, 1, −1, 1)t , (1, 0, −2, 0)t }), such that the two given vectors (0, 0, 1, 0)t and
(0, 0, 0, 1)t were in the nullspace of A. As (0, 0, 1, 0)t and (0, 0, 0, 1)t are not perpen-
dicular to (0, 1, −1, 1)t and (1, 0, −2, 0)t , this is not an orthogonal projection, and
therefore the previous theorem tells us that the matrix A representing this projection is not symmetric. Indeed, that matrix was
    [  1   0  0  0 ]
    [  0   1  0  0 ]
    [ −2  −1  0  0 ] .
    [  0   1  0  0 ]
Proof: We are given that BA, an m × m matrix, has rank m, and therefore R is
well defined. We must check that R is idempotent with the same range as A and
the same nullspace as B.
Idempotent:
R2 = [A(BA)−1 B][A(BA)−1 B] = A[(BA)−1 BA][(BA)−1 B] = A(BA)−1 B = R.
Same range: Because R is of the form AM , the range of R is contained in the range
of A. Suppose that v is in R(A). This means that there is a vector u in Rm such
that v = Au. Now Rv = RAu = [A(BA)−1 B]Au = A[(BA)−1 BA]u = Au = v.
Same nullspace: It follows directly from the definition of R that any vector in the
nullspace of B is also in the nullspace of R: N (B) ⊆ N (R).
To show that the two nullspaces are equal, it suffices to show that they have the
same dimension. By the rank-nullity theorem, it suffices to show that B and R have
the same rank.
Because R has the same range as A, we have r(R) = r(A) = m. Also r(B) ≥ r(BA) = m, and r(B) ≤ m since B has only m rows, so r(B) = m = r(R), as required. 2
The second inner product is fixed, but the first varies with the choice of u. Because
P (v) is in U , the choice of u = P v is legitimate and also is the unique choice that
minimises the distance. 2
Assume that D × R is the set on which one has data, meaning that the data consists
of pairs of points (d, r) ∈ D × R, such that we would like to establish a functional
relationship between points in D and values in R. Now assume further that the
reasonable functions f for which the pair (d, r) should satisfy r = f (d) form a finite-
dimensional subspace U of all the functions from D to R. Let {f1 , f2 , . . . , fm } be a
basis for this finite-dimensional subspace U .
Assume that our data is in the form of pairs (d1 , r1 ), (d2 , r2 ), . . . , (dn , rn ), with n ≥ m.
Every function f in the subspace U also generates pairs (d1, f(d1)), . . . , (dn, f(dn)).
The accuracy of a function f in U as an approximation to the given data can be
measured by the distance between (f (d1 ), f (d2 ), . . . , f (dn ))t and (r1 , r2 , . . . , rn )t in
Rn , most naturally with respect to the Euclidean norm.
What we really want is the function that approximates this data, not the data
produced by this function. However, this function is not difficult to find. Because
the matrix A represents a linear transformation from U to a subspace of the same
dimension as U , there can be only one function in U that is mapped to P (v). The
vector (At A)−1 At v in Rm corresponds to a function in U , written in terms of the
basis B, that gives the data P (v), since A((At A)−1 At v) = P (v). Since there is
only one function that does this, it must be the function we are looking for, and so
applying the matrix (At A)−1 At to the data v gives the answer.
Example: Suppose we drop a 1kg lead sphere from the heights of 1, 2, 3, and 4
meters and measure the time it takes for the sphere to hit the ground. Suppose
we record the data (1, 7), (2, 10), (3, 12), (4, 15) (where the time is a multiple of
1/20th of a second) and our hypothesis is that the correct function is of the form
f = a + bx + cx2 for some constants a, b and c. We want to find the values of a, b,
c giving the best approximation to the given data.
Let e1 = (1, 0, 0)t correspond to the function 1, e2 = (0, 1, 0)t to the function x, and
e3 = (0, 0, 1)t to the function x2 – this means that we are taking the ordered basis
(f1, f2, f3) of our space U of approximating functions, where f1(x) = 1, f2(x) = x and f3(x) = x^2.
Applying the above matrix to the vector consisting of our data measurements v =
(r1 , r2 , r3 , r4 )t = (7, 10, 12, 15)t , we get:
    [   9/4    −3/4    −5/4    3/4  ] [  7 ]   [  9/2  ]
    [ −31/20   23/20   27/20 −19/20 ] [ 10 ] = [ 26/10 ] .
    [   1/4    −1/4    −1/4    1/4  ] [ 12 ]   [   0   ]
                                      [ 15 ]
This means that the best approximation in span({1, x, x^2}) to this data is the function f(x) = 26x/10 + 9/2. This function fits the data rather well, as the following table shows.
di 1 2 3 4
f (di ) 7.1 9.7 12.3 14.9
ri 7 10 12 15
This is called the method of least-squares approximation, because what is being minimised is the distance from the real data (r1, . . . , rn)t to the estimate (f(d1), . . . , f(dn))t, and minimising this Euclidean distance is the same as minimising the sum of the squares of the differences f(di) − ri.
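The same coefficients can be obtained from a standard least-squares routine; the following is a sketch assuming numpy, whose lstsq function minimises exactly this sum of squares.

    import numpy as np

    d = np.array([1.0, 2.0, 3.0, 4.0])
    r = np.array([7.0, 10.0, 12.0, 15.0])

    A = np.column_stack([np.ones_like(d), d, d**2])   # columns: 1, x, x^2 at the di
    coeffs, *_ = np.linalg.lstsq(A, r, rcond=None)
    print(coeffs)                                     # approximately [4.5, 2.6, 0.0]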
We know already how to find a left inverse; in Lecture 16, when we found a function
that was the best fit to given data, we did it by using a left inverse. In that case
the matrix (At A)−1 At was a left inverse.
Every left inverse of A can be created with the formula (BA)−1 B, for some matrix
B. Indeed, if Al is a left inverse of A, then (Al A)−1 = Im , and putting B = Al in
the formula gives (Al A)−1 Al = Al .
Proof: Since for every vector v in Rm we have (Al A)v = v and (Ax A)v = v it
follows that (Al − Ax )A is the zero matrix.
Suppose N (Al ) = N (Ax ). Since both Al and Ax are left inverses of A, they agree on
what they do to vectors in the range of A. They also agree on what they do to vectors
in N (Al ), namely they send these elements to 0. We claim that Rn = R(A)+ N (Al ):
this implies that Al and Ax agree on a basis of Rn , and therefore are the same matrix.
To show that Rn = R(A) + N (Al), let v be any vector in Rn. We see that AAl v is in R(A), while v − AAl v is in N (Al), since Al(v − AAl v) = Al v − (Al A)Al v = Al v − Al v = 0. Hence v = AAl v + (v − AAl v) lies in R(A) + N (Al). 2
Corollary 17.1.3: If an n × m matrix A has rank m, then there is one and only
one left inverse Ag of A such that N (Ag ) = R(A), namely Ag = (At A)−1 At .
Right and left inverses are closely related to each other. From the definitions, we
see that if Al is a left inverse of A then A is a right inverse of Al , and likewise if Ar
is a right inverse of A then A is a left inverse of Ar .
Lastly, because the range of (AB)−1 is the whole space Rn , the range of B(AB)−1
is equal to the range of B. 2
If Ar is a right inverse of A, then (AAr )−1 is the n × n identity, making Ar (AAr )−1 =
Ar a right inverse. This tells us that every right inverse of A can be created by the
formula B(AB)−1 for some matrix B so that AB is invertible.
Proof: Since, for every vector v, (AAr )v = v and (AAx )v = v it follows that
A(Ar − Ax ) is the zero matrix.
Now suppose that Ar and Ax have the same ranges, and let v be any vector in Rn .
As Ar v is in the range of Ax , Ar v is equal to Ax w for some w ∈ Rn . By the right
inverse property, v = AAr v = AAx w = w. and it follows that Ar v = Ax w = Ax v.
Therefore Ar = Ax .
For the last part, if Ar and B are as given, then A(Ar +B) = AAr +AB = In +0 = In ,
so Ar + B is also a right inverse for A. 2
Corollary 17.2.3: If an n × m matrix A has rank n, then there is one and only
one right inverse Ag of A such that the range of Ag is equal to N (A)⊥ , namely
Ag = At (AAt )−1 .
Proof: The range of the given matrix Ag is contained in the range of At , which is
N (A)⊥ . Moreover, any right inverse of A has rank n, while the dimension of N (A)
is m − n, so the dimension of N (A)⊥ is n. Therefore the range of Ag is equal to
N (A)⊥ .
By the previous result, any two right inverses of A with range equal to N (A)⊥ are
equal. 2
We choose two different 3 × 2 matrices, At and B, and use these to construct two different right inverses of A:
         [ 1  2 ]       [ 0  1 ]
    At = [ 0  1 ] ; B = [ 1  0 ] ;
         [ 1  0 ]       [ 0  0 ]

          [ 2  2 ]        [ 0  1 ]
    AAt = [ 2  5 ] ; AB = [ 1  2 ] ;

                 [  5/6  −1/3 ]               [ −2  1 ]
    (AAt)^{-1} = [ −1/3   1/3 ] ; (AB)^{-1} = [  1  0 ] ;

                        [  1/6   1/3 ]                     [  1  0 ]
    Ag = At(AAt)^{-1} = [ −1/3   1/3 ] ; Ax = B(AB)^{-1} = [ −2  1 ] .
                        [  5/6  −1/3 ]                     [  0  0 ]
As a check:
                 [ 1  0  1 ] [  5/6  −1/3 ]   [ 0  0 ]
    A(Ax − Ag) = [ 2  1  0 ] [ −5/3   2/3 ] = [ 0  0 ] .
                             [ −5/6   1/3 ]
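The arithmetic above is easily verified with a few lines of code; the following is a sketch assuming numpy, with A read off from its transpose At shown above.

    import numpy as np

    A = np.array([[1.0, 0.0, 1.0],
                  [2.0, 1.0, 0.0]])
    B = np.array([[0.0, 1.0],
                  [1.0, 0.0],
                  [0.0, 0.0]])

    Ag = A.T @ np.linalg.inv(A @ A.T)      # right inverse At(AAt)^{-1}
    Ax = B @ np.linalg.inv(A @ B)          # right inverse B(AB)^{-1}
    print(np.allclose(A @ Ag, np.eye(2)))  # True
    print(np.allclose(A @ Ax, np.eye(2)))  # True
    print(Ax - Ag)                         # the two right inverses differ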
Let’s ask what properties left and right inverses have in common. If A has a left
inverse Al , then AAl A = A(Al A) = A. If A has a right inverse Ar , then AAr A =
(AAr )A = A. This inspires the following definition.
Notice that, if A has a left inverse Al, then A and Al are double-sided generalised inverses of each other, since AAl A = A(Al A) = A and Al AAl = (Al A)Al = Al. Similarly, if Ar is a right inverse of A, then A and Ar are double-sided generalised inverses of each other.
We now show how to construct a double-sided generalised inverse for any matrix A.
The first step is to write A as a product of two full-rank matrices.
Theorem 17.3.3: Let A be an n × m matrix of rank k > 0. The matrix A can be written as a product BC of an n × k matrix B of rank k and a k × m matrix C of rank k.
         [ 1  1  0 ]
    BC = [ 0  1  1 ] [ 1  0  0   1 ]   [ 1  1  0  0 ]
         [ 0  0  1 ] [ 0  1  0  −1 ] = [ 0  1  1  0 ] = A.
         [ 1  0  0 ] [ 0  0  1   1 ]   [ 0  0  1  1 ]
                                       [ 1  0  0  1 ]
Next, we find a left inverse of B and a right inverse of C. We can save work
by choosing simple matrices P and Q to use in the formulae B l = (P B)−1 P and
C r = Q(CQ)−1 . For instance, we might take:
        [ 1  0  0 ]       [ 1  0 ]
    P = [ 0  1  0 ] ;  Q = [ 0  1 ] ,
                           [ 0  0 ]
                           [ 0  0 ]
and now:
         [  3   2 ]        [ 1  −1 ]
    PB = [ −1  −1 ] ; CQ = [ 0   2 ] ;

                [  1   2 ]               [ 1  1/2 ]
    (PB)^{-1} = [ −1  −3 ] ; (CQ)^{-1} = [ 0  1/2 ] ;
    Bl = (PB)^{-1} P = [  1   2 ] [ 1  0  0 ] = [  1   2  0 ] ;
                       [ −1  −3 ] [ 0  1  0 ]   [ −1  −3  0 ]

                        [ 1  0 ]              [ 1  1/2 ]
    Cr = Q(CQ)^{-1} =   [ 0  1 ] [ 1  1/2 ] = [ 0  1/2 ] .
                        [ 0  0 ] [ 0  1/2 ]   [ 0   0  ]
                        [ 0  0 ]              [ 0   0  ]
Therefore, a double-sided generalised inverse for A is
             [ 1  1/2 ]                 [  1/2   1/2  0 ]
    Cr Bl =  [ 0  1/2 ] [  1   2  0 ] = [ −1/2  −3/2  0 ] .
             [ 0   0  ] [ −1  −3  0 ]   [   0     0   0 ]
             [ 0   0  ]                 [   0     0   0 ]
For all matrices A and Ag of appropriate dimensions the range of Ag AAg is contained
in the range of Ag A and the range of Ag A is contained in the range of Ag . But since
Ag AAg is equal to Ag we conclude that all the images are equal to each other. 2
Theorem 18.2.2: For every non-zero matrix A, the strong generalised inverse As
exists and is unique.
Proof: First we show uniqueness. Let As and Ag be two matrices that satisfy the
four properties, namely that both the pair A, As and the pair A, Ag are double-sided
generalised inverses of each other, and that all four matrices AAs , As A, AAg and
Ag A are symmetric. By Theorem 18.1.2, we need only to show that the nullspaces
of As and Ag are the same and that the ranges of As and Ag are the same.
As AAg = (AAg )t , Theorem 14.2.2 tells us that N (AAg ) and R(AAg ) are orthogonal
complements. By Theorem 18.2.1, N (AAg ) = N (Ag ) and R(AAg ) = R(A), and
so N (Ag ) and R(A) are orthogonal complements. Similarly, R(Ag ) and N (A) are
orthogonal complements. The same reasoning applies to As , and hence it follows
from Theorem 18.1.2 that As = Ag .
It remains to show the existence of the strong generalised inverse. Given a non-zero
n × m matrix A, by Theorem 17.3.3, if k = r(A) then A can be written as BC,
where B is an n × k matrix of rank k and C is a k × m matrix of rank k. Define As = Ct (CCt)^{-1} (Bt B)^{-1} Bt.
To show that As is a strong generalised inverse, we need to show that AAs and
As A are symmetric. We have AAs = BCC t (CC t )−1 (B t B)−1 B t = B(B t B)−1 B t and
As A = C t (CC t )−1 (B t B)−1 B t BC = C t (CC t )−1 C. Because (M t )−1 = (M −1 )t for all
invertible matrices M , it follows that both these matrices are symmetric. 2
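The construction in the proof is easy to carry out numerically; for real matrices the strong generalised inverse defined by these four properties is exactly the Moore-Penrose pseudoinverse, which numpy computes with np.linalg.pinv. A sketch follows, using an arbitrary rank-one matrix as illustration.

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0]])          # rank 1: an arbitrary illustration

    # full-rank factorisation A = BC with B n x k and C k x m, k = rank(A) = 1
    B = np.array([[1.0], [2.0]])
    C = np.array([[1.0, 2.0, 3.0]])

    As = C.T @ np.linalg.inv(C @ C.T) @ np.linalg.inv(B.T @ B) @ B.T
    print(np.allclose(As, np.linalg.pinv(A)))      # True
    print(np.allclose(A @ As @ A, A))              # True
    print(np.allclose((A @ As).T, A @ As))         # True: A As is symmetric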
For example, suppose that we have data collected for the three values −2, 1, 3 for
x, the independent variable, two measurements for each value for x. We want
to find the best approximating function in span({1, x, x2 , x3 , x4 }). The matrix A
representing the linear transformation from the subspace of the functions onto the
space of y values will be:
1 −2 4 −8 16
1 −2 4 −8 16
1 1 1 1 1
1 1 1 1 1 .
1 3 9 27 81
1 3 9 27 81
Once we have found As , the strong generalised inverse of A, the matrix As will be
applied to the vector (a−2 , b−2 , a1 , b1 , a3 , b3 )t , where a−2 and b−2 are the two values
for y corresponding to x = −2, a1 and b1 are the two values for y corresponding
to x = 1, and a3 and b3 are the two values for y corresponding to x = 3. The
result will reveal the appropriate linear combination of the functions 1, x, x2 , x3 , x4
to approximate the data.
consider the projections onto finite-dimensional subspaces U′ of U, and ask how well these projections approximate the vectors of V.
One example is the three-dimensional subspace P 2 of polynomial functions of degree 2 or less, when the inner product is defined by ⟨f, g⟩ = ∫_0^1 f(x)g(x) dx.
We are now able to use the methods of the previous lectures to find the orthogonal
projection function from the vector space of real-valued continuous functions on
[0, 1] to the three-dimensional subspace P 2 of polynomial functions, and also do the
same for the higher dimensional subspaces P k .
Take, for example, the function f(x) = e^{−x}. We calculate the inner product of this function with each of the functions g1, g2 and g3 in the orthonormal basis of P 2:
    ⟨f, g1⟩ = ∫_0^1 e^{−x} dx = −[e^{−x}]_0^1 = 1 − e^{−1};
    ⟨f, g2⟩ = √3 ∫_0^1 (2x − 1)e^{−x} dx = √3 [(−2x − 1)e^{−x}]_0^1 = √3 (1 − 3e^{−1});
    ⟨f, g3⟩ = √5 ∫_0^1 (6x^2 − 6x + 1)e^{−x} dx = √5 [−6x^2 e^{−x} − 6x e^{−x} − 7e^{−x}]_0^1 = √5 (7 − 19e^{−1}).
Now the Gram-Schmidt formula tells us that the orthogonal projection P(f) of e^{−x} onto P 2 is:
    P(f)(x) = 5(7 − 19e^{−1})(6x^2 − 6x + 1) + 3(1 − 3e^{−1})(2x − 1) + 1 − e^{−1}
            = 30(7 − 19e^{−1})x^2 − (204 − 552e^{−1})x + 33 − 87e^{−1}.
Let’s compare this approximation with one we know better: the Taylor expansion.
The second order Taylor expansion of e−x about x = 0 is the polynomial 1 − x + 12 x2 .
This is a very accurate approximation to e−x for values close to 0, but for 1 we
get the value 1/2. The value of our new approximation function P (f ) at x = 1 is
39−105e−1 . Now is the time to use a calculator. Our new approximation 39−105e−1
is approximately .372659, whereas e−1 is approximately .367879.
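The inner products above can also be checked by numerical integration; the following is a sketch assuming scipy, with the orthonormal basis functions g1, g2, g3 as in the text.

    import numpy as np
    from scipy.integrate import quad

    f = lambda x: np.exp(-x)
    g = [lambda x: 1.0,
         lambda x: np.sqrt(3) * (2*x - 1),
         lambda x: np.sqrt(5) * (6*x**2 - 6*x + 1)]

    coeffs = [quad(lambda x, gi=gi: f(x) * gi(x), 0, 1)[0] for gi in g]
    P = lambda x: sum(c * gi(x) for c, gi in zip(coeffs, g))

    print(P(1), 39 - 105/np.e)     # both approximately 0.372659
    print(f(1))                    # e^{-1}, approximately 0.367879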
The new approximation does much better at x = 1 because the Taylor expansion we have used is centred about x = 0, and is designed to give very good approximations only in the neighbourhood of 0.
If one continues the process of functional approximation without end, either with
an infinite orthonormal subset of functions and the Gram-Schmidt process, or with
the Taylor expansion, or with any other method of approximation or projection,
will one obtain in the limit the original function? In many cases the answer is yes.
When this does and doesn’t happen could be the subject matter of an entire course.
Example: Let V be the space of the previous example, the set of infinite sequences
(r1 , r2 , . . . ) of real numbers such that the sum r12 +r22 +· · · is finite. Define T : V → V
by T (r1 , r2 , . . . ) = (r2 , r3 , . . . ). This function T , called the shift transformation
is a linear transformation, since T (ar1 + bs1 , ar2 + bs2 , . . . ) = (ar2 + bs2 , . . . ) =
a(r2 , . . . ) + b(s2 , . . . ), for all real a, b and all pairs (r1 , r2 , . . . ), (s1 , s2 , . . . ) in V .
The image of T is the whole of V . The kernel of T is not {0}, but rather the
subspace {(r, 0, 0, . . . ) | r ∈ R}. The function T is not invertible, because there is
no linear transformation T −1 with the property that T −1 (T (1, 0, . . . )) = (1, 0, . . . ),
since T (1, 0, . . . ) = 0.
The function ekit, for a positive integer k, is the function that circles the complex numbers of modulus one in the counter-clockwise direction (from 1 to i, to −1, and so on) with uniform speed k, starting at 1 when t = 0 and travelling around the complete circle k times between t = 0 and t = 2π. Another way to understand ekit is via the formula ekit = cos(kt) + i sin(kt). For a negative integer k, ekit does the same, except that it moves in the clockwise direction. Whether k is positive or negative, the value of ekit is 1 if and only if t is an integer multiple of 2π/k. The complex conjugate of ekit is e−kit, and the complex conjugate of e−kit is ekit. When k = 0, ekit = e0 = 1, and this function is its own conjugate.
For any trigonometric polynomial p(t) = Σ_{k=−n}^{n} ak ekit, the evaluations of p at t and at t + 2π are identical, for any t, since e2iπ = 1 and so ekit = eki(t+2π) for all integers k. For this reason the trigonometric polynomials are used mostly for understanding functions that are periodic – f(t + 2π) = f(t) for all t – or that are only defined on an interval such as (−π, π). For each positive integer n, we write Un for the span, over the complex numbers, of the functions ekit for −n ≤ k ≤ n.
Theorem 20.1.1: For every pair of different integers m and n, emit and enit are
orthogonal in V .
In order to make the functions ekit into an orthonormal set, they must be normalised.
For any integer n, we have:
    ⟨enit, enit⟩ = ∫_{−π}^{π} enit e−nit dt = ∫_{−π}^{π} e(n−n)it dt = ∫_{−π}^{π} 1 dt = 2π,
so in every case we must divide by √(2π) to get an orthonormal set of functions.
Some textbooks work instead with the functions 1, cos(kt) and sin(kt), for all pos-
itive integers k. Due to the next result, it doesn’t matter whether we define Un as
above, or as the span of the functions 1, cos(kt) and sin(kt) for all positive k ≤ n.
Theorem 20.1.2: The subspace Un is equal to the span, over the complex numbers,
of the set of functions {cos(kt) | 0 ≤ k ≤ n} ∪ {sin(kt) | 1 ≤ k ≤ n}.
Proof: From the formula eit = cos t + i sin t, and from the facts that sin(−kt) =
− sin(kt) and cos(−kt) = cos(kt), it follows that the subspace Un is spanned (over
the complex numbers) by the functions {cos(kt) | 0 ≤ k ≤ n}∪{sin(kt) | 1 ≤ k ≤ n}.
In the other direction, we can write
    (1/2)(ekit + e−kit) = (1/2)(cos(kt) + i sin(kt) + cos(kt) − i sin(kt)) = cos(kt);
    (−i/2)(ekit − e−kit) = (−i/2)(cos(kt) + i sin(kt) − cos(kt) + i sin(kt)) = sin(kt),
so each function in the basis of Un can be written as a linear combination of the
functions cos(kt) and sin(kt). 2
The next result shows that, if we want to study only real valued functions, it doesn't matter whether we use functions of the form ekit or functions of the form cos(nt) and sin(nt): if f is a function in Un, then the real and imaginary parts of f are also in Un.
Proof: Since ekit and e−kit are conjugates of each other, the conjugate of any function in Un is also in Un. Therefore the real part of f, which is half the sum of f and its conjugate, is also in Un. Likewise the imaginary part of f, which is −i/2 times the difference of f and its conjugate, is also in Un. 2
The following result, combined with Theorem 20.1.2, shows that the closest approximation in the linear span of the functions 1, cos(kt) and sin(kt) for 1 ≤ k ≤ n can be found by calculating the closest approximation in the linear span of the functions ekit for −n ≤ k ≤ n.
Define Pn to be the orthogonal projection onto Un; our methods from earlier tell us that
    Pn(f) = Σ_{k=−n}^{n} ⟨f, (1/√(2π)) ekit⟩ (1/√(2π)) ekit = Σ_{k=−n}^{n} ( (1/(2π)) ∫_{−π}^{π} f(t) e−kit dt ) ekit.
For any function f in V, and any integer k, we define the complex number ak(f) = (1/(2π)) ∫_{−π}^{π} f(t) e−kit dt. This number ak(f) is called the kth Fourier coefficient of f. The projection of f onto Un is thus given by Σ_{k=−n}^{n} ak(f) ekit.
Example: Consider the function f(t) = |t| in V. Let's project it to the subspace Un. Its inner product with 1 is π^2. Its inner product with ekit, for k ≠ 0, is
    ∫_{−π}^{0} (−t) e−kit dt + ∫_{0}^{π} t e−kit dt.
We evaluate the indefinite integral ∫ t e−kit dt by parts, setting u = t and dv = e−kit dt.
Putting all this together, we find that the projection of the function |t| onto Un is the function
    π/2 − (2/π) Σ_{k odd, 1≤k≤n} (1/k^2)(ekit + e−kit) = π/2 − (4/π) Σ_{k odd, 1≤k≤n} (1/k^2) cos(kt).
The next result, which is by no means easy to prove, tells us that this is true for
every f in V , and moreover that we have convergence at every point in (−π, π).
Theorem 20.3.1: For every function f in V, we have
    ||f||^2 = 2π Σ_{k=−∞}^{∞} |ak(f)|^2.
The series
    Σ_{k=−∞}^{∞} ak(f) ekit = Σ_{k=−∞}^{∞} ( (1/(2π)) ∫_{−π}^{π} f(t) e−kit dt ) ekit
is called the Fourier series for f.
Using Theorem 20.3.1, we can return to our example, where we found approxima-
tions for |t| in each Un , and conclude that, for all t in (−π, π),
    |t| = π/2 − (4/π) Σ_{k odd, 1≤k<∞} (1/k^2) cos(kt).
This is a remarkable enough formula by itself. We can go one step further and derive a formula for the sum of the reciprocals of all the squares. Setting t = 0 in the formula above gives 0 = π/2 − (4/π) Σ_{k odd, 1≤k<∞} 1/k^2, so that Σ_{k odd, 1≤k<∞} 1/k^2 = π^2/8. Since every positive integer can be written uniquely as 2^i k with i ≥ 0 and k odd, we can write Σ_{k=1}^{∞} 1/k^2 = 1 + 1/4 + 1/9 + 1/16 + 1/25 + 1/36 + 1/49 + · · · as
    Σ_{i=0}^{∞} Σ_{k odd, 1≤k<∞} 1/(2^i k)^2 = (1 + 1/4 + 1/16 + 1/64 + · · ·) (π^2/8) = (4/3)(π^2/8) = π^2/6.
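Both the series for |t| and the value π^2/6 are easy to check numerically; a short sketch assuming numpy follows.

    import numpy as np

    def fourier_abs(t, n):
        # partial sum of the Fourier series of |t| up to frequency n
        ks = np.arange(1, n + 1, 2)                   # odd k only
        return np.pi/2 - (4/np.pi) * np.sum(np.cos(ks*t) / ks**2)

    print(fourier_abs(1.0, 999), abs(1.0))            # both close to 1
    print(sum(1.0/k**2 for k in range(1, 100000)), np.pi**2/6)   # both close to 1.6449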
A similar analysis can be applied to the function e^t on (−π, π), whose kth Fourier coefficient works out to be ak = (−1)^k (e^π − e^{−π})(1 + ik)/(2π(1 + k^2)). This is not yet entirely satisfactory, as the real-valued function e^t is being expressed
in terms of complex functions. To resolve this, we combine the ak and a−k terms
for each positive integer k. Notice that ak and a−k are conjugates of each other,
and that ekit and e−kit are conjugates of each other, so that ak ekit and a−k e−kit are
conjugates of each other and ak ekit + a−k e−kit is thus twice the real part of ak ekit .
Using the identity ekit = cos(kt) + i sin(kt), we see that the real part of ak ekit is the real part of
    ((−1)^k/(2π)) ((1 + ik)/(1 + k^2)) (e^π − e^{−π}) (cos(kt) + i sin(kt)),
which is equal to
    ((−1)^k/(2π)) (e^π − e^{−π}) ( cos(kt)/(1 + k^2) − k sin(kt)/(1 + k^2) ).
If we evaluate at t = 0, then all the terms with sin disappear, and all the cosines
are equal to 1, so we get
    1 = (1/(2π))(e^π − e^{−π}) + ((e^π − e^{−π})/π) Σ_{k=1}^{∞} (−1)^k/(1 + k^2).
Multiplying by π/(e^π − e^{−π}) and rearranging gives:
    Σ_{k=1}^{∞} (−1)^{k+1}/(1 + k^2) = 1/2 − 1/5 + 1/10 − 1/17 + · · · = 1/2 − π/(e^π − e^{−π}).
k=1
This is perhaps a little less elegant than the formulae we got from our first example,
but it is no less remarkable!
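Again the identity is easy to confirm numerically; a minimal sketch assuming numpy:

    import numpy as np

    lhs = sum((-1)**(k+1) / (1 + k**2) for k in range(1, 200001))
    rhs = 0.5 - np.pi / (np.exp(np.pi) - np.exp(-np.pi))
    print(lhs, rhs)        # both approximately 0.364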
Exercises
Exercises will be set each week from those below. Not all exercises will be set, and
they may be set in a different order from the order they appear in.
Acknowledgments.
Many of these exercises were produced by Dr Robert Simon. Some others are taken
from the book “Advanced Mathematical Methods” by Adam Ostaszewski, by kind
permission of the author.
1. Determine if the following matrices have inverses, and if so what they are.
        [ 1  1  1 ]        [ 1  1  0 ]         [ 1  1  0  0 ]        [ 1  2  −1 ]
    (i) [ 1  2  3 ]   (ii) [ 0  1  1 ]   (iii) [ 0  0  1  1 ]   (iv) [ 1  4   1 ]
        [ 1  3  5 ]        [ 1  0  1 ]         [ 1  0  0  1 ]        [ 1  8  −1 ]
                                               [ 0  1  1  0 ]
(a) {(2, 1, 1, 1)t , (1, 2, −1, 1)t , (3, 1, 0, 2)t , (1, 2, 3, 4)t };
(b) {(1, 2, 0, 1)t , (2, 1, 0, 1)t };
(c) {(2, 1, 1, 1)t , (1, 2, −1, 1)t , (3, 1, 0, 2)t , (3, −3, 0, 2)t }.
For any set that is not linearly independent, find a basis for the subspace spanned
by the set.
4. Calculate the rank of the following matrix and find a basis for its nullspace.
1 2 3 1
2 3 1 2
−1 0 7 −1
3 4 −1 3
1 3 8 1
7. Write each of the following four vectors as a linear combination of the other
three: u1 = (1, −2, 1)t, u2 = (0, 1, 2)t, u3 = (−1, 2, 1)t, and u4 = (1, 1, 1)t.
9. Let (a0, a1, . . . ) and (b0, b1, . . . ) be two infinite sequences of real numbers such that Σ_{i=0}^{∞} ai^2 < ∞ and Σ_{i=0}^{∞} bi^2 < ∞. Show that Σ_{i=0}^{∞} (ai + bi)^2 < ∞. (This is what is needed to show that the set `2 defined in Section 1.2 is a vector space: the sum of two elements of `2 is also an element of `2.)
10. Suppose we have a matrix A, and we obtain A′ from A by the elementary row operation of adding the first row of A to the second row. (So the second row of A′ is the sum of the first two rows of A.)
(a) Show that the row space of A is equal to the row space of A′.
(You need to show that every row of A′ is in the row space of A, and vice versa.)
(b) Suppose the columns c′1, . . . , c′n of A′ satisfy, say, α1 c′1 + · · · + αn c′n = 0 for some scalars α1, . . . , αn. Show that the columns c1, . . . , cn of A also satisfy α1 c1 + · · · + αn cn = 0.
(c) Deduce that, if a set of column vectors of A is linearly independent, then the corresponding set of column vectors of A′ is also linearly independent.
You should be able to see that (a)-(c) also hold for any other elementary row oper-
ation.
Hence show that this set of functions is linearly independent if α, β and γ are all
different.
13. Show that the following pair f1 and f2 of real valued functions on (−∞, ∞) are
such that the Wronskian W (x) is zero for all choices of x, but f1 and f2 are linearly
independent: f1 (x) = x2 if x ≤ 0 and f1 (x) = 0 if x ≥ 0 and f2 (x) = 0 if x ≤ 0 and
f2 (x) = x2 if x ≥ 0.
(a) If B = ((1, 0, 1)t, (1, 1, 0)t, (0, 1, 1)t), what is the matrix A_T^{B,B} representing the linear transformation T with respect to the ordered basis B?
(See Q1.)
(b) Let C = ((2, 0, 2)t, (3, 3, 0)t, (1, 4, 3)t). Show that A_T^{C,B} is the identity matrix I3.
Why does that happen?
16. Let B = ((1, 0, 1)t, (1, 1, 0)t, (0, 1, 1)t) be the ordered basis of R3 as in the previous exercise, and consider the ordered basis C = ((1, 1)t, (−1, 1)t) of R2. Let T : R3 → R2 be the linear transformation represented, with respect to the ordered bases B and C, by the matrix
    A_T^{C,B} = [ 1  −1  1 ]
                [ 0   1  0 ] .
Let B′ = ((1, 0, −1)t, (1, −1, 0)t, (0, 1, 1)t) and C′ = ((2, −1)t, (1, 0)t) be alternative ordered bases for R3 and R2 respectively. Find A_T^{C′,B′}, the matrix representing T with respect to the ordered bases B′ and C′.
17. In R3 , the basis is changed from the standard ordered basis (e1 , e2 , e3 ) to
B = ((1, 2, 3)t , (3, 1, 2)t , (2, 3, 1)t ).
Find the matrix Q such that, if x = (x1 , x2 , x3 ) is an element of R3 written in terms
of the standard basis, then Qx is the same element written in terms of the ordered
basis B.
18. Let A be an n×n matrix, and let x be a variable. Let f (x) = det(xI −A) be the
characteristic polynomial of A. Suppose that f (x) = xn + an−1 xn−1 + · · · + a1 x + a0 .
Show that (−1)n a0 is the determinant of A. Show also that −an−1 is the trace of A,
the sum of its diagonal entries.
19. Suppose that A and B are similar matrices. Show that A2 and B 2 are similar.
If A is invertible, show that B is also invertible and that A−1 and B −1 are similar.
20. (a) Suppose that A and B are n × n matrices, and that A is invertible. Show
that the matrices AB and BA are similar.
(b) Suppose that the n × n matrices S and T are similar. Show that there are
matrices A and B, with A invertible, such that S = AB and T = BA.
(c) Let A = [ 1  −1 ] and B = [ 1  1 ] . Show that AB is not similar to BA.
            [ 0   0 ]         [ 1  1 ]
21. Let A and B be n × n matrices. Suppose that there is an invertible n × n matrix
S such that SAS −1 and SBS −1 are both diagonal matrices (only zero entries off the
diagonal). Show that AB = BA.
Does h(x1 , x2 )t , (y1 , y2 )t i = x1 y1 −x1 y2 −x2 y1 +x2 y2 also give an inner product on R2 ?
24. Let V be the vector space of continuous functions on [−1, 1]. Show that ⟨f, g⟩ = ∫_{−1}^{1} x^2 f(x)g(x) dx + f(0)g(0) defines an inner product on V.
Indicate briefly why each of the following does not define an inner product on V.
(i) ⟨f, g⟩ = ∫_{−1}^{1} x f(x)g(x) dx + f(0)g(0),
(ii) ⟨f, g⟩ = ∫_{0}^{1} x^2 f(x)g(x) dx + f(0)g(0),
(iii) ⟨f, g⟩ = ∫_{−1}^{1} x^2 f(x)g(x) dx + f(0)g(1).
25. Use the Gram-Schmidt process to find an orthonormal basis for the subspace
span({(2, 1, 2)t , (2, 0, 1)t }) of R3 .
26. Find orthonormal bases for the image and kernel of the linear transformation
represented by the matrix
1 1 1 1
M = 2 0 1 −1
0 2 1 3
(with respect to the standard bases), and extend them to orthonormal bases of the
whole spaces R3 and R4 respectively.
27. A function f : R → R is odd if f(−x) = −f(x) for all x, and even if f(−x) = f(x) for all x. Show that, if f is odd and g is even, then ∫_{−1}^{1} f(x)g(x) dx = 0.
28. Let P 2 be the space of all real polynomial functions on [−1, 1] with degree two or less. The 3-tuple of functions (1, x, x^2) is an ordered basis of this space, as in Q14. An inner product is defined on P 2 by ⟨f, g⟩ = ∫_{−1}^{1} f(x)g(x) dx. Find an orthonormal basis of P 2 with respect to this inner product.
29. (i) Suppose A is an n × n matrix with the property that xt Ay = 0 for all x, y
in Rn . Show that A = 0, i.e., every entry aij of A is equal to 0.
(ii) Deduce that, if A and B are n × n matrices with the property that xt Ay = xt By for all x, y in Rn, then A = B.
(iii) Give an example of a non-zero 2 × 2 matrix A such that xt Ax = 0 for all x
in R2 .
31. Let u = (a, b, c)t be a vector in R3, with a, c ≠ 0. Show that v = (b, −a, 0)t is
orthogonal to u, and that there is a value of x such that w = (a, b, x)t is orthogonal
to both u and v. Find a formula for x.
This is often the easiest way to extend a set of size 1 to an orthonormal basis in R3 ,
or indeed – making the appropriate adjustments – C3 .
32. (a) Find the orthogonal matrix A representing a rotation through an angle π/3
anticlockwise about an axis in the direction (−1, 1, 1)t .
(b) Find the orthogonal matrix C representing a reflection in the plane through the
origin with normal (−1, 1, 1)t .
34. Prove the Cauchy-Schwarz Inequality, the Triangle Inequality and the Gener-
alised Theorem of Pythagoras for complex inner product spaces:
(i) for any pair u, v of vectors, |hu, vi| ≤ ||u|| · ||v||;
(ii) for any pair x, y of vectors, ||x + y|| ≤ ||x|| + ||y||,
(iii) if x and y are orthogonal, then ||x − y||2 = ||x + y||2 = ||x||2 + ||y||2 .
Follow one of the lines of proof in the real case, either as in notes or as in lectures.
36. Find an orthonormal basis for the subspace of C3 spanned by (i, 0, 1)t and
(1, 1, 0)t .
38. (a) Which, if any, of the following matrices are Hermitian? Which are unitary?
What does that tell you about their eigenvalues?
    A = [  2   2i ] ;   B = [  0  i ] ;   C = [ 0  1 ] .
        [ −2i   5 ]         [ −i  0 ]         [ i  0 ]
40. Which of the following classes of n×n complex matrices are closed under matrix
multiplication? I.e., if A and B are in the class, is it true that AB is always in the
class? In order to show that such a result is not true, it is best to give an example
of matrices A and B which have the property, such that AB does not. You may
quote results from the notes.
(a) the set of Hermitian matrices?
(b) the set of unitary matrices?
(c) the set of invertible matrices?
(d) the set of normal matrices?
(e) the set of diagonal matrices?
41. Find all the triples (a, b, c) of real numbers for which the matrix
    [ 1/√2    i/√2 ]
    [ a + ib    c  ]
is:
(a) Hermitian;
(b) unitary;
(c) normal.
42. If the real matrix A is positive definite, and v is any non-zero vector in Rn ,
what can you say about the angle between v and Av?
44. Prove that, if λ1 , λ2 are distinct eigenvalues of a square matrix A, then the
corresponding eigenvectors v1 , v2 are linearly independent.
Now show that, if A has three distinct eigenvalues λ1 , λ2 , λ3 , then the corresponding
eigenvectors are linearly independent.
45. Consider the matrix
    A = [  0   2  1 ]
        [ −2   0  2 ] .
        [ −1  −2  0 ]
Find the eigenvalues and eigenvectors
of A. Write down a matrix P and a diagonal matrix D such that D = P −1 AP .
There is no need to calculate P −1 .
46. An n × n matrix is called idempotent if A2 = A. Show that 0 and 1 are the only
possible eigenvalues of an idempotent matrix.
y30 1 −1 1 y3 1
(a) Use the information given to determine the eigenvalues and corresponding eigen-
vectors of M .
(b) Find the general solution to the differential equation y0 = M y.
(c) Describe the long-term behaviour of the solution y to the differential equation
above, in general.
(d) Find all solutions y to the differential equation such that y1 (t) → 0 as t → ∞,
and y2 (0) = 0.
52. Let A be a real diagonalisable matrix whose eigenvalues are all non-negative.
Show that there is a real matrix B such that B 2 = A. Is the matrix B unique?
What if A is not diagonalisable? (You need only consider the case where A is a 2 × 2
matrix.)
Find the general solution. For which initial values (x0 , y0 , z0 ) will zk be positive for
all sufficiently large k?
56. (a) Show that, if
    J = [ λ  1 ] ,    then    e^{tJ} = [ e^{tλ}   t e^{tλ} ] .
        [ 0  λ ]                       [   0        e^{tλ} ]
(b) Find a similar formula if
    J = [ λ  1  0 ]
        [ 0  λ  1 ] .
        [ 0  0  λ ]
The expressions for the powers of a Jordan block in and after Theorem 12.2.1 should
help.
57. For each of the two matrices below, find Ak for each k, and etA , for t ∈ R. Use
these expressions to write down the general solution of y0 = Ay in each case.
    (i) A = [ −1  −2 ] ;    (ii) A = [ 0  −1 ] .
            [  2   3 ]               [ 1   0 ]
Note: for (ii), it’s easier if you don’t start by diagonalising the matrix.
58. Show that the matrix
    A = [ 0  −1 ]
        [ 1   0 ]
is normal. Find a unitary matrix U such that U∗AU is diagonal.
59. Find unitary matrices P which reduce the following hermitian matrices A to diagonal form (i.e., so that P∗AP is diagonal):
    (i) [ 0  −i ] ;    (ii) [ 1  0  1 ]
        [ i   0 ]           [ 0  4  0 ] .
                            [ 1  0  1 ]
60. Consider the matrix
    A = [ 7  0  9 ]
        [ 0  2  0 ] .
        [ 9  0  7 ]
Find an orthogonal matrix P such that PtAP is diagonal. Express A in the form λ1 E1 + λ2 E2 + λ3 E3 where the λi are the
eigenvalues of A and the Ei are 3 × 3 matrices formed by vi vit , where the vi are the
corresponding eigenvectors.
Verify that Ei Ej is equal to the zero matrix if i 6= j and otherwise Ei Ei = Ei .
65. Let A be an invertible matrix. What are the singular values of A−1 , in terms of
the singular values of A? What is the norm of A−1 ?
69. Let U = span({(1, 2, 1)t , (1, 0, 0)t }) and V = span({(0, 1, 1)t }) be subspaces of
R3 . Find the matrix A representing the projection onto U parallel to V . Find also
the matrix B representing the projection onto V parallel to U .
Evaluate A + B, and explain the answer you get.
70. Let v = (v1 , . . . , vn )t be any vector in Rn with ||v|| = 1. Define the n × n matrix
A = vvt , where vt is represented as a horizontal 1 × n matrix and v is represented
as a vertical n × 1 matrix. Show that A is symmetric and idempotent, of rank one.
Hence show that A represents orthogonal projection onto span({v}).
With A as above, set B = I − 2A. Show that B is symmetric and orthogonal, and
that B 2 = I. Is B orientation-preserving or orientation-reversing? What transfor-
mation does it represent?
Hint: what are the eigenvalues of A, counted according to multiplicity? How do the
eigenvalues and eigenvectors of B correspond to those of A?
71. Let S be the subspace of R3 spanned by (1, 2, 3)t and (1, 1, −1)t . Find the
matrix A representing orthogonal projection onto S.
73. Consider the temperature data given at the end of Section 16.2:
Find the best approximation to this data of the form T = f (t) = a + b cos t + c sin t.
Evaluate your function f (t) at the values t = 0, π/3, . . . , 5π/3.
74. Find the least squares fit for a curve of the form
    y = 6m/x + c
given the following data points.
given the following data points.
x 1 2 3 6
y 5 3 2 1
Why would it be wrong to suppose this was equivalent to the problem of fitting a
curve of the form
z(= xy) = cx + 6m
through the data points (xy, x)?
76. Given the four points (0, 0), (1, 1), (2, 4) and (4, 7) in R2 , find:
(a) the line in R2 minimising the sum of the vertical distances from the points to
the line,
(b) the line in R2 minimising the sum of the horizontal distances from the points to
the line.
77. Let P 3 be the space of real polynomial functions of degree at most 3, equipped with the inner product given by ⟨f, g⟩ = ∫_{−1}^{1} f(x)g(x) dx. In Q28, we found an orthonormal basis of the subspace P 2 = span({1, x, x^2}) of P 3.
Using this basis, find a least squares approximation to x3 in P 2 . What exactly is
being minimised by the least squares approximation?
78. Consider the following data: (−3, 3), (0, 1), (1, 4), where the first coordinate is
the value for x and second the value for y. Find the least squares approximation
using only functions of the form y = ax + b and check how closely it approximates
the data. What is the least squares approximation using functions of the form
y = ax2 + bx + c, and how closely does it approximate the data?
79. Write down the equations which u, v, w, x, y and z must satisfy in order for
    [ u  v ]
    [ w  x ]
    [ y  z ]
to be a right inverse of
    A = [ 1  −1  1 ] .
        [ 1   1  2 ]
Hence find all the right inverses of A.
Show that A has no left inverse.
82. Let A be a real matrix, with strong generalised inverse As . Show that (As )t =
(At )s , i.e., that the transpose of As is the strong generalised inverse of At .
Deduce that, if A is symmetric, then so is As .
83. Let A be the singular 2 × 2 matrix
    [ 1   x  ] ,
    [ y  xy  ]
where x and y are real numbers, not both zero. Find a formula for the strong generalised inverse As of A.
84. Assume that there are only two values for x, namely 0 and 1, but there
are four pairs of data, (0, 3), (0, 2), (1, 5), (1, 3), where the first coordinate is the
independent variable x and the second the dependent variable y. Using the strong
generalised inverse, find the least squares approximation function to the data in
span({1, x, x2 }).
85. Let P be the space of all polynomial functions. Define the linear transformation
D : P → P by D(f ) = f 0 , i.e., each function f in P is mapped to its derivative.
(i) Does D have an inverse?
(ii) Find a right inverse of D, i.e., a linear transformation D∗ : P → P such that
DD∗ is the identity.
87. Find the best approximation of the function f (t) = t in the interval (−π, π)
by a linear combination of the functions ekit , for k in some range −n ≤ k ≤ n, and
represent this approximation as a real-valued function.
Write the function t as a sum of functions of the form ck sin(kt) and dk cos(kt), for
k > 0.
Evaluate both sides of your final expression at t = π/2 to get an expression for π.
88. Find the best approximation of the function t2 in the interval (−π, π) by a linear
combination of the functions ekit , for k in some range −n ≤ k ≤ n, and represent
this approximation as a real-valued function.
By letting n tend to ∞ and evaluating at t = 0, show that
    Σ_{k=1}^{∞} (−1)^{k+1}/k^2 = 1 − 1/4 + 1/9 − 1/16 + 1/25 − · · · = π^2/12.