0.1 Review of Axler's Algorithm
We fix a standard basis e1 = (1, 0, …, 0)^T, e2 = (0, 1, 0, …, 0)^T, …, en = (0, …, 0, 1)^T.
A^m v = a0 v + a1 Av + … + a_{m−1} A^{m−1} v
for some numbers (a0, …, a_{m−1}), at least one of which is nonzero.
Remark 1. How do we construct these numbers? Form the matrix [v | Av | … | A^m v] and row-reduce. By assumption, the resulting matrix will necessarily look like [e1 | e2 | … | em | w] (because it has rank m) for some
w = (a0, a1, …, a_{m−1}, 0, …, 0)^T.
Now just read the numbers off from w.
Rearranging the equation, we obtain
A^m v − a_{m−1} A^{m−1} v − … − a1 Av − a0 v = 0,
that is,
pv(A)v = 0,
where pv(t) := t^m − a_{m−1} t^{m−1} − … − a1 t − a0.
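The recipe in Remark 1 is easy to carry out numerically. The sketch below is a minimal illustration assuming numpy; the function name and the rank-based dependence test are my own choices, not from the notes. It stacks the columns v, Av, A²v, … until the first linear dependence, solves for (a0, …, a_{m−1}), and returns the coefficients of the monic pv, highest power first.

```python
import numpy as np

def axler_poly(A, v):
    """Coefficients of p_v (monic, highest power first): the lowest-degree
    monic polynomial with p_v(A) v = 0, found via the first linear
    dependence among the Krylov columns v, Av, A^2 v, ..."""
    cols = [np.asarray(v, dtype=float)]
    while True:
        nxt = A @ cols[-1]                      # the next power A^m v
        M = np.column_stack(cols + [nxt])
        if np.linalg.matrix_rank(M) < M.shape[1]:
            # A^m v = a_0 v + a_1 Av + ... + a_{m-1} A^{m-1} v
            a, *_ = np.linalg.lstsq(np.column_stack(cols), nxt, rcond=None)
            # p_v(t) = t^m - a_{m-1} t^{m-1} - ... - a_1 t - a_0
            return np.concatenate(([1.0], -a[::-1])) + 0.0

        cols.append(nxt)

# example: for this A and v, p_v(t) = t^2 + 1
A = np.array([[0.0, -1.0], [1.0, 0.0]])
v = np.array([1.0, 0.0])
print(axler_poly(A, v))   # coefficients of t^2 + 1
```

The loop always terminates: n + 1 Krylov columns in Rⁿ must be dependent, so the dependence is found at some m ≤ n.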
It is not guaranteed that pv will have real roots; for instance, for
A = [ 0 −1 ]     and     v = [ 1 ]
    [ 1  0 ]                 [ 0 ],
we will have pv(t) = t^2 + 1. In this case, Axler’s algorithm cannot produce eigenvectors (what it can do is produce generalized eigenvectors; see Wikipedia for further explanation). For our purpose, however, we will always assume that pv has some real root λ, i.e. pv(λ) = 0.
A general fact about real polynomials: if pv(λ) = 0 for some real number λ, then we can write it as
pv(t) = (t − λ)q(t)
for some polynomial q(t) (it may or may not be the case that q(λ) = 0).¹ So from the discussion above, we have
(A − λI)q(A)v = pv(A)v = 0,
so if q(A)v ≠ 0, then q(A)v is an eigenvector of A of eigenvalue λ. If q(t) in turn has a real root ζ, we can likewise write q(t) = (t − ζ)r(t). Then by the same argument as above, since the factors commute, we have
(A − ζI)(A − λI)r(A)v = pv(A)v = 0.
So (A − λI)r(A)v (provided it is nonzero) is another eigenvector, of eigenvalue ζ. Repeating this process, we see that if we can write
pv (t) = (t − λ1 ) . . . (t − λm )
(where λi do not have to be distinct), then by swapping each (t − λi ) to the front, we can extract an
eigenvector of eigenvalue λi for each i.
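The factor-swapping step can be spot-checked numerically. This is a sketch assuming numpy, with a matrix and vector of my own choosing for which pv(t) = (t − 2)(t − 3); applying all factors except (t − λi) to v yields a vector that the remaining factor (A − λi I) kills, i.e. an eigenvector of eigenvalue λi (when nonzero).

```python
import numpy as np

def extract_eigvec(A, v, roots, i):
    """Apply every factor (A - roots[j] I) except j == i to v.
    If p_v(t) = prod_j (t - roots[j]) and the result is nonzero,
    it is an eigenvector of A with eigenvalue roots[i]."""
    n = A.shape[0]
    w = np.asarray(v, dtype=float)
    for j, lam in enumerate(roots):
        if j != i:
            w = (A - lam * np.eye(n)) @ w
    return w

A = np.array([[2.0, 0.0], [1.0, 3.0]])
v = np.array([1.0, 1.0])                   # here p_v(t) = (t - 2)(t - 3)
w2 = extract_eigvec(A, v, [2.0, 3.0], 0)   # swap (t - 2) to the front
w3 = extract_eigvec(A, v, [2.0, 3.0], 1)   # swap (t - 3) to the front
print(A @ w2, 2 * w2)                      # equal: w2 has eigenvalue 2
print(A @ w3, 3 * w3)                      # equal: w3 has eigenvalue 3
```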
So what are those pv’s? For a given matrix A, you may have heard of
• The characteristic polynomial χ(x) = det(A − xI);
• The minimal polynomial µ(x), defined as the unique lowest-degree monic² polynomial among all
polynomials Q such that Q(A) = 0.
These pv are neither. This is clear from the dependence on v: neither χ nor µ depends on the choice of a
vector. What does characterize pv is the following fact:
Lemma 1. pv is the unique lowest degree monic polynomial among all polynomials Q such that Q(A)v = 0.
Compare this with the definition of the minimal polynomial. What is true is that pv will always divide µ,
and µ will always divide χ (the latter is known as the Cayley-Hamilton theorem). Note that neither converse
holds; in particular, it is in general not true that pv(A) = 0.
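These divisibility relations can be spot-checked with polynomial division. A minimal sketch assuming numpy (the diagonal matrix is my own example): for A = diag(2, 3) and v = e1 we have pv(t) = t − 2 and χ(x) = (2 − x)(3 − x) = x² − 5x + 6, so dividing χ by pv leaves zero remainder, while pv(A) = A − 2I is visibly not the zero matrix.

```python
import numpy as np

A = np.array([[2.0, 0.0], [0.0, 3.0]])

# p_v for v = e1: A e1 = 2 e1, so p_v(t) = t - 2 (lowest-degree monic
# polynomial with p_v(A) e1 = 0)
p_v = np.array([1.0, -2.0])

# characteristic polynomial chi(x) = det(A - xI) = (2 - x)(3 - x)
chi = np.array([1.0, -5.0, 6.0])

quot, rem = np.polydiv(chi, p_v)
print(quot, rem)              # zero remainder: p_v divides chi

# ...but p_v(A) = A - 2I is not the zero matrix
print(A - 2.0 * np.eye(2))
```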
For later use, let us actually prove this characterization.
Proof. Let S denote the set of all polynomials Q such that Q(A)v = 0. By construction, we know that pv ∈ S,
and it is indeed a monic polynomial. If it did not have the lowest degree among S, then some other q ∈ S of
strictly smaller degree would satisfy q(A)v = 0; but then A^{deg(q)} v would already be within the span of
{v, Av, …, A^{deg(q)−1} v}, violating the construction of pv (we chose m to be the first power at which such
a dependence occurs). So the only remaining possibility is that there is another monic polynomial q ∈ S of
the same degree as pv; this is again not possible, for otherwise (pv − q) ∈ S would have strictly lower degree
than pv, and the same span argument would again contradict the choice of m.
The description above already pins down pv uniquely among all polynomials; for an explicit description
of this pv , see the end of this note.
Left to Right Let us start by assuming p1 , . . . , pn all have simple real roots. Note that this is a statement
that only depends on the choice of A.
For each i ∈ {1, …, n}, we consider the matrix
Ti = [ei | Aei | … | A^{mi−1} ei]
where mi is the m in Axler’s algorithm if we are running it for v = ei; in particular, we stop writing more
columns just before the columns of Ti would start to have a linear dependence. By this choice, then, the
columns of Ti are linearly independent; in other words, the linear space Img(Ti) is of dimension mi.
²That is, it can be written as x^m + (terms with lower powers of x); in other words, the leading coefficient is 1.
On the other hand, as we discussed above, when pi can be written as
pi(t) = (t − λ1) ⋯ (t − λ_{mi})
(This uses the assumption that it has all real roots)³ we can extract an eigenvector vi of eigenvalue λi for
each i. Furthermore, since no two λs are the same (This uses the fact that the roots are simple), these vi’s
are actually linearly independent, as proven in Proof 4.1.
Furthermore, each vi is in the space Img(Ti). Why is this the case? Take a look at how we got the vi’s.
For instance, for v1, we considered the factorization
p1(t) = (t − λ1)q(t),     where q(t) = (t − λ2) ⋯ (t − λ_{m1}),
and we set v1 = q(A)v (with v = e1); in particular, since q(A) is a linear combination of
I, A, …, A^{m1−1}, we know that v1 is a linear combination of v, Av, …, A^{m1−1} v, as we wanted. This argument
works for each vi, so we have checked the claim we just made.
Let’s recap what we have gotten so far: we know that
• Img(Ti), which is by definition the span of {ei, Aei, …, A^{mi−1} ei}, is of dimension mi;
• the mi eigenvectors extracted from ei are linearly independent and all lie in Img(Ti), so they form a
basis of Img(Ti); in particular, ei lies within their span.
Let Si be the matrix whose columns are the eigenvectors extracted from ei, and form
S = [S1 | … | Sn]
By the discussion above, the columns of S1 contain e1 within their span, the columns of S2 contain e2 within
their span, etc. So in particular, the columns of S, all considered together, contain {e1, …, en} within their
span. In other words, S is of rank n.
As we learned in previous classes, given a set Z of vectors (in this case all columns of S) that span a
vector space V (in this case Rn ) of dimension n, we can always choose a subset of Z of size n such that they
form a basis of V . In this case, since Z consists entirely of eigenvectors, we have formed a basis of Rn using
eigenvectors.
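The whole left-to-right construction can be run numerically for a small matrix whose pi all have simple real roots. This is a sketch under that assumption (numpy; all helper names are mine, not the notes’): for each ei we find the roots of p_{ei} via the first Krylov dependence, extract one eigenvector per root by the factor-swapping trick, pool all of them as the columns of S, and confirm that S has rank n.

```python
import numpy as np

A = np.array([[2.0, 0.0], [1.0, 3.0]])
n = A.shape[0]

def p_v_roots(A, v):
    """Roots of p_v, found via the first Krylov linear dependence."""
    cols = [np.asarray(v, dtype=float)]
    while True:
        nxt = A @ cols[-1]
        M = np.column_stack(cols + [nxt])
        if np.linalg.matrix_rank(M) < M.shape[1]:
            a, *_ = np.linalg.lstsq(np.column_stack(cols), nxt, rcond=None)
            return np.roots(np.concatenate(([1.0], -a[::-1])))
        cols.append(nxt)

pool = []                          # columns of S = [S_1 | ... | S_n]
for i in range(n):
    e = np.eye(n)[:, i]
    roots = p_v_roots(A, e)
    for k in range(len(roots)):
        w = e.copy()               # swap (t - roots[k]) to the front:
        for j, mu in enumerate(roots):
            if j != k:
                w = (A - mu * np.eye(n)) @ w
        pool.append(w)             # eigenvector of eigenvalue roots[k]

S = np.column_stack(pool)
print(np.linalg.matrix_rank(S))    # prints 2 (= n): columns span R^n
```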
Right to Left Now we assume that A has an eigenbasis, which is necessarily a set of size n. To avoid
confusion with the previous part, let us call them w1 , . . . , wn , and their eigenvalues µ1 , . . . , µn .
Problem: these eigenvalues may coincide with one another; as we’ll see below, this can cause some trouble.
(Let’s call this Trouble 1.) To avoid that, we let λ1 , . . . , λk denote the unique values among µ1 , . . . , µn (it
is only guaranteed that 1 ≤ k ≤ n). We rearrange the w’s into groups according to their eigenvalues and give
them new names:
(w1,1, …, w1,s1), (w2,1, …, w2,s2), …, (wk,1, …, wk,sk)
where wi,anything has eigenvalue λi, and si is the number of eigenvectors among {w1, …, wn} that had
eigenvalue λi. (In particular, s1 + … + sk = n.)
³In general, if you are proving an “if and only if” statement, you should always check that you have used all conditions on
one side to arrive at the other side, and vice versa: otherwise something has likely gone wrong in your proof.
Since these ws are a basis, each of the standard vectors ei can be written as a linear combination of them.
Let’s write one out:
ei = (a(i)1,1 w1,1 + … + a(i)1,s1 w1,s1) + (a(i)2,1 w2,1 + … + a(i)2,s2 w2,s2) + … + (… + a(i)k,sk wk,sk)
where a(i)something,something are constants. Now comes Trouble 2: for some j ∈ {1, . . . , k}, it may be the case
that all of a(i)j,1 , . . . , a(i)j,sj are zero; in other words, the jth group wasn’t involved in the expansion of ei .
It turns out we also need to handle this possibility; so let us introduce an indicator function Z(j), which is
0 if this happens, and 1 otherwise (that is, when someone from the jth group is involved in the expansion).
Now we cook up a polynomial
Pi = (t − λ1)^{Z(1)} ⋯ (t − λk)^{Z(k)}
Note that, by design, if the jth group wasn’t involved, then (t − λj ) doesn’t appear in this expression either.
Remark 2. A crucial distinction here: the pi are polynomials determined by A via Axler’s algorithm, which a
priori may have no relation to these Pi that we just cooked up from the set {w1, …, wn} that was handed
to us by the right-hand side of the assertion. So how can we talk about properties of pi (the left-hand
side), if we can only work with Pi? The trick is that we will eventually show that these Pi’s are precisely the
pi’s from Axler’s algorithm.
We claim that
The polynomial Pi is a monic polynomial with simple real roots, and Pi (A)ei = 0.
That it is monic and has all real roots is by construction. If we hadn’t handled Trouble 1, the λ’s might not
be distinct, thus violating the “simple” requirement above; luckily, we did handle that, so all the λ’s are
distinct. Furthermore, take any wt,u that had appeared in the expansion of ei, which by our convention has
eigenvalue λt. Since wt,u appeared there, we have Z(t) = 1, so the factor (A − λt I) occurs in Pi(A); as the
factors commute, we may apply it to wt,u first, giving
Pi(A)wt,u = [∏_{j≠t: Z(j)=1} (A − λj I)] (A − λt I) wt,u = 0,
and since this works for any wt,u that had appeared in the expansion of ei, by linearity we have that
Pi(A)ei = 0 as well. (Note this says nothing about Pi(A)wt′,u′ for any wt′,u′ that didn’t appear in the
expansion of ei.) We further claim that
If we write Pi = (t − λj)Q for any of the linear factors (t − λj) of Pi, then Q(A)ei ≠ 0.
If we hadn’t handled Trouble 2, this would have been false: if no eigenvector in group j appeared in the
expansion of ei, then we could run the same argument above for Q(A) in place of Pi(A), and it would have
gone through, giving Q(A)ei = 0. The introduction of the indicator function Z is precisely to handle this. In our case, we
can write ei = w_{grp≠j} + w_{grp=j}, where the former is the part that belonged to all other groups (including
the coefficients a(i)’s) and the latter is the part that belonged to the λj group. Our handling of Trouble 2
guaranteed that w_{grp=j} ≠ 0.
We check that Q(A)w_{grp≠j} = 0 and Q(A)w_{grp=j} ≠ 0, thereby concluding that Q(A)ei ≠ 0. The former
is quick: run the same argument as before with Q(A) in place of Pi(A) and for wt,u with t ≠ j; the same
argument goes through to show that Q(A)wt,u = 0 for all t ≠ j. To check that Q(A)w_{grp=j} ≠ 0, note that
by construction, we have Aw_{grp=j} = λj w_{grp=j}, so
Q(A)w_{grp=j} = [∏_{r≠j: Z(r)=1} (A − λr I)] w_{grp=j} = [∏_{r≠j: Z(r)=1} (λj − λr)] w_{grp=j} ≠ 0,
since all the λ’s are distinct and w_{grp=j} ≠ 0.
To summarize, we have shown:
• Pi is a monic polynomial with simple real roots, and Pi(A)ei = 0;
• If we write Pi = (t − λj)Q for any of the linear factors (t − λj) of Pi, then Q(A)ei ≠ 0.
Note that we’re not done yet: what we need to show is that pi has the second property, not Pi . (Of
course, the proof goes through for each i in the same way.) So what remains to be proven is that
The conditions above actually guarantee that Pi = pi .
We will need the characterization of pi given in the previous section in Lemma 1. Namely, since Pi ∈ S (see
the proof thereof), we have that deg(Pi(t)) ≥ deg(pi(t)). Then by Euclid’s algorithm⁴, we can write
Pi(t) = pi(t)q(t) + r(t)
where q(t), r(t) are some polynomials, and deg(r(t)) < deg(pi(t)). Now we have
r(A)ei = Pi(A)ei − q(A)pi(A)ei = 0 − q(A) · 0 = 0.
And if r(t) ≠ 0, by dividing by the leading coefficient we could make r(t) into a monic polynomial in S of
degree lower than pi, violating the statement in Lemma 1. So we know that r(t) = 0; in other words, pi
divides Pi.
For notational simplicity, let {z1, …, zℓ} denote the numbers i in {1, …, k} such that Z(i) = 1. Then
we can write
(t − λ_{z1}) ⋯ (t − λ_{zℓ}) = Pi(t) = pi(t)q(t)
Each λ_{zj} is a root of the left-hand side, and is therefore a root of the right-hand side, so it must be
either a root of pi or a root of q. (It cannot be both, for then we would have a repeated root on the left-hand
side.) However, if some λ_{zj} actually were a root of q(t), then Pi(t)/(t − λ_{zj}) would still be divisible by pi(t);
writing Y(t) = Pi(t)/(t − λ_{zj}), we would have Y(A)ei = 0 (because pi(A)ei = 0), which violates the last point
that we proved above. Thus q(t) has none of the λ_{zj} as roots; but Pi(t) factors into distinct linear factors,
so q(t) must be a nonzero constant. Since both Pi and pi are monic, q(t) is forced to equal 1. We have thus
proven that Pi = pi.
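The equality Pi = pi can be sanity-checked on a concrete example (numpy; the matrix and helper are my own illustration, not from the notes). For A = diag(2, 2, 3) with eigenbasis e1, e2, e3, the expansion of e1 uses only the eigenvalue-2 group, so Z = 0 for the eigenvalue-3 group (Trouble 2) and P1(t) = t − 2; Axler’s algorithm run on e1 recovers exactly those coefficients, while a vector meeting both groups picks up both factors.

```python
import numpy as np

A = np.diag([2.0, 2.0, 3.0])   # eigenbasis e1, e2, e3; eigenvalues 2, 2, 3

def axler_poly(A, v):
    """Coefficients (monic, highest power first) of p_v, from the first
    Krylov linear dependence."""
    cols = [np.asarray(v, dtype=float)]
    while True:
        nxt = A @ cols[-1]
        M = np.column_stack(cols + [nxt])
        if np.linalg.matrix_rank(M) < M.shape[1]:
            a, *_ = np.linalg.lstsq(np.column_stack(cols), nxt, rcond=None)
            return np.concatenate(([1.0], -a[::-1])) + 0.0
        cols.append(nxt)

# P_1 built from the eigen-expansion of e1: only the eigenvalue-2 group
# appears, so P_1(t) = (t - 2)^1 (t - 3)^0 = t - 2
p_1 = axler_poly(A, np.array([1.0, 0.0, 0.0]))
print(p_1)        # coefficients of t - 2, matching P_1

# a vector meeting both groups picks up both factors:
p_mixed = axler_poly(A, np.array([1.0, 0.0, 1.0]))
print(p_mixed)    # coefficients of (t - 2)(t - 3) = t^2 - 5t + 6
```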
The three Jordan blocks are J1 = (3) (with λ1 = 3), J2 = (2) (with λ2 = 2), and
J3 = [ 3 1 ]
     [ 0 3 ]
(with λ3 = 3).
Let di denote the size of Ji, so we have d1 + … + dk = n. Let us consider the vector P v; it will now look
like
P v = (c1,1, …, c1,d1, c2,1, …, c2,d2, …, ck,1, …, ck,dk)^T
where (ci,1, …, ci,di) are the entries corresponding to the ith Jordan block. For a fixed i, let z(i) denote the
smallest number within {1, …, di} such that ci,z(i) is nonzero, if there is such a nonzero entry at all;
otherwise let z(i) = di + 1. Since the λi’s are potentially equal to one another, we can partition {1, …, k}
into ℓ subsets Group1, …, Groupℓ such that i, j ∈ {1, …, k} are in the same group if and only if λi = λj.
In other words, each group corresponds to one unique eigenvalue; denote this eigenvalue for group r by ζr.
Finally, let Z(r) denote the maximal value of di + 1 − z(i) for all i in group r. Then we have the following
expression for pv :
pv = ∏_{r=1}^{ℓ} (t − ζr)^{Z(r)}