0.1 Review of Axler's Algorithm
We fix a standard basis e1 = (1, 0, …, 0)^T, e2 = (0, 1, 0, …, 0)^T, …, en = (0, …, 0, 1)^T.
A^m v = a0 v + a1 Av + … + a_{m−1} A^{m−1} v
for some numbers (a0, …, a_{m−1}), at least one of which is nonzero.
Remark 1. How do we construct these numbers? Form the matrix [v | Av | … | A^m v] and row-reduce. By assumption, the resulting matrix will necessarily look like [e1 | e2 | … | em | w] (because it has rank m) for some
w = (a0, a1, …, a_{m−1}, 0, …, 0)^T.
Now just read the numbers off from w.
Rearranging the equation, we obtain
A^m v − a_{m−1} A^{m−1} v − … − a1 Av − a0 v = 0,
that is,
pv(A)v = 0,
where pv(t) := t^m − a_{m−1} t^{m−1} − … − a1 t − a0.
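The recipe in Remark 1 is easy to carry out numerically. The sketch below is a minimal illustration assuming numpy; the function name and the rank-based dependence test are my own choices, not from the notes. It stacks the columns v, Av, A²v, … until the first linear dependence, solves for (a0, …, a_{m−1}), and returns the coefficients of the monic pv, highest power first.

```python
import numpy as np

def axler_poly(A, v):
    """Coefficients of p_v (monic, highest power first): the lowest-degree
    monic polynomial with p_v(A) v = 0, found via the first linear
    dependence among the Krylov columns v, Av, A^2 v, ..."""
    cols = [np.asarray(v, dtype=float)]
    while True:
        nxt = A @ cols[-1]                      # the next power A^m v
        M = np.column_stack(cols + [nxt])
        if np.linalg.matrix_rank(M) < M.shape[1]:
            # A^m v = a_0 v + a_1 Av + ... + a_{m-1} A^{m-1} v
            a, *_ = np.linalg.lstsq(np.column_stack(cols), nxt, rcond=None)
            # p_v(t) = t^m - a_{m-1} t^{m-1} - ... - a_1 t - a_0
            return np.concatenate(([1.0], -a[::-1])) + 0.0

        cols.append(nxt)

# example: for this A and v, p_v(t) = t^2 + 1
A = np.array([[0.0, -1.0], [1.0, 0.0]])
v = np.array([1.0, 0.0])
print(axler_poly(A, v))   # coefficients of t^2 + 1
```

The loop always terminates: n + 1 Krylov columns in Rⁿ must be dependent, so the dependence is found at some m ≤ n.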
It is not guaranteed that pv will have real roots; for instance, for
A = [ 0 −1 ]     and     v = [ 1 ]
    [ 1  0 ]                 [ 0 ],
we will have pv(t) = t^2 + 1. In this case, Axler’s algorithm cannot produce eigenvectors (what it can do is produce generalized eigenvectors; see Wikipedia for further explanation). For our purpose, however, we will always assume that pv has some real root λ, i.e. pv(λ) = 0.
A general fact about real polynomials: if pv(λ) = 0 for some real number λ, then we can write it as
pv(t) = (t − λ)q(t)
for some polynomial q(t) (it may or may not be the case that q(λ) = 0).¹ So from the discussion above, we have
(A − λI)q(A)v = pv(A)v = 0,
so if q(A)v ≠ 0, then q(A)v is an eigenvector of A of eigenvalue λ. If q(t) in turn has a real root ζ, we can likewise write q(t) = (t − ζ)r(t). Then by the same argument as above, since the factors commute, we have
(A − ζI)(A − λI)r(A)v = pv(A)v = 0.
So (A − λI)r(A)v (provided it is nonzero) is another eigenvector, of eigenvalue ζ. Repeating this process, we see that if we can write
pv (t) = (t − λ1 ) . . . (t − λm )
(where λi do not have to be distinct), then by swapping each (t − λi ) to the front, we can extract an
eigenvector of eigenvalue λi for each i.
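The factor-swapping step can be spot-checked numerically. This is a sketch assuming numpy, with a matrix and vector of my own choosing for which pv(t) = (t − 2)(t − 3); applying all factors except (t − λi) to v yields a vector that the remaining factor (A − λi I) kills, i.e. an eigenvector of eigenvalue λi (when nonzero).

```python
import numpy as np

def extract_eigvec(A, v, roots, i):
    """Apply every factor (A - roots[j] I) except j == i to v.
    If p_v(t) = prod_j (t - roots[j]) and the result is nonzero,
    it is an eigenvector of A with eigenvalue roots[i]."""
    n = A.shape[0]
    w = np.asarray(v, dtype=float)
    for j, lam in enumerate(roots):
        if j != i:
            w = (A - lam * np.eye(n)) @ w
    return w

A = np.array([[2.0, 0.0], [1.0, 3.0]])
v = np.array([1.0, 1.0])                   # here p_v(t) = (t - 2)(t - 3)
w2 = extract_eigvec(A, v, [2.0, 3.0], 0)   # swap (t - 2) to the front
w3 = extract_eigvec(A, v, [2.0, 3.0], 1)   # swap (t - 3) to the front
print(A @ w2, 2 * w2)                      # equal: w2 has eigenvalue 2
print(A @ w3, 3 * w3)                      # equal: w3 has eigenvalue 3
```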
So what are those pv’s? For a given matrix A, you may have heard of
• The characteristic polynomial χ(x) = det(A − xI);
• The minimal polynomial µ(x), defined as the unique lowest-degree monic² polynomial among all
polynomials Q such that Q(A) = 0.
These pv are neither. This is clear from the dependence on v: neither χ nor µ depends on the choice of a
vector. What does characterize pv is the following fact:
Lemma 1. pv is the unique lowest degree monic polynomial among all polynomials Q such that Q(A)v = 0.
Compare this with the definition of the minimal polynomial. What is true is that pv will always divide µ,
and µ will always divide χ (the latter is known as the Cayley-Hamilton theorem). Note that neither converse
holds; in particular, it is in general not true that pv(A) = 0.
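These divisibility relations can be spot-checked with polynomial division. A minimal sketch assuming numpy (the diagonal matrix is my own example): for A = diag(2, 3) and v = e1 we have pv(t) = t − 2 and χ(x) = (2 − x)(3 − x) = x² − 5x + 6, so dividing χ by pv leaves zero remainder, while pv(A) = A − 2I is visibly not the zero matrix.

```python
import numpy as np

A = np.array([[2.0, 0.0], [0.0, 3.0]])

# p_v for v = e1: A e1 = 2 e1, so p_v(t) = t - 2 (lowest-degree monic
# polynomial with p_v(A) e1 = 0)
p_v = np.array([1.0, -2.0])

# characteristic polynomial chi(x) = det(A - xI) = (2 - x)(3 - x)
chi = np.array([1.0, -5.0, 6.0])

quot, rem = np.polydiv(chi, p_v)
print(quot, rem)              # zero remainder: p_v divides chi

# ...but p_v(A) = A - 2I is not the zero matrix
print(A - 2.0 * np.eye(2))
```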
For later use, let us actually prove this characterization.
Proof. Let S denote the set of all polynomials Q such that Q(A)v = 0. By construction, we know that pv ∈ S,
and it is indeed a monic polynomial. If it did not have the lowest degree among S, then some other q ∈ S of
strictly smaller degree would satisfy q(A)v = 0; but then A^{deg(q)} v would already be within the span of
{v, Av, …, A^{deg(q)−1} v}, violating the construction of pv (we chose m to be the first power at which such
a dependence occurs). So the only remaining possibility is that there is another monic polynomial q ∈ S of
the same degree as pv; this is again not possible, for otherwise (pv − q) ∈ S would have strictly lower degree
than pv, and the same span argument would again contradict the choice of m.
The description above already pins down pv uniquely among all polynomials; for an explicit description
of this pv , see the end of this note.
Left to Right Let us start by assuming p1 , . . . , pn all have simple real roots. Note that this is a statement
that only depends on the choice of A.
For each i ∈ {1, …, n}, we consider the matrix
Ti = [ei | Aei | … | A^{mi−1} ei]
where mi is the m in Axler’s algorithm if we are running it for v = ei; in particular, we stop writing more
columns just before the columns of Ti would start to have a linear dependence. By this choice, then, the
columns of Ti are linearly independent; in other words, the linear space Img(Ti) is of dimension mi.
²That is, it can be written as x^m + (terms with lower powers of x); in other words, the leading coefficient is 1.
On the other hand, as we discussed above, when pi can be written as
pi(t) = (t − λ1) ⋯ (t − λ_{mi})
(This uses the assumption that it has all real roots)³ we can extract an eigenvector vi of eigenvalue λi for
each i. Furthermore, since no two λs are the same (This uses the fact that the roots are simple), these vi’s
are actually linearly independent, as proven in Proof 4.1.
Furthermore, each vi is in the space Img(Ti). Why is this the case? Take a look at how we got the vi’s.
For instance, for v1, we considered the factorization
p1(t) = (t − λ1)q(t),     where q(t) = (t − λ2) ⋯ (t − λ_{m1}),
and we set v1 = q(A)v (with v = e1); in particular, since q(A) is a linear combination of
I, A, …, A^{m1−1}, we know that v1 is a linear combination of v, Av, …, A^{m1−1} v, as we wanted. This argument
works for each vi, so we have checked the claim we just made.
Let’s recap what we have gotten so far: we know that
• Img(Ti), which is by definition the span of {ei, Aei, …, A^{mi−1} ei}, is of dimension mi;
• the mi eigenvectors extracted from ei are linearly independent and all lie in Img(Ti), so they form a
basis of Img(Ti); in particular, ei lies within their span.
Let Si be the matrix whose columns are the eigenvectors extracted from ei, and form
S = [S1 | … | Sn]
By the discussion above, the columns of S1 contain e1 within their span, the columns of S2 contain e2 within
their span, etc. So in particular, the columns of S, all considered together, contain {e1, …, en} within their
span. In other words, S is of rank n.
As we learned in previous classes, given a set Z of vectors (in this case all columns of S) that span a
vector space V (in this case Rn ) of dimension n, we can always choose a subset of Z of size n such that they
form a basis of V . In this case, since Z consists entirely of eigenvectors, we have formed a basis of Rn using
eigenvectors.
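The whole left-to-right construction can be run numerically for a small matrix whose pi all have simple real roots. This is a sketch under that assumption (numpy; all helper names are mine, not the notes’): for each ei we find the roots of p_{ei} via the first Krylov dependence, extract one eigenvector per root by the factor-swapping trick, pool all of them as the columns of S, and confirm that S has rank n.

```python
import numpy as np

A = np.array([[2.0, 0.0], [1.0, 3.0]])
n = A.shape[0]

def p_v_roots(A, v):
    """Roots of p_v, found via the first Krylov linear dependence."""
    cols = [np.asarray(v, dtype=float)]
    while True:
        nxt = A @ cols[-1]
        M = np.column_stack(cols + [nxt])
        if np.linalg.matrix_rank(M) < M.shape[1]:
            a, *_ = np.linalg.lstsq(np.column_stack(cols), nxt, rcond=None)
            return np.roots(np.concatenate(([1.0], -a[::-1])))
        cols.append(nxt)

pool = []                          # columns of S = [S_1 | ... | S_n]
for i in range(n):
    e = np.eye(n)[:, i]
    roots = p_v_roots(A, e)
    for k in range(len(roots)):
        w = e.copy()               # swap (t - roots[k]) to the front:
        for j, mu in enumerate(roots):
            if j != k:
                w = (A - mu * np.eye(n)) @ w
        pool.append(w)             # eigenvector of eigenvalue roots[k]

S = np.column_stack(pool)
print(np.linalg.matrix_rank(S))    # prints 2 (= n): columns span R^n
```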
Right to Left Now we assume that A has an eigenbasis, which is necessarily a set of size n. To avoid
confusion with the previous part, let us call them w1 , . . . , wn , and their eigenvalues µ1 , . . . , µn .
Problem: these eigenvalues may coincide with one another; as we’ll see below, this can cause some trouble.
(Let’s call this Trouble 1.) To avoid that, we let λ1 , . . . , λk denote the unique values among µ1 , . . . , µn (it
is only guaranteed that 1 ≤ k ≤ n). We rearrange the w’s into groups according to their eigenvalues and give
them new names:
(w1,1, …, w1,s1), (w2,1, …, w2,s2), …, (wk,1, …, wk,sk)
where wi,anything has eigenvalue λi, and si is the number of eigenvectors among {w1, …, wn} that had
eigenvalue λi. (In particular, s1 + … + sk = n.)
³In general, if you are proving an “if and only if” statement, you should always check that you have used all conditions on
one side to arrive at the other side, and vice versa: otherwise something has likely gone wrong in your proof.
Since these ws are a basis, each of the standard vectors ei can be written as a linear combination of them.
Let’s write one out:
ei = (a(i)1,1 w1,1 + … + a(i)1,s1 w1,s1) + (a(i)2,1 w2,1 + … + a(i)2,s2 w2,s2) + … + (… + a(i)k,sk wk,sk)
where a(i)something,something are constants. Now comes Trouble 2: for some j ∈ {1, . . . , k}, it may be the case
that all of a(i)j,1 , . . . , a(i)j,sj are zero; in other words, the jth group wasn’t involved in the expansion of ei .
It turns out we also need to handle this possibility; so let us introduce an indicator function Z(j), which is
0 if this happens, and 1 otherwise (that is, when someone from the jth group is involved in the expansion).
Now we cook up a polynomial
Pi = (t − λ1)^{Z(1)} ⋯ (t − λk)^{Z(k)}
Note that, by design, if the jth group wasn’t involved, then (t − λj ) doesn’t appear in this expression either.
Remark 2. A crucial distinction here: the pi are polynomials determined by A via Axler’s algorithm, which a
priori may have no relation to these Pi that we just cooked up from the set {w1, …, wn} that was handed
to us by the right-hand side of the assertion. So how can we talk about properties of pi (the left-hand
side), if we can only work with Pi? The trick is that we will eventually show that these Pi’s are precisely the
pi’s from Axler’s algorithm.
We claim that
The polynomial Pi is a monic polynomial with simple real roots, and Pi (A)ei = 0.
That it is monic and has all real roots is by construction. If we hadn’t handled Trouble 1, the λ’s might not
be distinct, thus violating the “simple” requirement above; luckily, we did handle that, so all the λ’s are
distinct. Furthermore, take any wt,u that had appeared in the expansion of ei, which by our convention has
eigenvalue λt. Since wt,u appeared there, we have Z(t) = 1, so the factor (A − λt I) occurs in Pi(A); as the
factors commute, we may apply it to wt,u first, giving
Pi(A)wt,u = [∏_{j≠t: Z(j)=1} (A − λj I)] (A − λt I) wt,u = 0,
and since this works for any wt,u that had appeared in the expansion of ei, by linearity we have that
Pi(A)ei = 0 as well. (Note this says nothing about Pi(A)wt′,u′ for any wt′,u′ that didn’t appear in the
expansion of ei.) We further claim that
If we write Pi = (t − λj)Q for any of the linear factors (t − λj) of Pi, then Q(A)ei ≠ 0.
If we hadn’t handled Trouble 2, this would have been false: if no eigenvector in group j appeared in the
expansion of ei, then we could run the same argument above for Q(A) in place of Pi(A), and it would have
gone through, giving Q(A)ei = 0. The introduction of the indicator function Z is precisely to handle this. In our case, we
can write ei = w_{grp≠j} + w_{grp=j}, where the former is the part that belonged to all other groups (including
the coefficients a(i)’s) and the latter is the part that belonged to the λj group. Our handling of Trouble 2
guaranteed that w_{grp=j} ≠ 0.
We check that Q(A)w_{grp≠j} = 0 and Q(A)w_{grp=j} ≠ 0, thereby concluding that Q(A)ei ≠ 0. The former
is quick: run the same argument as before with Q(A) in place of Pi(A) and for wt,u with t ≠ j; the same
argument goes through to show that Q(A)wt,u = 0 for all t ≠ j. To check that Q(A)w_{grp=j} ≠ 0, note that
by construction, we have Aw_{grp=j} = λj w_{grp=j}, so
Q(A)w_{grp=j} = [∏_{r≠j: Z(r)=1} (A − λr I)] w_{grp=j} = [∏_{r≠j: Z(r)=1} (λj − λr)] w_{grp=j} ≠ 0,
since all the λ’s are distinct and w_{grp=j} ≠ 0.
To summarize, we have shown:
• Pi is a monic polynomial with simple real roots, and Pi(A)ei = 0;
• If we write Pi = (t − λj)Q for any of the linear factors (t − λj) of Pi, then Q(A)ei ≠ 0.
Note that we’re not done yet: what we need to show is that pi has the second property, not Pi . (Of
course, the proof goes through for each i in the same way.) So what remains to be proven is that
The conditions above actually guarantee that Pi = pi .
We will need the characterization of pi given in the previous section in Lemma 1. Namely, since Pi ∈ S (see
the proof thereof), we have that deg(Pi(t)) ≥ deg(pi(t)). Then by Euclid’s algorithm⁴, we can write
Pi(t) = pi(t)q(t) + r(t)
where q(t), r(t) are some polynomials, and deg(r(t)) < deg(pi(t)). Now we have
r(A)ei = Pi(A)ei − q(A)pi(A)ei = 0 − q(A) · 0 = 0.
And if r(t) ≠ 0, by dividing by the leading coefficient we could make r(t) into a monic polynomial in S of
degree lower than pi, violating the statement in Lemma 1. So we know that r(t) = 0; in other words, pi
divides Pi.
For notational simplicity, let {z1, …, zℓ} denote the numbers i in {1, …, k} such that Z(i) = 1. Then
we can write
(t − λ_{z1}) ⋯ (t − λ_{zℓ}) = Pi(t) = pi(t)q(t)
Each λ_{zj} is a root of the left-hand side, and is therefore a root of the right-hand side, so it must be
either a root of pi or a root of q. (It cannot be both, for then we would have a repeated root on the left-hand
side.) However, if some λ_{zj} actually were a root of q(t), then Pi(t)/(t − λ_{zj}) would still be divisible by pi(t);
writing Y(t) = Pi(t)/(t − λ_{zj}), we would have Y(A)ei = 0 (because pi(A)ei = 0), which violates the last point
that we proved above. Thus q(t) has none of the λ_{zj} as roots; but Pi(t) factors into distinct linear factors,
so q(t) must be a nonzero constant. Since both Pi and pi are monic, q(t) is forced to equal 1. We have thus
proven that Pi = pi.
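The equality Pi = pi can be sanity-checked on a concrete example (numpy; the matrix and helper are my own illustration, not from the notes). For A = diag(2, 2, 3) with eigenbasis e1, e2, e3, the expansion of e1 uses only the eigenvalue-2 group, so Z = 0 for the eigenvalue-3 group (Trouble 2) and P1(t) = t − 2; Axler’s algorithm run on e1 recovers exactly those coefficients, while a vector meeting both groups picks up both factors.

```python
import numpy as np

A = np.diag([2.0, 2.0, 3.0])   # eigenbasis e1, e2, e3; eigenvalues 2, 2, 3

def axler_poly(A, v):
    """Coefficients (monic, highest power first) of p_v, from the first
    Krylov linear dependence."""
    cols = [np.asarray(v, dtype=float)]
    while True:
        nxt = A @ cols[-1]
        M = np.column_stack(cols + [nxt])
        if np.linalg.matrix_rank(M) < M.shape[1]:
            a, *_ = np.linalg.lstsq(np.column_stack(cols), nxt, rcond=None)
            return np.concatenate(([1.0], -a[::-1])) + 0.0
        cols.append(nxt)

# P_1 built from the eigen-expansion of e1: only the eigenvalue-2 group
# appears, so P_1(t) = (t - 2)^1 (t - 3)^0 = t - 2
p_1 = axler_poly(A, np.array([1.0, 0.0, 0.0]))
print(p_1)        # coefficients of t - 2, matching P_1

# a vector meeting both groups picks up both factors:
p_mixed = axler_poly(A, np.array([1.0, 0.0, 1.0]))
print(p_mixed)    # coefficients of (t - 2)(t - 3) = t^2 - 5t + 6
```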
The three Jordan blocks are J1 = (3) (with λ1 = 3), J2 = (2) (with λ2 = 2), and
J3 = [ 3 1 ]
     [ 0 3 ]
(with λ3 = 3).
Let di denote the size of Ji, so we have d1 + … + dk = n. Let us consider the vector P v; it will now look
like
P v = (c1,1, …, c1,d1, c2,1, …, c2,d2, …, ck,1, …, ck,dk)^T
where (ci,1, …, ci,di) are the entries corresponding to the ith Jordan block. For a fixed i, let z(i) denote the
smallest number within {1, …, di} such that ci,z(i) is nonzero, if there is such a nonzero entry at all;
otherwise let z(i) = di + 1. Since the λi’s are potentially equal to one another, we can partition {1, …, k}
into ℓ subsets Group1, …, Groupℓ such that i, j ∈ {1, …, k} are in the same group if and only if λi = λj.
In other words, each group corresponds to one unique eigenvalue; denote this eigenvalue for group r by ζr.
Finally, let Z(r) denote the maximal value of di + 1 − z(i) for all i in group r. Then we have the following
expression for pv :
pv = ∏_{r=1}^{ℓ} (t − ζr)^{Z(r)}