Matrix Exponentials

These notes discuss the solution of linear systems of differential equations with constant coefficients,
\[
\frac{dY}{dt} = AY \qquad (1)
\]
where A is an n by n matrix and Y = Y (t) is a vector listing the n dependent variables. (In most of what
we’ll do, we take n = 2, since we study mainly systems of 2 equations, but the theory is the same for all n.)
If we were dealing with just one linear equation
\[
y' = ay
\]
then the general solution of the equation would be a constant multiple of $e^{at}$. It turns out that for vector
equations the solution looks like this as well, provided that we interpret what we mean by "$e^{At}$" when A is a
matrix instead of just a scalar.
How should we define $e^{At}$? The most obvious procedure is to take the power series which defines the exponential,
which as you surely remember from Calculus is
\[
e^x = 1 + x + \frac{1}{2}x^2 + \frac{1}{6}x^3 + \cdots + \frac{1}{k!}x^k + \cdots
\]
and just formally plug in $x = At$. (The answer should be a matrix, so we have to think of the term "1" as
the identity matrix.) In summary, we define:
\[
e^{At} = I + At + \frac{1}{2}(At)^2 + \frac{1}{6}(At)^3 + \cdots + \frac{1}{k!}(At)^k + \cdots
\]
where we understand the series as defining a numerical series for each entry of the matrix. One may prove that
\[
e^{-At} = \left( e^{At} \right)^{-1} \qquad (2)
\]
(which is the matrix version of $e^{-x} = 1/e^x$). We now prove that this matrix exponential has the following
property:
\[
\frac{d}{dt}\, e^{At} = A e^{At} = e^{At} A \qquad (3)
\]
for every t.
Proof. Let us differentiate the series term by term:
\[
\frac{d}{dt}\, e^{At}
 = \frac{d}{dt}\left( I + At + \frac{1}{2}(At)^2 + \frac{1}{6}(At)^3 + \cdots + \frac{1}{k!}(At)^k + \cdots \right)
\]
\[
 = 0 + A + A^2 t + \frac{1}{2}A^3 t^2 + \cdots + \frac{1}{(k-1)!}A^k t^{k-1} + \cdots
\]
\[
 = A\left( I + At + \frac{1}{2}(At)^2 + \frac{1}{6}(At)^3 + \cdots + \frac{1}{k!}(At)^k + \cdots \right)
\]
\[
 = A e^{At}
\]
and a similar proof, factoring A on the right instead of on the left, gives the equality between the derivative
and $e^{At} A$. (Small print: the differentiation term by term can be justified using facts about term-by-term
differentiation of power series inside their domain of convergence.) The property (3) is the fundamental
property of exponentials of matrices. It provides us immediately with this corollary:
The initial value problem $\dfrac{dY}{dt} = AY$, $Y(0) = Y_0$, has the unique solution $Y(t) = e^{At} Y_0$.
We can, indeed, verify that the formula $Y(t) = e^{At} Y_0$ defines a solution of the IVP:
\[
\frac{dY(t)}{dt} = \frac{d\,(e^{At} Y_0)}{dt} = \frac{d e^{At}}{dt}\, Y_0 = A e^{At} Y_0 = A \left( e^{At} Y_0 \right) = A Y(t)\,.
\]
(That it is the unique, i.e., the only, solution is proved as follows: if there were another solution $Z(t)$ of the
same IVP, then we could let $W(t) = Y(t) - Z(t)$ and notice that $W' = Y' - Z' = A(Y - Z) = AW$, and
$W(0) = Y(0) - Z(0) = 0$. Letting $V(t) = e^{-At} W(t)$, and applying the product rule, we have that
\[
V' = -A e^{-At} W + e^{-At} W' = -A e^{-At} W + e^{-At} A W = 0
\]
(using property (3) and the fact that $e^{-At}$ and A commute), so that V must be constant. Since $V(0) = W(0) = 0$, we have that V must be identically zero. Therefore
$W(t) = e^{At} V(t)$ is also identically zero, which, because $W = Y - Z$, means that the functions Y and Z are
one and the same, which is what we claimed.)
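If you want to see this corollary in action on a computer, here is a small numerical sketch (it is only an aside, and it assumes NumPy and SciPy are installed; the matrix A and the vector Y0 below are arbitrary choices for illustration). It compares $e^{At}Y_0$, computed with scipy.linalg.expm, against a numerical solution of the initial value problem:

    import numpy as np
    from scipy.linalg import expm
    from scipy.integrate import solve_ivp

    # An arbitrary 2 by 2 matrix and initial vector, chosen only for illustration.
    A = np.array([[1.0, 2.0],
                  [0.0, -1.0]])
    Y0 = np.array([1.0, 1.0])

    # Solve the IVP Y' = AY, Y(0) = Y0 numerically...
    sol = solve_ivp(lambda t, y: A @ y, (0.0, 2.0), Y0,
                    t_eval=np.linspace(0.0, 2.0, 5), rtol=1e-9, atol=1e-12)

    # ...and compare with the formula Y(t) = e^{At} Y0 at the same times.
    for t, y_numeric in zip(sol.t, sol.y.T):
        print(t, np.allclose(y_numeric, expm(A * t) @ Y0, atol=1e-6))

The two computations agree to within the integration tolerances, as the corollary predicts.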
Although we started by declaring Y to be a vector, the equation $Y' = AY$ makes sense as long as Y
can be multiplied on the left by A, i.e., whenever Y is a matrix with n rows (and any number of columns).
In particular, $e^{At}$ itself satisfies this equation. The result giving uniqueness of solutions of initial value
problems applies to matrices as well, since each column satisfies the equation and has the corresponding column of
the initial data as its initial value. The value of $e^{At}$ at $t = 0$ is the n by n identity matrix, and this initial value
problem characterizes $e^{At}$. Verifying these two properties is an excellent check of a calculation of $e^{At}$; this
check plays an important role in other notes describing matrix exponentials containing trigonometric functions.
So we have, in theory, solved the general linear differential equation. A potential problem is, however,
that it is not always easy to calculate $e^{At}$.

Let us start with the simplest possible example, a diagonal matrix:
\[
A = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}. \qquad (4)
\]
We calculate the series by just multiplying A by t,
\[
At = \begin{pmatrix} t & 0 \\ 0 & 2t \end{pmatrix},
\]
and now calculating the powers of At. Notice that, because At is a diagonal matrix, its powers are very easy
to compute: we just take the powers of the diagonal entries (why? if you don’t understand, stop and think
it over right now). So, we get
\[
e^{At} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
 + \begin{pmatrix} t & 0 \\ 0 & 2t \end{pmatrix}
 + \frac{1}{2}\begin{pmatrix} t^2 & 0 \\ 0 & (2t)^2 \end{pmatrix}
 + \frac{1}{6}\begin{pmatrix} t^3 & 0 \\ 0 & (2t)^3 \end{pmatrix}
 + \cdots
 + \frac{1}{k!}\begin{pmatrix} t^k & 0 \\ 0 & (2t)^k \end{pmatrix}
 + \cdots
 = \begin{pmatrix} e^{t} & 0 \\ 0 & e^{2t} \end{pmatrix}.
\]
So, in this very special case we obtained the exponential by just taking the exponentials of the diagonal
elements and leaving the off-diagonal elements zero (observe that we did not end up with exponentials of
the non-diagonal entries, since $e^0 = 1$, not 0).
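As a quick sanity check of the power series definition (again just a sketch, assuming NumPy and SciPy are installed; the cutoff at $k = 20$ and the sample time are arbitrary), we can sum the first few terms of the series for this example and compare with scipy.linalg.expm:

    import numpy as np
    from scipy.linalg import expm
    from math import factorial

    A = np.array([[1.0, 0.0],
                  [0.0, 2.0]])   # the diagonal matrix of example (4)
    t = 0.7                      # an arbitrary sample time

    # Partial sum of the series I + At + (At)^2/2! + ... up to k = 20.
    partial = np.zeros((2, 2))
    for k in range(21):
        partial += np.linalg.matrix_power(A * t, k) / factorial(k)

    print(partial)                                   # close to diag(e^t, e^{2t})
    print(np.diag([np.exp(t), np.exp(2 * t)]))       # the claimed closed form
    print(np.allclose(partial, expm(A * t)))         # agrees with SciPy as well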
In general, computing an exponential is a little more difficult than this, and it is not enough to just take
exponentials of coefficients. Sometimes things happen that may seem surprising the first time you see them.
Let us take this example now:
\[
A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}. \qquad (5)
\]
To start the calculation of the series, we multiply A by t:
\[
At = \begin{pmatrix} 0 & t \\ -t & 0 \end{pmatrix}
\]
and again calculate the powers of At. This is a little harder than in the first example, but not too hard:
\[
(At)^2 = \begin{pmatrix} -t^2 & 0 \\ 0 & -t^2 \end{pmatrix}, \quad
(At)^3 = \begin{pmatrix} 0 & -t^3 \\ t^3 & 0 \end{pmatrix}, \quad
(At)^4 = \begin{pmatrix} t^4 & 0 \\ 0 & t^4 \end{pmatrix}, \quad
(At)^5 = \begin{pmatrix} 0 & t^5 \\ -t^5 & 0 \end{pmatrix}, \quad
(At)^6 = \begin{pmatrix} -t^6 & 0 \\ 0 & -t^6 \end{pmatrix},
\]
and so on. We won’t compute more, because by now you surely have recognized the pattern (right?). We
add these up (not forgetting the factorials, of course):
\[
e^{At} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
 + \begin{pmatrix} 0 & t \\ -t & 0 \end{pmatrix}
 + \frac{1}{2}\begin{pmatrix} -t^2 & 0 \\ 0 & -t^2 \end{pmatrix}
 + \frac{1}{3!}\begin{pmatrix} 0 & -t^3 \\ t^3 & 0 \end{pmatrix}
 + \frac{1}{4!}\begin{pmatrix} t^4 & 0 \\ 0 & t^4 \end{pmatrix}
 + \cdots
 = \begin{pmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{pmatrix}.
\]
It is remarkable that trigonometric functions have appeared. Perhaps we made a mistake? How could we
make sure? Well, let us check that property (3) holds (we’ll check only the first equality, you can check the
second one). We need to test that
\[
\frac{d}{dt}\begin{pmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{pmatrix}
 = A \begin{pmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{pmatrix}. \qquad (6)
\]
Since
\[
\frac{d}{dt}(\sin t) = \cos t \quad\text{and}\quad \frac{d}{dt}(\cos t) = -\sin t,
\]
we know that
\[
\frac{d}{dt}\begin{pmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{pmatrix}
 = \begin{pmatrix} -\sin t & \cos t \\ -\cos t & -\sin t \end{pmatrix}.
\]
On the other hand, multiplying out the right-hand side of (6) gives
\[
\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}
\begin{pmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{pmatrix}
 = \begin{pmatrix} -\sin t & \cos t \\ -\cos t & -\sin t \end{pmatrix},
\]
so the two sides of (6) are indeed equal, and no mistake was made.
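Another way to reassure ourselves (a short sketch, assuming NumPy and SciPy; it is not needed for the argument, and the sample times are arbitrary) is to let the computer evaluate $e^{At}$ for this A and compare with the cosine/sine matrix we found:

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[0.0, 1.0],
                  [-1.0, 0.0]])   # the matrix of example (5)

    for t in (0.3, 1.0, 2.5):     # a few arbitrary sample times
        closed_form = np.array([[np.cos(t), np.sin(t)],
                                [-np.sin(t), np.cos(t)]])
        print(t, np.allclose(expm(A * t), closed_form))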
Our last example is the matrix
\[
A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}. \qquad (7)
\]
As before, we start by multiplying A by t,
\[
At = \begin{pmatrix} t & t \\ 0 & t \end{pmatrix},
\]
and calculating the powers of At. It is easy to see that the powers are:
\[
(At)^k = \begin{pmatrix} t^k & k\,t^k \\ 0 & t^k \end{pmatrix}
\]
since this is obviously true for $k = 1$ and, recursively, we have
\[
(At)^{k+1} = (At)^k\,(At)
 = \begin{pmatrix} t^k & k t^k \\ 0 & t^k \end{pmatrix}
   \begin{pmatrix} t & t \\ 0 & t \end{pmatrix}
 = \begin{pmatrix} t^k\,t & t^k\,t + k t^k\,t \\ 0 & t^k\,t \end{pmatrix}
 = \begin{pmatrix} t^{k+1} & (k+1)\,t^{k+1} \\ 0 & t^{k+1} \end{pmatrix}.
\]
Therefore,
\[
e^{At} = \sum_{k=0}^{\infty} \begin{pmatrix} t^k/k! & k\,t^k/k! \\ 0 & t^k/k! \end{pmatrix}
 = \begin{pmatrix} \displaystyle\sum_{k=0}^{\infty} \frac{t^k}{k!} & \displaystyle\sum_{k=0}^{\infty} \frac{k\,t^k}{k!} \\ 0 & \displaystyle\sum_{k=0}^{\infty} \frac{t^k}{k!} \end{pmatrix}
 = \begin{pmatrix} e^{t} & t e^{t} \\ 0 & e^{t} \end{pmatrix}.
\]
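Once more, a short numerical check of this closed form (a sketch assuming NumPy and SciPy; the sample times are arbitrary):

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[1.0, 1.0],
                  [0.0, 1.0]])    # the matrix of example (7)

    for t in (0.5, 1.0, 3.0):
        closed_form = np.exp(t) * np.array([[1.0, t],
                                            [0.0, 1.0]])   # e^t and t e^t in the top row
        print(t, np.allclose(expm(A * t), closed_form))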
To summarize, we have worked out three examples:
• The first example (4) is a diagonal matrix, and we found that its exponential is obtained by taking
exponentials of the diagonal entries.
• The second example (5) gave us an exponential matrix that was expressed in terms of trigonometric
functions. Notice that this matrix has imaginary eigenvalues equal to $i$ and $-i$, where $i = \sqrt{-1}$.
• The last example (7) gave us an exponential matrix which had a nonzero function in the (1, 2)-position.
Notice that this nonzero function was not just the exponential of the (1, 2)-position in the original matrix.
That exponential would give us an $e^t$ term. Instead, we got a more complicated $t e^t$ term.
In a sense, these are all the possibilities. Exponentials of all two by two matrices can be obtained
using functions of the form $e^{at}$, $t e^{at}$, and trigonometric functions (possibly multiplied by $e^{at}$). Indeed,
exponentials of matrices of any size, not just 2 by 2, can be expressed using just polynomial combinations of t,
scalar exponentials, and trigonometric functions. We will not quite prove this fact here; you should be able
to find the details in any linear algebra book.
Calculating exponentials using power series is OK for very simple examples, and important to do a
few times, so that you understand what this all means. But in practice, one uses very different methods
for computing matrix exponentials. (Remember how you first saw the definition of derivative using limits
of incremental quotients, and computed some derivatives in this way, but soon learned how to use “the
Calculus” to calculate derivatives of complicated expressions using the multiplication rule, chain rule, and
so on.)
Computing Matrix Exponentials. We wish to calculate $e^{At}$. The key concept for simplifying the
computation of matrix exponentials is that of matrix similarity. Suppose that we have found two matrices,
$\Lambda$ and S, where S is invertible, such that this formula holds:
\[
A = S \Lambda S^{-1} \qquad (8)
\]
(if (8) holds, one says that A and $\Lambda$ are similar matrices). Then, we claim, it is also true that
\[
e^{At} = S\, e^{\Lambda t} S^{-1} \qquad (9)
\]
for all t. Therefore, if the matrix $\Lambda$ is one for which $e^{\Lambda t}$ is easy to find (for example, if it is a diagonal
matrix), we can then multiply by S and $S^{-1}$ to get $e^{At}$. To see why (9) is a consequence of (8), we just
notice that $At = S(\Lambda t)S^{-1}$ and we have the following "telescoping" property for powers:
\[
(At)^k = S(\Lambda t)S^{-1}\, S(\Lambda t)S^{-1} \cdots S(\Lambda t)S^{-1} = S(\Lambda t)^k S^{-1}
\]
since the terms may be regrouped so that all the in-between pairs $S^{-1}S$ cancel out. Therefore,
\[
e^{At} = I + At + \frac{1}{2}(At)^2 + \frac{1}{6}(At)^3 + \cdots + \frac{1}{k!}(At)^k + \cdots
\]
\[
 = I + S(\Lambda t)S^{-1} + \frac{1}{2} S(\Lambda t)^2 S^{-1} + \frac{1}{6} S(\Lambda t)^3 S^{-1} + \cdots + \frac{1}{k!} S(\Lambda t)^k S^{-1} + \cdots
\]
\[
 = S \left( I + \Lambda t + \frac{1}{2}(\Lambda t)^2 + \frac{1}{6}(\Lambda t)^3 + \cdots + \frac{1}{k!}(\Lambda t)^k + \cdots \right) S^{-1}
\]
\[
 = S e^{\Lambda t} S^{-1}
\]
as we claimed.
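If you want to try (9) on a computer, here is a small sketch (assuming NumPy and SciPy are installed; the matrix is an arbitrary diagonalizable example, and numpy.linalg.eig is used to produce S and $\Lambda$, which we will soon learn to construct by hand):

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[5.0, 6.0],
                  [-1.0, -2.0]])          # a diagonalizable example (eigenvalues 4 and -1)

    eigvals, S = np.linalg.eig(A)          # columns of S are eigenvectors
    Lam = np.diag(eigvals)                 # the diagonal matrix Lambda
    assert np.allclose(A, S @ Lam @ np.linalg.inv(S))    # A = S Lambda S^{-1}

    t = 1.3
    left = expm(A * t)
    right = S @ np.diag(np.exp(eigvals * t)) @ np.linalg.inv(S)   # S e^{Lambda t} S^{-1}
    print(np.allclose(left, right))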
The basic theorem is this one:
Theorem. For every n by n matrix A with entries in the complex numbers, one can find an
invertible matrix S and an upper triangular matrix $\Lambda$ such that (8) holds.
Remember that an upper triangular matrix is one that has the following form:
\[
\begin{pmatrix}
\lambda_1 & * & * & \cdots & * & * \\
0 & \lambda_2 & * & \cdots & * & * \\
0 & 0 & \lambda_3 & \cdots & * & * \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & \lambda_{n-1} & * \\
0 & 0 & 0 & \cdots & 0 & \lambda_n
\end{pmatrix}
\]
where the stars are arbitrary numbers. The numbers $\lambda_1, \ldots, \lambda_n$ turn out to be the eigenvalues of A.
There are two reasons that this theorem is interesting. First, it provides a way to compute exponentials,
because it is not difficult to find exponentials of upper triangular matrices (the example (7) is actually quite
typical); and second, because it has important theoretical consequences.
Although we don't need more than the theorem stated above, there are two stronger theorems that you
may meet elsewhere. One is the "Jordan canonical form" theorem, which provides a matrix $\Lambda$ that is not
only upper triangular but has an even more special structure. Jordan canonical forms are theoretically
important because they are essentially unique (that is what "canonical" means in this context). Hence, the
Jordan form allows you to determine whether or not two matrices are similar. However, it is not very useful
from a computational point of view, because it is what is known in numerical analysis as "numerically
unstable", meaning that small perturbations of A can give totally different Jordan forms. A second
strengthening is the "Schur unitary triangularization theorem", which says that one can pick the matrix S to
be unitary. (A unitary matrix is a matrix with entries in the complex numbers whose inverse is the complex
conjugate of its transpose. For a matrix S with real entries, this is the same as an orthogonal matrix.
For matrices with complex entries, unitary matrices turn out to be more useful than other generalizations
of orthogonal matrices that one might propose.) Schur's theorem is extremely useful in practice, and is
implemented in many numerical algorithms.
We do not prove the theorem here in general, but only show it for n = 2; the general case can be proved
in much the same way, by means of a recursive process.
We start the proof by remembering that every matrix has at least one eigenvalue, let us call it $\lambda$, and an
associated eigenvector, v. That is to say, v is a vector different from zero, and
\[
Av = \lambda v\,. \qquad (10)
\]
If you stumble upon a number $\lambda$ and a vector v that you believe to be an eigenvalue and its eigenvector, you
should immediately check whether (10) is satisfied, since that is an easy calculation. Numerical methods for finding
eigenvalues and eigenvectors take this approach.
For theoretical purposes, it is useful to note that the eigenvalues $\lambda$ can be characterized as the roots
of the characteristic equation
\[
\det(\lambda I - A) = 0\,.
\]
For two-dimensional systems, this is the same as the equation
\[
\lambda^2 - \operatorname{tr}(A)\,\lambda + \det(A) = 0
\]
with
\[
\operatorname{tr}\begin{pmatrix} a & b \\ c & d \end{pmatrix} = a + d
\quad\text{and}\quad
\det\begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc\,.
\]
Now, quadratic equations are easy to solve, so this approach is also computationally useful for 2 by 2
matrices.
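For instance, here is the quadratic-formula approach carried out numerically (a small sketch assuming NumPy; the matrix is an arbitrary illustration, and we compare against numpy.linalg.eigvals):

    import numpy as np

    A = np.array([[2.0, -8.0],
                  [1.0, -4.0]])

    tr = A[0, 0] + A[1, 1]                           # a + d
    det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]      # ad - bc

    # Roots of lambda^2 - tr*lambda + det = 0 (they may be complex).
    disc = complex(tr * tr - 4.0 * det)
    roots = ((tr + np.sqrt(disc)) / 2.0, (tr - np.sqrt(disc)) / 2.0)

    print(sorted(roots, key=lambda z: (z.real, z.imag)))
    print(sorted(np.linalg.eigvals(A), key=lambda z: (z.real, z.imag)))  # should agree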
There are, for 2 by 2 matrices with real entries, either two real eigenvalues, one real eigenvalue with
multiplicity two, or two complex eigenvalues. In the last case, the two complex eigenvalues must be
conjugates of each other.
Once you have $\lambda$, an eigenvector associated to the eigenvalue $\lambda$ is found by solving the linear system
\[
(A - \lambda I)v = 0
\]
(since $\lambda$ is a root of the characteristic equation, there are infinitely many solutions; we pick any nonzero
one).
With an eigenvalue $\lambda$ and eigenvector v found, we next pick any vector w with the property that the
two vectors v and w are linearly independent. For example, if
\[
v = \begin{pmatrix} a \\ b \end{pmatrix}
\]
and a is not zero, we can take
\[
w = \begin{pmatrix} 0 \\ 1 \end{pmatrix}
\]
(what would you pick for w if a were zero?). Now, since the set $\{v, w\}$ forms a basis of
two-dimensional space (this is the key idea for all n: once you know v, you need to find $n - 1$ other vectors to
fill out a basis containing v), we can find coefficients c and d so that
\[
Aw = cv + dw\,. \qquad (11)
\]
To complete the construction, we let $S = (v\ w)$, where $(v\ w)$ denotes the 2 by 2 matrix whose columns are the
vectors v and w, and
\[
\Lambda = \begin{pmatrix} \lambda & c \\ 0 & d \end{pmatrix}.
\]
Then
\[
AS = S\Lambda
\]
(to check this, note that $AS = (Av\ \ Aw) = (\lambda v\ \ cv + dw)$, which is exactly $S\Lambda$),
which is the same as what we wanted to prove, namely $A = S\Lambda S^{-1}$. Actually, we can even say more. It is a
fundamental fact in linear algebra that, if two matrices are similar, then their eigenvalues must be the same.
Now, the eigenvalues of $\Lambda$ are $\lambda$ and d, because the eigenvalues of any triangular matrix are its diagonal
elements. Therefore, since A and $\Lambda$ are similar, d must also be an eigenvalue of A.
The proof of Schur’s theorem follows the same pattern, except for having fewer choices for v and w.
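Here is the construction of S and $\Lambda$ from the proof, carried out numerically for a sample matrix (a sketch assuming NumPy; the matrix and the choice of w are arbitrary illustrations of the recipe above):

    import numpy as np

    A = np.array([[1.0, 1.0],
                  [0.0, 0.0]])             # an arbitrary 2 by 2 example

    eigvals, eigvecs = np.linalg.eig(A)
    lam = eigvals[0]
    v = eigvecs[:, 0]                      # an eigenvector for lam: Av = lam v

    # Pick w linearly independent of v, following the recipe in the text.
    w = np.array([0.0, 1.0]) if abs(v[0]) > 1e-12 else np.array([1.0, 0.0])

    # Write Aw = c v + d w by solving the linear system (v w)(c, d)^T = Aw.
    c, d = np.linalg.solve(np.column_stack([v, w]), A @ w)

    S = np.column_stack([v, w])
    Lam = np.array([[lam, c],
                    [0.0, d]])
    print(np.allclose(A @ S, S @ Lam))     # AS = S*Lambda, i.e. A = S Lambda S^{-1}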
The Three Cases for n = 2. The following special cases are worth discussing in detail:
1. A has two different real eigenvalues.
2. A has two complex conjugate eigenvalues.
3. A has a repeated real eigenvalue.
In cases 1 and 2, one can always find a diagonal matrix $\Lambda$. To see why this is true, let us go back to the
proof, but now, instead of taking just any linearly independent vector w, let us pick a special one, namely an
eigenvector corresponding to the other eigenvalue of A:
\[
Aw = \mu w\,.
\]
This vector is always linearly independent of v, so the proof can be completed as before. Notice that $\Lambda$ is
now diagonal, because $d = \mu$ and $c = 0$.
To prove that v and w are linearly independent if they are eigenvectors for different eigenvalues, assume
the contrary and show that this leads to a contradiction. Thus, suppose that $\alpha v + \beta w = 0$. Applying A to both sides gives
\[
\alpha \lambda v + \beta \mu w = 0\,.
\]
On the other hand, multiplying $\alpha v + \beta w = 0$ by $\lambda$ we would have $\alpha\lambda v + \beta\lambda w = 0$. Subtracting gives
$\beta(\lambda - \mu)w = 0$, and as $\lambda - \mu \neq 0$ we would arrive at the conclusion that $\beta w = 0$. But w, being an
eigenvector, is required to be nonzero, so we would have to have $\beta = 0$. Plugging this back into our linear
dependence would give $\alpha v = 0$, which would require $\alpha = 0$ as well. This shows us that there are no nonzero
coefficients $\alpha$ and $\beta$ for which $\alpha v + \beta w = 0$, which means that the eigenvectors v and w are linearly
independent.
Notice that in cases 1 and 3, the matrices $\Lambda$ and S are both real. In case 1, we will interpret the solutions
with initial conditions on the lines that contain v and w as "straight line solutions", and this is the subject of
Section 3.2 in the book.
In case 2, the matrices $\Lambda$ and S are, in general, not real. Note that, in case 2, if $Av = \lambda v$, taking
complex conjugates gives
\[
A\bar v = \bar\lambda \bar v
\]
and we note that
\[
\bar\lambda \neq \lambda
\]
because $\lambda$ is not real. So, we can always pick w to be the conjugate of v. It will turn out that solutions can
be re-expressed in terms of trigonometric functions — remember example (5) — as we'll see in the next
section and in Section 3.4 of the book.
Now let's consider Case 3 (the repeated real eigenvalue). We have that
\[
\Lambda = \begin{pmatrix} \lambda & c \\ 0 & \lambda \end{pmatrix}
 = \lambda I + cN\,, \quad\text{where } N = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.
\]
Observe that
\[
(\lambda I + cN)^2 = (\lambda I)^2 + c^2 N^2 + 2\lambda c N = \lambda^2 I + 2\lambda c N
\]
(because $N^2 = 0$) and, for the general power k, recursively:
\[
(\lambda I + cN)^k = \bigl(\lambda^{k-1} I + (k-1)\lambda^{k-2} c N\bigr)(\lambda I + cN) = \lambda^k I + k \lambda^{k-1} c N\,,
\]
so
\[
(\lambda I + cN)^k t^k = \bigl(\lambda^k I + k\lambda^{k-1} c N\bigr) t^k
 = \begin{pmatrix} \lambda^k t^k & k\lambda^{k-1} c\, t^k \\ 0 & \lambda^k t^k \end{pmatrix}
\]
and therefore
\[
e^{\Lambda t} = \begin{pmatrix} e^{\lambda t} & c\,t\,e^{\lambda t} \\ 0 & e^{\lambda t} \end{pmatrix} \qquad (12)
\]
because $0 + ct + (2\lambda c)\,t^2/2! + (3\lambda^2 c)\,t^3/3! + \cdots = ct\,e^{\lambda t}$. (This generalizes the special case in example (7).)
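Here is a quick numerical confirmation of (12) (a sketch assuming NumPy and SciPy; the values of $\lambda$, c, and t are arbitrary):

    import numpy as np
    from scipy.linalg import expm

    lam, c, t = -0.5, 2.0, 1.7              # arbitrary sample values

    Lam = np.array([[lam, c],
                    [0.0, lam]])
    closed_form = np.exp(lam * t) * np.array([[1.0, c * t],
                                              [0.0, 1.0]])   # formula (12)
    print(np.allclose(expm(Lam * t), closed_form))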
A Shortcut. If we just want to find the form of the general solution of $Y' = AY$, we do not need to actually
calculate the exponential of A and the inverse of the matrix S.
Let us first take the cases of different eigenvalues (real or complex, that is, cases 1 or 2; it doesn't matter
which one). As we saw, $\Lambda$ can be taken to be the diagonal matrix consisting of these eigenvalues (which we
call here $\lambda$ and $\mu$ instead of $\lambda_1$ and $\lambda_2$), and $S = (v\ w)$ just lists the two eigenvectors as its columns. We then
know that the solution of every initial value problem $Y' = AY$, $Y(0) = Y_0$, will be of the following form:
\[
Y(t) = e^{At} Y_0 = S e^{\Lambda t} S^{-1} Y_0
 = (v\ w) \begin{pmatrix} e^{\lambda t} & 0 \\ 0 & e^{\mu t} \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix}
 = a\, e^{\lambda t} v + b\, e^{\mu t} w\,, \qquad (13)
\]
where $\begin{pmatrix} a \\ b \end{pmatrix} = S^{-1} Y_0$.
Suppose now that we are in case 2, so $\lambda$ is not real. Let us separate the first product into its real and imaginary parts,
\[
e^{\lambda t} v = Y_1(t) + i\,Y_2(t)\,, \qquad (14)
\]
where $Y_1$ and $Y_2$ are vector functions with real entries (let us not worry for now about what the two functions $Y_1$
and $Y_2$ look like). Since $\mu$ is the conjugate of $\lambda$ and w is the conjugate of v, the second term is
\[
b\, e^{\mu t} w = b\, \overline{e^{\lambda t} v} = b\,\bigl(Y_1(t) - i\,Y_2(t)\bigr)\,.
\]
So we can write the general solution shown in (13) also like this:
\[
(a + b)\, Y_1(t) + i\,(a - b)\, Y_2(t)\,.
\]
Now, it is easy to see that a and b must be conjugates of each other. (Do this as an optional homework
problem. Use the fact that these two coefficients are the components of $S^{-1} Y_0$, and the fact that $Y_0$ is real
and that the two columns of S are conjugates of each other.) This means that both coefficients $a + b$ and
$i(a - b)$ are real numbers. Calling these coefficients "$k_1$" and "$k_2$", we can summarize the complex case
like this:

The general solution of $Y' = AY$, when A has a non-real eigenvalue $\lambda$ with respective eigenvector
v, is of the form
\[
k_1 Y_1(t) + k_2 Y_2(t) \qquad (17)
\]
for some real constants $k_1$ and $k_2$. The functions $Y_1$ and $Y_2$ are found by the following procedure:
calculate the product $e^{\lambda t} v$ and separate it into real and imaginary parts as in Equation (14).
What do $Y_1$ and $Y_2$ really look like? This is easy to answer using Euler's formula, which gives
\[
e^{\lambda t} = e^{\alpha t + i\beta t} = e^{\alpha t}(\cos \beta t + i \sin \beta t) = e^{\alpha t}\cos \beta t + i\, e^{\alpha t}\sin \beta t
\]
where $\alpha$ and $\beta$ are the real and imaginary parts of $\lambda$, respectively. This is what we do in Section 3.4 of the
book.
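If you would like to see this procedure at work numerically, here is a sketch (assuming NumPy and SciPy; the matrix, the initial condition, and the sample time are arbitrary). It builds $Y_1$ and $Y_2$ from $e^{\lambda t} v$, matches the constants $k_1$, $k_2$ to an initial condition, and compares with $e^{At} Y_0$:

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[0.0, 2.0],
                  [-2.0, 0.0]])            # complex eigenvalues +2i and -2i
    Y0 = np.array([1.0, 3.0])

    eigvals, eigvecs = np.linalg.eig(A)
    lam = eigvals[0]                        # a non-real eigenvalue
    v = eigvecs[:, 0]                       # a corresponding (complex) eigenvector

    def Y1(t):                              # real part of e^{lam t} v
        return (np.exp(lam * t) * v).real

    def Y2(t):                              # imaginary part of e^{lam t} v
        return (np.exp(lam * t) * v).imag

    # Match k1 Y1(0) + k2 Y2(0) = Y0 by solving a 2x2 real linear system.
    k1, k2 = np.linalg.solve(np.column_stack([Y1(0.0), Y2(0.0)]), Y0)

    t = 0.9
    print(np.allclose(k1 * Y1(t) + k2 * Y2(t), expm(A * t) @ Y0))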
Finally, in case 3 (repeated eigenvalues) we can write, instead:
\[
Y(t) = e^{At} Y_0 = S e^{\Lambda t} S^{-1} Y_0
 = (v\ w) \begin{pmatrix} e^{\lambda t} & c\,t\,e^{\lambda t} \\ 0 & e^{\lambda t} \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix}
 = a\, e^{\lambda t} v + b\, e^{\lambda t}(c t v + w)\,.
\]
When $c = 0$ we have from $A = S\Lambda S^{-1}$ that A must have been the diagonal matrix
\[
\begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix}
\]
to start with (because $\Lambda = \lambda I$ commutes with S, so $S\Lambda S^{-1} = \Lambda$). When $c \neq 0$, we can write $k_2 = bc$ and redefine w as $\frac{1}{c} w$. Note
that then (11) becomes $Aw = v + \lambda w$, that is, $(A - \lambda I)w = v$. Any vector w with this property is linearly
independent from v (why?).
So we conclude, for the case of repeated eigenvalues:

The general solution of $Y' = AY$, when A has a repeated (real) eigenvalue $\lambda$, is either of the form
$e^{\lambda t} Y_0$ (if A is a diagonal matrix) or, otherwise, is of the form
\[
k_1\, e^{\lambda t} v + k_2\, e^{\lambda t}(t v + w) \qquad (18)
\]
for some real constants $k_1$ and $k_2$, where v is an eigenvector corresponding to $\lambda$ and w is any vector
which satisfies $(A - \lambda I)w = v$.

Observe that $(A - \lambda I)^2 w = (A - \lambda I)v = 0$. In general, one calls any nonzero vector w such that $(A - \lambda I)^k w = 0$
a generalized eigenvector (of order k) of the matrix A (since, when $k = 1$, we recover ordinary eigenvectors).
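Numerically, one can find such a w by solving the (singular but consistent) system $(A - \lambda I)w = v$, for example with a least-squares routine. Here is a sketch (assuming NumPy and SciPy; the matrix, the coefficients $k_1$, $k_2$, and the sample time are arbitrary):

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[3.0, 1.0],
                  [0.0, 3.0]])              # repeated eigenvalue lambda = 3, not diagonal
    lam = 3.0
    v = np.array([1.0, 0.0])                # an eigenvector: (A - lam I) v = 0

    # A generalized eigenvector: any solution of (A - lam I) w = v.
    w, *_ = np.linalg.lstsq(A - lam * np.eye(2), v, rcond=None)
    assert np.allclose((A - lam * np.eye(2)) @ w, v)
    assert np.allclose(np.linalg.matrix_power(A - lam * np.eye(2), 2) @ w, 0.0)

    # Check the general solution (18) against e^{At} Y0 with Y0 = k1 v + k2 w.
    k1, k2, t = 2.0, -1.0, 0.8
    Y0 = k1 * v + k2 * w
    Y = k1 * np.exp(lam * t) * v + k2 * np.exp(lam * t) * (t * v + w)
    print(np.allclose(Y, expm(A * t) @ Y0))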
Forcing Terms. The use of matrix exponentials also helps explain much of what is done in chapter 4
(forced systems), and renders Laplace transforms unnecessary. Let us consider non-homogeneous linear
differential equations of this type:
\[
\frac{dY}{dt}(t) = AY(t) + u(t)\,. \qquad (19)
\]
We wrote the arguments “t” just this one time, to emphasize that everything is a function of t, but from now
on we will drop the t’s when they are clear from the context.
Let us write, just as we did when discussing scalar linear equations, $Y' - AY = u$. We consider
the "integrating factor" $M(t) = e^{-At}$. Multiplying both sides of the equation by M, we have, since
$(e^{-At} Y)' = e^{-At} Y' - e^{-At} A Y$ (right?):
\[
\frac{d\,(e^{-At} Y)}{dt} = e^{-At} u\,.
\]
Taking antiderivatives:
\[
e^{-At} Y = \int_0^t e^{-As} u(s)\, ds + Y_0
\]
for some constant vector $Y_0$. Finally, multiplying by $e^{At}$ and remembering that $e^{At} e^{-At} = I$, we conclude:
\[
Y(t) = e^{At} Y_0 + e^{At} \int_0^t e^{-As} u(s)\, ds\,. \qquad (20)
\]
This is sometimes called the "variation of parameters" form of the general solution of the forced equation (19).
Of course, $Y_0 = Y(0)$ (just plug in $t = 0$ on both sides). There are other notes on this topic.

Notice that, if the vector function $u(t)$ is a polynomial in t, then the integral in (20) will be a combination
of exponentials and powers of t (integrate by parts). Similarly, if $u(t)$ is a combination of trigonometric
functions, the integral will also combine trigonometric functions and polynomials. This observation justifies
the "guesses" made for forced systems in chapter 4 (they are, of course, not really guesses but consequences of
integration by parts, though the book would lead you to believe otherwise).
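As a final sketch (again assuming NumPy and SciPy; the matrix A, the forcing u, the initial vector, and the grid are arbitrary choices), here is formula (20) evaluated by numerical quadrature and compared with a direct numerical solution of the forced equation:

    import numpy as np
    from scipy.linalg import expm
    from scipy.integrate import solve_ivp, trapezoid

    A = np.array([[0.0, 1.0],
                  [-1.0, 0.0]])
    Y0 = np.array([1.0, 0.0])
    u = lambda t: np.array([np.cos(2.0 * t), t])    # an arbitrary forcing term

    T = 2.0
    # Evaluate the integral in (20) on a fine grid with the trapezoid rule.
    s = np.linspace(0.0, T, 2001)
    integrand = np.array([expm(-A * si) @ u(si) for si in s])
    integral = trapezoid(integrand, s, axis=0)
    Y_formula = expm(A * T) @ Y0 + expm(A * T) @ integral

    # Compare with a direct numerical solution of Y' = AY + u.
    sol = solve_ivp(lambda t, y: A @ y + u(t), (0.0, T), Y0, rtol=1e-9, atol=1e-12)
    print(Y_formula, sol.y[:, -1], np.allclose(Y_formula, sol.y[:, -1], atol=1e-4))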
Exercises.
1. In each of the following, factor the matrix A into a product $S\Lambda S^{-1}$, with $\Lambda$ diagonal:
\[
\text{a. } A = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} \qquad
\text{b. } A = \begin{pmatrix} 5 & 6 \\ -1 & -2 \end{pmatrix} \qquad
\text{c. } A = \begin{pmatrix} 2 & -8 \\ 1 & -4 \end{pmatrix} \qquad
\text{d. } A = \begin{pmatrix} 2 & 2 & 1 \\ 0 & 1 & 2 \\ 0 & 0 & -1 \end{pmatrix}
\]
2. For each of the matrices in Exercise 1, use the $S\Lambda S^{-1}$ factorization to calculate $A^6$ (do not just multiply
A by itself).

3. For each of the matrices in Exercise 1, use the $S\Lambda S^{-1}$ factorization to calculate $e^{At}$.
4. Calculate $e^{At}$ for this matrix:
\[
A = \begin{pmatrix} 0 & 1 & 2 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}
\]
using the power series definition.
5. Consider these matrices:
\[
A = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} \qquad
B = \begin{pmatrix} 0 & -1 \\ 0 & 0 \end{pmatrix}
\]
and calculate $e^{At}$, $e^{Bt}$, and $e^{(A+B)t}$.
Answer, true or false: is $e^{At} e^{Bt} = e^{(A+B)t}$?
6. (Challenge problem) Show that, for any two matrices A and B, it is true that
\[
e^{At} e^{Bt} = e^{(A+B)t} \quad\text{for all } t
\]
if and only if $AB - BA = 0$. (The expression "$AB - BA$" is called the "Lie bracket" of the two
matrices A and B, and it plays a central role in the advanced theory of differential equations.)