Trefethen Bau
Abstract
This is a solution manual for the textbook Numerical Linear Algebra, by Lloyd N. Trefethen and David
Bau III (SIAM, 1997). This version omits Exercises 9.3 and 10.4.
Contents
1 Matrix-Vector Multiplication
3 Norms
4 The Singular Value Decomposition
6 Projectors
7 QR Factorization
8 Gram-Schmidt Orthogonalization
9 MATLAB
10 Householder Triangularization
1 Matrix-Vector Multiplication
1.1. (a)
Proof. The basic principle applied here is the following: if A is a rectangular matrix, then any elementary
row operation on A may be carried out by multiplying A from the left by the corresponding elementary
matrix; any elementary column operation on A may be carried out by multiplying A from the right by the
corresponding elementary matrix. See, for example, Munkres [2] §2, Theorem 2.1 for details.
\[
\begin{pmatrix} 1 & -1 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & -1 & 1 & 0\\ 0 & -1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 1 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & \tfrac{1}{2} & 0\\ 0 & 0 & 0 & 1 \end{pmatrix}
B
\begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & 2 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 0 & 0 & 0 & 1\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 1 & 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 1\\ 0 & 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 0 & 0 & 0\\ 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}
\]
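As a quick illustration of this principle (not part of the original solution), the following MATLAB sketch checks, for an arbitrary example matrix, that left-multiplication by an elementary matrix performs a row operation and right-multiplication performs a column operation.
% Illustration: row/column operations as matrix multiplications.
B = magic(4);
E_row = eye(4); E_row(1,3) = 1;                   % elementary matrix: add row 3 to row 1
E_col = eye(4); E_col = E_col(:, [4 2 3 1]);      % elementary matrix: swap columns 1 and 4
disp(norm(E_row*B - (B + [B(3,:); zeros(3,4)])))  % 0: row operation done from the left
disp(norm(B*E_col - B(:, [4 2 3 1])))             % 0: column operation done from the right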
(b)
Proof.
\[
A = \begin{pmatrix} 1 & -1 & \tfrac{1}{2} & 0\\ 0 & 1 & 0 & 0\\ 0 & -1 & \tfrac{1}{2} & 0\\ 0 & -1 & 0 & 1 \end{pmatrix}, \qquad
C = \begin{pmatrix} 0 & 0 & 0\\ 2 & 0 & 0\\ 0 & 1 & 1\\ 0 & 0 & 0 \end{pmatrix}.
\]
1.2. Not sure if I understand the problem correctly. Anyway, here’s the solution.
(a)
Proof. We have
\[
\begin{pmatrix} f_1\\ f_2\\ f_3\\ f_4 \end{pmatrix} =
\begin{pmatrix} -k_{12} & k_{12} & 0 & 0\\ 0 & -k_{23} & k_{23} & 0\\ 0 & 0 & -k_{34} & k_{34}\\ 0 & 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} x_1\\ x_2\\ x_3\\ x_4 \end{pmatrix} -
\begin{pmatrix} k_{12} l_{12}\\ k_{23} l_{23}\\ k_{34} l_{34}\\ 0 \end{pmatrix}.
\]
(b)
Proof. The entries of K are spring constants. By Hooke's law (force = spring constant × displacement), the dimension of the entries of K is force/distance = mass/time².
(c)
Proof. The dimension of det(K) is (mass/time²)⁴.
(d)
Proof. K = 1000K′ and det(K) = 1000⁴ det(K′) = 10¹² det(K′).
1.3.
Proof. We write Im×m and R−1 in the column form: Im×m = [e1 , e2 , · · · , em ], R−1 = [a1 , a2 , · · · , am ].
Suppose R = (r_ij) has the form
\[
R = \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1m}\\ 0 & r_{22} & \cdots & r_{2m}\\ \cdots & \cdots & \cdots & \cdots\\ 0 & 0 & \cdots & r_{mm} \end{pmatrix}.
\]
Then I_{m×m} = R⁻¹R can be written as
\[
[e_1, e_2, \cdots, e_m] = [a_1, a_2, \cdots, a_m]
\begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1m}\\ 0 & r_{22} & \cdots & r_{2m}\\ \cdots & \cdots & \cdots & \cdots\\ 0 & 0 & \cdots & r_{mm} \end{pmatrix}.
\]
Since R is nonsingular and det(R) = ∏_{i=1}^m r_ii, we conclude the diagonal entries r_ii are non-zero.
To show R⁻¹ is necessarily upper-triangular, we work by induction. To begin with, we have e_1 = r_{11}a_1. So a_1 = r_{11}⁻¹e_1 has zero entries except the first one. For convenience, we denote by C_m^(k) the subspace {v = (v_1, v_2, \cdots, v_m)^T ∈ C^m : v_i = 0 for i > k}. Then C_m^(1) ⊂ C_m^(2) ⊂ \cdots ⊂ C_m^(m) = C^m and each C_m^(k) is a linear subspace of C^m. We have shown a_1 ∈ C_m^(1). Assume for any k ≤ i, a_k ∈ C_m^(k). Then by I_{m×m} = R⁻¹R, we have
\[
e_{i+1} = \sum_{k=1}^m a_k r_{k(i+1)} = \sum_{k=1}^i a_k r_{k(i+1)} + a_{i+1} r_{(i+1)(i+1)}.
\]
Therefore
\[
a_{i+1} = r_{(i+1)(i+1)}^{-1}\left( e_{i+1} - \sum_{k=1}^i a_k r_{k(i+1)} \right) \in C_m^{(i+1)}.
\]
By induction, we have proved a_k ∈ C_m^(k) for 1 ≤ k ≤ m, which is equivalent to R⁻¹ being upper-triangular.
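A quick numerical sanity check of this fact (an illustration, not part of the proof): the inverse of a nonsingular upper-triangular matrix computed in MATLAB is upper-triangular up to rounding error.
% Illustration: the inverse of an upper-triangular matrix is upper-triangular.
m = 6;
R = triu(rand(m)) + m*eye(m);    % random, well-conditioned upper-triangular matrix
disp(norm(tril(inv(R), -1)))     % strictly lower-triangular part is ~0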
1.4. (a)
Proof. Denote by d the column vector (d_1, d_2, \cdots, d_8)^T and by c the column vector (c_1, c_2, \cdots, c_8)^T. Let F be the matrix whose (i,j)-entry is f_j(i). Then the given condition can be rephrased as: for any d ∈ C^8, we can find c ∈ C^8 such that Fc = d. This means range(F) = C^8. By Theorem 1.3, null(F) = {0}, which implies the mapping d ↦ c is one-to-one.
(b)
Proof. For any c ∈ C^8, applying the hypothesis to d = Fc gives AFc = Ad = c. So AF = I, i.e. A⁻¹ = F, and the (i,j)-entry of A⁻¹ is F_ij = f_j(i).
2.2. (a)
Proof.
\[
\|x_1 + x_2\|^2 = \sum_{i=1}^m (x_{1i} + x_{2i})^2 = \sum_{i=1}^m (x_{1i}^2 + x_{2i}^2) + 2\sum_{i=1}^m x_{1i} x_{2i} = \|x_1\|^2 + \|x_2\|^2 + 2x_1^* x_2 = \|x_1\|^2 + \|x_2\|^2,
\]
\[
\Big\|\sum_{i=1}^{n+1} x_i\Big\|^2 = \Big\|\sum_{i=1}^{n} x_i\Big\|^2 + \|x_{n+1}\|^2 = \sum_{i=1}^{n} \|x_i\|^2 + \|x_{n+1}\|^2 = \sum_{i=1}^{n+1} \|x_i\|^2,
\]
where we have used the result of part (a) to get µ∗ = µ. Since λ ̸= µ, we must have y ∗ x = 0, i.e. x and y
are orthogonal to each other.
2.4.
Proof. Suppose A is a unitary matrix, λ is an eigenvalue of A and x a corresponding eigenvector. Then x^*x = x^*A^*Ax = (Ax)^*(Ax) = (λx)^*(λx) = |λ|²x^*x. Since x ≠ 0, we conclude |λ| = 1.

So x^*x = 0 and x must be zero. This shows null(I − S) = {0}. By Theorem 1.3, I − S is nonsingular.
(c)
Proof. Define A = [(I − S)⁻¹(I + S)]^*[(I − S)⁻¹(I + S)]. Then, noting that (U⁻¹)^* = (U^*)⁻¹ for a nonsingular matrix U and that S^* = −S,
\[
A = (I + S)^*[(I - S)^*]^{-1}(I - S)^{-1}(I + S) = (I - S)(I + S)^{-1}(I - S)^{-1}(I + S).
\]
Since (I + S)⁻¹(I − S)⁻¹ = [(I − S)(I + S)]⁻¹ = (I − S²)⁻¹ = (I − S)⁻¹(I + S)⁻¹, we get A = (I − S)(I − S)⁻¹(I + S)⁻¹(I + S) = I. This shows the matrix Q is unitary.
2.6.
So we must have v^*u = −1. For sufficiency, assume v^*u = −1. Then for any α ∈ C \ {0}, A(αu) = αu + uv^*(αu) = αu(1 + v^*u) = 0, which implies A is singular. Combined, we conclude A is singular if and only if v^*u = −1. In this case, null(A) = {αu : α ∈ C}, the linear subspace spanned by u.
Now assume A is nonsingular. We shall find A⁻¹. Write A⁻¹ in column form [x_1, \cdots, x_m]. Then Ax_j = e_j gives x_j + (v^*x_j)u = e_j. Multiplying by v^* yields (1 + v^*u)(v^*x_j) = v^*e_j, so v^*x_j = \bar{v}_j/(1 + v^*u) and x_j = e_j − \frac{\bar{v}_j}{1 + v^*u}u. In other words, A⁻¹ = I − \frac{uv^*}{1 + v^*u}.
2.7.
Assume Hk is a Hadamard matrix and denote by αk the corresponding constant factor. Then α0 = 1 and
we have
\[
H_{k+1}^T = \begin{pmatrix} H_k^T & H_k^T\\ H_k^T & -H_k^T \end{pmatrix} = \alpha_k \begin{pmatrix} H_k^{-1} & H_k^{-1}\\ H_k^{-1} & -H_k^{-1} \end{pmatrix} = 2\alpha_k H_{k+1}^{-1}.
\]
By induction, we can conclude Hk (k ∈ N) is a Hadamard matrix and the corresponding constant factor is
αk = 2k .
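The recursion is easy to check numerically; the following MATLAB sketch (an illustration only) builds H_k and verifies H_k^T H_k = 2^k I.
% Illustration: H_{k+1} = [H_k H_k; H_k -H_k] satisfies H_k' * H_k = 2^k * I.
H = 1;                               % H_0
for k = 1:5
    H = [H H; H -H];                 % H_k, of size 2^k x 2^k
    disp(norm(H'*H - 2^k*eye(2^k)))  % ~0
end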
3 Norms
3.1.
Proof. To verify property (1) of (3.1), we note ||x||W = ||W x|| ≥ 0 is obvious. Also ||x||W = ||W x|| = 0 if and
only if W x = 0, which is further equivalent to x = 0 by the nonsingularity of W . To verify property (2) of
(3.1), we note
||x + y||_W = ||W(x + y)|| = ||Wx + Wy|| ≤ ||Wx|| + ||Wy|| = ||x||_W + ||y||_W.
Finally, ||αx||_W = ||W(αx)|| = |α| ||Wx|| = |α| ||x||_W, which verifies property (3) of (3.1).
3.2.
Proof. Suppose λ is an eigenvalue of A and x a corresponding eigenvector, so that Ax = λx. Then
|λ| = ||Ax||/||x|| ≤ sup_{y ∈ C^m \ {0}} ||Ay||/||y|| = ||A||.
3.3. (a)
Proof. Assume |x_{i_0}| = max_{1≤i≤m} |x_i|. Then ||x||_∞ = |x_{i_0}| ≤ (∑_{i=1}^m |x_i|²)^{1/2} = ||x||_2. For x = αe_i (α ∈ C, 1 ≤ i ≤ m), the equality holds.
(b)
Proof. Assume |x_{i_0}| = max_{1≤i≤m} |x_i|. Then ||x||_2 = (∑_{i=1}^m |x_i|²)^{1/2} ≤ (∑_{i=1}^m |x_{i_0}|²)^{1/2} = √m |x_{i_0}| = √m ||x||_∞. For x = α(1, 1, \cdots, 1)^T (α ∈ C), the equality holds.
(c)
Proof. For any x ∈ C^n \ {0}, by parts (a) and (b), ||Ax||_∞/||x||_∞ ≤ √n ||Ax||_2/||x||_2. Taking the supremum over x on both sides, we have ||A||_∞ ≤ √n ||A||_2.
(d)
Proof. For any x ∈ C^n \ {0}, by parts (a) and (b),
||Ax||_2/||x||_2 ≤ √m ||Ax||_∞/||x||_∞.
Taking the supremum on both sides, we have ||A||_2 ≤ √m ||A||_∞.
3.4. (a)
Proof. If A is multiplied from the right by the identity matrix with its i-th column removed, the result is the matrix obtained by removing the i-th column of A. Similarly, if A is multiplied from the left by the identity matrix with its i-th row removed, the result is the matrix obtained by removing the i-th row of A. To
obtain B, we can multiply A from the left and the right by the appropriate identity matrices with certain
rows/columns removed.
(b)
Proof. By inequality (3.14), it suffices to prove any matrix obtained by removing a column or a row from an
identity matrix has p-norm less than or equal to 1. This is easy to see by straightforward computation and
we omit the details.
3.5.
Proof. For any x ∈ C^n, ||Ex||_2 = ||u(v^*x)||_2 = |v^*x| ||u||_2 ≤ ||u||_2 ||v||_2 ||x||_2, where the last inequality is by the Cauchy-Schwarz inequality. This shows ||E||_2 ≤ ||u||_2 ||v||_2. By letting x = v in the above reasoning, we can show the equality holds, i.e. ||E||_2 = ||u||_2 ||v||_2.
Moreover, we note E_{ij} = u_i \bar{v}_j. Therefore
\[
\|E\|_F = \Big(\sum_{i=1}^m \sum_{j=1}^n |E_{ij}|^2\Big)^{1/2} = \Big(\sum_{i=1}^m \sum_{j=1}^n |u_i|^2 |v_j|^2\Big)^{1/2} = \Big(\Big(\sum_{i=1}^m |u_i|^2\Big)\Big(\sum_{j=1}^n |v_j|^2\Big)\Big)^{1/2} = \|u\|_2 \|v\|_2.
\]
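Both identities are easy to confirm numerically; a minimal MATLAB check (illustration only, with randomly chosen u and v):
% Illustration: for the rank-one matrix E = u v*, ||E||_2 = ||E||_F = ||u||_2 ||v||_2.
u = randn(5,1); v = randn(3,1);
E = u*v';
disp(norm(E) - norm(u)*norm(v))        % ~0
disp(norm(E,'fro') - norm(u)*norm(v))  % ~0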
3.6. (a)
Proof. ||x||′ ≥ 0 is straightforward. For x ≠ 0, note ||x||′ ≥ |(x/||x||)^*x| = ||x||. So ||x||′ = 0 if and only if x = 0. Also,
||x_1 + x_2||′ = sup_{||y||=1} |y^*(x_1 + x_2)| ≤ sup_{||y||=1} (|y^*x_1| + |y^*x_2|) ≤ sup_{||y||=1} |y^*x_1| + sup_{||y||=1} |y^*x_2| = ||x_1||′ + ||x_2||′.
(b)
Proof. We follow the hint. For the given x, we can find a nonzero z_0 ∈ C^m such that |z_0^*x| = ||z_0||′ ||x||. Define z = e^{iθ}z_0/||z_0||′, where θ = arg(z_0^*x). Then ||z||′ = 1 and
z^*x = e^{-iθ} (z_0^*/||z_0||′) x = |z_0^*x|/||z_0||′ = ||x|| = 1.
Set B = yz^*. Then
||B|| = sup_{||w||=1} ||Bw|| = sup_{||w||=1} ||yz^*w|| = sup_{||w||=1} |z^*w| ||y|| = sup_{||w||=1} |w^*z| = ||z||′ = 1.
Combined, we conclude for the above defined z and B = yz^*, we have Bx = y and ||B|| = 1.
Remark 2. Note the desired properties of B, Bx = y and ||B|| = 1, are equivalent to (a) z^*x = 1 = ||x||; (b) ||z||′ = 1. This immediately reminds us of the following well-known corollary of the Hahn-Banach Theorem: given an element x in a normed linear space X, there exists a continuous linear functional l on X such that l(x) = ||x|| and ||l|| = 1 (Yosida [3], page 108). Note l(x) = l(∑_{i=1}^m x_i e_i) = [l(e_1), \cdots, l(e_m)] (x_1, \cdots, x_m)^T. We can therefore define z such that z^* = [l(e_1), \cdots, l(e_m)]. Then l(w) = z^*w for any w ∈ C^m (in particular, z^*x = l(x) = ||x||). Consequently, ||z||′ = ||l|| = 1. This proves the hint.
4 The Singular Value Decomposition
Remark 3. In the proof of Theorem 4.1, the part on uniqueness is a bit confusing, esp. the following
sentence: “Since ||A||2 = σ1 , ||Av2 ||2 ≤ σ1 ; but this must be an equality, for otherwise, since w = v1 c + v2 s
for some constants c and s with |c|2 + |s|2 = 1, we would have ||Aw||2 < σ1 .”
I don’t see why “we would have ||Aw||2 < σ1 .” An alternative proof for the uniqueness goes as follows:
The uniqueness of the singular values {σ_j} is given by the observation that {σ_j²} are the eigenvalues of AA^* or A^*A. Note {u_j} are eigenvectors of AA^*, with the σ_j² the corresponding eigenvalues. So when the σ_j are distinct, the eigenvalues of AA^* are also distinct, and the eigenspace of each σ_j² (for AA^*) is of dimension one. Due to normalization, each u_j is unique up to a complex sign. The result for {v_j} can be proven similarly.
4.1. To find the SVD of a matrix A, we note that if A = U ΣV ∗ is the SVD of A, then AA∗ = U (ΣΣ∗ )U ∗ ,
or equivalently, (AA∗ )U = U (ΣΣ∗ ). Writing U in its column form [u1 , u2 , · · · , um ], the equation becomes
\[
(AA^*)[u_1, u_2, \cdots, u_m] = \begin{cases} [\sigma_1^2 u_1, \sigma_2^2 u_2, \cdots, \sigma_m^2 u_m] & \text{if } m \le n,\\ [\sigma_1^2 u_1, \cdots, \sigma_n^2 u_n, 0, \cdots, 0] & \text{if } m > n. \end{cases}
\]
This shows for p = min{m, n}, σ_1², σ_2², \cdots, σ_p² are eigenvalues of AA^* and u_1, u_2, \cdots, u_p are the corresponding eigenvectors. This gives us a way to find the singular values and U: if m ≤ n, U consists of the eigenvectors corresponding to σ_1², σ_2², \cdots, σ_m²; if m > n, the first n columns of U are eigenvectors corresponding to σ_1², σ_2², \cdots, σ_n² and the last m − n columns of U can be chosen from null(AA^*) so that U becomes unitary. If A is not of full rank, then p needs to be replaced by rank(A) in the above algorithm.
A similar algorithm for V can be found by noting (A^*A)V = V(Σ^*Σ). Note ΣΣ^* and Σ^*Σ have the same nonzero eigenvalues, so we don't have to recalculate the eigenvalues. In practice, various tricks often directly yield the SVD without resorting to the above method.
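A minimal MATLAB sketch of this eigenvalue-based construction is given below (an illustration only; it assumes m ≤ n and rank(A) = m, and it returns a reduced SVD A = U S V^* rather than the full factorization).
function [U, S, V] = svd_via_eig(A)
% Sketch of the construction described above: U and the singular values come
% from the eigenvalue decomposition of A*A'; V is recovered from
% v_j = A' u_j / sigma_j.  Assumes m <= n and rank(A) = m.
[m, n] = size(A);
[U, D] = eig(A*A');                         % A A* = U (Sigma Sigma*) U*
[d, idx] = sort(real(diag(D)), 'descend');
U = U(:, idx);
s = sqrt(max(d, 0));                        % singular values
S = diag(s);                                % m x m (reduced)
V = zeros(n, m);
for j = 1:m
    V(:, j) = (A'*U(:, j))/s(j);
end
end
For example, svd_via_eig([3 0 0; 0 -2 0]) reproduces the singular values 3 and 2 of Problem 4.1 (a), up to the signs of the singular vectors.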
The solutions below have been verified by Mathematica; for example, {u, w, v} = SingularValueDecomposition[{{1, 2}, {1, 2}}].
(a)
Proof. U = I_{2×2}, Σ = \begin{pmatrix} 3 & 0\\ 0 & 2 \end{pmatrix}, V = \begin{pmatrix} 1 & 0\\ 0 & -1 \end{pmatrix}.
(b)
Proof. U = \begin{pmatrix} 0 & 1\\ 1 & 0 \end{pmatrix}, Σ = \begin{pmatrix} 3 & 0\\ 0 & 2 \end{pmatrix}, V = \begin{pmatrix} 0 & 1\\ 1 & 0 \end{pmatrix}.
(c)
Proof. U = I_{3×3}, Σ = \begin{pmatrix} 2 & 0\\ 0 & 0\\ 0 & 0 \end{pmatrix}, V = \begin{pmatrix} 0 & 1\\ 1 & 0 \end{pmatrix}.
(d)
Proof. U = I_{2×2}, Σ = \begin{pmatrix} \sqrt{2} & 0\\ 0 & 0 \end{pmatrix}, V = \frac{\sqrt{2}}{2}\begin{pmatrix} 1 & -1\\ 1 & 1 \end{pmatrix}.
(e)
Proof. U = \frac{\sqrt{2}}{2}\begin{pmatrix} 1 & -1\\ 1 & 1 \end{pmatrix}, Σ = \begin{pmatrix} 2 & 0\\ 0 & 0 \end{pmatrix}, V = \frac{\sqrt{2}}{2}\begin{pmatrix} 1 & -1\\ 1 & 1 \end{pmatrix}.
4.2.
Proof. Suppose A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ \cdots & \cdots & \cdots & \cdots\\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}. Then B = \begin{pmatrix} a_{m1} & a_{m-1,1} & \cdots & a_{11}\\ \cdots & \cdots & \cdots & \cdots\\ a_{mn} & a_{m-1,n} & \cdots & a_{1n} \end{pmatrix}. We can obtain from
B the transpose A^T of A by swapping columns of B. Therefore, the question asked by the problem is reduced to the following two sub-questions: 1) Do a matrix and its transpose have the same singular values? 2) If we multiply a matrix by a unitary matrix, does the resulting matrix have the same singular values as the original one?
In view of the structure of SVD and the uniqueness of singular values, we can say yes to both questions.
Hence A and B have the same singular values.
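The conclusion can also be confirmed numerically; a one-line MATLAB check (illustration only; rot90 rotates counterclockwise, but either direction of rotation gives the same singular values):
% Illustration: a matrix and its 90-degree rotation have the same singular values.
A = rand(4, 7);
disp(norm(svd(A) - svd(rot90(A))))   % ~0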
4.3.
[Figure: the unit circle and the right singular vectors on the pre-image plane (left panel), and the ellipse and the scaled left singular vectors on the image plane (right panel).]
function svd_plot(A)
t=0:0.01:(2*pi);
[U, S, V] = svd(A);
% Unit circle on the pre-image plane and its image under A
% (these defining lines were missing from the listing and are restored here).
x1 = cos(t); y1 = sin(t);
w = A*[x1; y1];
x2 = w(1,:); y2 = w(2,:);
v1 = V(:,1); v2 = V(:,2); % right singular vectors
u1 = U(:,1); u2 = U(:,2); % left singular vectors
w1 = S(1,1)*u1; w2 = S(2,2)*u2; % scaled left singular vectors
% Plot the unit circle and the right singular vectors on pre-image plane.
figure(1);
axis([-1.2 1.2 -1.2 1.2]);
plot(x1, y1);
hold on;
line_plot(v1, 'b');
hold on;
line_plot(v2, 'r');
grid on;
title('On pre-image plane');
% Plot the ellipse and the scaled left singular vectors on image plane.
figure(2);
axis([-3 3 -3 3]);
plot(x2, y2);
hold on;
line_plot(w1,'b');
hold on;
line_plot(w2, 'r');
grid on;
title('On image plane');
end % svd_plot
function line_plot(v, c)
%LINE_PLOT plots the line segment between the origin and the
%   vector V on the 2-dimensional plane, using color/style C.
%
%   Example: line_plot([1 1], 'r').
%
%   Z.Y., 09/19/2009.
plot([0 v(1)], [0 v(2)], c);   % function body restored (assumed implementation)
end % line_plot
4.4.
Proof. Suppose A has SVD U1 Σ1 V1∗ and B has SVD U2 Σ2 V2∗ .
If A and B are unitarily equivalent, there exists a unitary matrix Q, such that A = QBQ∗ . Then
A = Q(U2 Σ2 V2∗ )Q∗ = (QU2 )Σ2 (QV2 )∗ . So (QU2 )Σ2 (QV2 )∗ is an SVD of A. By the uniqueness of singular
values, Σ2 = Σ1 . In other words, A and B have the same singular values.
The converse is not necessarily true. As a counterexample, note in Problem 4.1 (d): if we let
A = \begin{pmatrix} 1 & 1\\ 0 & 0 \end{pmatrix} and B = \begin{pmatrix} \sqrt{2} & 0\\ 0 & 0 \end{pmatrix},
then A and B have the same singular values. But they are not unitarily equivalent: otherwise A would be Hermitian, which is clearly not true.
4.5.
Proof. We give two proofs.
For the first proof, we can examine the proof of Theorem 4.1 and see that if we replace Cm with Rm
everywhere, the same proof-by-induction works for real A. So when A is real, it has a real SVD.
For the second proof, we note A∗ A is now real and symmetric. So it has a complete set of orthogonal (i.e.
real and orthonormal) eigenvectors: v1 , v2 , · · · , vn . Define V = [v1 , · · · , vn ] and denote by λi the eigenvalue
corresponding to vi (i = 1, · · · , n). Since A∗ A is nonnegative, all the eigenvalues are nonnegative. Without
loss of generality, we assume λ1 ≥ λ2 ≥ · · · ≥ λn . Let r be the largest index such that λr > 0. Then
λ_{r+1} = \cdots = λ_n = 0. Moreover, we note null(A^*A) = null(A), since A^*Ax = 0 implies ||Ax||_2² = x^*A^*Ax = 0. For 1 ≤ j ≤ r, set σ_j = √λ_j and u_j = Av_j/σ_j; these vectors are orthonormal, since u_i^T u_j = v_i^T A^T A v_j/(σ_i σ_j) = δ_{ij}. We can expand the set {u_1, \cdots, u_r} to an orthonormal basis of R^m: {u_1, \cdots, u_r, u_{r+1}, \cdots, u_m}. Define U = [u_1, \cdots, u_m]. We check U^*AV is a diagonal matrix (recall null(A^*A) = null(A)):
\[
U^*AV = \begin{pmatrix} u_1^*\\ u_2^*\\ \vdots\\ u_m^* \end{pmatrix} A [v_1, \cdots, v_n] = (u_i^* A v_j), \qquad u_i^* A v_j = \begin{cases} \sigma_i \delta_{ij} & \text{if } j \le r,\\ 0 & \text{if } j > r. \end{cases}
\]
Therefore limε→0 ||A − Aε ||2 = 0 with Aε of full rank. This proves the set of full-rank matrices is a dense
subset of Cm×n .
5.3. (a)
Proof. We will derive the expression for U from the equation AA^T = U(ΣΣ^T)U^T. First of all, we note
\[
AA^T = \begin{pmatrix} 125 & 75\\ 75 & 125 \end{pmatrix} = 25 \begin{pmatrix} 5 & 3\\ 3 & 5 \end{pmatrix}.
\]
The characteristic polynomial of \begin{pmatrix} 5 & 3\\ 3 & 5 \end{pmatrix} is
\[
\det\begin{pmatrix} 5-\lambda & 3\\ 3 & 5-\lambda \end{pmatrix} = \lambda^2 - 10\lambda + 16 = (\lambda-8)(\lambda-2).
\]
So the eigenvalues of AA^T are 8 · 25 = 200 and 2 · 25 = 50. This implies the singular values of A are 10√2 and 5√2. Solving the equation
\[
\begin{pmatrix} 5 & 3\\ 3 & 5 \end{pmatrix} U = U \begin{pmatrix} 8 & 0\\ 0 & 2 \end{pmatrix}
\]
gives us the general expression of U:
\[
U = \frac{1}{\sqrt{2}}\begin{pmatrix} a & b\\ a & -b \end{pmatrix},
\]
where a, b ∈ {1, −1}. Then
\[
V = A^T U \Sigma^{-1} = \begin{pmatrix} -2 & -10\\ 11 & 5 \end{pmatrix} \cdot \frac{1}{\sqrt{2}}\begin{pmatrix} a & b\\ a & -b \end{pmatrix} \cdot \begin{pmatrix} \frac{1}{10\sqrt{2}} & 0\\ 0 & \frac{1}{5\sqrt{2}} \end{pmatrix} = \begin{pmatrix} -\frac{3}{5}a & \frac{4}{5}b\\ \frac{4}{5}a & \frac{3}{5}b \end{pmatrix}.
\]
(b)
Proof. From our calculation in part (a), the singular values of A are 10√2 and 5√2. The right singular vectors are ±(−3/5, 4/5)^T and ±(4/5, 3/5)^T. The left singular vectors are ±(1/√2, 1/√2)^T and ±(1/√2, −1/√2)^T. The unit ball in R² and its image under A, together with the singular vectors, are shown in the following graph, plotted by the MATLAB program we used for Exercise 4.3.
[Figure: the unit ball in R² with the right singular vectors (pre-image plane, left panel) and its image under A, an ellipse with semi-axes along the scaled left singular vectors (image plane, right panel).]
(c)
Proof. By Theorem 5.3, ||A||_2 = 10√2 and ||A||_F = √(200 + 50) = 5√10. By Example 3.3, ||A||_1 = 16. By Example 3.4, ||A||_∞ = 15.
(d)
Proof. Since A = U ΣV T and U , V are unitary,
\[
A^{-1} = (U\Sigma V^T)^{-1} = V\Sigma^{-1}U^T = \begin{pmatrix} -\frac{3}{5} & \frac{4}{5}\\ \frac{4}{5} & \frac{3}{5} \end{pmatrix} \begin{pmatrix} \frac{1}{10\sqrt{2}} & 0\\ 0 & \frac{1}{5\sqrt{2}} \end{pmatrix} \cdot \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1\\ 1 & -1 \end{pmatrix} = \frac{1}{100}\begin{pmatrix} 5 & -11\\ 10 & -2 \end{pmatrix}.
\]
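A quick MATLAB cross-check of this computation (illustration only, using A = [−2 11; −10 5], the matrix whose AA^T was computed in part (a)):
% Illustration: the SVD-based inverse agrees with inv(A).
A = [-2 11; -10 5];
[U, S, V] = svd(A);
disp(norm(V*diag(1./diag(S))*U' - inv(A)))   % ~0
disp(100*inv(A))                             % [5 -11; 10 -2], as computed above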
(e)
Proof. We note
\[
\begin{pmatrix} 0 & A^*\\ A & 0 \end{pmatrix} = \begin{pmatrix} 0 & V\Sigma^*U^*\\ U\Sigma V^* & 0 \end{pmatrix} = \begin{pmatrix} 0 & I_{m\times m}\\ I_{m\times m} & 0 \end{pmatrix}\begin{pmatrix} U\Sigma V^* & 0\\ 0 & V\Sigma^*U^* \end{pmatrix} = \begin{pmatrix} 0 & I_{m\times m}\\ I_{m\times m} & 0 \end{pmatrix}\begin{pmatrix} U & 0\\ 0 & V \end{pmatrix}\begin{pmatrix} \Sigma & 0\\ 0 & \Sigma^* \end{pmatrix}\begin{pmatrix} V^* & 0\\ 0 & U^* \end{pmatrix}.
\]
The inverse of \begin{pmatrix} 0 & I_{m\times m}\\ I_{m\times m} & 0 \end{pmatrix} is itself and the inverse of \begin{pmatrix} U & 0\\ 0 & V \end{pmatrix} is \begin{pmatrix} U^* & 0\\ 0 & V^* \end{pmatrix}. So the inverse of \begin{pmatrix} 0 & I_{m\times m}\\ I_{m\times m} & 0 \end{pmatrix}\begin{pmatrix} U & 0\\ 0 & V \end{pmatrix} is \begin{pmatrix} U^* & 0\\ 0 & V^* \end{pmatrix}\begin{pmatrix} 0 & I_{m\times m}\\ I_{m\times m} & 0 \end{pmatrix}. Note
\[
\begin{pmatrix} V^* & 0\\ 0 & U^* \end{pmatrix} = \begin{pmatrix} 0 & I_{m\times m}\\ I_{m\times m} & 0 \end{pmatrix}\begin{pmatrix} U^* & 0\\ 0 & V^* \end{pmatrix}\begin{pmatrix} 0 & I_{m\times m}\\ I_{m\times m} & 0 \end{pmatrix}.
\]
Therefore, we have
\[
\begin{pmatrix} 0 & A^*\\ A & 0 \end{pmatrix} = \left(\begin{pmatrix} 0 & I_{m\times m}\\ I_{m\times m} & 0 \end{pmatrix}\begin{pmatrix} U & 0\\ 0 & V \end{pmatrix}\right)\left(\begin{pmatrix} \Sigma & 0\\ 0 & \Sigma^* \end{pmatrix}\begin{pmatrix} 0 & I_{m\times m}\\ I_{m\times m} & 0 \end{pmatrix}\right)\left(\begin{pmatrix} 0 & I_{m\times m}\\ I_{m\times m} & 0 \end{pmatrix}\begin{pmatrix} U & 0\\ 0 & V \end{pmatrix}\right)^{-1} = \left(\begin{pmatrix} 0 & I_{m\times m}\\ I_{m\times m} & 0 \end{pmatrix}\begin{pmatrix} U & 0\\ 0 & V \end{pmatrix}\right)\begin{pmatrix} 0 & \Sigma\\ \Sigma^* & 0 \end{pmatrix}\left(\begin{pmatrix} 0 & I_{m\times m}\\ I_{m\times m} & 0 \end{pmatrix}\begin{pmatrix} U & 0\\ 0 & V \end{pmatrix}\right)^{-1}.
\]
Since Σ is a diagonal matrix with nonnegative diagonal elements, we have Σ = Σ^*. It's easy to see
\[
\begin{pmatrix} 0 & \Sigma\\ \Sigma & 0 \end{pmatrix}\begin{pmatrix} I_{m\times m} & I_{m\times m}\\ I_{m\times m} & -I_{m\times m} \end{pmatrix} = \begin{pmatrix} I_{m\times m} & I_{m\times m}\\ I_{m\times m} & -I_{m\times m} \end{pmatrix}\begin{pmatrix} \Sigma & 0\\ 0 & -\Sigma \end{pmatrix}, \qquad \begin{pmatrix} I_{m\times m} & I_{m\times m}\\ I_{m\times m} & -I_{m\times m} \end{pmatrix}\begin{pmatrix} I_{m\times m} & I_{m\times m}\\ I_{m\times m} & -I_{m\times m} \end{pmatrix} = 2I_{2m\times 2m}.
\]
So the formula
\[
\begin{pmatrix} 0 & \Sigma\\ \Sigma & 0 \end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix} I_{m\times m} & I_{m\times m}\\ I_{m\times m} & -I_{m\times m} \end{pmatrix} \cdot \begin{pmatrix} \Sigma & 0\\ 0 & -\Sigma \end{pmatrix} \cdot \frac{1}{\sqrt{2}}\begin{pmatrix} I_{m\times m} & I_{m\times m}\\ I_{m\times m} & -I_{m\times m} \end{pmatrix}
\]
gives the eigenvalue decomposition of \begin{pmatrix} 0 & \Sigma\\ \Sigma & 0 \end{pmatrix}. Combined, if we set
\[
X = \begin{pmatrix} 0 & I_{m\times m}\\ I_{m\times m} & 0 \end{pmatrix}\begin{pmatrix} U & 0\\ 0 & V \end{pmatrix} \cdot \frac{1}{\sqrt{2}}\begin{pmatrix} I_{m\times m} & I_{m\times m}\\ I_{m\times m} & -I_{m\times m} \end{pmatrix},
\]
then X \begin{pmatrix} \Sigma & 0\\ 0 & -\Sigma \end{pmatrix} X^{-1} gives the eigenvalue decomposition of \begin{pmatrix} 0 & A^*\\ A & 0 \end{pmatrix}.
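The construction of X can be verified numerically; the MATLAB sketch below (illustration only, again with A = [−2 11; −10 5]) checks that X diag(Σ, −Σ) X⁻¹ reproduces the block matrix.
% Illustration: eigenvalue decomposition of [0 A*; A 0] built from the SVD of A.
A = [-2 11; -10 5];
[U, S, V] = svd(A);
I2 = eye(2);
X = [zeros(2) I2; I2 zeros(2)] * blkdiag(U, V) * ([I2 I2; I2 -I2]/sqrt(2));
M = [zeros(2) A'; A zeros(2)];
disp(norm(M - X*blkdiag(S, -S)/X))   % ~0: X diag(S,-S) X^{-1} recovers M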
6 Projectors
6.1.
Proof. Algebraically, we have by Theorem 6.1 that (I − 2P)^* = I^* − 2P^* = I − 2P. So
(I − 2P)^*(I − 2P) = [(I − P) − P]² = (I − P)² − (I − P)P − P(I − P) + P² = (I − P) + P = I,
i.e. I − 2P is unitary. Geometrically, I − 2P fixes null(P) = range(P)^⊥ and maps every v ∈ range(P) to −v, so it is the reflection across range(P)^⊥; in particular it preserves lengths.
6.2.
Proof. It's easy to see F² = I. So E² = \frac{1}{4}(I + 2F + F²) = \frac{I+F}{2} = E. This shows E is a projector. Note F has the matrix representation
\[
F = \begin{pmatrix} 0 & 0 & \cdots & 0 & 1\\ 0 & 0 & \cdots & 1 & 0\\ \cdots & \cdots & \cdots & \cdots & \cdots\\ 0 & 1 & \cdots & 0 & 0\\ 1 & 0 & \cdots & 0 & 0 \end{pmatrix}.
\]
So F^* = F and consequently E^* = E. By Theorem 6.1, E is an orthogonal projector. The matrix representation of E is
\[
E = \frac{1}{2}\begin{pmatrix} 1 & 0 & \cdots & 0 & 1\\ 0 & 1 & \cdots & 1 & 0\\ \cdots & \cdots & \cdots & \cdots & \cdots\\ 0 & 1 & \cdots & 1 & 0\\ 1 & 0 & \cdots & 0 & 1 \end{pmatrix}
\]
(when the dimension is odd, the central entry of the matrix in parentheses is 2, so that the central entry of E is 1).
6.3.
Proof. We suppose the SVD of A is A = U ΣV ∗ . Then A is of full rank if and only if Σ is of full rank. Since
A∗ A = V Σ∗ ΣV ∗ , we can easily see A∗ A is nonsingular if and only if Σ is of full rank. Therefore, A∗ A is
nonsingular if and only if A has full rank.
An alternative proof: if A^*A is singular, there exists x ≠ 0 such that A^*Ax = 0, which implies ||Ax||_2² = x^*A^*Ax = 0. So A is not of full rank. Conversely, if A is not of full rank, there must exist x ≠ 0 such that Ax = 0. So A^*Ax = 0, which implies A^*A is singular. This argument appears in the proof of Theorem 11.1 (p. 81).
6.4. (a)
Proof. range(A) is the column space of A, which has an orthonormal basis a_1 = (1/√2, 0, 1/√2)^T and a_2 = (0, 1, 0)^T. So the orthogonal projector P onto range(A) can be written as Px = (a_1^*x)a_1 + (a_2^*x)a_2 = a_1a_1^*x + a_2a_2^*x = (a_1, a_2)\begin{pmatrix} a_1^*\\ a_2^* \end{pmatrix}x for any x ∈ C³. Therefore
\[
P = (a_1, a_2)(a_1, a_2)^* = \begin{pmatrix} 1/2 & 0 & 1/2\\ 0 & 1 & 0\\ 1/2 & 0 & 1/2 \end{pmatrix}
\]
(the derivation so far is just the discussion after formula (6.7)). Consequently, the image under P of the vector (1, 2, 3)^* is (2, 2, 2)^*.
(b)
Proof. By formula (6.13),
\[
P = B(B^*B)^{-1}B^* = \frac{1}{6}\begin{pmatrix} 5 & 2 & 1\\ 2 & 2 & -2\\ 1 & -2 & 5 \end{pmatrix}.
\]
Accordingly, the image under P of the vector (1, 2, 3)∗ is (2, 0, 2)∗ .
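A short MATLAB check of part (b) (illustration only, assuming B = [1 2; 0 1; 1 0], which reproduces the projector computed above):
% Illustration: orthogonal projector onto range(B) via formula (6.13).
B = [1 2; 0 1; 1 0];
P = B/(B'*B)*B';        % B (B*B)^{-1} B*
disp(6*P)               % [5 2 1; 2 2 -2; 1 -2 5]
disp(P*[1; 2; 3])       % (2, 0, 2)^T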
6.5.
Proof. For any nonzero v ∈ range(P), we have Pv = v. So ||P||_2 = max_{x ∈ C^m \ {0}} ||Px||_2/||x||_2 ≥ 1. Suppose the SVD of P is UΣV^*. Then for any x ∈ C^m, we have
||Px||_2² = x^*P^*Px = x^*VΣ^*U^*UΣV^*x = x^*VΣ²V^*x.
Therefore
||P||_2 = max_{x ∈ C^m \ {0}} ||Px||_2/||x||_2 = max_{x ∈ C^m \ {0}} ||ΣV^*x||_2/||V^*x||_2 = max_{x ∈ C^m \ {0}} ||Σx||_2/||x||_2 = ||Σ||_2.
7 QR Factorization
7.1. (a)
Proof. We note the columns of A are already orthogonal, so a reduced QR factorization and a full QR
factorization can be obtained by normalizing the columns of A
\[
A = \begin{pmatrix} 1 & 0\\ 0 & 1\\ 1 & 0 \end{pmatrix} = \begin{pmatrix} \frac{1}{\sqrt{2}} & 0\\ 0 & 1\\ \frac{1}{\sqrt{2}} & 0 \end{pmatrix}\begin{pmatrix} \sqrt{2} & 0\\ 0 & 1 \end{pmatrix} = \begin{pmatrix} \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}}\\ 0 & 1 & 0\\ \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}} \end{pmatrix}\begin{pmatrix} \sqrt{2} & 0\\ 0 & 1\\ 0 & 0 \end{pmatrix}.
\]
(b)
Proof. We follow the Gram-Schmidt orthogonalization to produce the QR factorization of B = (b_1, b_2). We note
\[
(b_1, b_2) = (q_1, q_2)\begin{pmatrix} r_{11} & r_{12}\\ 0 & r_{22} \end{pmatrix}.
\]
So r_{11} = ||b_1||_2 = √2 and q_1 = b_1/||b_1||_2 = (1/√2, 0, 1/√2)^T. Next, r_{12} = q_1^*b_2 = (1/√2, 0, 1/√2)(2, 1, 0)^T = √2, so
\[
b_2 - r_{12}q_1 = \begin{pmatrix} 2\\ 1\\ 0 \end{pmatrix} - \sqrt{2}\begin{pmatrix} 1/\sqrt{2}\\ 0\\ 1/\sqrt{2} \end{pmatrix} = \begin{pmatrix} 1\\ 1\\ -1 \end{pmatrix}
\]
and r_{22} = √3. Consequently, the reduced QR factorization of B is
\[
B = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{3}\\ 0 & 1/\sqrt{3}\\ 1/\sqrt{2} & -1/\sqrt{3} \end{pmatrix}\begin{pmatrix} \sqrt{2} & \sqrt{2}\\ 0 & \sqrt{3} \end{pmatrix}.
\]
To find the full QR factorization of B, we can set (see, for example, Lax [1], Chapter 7, Exercise 20 (vii))
\[
q_3 = q_1 \times q_2 = \begin{pmatrix} -1/\sqrt{6}\\ 2/\sqrt{6}\\ 1/\sqrt{6} \end{pmatrix}.
\]
Then
\[
B = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{3} & -1/\sqrt{6}\\ 0 & 1/\sqrt{3} & 2/\sqrt{6}\\ 1/\sqrt{2} & -1/\sqrt{3} & 1/\sqrt{6} \end{pmatrix}\begin{pmatrix} \sqrt{2} & \sqrt{2}\\ 0 & \sqrt{3}\\ 0 & 0 \end{pmatrix}.
\]
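The hand computation can be compared against MATLAB's built-in QR factorization (illustration only, with B as above; qr may flip the signs of corresponding columns of Q and rows of R, so we compare products rather than the factors themselves):
% Illustration: cross-check the reduced QR factorization of B.
B = [1 2; 0 1; 1 0];
[Q, R] = qr(B, 0);                  % reduced QR
disp(norm(Q*R - B))                 % ~0
Qhat = [1/sqrt(2) 1/sqrt(3); 0 1/sqrt(3); 1/sqrt(2) -1/sqrt(3)];
disp(norm(Q*Q' - Qhat*Qhat'))       % ~0: same orthogonal projector onto range(B)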
7.2.
Proof. In the matrix R̂, we have r_{ij} = 0 whenever i > j, and also whenever i ≤ j but i and j do not have the same parity (one is even and the other is odd).
7.3.
Proof. If A is not invertible, det A = 0 and we have nothing to prove. So without loss of generality, we assume A is of full rank. Let A = QR be the QR factorization of A; then |det A| = |det Q · det R| = |∏_{j=1}^m r_{jj}|. According to the Gram-Schmidt orthogonalization, r_{ij} = q_i^*a_j (i ≠ j) and |r_{jj}| = ||a_j − ∑_{i=1}^{j-1} r_{ij}q_i||_2. Therefore
\[
|r_{jj}|^2 = \Big\|a_j - \sum_{i=1}^{j-1} r_{ij}q_i\Big\|_2^2 = \Big(a_j^* - \sum_{i=1}^{j-1}\bar{r}_{ij}q_i^*\Big)\Big(a_j - \sum_{i=1}^{j-1} r_{ij}q_i\Big) = \|a_j\|_2^2 - 2\sum_{i=1}^{j-1}|r_{ij}|^2 + \sum_{i=1}^{j-1}|r_{ij}|^2 \le \|a_j\|_2^2.
\]
This implies |det A| = |∏_{j=1}^m r_{jj}| ≤ ∏_{j=1}^m ||a_j||_2. The geometric interpretation of this inequality for m = 2 is that the area of a parallelogram is at most the product of the lengths of two neighboring sides, with equality if and only if the sides are orthogonal to each other. Higher dimensions have a similar interpretation.
Remark 5. For an alternative proof, see Lax [1], Chapter 10, Theorem 11.
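A numerical illustration of the inequality (not part of the proof):
% Illustration: Hadamard's inequality |det A| <= prod_j ||a_j||_2.
m = 5;
A = randn(m);
lhs = abs(det(A));
rhs = prod(sqrt(sum(abs(A).^2, 1)));   % product of the column 2-norms
disp(rhs - lhs)                        % nonnegative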
7.4.
Proof. P, as the intersection of P^(1) and P^(2), is a straight line. So P is orthogonal to both x^(1) × y^(1) and x^(2) × y^(2). In the full QR factorization of the 3 × 2 matrix (x^(1), y^(1)), the third column q_3^(1) of the Q matrix is a scalar multiple of x^(1) × y^(1); in the full QR factorization of the 3 × 2 matrix (x^(2), y^(2)), the third column q_3^(2) of the Q matrix is a scalar multiple of x^(2) × y^(2). Then a nonzero vector v ∈ P can be chosen as q_3^(1) × q_3^(2), which is the third column of the Q matrix in the full QR factorization of the 3 × 2 matrix (q_3^(1), q_3^(2)). Finding v is thus reduced to the computation of QR factorizations of three 3 × 2 matrices.
7.5. (a)
Proof. If A is not of full rank, then there exists k (1 ≤ k ≤ n) such that a_k can be represented as a linear combination of a_1, \cdots, a_{k-1}. By the Gram-Schmidt orthogonalization, we necessarily have r_{kk} = 0.
Conversely, if r_{kk} = 0 for some k with 1 ≤ k ≤ n, we must have
\[
a_k = \sum_{i=1}^{k-1} r_{ik}q_i \in \langle a_1, \cdots, a_{k-1} \rangle.
\]
That is, a_k is a linear combination of a_1, \cdots, a_{k-1}. So A is not of full rank. Combined, we conclude A has rank n if and only if all the diagonal entries of R̂ are nonzero.
(b)
Proof. If the first k diagonal entries of R̂ are nonzero, then by the triangular structure
⟨a_1, \cdots, a_k⟩ = ⟨q_1, \cdots, q_k⟩.
So the rank of A is at least k. The following example shows that the rank of A could be larger than k:
\[
\hat{Q}\hat{R} = (q_1, q_2, q_3)\begin{pmatrix} 1 & 0 & 0\\ 0 & 0 & 1\\ 0 & 0 & 0 \end{pmatrix} = (q_1, 0, q_2) = A.
\]
8 Gram-Schmidt Orthogonalization
8.1.
Proof. To obtain q_1, we use the formula v_1 = a_1 and q_1 = v_1/||v_1||. The computation of the norm || · || involves m
multiplications, m − 1 additions and one square root. So the total number of flops needed for computing the
Euclidean norm of a vector in Rm is 2m. This implies the flops needed for obtaining q1 is 2m + m = 3m,
where the second term m comes from division of v1 by ||v1 ||. In summary, the flops for q1 is 3m.
Suppose we have obtained q_1, q_2, \cdots, q_{j-1}; we now count the flops needed for obtaining q_j. Algorithm 8.1 uses the formula
v_j = P_{⊥q_{j-1}} \cdots P_{⊥q_2}P_{⊥q_1}a_j.
The projection P_{⊥q_1}a_j = a_j − q_1q_1^*a_j involves m multiplications and (m−1) additions for q_1^*a_j, m multiplications for q_1(q_1^*a_j), and m subtractions for a_j − q_1(q_1^*a_j). So the total number of flops is m + (m−1) + m + m = 4m − 1. This type of computation is repeated (j − 1) times, so the number of flops needed to obtain v_j is (4m−1)(j−1). As shown in the case of q_1, the normalization q_j = v_j/||v_j|| needs 3m flops. So the flops for q_j is (4m−1)(j−1) + 3m.
Summing up the flops in each step, the total number of flops for Algorithm 8.1 is ∑_{j=1}^n [(4m−1)(j−1) + 3m] = (4m−1)·n(n−1)/2 + 3mn.
8.2.
Proof. A translation of Algorithm 8.1 into MATLAB code is given below:
function [Q, R] = mgs(A)
%
% Implementation is based on Algorithm 8.1 of [1], pp. 58.
%
% Z.Y., 07/11/2010.
%
% Reference:
% [1] Lloyd N. Trefethen and David Bau III. Numerical Linear Algebra.
% Exercise 8.2.
[m, n] = size(A);
if (m<n)
error('number of rows must be no less than number of columns.');
end
Q = zeros(m,n);
R = zeros(n,n);
for i=1:n
Q(:, i) = A(:, i);
end
for i=1:n
R(i,i) = norm(Q(:,i),2);
Q(:,i) = Q(:,i)/R(i,i);
for j = (i+1):n
R(i,j) = Q(:,i)'*Q(:,j);
Q(:,j) = Q(:,j) - R(i,j)*Q(:,i);
end
end
end %mgs
An alternative implementation, equivalent both theoretically and numerically, is a translation of formula (8.7) into MATLAB code, which is given below:
function [Q, R] = mgs_alt(A)
[m, n] = size(A);
if (m<n)
error('number of rows must be no less than number of columns.');
end
Q = zeros(m,n);
R = zeros(n,n);
R(1,1) = norm(A(:,1),2);
Q(:,1) = A(:,1)/R(1,1);
for j=2:n
v = A(:,j);
for i=1:(j-1)
R(i,j) = Q(:,i)'*v;
v = v - R(i,j)*Q(:,i);
end
R(j,j) = norm(v,2);
Q(:,j) = v/R(j,j);
end
end %mgs_alt
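A small usage example for the two routines (illustration only): both should reproduce A and return numerically orthonormal columns.
% Example: compare mgs and mgs_alt on a random matrix.
A = rand(6, 4);
[Q1, R1] = mgs(A);
[Q2, R2] = mgs_alt(A);
disp([norm(A - Q1*R1), norm(Q1'*Q1 - eye(4))])   % ~[0 0]
disp([norm(A - Q2*R2), norm(Q2'*Q2 - eye(4))])   % ~[0 0]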
8.3.
Proof. We have the following factorization of Rj :
\[
R_j = \begin{pmatrix} 1 & & & & & \\ & \ddots & & & & \\ & & \frac{1}{r_{jj}} & -\frac{r_{j(j+1)}}{r_{jj}} & \cdots & -\frac{r_{jn}}{r_{jj}}\\ & & & 1 & & \\ & & & & \ddots & \\ & & & & & 1 \end{pmatrix}
= \begin{pmatrix} 1 & & & & \\ & \ddots & & & \\ & & \frac{1}{r_{jj}} & & \\ & & & \ddots & \\ & & & & 1 \end{pmatrix}
\begin{pmatrix} 1 & & & & & \\ & \ddots & & & & \\ & & 1 & -r_{j(j+1)} & \cdots & -r_{jn}\\ & & & 1 & & \\ & & & & \ddots & \\ & & & & & 1 \end{pmatrix},
\]
where all blank entries are zero.
From formula (8.10), we can easily see that the diagonal matrix corresponds to the line qi = vi /rii in
Algorithm 8.1, and the unit upper-triangular matrix corresponds to the line vj = vj − rij qi in Algorithm
8.1.
9 MATLAB
9.1.
Proof. The MATLAB code is given below:
function discrete_legendre_polynomials
%
% Z.Y., 07/11/2010.
%
% Reference
% [1] Lloyd N. Trefethen and David Bau III. Numerical Linear Algebra. SIAM,
% 1997.
% maximum error as a function of the (base 1/2) power of the grid spacing
p1 = 4;
p2 = 20;
errs = [];
for v = p1:p2
errs = [errs, legendre(v)];
end
figure(3);
slope = (log(errs(end))-log(errs(1)))/(p2-p1);
plot(p1:p2, log(errs));
ylabel('logarithm of max error between the first 4 discrete and continuous Legendre polynomials');
xlabel('(base 1/2) power of the grid spacing');
title(['slope = ', num2str(slope)]);
end %discrete_legendre_polynomials
function error = legendre(v)
% Experiment 1: compare the discrete Legendre polynomials (scaled
% orthonormalized columns of the Vandermonde-type matrix) with the
% continuous Legendre polynomials on the grid.
x = (-2^v:2^v)'/2^v;
A = [x.^0 x.^1 x.^2 x.^3];
% Continuous Legendre polynomials P0,...,P3 evaluated on the grid
% (assumed definition of L; the original line was lost).
L = [x.^0, x, (3*x.^2 - 1)/2, (5*x.^3 - 3*x)/2];
[Q, R] = qr(A, 0);
scale = Q(2^(v+1)+1,:);
Q = Q*diag(1./scale);
figure(1);
plot(x, Q);
figure(2);
subplot(2,2,1);
plot(x, Q(:,1)-L(:,1));
subplot(2,2,2);
plot(x, Q(:,2)-L(:,2));
subplot(2,2,3);
plot(x, Q(:,3)-L(:,3));
subplot(2,2,4);
plot(x, Q(:,4)-L(:,4));
error = max(max(abs(L-Q)));
end %legendre
[Figure: logarithm of the maximum error between the first 4 discrete and continuous Legendre polynomials versus the (base 1/2) power of the grid spacing; the fitted slope is about -0.698.]
9.2.
Proof. The eigenvalue of A is 1; the determinant of A is 1; and the rank of A is m. To find the inverse of A, we
denote by N the m × m matrix with 1 on the first superdiagonal and 0 everywhere else. Then N is nilpotent of index m, i.e. N^m = 0 (while N^{m-1} ≠ 0). Therefore A⁻¹ = (I + 2N)⁻¹ = I − 2N + (2N)² − \cdots + (−1)^{m-1}(2N)^{m-1}.
For example, when m = 3, we have
\[
A = \begin{pmatrix} 1 & 2 & 0\\ 0 & 1 & 2\\ 0 & 0 & 1 \end{pmatrix}, \qquad A^{-1} = \begin{pmatrix} 1 & -2 & 4\\ 0 & 1 & -2\\ 0 & 0 & 1 \end{pmatrix}.
\]
The smallest singular value of A satisfies σ_m = 1/||A⁻¹||_2.
So an upper bound on σ_m is the inverse of a lower bound on ||A⁻¹||_2. Since A⁻¹e_m = ((−2)^{m-1}, (−2)^{m-2}, \cdots, 1)^T, we conclude ||A⁻¹||_2 ≥ ||((−2)^{m-1}, (−2)^{m-2}, \cdots, 1)^T||_2 = √((4^m − 1)/3). This gives us
σ_m ≤ √3/√(4^m − 1).
For example, when m = 3, σ_m ≈ 0.1939 while √3/√(4^m − 1) ≈ 0.2182; when m = 5, σ_m ≈ 0.0470 while √3/√(4^m − 1) ≈ 0.0542.
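The bound can be compared with the actual smallest singular value in MATLAB (illustration only):
% Illustration: sigma_m versus the upper bound sqrt(3)/sqrt(4^m - 1).
for m = [3 5 8]
    A = eye(m) + 2*diag(ones(m-1, 1), 1);
    s = svd(A);
    fprintf('m = %d: sigma_m = %.4e, bound = %.4e\n', m, s(end), sqrt(3/(4^m - 1)));
end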
10 Householder Triangularization
10.1. (a)
Proof. The Householder reflector F has the general form of I − 2qq ∗ , where q is a unit vector. If λ is an
eigenvalue and x is an associated eigenvector, we have x − 2qq ∗ x = λx, or equivalently, (1 − λ)x = 2(q ∗ x)q.
So 1 is an eigenvalue and the space of associated eigenvectors is the hyperplane q^⊥ = {x : q^*x = 0}. Geometrically, this eigenspace is the hyperplane H in Figure 10.1, which is invariant under the Householder reflection.
If λ ≠ 1, we can conclude x is a scalar multiple of q. Suppose x = µq with µ ≠ 0. Then (1 − λ)µ = 2µ. So λ = −1 is an eigenvalue and the space of associated eigenvectors is ⟨q⟩ = {µq : µ ∈ C (or R)}. Geometrically, this eigenspace is the line through the origin in the direction of x − ||x||e_1 in Figure 10.1, which is reversed (mapped to its negative) by the reflection with respect to H.
Remark 6. Besides direct computation, we can note that the Householder reflector satisfies F² = I, so its eigenvalues must be either 1 or −1.
(b)
Proof. We note F^* = F. By Theorem 5.5, the singular values of F are the absolute values of its eigenvalues, hence all equal to 1.
10.2. (a)
Proof. The MATLAB code is given below:
function [W, R] = house(A)
[m, n] = size(A);
if (m<n)
error('number of rows cannot be smaller than number of columns.');
end
W = zeros(m,n);
for k=1:n
x = A(k:m, k);
if (x(1)>=0)
sgn = 1;
else
sgn = -1;
end
v = sgn*norm(x,2)*eye(m-k+1,1)+x;
v = v/norm(v,2);
A(k:m, k:n) = A(k:m, k:n) - 2*v*v'*A(k:m, k:n);
W(k:m,k) = v;
end
R = A(1:n,:);
end %house
(b)
Proof. The MATLAB code is given below:
function Q = formQ(W)
%FORMQ takes the matrix W = (v1, ..., vn) produced by Algorithm 10.1 of
% [1] as input and generates a corresponding m x m orthogonal matrix
% Q.
%
% Z.Y., 7/14/2010.
%
% Reference
% [1] Lloyd N. Trefethen and David Bau III. Numerical Linear Algebra.
% SIAM, 1997. Algorithm 10.3. Exercise 10.2(b).
[m, n] = size(W);
if (m<n)
error('number of rows cannot be smaller than number of columns.');
end
Q = eye(m,m);
for k = 1:m
Q(:,k) = formQx(W,Q(:,k));
end
end %formQ
function y = formQx(W, x)
[m, n] = size(W);
if (m<n)
error('number of rows cannot be smaller than number of columns.');
end
if (~isvector(x))
error('x must be a vector.');
elseif (length(x) ~= m)
error('length of x must agree with the number of rows for W.');
end
for k = n:-1:1
x(k:m) = x(k:m)-2*W(k:m,k)*(W(k:m,k)'*x(k:m));
end
y = x;
end %formQx
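A short usage example for house and formQ (illustration only): reconstruct a full QR factorization and check it against A.
% Example: Householder QR via house/formQ.
A = rand(6, 4);
[W, R] = house(A);
Q = formQ(W);
disp(norm(A - Q*[R; zeros(2, 4)]))   % ~0
disp(norm(Q'*Q - eye(6)))            % ~0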
10.3.
Proof. The Gram-Schmidt routine mgs of Exercise 8.2 produces the following Q and R:
\[
Q = \begin{pmatrix} 0.1010 & 0.3162 & 0.5420\\ 0.4041 & 0.3534 & 0.5162\\ 0.7071 & 0.3906 & -0.5248\\ 0.4041 & -0.5580 & 0.3871\\ 0.4041 & -0.5580 & -0.1204 \end{pmatrix}, \qquad R = \begin{pmatrix} 9.8995 & 9.4954 & 9.6975\\ 0 & 3.2919 & 3.0129\\ 0 & 0 & 1.9701 \end{pmatrix}.
\]
The Householder routines house and formQ of Exercise 10.2 produce the following Q and R:
\[
Q = \begin{pmatrix} -0.1010 & -0.3162 & 0.5420 & -0.6842 & -0.3577\\ -0.4041 & -0.3534 & 0.5162 & 0.3280 & 0.5812\\ -0.7071 & -0.3906 & -0.5248 & 0.0094 & -0.2683\\ -0.4041 & 0.5580 & 0.3871 & 0.3656 & -0.4918\\ -0.4041 & 0.5580 & -0.1204 & -0.5390 & 0.4695 \end{pmatrix}, \qquad R = \begin{pmatrix} -9.8995 & -9.4954 & -9.6975\\ 0 & -3.2919 & -3.0129\\ 0 & 0 & 1.9701 \end{pmatrix}.
\]
So the midpoint of \begin{pmatrix} x\\ y \end{pmatrix} and F\begin{pmatrix} x\\ y \end{pmatrix} is \begin{pmatrix} \frac{(1-c)x + sy}{2}\\ \frac{sx + (c+1)y}{2} \end{pmatrix}, which is perpendicular to the vector connecting \begin{pmatrix} x\\ y \end{pmatrix} and F\begin{pmatrix} x\\ y \end{pmatrix}:
\[
\frac{1}{2}\left[\begin{pmatrix} x\\ y \end{pmatrix} + F\begin{pmatrix} x\\ y \end{pmatrix}\right]^*\left[\begin{pmatrix} x\\ y \end{pmatrix} - F\begin{pmatrix} x\\ y \end{pmatrix}\right] = \frac{1}{2}\left[(x, y)\begin{pmatrix} x\\ y \end{pmatrix} - (x, y)F^*F\begin{pmatrix} x\\ y \end{pmatrix}\right] = 0.
\]
A^+ = (A^*A)^{-1}A^* = (\hat{R}^*\hat{Q}^*\hat{Q}\hat{R})^{-1}(\hat{R}^*\hat{Q}^*) = \hat{R}^{-1}\hat{Q}^* = A_1^{-1}\hat{Q}_1\hat{Q}^*.
This implies ||\hat{Q}_1\hat{Q}^*||_2 ≤ 1. Combined, we can conclude ||A^+||_2 ≤ ||A_1^{-1}||_2.
Remark 8. We make the following interesting observation. Suppose Q is an m × m orthogonal matrix and we write it in the form
\[
Q = \begin{pmatrix} Q_{11} & Q_{12}\\ Q_{21} & Q_{22} \end{pmatrix},
\]
where Q_{11} is of size n × n, Q_{12} is of size n × (m − n), Q_{21} is of size (m − n) × n, and Q_{22} is of size (m − n) × (m − n). In general, we cannot claim Q_{11} is orthogonal and conclude ||Q_{11}||_2 = 1, but we can conclude Q_{11} is a contraction under the 2-norm, i.e. ||Q_{11}||_2 ≤ 1. This is because for any z ∈ R^n or C^n,
\[
\|Q_{11}z\|_2 \le \left\|\begin{pmatrix} Q_{11} & Q_{12}\\ Q_{21} & Q_{22} \end{pmatrix}\begin{pmatrix} z\\ 0 \end{pmatrix}\right\|_2 = \|z\|_2,
\]
where the special property that an orthogonal matrix preserves the 2-norm is used.
11.2. (a)
Proof. The problem is to find coefficients a_1, a_2 and a_3 such that ∫_1^2 [a_1 e^x + a_2 sin x + a_3 Γ(x) − x⁻¹]² dx is minimized. Let F(a_1, a_2, a_3) = ∫_1^2 [a_1 e^x + a_2 sin x + a_3 Γ(x) − x⁻¹]² dx. Then the conditions for an extremum,
∂F/∂a_1 = 0, ∂F/∂a_2 = 0, ∂F/∂a_3 = 0,
give a 3 × 3 linear system whose (i, j) entry is ∫_1^2 φ_i(x)φ_j(x) dx and whose right-hand side has entries ∫_1^2 φ_i(x)x⁻¹ dx, where φ_1(x) = e^x, φ_2(x) = sin x and φ_3(x) = Γ(x). Using the numerical integration routine of MATLAB (quad), we can compute the matrix of this system as well as the column vector on the right side. Solving it gives us the (theoretically) accurate solution of the optimal fitting.
function [acc_coeff, acc_L2err, apprx_coeff, apprx_L2err] = L2Approx(l,r)
% The assembly of the normal equations and the discrete least-squares fit
% below are a reconstruction (the corresponding lines were lost from the
% original listing); the plotting code that follows is unchanged.
tol = 1e-10;
f = {@(x) exp(x), @(x) sin(x), @(x) gamma(x)};
accA = zeros(3); accb = zeros(3, 1);
for i = 1:3
    for j = 1:3
        accA(i,j) = quad(@(x) f{i}(x).*f{j}(x), l, r, tol);
    end
    accb(i) = quad(@(x) f{i}(x)./x, l, r, tol);
end
acc_coeff = accA\accb;
acc_L2err = quad(@(x)L2Err(x,acc_coeff), l, r, tol);
% Discrete least-squares fit of 1/x on a grid over [l, r].
xx = (l:0.001:r)';
apprxA = [exp(xx), sin(xx), gamma(xx)];
apprx_coeff = apprxA\(1./xx);
apprx_L2err = norm(apprxA*apprx_coeff - 1./xx);
figure(1);
step = 0.001;
x = l:step:r;
y = exp(x)*acc_coeff(1) + sin(x)*acc_coeff(2) + gamma(x)*acc_coeff(3);
z = 1./x;
plot(x, z, 'b', x, y, 'r');
legend('1/x', [num2str(acc_coeff(1)), 'exp(x)+', ...
num2str(acc_coeff(2)), 'sin(x)+', ...
num2str(acc_coeff(3)), 'gamma(x)']);
grid on;
title(['Approximation of 1/x by linear combination of exp(x), sin(x), ',...
'gamma(x) in L2-norm over [', num2str(l),',',num2str(r),']', ...
', L2 err: ', num2str(acc_L2err)]);
figure(2);
plot(xx, 1./xx, 'b', xx, apprxA*apprx_coeff, 'r');
legend('1/x', [num2str(apprx_coeff(1)), 'exp(x)+', ...
num2str(apprx_coeff(2)), 'sin(x)+', ...
num2str(apprx_coeff(3)), 'gamma(x)']);
grid on;
title(['Approximation of 1/x by linear combination of exp(x), sin(x), ',...
'gamma(x) in discrete L2-norm over [', num2str(l),',',num2str(r),...
'], discrete L2 err: ', num2str(apprx_L2err)]);
end %L2Approx

function e = L2Err(x, c)
% Squared pointwise error of the fitted combination (helper assumed by the
% call above; its original definition was not included in the listing).
e = (c(1)*exp(x) + c(2)*sin(x) + c(3)*gamma(x) - 1./x).^2;
end %L2Err
Remark 9. The help information of MATLAB, under the entry on pseudoinverses, explains three equivalent ways of solving an overdetermined full-rank least squares system: if A is m-by-n with m > n and full rank n, then each of the three statements
x = A\b
x = pinv(A)*b
x = inv(A'*A)*A'*b
theoretically computes the same least squares solution x, although the backslash operator does it faster.
11.3.
Proof. The MATLAB code for this exercise is given below:
function x = poly_apprx
format long;
m = 50; n = 12;
t = linspace(0,1,m);
A = fliplr(vander(t));
A = A(:,1:12);
b = cos(4*t)';
coefficients x1 x2 x3
a0 1.00000001465169 1.00000000350410 1.00000000099659
a1 -0.00000449714573 -0.00000117405761 -0.00000042274146
a2 -7.99982732153282 -7.99995281040164 -7.99998123572225
a3 -0.00259730734173 -0.00074001641075 -0.00031876281895
a4 10.68694453795226 10.67267129913343 10.66943079325323
a5 -0.09313280176144 -0.02850450420331 -0.01382027756329
a6 -5.42151357259457 -5.60529257307527 -5.64707565382729
a7 -0.48952347964704 -0.15207694321151 -0.07531598015849
a8 2.18420198493399 1.78455754119606 1.69360691536360
a9 -0.35588588385285 -0.06108461047809 0.00603214174793
a10 -0.22300544585926 -0.34618754322297 -0.37424171636699
a11 0.06070014261595 0.08296770979626 0.08804057827562
coefficients x4 x5 x6
a0 1.00000000099661 1.00000000099661 1.00000000099661
a1 -0.00000042274331 -0.00000042274367 -0.00000042274330
a2 -7.99998123567923 -7.99998123566715 -7.99998123567958
a3 -0.00031876330399 -0.00031876345909 -0.00031876329890
a4 10.66943079635676 10.66943079740723 10.66943079631425
a5 -0.01382028980042 -0.01382029406980 -0.01382028959570
a6 -5.64707562270147 -5.64707561163632 -5.64707562330371
a7 -0.07531603222675 -0.07531605097010 -0.07531603110511
a8 1.69360697232565 1.69360699300149 1.69360697099570
a9 0.00603210250426 0.00603208818858 0.00603210347811
a10 -0.37424170091224 -0.37424169526212 -0.37424170131392
a11 0.08804057562238 0.08804057465256 0.08804057569379
References
[1] Peter Lax. Linear Algebra and Its Applications, 2nd Edition. John Wiley & Sons, Inc., New York, 2007.
[2] James R. Munkres. Analysis on Manifolds. Westview Press, 1997.
[3] Kôsaku Yosida. Functional Analysis. Springer-Verlag, Berlin.