Caam 453 Numerical Analysis I: 6 October 2009 M. Embree, Rice University

Download as pdf or txt
Download as pdf or txt
You are on page 1of 4

CAAM 453 · NUMERICAL ANALYSIS I

Lecture 17: The Singular Value Decomposition: Theory

3.2. The singular value decomposition.


Both the normal equation and QR approaches to solving the discrete linear least squares problem
C
assume that the matrix A ∈ m×n has full column rank, i.e., its columns are linearly independent,
implying that both A∗ A and R1 are invertible. What if this is not the case? The discrete least
squares problem still makes sense, but we need a more robust computational approach to determine
the solution. What if the columns of A are close to being linearly dependent? What does it even
mean to be ‘close’ to linear dependence?
To answer these questions, we shall investigate one of the most important matrix factorizations,
the singular value decomposition (SVD). This factorization writes a matrix as the product of a
unitary matrix times a diagonal matrix times another unitary matrix. It is an incredibly useful
tool for proving a variety of results in matrix theory, but it also has essential computational appli-
cations: from the SVD we immediately obtain bases for the four fundamental subspaces, Ran(A),
Ker(A), Ran(A∗ ), and Ker(A∗ ). Furthermore, the SVD facilitates the robust solution of a variety
of approximation problems, including not only least squares problems with rank-deficient A, but
also other low-rank matrix approximation problems that arise throughout engineering, statistics,
the physical sciences, and social science.
There are several ways to derive the singular value decomposition. We shall constructively prove
the SVD based on analysis of A∗ A; Trefethen and Bau follow an alternative approach somewhat
different from the one we describe; see their Theorem 4.1. Before beginning, we must recall some
fundamental results from linear algebra.
3.2.1. Hermitian positive definite matrices.
C
Theorem (Spectral Theorem). Suppose H ∈ n×n is Hermitian. Then there exist n (not neces-
sarily distinct) eigenvalues λ1 , . . . , λn and corresponding unit eigenvectors v1 , . . . , vn such that

Hvj = λj vj

and the eigenvectors form an orthonormal basis for Cn.


Theorem. All eigenvalues of a Hermitian matrix are real.
Proof. Let (λj , vj ) be an arbitrary eigenpair of the Hermitian matrix H, so that Hvj = λj vj .
Without loss of generality, we can assume that vj is scaled so that kvj k2 = 1. Thus

λj = λj (v∗j vj ) = v∗j (λj vj ) = v∗j (Hvj ) = v∗j H∗ vj = (Hvj )∗ vj = (λj vj )∗ vj = λj v∗j vj = λj .

Thus λj = λj , which is only possible if λj is real.


C
Definition. A Hermitian matrix H ∈ n×n is positive definite provided x∗ Hx > 0 for all nonzero
C C
x ∈ n ; if x∗ Hx ≥ 0 for all x ∈ n , we say H is positive semidefinite.
Theorem. A Hermitian positive semidefinite matrix has nonnegative real eigenvalues.
Proof. Let (λj , vj ) denote an eigenpair of the Hermitian positive semidefinite matrix H ∈ Cn×n
with kvj k22 = v∗j vj = 1. Since H is Hermitian, λj must be real. We conclude that

λj = λj v∗j vj = v∗j (λj vj ) = v∗j Hvj ≥ 0

6 October 2009 17-1 M. Embree, Rice University


CAAM 453 · NUMERICAL ANALYSIS I

since H is positive semidefinite.


3.2.2. Derivation of the singular value decomposition.
C
Suppose A ∈ m×n with m ≥ n. The n-by-n matrix A∗ A is always Hermitian positive semidefinite.
C
(Clearly (A∗ A)∗ = A∗ (A∗ )∗ = A∗ A, so A∗ A is Hermitian. For any x ∈ n , note that x∗ A∗ Ax =
(Ax)∗ (Ax) = kAxk22 ≥ 0, so A∗ A is positive semidefinite.)
Step 1. As a consequence of results presented in §3.2.1, we can construct n eigenpairs {(λj , vj )}nj=1
of A∗ A with unit eigenvectors (vj∗ vj = 1) that are orthogonal to one another (vj∗ vk = 0 when
j 6= k). We are free to pick any convenient indexing for these eigenpairs; it will be convenient to
label them so that the eigenvalues are decreasing in magnitude, λ1 ≥ λ2 ≥ · · · ≥ λn ≥ 0.
Step 2. Define σj = kAvj k2 .
Note that σj2 = kAvj k22 = v∗j A∗ Avj = λj . Since the eigenvalues λ1 , . . . , λn are decreasing in
magnitude, so are the σj values: σ1 ≥ σ2 ≥ · · · ≥ σn ≥ 0.
C
Step 3. Next, we will build a set of related orthonormal vectors in m . Suppose we have already
constructed such vectors u1 , . . . , uj−1 .
If σj 6= 0, then define uj = σj−1 Avj , so that kuj k2 = σj−1 kAvj k2 = 1.
If σj = 0, then pick uj to be any unit vector such that
uj ∈ span{u1 , . . . , uj−1 }⊥ ;
i.e., ensure u∗j uk = 0 for all k < j.†
By construction, u∗j uk = 0 for j 6= k if σj or σk is zero. If both σj and σk are nonzero, then
1 1 ∗ ∗ λk ∗
u∗j uk = (Avj )∗ (Avk ) = vj A Avk = vv,
σj σk σj σk σj σk j k
where we used the fact that vj is an eigenvector of A∗ A. Now if j 6= k, then v∗j vk = 0, and hence
u∗j uk = 0. On the other hand, j = k implies that v∗j vk = 1, so u∗j uk = λj /σj2 = 1.
In conclusion, we have constructed a set of orthonormal vectors {uj }nj=1 with uj ∈ Cm.
Step 4. For all j = 1, . . . , n,
Avj = σj uj ,
regardless of whether σj = 0 or not. We can stack these n vector equations as columns of a single
matrix equation,
   
| | | | | |
 Av1 Av2 · · · Avn  =  σ1 u1 σ2 u2 · · · σn un  .
| | | | | |
Note that both matrices in this equation can be factored into the product of simpler matrices:
 
    σ1
| | | | | |  σ2 
A  v1 v2 · · · vn  =  u1 u2 · · · un   .
 
..
| | | | | |
 . 
σn

Note that σj = 0 implies that λj = 0, and so A∗ A has a zero eigenvalue; i.e., this matrix is singular. Recall from
the last lecture that this case can only occur when A is rank-deficient: dim(Ran(A)) < n.

6 October 2009 17-2 M. Embree, Rice University


CAAM 453 · NUMERICAL ANALYSIS I

Denote these matrices as AV = U b where A ∈


b Σ, Cm×n, V ∈ Cn×n, Ub ∈ Cm×n, and Σb ∈ Cn×n.
The (j, k) entry of V∗ V is simply v∗j vk , and so V∗ V = I. Since V is a square matrix, we have just
proved that it is unitary. Hence, VV∗ = I as well, and we conclude that

A=U b ∗.
b ΣV

This matrix factorization is known as the reduced singular value decomposition. It can be obtained
via the MATLAB command

[Uhat, Sighat, V] = svd(A,0);

While the matrix Ub has orthonormal columns, it is not a unitary matrix. In particular, we have
U∗
b Ub =I∈ n×n C
, but
U b ∗ ∈ m×m
bU C
cannot be the identity unless m = n. (To see this, note that U
bUb ∗ is an orthogonal projection onto
Ran(U) = span{u1 , . . . , un }. Since dim(Ran(U)) = n, this projection cannot equal the m-by-m
b b
identity matrix when m > n.)
Though U b is not unitary, we might call it subunitary.‡ We can construct m − n additional column
vectors to append to Ub to make it unitary. Here is the recipe: For j = n + 1, . . . , m, pick

uj ∈ span{u1 , . . . , uj−1 }⊥

with u∗j uj = 1. Then define


 
| | |
U =  u1 u2 · · · um  .
| | |
It is simple to confirm that U∗ U = UU∗ = I ∈ Cm×m, so U is unitary.
We wish to replace the Ub in the reduced SVD with the unitary matrix U. To do so, we also need
to replace Σ
b by some Σ in such a way that U bΣb = UΣ. The simplest approach is to obtain Σ by
appending zeros to the end of Σ, thus ensuring there is no contribution when the new entries of U
b
multiply against the new entries of Σ:

C
 
Σ
∈ m×n .
b
Σ=
0

Finally, we are prepared to state our main result, the full singular value decomposition.
Theorem (Singular value decomposition). Any matrix A ∈ Cm×n can be written in the form
A = UΣV∗ ,

C C C
where U ∈ m×m and V ∈ n×n are unitary matrices and Σ ∈ m×n is zero everywhere except
for entries on the main diagonal, where the (j, j) entry is σj for j = 1, . . . , min(m, n), and

σ1 ≥ σ2 ≥ · · · ≥ σmin(m,n) ≥ 0.

There is no universally accepted term for such a matrix; Gilbert Strang suggests the descriptive term subunitary.

6 October 2009 17-3 M. Embree, Rice University


CAAM 453 · NUMERICAL ANALYSIS I

We have only proved this result for m ≥ n. The proof for m < n is obtained by applying the same
arguments above to A∗ in place of A.
The full SVD is obtained via the MATLAB command

[U,S,V] = svd(A).

6 October 2009 17-4 M. Embree, Rice University

You might also like