Polar decomposition

November 25, 2019

Contents

1 Introduction
2 Orthogonal decompositions
3 Length of Tv
4 Proof of the polar decomposition

1 Introduction
The polar decomposition proven in the book (Theorem 7.45 on page 233)
concerns a linear map T ∈ L(V) from a single inner product space to itself.
Exactly the same ideas treat the case of a map

    T ∈ L(V, W)    (V, W finite-dimensional inner product spaces).    (1.1a)

I stated the result in class; the point of these notes is to write down some
details of the proof, as well as to talk a bit more about why the result is
useful. To get to the statement, we need some notation. I’ll think of the
linear map T as an arrow going from V to W:

    V --T--> W;    (1.1b)

this is just another notation for saying that T is a function that takes
any v ∈ V and gives you something T(v) ∈ W. Because these are inner
product spaces, we also get the adjoint of T:

    W --T∗--> V,    ⟨Tv, w⟩ = ⟨v, T∗w⟩    (v ∈ V, w ∈ W).    (1.1c)

The property written on the right defines the linear map T∗. Using these
two maps, we immediately get two subspaces of each of V and W:

    Null(T), Range(T∗) ⊂ V,    Null(T∗), Range(T) ⊂ W.    (1.1d)
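If you like to experiment, the defining property (1.1c) is easy to check numerically: in orthonormal bases (such as the standard bases of Rⁿ) the matrix of T∗ is the conjugate transpose of the matrix of T. Here is a small NumPy sketch; the matrices are random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# A real linear map T : R^3 -> R^4; with the standard inner products,
# the matrix of the adjoint T* is just the transpose of the matrix of T.
T = rng.standard_normal((4, 3))

v = rng.standard_normal(3)
w = rng.standard_normal(4)

# The defining property (1.1c): <Tv, w> computed in W equals <v, T*w> in V.
lhs = np.dot(T @ v, w)
rhs = np.dot(v, T.T @ w)
assert np.isclose(lhs, rhs)
```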

The first basic fact is that these spaces provide orthogonal direct sum de-
compositions of V and W .

Proposition 1.2. In the setting of (1.1),

1. The null space of T is the orthogonal complement of the range of T ∗ :


Null(T ) = Range(T ∗ )⊥ .

2. The null space of T ∗ is the orthogonal complement of the range of T :


Null(T ∗ ) = Range(T )⊥ .

3. The space V is the orthogonal direct sum of Null(T ) and Range(T ∗ ):


V = Null(T ) ⊕ Range(T ∗ ).

4. The space W is the orthogonal direct sum of Null(T ∗ ) and Range(T ):


W = Null(T ∗ ) ⊕ Range(T ).

5. The natural quotient map π from V to V/Null(T) (text, 3.88) restricts
to an isomorphism of vector spaces

    Range(T∗) ≅ V/Null(T).

6. The natural isomorphism T̃ : V/Null(T) → Range(T) (text, 3.91)
restricts to an isomorphism of vector spaces

    T : Range(T∗) → Range(T).

7. The natural quotient map π from W to W/Null(T∗) (see text, 3.88)
restricts to an isomorphism of vector spaces

    Range(T) ≅ W/Null(T∗).

8. The natural isomorphism T̃∗ : W/Null(T∗) → Range(T∗) (see text, 3.91)
restricts to an isomorphism of vector spaces

    T∗ : Range(T) → Range(T∗).

I’ll outline proofs in Section 2. (Even better, you should try to write
down proofs yourself.)
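Before reading the proofs, you might enjoy checking Proposition 1.2 numerically. The NumPy sketch below builds a deliberately rank-deficient T (so that Null(T) is nonzero), extracts Null(T) and Range(T∗) from the singular value decomposition, and verifies parts 1 and 3; using the SVD here is my illustrative shortcut, not something the notes rely on.

```python
import numpy as np

rng = np.random.default_rng(1)
# A rank-deficient map T : R^4 -> R^3, built as a product so rank(T) = 2.
A = rng.standard_normal((3, 2))
B = rng.standard_normal((2, 4))
T = A @ B

# From the SVD: rows of Vt for nonzero singular values span Range(T*),
# and the remaining rows span Null(T).
U, s, Vt = np.linalg.svd(T)
rank = int(np.sum(s > 1e-10))
null_T = Vt[rank:].T           # columns span Null(T)
range_Tstar = Vt[:rank].T      # columns span Range(T*)

# Part 1: Null(T) is orthogonal to Range(T*) ...
assert np.allclose(null_T.T @ range_Tstar, 0)
# ... and part 3: the dimensions add up to dim V = 4.
assert null_T.shape[1] + range_Tstar.shape[1] == 4
```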
Here is the polar decomposition I stated in class.

Theorem 1.3. In the setting of (1.1), there is a factorization

T = SP (S ∈ L(V, W ), P ∈ L(V ))

characterized uniquely by the following properties:

1. P is a positive self-adjoint operator;

2. S restricts to an isometry from Range(T ∗ ) onto Range(T ); and

3. P |Null(T ) = S|Null(T ) = 0.

In addition, the decomposition has the following properties.

4. The operator P is the (unique) positive square root of T ∗ T .

Write λmax and λmin for the largest and smallest eigenvalues of P (which
are the square roots of the corresponding eigenvalues of T∗T). Then

5.
    ‖Tv‖/‖v‖ ≤ λmax    (0 ≠ v ∈ V)
with the maximum attained exactly on the λmax eigenspace of P.

6.
    ‖Tv‖/‖v‖ ≥ λmin    (0 ≠ v ∈ V)
with the minimum attained exactly on the λmin eigenspace of P.

If dim(V) = dim(W), then there is another factorization (not unique)

    T = S′P

in which condition (3) above is replaced by

3′. S′ restricts to an isometry (which may be arbitrary) from Null(T) onto
Null(T∗).

It is equivalent to require that

3″. S′ is an isometry from V onto W.

It is the factorization T = S′P that is established in the text (when
V = W). Even in that case, I like T = SP better because it's unique, and
unique is (almost) always better. The two maps S and S′ differ only on
Null(T); so if T is invertible, S = S′.
Items (5) and (6) correspond to an application that I discussed in class:
often one cares about controlling how a linear map can change the sizes of
vectors. This result answers that question very precisely. I’ll discuss this in
Section 3.
The proof will be outlined in Section 4. This is harder than Proposition
1.2; but it's still very worthwhile to first try to write the proofs yourself.
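Here is a numerical sketch of Theorem 1.3 using NumPy. It computes P as the positive square root of T∗T from an eigendecomposition; because the random T used here is injective (with probability 1), Null(T) = 0, condition (3) is vacuous, S is simply TP⁻¹, and condition (2) appears as S∗S = I.

```python
import numpy as np

rng = np.random.default_rng(2)
T = rng.standard_normal((5, 3))    # T : R^3 -> R^5, full rank with probability 1

# P is the unique positive square root of T*T, built from an eigendecomposition.
evals, evecs = np.linalg.eigh(T.T @ T)
P = evecs @ np.diag(np.sqrt(evals.clip(min=0))) @ evecs.T

# Since T is injective here, Range(T*) = R^3 and S = T P^{-1}.
S = T @ np.linalg.inv(P)

# The factorization T = SP ...
assert np.allclose(S @ P, T)
# ... with S an isometry on Range(T*) = R^3 (condition (2)): S*S = I.
assert np.allclose(S.T @ S, np.eye(3))
# P is self-adjoint (condition (1)).
assert np.allclose(P, P.T)
```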

2 Orthogonal decompositions
This section is devoted to the proof of Proposition 1.2.

Proof of Proposition 1.2. The idea that's used again and again is that in an
inner product space V

    v = 0 ⟺ ⟨v, v′⟩ = 0 (all v′ ∈ V).    (2.1)

We use this idea to prove 1.

    Null(T) = {v ∈ V | Tv = 0}                        (definition)
            = {v ∈ V | ⟨Tv, w⟩ = 0 (all w ∈ W)}       (by (2.1))
            = {v ∈ V | ⟨v, T∗w⟩ = 0 (all w ∈ W)}      (by (1.1c))
            = Range(T∗)⊥                              (definition of U⊥).

This proves 1.
Part 2 is just 1 applied to T ∗ , together with the (very easy; you should
write a proof!) T ∗∗ = T .
Part 3 follows from 1 and Theorem 6.47 in the text; and part 4 is 3
applied to T ∗ .
For part 5, suppose V = V1 ⊕ V2 is any direct sum decomposition of any
finite-dimensional V over any field F ; the claim is that the natural quotient
map
π : V → V /V1
restricts to an isomorphism
'
V2 −−→ V /V1 .

To see this, notice first that

    dim V2 = dim V − dim V1 = dim(V/V1)

(Theorem 3.78 on page 93 and Theorem 3.89 on page 97); so the two vector
spaces V2 and V/V1 have the same dimension. Second, Null(π) = V1 (text,
proof of Theorem 3.89), so

    Null(V2 → V/V1) = V2 ∩ V1 = 0

(text, 1.45). So our map is an injection between vector spaces of the same
dimension; so it is an isomorphism.
For 6, the map we are looking at is the composition of the isomorphism
from 5 and the isomorphism from Proposition 3.91 in the text; so it is also
an isomorphism.
Parts 7 and 8 are just 5 and 6 applied to T ∗ .

3 Length of T v
This section concerns the general question “how big is T v?” whenever this
question makes sense: for us, that’s the setting (1.1a). I’ll start with a
slightly different result, proved in class.
Proposition 3.1. Suppose S ∈ L(V ) is a self-adjoint linear operator. De-
fine
µmin = smallest eigenvalue of S
µmax = largest eigenvalue of S
Then for all nonzero v ∈ V ,

    µmin ≤ ⟨Sv, v⟩/⟨v, v⟩ ≤ µmax .

All values in this range are attained. The first inequality is an equality if
and only if v is an eigenvector for µmin ; that is, if and only if v ∈ Vµmin .
The second inequality is an equality if and only if v is an eigenvector for
µmax ; that is, if and only if v ∈ Vµmax .
Proof. According to the Spectral Theorem for self-adjoint operators (which
is in the text, but not so easy to point to; there is a simple statement in the
notes on the spectral theorem on the class web site) the eigenvalues of S are
all real, so they can be arranged as

µmin = µ1 < µ2 < · · · < µr−1 < µr = µmax . (3.2a)

Furthermore V is the orthogonal direct sum of the eigenspaces:

V = Vµ1 ⊕ · · · ⊕ Vµr . (3.2b)

If we choose an orthonormal basis (e_i^1, …, e_i^{n_i}) of each eigenspace Vµi
(with n_i = dim Vµi), then we get an orthonormal basis (e_i^j) of V, with i
running from 1 to r and j running from 1 to n_i. (The total size of the
orthonormal basis is therefore

    n1 + · · · + nr = n = dim V.)    (3.2c)

Once we have chosen this basis, we can write any vector in V as

    v = Σ_{i=1}^r Σ_{j=1}^{n_i} v_i^j e_i^j    (v_i^j ∈ F).    (3.2d)

Because e_i^j belongs to the µi eigenspace Vµi, we find

    Sv = Σ_{i=1}^r Σ_{j=1}^{n_i} µi v_i^j e_i^j.    (3.2e)

So far this calculation would have worked for any diagonalizable S. Now we
use the fact that S is self-adjoint, and therefore that we could choose the
e_i^j to be orthonormal. This allows us to calculate

    ⟨µi v_i^j e_i^j, v_{i′}^{j′} e_{i′}^{j′}⟩ = µi v_i^j v̄_{i′}^{j′} ⟨e_i^j, e_{i′}^{j′}⟩
                                              = µi |v_i^j|²    (i = i′, j = j′)
                                              = 0              (i ≠ i′ or j ≠ j′).

Using this calculation to expand ⟨Sv, v⟩, we find

    ⟨Sv, v⟩/⟨v, v⟩ = ( Σ_{i=1}^r µi Σ_{j=1}^{n_i} |v_i^j|² ) / ( Σ_{i=1}^r Σ_{j=1}^{n_i} |v_i^j|² ).    (3.2f)

Here’s how you take a weighted average of the real numbers µ1 , . . . µr .


Fix r real numbers w1 , . . . , wr such that

0 ≤ wi ≤ 1, w1 + · · · + wr = 1 (3.2g)

These numbers are the weights. Simplest weights are the uniform weight
(1/r, · · · , 1/r). The weighted average is

µ = w1 µ 1 + · · · + wr µ r . (3.2h)

If the weight is uniform, then the weighted average is

µ = (µ1 + · · · + µr )/r,

the ordinary average. The opposite extreme is the teacher's pet weight

    wp = 1,    wi = 0 (i ≠ p),

where all the weight is on a single value µp. The weighted average for the
teacher's pet weight is

    µ = µp .
No matter what weights you use, it is always true that

µmin ≤ µ ≤ µmax . (3.2i)

The first inequality is an equality if and only if the weights are concentrated
on the minimum values:

    wi ≠ 0 ⟺ µi = µmin ,

and similarly for the second inequality. I won’t write out a proof of (3.2i),
but here’s how to start. For each i, the definitions of min and max say that

µmin ≤ µi ≤ µmax .

We can multiply inequalities by a nonnegative real number like wi , getting


r inequalities
wi µmin ≤ wi µi ≤ wi µmax .
Now add up these r inequalities to get (3.2i).
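A tiny numerical check of (3.2i); the values and weights below are arbitrary.

```python
# Arbitrary real values and a valid set of weights (nonnegative, summing to 1).
mus = [-2.0, 0.5, 3.0]
ws = [0.2, 0.5, 0.3]
mu_bar = sum(w * m for w, m in zip(ws, mus))

# The weighted average lies between the smallest and largest value (3.2i).
assert min(mus) <= mu_bar <= max(mus)

# The teacher's pet weight concentrated on index p picks out mus[p] exactly.
p = 2
pet = [1.0 if i == p else 0.0 for i in range(3)]
assert sum(w * m for w, m in zip(pet, mus)) == mus[p]
```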
Each nonzero vector v defines a set of weights

    w_{i0} = ( Σ_{j=1}^{n_{i0}} |v_{i0}^j|² ) / ( Σ_{i=1}^r Σ_{j=1}^{n_i} |v_i^j|² ).    (3.2j)

You should convince yourself that this really is a set of weights (that they
are non-negative real numbers adding up to 1). Now (3.2f) says that
    ⟨Sv, v⟩/⟨v, v⟩ = Σ_{i=1}^r wi µi = weighted average of eigenvalues of S.    (3.2k)

Now (3.2i) gives the inequalities in the proposition. You should also convince
yourself that the conditions for equality given for weighted averages after
(3.2i) lead exactly to the conditions for equality stated in the proposition.
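You can also test Proposition 3.1 numerically. The NumPy sketch below symmetrizes a random matrix to get a self-adjoint S, then checks that Rayleigh quotients ⟨Sv, v⟩/⟨v, v⟩ stay between µmin and µmax, with equality attained on the corresponding eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))
S = A + A.T                           # a self-adjoint operator on R^4

evals, evecs = np.linalg.eigh(S)      # eigh returns eigenvalues in ascending order
mu_min, mu_max = evals[0], evals[-1]

# Rayleigh quotients of random vectors stay between mu_min and mu_max.
for _ in range(100):
    v = rng.standard_normal(4)
    q = (v @ S @ v) / (v @ v)
    assert mu_min - 1e-9 <= q <= mu_max + 1e-9

# Equality is attained on the corresponding (unit) eigenvectors.
assert np.isclose(evecs[:, 0] @ S @ evecs[:, 0], mu_min)
assert np.isclose(evecs[:, -1] @ S @ evecs[:, -1], mu_max)
```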

Here’s an exercise in thinking about careful mathematical formulations.


Suppose V has dimension zero. In this case there are no eigenvalues, so
the largest and smallest eigenvalues are not defined. Is the formulation of
Proposition 3.1 wrong in this case?
Now we can talk about the length of T v.

Proposition 3.3. Suppose we are in the setting (1.1). Write λ²max and λ²min
for the largest and smallest eigenvalues of T∗T, with 0 ≤ λmin ≤ λmax.
Then

1.
    ‖Tv‖/‖v‖ ≤ λmax    (0 ≠ v ∈ V)
with the maximum attained exactly on the λ²max eigenspace of T∗T.

2.
    ‖Tv‖/‖v‖ ≥ λmin    (0 ≠ v ∈ V)
with the minimum attained exactly on the λ²min eigenspace of T∗T.

3. Suppose P ∈ L(V, V) is the nonnegative self-adjoint square root of
T∗T. Then

    ‖Tv‖ = ‖Pv‖    (v ∈ V).

4. Suppose R ∈ L(V, U) is any linear map such that R∗R = T∗T. Then

    ‖Tv‖ = ‖Rv‖    (v ∈ V).

Proof. Because the inequalities in the proposition involve non-negative real
numbers, they are equivalent to the squared versions:

    λ²min ≤ ⟨Tv, Tv⟩/⟨v, v⟩ ≤ λ²max .

Using the definition of adjoint, these can be written

    λ²min ≤ ⟨T∗Tv, v⟩/⟨v, v⟩ ≤ λ²max .

In this form it is precisely Proposition 3.1 applied to the self-adjoint
operator T∗T.
The proof of the inequalities was based on the formula

    ⟨Tv, Tv⟩ = ⟨T∗Tv, v⟩.

A consequence is that the length of Tv depends only on T∗T. For 3, we have

    P∗P = P² = T∗T

by the definition of P; this proves 3. The same argument proves 4.
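The conclusions of Proposition 3.3 are easy to test numerically; the NumPy sketch below forms P = (T∗T)^{1/2} from an eigendecomposition and checks parts 1–3 on random vectors.

```python
import numpy as np

rng = np.random.default_rng(4)
T = rng.standard_normal((6, 3))       # T : R^3 -> R^6, full rank with probability 1

# P = (T*T)^{1/2}; its eigenvalues lam are the square roots of those of T*T.
evals, evecs = np.linalg.eigh(T.T @ T)
sqrt_evals = np.sqrt(evals.clip(min=0))
P = evecs @ np.diag(sqrt_evals) @ evecs.T
lam_min, lam_max = sqrt_evals[0], sqrt_evals[-1]

for _ in range(100):
    v = rng.standard_normal(3)
    # Part 3: T and P change lengths identically.
    assert np.isclose(np.linalg.norm(T @ v), np.linalg.norm(P @ v))
    # Parts 1 and 2: the ratio ||Tv||/||v|| lies between lam_min and lam_max.
    r = np.linalg.norm(T @ v) / np.linalg.norm(v)
    assert lam_min - 1e-9 <= r <= lam_max + 1e-9
```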

4 Proof of the polar decomposition


Proof of Theorem 1.3. We address uniqueness first. So suppose that we
have two factorizations T = SP and T = S̃P̃ satisfying the properties 1,
2, and 3; we must show that S = S̃ and P = P̃. Write S1 and S̃1 for the
restrictions of S and S̃ to Range(T∗). Because of hypothesis Theorem 1.3(2),

    S1∗S1 = I_{Range(T∗)} ,    S̃1∗S̃1 = I_{Range(T∗)} .    (4.1a)

(Here I_U means the identity operator on the vector space U.) Consequently

    S∗S|Range(T∗) = S̃∗S̃|Range(T∗) = I_{Range(T∗)} .    (4.1b)

Meanwhile condition 3 of the theorem guarantees that

    S∗S|Null(T) = S̃∗S̃|Null(T) = 0Null(T) .    (4.1c)

Because of the direct sum decomposition in Proposition 1.2(3), it follows that

    S∗S = S̃∗S̃ = P_{Range(T∗)} ,    (4.1d)

the orthogonal projection on the range of T∗.
Now the factorization T = SP and the assumption that P is self-adjoint
means that T∗ = PS∗. Therefore

    T∗T = PS∗SP = P (P_{Range(T∗)}) P.    (4.1e)

To continue, we need to know that

    P_{Range(T∗)} P = P P_{Range(T∗)} .    (4.1f)

We already know from Theorem 1.3(3) that

    Null(T) ⊂ Null(P).    (4.1g)

Write Vµ(P) for the µ eigenspace of P on V. Because the eigenspaces of the
self-adjoint P must be orthogonal, it follows that for any nonzero µ,

    Vµ(P) ⊂ Null(P)⊥ ⊂ Null(T)⊥ = Range(T∗).    (4.1h)

Similarly,

    Null(P) = V0(P) = Null(T) ⊕ (V0(P) ∩ Range(T∗)),    (4.1i)

an orthogonal direct sum decomposition. That is, each eigenspace of P is
the orthogonal direct sum of its intersections with Null(T) and Range(T∗).
From this fact (4.1f) follows. We also find

    Range(P) ⊂ Range(T∗).    (4.1j)

Now we can plug (4.1f) into (4.1e) to get

    T∗T = P_{Range(T∗)} P².    (4.1k)

Taking into account (4.1j), this becomes

    T∗T = P².    (4.1l)

Now Theorem 1.3(4) follows immediately; and in particular this shows that

    P = P̃.    (4.1m)

We also know from Proposition 1.2(6) and (8) that T∗T is an isomorphism
on Range(T∗). Therefore (using (4.1l) again) Null(P) ∩ Range(T∗) = 0, so
P|Range(T∗) is invertible. Therefore the factorization gives

    S1 = T (P|Range(T∗))⁻¹ = S̃1 .    (4.1n)

Since S and S̃ both vanish on Null(T), it follows that

    S = S̃,    (4.1o)
completing the uniqueness proof for the decomposition. (We also proved (4)
in the process.)
For the existence of the decomposition, we define P = (T∗T)^{1/2}, the
unique positive square root, as we must. By Proposition 3.3(3)

    ‖Tv‖ = ‖Pv‖    (v ∈ V).    (4.1p)

This implies immediately that Null(T) = Null(P), so

    Null(P) ∩ Range(T∗) = Null(P) ∩ (Null(T)⊥) = 0.    (4.1q)

Therefore

    P1 =def P|Range(T∗) ∈ L(Range(T∗))    (4.1r)

is invertible; and P1 inherits from P the property

    ‖P1 v‖ = ‖Tv‖    (v ∈ Range(T∗) = Null(T)⊥).    (4.1s)

Now the invertibility of P1 allows us to define

    S1 = T P1⁻¹ ∈ L(Range(T∗), W),    (4.1t)

and (4.1s) (applied to P1⁻¹v) says that

    ‖S1 v‖ = ‖v‖    (v ∈ Range(T∗));    (4.1u)

this is Theorem 1.3(2). Now the orthogonal decomposition

    V = Null(T) ⊕ Range(T∗)

of Proposition 1.2(3) allows us to define S ∈ L(V, W) by

    S(n + v′) = S1(v′)    (n ∈ Null(T), v′ ∈ Range(T∗)).    (4.1v)

Then Theorem 1.3(2) and (3) are true by this definition and (4.1u). Furthermore

    T(n + v′) = (T|Range(T∗))(v′)
              = (T|Range(T∗)) P1⁻¹ P1 v′
              = S1 P1 v′                        (4.1w)
              = SP(n + v′).

This proves the factorization T = SP.
Parts (5) and (6) of Theorem 1.3 are contained in Proposition 3.3.
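The existence construction in the proof — invert P on Range(T∗) and send Null(T) to zero — is exactly what the Moore–Penrose pseudoinverse of P does, so numerically one can set S = TP⁺. Here is a NumPy sketch with a rank-deficient T; using pinv is my shortcut for realizing the proof's S, not something the notes prescribe.

```python
import numpy as np

rng = np.random.default_rng(5)
# A rank-deficient T : R^3 -> R^4 (rank 2), so that Null(T) is nonzero.
T = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))

# P = (T*T)^{1/2} from an eigendecomposition.
evals, evecs = np.linalg.eigh(T.T @ T)
P = evecs @ np.diag(np.sqrt(evals.clip(min=0))) @ evecs.T

# S = T P^+ : inverts P on Range(T*) and sends Null(T) = Null(P) to zero,
# matching conditions (2) and (3) of Theorem 1.3.
S = T @ np.linalg.pinv(P)

assert np.allclose(S @ P, T)

# S kills Null(T): the last right singular vector spans Null(T) here.
U, s, Vt = np.linalg.svd(T)
null_vec = Vt[-1]
assert np.allclose(T @ null_vec, 0)
assert np.allclose(S @ null_vec, 0)
```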

For the last assertion (about an alternate factorization), it's clear from
the preceding proof that we can achieve the factorization using any S′ which
agrees with S on Range(P) = Null(T)⊥. To complete the definition of S′,
we just need to define it (as any linear map to W) on Null(T). The
hypothesis dim V = dim W and Proposition 1.2 guarantee that dim Null(T) =
dim Null(T∗); so we can choose S′ on Null(T) to be an isometry to Null(T∗).
I'll omit the rest of the details.
