A Metric for Covariance Matrices∗
Wolfgang Förstner, Boudewijn Moonen
Institut für Photogrammetrie, Universität Bonn
Nussallee 15, D-53115 Bonn, e-mail: wf|[email protected]
These theorems lead to the consideration of the theory of curved surfaces from a new point of view, where a wide, still wholly uncultivated field opens itself to investigation . . . one thus understands that two essentially different kinds of relations are to be distinguished: those which presuppose a definite form of the surface in space, and those which are independent of the various forms . . . It is the latter that are spoken of here . . . one easily sees, however, that to these belongs the consideration of figures constructed on the surface, . . . , the joining of points by shortest lines∗] and the like. All such investigations must start from the premise that the nature of the curved surface in itself is given by the expression of an indeterminate line element in the form √(E dp^2 + 2F dp dq + G dq^2) . . .
Carl Friedrich Gauss
Abstract
The paper presents a metric for positive definite covariance matrices. It is a natural expression involving traces and joint eigenvalues of the matrices. It is shown to be the distance coming from a canonical invariant Riemannian metric on the space Sym+(n, R) of real symmetric positive definite matrices.
In contrast to known measures, collected e. g. in Grafarend 1972, the metric is invariant under affine transformations and inversion. It can be used for evaluating covariance matrices or for optimizing measurement designs.
Keywords: Covariance matrices, metric, Lie groups, Riemannian manifolds, exponential mapping, symmetric spaces
1 Background
The optimization of geodetic networks is a classical problem that gained considerable attention in the 1970s. In 1972 E. W. Grafarend collected the current knowledge on network design, datum transformations and artificial covariance matrices using covariance functions in his classical monograph [5]; see also [6]. One critical part was the development of a suitable measure for comparing two covariance matrices. Grafarend listed a dozen measures. Assuming a completely isotropic network, represented by a unit matrix as covariance matrix, the measures depended on the eigenvalues of the covariance matrix.
∗ Quo vadis geodesia ...?, Festschrift for Erik W. Grafarend on the occasion of his 60th birthday, Eds.: F. Krumm, V. S. Schwarze, TR Dept. of Geodesy and Geoinformatics, Stuttgart University, 1999
∗] emphasized by the authors
In 1983, 11 years later, at the Aalborg workshop on 'Survey Control Networks', Schmidt [13] used these measures for finding optimal networks. The visualization of error ellipses for a single point leading to the same deviation from an ideal covariance structure revealed deficiencies of these measures: e. g., taking the trace of the covariance matrix, i. e. the sum of its eigenvalues, as quality measure would allow a totally flat error ellipse to be rated as good as a circular ellipse, and even, moreover, as good as the same flat error ellipse rotated by 90°.
Based on an information theoretic point of view, where the information of a Gaussian variable increases with ln σ^2, the first author guessed the sum of squared logarithms of the eigenvalues, d^2 = Σ_i ln^2 λ_i, to be a better measure, as deviations in both directions would be punished by the same amount if measured in percent, i. e. relative to the given variances. He formulated the conjecture that the distance measure d would be a metric. Only in case d is a metric does comparing two covariance matrices A and B with a covariance matrix C allow one to state that one of the two is better than the other. Extensive simulations by K. Ballein [2] substantiated this conjecture, as no case was found where the triangle inequality was violated.
In 1995, 12 years later, taking up the problem within image processing, the first author proved the validity of the conjecture for 2 × 2 matrices [3]. For this case the measure had already been proposed by V. V. Kavrajski [8] for evaluating the isotropy of map projections. However, the proof could not be generalized to higher dimensions. Using classical results from linear algebra and differential geometry, the second author proved the distance d to be a metric for general positive definite symmetric matrices. An extended proof can be found in [11].
This paper states the problem and presents the two proofs for 2 × 2-matrices and for the
general case. Giving two proofs for n = 2 may be justified by the two very different approaches
to the problem.
2 Motivation
Comparing covariance matrices is a basic task in mensuration design. The idea, going back to Baarda 1973 [1], is to compare the variances of arbitrary functions f = e^T x determined on one hand with a given covariance matrix C and on the other hand with a reference or criterion matrix H.
One requirement would be that the variance σ_f^2(C) of f, when calculated with C, is always smaller than the variance σ_f^2(H) of f when calculated with H. This means

e^T C e ≤ e^T H e   for all e ≠ 0

or, for the Rayleigh ratio,

0 ≤ λ(e) = (e^T C e)/(e^T H e) ≤ 1   for all e ≠ 0.

The maximum of λ(e), obtained from 1/2 ∂λ(e)/∂e = 0 ⇔ λHe − Ce = (λH − C)e = 0, is the maximum eigenvalue λ_max(CH^{-1}) of the generalized eigenvalue problem

|λH − C| = 0 .   (1)

Observe that λ e^T H e − e^T C e = e^T(λH − C)e = 0 for e ≠ 0 is only fulfilled if (1) holds. The eigenvalues of (1) are non-negative if the two matrices are positive semidefinite.
This suggests that the eigenvalues of CH^{-1} capture the difference in form of C and H completely. The requirement λ_max ≤ 1 can be visualized by stating that the (error) ellipsoid x^T C^{-1} x = c lies completely within the (error) ellipsoid x^T H^{-1} x = c.
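The generalized eigenvalue problem (1) is directly solvable numerically; the following is a minimal Python sketch with SciPy, using purely illustrative matrices C and H (not taken from the text):

```python
import numpy as np
from scipy.linalg import eigh

# Hypothetical 2x2 covariance matrix C and criterion matrix H (illustrative values).
C = np.array([[1.0, 0.3],
              [0.3, 0.5]])
H = np.array([[1.2, 0.0],
              [0.0, 1.2]])

# eigh(C, H) solves the generalized problem C e = lambda H e, i.e. |lambda H - C| = 0.
lam = eigh(C, H, eigvals_only=True)
lam_max = lam.max()

# lam_max <= 1 means the error ellipsoid of C lies inside that of H.
print(lam_max, lam_max <= 1.0)
```

The eigenvalues coincide with those of CH^{-1}, as the text notes.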
The statistical interpretation of the ellipses results from the assumption, motivated by the principle of maximum entropy, that the stochastical variables are normally distributed, thus having density

p(x) = 1/√((2π)^n det Σ) · e^{−(1/2) x^T Σ^{-1} x}
with covariance matrix Σ. Isolines of constant density are ellipses with semiaxes proportional to the square roots of the eigenvalues of Σ. The ratio

λ_max = max_e σ_f^2(C) / σ_f^2(H)

thus gives the worst case for the ratio of the variances when calculated with the covariances C and H respectively.
Instead of requiring the worst precision to be better than a specification one also could require
the covariance matrix C to be closest to H in some sense. Let us for a moment assume H = I.
Simple examples for measuring the difference in form of C compared to I are the trace
tr C =
n
X
λi (C)
(2)
n
Y
λi (C) .
(3)
i=1
or the determinant
det C =
i=1
These classical measures are invariant with respect to rotations (2) or affine transformations (3) of the coordinate system. Covariance matrices of equal trace or determinant can be visualized using the eigenvalues. Restricting to n = 2, in a 2D coordinate system (λ_1, λ_2) covariance matrices of equal trace tr(C) = c_tr are characterized by the straight line λ_1 = c_tr − λ_2 or σ_1^2 = c_tr − σ_2^2. Covariance matrices of equal determinant det(C) = c_det are determined by the hyperbola λ_1 = c_det/λ_2 or σ_1^2 = c_det/σ_2^2. Obviously, in both cases covariance matrices with a very flat error ellipse e^T C e = c are allowed. E. g., if one requires c_tr = 2, then the pair (0.02, 1.98), with a ratio of semiaxes of √(1.98/0.02) ≈ 10, is evaluated as being similar to the unit circle. The determinant measure is even more unfavourable: when requiring c_det = 1, even a pair (0.02, 50.0) with a ratio of semiaxes of 50 is called similar to the unit circle.
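The deficiency of the trace and determinant measures, and the behaviour of the proposed measure on the same eigenvalue pairs, can be checked numerically; a small sketch using the example values from the text:

```python
import numpy as np

# Pairs of eigenvalues of 2x2 covariance matrices from the text's examples.
unit = np.array([1.0, 1.0])          # unit circle
flat_tr = np.array([0.02, 1.98])     # same trace as the unit circle: 2.0
flat_det = np.array([0.02, 50.0])    # same determinant as the unit circle: 1.0

print(flat_tr.sum(), unit.sum())     # equal traces
print(flat_det.prod(), unit.prod())  # equal determinants

# Semiaxis ratios reveal the anisotropy the classical measures miss:
print(np.sqrt(flat_tr[1] / flat_tr[0]))    # ~ 9.95
print(np.sqrt(flat_det[1] / flat_det[0]))  # 50.0

# The proposed measure (sum of squared log-eigenvalues) punishes the flat cases:
d = lambda lam: np.sqrt(np.sum(np.log(lam) ** 2))
print(d(unit), d(flat_tr), d(flat_det))
```

Both flat configurations are rated "equal" to the unit circle by trace resp. determinant, while the log-based measure separates them clearly.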
However, it would be desirable that the similarity between two covariance matrices reflects the deviation in variance in both directions according to the ratio of the variances. Thus deviations in variance by a factor f should be evaluated equally as deviations by a factor 1/f, with a factor f = 1 of course indicating no difference. Thus other measures capturing the anisotropy, such as (1 − λ_1)^2 + (1 − λ_2)^2, which are not invariant to inversion, cannot be used.
The conditions can be fulfilled by using the sum of the squared logarithms of the eigenvalues.
Thus we propose the distance measure
d(A, B) = √( Σ_{i=1}^n ln^2 λ_i(A, B) )   (4)
between symmetric positive definite matrices A and B, with the eigenvalues λi (A, B) from
|λA − B| = 0. The logarithm guarantees, that deviations are measured as factors, whereas the
squaring guarantees factors f and 1/f being evaluated equally. Summing squares is done in close
resemblance with the Euclidean metric.
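The measure (4) is straightforward to implement; a minimal sketch in Python with SciPy, where the joint eigenvalues are obtained from the generalized eigenvalue problem (the example matrices are illustrative):

```python
import numpy as np
from scipy.linalg import eigh

def cov_dist(A, B):
    """Distance (4): sqrt of the sum of squared logs of the joint eigenvalues.

    The joint eigenvalues lambda_i(A, B) are the roots of |lambda A - B| = 0,
    obtained here from the generalized eigenvalue problem B e = lambda A e."""
    lam = eigh(B, A, eigvals_only=True)
    return np.sqrt(np.sum(np.log(lam) ** 2))

A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([[1.5, -0.2], [-0.2, 0.8]])
print(cov_dist(A, B))
print(cov_dist(A, A))   # zero for identical matrices
```

Since λ_i(B, A) = 1/λ_i(A, B) and the logarithm is squared, the function is symmetric in its arguments, as required.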
This note discusses the properties of d:
• d is invariant with respect to affine transformations of the coordinate system.
• d is invariant with respect to an inversion of the matrices.
• It is claimed that d is a metric. Thus
(i) positivity: d(A, B) ≥ 0, and d(A, B) = 0 only if A = B.
(ii) symmetry: d(A, B) = d(B, A),
(iii) triangle inequality: d(A, B) + d(A, C) ≥ d(B, C).
The proof for n = 2 is given in the next Section. The proof for general n is sketched in the
subsequent Sections 4 – 6.
3 Invariance Properties

3.1 Affine Transformations
Assume the n × n matrix X to be regular. Then the distance d(Ā, B̄) of the transformed matrices

Ā = XAX^T ,  B̄ = XBX^T

is invariant w. r. t. X.

Proof: We immediately obtain

λ(Ā, B̄) = λ(XAX^T, XBX^T) = λ(XAX^T (XBX^T)^{-1}) = λ(XAX^T (X^T)^{-1} B^{-1} X^{-1}) = λ(XAB^{-1}X^{-1}) = λ(AB^{-1}) = λ(A, B) .

Comment: Ā and B̄ can be interpreted as the covariance matrices of y = Xx in case A and B are the covariance matrices of x. Changing the coordinate system does not change the evaluation of covariance matrices. Obviously, this invariance relies only on the properties of the eigenvalues, and actually was the basis for Baarda's evaluation scheme using so-called S-transformations.
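This invariance is easy to confirm numerically; a sketch with random symmetric positive definite matrices and a random regular transformation (the construction of the test matrices is illustrative, not from the text):

```python
import numpy as np
from scipy.linalg import eigh

def cov_dist(A, B):
    lam = eigh(B, A, eigvals_only=True)  # joint eigenvalues, |lambda A - B| = 0
    return np.sqrt(np.sum(np.log(lam) ** 2))

rng = np.random.default_rng(0)

def random_spd(n):
    M = rng.normal(size=(n, n))
    return M @ M.T + n * np.eye(n)      # safely positive definite

A, B = random_spd(3), random_spd(3)
X = rng.normal(size=(3, 3))             # regular with probability 1

d0 = cov_dist(A, B)
d1 = cov_dist(X @ A @ X.T, X @ B @ X.T)
print(abs(d0 - d1))                      # ~ 0 up to round-off
```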
3.2 Inversion
The distance is invariant under inversion of the matrices.
Proof: We obtain

d^2(A^{-1}, B^{-1}) = d^2(A^{-1}B) = Σ_{i=1}^n ln^2 λ_i(A^{-1}B)
                    = Σ_{i=1}^n ln^2 [λ_i^{-1}(AB^{-1})]
                    = Σ_{i=1}^n ( − ln λ_i(AB^{-1}) )^2
                    = Σ_{i=1}^n ln^2 λ_i(AB^{-1}) = d^2(AB^{-1}) = d^2(A, B) .

Comment: A^{-1} and B^{-1} can be interpreted as weight matrices of x if one chooses σ_0^2 = 1. Here essential use is made of the property λ(A) = λ^{-1}(A^{-1}). The proof shows that also individual inversions of eigenvalues do not change the value of the distance measure, as required.
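The inversion invariance can likewise be verified numerically; a short sketch with illustrative matrices:

```python
import numpy as np
from scipy.linalg import eigh

def cov_dist(A, B):
    lam = eigh(B, A, eigvals_only=True)  # joint eigenvalues of A and B
    return np.sqrt(np.sum(np.log(lam) ** 2))

A = np.array([[3.0, 1.0], [1.0, 2.0]])
B = np.array([[1.0, 0.2], [0.2, 0.5]])

d_mat = cov_dist(A, B)
d_inv = cov_dist(np.linalg.inv(A), np.linalg.inv(B))
print(d_mat, d_inv)   # equal up to round-off
```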
3.3 d is a Distance Measure
We show that d is a distance measure, i.e. that the first two criteria for a metric hold in general.

ad 1: d ≥ 0 is trivial from the definition of d, keeping in mind that the eigenvalues are all positive. Proof of d = 0 ⇔ A = B:

⇐ : If A = B then d = 0.
Proof: From λ(AB^{-1}) = λ(I) follows λ_i = 1 for all i, thus d = 0.

⇒ : If d = 0 then A = B.
Proof: From d = 0 follows λ_i(AB^{-1}) = 1 for all i, thus AB^{-1} = I, from which A = B follows.

ad 2: As (AB^{-1})^{-1} = BA^{-1}, symmetry follows from the inversion invariance.
3.4 Triangle Inequality
For d providing a metric on the symmetric positive definite matrices the triangle inequality must
hold.
Assume three n × n matrices with the following structure:

• The first matrix is the unit matrix: A = I.

• The second matrix is a diagonal matrix with entries e^{b_i}, thus B = Diag(e^{b_i}) .

• The third matrix is a general matrix with eigenvalues e^{c_i} and modal matrix R, thus C = R Diag(e^{c_i}) R^T .

This setup can be chosen without loss of generality, as A and B can be orthogonalized simultaneously [4].
The triangle inequality can be written in the following form, which reveals three terms:

s := d(A, B) + d(A, C) − d(B, C) = d_c + d_b − d_a ≥ 0 .   (5)
The idea of the proof is the following:

(i) We first use the fact that d_b and d_c are independent of the rotation R:

s(R) = d_c + d_b − d_a(R) .

(ii) In case R = I the correctness of (5) results from the triangle inequality in R^n. This even holds for any permutation P(i) of the indices i of the eigenvalues λ_i of BC^{-1}. There exists a permutation P_max for which d_a is maximum, thus s(R) is a minimum.

(iii) We then want to show, and this is the crucial part, that any rotation R ≠ I leads to a decrease of d_a(R), thus to an increase of s(R), keeping the triangle inequality valid.
3.4.1 Distances d_c and d_b

The distances d_c and d_b are given by

d_c^2 = Σ_{i=1}^n b_i^2 ,  d_b^2 = Σ_{i=1}^n c_i^2 .

The special definition of the matrices B and C now proves useful. The last expression results from the fact that the eigenvalues of CA^{-1} = C are independent of rotations R.
3.4.2 Triangle Inequality for No Rotation

In the case of no rotation the eigenvalues of BC^{-1} are e^{b_i}/e^{c_i}. Therefore the distance d_a yields

d_a^2 = Σ_{i=1}^n ( ln (e^{b_i}/e^{c_i}) )^2 = Σ_{i=1}^n (b_i − c_i)^2 .

With the vectors b = (b_i) and c = (c_i) the triangle inequality in R^n yields

|c| + |b| − |b − c| ≥ 0

or

s = √(Σ_{i=1}^n c_i^2) + √(Σ_{i=1}^n b_i^2) − √(Σ_{i=1}^n (b_i − c_i)^2) ≥ 0 .
For any permutation P(i) we also get

s = √(Σ_{i=1}^n c_{P(i)}^2) + √(Σ_{i=1}^n b_i^2) − √(Σ_{i=1}^n (b_i − c_{P(i)})^2) ≥ 0 ,   (6)

which guarantees that there is a permutation P_max(i) for which s in (6) is minimum.
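The diagonal case (6) reduces to the Euclidean triangle inequality in R^n, which a short sketch can exercise over all permutations (the exponents b_i, c_i are random illustrative values):

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(1)
n = 3
b = rng.normal(size=n)   # exponents of B = Diag(e^{b_i}); A = I, C = Diag(e^{c_i})
c = rng.normal(size=n)

d_c = np.linalg.norm(b)  # d(A, B)
d_b = np.linalg.norm(c)  # d(A, C)

# s >= 0 for every permutation of the eigenvalue pairing, as in (6):
s_all = [d_c + d_b - np.linalg.norm(b - c[list(p)]) for p in permutations(range(n))]
print(min(s_all) >= -1e-12)   # triangle inequality in R^n holds for each pairing
```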
3.4.3 d is a Metric for 2 × 2 Matrices

We now want to show that the triangle inequality holds for 2 × 2 matrices. Thus we only need to show that s(R(φ)) is monotonic in φ on [0, π/2], or equivalently that d_a(R) is monotonic. We assume (observe the change of notation in the entries b_i and c_i of the matrices)

B = Diag(b_1, b_2) ,  b_1 > 0 , b_2 > 0 .   (7)
With

x = sin φ   (8)

the rotation R(x) = R(φ) is represented as

R(x) = ( √(1 − x^2)   x
         −x   √(1 − x^2) ) ,

thus only values x ∈ [0, 1] need to be investigated.
With the diagonal matrix Diag(c_1, c_2), containing the positive eigenvalues

c_1 > 0 ,  c_2 > 0 ,   (9)

this leads to the general matrix

C = R Diag(c_1, c_2) R^T = ( c_1 x^2 + c_2(1 − x^2)   −x√(1 − x^2)(c_2 − c_1)
                             −x√(1 − x^2)(c_2 − c_1)   c_1(1 − x^2) + c_2 x^2 ) .
The eigenvalues of CB^{-1} are (from Maple)

λ_1 = (u(x) + √v(x)) / (2 b_1 b_2) ≥ 0 ,  λ_2 = (u(x) − √v(x)) / (2 b_1 b_2) ≥ 0

with the discriminant

v(x) = u(x)^2 − w ≥ 0

and

u(x) = (b_1 c_1 + b_2 c_2)(1 − x^2) + (b_1 c_2 + b_2 c_1) x^2 ≥ 0 ,   (10)

w = 4 b_1 b_2 c_1 c_2 ≥ 0 ,   (11)
the last inequality holding due to (7) and (9). The distance

d_a(x) = √( ln^2 λ_1(x) + ln^2 λ_2(x) ) ,

which depends on x, has first derivative

∂d_a(x)/∂x = 2 x (b_2 − b_1)(c_2 − c_1) ( ln [(u(x) − √v(x))/(2 b_1 b_2)] − ln [(u(x) + √v(x))/(2 b_1 b_2)] ) / ( v(x) d_a(x) )

           = 2 x (b_2 − b_1)(c_2 − c_1) ln [ (u(x) − √v(x)) / (u(x) + √v(x)) ] / ( v(x) d_a(x) ) .   (12)
For fixed b_1, b_2, c_1 and c_2 this expression does not change sign for x ∈ [0, 1]. This is because the discriminant v(x) = u^2(x) − w (cf. (10), (11)) is always positive, due to

v(0) = (b_1 c_1 − b_2 c_2)^2 ≥ 0

v(1) = (b_2 c_1 − b_1 c_2)^2 ≥ 0

∂v(x)/∂x = −4 x (b_2 − b_1)(c_2 − c_1) u(x)

with u(x) ≥ 0 (cf. (10)), thus v(x) being monotonic. Furthermore, v(x) is always smaller than u^2 (cf. (11)), thus the logarithmic expression is always negative. As the triangle inequality is fulfilled at the extremes of the interval [0, 1], it is fulfilled for all x, thus for all rotations.

Comment: When substituting x = sin φ (cf. (8)), the first derivative (12) is of the form ∂d_a(φ)/∂φ = sin φ f(φ) with a symmetric function f(φ) = f(−φ). Thus the derivative is skew symmetric w. r. t. (0, 0), indicating d_a to be symmetric, d_a(φ) = d_a(−φ), which is to be expected.
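The claimed monotonicity of d_a(x) on [0, 1] can be probed numerically from the closed-form eigenvalues (10), (11); the sample values for b_i and c_i below are hypothetical:

```python
import numpy as np

# Hypothetical sample values for the diagonal entries (7) and eigenvalues (9).
b1, b2 = 1.0, 3.0
c1, c2 = 0.5, 2.0

def d_a(x):
    """d_a(x) from the closed-form eigenvalues of C(x) B^{-1}, cf. (10)-(12)."""
    u = (b1 * c1 + b2 * c2) * (1 - x**2) + (b1 * c2 + b2 * c1) * x**2
    w = 4 * b1 * b2 * c1 * c2
    v = u**2 - w
    lam1 = (u + np.sqrt(v)) / (2 * b1 * b2)
    lam2 = (u - np.sqrt(v)) / (2 * b1 * b2)
    return np.sqrt(np.log(lam1)**2 + np.log(lam2)**2)

xs = np.linspace(0.0, 1.0, 201)
da = np.array([d_a(x) for x in xs])

# For these values (b2 > b1, c2 > c1) the derivative (12) is <= 0 throughout:
print(np.all(np.diff(da) <= 1e-12))
```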
3.4.4 d is Claimed to be a Metric for n × n Matrices

The proof of the metric properties of d for 2 × 2 matrices suggests that in the general case of n × n matrices any rotation away from the worst permutation of the indices (cf. (6)) results in an increase of the value s. The proof for the case n = 2 can be used to show that, starting with the worst permutation of the indices, any single rotation around one of the axes leads to a monotonic change of s. Therefore, for proving the case of general n, one would have to show that any combination of two rotations away from the worst permutation leads to a monotonic change of s, allowing one to reach any permutation by a rotation while increasing s(R). Completing this line of proof has not been achieved so far.
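In the spirit of the simulations in [2], the triangle inequality can be stress-tested for general n by Monte Carlo sampling; a sketch (the sampling scheme is an illustrative choice, not the one used in [2]):

```python
import numpy as np
from scipy.linalg import eigh

def cov_dist(A, B):
    lam = eigh(B, A, eigvals_only=True)  # joint eigenvalues of A and B
    return np.sqrt(np.sum(np.log(lam) ** 2))

rng = np.random.default_rng(42)

def random_spd(n):
    M = rng.normal(size=(n, n))
    return M @ M.T + 0.5 * np.eye(n)

worst = np.inf
for _ in range(2000):
    n = int(rng.integers(2, 6))
    A, B, C = (random_spd(n) for _ in range(3))
    s = cov_dist(A, B) + cov_dist(A, C) - cov_dist(B, C)
    worst = min(worst, s)

print(worst >= -1e-9)   # no violation found, in line with the simulations in [2]
```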
4 Restating the Problem

In brevity lies the spice.
German proverb
Let

M(n, R) := { A = (a_ij) | 1 ≤ i, j ≤ n , a_ij ∈ R }

be the space of real n × n matrices, and let

S+ := Sym+(n, R) := { A ∈ M(n, R) | A = A^T , A > 0 }

be the subspace of real, symmetric, positive definite matrices. Recall that any symmetric matrix A can be substituted into a function f : R → R, which gives a symmetric matrix f(A) commuting with all matrices commuting with A. In particular, a symmetric matrix A has an exponential exp(A), and a symmetric positive definite matrix A has a logarithm ln(A), and these assignments are inverse to each other. A symmetric positive definite matrix A also has a unique square root √A which is of the same type. Define, for A, B ∈ Sym+(n, R), their distance d(A, B) ≥ 0 by

d^2(A, B) := tr ln^2 ( √(A^{-1}) B √(A^{-1}) ) ,   (13)

where tr denotes the trace. In particular, this shows that

d(A, B) ≥ 0 ,  d(A, B) = 0 ⇔ A = B .
In more down-to-earth terms:

d(A, B) = √( Σ_{i=1}^n ln^2 λ_i(A, B) ) ,   (14)

where λ_1(A, B), . . . , λ_n(A, B) are the joint eigenvalues of A and B, i.e. the roots of the equation det(λA − B) = 0.
This is the proposal of [3], i.e. of equation (4) above. (To see why these two definitions coincide, note that

λA − B = √A ( λE − √(A^{-1}) B √(A^{-1}) ) √A ,

so that the joint eigenvalues λ_i(A, B) are just the eigenvalues of the real symmetric positive definite matrix √(A^{-1}) B √(A^{-1}); in particular, they are positive real numbers and so the definition (14) makes sense.) The equation (14) shows that d is invariant under congruence transformations with X ∈ GL(n, R), where GL(n, R) is the group of regular linear transformations of R^n:

d(A, B) = d(XAX^T, XBX^T)  for all X ∈ GL(n, R)   (15)

(since det(λA − B) and det X(λA − B)X^T have the same roots); this is not easily seen from definition (13). It also shows that

d(A, B) = d(B, A) ,  d(A, B) = d(A^{-1}, B^{-1}) .
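The coincidence of the trace form (13) and the joint-eigenvalue form (14) is easy to confirm numerically with SciPy's matrix functions (random illustrative matrices):

```python
import numpy as np
from scipy.linalg import sqrtm, logm, eigh

rng = np.random.default_rng(7)
M = rng.normal(size=(3, 3))
A = M @ M.T + 3 * np.eye(3)
M = rng.normal(size=(3, 3))
B = M @ M.T + 3 * np.eye(3)

# Definition (13): d^2 = tr ln^2( sqrt(A^-1) B sqrt(A^-1) )
Ainv_sqrt = np.real(sqrtm(np.linalg.inv(A)))
L = np.real(logm(Ainv_sqrt @ B @ Ainv_sqrt))
d13 = np.sqrt(np.trace(L @ L))

# Definition (14): d^2 = sum of squared logs of the joint eigenvalues
lam = eigh(B, A, eigvals_only=True)
d14 = np.sqrt(np.sum(np.log(lam) ** 2))

print(d13, d14)   # coincide up to round-off
```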
5 The results
One then has

Theorem 1. The map d defines a distance on the space Sym+(n, R), i.e. there hold

(i) Positivity: d(A, B) ≥ 0, and d(A, B) = 0 ⇔ A = B

(ii) Symmetry: d(A, B) = d(B, A)

(iii) Triangle inequality: d(A, C) ≤ d(A, B) + d(B, C)
for all A, B, C ∈ Sym+ (n, R). Moreover, d has the following invariances:
(iv) It is invariant under congruence transformations, i.e.
d(A, B) = d(XAXT , XBXT )
for all A, B ∈ Sym+ (n, R), X ∈ GL(n, R)
(v) It is invariant under inversion, i.e.
d(A, B) = d(A−1 , B−1 )
for all A, B ∈ Sym+ (n, R)
The same conclusions hold for the space SSym(n, R) of real symmetric positive definite matrices
of determinant one, when one replaces the general linear group GL(n, R) with the special linear
group SL(n, R), the n × n–matrices of determinant one, and the space of real symmetric matrices
Sym(n, R) with the space Sym0 (n, R) of real symmetric traceless matrices.
Remark 1. We use here the terminology “distance” in contrast to the standard terminology
“metric” in order to avoid confusion with the notion of “Riemannian metric”, which is going to
play a rôle soon.
The case n = 2 is already interesting; see Remark 2 below.
All the properties except property (iii), the triangle inequality, are more or less obvious from the definition (see above), but the triangle inequality is not. In fact, the theorem will be the consequence of a more general theorem as follows.
The most important geometric way distances arise is as associated distances to Riemannian
metrics on manifolds; the Riemannian metric, as an infinitesimal description of length is used
to define the length of paths by integration, and the distance between two points then arises as
the greatest lower bound on the length of paths joining the two points. More precisely, if M
is a differentiable manifold (in what follows, “differentiable” will always mean “infinitely many
times differentiable”, i.e. of class C ∞ ), a Riemannian metric is the assignment to any p ∈ M
of a Euclidean scalar product h− | −ip in the tangent space Tp M depending differentiably on p.
Technically, it is a differentiable positive definite section of the second symmetric power S 2 T ∗ M
of the cotangent bundle, or a positive definite symmetric 2-tensor. In classical terms, it is given
in local coordinates (U, x) as the “square of the line element” or “first fundamental form”
ds2 = gij (x)dxi dxj
(16)
(Einstein summation convention: repeated lower and upper indices are summed over). Here the g_ij are differentiable functions (the metric coefficients) subject to the transformation rule

g_ij(x) = g_kl(y(x)) (∂y^k/∂x^i)(∂y^l/∂x^j) .
A differentiable manifold together with a Riemannian metric is called a Riemannian manifold.
Given a piecewise differentiable path c : [a, b] → M in M, its length L[c] is defined to be

L[c] := ∫_a^b ‖ċ(t)‖_{c(t)} dt ,

where for X ∈ T_p M we have ‖X‖_p := √⟨X | X⟩_p, the Euclidean norm associated to the scalar product in T_p M given by the Riemannian metric. In local coordinates

L[c] = ∫_a^b √( g_ij(c(t)) ċ^i(t) ċ^j(t) ) dt .
Given p, q ∈ M, the distance d(p, q) associated to a given Riemannian metric then is defined to be

d(p, q) := inf_c L[c] ,   (17)

the infimum running over all piecewise differentiable paths c joining p to q. This indeed defines a distance:
Proposition. The distance defined by (17) on a connected Riemannian manifold is a metric in
the sense of metric spaces, i.e. defines a map d : M × M −→ R satisfying
(i) d(p, q) ≥ 0, d(p, q) = 0 ⇐⇒ p = q
(ii) d(p, q) = d(q, p)
(iii) d(p, r) ≤ d(p, q) + d(q, r) .
An indication of proof will be given in the next section.
So the central issue here is the fact that a Riemannian metric is the differential substrate
of a distance and, in turn, defines a distance by integration. This is the most important way
of constructing distances, which is the fundamental discovery of Gauß and Riemann. In our
case, this paradigm is realized in the following way.
The space Sym+(n, R) is a differentiable manifold of dimension n(n + 1)/2; more specifically, it is an open cone in the vector space

Sym(n, R) := { A ∈ M(n, R) | A = A^T }

of all real symmetric n × n matrices. Thus the tangent space T_A Sym+(n, R) to Sym+(n, R) at a point A ∈ Sym+(n, R) is just given as

T_A Sym+(n, R) = Sym(n, R).
Similarly, the tangent space T_A SSym(n, R) to SSym(n, R) at a point A ∈ SSym(n, R) is just given as

T_A SSym(n, R) = Sym_0(n, R) := { X ∈ Sym(n, R) | tr(X) = 0 } ,

the space of traceless symmetric matrices.
Now note that there is a natural action of GL(n, R) on S + , namely, as already referred to
above, by congruence transformations: X ∈ GL(n, R) acts via A 7→ XAXT . If one regards A as
the matrix corresponding to a bilinear form with respect to a given basis, this action represents
a change of basis. This action is transitive, and the isotropy subgroup at E ∈ S + is just the
orthogonal group O(n):

{ X ∈ GL(n, R) | XEX^T = E } = { X ∈ GL(n, R) | XX^T = E } = O(n) ,

and so S+ can be identified with the homogeneous space GL(n, R)/O(n), upon which GL(n, R) acts by left translations (its geometric significance being that its points parametrize the possible scalar products in R^n)†].
In general, a homogeneous space is a differentiable manifold with a transitive action of a Lie
group G, whence it has a representation as a quotient M = G/H with H a closed subgroup of
G. The most natural Riemannian metrics in this case are then those for which the group G acts
by isometries, or, in other words, which are invariant under the action of G; e.g. the classical
geometries – the Euclidean, the elliptic, and the hyperbolic geometry – arise in this manner.
Looking out for these metrics, Theorem 1 will be a consequence of the following theorem :
Theorem 2. (i) The Riemannian metrics g on Sym+(n, R) invariant under congruence transformations with matrices X ∈ GL(n, R) are in one-to-one correspondence with positive definite quadratic forms Q on T_E Sym+(n, R) = Sym(n, R) invariant under conjugation with orthogonal matrices, the correspondence being given by

g_A(X, Y) = B( √(A^{-1}) X √(A^{-1}) , √(A^{-1}) Y √(A^{-1}) ) ,

where A ∈ Sym+(n, R), X, Y ∈ Sym(n, R) = T_A Sym+(n, R), and B is the symmetric positive bilinear form corresponding to Q

(ii) The corresponding distance d_Q is invariant under congruence transformations and inversion, i.e. satisfies

d_Q(A, B) = d_Q(XAX^T, XBX^T)  and  d_Q(A, B) = d_Q(A^{-1}, B^{-1})

for all A, B ∈ Sym+(n, R), X ∈ GL(n, R), and is given by

d_Q^2(A, B) = (1/4) Q( ln ( √(A^{-1}) B √(A^{-1}) ) ) .   (18)

†] A concrete map p : G → S+ achieving this identification is given by p(A) := AA^T; it is surjective and satisfies p(XA) = X p(A) X^T. The fact that p is an identification map is then equivalent to the polar decomposition: any regular matrix can be uniquely written as the product of a positive definite symmetric matrix and an orthogonal matrix. This generalizes the representation z = r e^{iφ} of a nonzero complex number z.
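The polar decomposition invoked in the footnote is available directly in SciPy; a small sketch (the random matrix is illustrative):

```python
import numpy as np
from scipy.linalg import polar

rng = np.random.default_rng(3)
A = rng.normal(size=(3, 3))   # a regular matrix (almost surely)

# polar(A, side='left') returns U orthogonal and P symmetric positive definite
# with A = P @ U, the decomposition generalizing z = r e^{i phi}.
U, P = polar(A, side='left')

print(np.allclose(A, P @ U))
print(np.allclose(U @ U.T, np.eye(3)))
print(np.allclose(P, P.T), np.all(np.linalg.eigvalsh(P) > 0))
```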
(iii) In particular, the distance in Theorem 1 is given by the invariant Riemannian metric corresponding to the canonical non-degenerate bilinear form‡]

B(X, Y) := 4 tr(XY)  for all X, Y ∈ M(n, R)

on M(n, R), restricted to Sym(n, R) = T_E Sym+(n, R), i.e. to the quadratic form

Q(X) := 4 tr X^2  for all X ∈ Sym(n, R)

on Sym(n, R). As a Riemannian metric it is, in classical notation,

ds^2 = 4 tr ( ( √(X^{-1}) dX √(X^{-1}) )^2 ) = 4 tr ( (X^{-1} dX)^2 )   (19)

where X = (X_ij) is the matrix of the natural coordinates on Sym+(n, R) and dX = (dX_ij) is a matrix of differentials.

The same conclusions hold for the space SSym(n, R) of real symmetric positive definite matrices of determinant one, when one replaces the general linear group GL(n, R) with the special linear group SL(n, R), the n × n matrices of determinant one, and the space of real symmetric matrices Sym(n, R) with the space Sym_0(n, R) of real symmetric traceless matrices.
Remark 2. Although the expression (19) appears to be explicit in the coordinates, it seems to be of no use for analyzing the properties of the corresponding Riemannian metric, since the operations of inverting, squaring, and taking the trace give, in the general case, intractable expressions. In particular, it apparently is of no help in deriving the expression (14) for the associated distance by direct elementary means.

There is, however, one interesting case where it can be checked to give a very classical expression; this is the case n = 2. In this case, one has

SSym+(2, R) = { ( x  y
                  y  z ) | x, y, z ∈ R , xz − y^2 = 1 , x > 0 } ,

a hyperboloid in 3-space. This is classically known as a candidate for a model of the hyperbolic plane. In fact, in this case one may show by explicit computation that the metric (19) restricts to the classical hyperbolic metric, and that the corresponding distance just gives one of the classical formulas for the hyperbolic distance. For details, see [11].
Of course, the next question is which invariant metrics there are. This question can also be answered:

Addendum. (i) The positive definite quadratic forms Q on Sym(n, R) invariant under conjugation with orthogonal matrices are of the form

Q(X) = α tr X^2 + β (tr X)^2 ,  α > 0 , β > −α/n

(ii) The positive definite quadratic forms Q on Sym_0(n, R) invariant under conjugation with orthogonal matrices are unique up to a positive scalar and hence of the form

Q(X) = α tr X^2 ,  α > 0 .

In particular, the Riemannian metric (19) corresponds to the case α = 1, β = 0. Since from the point of view of this classification all these metrics stand on an equal footing, it would be interesting to know by which naturality requirements this choice can be singled out.

‡] the famous Cartan–Killing form of Lie group theory
6 The proofs
To put this result into proper perspective and to cut a long story short, let us very briefly summarize why Theorem 2, and consequently Theorem 1, are true. First, however, we indicate a proof of the Proposition above, since it is on this Proposition that our approach to the triangle inequality for the distance defined by (14) is based.
The fact that d(p, q) ≥ 0 and the symmetry of d are immediate from the definitions. There
remains to show d(p, q) = 0 =⇒ p = q and the triangle inequality.
For given p ∈ M, choose a coordinate neighbourhood U ≅ R^n around p such that p corresponds to 0 ∈ R^n. We then have the expression (16) for the given metric in U. Moreover, we have in U the standard Euclidean metric

ds_E^2 := δ_ij dx^i dx^j = Σ_{i=1}^n (dx^i)^2 .
Let k−k denote the norm belonging to the given Riemannian metric in U and |−| the norm
given by the standard Euclidean metric. For r > 0 let
B(p; r) := { x ∈ Rn | |x| ≤ r }
be the standard closed ball with radius r around p = 0, and
S(p; r) := { x ∈ Rn | |x| = r }
its boundary, the sphere of radius r around p ∈ Rn .
As a continuous function U × R^n → R, the norm ‖−‖ takes its minimum m > 0 and its maximum M > 0 on the compact set B(p; 1) × S(p; 1). It follows that we have

m|X| ≤ ‖X‖_q ≤ M|X|  for all q ∈ B(p; 1) , X ∈ R^n

by homogeneity of the norm, and so, by integrating and taking the infimum,

m d_E(p, q) ≤ d(p, q) ≤ M d_E(p, q)  for all q ∈ B(p; 1) ,   (20)
where dE (p, q) = |q − p| is the Euclidean distance. If q 6∈ B(p; 1), then any path c joining p to
q meets the boundary S(p; 1) in some point r, from which follows L [c] ≥ L [c′ ] ≥ d(p, r) ≥ m
– where c′ denotes the part of c joining p to r for the first time, say – whence d(p, q) ≥ m. In
other words, if d(p, q) < m we have q ∈ B(p; 1), where we can apply (20). If now d(p, q) = 0,
then surely d(p, q) < m, and then by (20) mdE (p, q) ≤ d(p, q) = 0, whence dE (p, q) = 0, which
implies p = q.
For the triangle inequality, let c be a path joining p to q and d a path joining q to r. Let c ∗ d
be the composite path joining p to r. Then L [c ∗ d] = L [c] + L [d]. Taking the infimum on the
left hand side over all paths joining p to r gives d(p, r) ≤ L [c] + L [d]. Taking on the right hand
side first the infimum over all paths joining p to q and subsequently over all the paths joining q
to r then gives d(p, r) ≤ d(p, q) + d(q, r), which is the triangle inequality.
Remark 3. In particular, (20) shows that the metric topology induced by the distance d on a
connected Riemannian manifold coincides with the given manifold topology.
Now to the proof of Theorem 2. Recall the terminology of [10], Chapter X: Let G be a Lie
group with Lie algebra g, H ⊆ G a closed Lie subgroup corresponding to the Lie subalgebra
h ⊆ g. Let M be the homogeneous space M = G/H. Then G operates as a symmetry group on
M by left translations. M has the distinguished point o = eH = H corresponding to the coset
of the unit element e ∈ G with tangent space To M = g/h. This homogeneous space is called
reductive if g splits as a direct vector space sum g = h ⊕ m for a linear subspace m ⊆ g such
that m is invariant under the adjoint action Ad : H −→ GL(g). Then canonically To M = m.
In our situation, G = GL(n, R), H = O(n). Then g = M(n, R), the full n × n matrices, and h = Asym(n, R), the antisymmetric matrices. As is well known,

M(n, R) = Asym(n, R) ⊕ Sym(n, R) ,

since any matrix X splits into the sum of its antisymmetric and symmetric parts via

X = (X − X^T)/2 + (X + X^T)/2 .
The adjoint action of O ∈ O(n) on M (n, R) is given by X 7→ OXO−1 = OXOT and clearly
preserves Sym(n, R). So Sym+ (n, R) is a reductive homogeneous space.
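The splitting of M(n, R) and the fact that conjugation by an orthogonal matrix preserves the symmetric part can be checked in a few lines (random illustrative matrices):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(4, 4))

# Splitting M(n,R) = Asym(n,R) + Sym(n,R):
sym = (X + X.T) / 2
asym = (X - X.T) / 2
print(np.allclose(X, sym + asym))

# The adjoint action of O(n), X -> O X O^T, preserves the symmetric part:
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # a random orthogonal matrix
conj = Q @ sym @ Q.T
print(np.allclose(conj, conj.T))
```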
We now have the following facts from the general theory:
a) On a reductive homogeneous space there is a distinguished connection invariant under the
action of G, called the natural torsion free connection in [10]. It is uniquely characterized by the
following properties ([10], Chapter X, Theorem 2.10)
– It is G–invariant
– Its geodesics through o ∈ M are the orbits of o under the one-parameter subgroups of G,
i.e. of the form t 7→ exp(tX) · o for some X ∈ g, where exp : g −→ G is the exponential
mapping of Lie group theory
– It is torsion free
In particular, with this connection M becomes an affine locally symmetric space, i.e. the geodesic
symmetries at a point of M given by inflection in the geodesics locally preserve the connection
(loc. cit Chapter XI, Theorem 1.1). If M is simply connected, M is even an affine symmetric
space, i.e. the geodesic symmetries extend to globally defined transformations of M preserving
the connection (loc. cit., Chapter XI, Theorem 1.2). By homogeneity, these are determined by
the geodesic symmetry s at o. In our case M = Sym^+(n, R) is even contractible, hence
simply connected, and so, with the natural torsion free connection, an affine symmetric space.
We have o = E, the n × n unit matrix. For G = GL(n, R), the exponential mapping of Lie
group theory is given by the "naive matrix exponential"

e^{tX} = Σ_{k=0}^∞ t^k X^k / k! .

So the geodesics are t ↦ exp(tX) E exp(tX)^T = e^{2tX}, where X ∈ Sym(n, R), and s is
given by s(X) = X^{-1}.
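Numerically, one can verify that these curves have the stated form and remain positive definite (a sketch; `expm_sym`, the exponential of a symmetric matrix via eigendecomposition, is our own helper):

```python
import numpy as np

def expm_sym(S):
    """Matrix exponential of a symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.exp(w)) @ V.T

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
X = (A + A.T) / 2                               # a direction X in Sym(n, R)

for t in (0.0, 0.5, 1.0, 2.0):
    G = expm_sym(t * X) @ expm_sym(t * X).T     # exp(tX) E exp(tX)^T ...
    assert np.allclose(G, expm_sym(2 * t * X))  # ... equals e^{2tX}
    assert np.all(np.linalg.eigvalsh(G) > 0)    # ... and lies in Sym+(n, R)
```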
b) The Riemannian metrics g on M invariant under the action of G are in one-to-one
correspondence with positive definite quadratic forms Q on m invariant under the adjoint
action of H (loc. cit., Chapter X, Corollary 3.2), the correspondence being given by

g_o(X, X) = Q(X)   for all X ∈ T_o M = m .
This is intuitively obvious, since we can translate o to any point of M by operating on it with
an element g ∈ G.
c) All G-invariant Riemannian metrics on M (there may be none) have the natural torsion
free connection as their Levi-Civita connection (loc. cit., Chapter XI, Theorem 3.3). In particular,
such a metric makes M into a Riemannian (locally) symmetric space, i.e. the geodesic symmetries
are isometries, and the exponential map of Riemannian Geometry at o, Expo : To M = m −→ M
is given by the exponential map of Lie group theory for G:
Exp_o(tX) = exp(tX) · o   for all X ∈ m .
Collecting these results, we can now come to terms with formula (4). First we see that
Part (i) of Theorem 2 is a standard result in the theory of homogeneous spaces. Furthermore,
S^+, being a Riemannian symmetric space with the metric (19), is complete (loc. cit., Chapter
XI, Theorem 6.4); the exponential mapping Exp_E of Riemannian geometry is related to the
exponential mapping exp : S −→ S^+, S = T_E S^+, from Lie theory and the matrix exponential
e^X via Exp_E(X) = exp(2X) = e^{2X}, and is a diffeomorphism §].
Having reached this point, here is the showdown. Since, by general theory, the Riemannian
exponential mapping is a radial isometry, we get for the square of the distance dQ :
d²_Q(A, B) = d²_Q(E, √(A⁻¹) B √(A⁻¹))              since d_Q is invariant under congruences by (15),
           = Q( (1/2) exp⁻¹( √(A⁻¹) B √(A⁻¹) ) )   since Exp_E is a radial isometry,
           = (1/4) Q( ln( √(A⁻¹) B √(A⁻¹) ) ) ,
and this is just equation (18). In particular, from this one directly reads off that the distance is
invariant under inversion, as claimed. Of course, for the particular case corresponding to (14),
the invariances in question can be read off easily from the classical form (19) of the Riemannian
metric. On the other hand, we see that the invariance under inversion comes from the structural
facts that S^+ is a symmetric space, and that the geodesic symmetry at E, which on general
grounds must be an isometry, is just given by matrix inversion (see a) above).
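These invariances can also be checked numerically. The following sketch takes Q proportional to the trace form tr X², so that (18) reduces, up to an overall scale, to the joint-eigenvalue distance d(A, B) = (Σᵢ ln² λᵢ)^{1/2}, with λᵢ the eigenvalues of A⁻¹B; all helper names are ours:

```python
import numpy as np

def sqrtm_spd(S):
    """Square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.sqrt(w)) @ V.T

def logm_spd(S):
    """Logarithm of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.log(w)) @ V.T

def dist(A, B):
    """d(A, B) = || ln( sqrt(A^-1) B sqrt(A^-1) ) ||_F, cf. (18) with Q ~ trace form."""
    R = np.linalg.inv(sqrtm_spd(A))          # sqrt(A^-1) = (sqrt(A))^-1
    return np.linalg.norm(logm_spd(R @ B @ R), "fro")

rng = np.random.default_rng(2)
M = rng.standard_normal((3, 3)); A = M @ M.T + np.eye(3)
M = rng.standard_normal((3, 3)); B = M @ M.T + np.eye(3)
T = rng.standard_normal((3, 3))              # invertible with probability 1

d = dist(A, B)
assert np.isclose(d, dist(T @ A @ T.T, T @ B @ T.T))            # congruence invariance
assert np.isclose(d, dist(np.linalg.inv(A), np.linalg.inv(B)))  # inversion invariance
lam = np.linalg.eigvals(np.linalg.solve(A, B)).real             # joint eigenvalues of (A, B)
assert np.isclose(d, np.sqrt(np.sum(np.log(lam) ** 2)))         # joint-eigenvalue form
```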
One should add that these arguments are general and pertain to the situation of a symmetric
space of the non–compact type; for this, see [11].
The representation of the orthogonal group O(n) on the symmetric matrices by conjugation
is not irreducible, but decomposes as
Sym(n, R) = Sym_0(n, R) ⊕ d(n, R) ,    (21)
where d(n, R) are the scalar diagonal matrices. It is easy to see that both summands are invariant
under conjugation with orthogonal matrices, and it can be shown that both parts are irreducible
representations of O(n). From this it is standard to derive the Addendum. In the geometric
framework of symmetric spaces, this describes the decomposition of the holonomy representation
and correspondingly the canonical de Rham–decomposition
Sym^+(n, R) ≅ SSym(n, R) × R^+

of the symmetric space Sym^+(n, R) into irreducible factors. This is a direct product of Riemannian
manifolds, i.e. the metric on the product is the product of the metrics on the individual
factors, combined by the Pythagorean prescription (squared distances add). Thus it suffices
to classify the invariant
metrics on the individual factors, which accounts for the Addendum.
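A small numerical illustration of the decomposition (21) (a NumPy sketch; the names are ours): the scalar-diagonal part of a symmetric matrix is its trace component, and both summands are preserved under conjugation by orthogonal matrices.

```python
import numpy as np

def split(X):
    """Split a symmetric X into its traceless part (Sym_0) and scalar part (d(n, R))."""
    n = X.shape[0]
    scal = (np.trace(X) / n) * np.eye(n)   # component in d(n, R)
    return X - scal, scal                  # traceless component in Sym_0(n, R)

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))
X = (A + A.T) / 2                                 # a symmetric matrix
O, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # a random orthogonal matrix

X0, Xs = split(X)
assert np.isclose(np.trace(X0), 0)                # traceless part has trace zero

# conjugation by O preserves each summand:
Y0, Ys = split(O @ X @ O.T)
assert np.allclose(Ys, Xs)                        # scalar part unchanged (trace is invariant)
assert np.isclose(np.trace(Y0), 0)                # conjugated traceless part stays traceless
```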
Thus, it transpires that the theorems above follow from the basics of Lie group theory and
Differential Geometry, and so should be clear to the experts. The main results on which it is
based appeared originally in [12]. All in all, it follows in a quite straightforward
§] The fact that the naive matrix exponential is a diffeomorphism, whence S^+ is complete, can be seen by
elementary means in the case under consideration. The main point is that it coincides with the exponential
mapping coming from Riemannian Geometry (up to scaling by a factor of 2).
manner from the admittedly rather elaborate machinery of modern Differential Geometry and the
theory of symmetric spaces. In conclusion, it might therefore still be interesting to give a more
elementary derivation of the result, as was done above in the case n = 2.
As a general reference for Differential Geometry and the theory of symmetric spaces we recommend
[9], [10] (which, however, make for quite terse reading). A detailed exposition [11] covering
all the necessary prerequisites is under construction; its purpose is to introduce non-experts
to all the basic notions of Differential Geometry and to expand the brief arguments
just sketched.
References
[1] Baarda, W. (1973): S-Transformations and Criterion Matrices. Netherlands Geodetic Commission, Vol. 5 of Series 1, 1973.
[2] Ballein, K. (1985): Untersuchung der Dreiecksungleichung beim Vergleich von Kovarianzmatrizen. Personal communication, 1985.
[3] Förstner, W. (1995): A Metric for Comparing Symmetric Positive Definite Matrices. Note, August 1995.
[4] Fukunaga, K. (1972): Introduction to Statistical Pattern Recognition. Academic Press, 1972.
[5] Grafarend, E. W. (1972): Genauigkeitsmasse geodätischer Netze. DGK A 73, Bayerische Akademie der Wissenschaften, München, 1972.
[6] Grafarend, E. W. (1974): Optimization of geodetic networks. Bollettino di Geodesia e Scienze Affini, 33:351–406, 1974.
[7] Grafarend, E. W. and Niermann, A. (1984): Beste echte Zylinderabbildungen. Kartographische Nachrichten, 3:103–107, 1984.
[8] Kavrajski, V. V. (1958): Ausgewählte Werke, Bd. I: Allgemeine Theorie der kartographischen Abbildungen; Bd. 2: Kegel- und Zylinderabbildungen (in Russian). GVSMP, Moskau, 1958; cited in [7].
[9] Kobayashi, S. and Nomizu, K. (1963): Foundations of Differential Geometry, Vol. I. Interscience Publishers, 1963.
[10] Kobayashi, S. and Nomizu, K. (1969): Foundations of Differential Geometry, Vol. II. Interscience Publishers, 1969.
[11] Moonen, B.: Notes on Differential Geometry. ftp://ftp.uni-bonn.cd/pub/staff/moonen/symmc.ps.gz
[12] Nomizu, K. (1954): Invariant affine connections on homogeneous spaces. Amer. J. Math., 76:33–65, 1954.
[13] Schmitt, G. (1983): Optimization of Control Networks – State of the Art. In: Borre, K.; Welsch, W. M. (Eds.), Proc. Survey Control Networks, pages 373–380, Aalborg University Centre, 1983.