Approved for Public Release; Distribution Unlimited
Case # 11-2903
Efficient Covariance Encoding
James R. Van Zandt
July 28, 2011
Abstract
We describe, optimize, and compare covariance encoding schemes. Several current systems encode
three dimensional covariances in terms of their eigenvalues and Euler angles. We generalize this method
to n dimensions. We precondition covariances to ensure that reconstructed matrices will be positive
definite. We propose a new scalar measure of merit for covariance encodings, the Bhattacharyya distance.
We compare the schemes in terms of the encoding error and the encoding length in bits. We recommend
using enough bits to make the encoding error small compared to the error described by the covariance
itself. The most efficient scheme uses logarithmic encoding of the variances and linear encoding of the
Cholesky factor of the correlation matrix.
1 Introduction
In a networked tracking system for airborne or ballistic targets, it is usual to exchange not only track states
(position, velocity, and possibly acceleration), but also their covariances, to facilitate cross-sensor cueing
(determining what region to search for a target), track association (determining whether tracks are close
enough to correspond to the same object), and track fusion (optimally combining two tracks). For covariance
encoding, the most important requirement is that the recipient must always construct a positive definite
matrix, which represents physically plausible track state uncertainties. Secondarily, the encoding should
efficiently compromise between communications bandwidth used and error introduced in the covariance.
To characterize the error introduced by encoding, we propose using the Bhattacharyya distance [2, 5].
This scalar measure is easily calculated, and accounts for all the possible perturbations. It also permits the
encoding error to be directly compared with the errors (from measurements, navigation, etc.) described by
the covariance itself. We use this to establish a criterion for choosing field sizes: that the error introduced
by the encoding be small compared to the error already present in the track state.
We discuss encoding methods that can be applied to more than three dimensions. Several current systems
encode three dimensional covariances in terms of their eigenvalues and Euler angles. We generalize this
method to n dimensions. If communications bandwidths are severely constrained, it may be appropriate
to encode only partial covariance information (e.g. only horizontal components, or only variances). Such
schemes are not considered here.
The remainder of this paper is organized as follows: Section 2 discusses measures of merit for covariance encoding. The testing method is described in Section 3. Section 4 describes several alternative
encoding schemes and shows their individual performance. The schemes are compared in Section 5, and
recommendations appear in Section 6.
2 Measures of Merit
Several measures of merit have been used in the past to evaluate covariance encodings [10, 11]. Some
measures are most applicable to track association, and others to sensor cueing. I propose a new measure
which accounts for all the possible effects of encoding on the covariance.
2.1 Association-Related Measures of Merit
Consider two tracks x_1 and x_2 with corresponding track covariances P_1 and P_2. The likelihood these reports should be associated (that is, that they plausibly represent the same object) may be estimated using the Mahalanobis distance D_m,

    D_m^2 = (x_1 − x_2)^T (P_1 + P_2)^{-1} (x_1 − x_2),    (1)

where T denotes the transpose, and D_m^2 has a chi-square distribution.
D_m^2 is a function of the sum of the two track covariances. Assume one track is local and known very accurately, so the distance is dominated by the covariance of the remote track (the one with encoding error). Let P be the true covariance of the distant track, and P_2 be the covariance after encoding and decoding. One reasonable figure of merit is the ratio of the two Mahalanobis distances,

    (x^T P_2^{-1} x) / (x^T P^{-1} x),    (2)

averaged over test displacements x on the unit n-sphere [10]. The optimum value of this metric is one. This
measure is more sensitive to changes in a small eigenvalue than a large one (e.g. a 10 percent change in the
smallest eigenvalue makes a much larger difference than a 10 percent change in the largest eigenvalue).
The following MATLAB code estimates this metric using a Monte Carlo method:

function merit = mah(p,p2)
% this measure should be near one, can be either greater or
% smaller than one, and small values are especially bad
% 18 quasi-Monte Carlo trials
mah=zeros(3,6);
for k=1:3
  [r,q]=qr(randn(6,6));
  if det(r)<0, r(:,1)=-r(:,1); end   % r is now a random 6D rotation
  for j=1:6
    x=r(:,j);
    % x is uniformly distributed on the unit sphere in 6 dimensions
    mah(k,j)=x'*(p2\x)/(x'*(p\x));
  end
end
merit=mean(mean(mah));

(A random orthogonal matrix can be generated using the QR algorithm [8].)
The first parameter of mah is the original covariance of the remote track, and the second parameter is the
covariance after encoding and decoding.
The inverse of the error covariance is known as the Fisher information matrix [1]. It represents the
amount of information in a measurement, in the following sense. Suppose we have an estimate of the
current state of a system, and a linear measurement y with covariance P. We could use a Kalman filter to
update the state with the measurement. As a thought experiment, imagine instead of the one measurement,
we had two “sub-measurements”, the first with value y and covariance P1 , and the second with the same
value y but with covariance P_2. Using the same Kalman filter, we could update the state with each sub-measurement in turn. It turns out that the effect on the state would be the same if P^{-1} = P_1^{-1} + P_2^{-1}, i.e. the
Fisher information matrix for the original measurement is the sum of the Fisher information matrices of the
two sub-measurements.
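The following sketch (not from the original report; the state, measurement, and covariance values are illustrative, and the measurement matrix is assumed to be the identity) verifies this information-addition property with a simple Kalman update:

n = 3;
x0 = zeros(n,1);  P0 = 10*eye(n);            % prior state and covariance
y  = [1; 2; 3];                              % measurement value
P1 = diag([2 3 4]);  P2 = diag([5 4 8]);     % sub-measurement covariances
P  = inv(inv(P1) + inv(P2));                 % information matrices add

% one update using the combined covariance P
K  = P0/(P0 + P);   xa = x0 + K*(y - x0);   Pa = (eye(n) - K)*P0;

% two successive updates using P1, then P2
K1 = P0/(P0 + P1);  xb = x0 + K1*(y - x0);  Pb = (eye(n) - K1)*P0;
K2 = Pb/(Pb + P2);  xb = xb + K2*(y - xb);  Pb = (eye(n) - K2)*Pb;

disp(norm(xa - xb)); disp(norm(Pa - Pb));    % both differences should be near zero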
Recall that the trace of a matrix (the sum of the diagonal elements) is invariant with rotations. This
suggests that the trace of the inverse of the covariance could be a useful scalar measure of the information
[11]. If the state elements have different units, it is necessary to scale the covariance appropriately, e.g. with a matrix

    S = diag(1, 1, 1, τ, τ, τ),    (3)

where τ is some suitable time constant (e.g. the expected latency of the track report, or half the update
interval). Suppose a covariance P is encoded then decoded, resulting in a matrix P2 . If an encoding is
guaranteed to be conservative, a suitable measure of the information lost is

    1 − tr{(S P_2 S)^{-1}} / tr{(S P S)^{-1}}.    (4)

If the encoding is not conservative, we may first scale the recovered covariance to make it conservative and then measure the information loss. Let U be the Cholesky factor of P^{-1}, so that P^{-1} = U^T U. Then the scaling constant k is the smallest eigenvalue of U P_2 U^T, and the measure of information lost becomes

    1 − k tr{(S P_2 S)^{-1}} / tr{(S P S)^{-1}}.    (5)
For example:

% fraction of the information lost, so 0 is perfect
S=diag([1 1 1 6 6 6]);
u=chol(inv(p));
k=min(eig(u*p2*u'));
merit=1-k*trace(inv(S*p2*S'))/trace(inv(S*p*S'));
The information metric is largely determined by the best-known components of the track state (i.e., the small
eigenvalues of P).
2.2 Cueing-Related Measures of Merit
Another family of measures is related to sensor cueing. Suppose a track is sent for the purpose of cueing a
second sensor, which will perform a search in two dimensions to acquire the target. If the cue is too large,
the sensor will waste effort on regions unlikely to contain the target (“waste”–see Figure 1). If the cue is too
small, the sensor may fail to look in likely regions (“leakage”). If the measurement is characterized by the
2 × n measurement matrix M, the most efficient search region is the ellipse defined by

    x^T P_proj^{-1} x < χ^2,    (6)

where P_proj is the two dimensional projection of the covariance,

    P_proj = M P M^T,    (7)
and χ is chosen according to the chi-square distribution so the target is included with the desired probability.
For example, to include the target 95 percent of the time, we would choose χ = 2.4477, since P(χ^2|ν) = 1 − Q(χ^2|ν) = 0.95 for ν = 2.

[Figure 1: "Cookie Cutter" Model of Cueing. The cued ellipse and the searched ellipse overlap, leaving a waste area (searched but unlikely to contain the target) and a leakage area (likely but not searched).]

Let the cue ellipse be derived from the original track covariance (for 95 percent
inclusion probability), and the search ellipse be derived from the covariance reconstructed from the track
report. The leakage area ratio is the area in the cue ellipse but outside the search ellipse, divided by the area
of the cue ellipse [10]. E.g.:
function [leak,theta]=leakage(p,p2)
% leakage - calculate leakage area ratio (area in ellipse p but not in p2,
% divided by area in p)
u=chol(inv(p));      % whitening transformation
eval=eig(u*p2*u');
b=sqrt(min(eval));
a=sqrt(max(eval));
if 1<b
  leak=0;
else
  if a<1
    leak=1-a*b;
  else
    theta=atan2(sqrt(a^2-1)*b,a*sqrt(1-b^2));
    leak=-2*(a^2*b*asin(cos(theta)/a)...
        -a*asin(cos(theta))...
        +b*cos(theta)*sqrt(a^2-cos(theta)^2)...
        -a*cos(theta)*sin(theta))/a/pi;
  end
end
The waste area ratio is the area in the search ellipse but outside the cue ellipse, divided by the area of the
search ellipse [10]. Each of these measures is non-negative, with optimal value of zero.
The leakage ratio and waste area ratio take a “cookie cutter” view, such that searching anywhere outside
the cue ellipse is worthless, and failing to search anywhere inside it is inexcusable. However, the actual
value of a given search region depends on its statistical distance from the center of the cue. Suppose the cue ellipse contains the target with probability f_c and the search ellipse contains it with probability f_s. The leakage fraction [10] is then

    f_c − f_s.    (8)
It can be negative (indicating the sensor will acquire the target with more than the expected probability) as
well as positive.
The following function calculates the fraction q_s = 1 − f_s of the distribution outside the search ellipse:
function p=exclude(p1,p2,f,tol)
% exclude - return integral of a bivariate normal distribution with
% covariance p1, outside the ellipse defined by p2, scaled by factor k.
% That is, the integral over x=[x1 x2]' satisfying x'*p2^(-1)*x > 1
% of 1/(2*pi)*exp(-(1/2)*x'*p1^(-1)*x), where log(f) = -k^2/2
%
% Inputs:
%   p1  = 2*2 covariance of the normal distribution function
%   p2  = 2*2 covariance defining the elliptical boundary of the
%         integration region
%   f   = optional probability specifying the scaling factor for the
%         integration boundary.  The boundary ellipse is scaled by a factor
%         k according to the chi squared distribution for two degrees of
%         freedom, such that a fraction f would fall outside the radius
%         k.  (default=.05)
%   tol = optional absolute integration error tolerance (default 1.e-6)
global axis_a axis_b
if nargin<3; f=.05; end
if nargin<4; tol=1.e-6; end
scale = sqrt(-2*log(f));
u = chol(inv(p1));             % whiten the error
eval = eig(u*p2*u');
axis_a = scale*sqrt(eval(1));
axis_b = scale*sqrt(eval(2));
p = 4*quad('integrand',0,pi/2,tol);

function r=integrand(theta)
% integrand - calculate integrand for function exclude
global axis_a axis_b
R = 1./sqrt((cos(theta)/axis_a).^2 + (sin(theta)/axis_b).^2);
r = exp(-R.^2/2)/2/pi;
The area of the search ellipse is

    A_s = π sqrt(det(P_s)) (−2 log(.05)),    (9)

where det(A) is the determinant of A. An optimal search area including a fraction f_s of the distribution would have area

    A_0 = π sqrt(det(P_s)) (−2 log(q_s)).    (10)

The waste fraction is the fraction by which the search area could be reduced, and still contain the target with the same probability as the cue ellipse [10]:

    (A_s − A_0) / A_s.    (11)
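As a sketch (using the exclude function above, and assuming p is the original 2 × 2 cue covariance and p2 the reconstructed 2 × 2 search covariance), the waste fraction of (11) could be evaluated as:

qs    = exclude(p, p2);                   % q_s = 1 - f_s, fraction outside the search ellipse
As    = pi*sqrt(det(p2))*(-2*log(0.05));  % search ellipse area, equation (9)
A0    = pi*sqrt(det(p2))*(-2*log(qs));    % optimal area with the same coverage, equation (10)
waste = (As - A0)/As;                     % waste fraction, equation (11)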
All of these waste and leakage metrics should be averaged over projection directions. If the track uncertainty is much smaller in some directions than others (as is common for single-radar tracks), then these
metrics are sensitive mainly to the large dimensions of the uncertainty (i.e., the large eigenvalues of P). This
is because the cued sensor rarely has the favorable geometry to take full advantage of the well-measured
component of the target position. They are also sensitive to rotations of the uncertainty ellipsoid.
2.3 Bhattacharyya Distance
Each of the above metrics accounts for only a subset of possible perturbations in the covariance (e.g., rotation of the ellipsoid, or changes in small eigenvalues, or changes in large eigenvalues). I propose a new metric that accounts for all these perturbations: the Bhattacharyya distance [2, 5]. (Hint: the "ch" is pronounced as in "charcoal", not as in "chaos".)
The Bhattacharyya distance is a scalar separability measure between two probability distributions. If two multivariate normal distributions have mean m_i and covariance P_i, i = 1, 2, then the Bhattacharyya distance between them is

    b = (1/8) δ^T P^{-1} δ + (1/2) log( det(P) / sqrt(det(P_1) det(P_2)) ),    (12)

where δ = m_1 − m_2 and P = (P_1 + P_2)/2.
This distance summarizes all the possible ways an encoding could perturb a track report. The first term (which is proportional to the Mahalanobis distance) is positive if the means of the two distributions differ. The second term is positive if their covariances differ, i.e., the uncertainty ellipsoids have different axes and/or orientations. It is invariant with rotations and translations of the coordinate system. It is equally sensitive to changes in the large or small eigenvalues.
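A minimal helper function (not part of the report's original code; it would live in its own file, say bhat_distance.m) implementing (12) directly might look like:

function b = bhat_distance(m1,p1,m2,p2)
% bhat_distance - Bhattacharyya distance between two multivariate
% normal distributions, equation (12)
d = m1 - m2;
p = (p1 + p2)/2;
b = d'*(p\d)/8 + log(det(p)/sqrt(det(p1)*det(p2)))/2;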
The encoding error will be greater for some messages than others, due to the condition number of the
covariance and (for some encoding methods) how well the axes of the uncertainty ellipsoid align with the
coordinate axes. To account for this variability among messages, we suggest using as our metric the 90th
percentile of the Bhattacharyya distance over a representative sample of covariances.
To establish a reasonable criterion for this metric, we note that the Bhattacharyya distance depends on the mean of the track report, so it will be affected by measurement errors. (The mean is also affected by round-off errors during encoding; we assume the track state encoding has a high enough resolution that round-off errors are much smaller than measurement errors, so round-off errors are not considered in this analysis.) Let m_1 and m_2 be two n-dimensional estimated states, with the same covariance P, and δ = m_1 − m_2. For example, they might be estimates from different Monte Carlo trials of a tracking simulation. Without loss of generality, we assume P is diagonal. The expected Bhattacharyya distance between the reports, due to measurement error alone, is then
    b = (1/8) [δ_1  δ_2  · · ·  δ_n] diag(1/σ_1^2, 1/σ_2^2, . . . , 1/σ_n^2) [δ_1  δ_2  · · ·  δ_n]^T + 0
      = (1/8) (δ_1^2/σ_1^2 + δ_2^2/σ_2^2 + · · · + δ_n^2/σ_n^2)
      = 2n/8
      = n/4.    (13)
In the third expression, the sum of the normalized errors is 2n rather than n, because both measurements
contribute errors to δ.
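This expectation is easy to check numerically. The following sketch (with an illustrative diagonal covariance) draws pairs of estimates whose difference has covariance 2P and averages the first term of (12):

n = 6;  P = diag([100 50 20 4 2 1].^2);   % illustrative diagonal covariance
L = chol(P,'lower');  N = 1e4;  b = zeros(N,1);
for k = 1:N
  m1 = L*randn(n,1);  m2 = L*randn(n,1);  % two estimates with error covariance P
  d  = m1 - m2;
  b(k) = d'*(P\d)/8;                      % second term of (12) vanishes since P1 = P2
end
fprintf('mean b = %.3f, expected n/4 = %.3f\n', mean(b), n/4);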
We suggest that sensor systems are typically much more expensive than the communication network used to report their measurements, so in trading off the investment in the two systems, the emphasis should be on preserving as much as possible of the measurement information. That is, we want the encoding error to be small compared to the measurement error. (In that case, we feel it is not necessary to ensure the encoding is conservative, i.e. that the decoded covariance completely encloses the original covariance [9].) We also recognize that our metric will not accurately
reflect the impact of the encoding error on each actual application. We suggest a factor of ten margin, giving
us this criterion: The 90th percentile Bhattacharyya distance should be less than n/40.
In 2D, the suggested criterion would be b = 2/40 = 0.05. As an illustration, Figure 2 shows the “one
sigma” uncertainty ellipses for an original covariance, a rotated version, a scaled version, and intermediate
rotated and scaled versions. All the perturbed covariances are at a Bhattacharyya distance of 0.05 from
the original. Similarly, ellipses perturbed by scaling and displacement are shown in Figure 3, and ellipses perturbed by displacement and changed aspect ratio are shown in Figure 4.

[Figure 2: Bhattacharyya Distance = 0.05 due to Rotation and Scaling. One-sigma ellipses for the original covariance, a rotated version, a scaled version, and intermediate rotations and scales, all at b = 0.05 from the original.]

[Figure 3: Bhattacharyya Distance = 0.05 due to Scaling and Displacement.]

[Figure 4: Bhattacharyya Distance = 0.05 due to Displacement and Aspect Ratio.]
3 Testing
For best results, an encoding scheme should be matched to the expected covariances.
The difficulty of encoding a correlation matrix increases according to its condition number (the ratio of the largest to smallest eigenvalue). Single-sensor track covariances tend to have high condition numbers, because a typical radar can measure range much more accurately than angle. Composite track covariances tend to have lower condition numbers, because they are updated by several sensors. (Similarly, in Global Positioning Satellite navigation, the Geometric Dilution of Precision (GDOP) is reduced when more satellites are visible.) Figures 5 and 6 show distributions of condition numbers and of eigenvalues of position and velocity covariance sub-matrices for two sets of covariances.

[Figure 5: Covariance Matrix Characteristics for Single-Sensor Tracks. Cumulative distributions of the position and velocity root eigenvalues and the 6x6 condition number.]

[Figure 6: Covariance Matrix Characteristics for Composite Tracks. Cumulative distributions of the position and velocity root eigenvalues and the 6x6 condition number.]
Each proposed encoding was subjected to Monte Carlo testing with 1000 test covariances. For example, in the 6 × 6 case, each test covariance P was generated as

    P = R_{6×6} S P_0 S^T R_{6×6}^T,    (14)
where

    P_0 = diag(σ_1^2, σ_2^2, σ_3^2, σ_4^2, σ_5^2, σ_6^2),    (15)

the ith semi-axis σ_i is log-uniformly distributed between l_i and u_i as shown in Table 1,

    S = [ 1  0  0  ρ  0  0
          0  1  0  0  ρ  0
          0  0  1  0  0  ρ
          ρ  0  0  1  0  0
          0  ρ  0  0  1  0
          0  0  ρ  0  0  1 ],    (16)

ρ is uniformly distributed in [.5, .9], R_{6×6} is a block diagonal rotation matrix

    R_{6×6} = [ R_{3×3}    0
                   0     R_{3×3} ],    (17)

and R_{3×3} is a uniformly distributed random 3D rotation. Distributions of eigenvalues and condition numbers for the test covariances are shown in Figure 7.

Table 1: Axis Limits for Test Covariances

    Axis i   Lower Limit l_i   Upper Limit u_i   Units
      1            22                500          m
      2             5                106          m
      3             1                 22          m
      4             4               1000          m/s
      5             4                 64          m/s
      6             4                 16          m/s
      7            12                 50          m/s^2
      8             6                 25          m/s^2
      9             3                 12          m/s^2
Each of these test covariances was encoded, decoded, and the Bhattacharyya distance between the original and decoded covariances calculated. The random number generator seed was reset before each run, so the same covariances were used to test each encoding method.

[Figure 7: Characteristics of Test Covariances. Cumulative distributions of the position and velocity root eigenvalues and the 6x6 condition number for the test covariances.]
4 Encoding Methods
Several covariance encoding methods have been suggested:
• Encode the covariance elements directly.
• Encode the variances and the correlation matrix.
• For 3 × 3 covariance, encode the eigenvalues and the Euler angles needed to reconstruct the eigenvectors (i.e. the lengths and directions of the principal axes of the uncertainty ellipsoid).
The first two classes of encodings generalize immediately to more dimensions. We found two ways to
extend the third class to more dimensions.
4.1 Covariance Element Encoding
The simplest scheme directly encodes each element of the covariance matrix. Logarithmic encoding is used, to cover a large dynamic range. The resolution depends on the field size and the dynamic range of the values being encoded. This is shown in Figure 8. We assume a dynamic range of 2000:1. For variances (which are always positive), it then takes 8 bits to give 3 percent resolution, i.e. the ratio between encoded values is

    (P_max/P_min)^{1/2^8} = 2000^{1/256} = 1.030.    (18)

[Figure 8: Resolution of Logarithmic Encoding. Resolution vs. dynamic range for field sizes b_m = 5 through 12 bits.]

If b_m bits are used for each diagonal element, a value V is encoded by an integer

    i = ⌊ c log(V/P_min) / log(P_max/P_min) + 1/2 ⌋,    (19)

where c = 2^{b_m} − 2. The decoded value is

    V_decoded = P_min (P_max/P_min)^{i/c}.    (20)
Off-diagonal elements of the covariance can be of either sign, so we need an extra bit to encode those elements with the same resolution. If we use b_c bits for each off-diagonal element, then the length of the encoding (i.e. the number of bits to encode the upper triangular portion of the symmetric covariance matrix) is

    n b_m + (n(n − 1)/2) b_c.    (21)

Here, we assume b_c = b_m + 1.
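For example, a single variance element could be encoded and decoded as follows (a sketch; Pmin, Pmax, and V are illustrative values defining the assumed 2000:1 dynamic range):

bm   = 8;  c = 2^bm - 2;
Pmin = 1;  Pmax = 2000;                             % assumed dynamic range
V    = 137.4;                                       % value to be encoded
i    = floor(c*log(V/Pmin)/log(Pmax/Pmin) + 1/2);   % equation (19)
i    = max(0, min(c, i));                           % clamp to the valid code range
Vdec = Pmin*(Pmax/Pmin)^(i/c);                      % equation (20), within about 3 percent of V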
With this scheme, a large fraction of reconstructed matrices would not be positive definite, making them invalid covariance matrices (i.e. with imaginary uncertainties). Figure 9 shows distributions of ratios of eigenvalues (i.e. the largest eigenvalue of the reconstructed matrix divided by the largest eigenvalue of the original matrix, etc.) with b_c = 9 and b_m = 8. For our test matrices, 58 percent of the reconstructed matrices have at least one negative eigenvalue, and 15 percent have two!

[Figure 9: Distribution of Eigenvalue Ratios for Covariance Element Encoding. Cumulative distributions of the ratio of decoded to original eigenvalues (smallest, second smallest, and largest) for b_m = 8 and b_c = 9; 58 percent of cases have a negative eigenvalue.]

To ensure the reconstruction of positive definite matrices, we can precompensate by multiplying each off-diagonal element of the covariance by 1 − ε. This kind of precompensation has the effect of inflating the uncertainty ellipsoid, more significantly for the smaller axes than the larger axes. We find empirically that ε = 2^{5−b_c} is sufficient.
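A sketch of the precompensation step (p is a covariance matrix assumed to be in the workspace; the report's exact procedure may differ in detail):

bc  = 9;  eps1 = 2^(5-bc);                % empirical precompensation factor
pc  = p;
off = ~eye(size(p));                      % logical mask of off-diagonal elements
pc(off) = (1 - eps1)*pc(off);             % shrink correlations, slightly inflating the ellipsoid
min(eig(pc))                              % margin that protects against encoding round-off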
The encoding efficiency of this scheme, with and without precompensation, is shown in Figure 10, in terms of the 90th percentile Bhattacharyya distance. Precompensation makes the encoding very conservative: the eigenvalues are never negative, but mostly they are too large.

[Figure 10: Efficiency of Encoding Covariance Matrix Elements. 90th percentile Bhattacharyya distance vs. total bits, with and without precompensation, for b_m = b_c − 1 = 5 through 11.]

The basic problem with this encoding scheme is that accommodating a large dynamic range in covariance element values forces us to use a coarse resolution, which makes it difficult to keep the reconstructed covariances positive definite.
4.2 Root Variance/Correlation Matrix Encoding
We may make the encoding more efficient by first expressing the covariance in terms of root variances (the
square roots of diagonal elements of the covariance matrix) and a correlation matrix, as suggested by Jerardi
[4]. The n root variances have a large dynamic range, and can be encoded logarithmically as above using
a relatively coarse resolution. The positive definiteness of the covariance depends only on the correlation matrix elements. These have a finite range (−1, 1), and can be encoded linearly with a fine resolution. If b_c bits are used for each correlation element, a value C would be encoded by an integer

    i = min{⌊cC⌋, c − 1},    (22)

where c = 2^{b_c − 1}, and the decoded value would be

    C_decoded = (i + 1/2) / c.    (23)
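For a single correlation element, the linear encode/decode of (22) and (23) is simply (a sketch with illustrative values):

bc   = 6;  c = 2^(bc-1);
C    = -0.37;                             % example correlation element in (-1, 1)
i    = min(floor(c*C), c - 1);            % equation (22)
Cdec = (i + 1/2)/c;                       % equation (23), within 1/(2c) of C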
The length of the encoding is again given by (21). We find it is adequate to precompensate using ε = 2^{1−b_c}, which is much less than in Section 4.1. The efficiency is shown in Figure 11. Since there are two parameters, there is actually a family of curves. For the most efficient encodings (the efficient frontier, or Pareto bound), we have approximately b_m = b_c − 3.

[Figure 11: Efficiency of Root Variance/Correlation Matrix Encoding. 90th percentile Bhattacharyya distance vs. total bits for b_c = 8 through 12 and b_m = 5 through 12; the Pareto bound follows b_m = b_c − 3.]
4.3 Root Variance/Correlation Factor Encoding
As mentioned above, the difficulty of encoding a matrix increases with its condition number. Using the Cholesky decomposition, we may find an upper triangular matrix U such that C = U^T U. The condition number of U is the square root of the condition number of C, so U is much easier to encode than C [6].

U has n(n + 1)/2 nonzero elements. We encode all but the first row of U linearly using (22). The first row of U is the same as the first row of C, so the first element is always 1 and need not be encoded. We use n − 1 bits to encode only the signs of the remaining elements. Each column obeys the constraint

    Σ_i U_{ij}^2 = 1,    (24)
and this can be used to find the magnitudes of those elements. The length of the encoding is again given by (21). For precompensation, we can use ε = 2^{−2b_c}, which is much less than in Section 4.2 or 4.1. The encoding efficiency is shown in Figure 12. For the most efficient encodings, we have b_m = 0.6 b_c + 4.4.

[Figure 12: Efficiency of Root Variance/Correlation Factor Encoding. 90th percentile Bhattacharyya distance vs. total bits for b_c = 5 through 8; the Pareto bound follows b_m = 0.6 b_c + 4.4.]
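The decode step that recovers the un-transmitted first row of U from the sign bits and the unit-column constraint (24) can be sketched as follows (u2 is the decoded factor with rows 2..n already filled in, and sgn holds the n − 1 decoded signs as +1 or −1; this mirrors the decoding loop in the Appendix):

n = size(u2,1);
u2(1,1) = 1;                              % first element of C is always 1
for j = 2:n
  m = 1 - sum(u2(2:j,j).^2);              % squared magnitude left over in column j
  u2(1,j) = sgn(j-1)*sqrt(max(0, m));     % equation (24) fixes the magnitude
end
c2 = u2'*u2;                              % reconstructed correlation matrix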
Applying this scheme to the 3 × 3 and 9 × 9 test covariances results in the performance shown in Figures 13 and 14. (The knees in the 3 × 3 figure are near b_m = b_c + 1, but the b_m = 0.6 b_c + 4.4 points are still near the Pareto bound, and we expect 6 × 6 covariances will be sent more often.) In each case, the encoding with b_c = 6 and b_m = 8 meets the suggested requirement.

[Figure 13: Efficiency of Root Variance/Correlation Factor Encoding for 3 × 3 Covariances.]

[Figure 14: Efficiency of Root Variance/Correlation Factor Encoding for 9 × 9 Covariances.]
4.4 Eigenvalue/Rodrigues Parameter Encoding
In 3D, a covariance may be encoded using its three eigenvalues together with three Euler angles, as in a recent patent [9]. This can be generalized to n dimensions as follows. Using the singular value decomposition algorithm, we may factor a covariance P as

    P = U S U^T,    (25)

where S is diagonal (the diagonal elements are the eigenvalues of P) and U is orthogonal. In general, U has n^2 independent elements. However, we may use the Cayley transform [3] to parameterize it as a function of a skew-symmetric matrix Q,

    U = (I − Q)(I + Q)^{-1} = (I + Q)^{-1}(I − Q),    (26)

where

    Q = (I − U)(I + U)^{-1} = (I + U)^{-1}(I − U).    (27)

(This fails if U has an eigenvalue at −1; for example, in two dimensions, a 180 degree rotation.)
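A small sketch of the Cayley transform round trip of (26) and (27), using a random proper rotation for illustration:

[U,R] = qr(randn(6));                     % random orthogonal matrix
if det(U) < 0, U(:,1) = -U(:,1); end      % make it a proper rotation
I  = eye(6);
Q  = (I - U)/(I + U);                     % equation (27); Q is skew-symmetric
U2 = (I - Q)/(I + Q);                     % equation (26) recovers U
norm(U - U2)                              % near machine precision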
The M = n(n − 1)/2 independent elements of Q are the extended Rodrigues parameters [7]. Let q be the vector of extended Rodrigues parameters, e.g. in 3D

    Q = [   0   −q_3   q_2
          q_3     0   −q_1
         −q_2   q_1     0  ],    (28)

or in 6D

    Q = [    0   −q_15   q_14  −q_13   q_12  −q_11
          q_15      0   −q_10    q_9   −q_8    q_7
         −q_14   q_10      0    −q_6    q_5   −q_4
          q_13   −q_9    q_6      0    −q_3    q_2
         −q_12    q_8   −q_5    q_3      0    −q_1
          q_11   −q_7    q_4   −q_2    q_1      0  ].    (29)
The q_i take on all values including infinity. (Since any three-parameter parameterization of three-dimensional rotations has singularities, we should expect an M-parameter description of U to have singularities too.) However, for a random orthogonal matrix U, the elements of Q have a tangent distribution, i.e. the arc tangents of elements of Q are uniformly distributed in (−π/2, π/2]. Therefore, we encode an element of Q using c_b bits as

    m_i = 2^{c_b − 1} C atan(q_i).    (30)

The encoding length is again given by (21). With this scheme, the reconstructed matrix is always positive definite, so no precompensation is necessary. Its efficiency is shown in Figure 15. For the most efficient encodings, we have approximately b_m = (b_c + 1)/3.

[Figure 15: Efficiency of Encoding using Rodrigues Parameters. 90th percentile Bhattacharyya distance vs. total bits for b_c = 7 through 14; the Pareto bound follows b_m = (b_c + 1)/3.]
When Q has large elements, encoding using this scheme does not improve much as we increase the number of bits. The degradation may be reduced by encoding an element using

    m_i = {  2^{c_b − 1} C atan(√(q_i))      if q_i ≥ 0
            −2^{c_b − 1} C atan(√(|q_i|))    if q_i < 0 }.    (31)

However, the scheme in the following section is even better.
4.5 Eigenvalue/Once-Redundant Encoding
A relatively efficient covariance encoding can be designed using the once-redundant encoding for an n-dimensional rotation introduced by Schaub, Tsiotras, and Junkins [7]. We start with SVD factoring and a Cayley transform as in the previous section. Let the Rodrigues parameters be q_i, i = [1 . . . M]. Introduce a scaled set of parameters β_i = β_0 q_i, with the constraint

    β_0^2 + β_1^2 + · · · + β_M^2 = 1,    (32)

so that

    β_0 = 1 / sqrt(1 + q_1^2 + q_2^2 + · · · + q_M^2).    (33)
Like the Euler parameters (quaternions) in 3D, these once-redundant parameters have no singularities. (This advantage is of course lost if, as here, they are calculated in terms of the Rodrigues parameters. It is possible to calculate them directly; see [7, equation 52]. However, as shown below, they do not turn out to be efficient enough to warrant the effort.) They have a finite range (−1, 1), and can be encoded linearly. The efficiency of this method is shown in Figure 16. The most efficient encodings are for b_m = max{.38 b_c, b_c − 7}.

[Figure 16: Efficiency of Encoding using Once-Redundant Parameters. 90th percentile Bhattacharyya distance vs. total bits for b_c = 7 through 14; the Pareto bound follows b_m = max{.38 b_c, b_c − 7}.]
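A minimal sketch of forming the once-redundant parameters from a vector q of extended Rodrigues parameters per (32) and (33), and of the trivial inverse map (the values are illustrative):

q      = [0.3; -1.2; 0.05];               % example Rodrigues parameters (M = 3)
beta0  = 1/sqrt(1 + sum(q.^2));           % equation (33)
beta   = beta0*q;                         % scaled parameters; (32) is satisfied
q_back = beta/beta0;                      % inverse map recovers q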
5 Comparison of Encoding Methods
The Pareto bound efficiency curves for encoding 6D covariances using the encoding schemes described in Section 4 are collected in Figure 17. The most efficient scheme encodes the root variances and all but the first row of the correlation matrix factor.

[Figure 17: Summary of Covariance Encoding Efficiency. Pareto bounds (90th percentile Bhattacharyya distance vs. total bits) for the covariance matrix, correlation matrix, Rodrigues parameter, once-redundant, and correlation factor (omitting the first row) encodings.]
The cumulative distributions of errors for the 6×6 test covariances are shown in Figure 18. The encoding
with bm = 8 and bc = 6 meets the suggested criterion on the 90th percentile Bhattacharyya distance:
b < n/40 = 0.15 for n=6 dimensions.
As mentioned above, even if the covariance were encoded perfectly, the received state would still differ from the truth due to measurement error. We would expect 4b to be chi-square distributed with n degrees of freedom. This contribution, P(4b|6), is plotted in the same figure for reference.

[Figure 18: Error Distributions. Cumulative distributions of the Bhattacharyya distance for several (b_m, b_c) settings, together with the measurement error reference P(4b|6) and the suggested requirement.]
6 Recommendations
We recommend that covariances be encoded using the scheme described in Section 4.3 (root variance/correlation
factor encoding), with one of these parameter settings:
• bc = 5 and bm = 8,
• bc = 6 and bm = 8, or
• bc = 6 and bm = 9.
This recommendation is based on the set of test matrices adopted for testing. More bits may be needed
for the magnitude if the dynamic range of encoded root variances is increased, or for the correlation matrix
factor if covariance matrix condition numbers are found to be higher.
References
[1] J. O. Berger. Statistical Decision Theory and Bayesian Analysis, page 88. Springer-Verlag, New York, NY,
1985.
[2] A. Bhattacharyya. On a measure of divergence between two statistical populations defined by their
probability distributions. Bull. Calcutta Math. Soc., 35:99–109, 1943.
[3] A. Cayley. On the motion of rotation of a solid body. Cambridge Mathematics Journal, 3:224–232,
1843.
[4] Thomas Jerardi. Personal communication, 1992.
[5] T. Kailath. The divergence and Bhattacharyya distance measures in signal selection. IEEE Transactions on Communication Technology, 15(1):52–60, February 1967.
[6] John Nordmann. Personal communication, 1992.
[7] Hanspeter Schaub, Panagiotis Tsiotras, and John L. Junkins. Principal rotation representations of
proper n × n orthogonal matrices. Intl. J. of Engineering Science, 33(15):2277–2295, 1995.
[8] G. W. Stewart. The efficient generation of random orthogonal matrices with an application to condition estimators. SIAM Journal on Numerical Analysis, 17(3):403–409, June 1980.
[9] Robert E. Yang. Eigen-based method for covariance data compression. U. S. Patent 7574057B1,
August 2009.
[10] J. R. Van Zandt. Error analysis for covariance encoding. Technical Report MTR93B0000014, MITRE
Corporation, Bedford, MA, February 1993.
[11] J. R. Van Zandt. Tracklet covariance encoding. MITRE Paper MP97B0000095, MITRE Corporation,
Bedford, MA, December 1997.
A Sample Program
The recommended encoding method (root variance/correlation factor encoding) was analyzed using the
following MATLAB/Octave program:
function precomp4(mbits, cbits, n, oname, plotting)
% precomp4 - test precompensation when encoding all but first row of U
if nargin<3; n=6; end                  % # states (may be 3, 6, or 9)
if nargin<4; oname=[mfilename '.dat']; end
if nargin<5; plotting=0; end
% These values are only for illustration
if nargin<1; mbits = 8; end            % bits for magnitudes
if nargin<2; cbits = 8; end            % bits for covariance elements

pmin = [1 1 1 1 1 1 1 1 1];            % m, m/s, m/s^2
pmax = [2000 2000 2000 1000 1000 1000 100 100 100];
% upper and lower bounds for root variances
du = [ 500 106 22 ...
      1000  64 16 ...
        50  25 12];
dl = [  22   5  1 ...
         4   4  4 ...
        12   6  3];

mbit_total=n*mbits;
% bits for signs, off-diagonal elements, diagonal elements:
cbit_total=n-1 + cbits*(n-1)*(n-2)/2 + (cbits-1)*(n-1);
bit_total=mbit_total+cbit_total;
fprintf([' ---------------- cbits=%d bit/element correlation encoding,'...
         ' omitting first row, %d bits for correlation -----------\n'], ...
        cbits, cbit_total);
fprintf(' ---------------- mbits=%d, %d bits for magnitudes, %d bits total ---------\n', ...
        mbits, mbit_total, bit_total);

mcode = 2^mbits-2;                     % root variance codes are in [0:mcode],
                                       % 2^mbits-1 = "no statement"
ccode = 2^(cbits-1);                   % "no statement" not permitted, covariance
                                       % codes are in [-ccode: ccode-1]
epsilon = 2^(-2*cbits);
eval2_worst=1e9;

if exist('OCTAVE_VERSION')
  rand('state',42);
  randn('state',42);
else
  % rand('twister',5489); % MATLAB 7.7 accepts this syntax, but doesn't
  % actually reset the random number generator
  reset(RandStream.getDefaultStream);  % reset random number generator
end
N=1e3;
ratio=zeros(N,n);
K=0;                                   % # valid test cases so far
for KK=1:N
  d = dl.*(du./dl).^rand(1,9);         % log-uniformly distributed between bounds
  s = diag(d(1:n));
  rho=diag((.5+.4*rand)*ones(1,3));    % position-velocity correlation
                                       % ranges from +0.5 to +0.9 (typically 0.7)
  [r,q]=qr(randn(3,3));
  if det(r)<0, r(:,1)=-r(:,1); end;    % r is now a random 3D rotation matrix
  switch n
    case 3
      cr=eye(3);
    case 6
      cr=[eye(3) rho;rho eye(3)];
      r=blkdiag(r,r);
    case 9
      cr=[eye(3) rho zeros(3,3); rho eye(3) rho; zeros(3,3) rho eye(3)];
      r=blkdiag(r,r,r);
  end
  p=floor(r*s*cr*s*r');

  % encode the covariance
  sig=sqrt(diag(p));                   % root variances
  d=diag(1./sig);
  c=d*p*d;                             % correlation matrix
  if min(eig(c))>0                     % valid correlation matrix?
    K=K+1;
    eval1=sort(eig(p));

    % precompensate
    c=(1-epsilon)*c;
    for k=1:n
      c(k,k)=1;
    end                                % note k=n here
    u=chol(c);                         % Cholesky factor such that u'*u = c

    % linear encoding of Cholesky factor
    msg = zeros(1, n + n-1 + n*(n-1)/2); % n for root variances + n-1 for
                                         % signs + n*(n-1)/2 for U
    % log encoding of root variances
    for k=1:n
      msg(k)=floor( mcode*log(sig(k)/pmin(k))/log(pmax(k)/pmin(k))+1/2 );
      msg(k)=max(0, min(mcode, msg(k)));   % sanitize
    end
    msg(n+1:2*n-1)=u(1,2:n)<0;             % signs of elements in top row

    % linear encoding of U
    k = 2*n;
    for j=2:n
      for i=2:j                            % omit first row
        msg(k)=min(floor(ccode*u(i,j)),ccode-1);
        k=k+1;
      end
    end

    % decode covariance
    u2 = zeros(n,n);
    u2(1,1)=1;
    k = 2*n;
    for j = 2:n
      sum = 1;
      for i = 2:j
        u2(i,j) = (msg(k)+1/2)/ccode;
        sum = sum - u2(i,j)^2;
        k = k+1;
      end
      if msg(n+j-1); sign=-1; else sign=1; end
      u2(1,j) = sign*sqrt(max(0, sum));
    end
    c2=u2'*u2;
    sig = pmin(1:n).*(pmax(1:n)./pmin(1:n)).^((msg(1:n))/mcode);
    s = diag(sig);
    p2 = s*c2*s;

    eval2=sort(eig(p2));
    ratio(K,:)=eval2./eval1;               % assume the eigenvectors correspond
    if eval2(1)<eval2_worst
      eval2_worst=eval2(1);
      p_worst=p;
      c_worst=c;
      u_worst=u;
      msg_worst=msg;
      u2_worst=u2;
      c2_worst=c2;
    end
    pm=(p+p2)/2;
    bhat(K)=.5*log(det(pm)/sqrt(det(p)*det(p2)))/2;   % Bhattacharyya distance
  end
end
fprintf('worst eigenvalue = %13.6g, corresponding to:\n', eval2_worst);
fprintf('p=');
disp(p_worst);
fprintf(' eigenvalues= %10.3g %10.3g %10.3g %10.3g %10.3g %10.3g\n', sort(eig(p_worst)));
fprintf('c=');
disp(c_worst);
fprintf(' eigenvalues= %10.3g %10.3g %10.3g %10.3g %10.3g %10.3g\n', sort(eig(c_worst)));
fprintf('\nu=');
disp(u_worst);
fprintf('\nmsg=');
disp(msg_worst);
fprintf('\nu2=');
disp(u2_worst);
fprintf('\nc2=');
disp(c2_worst);
fprintf(' eigenvalues= %10.3g %10.3g %10.3g %10.3g %10.3g %10.3g\n', sort(eig(c2_worst)));
fprintf('\n');
fprintf('K eig(u3) eig(c3)\n');

y=[.5:K]'/K;
ratio=sort(ratio(1:K,:));
if plotting>1
  figure(4);
  plot(ratio(:,1),y,ratio(:,2),y,ratio(:,3),y);
  axis([.95 1.05 0 1]);
end
bhat=sort(bhat);
if plotting>0, figure(24); semilogx(bhat,y); end

ofile=fopen(oname,'wt');
fprintf('# encoding Cholesky factor of correlation matrix, omitting first row\n');
fprintf('# cbits = %2d, %3d bits for correlation\n', cbits, cbit_total);
fprintf('# mbits = %2d, %3d bits for magnitudes\n', mbits, mbit_total);
fprintf('#              %3d bits total\n', bit_total);
fprintf('median Bhattacharyya distance = %f\n', bhat(round(K/2)));
fprintf('90th percentile Bhattacharyya distance = %f\n', bhat(round(K*.9)));
fprintf('%4d %f ##\n', bit_total, bhat(round(round(K*.9))));   % for score*.dat file
fprintf(ofile, '# %s(%d, %d, %d, ''%s'', %d)\n', mfilename, mbits, cbits, n, oname, plotting);
fprintf(ofile, '# encoding Cholesky factor of correlation matrix, omitting first row\n');
fprintf(ofile, '# cbits = %2d, %3d bits for correlation\n', cbits, cbit_total);
fprintf(ofile, '# mbits = %2d, %3d bits for magnitudes\n', mbits, mbit_total);
fprintf(ofile, '#              %3d bits total\n', bit_total);
fprintf(ofile, '# median Bhattacharyya distance = %f\n', bhat(round(K/2)));
fprintf(ofile, '# 90th percentile Bhattacharyya distance = %f\n', bhat(round(K*.9)));
fprintf(ofile, '# fraction  bhat  ratio(1)  ratio(2)  ratio(3)\n');
for i=1:K
  fprintf(ofile,'%f  %f  %f  %f  %f\n', y(i), bhat(i), ratio(i,1),ratio(i,2),ratio(i,3));
end
fclose(ofile);
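For example, the recommended 6 × 6 setting could be exercised with a call such as the following (an assumed typical invocation; the output file name defaults to precomp4.dat):

precomp4(8, 6, 6);    % bm = 8 magnitude bits, bc = 6 correlation bits, 6x6 covariances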