Preprint: IEEE Trans. Information Theory, vol. 61, no. 8, pp. 4475-4485, 2015
The Cauchy-Schwarz divergence for Poisson point
processes
Hung Gia Hoang, Ba-Ngu Vo, Ba-Tuong Vo, and Ronald Mahler
Abstract—In this paper, we extend the notion of Cauchy-Schwarz divergence to point processes and establish that the Cauchy-Schwarz divergence between the probability densities of two Poisson point processes is half the squared $L_2$-distance between their intensity functions. Extensions of this result to mixtures of Poisson point processes and, in the case where the intensity functions are Gaussian mixtures, closed form expressions for the Cauchy-Schwarz divergence are presented. Our result also implies that the Bhattacharyya distance between the probability distributions of two Poisson point processes is equal to the square of the Hellinger distance between their intensity measures. We illustrate the result via a sensor management application where the system states are modeled as point processes.
Index Terms—Poisson point process, information divergence,
random finite sets
I. INTRODUCTION
The Poisson point process, which models “no interaction”
or “complete spatial randomness” in spatial point patterns, is
arguably one of the best known and most tractable of point
processes [2]–[6]. Point process theory is the study of random
counting measures with applications spanning numerous disciplines, see for example [2], [3], [6]–[8]. The Poisson point
process itself arises in forestry [9], geology [10], biology [11],
particle physics [12], communication networks [13]–[15] and
signal processing [16]–[18]. The role of the Poisson point
process in point process theory, in most respects, is analogous
to that of the normal distribution in random vectors [19].
Similarity measures between random variables are fundamental in information theory and statistical analysis [20]. Information theoretic divergences, for example Kullback-Leibler,
Rényi (or α-divergence) and their generalization Csiszár-Morimoto (or Ali-Silvey), Jensen-Rényi, Cauchy-Schwarz, etc.,
measure the difference between the information content of
the random variables. Similarity between random variables
can also be measured via the statistical distance between
their probability distributions, for example total variation,
Bhattacharyya, Hellinger/Matusita, Wasserstein, etc. Some distances are actually special cases of f -divergences [21]. Note
that statistical distances are not necessarily proper metrics.
For point processes or random finite sets, similarity measures have been studied extensively in various application areas such as sensor management [22]–[27] and neuroscience [28]. However, so far, except for trivial special cases, these similarity measures cannot be computed analytically and require expensive approximations such as Monte Carlo.

Acknowledgement: The work of B.-N. Vo and B.-T. Vo is supported by the Australian Research Council under Future Fellowship FT0991854 and Discovery Early Career Research Award DE120102388, respectively. H. G. Hoang, B.-N. Vo, and B.-T. Vo are with the Department of Electrical and Computer Engineering, Curtin University, Bentley, WA 6102, Australia (email: {hung.hoang,ba-ngu.vo,ba-tuong.vo}@curtin.edu.au). R. Mahler is with Random Sets LLC (email: [email protected]). Part of this paper was presented at the 2014 IEEE Workshop on Statistical Signal Processing, Gold Coast, Australia [1].
In this paper, we present results on similarity measures
for Poisson point processes via the Cauchy-Schwarz divergence and its relationship to the Bhattacharyya and Hellinger
distances. In particular, we show that the Cauchy-Schwarz
divergence between two Poisson point processes is given by
half the squared $L_2$-distance between their intensity functions.
Geometrically, this result relates the angle subtended by the
probability densities of the Poisson point processes to the
L2 -distance between their corresponding intensity functions.
For Gaussian mixture intensity functions, their L2 -distance,
and hence the Cauchy-Schwarz divergence, can be evaluated
analytically. We also extend the result to the Cauchy-Schwarz
divergence for mixtures of Poisson point processes. In addition, using our result on the Cauchy-Schwarz divergence, we
show that the Bhattacharyya distance between the probability
distributions of two Poisson point processes is the square of the
Hellinger distance between their respective intensity measures.
The Poisson point process enjoys a number of nice properties
[2]–[4], and our results are useful additions. We illustrate the
use of our result on the Cauchy-Schwarz divergence in a sensor
management application for multi-target tracking involving the
Probability Hypothesis Density (PHD) filter [16].
The organization of the paper is as follows. Background on
point processes and the Cauchy-Schwarz divergence is provided in Section II. Section III presents the main results of the
paper, which establish analytical formulas for the Cauchy-Schwarz divergence and the Bhattacharyya distance between two
Poisson point processes. In Section IV, the application of the
Cauchy-Schwarz divergence to sensor management, including
numerical examples, is studied. Finally, Section V concludes
the paper.
II. BACKGROUND
In this work we consider a state space $\mathcal{X} \subseteq \mathbb{R}^d$, and adopt the inner product notation $\langle f, g\rangle \triangleq \int f(x)\,g(x)\,dx$; the $L_2$ norm notation $\|f\| \triangleq \sqrt{\langle f, f\rangle}$; the multi-target exponential notation $h^X \triangleq \prod_{x\in X} h(x)$, where $h$ is a real-valued function, with $h^\emptyset = 1$ by convention; and the indicator function notation
$$1_B(x) \triangleq \begin{cases}1, & \text{if } x \in B\\ 0, & \text{otherwise.}\end{cases}$$
The notation N (x; m, Q) is used to explicitly denote the
probability density of a Gaussian random variable with mean
m and covariance Q, evaluated at x.
A. Point processes
This section briefly summarizes concepts in point process
theory needed for the exposition of our result. Point process
theory, in general, is concerned with random counting measures. Our result is restricted to simple-finite point processes,
which can be regarded as random finite sets. For simplicity,
we omit the prefix “simple-finite” in the rest of the paper. For
an introduction to the subject we refer the reader to the article
[7], and for detailed treatments, books such as [2], [3], [5],
[6].
A point process or random finite set X on X is a random
variable taking values in F(X ), the space of finite subsets of
X. Let |X| denote the number of elements in a set X. A point
process X on X is said to be Poisson with a given intensity
function u (defined on X ) if [2], [3]:
1) for any B ⊆ X such that ⟨u, 1B ⟩ < ∞, the random
variable |X ∩ B| is Poisson distributed with mean
⟨u, 1B ⟩,
2) for any disjoint B1 , ..., Bi ⊆ X , the random variables
|X ∩ B1 |, ..., |X ∩ Bi | are independent.
Since ⟨u, 1B ⟩ is the expected number of points of X in
the region B, the intensity value u(x) can be interpreted as
the instantaneous expected number of points per unit hyper-volume at x. Consequently, u(x) is not dimensionless in
general. If hyper-volume (on X ) is measured in units of K
(e.g. $\mathrm{m}^d$, $\mathrm{cm}^d$, $\mathrm{in}^d$, etc.), then the intensity function $u$ has unit $K^{-1}$.
The number of points of a Poisson point process X is
Poisson distributed with mean ⟨u, 1⟩, and conditional on the
number of points, the elements x of X are independently
and identically distributed (i.i.d.) according to the probability
density u(·)/ ⟨u, 1⟩ [2], [3], [5], [6]. It is implicit that ⟨u, 1⟩
is finite since we only consider simple-finite point processes.
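This characterization gives a direct recipe for simulating a Poisson point process; below is a minimal Python sketch (our illustration only, assuming a one-dimensional Gaussian intensity with mass ⟨u, 1⟩ = 10):

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample a Poisson point process on R with intensity u(x) = 10 * N(x; 0, 1):
# the cardinality is Poisson with mean <u,1> = 10, and conditional on the
# cardinality the points are i.i.d. with density u(.)/<u,1> = N(.; 0, 1).
mass = 10.0
n = rng.poisson(mass)
X = rng.normal(loc=0.0, scale=1.0, size=n)   # the realized finite set
```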
The probability distribution of a Poisson point process $X$ with intensity function $u$ is given by [6, p. 15]
$$\Pr(X \in \mathcal{T}) = \sum_{i=0}^{\infty}\frac{e^{-\langle u,1\rangle}}{i!}\int_{\mathcal{X}^i} 1_{\mathcal{T}}(\{x_1,...,x_i\})\, u^{\{x_1,...,x_i\}}\, d(x_1,...,x_i), \tag{1}$$
for any (measurable) subset $\mathcal{T}$ of $\mathcal{F}(\mathcal{X})$, where $\mathcal{X}^i$ denotes the $i$-th fold Cartesian product of $\mathcal{X}$, with the convention $\mathcal{X}^0 = \{\emptyset\}$, and the integral over $\mathcal{X}^0$ is $1_{\mathcal{T}}(\emptyset)$. A Poisson point
process is completely characterized by its intensity function (or
more generally the intensity measure).
Probability densities of point processes considered in this
work are defined with respect to the reference measure µ given
by
$$\mu(\mathcal{T}) = \sum_{i=0}^{\infty}\frac{1}{i!K^i}\int_{\mathcal{X}^i} 1_{\mathcal{T}}(\{x_1,...,x_i\})\, d(x_1,...,x_i) \tag{2}$$
for any (measurable) subset T of F(X ). The measure µ is
analogous to the Lebesgue measure on X (indeed it is the
unnormalized distribution of a Poisson point process with unit
intensity u = 1/K when the state space X is bounded).
Moreover, it was shown in [29] that for this choice of reference
measure, the integral of a function f : F(X ) → R, given by
$$\int f(X)\,\mu(dX) = \sum_{i=0}^{\infty}\frac{1}{i!K^i}\int_{\mathcal{X}^i} f(\{x_1,...,x_i\})\, d(x_1,...,x_i), \tag{3}$$
is equivalent to Mahler’s set integral [30]. Note that the
reference measure µ and the integrand f are both dimensionless.
Our main result involves Poisson point processes with
probability densities of the form
$$f(X) = e^{-\langle u, 1\rangle}\,[Ku]^X. \tag{4}$$
Note that for any (measurable) subset T of F(X )
$$\int_{\mathcal{T}} f(X)\,\mu(dX) = \int 1_{\mathcal{T}}(X)\, f(X)\,\mu(dX) = \sum_{i=0}^{\infty}\frac{e^{-\langle u,1\rangle}}{i!}\int_{\mathcal{X}^i} 1_{\mathcal{T}}(\{x_1,...,x_i\})\, u^{\{x_1,...,x_i\}}\, d(x_1,...,x_i).$$
Thus, comparing with (1), f is indeed a probability density
(with respect to µ) of a Poisson point process with intensity
function u.
B. The Cauchy-Schwarz divergence
The Cauchy-Schwarz divergence is based on the Cauchy-Schwarz inequality for inner products, and is defined for two
random vectors with probability densities f and g by [31]
$$D_{CS}(f, g) = -\ln\frac{\langle f, g\rangle}{\|f\|\,\|g\|}. \tag{5}$$
The argument of the logarithm in (5) is non-negative (since
probability densities are non-negative) and does not exceed
one (by the Cauchy-Schwarz inequality). Moreover, this quantity can be interpreted as the cosine of the angle subtended by
f and g in L2 (X , R), the space of square integrable functions
taking X to R. Note that DCS (f, g) is symmetric and positive
unless f = g, in which case DCS (f, g) = 0.
Geometrically, the Cauchy-Schwarz divergence determines
the information “difference” between random vectors from
the angle between their probability densities. The Cauchy-Schwarz divergence can also be interpreted as an approximation to the Kullback-Leibler divergence [31]. While the
Kullback-Leibler divergence can be evaluated analytically for
Gaussians (random vectors) [32], [33], for the more versatile
class of Gaussian mixtures, only the Jensen-Rényi and Cauchy-Schwarz divergences can be evaluated in closed form [31],
[34]. Hence, the Cauchy-Schwarz divergence between two
densities of random variables has been employed in many
information theoretic applications, especially in machine learning and pattern recognition [31], [35]–[38].
III. THE CAUCHY-SCHWARZ DIVERGENCE FOR POISSON POINT PROCESSES
This section presents the main theoretical results of the
paper. Subsection III-A establishes the Cauchy-Schwarz divergence for general Poisson point processes. Subsection III-B
presents an analytical solution for Poisson point processes with
Gaussian mixture intensities while subsection III-C details the
solution for mixtures of Poisson point processes. Finally, subsection III-D presents a result on the Bhattacharyya distance
between two Poisson processes.
A. Cauchy-Schwarz divergence for Poisson point processes
For point processes, the Csiszár-Morimoto divergence, which includes the Kullback-Leibler and Rényi divergences, was formulated in [23] by replacing the standard (Lebesgue) integral
with the set integral which is defined for a Finite Set Statistics
(FISST) density ϕ as follows [30]
$$\int \phi(X)\,\delta X = \sum_{i=0}^{\infty}\frac{1}{i!}\int_{\mathcal{X}^i}\phi(\{x_1,...,x_i\})\, d(x_1,...,x_i).$$
The FISST density ϕ is not a probability density, but is
closely related to a probability density, see [29] for further
details. Note that ϕ({x1 , ..., xi }) has unit K −i , since the
infinitesimal hyper-volume d(x1 , ..., xi ) has unit K i . Thus,
ϕ(X) has different units for different cardinalities of X.
Unlike the Csiszár-Morimoto divergence, however, the Cauchy-Schwarz divergence cannot be extended to point
processes by simply replacing the standard integral with the set
integral. To see this, consider the naïve inner product between
two FISST densities ϕ and φ via the set integral:
$$\langle \phi, \varphi\rangle = \int \phi(X)\,\varphi(X)\,\delta X = \sum_{i=0}^{\infty}\frac{1}{i!}\int_{\mathcal{X}^i}\phi(\{x_1,...,x_i\})\,\varphi(\{x_1,...,x_i\})\, d(x_1,...,x_i);$$
since the i-th term in the above sum has units of $K^{-i}$, the sum itself is meaningless because the terms cannot be added together due to unit mismatch, e.g. if $K = \mathrm{m}^3$, then the first term is unitless, the second term is in $\mathrm{m}^{-3}$, the third term is in $\mathrm{m}^{-6}$, etc. Indeed, such a naïve inner product has been used incorrectly in [39].
Using the standard notion of density and integration summarized in subsection II-A, we can define the inner product
$$\langle f, g\rangle_\mu = \int f(X)\, g(X)\,\mu(dX),$$
and corresponding norm $\|f\|_\mu \triangleq \sqrt{\langle f, f\rangle_\mu}$ on $L_2(\mathcal{F}(\mathcal{X}), \mathbb{R})$. Such forms for the inner product and norm are well-defined because the densities $f$, $g$ and the reference measure $\mu$ are all unitless.

Interestingly, the inner product between multi-object exponentials is given by the following result.

Lemma 1. Let $f(X) = r^X$ and $g(X) = s^X$ with $r, s \in L_2(\mathcal{X}, \mathbb{R})$. Then $\langle f, g\rangle_\mu = e^{K^{-1}\langle r, s\rangle}$.

Proof:
$$\langle f, g\rangle_\mu = \int [rs]^X \mu(dX) = \sum_{i=0}^{\infty}\frac{1}{i!K^i}\left[\int_{\mathcal{X}} r(x)\,s(x)\,dx\right]^i \;\;\text{(using (3))} = \sum_{i=0}^{\infty}\frac{\langle r, s\rangle^i}{i!K^i} = e^{K^{-1}\langle r, s\rangle}.$$

In the spirit of using the angle between probability densities to determine the information “difference”, the Cauchy-Schwarz divergence can be extended to point processes as follows.

Definition 1. The Cauchy-Schwarz divergence between the probability densities $f$ and $g$ of two point processes with respect to the reference measure $\mu$ is defined by
$$D_{CS}(f, g) = -\ln\frac{\langle f, g\rangle_\mu}{\|f\|_\mu\,\|g\|_\mu}. \tag{6}$$

The above definition of the Cauchy-Schwarz divergence can be equivalently expressed in terms of set integrals as follows. Let $\phi$ and $\varphi$ denote the FISST densities of the respective point processes. Using the relationship between the FISST density and the Radon-Nikodym derivative in [29], the corresponding probability densities relative to $\mu$ are given by $f(X) = K^{|X|}\phi(X)$ and $g(X) = K^{|X|}\varphi(X)$. Since
$$\langle f, g\rangle_\mu = \sum_{i=0}^{\infty}\frac{1}{i!K^i}\int_{\mathcal{X}^i} K^i\phi(\{x_1,...,x_i\})\, K^i\varphi(\{x_1,...,x_i\})\, d(x_1,...,x_i) = \int K^{|X|}\phi(X)\,\varphi(X)\,\delta X,$$
the Cauchy-Schwarz divergence can be written as
$$D_{CS}(\phi, \varphi) = -\ln\frac{\int K^{|X|}\phi(X)\,\varphi(X)\,\delta X}{\sqrt{\int K^{|X|}\phi^2(X)\,\delta X}\,\sqrt{\int K^{|X|}\varphi^2(X)\,\delta X}}.$$

The following proposition asserts that the Cauchy-Schwarz divergence between two Poisson point processes is half the squared distance between their intensity functions.

Proposition 1. The Cauchy-Schwarz divergence between the probability densities $f$ and $g$ of two Poisson point processes with respective intensity functions $u$ and $v \in L_2(\mathcal{X}, \mathbb{R})$ (measured in units of $K^{-1}$), is given by
$$D_{CS}(f, g) = \frac{K}{2}\|u - v\|^2. \tag{7}$$

Proof: Substituting $f(X) = e^{-\langle u,1\rangle}[Ku]^X$ and $g(X) = e^{-\langle v,1\rangle}[Kv]^X$ into (6) and canceling the constants $e^{-\langle u,1\rangle}$, $e^{-\langle v,1\rangle}$, we have
$$D_{CS}(f, g) = -\ln\frac{\big\langle [Ku]^{(\cdot)}, [Kv]^{(\cdot)}\big\rangle_\mu}{\big\langle [Ku]^{(\cdot)}, [Ku]^{(\cdot)}\big\rangle_\mu^{\frac{1}{2}}\,\big\langle [Kv]^{(\cdot)}, [Kv]^{(\cdot)}\big\rangle_\mu^{\frac{1}{2}}}.$$
Applying Lemma 1 to the above equation gives
$$D_{CS}(f, g) = -\ln\Big(e^{K\langle u,v\rangle - \frac{K}{2}\langle u,u\rangle - \frac{K}{2}\langle v,v\rangle}\Big) = -\ln\Big(e^{-\frac{K}{2}(\langle u,u\rangle - 2\langle u,v\rangle + \langle v,v\rangle)}\Big) = \frac{K}{2}\|u - v\|^2.$$

Note that since the intensity functions have units of $K^{-1}$, $\|u - v\|^2$ also has unit $K^{-1}$ and hence $K\|u - v\|^2$ is unitless. Moreover, $K\|u - v\|^2$, referred to as the squared distance between the intensity functions $u$ and $v$, takes on the same value regardless of the choice of measurement unit.
Suppose that the unit of the hyper-volume in the state space $\mathcal{X}$ has been changed from $K$ to $\rho K$ (for example, from $\mathrm{dm}^3$ to $\mathrm{m}^3 = 10^3\,\mathrm{dm}^3$) as illustrated in Fig. 1. The change of unit inevitably leads to a change in the numerical values of the two intensity functions (for example, the intensity measured in $\mathrm{m}^{-3}$, which is the expected number of points per cubic meter, is one thousand times the intensity measured in $\mathrm{dm}^{-3}$). However, these changes cancel each other: the numerical value of $\|u_\rho - v_\rho\|^2$ grows by the factor $\rho$ while that of the reference hyper-volume $K$ shrinks by the same factor, so the squared distance $\frac{K}{2}\|u_\rho - v_\rho\|^2$ remains unchanged.

Fig. 1. Change of unit in the state space.

Proposition 1 has a nice geometric interpretation that relates the angle subtended by the probability densities in $L_2(\mathcal{F}(\mathcal{X}), \mathbb{R})$ to the distance between the corresponding intensity functions in $L_2(\mathcal{X}, \mathbb{R})$, as depicted in Fig. 2. More concisely: the secant of the angle between the probability densities of two Poisson point processes equals the exponential of half the squared distance between their intensity functions.

Fig. 2. Geometric interpretation of Proposition 1: the angle $\theta$ between $f$ and $g$ in $L_2(\mathcal{F}(\mathcal{X}), \mathbb{R})$ satisfies $\ln(\sec\theta) = \frac{K}{2}\|u - v\|^2$.

The above result has important implications in the approximation of Poisson point processes through their intensity functions. It is intuitive that the “difference” between the Poisson distributions vanishes as the distance between their intensity functions tends to zero. However, it was not clear that a reduction in the error between the intensity functions necessarily implies a reduction in the “difference” between the corresponding distributions. Our result not only verifies that the “difference” between the distributions is reduced, it also quantifies the reduction.

B. Gaussian Mixture Intensities

In general, the $L_2$-distance between the intensity functions, and hence the Cauchy-Schwarz divergence, cannot be evaluated in closed form. However, for Poisson point processes with Gaussian mixture intensity functions, applying the following identity for Gaussian probability density functions [40, p. 200]
$$\langle \mathcal{N}(\cdot; \mu_0, \Sigma_0), \mathcal{N}(\cdot; \mu_1, \Sigma_1)\rangle = \mathcal{N}(\mu_0; \mu_1, \Sigma_0 + \Sigma_1)$$
to (7) yields an analytic expression for the Cauchy-Schwarz divergence. This is stated more concisely in the following result.

Corollary 1. The Cauchy-Schwarz divergence between two Poisson point processes with Gaussian mixture intensities
$$u(x) = \sum_{i=1}^{N_u} w_u^{(i)}\,\mathcal{N}\big(x; m_u^{(i)}, P_u^{(i)}\big), \tag{8a}$$
$$v(x) = \sum_{i=1}^{N_v} w_v^{(i)}\,\mathcal{N}\big(x; m_v^{(i)}, P_v^{(i)}\big) \tag{8b}$$
(measured in units of $K^{-1}$) is given by
$$D_{CS}(f, g) = \frac{K}{2}\sum_{i=1}^{N_u}\sum_{j=1}^{N_u} w_u^{(i)} w_u^{(j)}\,\mathcal{N}\big(m_u^{(i)}; m_u^{(j)}, P_u^{(i)} + P_u^{(j)}\big) + \frac{K}{2}\sum_{i=1}^{N_v}\sum_{j=1}^{N_v} w_v^{(i)} w_v^{(j)}\,\mathcal{N}\big(m_v^{(i)}; m_v^{(j)}, P_v^{(i)} + P_v^{(j)}\big) - K\sum_{i=1}^{N_u}\sum_{j=1}^{N_v} w_u^{(i)} w_v^{(j)}\,\mathcal{N}\big(m_u^{(i)}; m_v^{(j)}, P_u^{(i)} + P_v^{(j)}\big). \tag{9}$$
In terms of computational complexity, each term in (9) involves the evaluation of a Gaussian probability density function within a double sum. Hence, if we use standard Gauss-Jordan elimination for matrix inversions, computing $D_{CS}(f, g)$ is quadratic in the number of Gaussian components and cubic in the state dimension (i.e. $O(N_v^2 d^3)$, assuming $N_v \geq N_u$). The complexity can be reduced to $O(N_v^2 d^{2.373})$ if the optimized Coppersmith-Winograd algorithm [41] is employed in place of Gauss-Jordan elimination.
This Corollary has important implications in Gaussian mixture reduction for intensity functions. The result provides
mathematical justification for Gaussian mixture reduction for
intensity functions based on L2 -error. Furthermore, since
Gaussian mixtures can approximate any density to any desired accuracy [42], Corollary 1 enables the Cauchy-Schwarz
divergence between two Poisson point processes to be approximated to any desired accuracy.
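To make (9) concrete, below is a minimal sketch in Python/NumPy of the closed form Cauchy-Schwarz divergence for Gaussian mixture intensities (our illustration, not the authors' MATLAB implementation; the helper name gm_inner and the default K = 1 are our choices):

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

def gm_inner(w1, m1, P1, w2, m2, P2):
    """<u, v> for two Gaussian mixture intensities, using the identity
    <N(.; m_a, P_a), N(.; m_b, P_b)> = N(m_a; m_b, P_a + P_b)."""
    return sum(a * b * mvn.pdf(ma, mean=mb, cov=Pa + Pb)
               for a, ma, Pa in zip(w1, m1, P1)
               for b, mb, Pb in zip(w2, m2, P2))

def dcs_poisson_gm(wu, mu, Pu, wv, mv, Pv, K=1.0):
    """Cauchy-Schwarz divergence (9) between two Poisson point processes
    with Gaussian mixture intensities; the weights are intensity masses,
    so they need not sum to one."""
    return K * (0.5 * gm_inner(wu, mu, Pu, wu, mu, Pu)
                + 0.5 * gm_inner(wv, mv, Pv, wv, mv, Pv)
                - gm_inner(wu, mu, Pu, wv, mv, Pv))

# Example: two 2-D intensities with one and two components, respectively.
I2 = np.eye(2)
print(dcs_poisson_gm([5.0], [np.zeros(2)], [I2],
                     [3.0, 2.0], [np.zeros(2), np.ones(2)], [I2, 2 * I2]))
```

The divergence vanishes when the two mixtures coincide, consistent with Proposition 1.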
C. Mixture of Poisson point processes
Proposition 1 can be easily extended to mixtures of Poisson
point processes, i.e. those whose probability densities can be
written as a weighted sum of Poisson point process densities:
$$f(X) = \sum_{i=1}^{N_f} w_f^{(i)}\, e^{-\langle u_i, 1\rangle}\,[Ku_i]^X, \tag{10a}$$
$$g(X) = \sum_{i=1}^{N_g} w_g^{(i)}\, e^{-\langle v_i, 1\rangle}\,[Kv_i]^X, \tag{10b}$$
where $\sum_{i=1}^{N_f} w_f^{(i)} = \sum_{i=1}^{N_g} w_g^{(i)} = 1$. Such point processes have applications in immunology [43], neural data analysis [44], criminology [45], and machine learning [46].
Substituting (10) into (6) and applying Lemma 1, the
Cauchy-Schwarz divergence between two mixtures of Poisson
point processes is stated as follows.
Corollary 2. The Cauchy-Schwarz divergence between two
mixtures of Poisson point processes given in (10) is
$$D_{CS}(f, g) = -\ln\sum_{i=1}^{N_f}\sum_{j=1}^{N_g} w_f^{(i)} w_g^{(j)}\,\frac{e^{K\langle u_i, v_j\rangle}}{e^{\langle u_i + v_j, 1\rangle}} + \frac{1}{2}\ln\sum_{i=1}^{N_f}\sum_{j=1}^{N_f} w_f^{(i)} w_f^{(j)}\,\frac{e^{K\langle u_i, u_j\rangle}}{e^{\langle u_i + u_j, 1\rangle}} + \frac{1}{2}\ln\sum_{i=1}^{N_g}\sum_{j=1}^{N_g} w_g^{(i)} w_g^{(j)}\,\frac{e^{K\langle v_i, v_j\rangle}}{e^{\langle v_i + v_j, 1\rangle}}. \tag{11}$$
Furthermore, if the intensity function of each Poisson point process component is a Gaussian mixture (in units of $K^{-1}$):
$$u_i(x) = \sum_{\ell=1}^{N_{u_i}} \omega_{u_i}^{(\ell)}\,\mathcal{N}\big(x; m_{u_i}^{(\ell)}, P_{u_i}^{(\ell)}\big), \qquad v_j(x) = \sum_{\ell=1}^{N_{v_j}} \omega_{v_j}^{(\ell)}\,\mathcal{N}\big(x; m_{v_j}^{(\ell)}, P_{v_j}^{(\ell)}\big),$$
then $D_{CS}(f, g)$ can be evaluated analytically by substituting the following expressions into (11):
$$K\langle u_i, u_j\rangle = K\sum_{\ell=1}^{N_{u_i}}\sum_{k=1}^{N_{u_j}} \omega_{u_i}^{(\ell)}\,\omega_{u_j}^{(k)}\,\mathcal{N}\big(m_{u_i}^{(\ell)}; m_{u_j}^{(k)}, P_{u_i}^{(\ell)} + P_{u_j}^{(k)}\big),$$
$$K\langle v_i, v_j\rangle = K\sum_{\ell=1}^{N_{v_i}}\sum_{k=1}^{N_{v_j}} \omega_{v_i}^{(\ell)}\,\omega_{v_j}^{(k)}\,\mathcal{N}\big(m_{v_i}^{(\ell)}; m_{v_j}^{(k)}, P_{v_i}^{(\ell)} + P_{v_j}^{(k)}\big),$$
$$K\langle u_i, v_j\rangle = K\sum_{\ell=1}^{N_{u_i}}\sum_{k=1}^{N_{v_j}} \omega_{u_i}^{(\ell)}\,\omega_{v_j}^{(k)}\,\mathcal{N}\big(m_{u_i}^{(\ell)}; m_{v_j}^{(k)}, P_{u_i}^{(\ell)} + P_{v_j}^{(k)}\big),$$
$$\langle u_i + v_j, 1\rangle = \sum_{\ell=1}^{N_{u_i}} \omega_{u_i}^{(\ell)} + \sum_{k=1}^{N_{v_j}} \omega_{v_j}^{(k)}, \quad \langle u_i + u_j, 1\rangle = \sum_{\ell=1}^{N_{u_i}} \omega_{u_i}^{(\ell)} + \sum_{k=1}^{N_{u_j}} \omega_{u_j}^{(k)}, \quad \langle v_i + v_j, 1\rangle = \sum_{\ell=1}^{N_{v_i}} \omega_{v_i}^{(\ell)} + \sum_{k=1}^{N_{v_j}} \omega_{v_j}^{(k)}.$$
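Under the same K = 1 convention, (11) is straightforward to implement by reusing the gm_inner helper from the Corollary 1 listing. The sketch below is our illustration, with an assumed data layout in which each Poisson mixture component carries the (weights, means, covariances) of its Gaussian mixture intensity:

```python
import numpy as np

def dcs_poisson_mixture(wf, comps_u, wg, comps_v, K=1.0):
    """Cauchy-Schwarz divergence (11) between two mixtures of Poisson point
    processes; comps_u[i] = (weights, means, covs) of the GM intensity u_i."""
    def term(wa, ca, wb, cb):
        # sum_{i,j} w_a^(i) w_b^(j) exp(K<a_i, b_j>) / exp(<a_i + b_j, 1>)
        return sum(wi * wj * np.exp(K * gm_inner(*ci, *cj)
                                    - (sum(ci[0]) + sum(cj[0])))
                   for wi, ci in zip(wa, ca)
                   for wj, cj in zip(wb, cb))
    return (-np.log(term(wf, comps_u, wg, comps_v))
            + 0.5 * np.log(term(wf, comps_u, wf, comps_u))
            + 0.5 * np.log(term(wg, comps_v, wg, comps_v)))
```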
D. Bhattacharyya distance for Poisson point processes

The Cauchy-Schwarz divergence is based on the angle between two probability densities (with respect to a reference measure), and is not necessarily invariant to the choice of reference measure. Closely related to the Cauchy-Schwarz divergence is the Bhattacharyya distance between two probability measures [47].

Definition 2. The Bhattacharyya distance between two probability measures $F$ and $G$ is defined by
$$D_B(F, G) = -\ln\left\langle \sqrt{\frac{dF}{d\mu}}, \sqrt{\frac{dG}{d\mu}}\right\rangle_{\!\mu}, \tag{12}$$
where $\mu$ is any measure dominating $F$ and $G$. The inner product in the above definition, denoted by $C_B(F, G)$, is called the Bhattacharyya coefficient and is invariant to the choice of reference measure $\mu$ [47].

Unlike the Cauchy-Schwarz divergence, the Bhattacharyya distance avoids the requirement of square integrable probability densities since square roots of probability densities are always square integrable. Note also that the Bhattacharyya distance can be expressed as the Cauchy-Schwarz divergence between the square roots of the probability densities, i.e. for any $\mu$ that dominates $F$ and $G$,
$$D_B(F, G) = D_{CS}\left(\sqrt{\frac{dF}{d\mu}}, \sqrt{\frac{dG}{d\mu}}\right). \tag{13}$$
Hence, Proposition 1 can be applied to relate the Bhattacharyya distance between the probability distributions of Poisson point processes to their intensity functions.

Corollary 3. The Bhattacharyya distance between the probability distributions $F$ and $G$ of two Poisson point processes with respective intensity measures $U$ and $V$ (assumed to have densities with respect to the Lebesgue measure), is given by
$$D_B(F, G) = D_H^2(U, V), \tag{14}$$
where
$$D_H(U, V) = \frac{1}{\sqrt{2}}\left\|\sqrt{\frac{dU}{d\lambda}} - \sqrt{\frac{dV}{d\lambda}}\right\|$$
is the Hellinger distance between the measures $U$ and $V$ (which is invariant to the choice of reference measure).

Proof: Let $u$ and $v$ be densities (measured in units of $K^{-1}$) of $U$ and $V$ relative to the Lebesgue measure $\lambda$. Then the densities of $F$ and $G$ relative to $\mu$ are given by $f(X) = e^{-\langle u,1\rangle}[Ku]^X$ and $g(X) = e^{-\langle v,1\rangle}[Kv]^X$. From Proposition 1, the Cauchy-Schwarz divergence between $\sqrt{f(X)} \propto \big[K\sqrt{u/K}\big]^X$ and $\sqrt{g(X)} \propto \big[K\sqrt{v/K}\big]^X$ is given by
$$D_{CS}\big(\sqrt{f}, \sqrt{g}\big) = \frac{K}{2}\left\|\sqrt{\frac{u}{K}} - \sqrt{\frac{v}{K}}\right\|^2 = \frac{1}{2}\big\|\sqrt{u} - \sqrt{v}\big\|^2 = D_H^2(U, V).$$
The above corollary asserts that the Bhattacharyya distance between two Poisson point processes is the squared Hellinger distance between their intensity measures. Moreover, the square of the Hellinger distance can be expanded as
$$D_H^2(U, V) = \frac{1}{2}\left(\left\|\sqrt{\frac{dU}{d\lambda}}\right\|^2 + \left\|\sqrt{\frac{dV}{d\lambda}}\right\|^2 - 2\left\langle\sqrt{\frac{dU}{d\lambda}}, \sqrt{\frac{dV}{d\lambda}}\right\rangle\right) = \frac{U(\mathcal{X}) + V(\mathcal{X})}{2} - C_B(U, V).$$
The intensity masses $U(\mathcal{X})$ and $V(\mathcal{X})$ are the expected numbers of points of the respective Poisson point processes. Thus, Corollary 3 provides another interesting interpretation: the Bhattacharyya distance between two Poisson point processes is the difference between the average expected number of points per process and the Bhattacharyya coefficient of their intensity measures.
In general, the Hellinger distance cannot be evaluated in closed form. However, for Poisson point processes with Gaussian intensity functions, using the Bhattacharyya coefficient for Gaussians [48]
$$C_B\big(\mathcal{N}(\cdot;\mu_0,\Sigma_0), \mathcal{N}(\cdot;\mu_1,\Sigma_1)\big) = \sqrt{(2\pi)^d\sqrt{|\Sigma_0|\,|\Sigma_1|}}\;\mathcal{N}\!\left(\frac{\mu_0}{2}; \frac{\mu_1}{2}, \frac{\Sigma_0+\Sigma_1}{2}\right)$$
yields an analytic expression for the Hellinger distance between Gaussian intensity functions, stated as follows.
Corollary 4. The Bhattacharyya distance between two Poisson point processes with Gaussian intensities
$$u(x) = w_u\,\mathcal{N}(x; m_u, P_u), \tag{15a}$$
$$v(x) = w_v\,\mathcal{N}(x; m_v, P_v) \tag{15b}$$
(measured in units of $K^{-1}$) is given by
$$D_B(F, G) = \frac{w_u + w_v}{2} - \sqrt{(2\pi)^d\, w_u w_v \sqrt{|P_u|\,|P_v|}}\;\mathcal{N}\!\left(\frac{m_u}{2}; \frac{m_v}{2}, \frac{P_u + P_v}{2}\right). \tag{16}$$
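A minimal numerical sketch of (16) follows (our illustration; the helper name and the example inputs are our assumptions):

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

def db_poisson_gauss(wu, mu, Pu, wv, mv, Pv):
    """Bhattacharyya distance (16) between two Poisson point processes
    with Gaussian intensities u = wu * N(.; mu, Pu), v = wv * N(.; mv, Pv)."""
    mu, mv, Pu, Pv = map(np.asarray, (mu, mv, Pu, Pv))
    d = mu.size
    cb = np.sqrt((2 * np.pi) ** d * wu * wv
                 * np.sqrt(np.linalg.det(Pu) * np.linalg.det(Pv)))
    cb *= mvn.pdf(mu / 2, mean=mv / 2, cov=(Pu + Pv) / 2)
    return 0.5 * (wu + wv) - cb

# Identical intensities give D_B = 0; here D_B = (wu + wv)/2 - C_B > 0.
print(db_poisson_gauss(4.0, [0, 0], np.eye(2), 6.0, [1, 1], 2 * np.eye(2)))
```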
Remark. For point processes, the Bhattacharyya distance can be defined by replacing the standard (Lebesgue) integral with the set integral. Again let $\phi$ and $\varphi$ denote the FISST densities of the respective point processes. Then it follows from [29] that the corresponding probability densities relative to $\mu$ are given by $f(X) = K^{|X|}\phi(X)$ and $g(X) = K^{|X|}\varphi(X)$. Hence,
$$\left\langle\sqrt{f}, \sqrt{g}\right\rangle_\mu = \sum_{i=0}^{\infty}\frac{1}{i!K^i}\int_{\mathcal{X}^i}\sqrt{K^i\phi(\{x_1,...,x_i\})}\,\sqrt{K^i\varphi(\{x_1,...,x_i\})}\; d(x_1,...,x_i) = \int\sqrt{\phi(X)}\,\sqrt{\varphi(X)}\,\delta X,$$
and the Bhattacharyya distance can be written in terms of FISST densities and the set integral as
$$D_B(\phi, \varphi) = -\ln\int\sqrt{\phi(X)}\,\sqrt{\varphi(X)}\,\delta X.$$
IV. APPLICATION TO MULTI-TARGET SENSOR MANAGEMENT
In this section, we present an application of our result
to a sensor management (a.k.a. sensor control) problem for
multi-target systems, where system states are modeled as point
processes or random finite sets (RFS) [16], [29], [30], [49]. A
multi-target system is fundamentally different from a single-target system in that the number of states changes with time
due to births and deaths of targets.
For the purpose of illustrating the result in the previous
section, we assume a linear Gaussian multi-target model [50],
where the hidden multi-target state at time k is a finite set
Xk , which is partially observed as another finite set Zk . All
aspects of the system dynamics as well as sensor detection
and false alarms are described in detail in Appendix A.
Multi-target sensor management is a stochastic control
problem which involves the following steps
1) Propagating the multi-target posterior density, or alternatively a tractable approximation, recursively in time;
2) At each time, determining the action of the sensor by
optimizing an objective function over a set of admissible
actions.
In step 1, propagating the full posterior is generally intractable. However, for linear Gaussian multi-target systems,
the first moment of the posterior (a.k.a. the intensity function) can be propagated efficiently via the Gaussian Mixture
Probability Hypothesis Density (GM-PHD) filter [50] as documented in Appendix B. The sensor action in step 2 is executed
by applying a control command/signal to the sensor, usually in
order to either minimize a cost or maximize a reward. In the
rest of this section, we demonstrate that the Cauchy-Schwarz
divergence is a useful reward function for multi-target sensor
management.
A. Cauchy-Schwarz divergence based reward
Denote by R(ak−1 , Zk:k+p ) the value of a reward function
if the control command ak−1 were applied to the sensor at time
k − 1 and subsequently the measurement sequence Zk:k+p =
[Zk , Zk+1 , ..., Zk+p ] is observed for p + 1 time steps in the
future. For illustration purposes, we focus only on the single-step look-ahead (i.e. $p = 0$) policy. Naturally, given the reward function $R(a_{k-1}, Z_{k:k+p})$, the optimal control command $a^*_{k-1}$ is chosen to maximize the expected reward $\mathbb{E}\big[R(a_{k-1}, Z_k)\big]$, where the expectation is taken over all possible values of the future measurement $Z_k$. A computationally cheaper approach
is to maximize the ideal predicted reward R(ak−1 , Zk∗ ) [26],
[51], [52], where Zk∗ is the ideal predicted measurement from
the predicted intensity vk|k−1 , that is, assuming no false alarms
(zero clutter) and perfect target measurements (unity detection
probability and negligible measurement noise). Other choices
of objective functions are discussed in [26], [51]–[53].
A common class of reward functions for sensor control is
that of information theoretic divergences between the predicted
and posterior probability densities. For example, in [26],
[52], [53] the Rényi divergence is employed to quantify the
information gain from the future measurements for a chosen
control action. The main drawback of the Rényi divergence
based approach is that it involves computation of integrals in
infinite dimensional spaces which is generally intractable.
As an alternative to the Rényi divergence, we propose the
use of the Cauchy-Schwarz divergence for multi-target sensor
control. According to Proposition 1, computing the Cauchy-Schwarz reward function for Poisson multi-target densities
reduces to calculating the squared L2 -distance between the
predicted and posterior intensities:
$$R(a_{k-1}, Z_k) = \frac{K}{2}\left\|v_{k|k-1}(\cdot) - v_k(\cdot\,; a_{k-1}, Z_k)\right\|^2. \tag{17}$$
This strategy effectively replaces the evaluation of the Rényi divergence, via integrals on the infinite dimensional space $\mathcal{F}(\mathcal{X})$, with the Cauchy-Schwarz divergence, which can be computed via standard integrals on the finite dimensional space $\mathcal{X}$. Moreover, when the GM-PHD filter is used for the propagation of the Gaussian mixture posterior intensity, the reward function $R(a_{k-1}, Z_k)$ can be evaluated in closed form using Corollary 1.

In this section, our control policy is to select the control command $a_{k-1}$ so as to maximize the ideal reward $R(a_{k-1}, Z_k^*)$.

B. Numerical example

This example is based on a scenario adapted from [26] in which a mobile robot is tracking a varying number of moving targets. The surveillance area is a square of dimensions $1000\,\mathrm{m} \times 1000\,\mathrm{m}$. Each target at time $k-1$ is characterized by a single-target state of the form $x_{k-1} = [p_{k-1}^T, \dot{p}_{k-1}^T]^T$, where $p_{k-1}$ is the 2D position vector and $\dot{p}_{k-1}$ is the 2D velocity vector. If the control command $a_{k-1}$ is applied at time $k-1$, the sensor will move from its current position $s_{k-1}$ to a new position $s_k(a_{k-1})$, where a target with state $x_k$ can be detected with probability
$$p_{D,k}(x_k; a_{k-1}) = \frac{\mathcal{N}(s_k(a_{k-1}); Hx_k, S)}{\mathcal{N}(0; 0, S)}, \tag{18}$$
where
$$H = \begin{bmatrix}1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\end{bmatrix}, \qquad S = 10^6\begin{bmatrix}3 & -2.4\\ -2.4 & 3.6\end{bmatrix}.$$
The detection profile is illustrated in Fig. 3.

Fig. 3. Initial positions of the sensor (♢) and targets. The contours depict the sensor's detection profile presented in (18), in which the detection probability decreases with distance from the sensor.

The single-target transition density is $f(x_k|x_{k-1}) = \mathcal{N}(x_k; Fx_{k-1}, Q)$, where
$$F = \begin{bmatrix}1 & 0 & T & 0\\ 0 & 1 & 0 & T\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1\end{bmatrix}, \qquad Q = \begin{bmatrix}\frac{T^3}{3} & 0 & \frac{T^2}{2} & 0\\ 0 & \frac{T^3}{3} & 0 & \frac{T^2}{2}\\ \frac{T^2}{2} & 0 & T & 0\\ 0 & \frac{T^2}{2} & 0 & T\end{bmatrix},$$
with $T = 1\,$s. Measurements are noisy position returns according to the single-target likelihood
$$g(z_k|x_k) = \mathcal{N}(z_k; Hx_k, R_k),$$
where $R_k = \sigma_{\epsilon,k}^2 I_2$ with $\sigma_{\epsilon,k} = 3\,$m.

Clutter is modeled by a Poisson RFS with intensity $\kappa(z) = \lambda c(z)$, where $\lambda = 2\times 10^{-5}\,\mathrm{m}^{-2}$ and $c(z) = \mathcal{U}([0, 1000\,\mathrm{m}]\times[0, 1000\,\mathrm{m}])$ is the uniform density over the surveillance area.
At time $k-1$, the set $A_{k-1}$ contains all admissible control commands that drive the sensor from the current position $s_{k-1} = \big[s_{k-1}^{(x)}, s_{k-1}^{(y)}\big]^T$ to one of the following locations:
$$\mathbb{S}_k = \left\{\Big[s_{k-1}^{(x)} + j\Delta_R\cos(\ell\Delta_\theta),\; s_{k-1}^{(y)} + j\Delta_R\sin(\ell\Delta_\theta)\Big]^T\right\}_{(j,\ell)=(0,0)}^{(N_R, N_\theta)},$$
where $\Delta_\theta = \frac{2\pi}{N_\theta}$ rad and $\Delta_R = 50\,$m are the angular and radial step sizes, respectively. The numbers of radial and angular steps are $N_R = 2$ and $N_\theta = 8$. The set $\mathbb{S}_k$ thus has 17 options in total, which discretize the angular and radial region around the current sensor position. The sensor is always kept inside the surveillance area by setting the value of the objective function corresponding to positions outside the surveillance area to $-\infty$.
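A sketch of the resulting greedy controller follows (our illustration; the GM-PHD prediction and ideal-measurement update machinery of Appendix B is abstracted into a precomputed mapping from each admissible command to its pair of predicted and updated intensities, and gm_inner is the helper from the Corollary 1 listing):

```python
def cs_reward(v_pred, v_post, K=1.0):
    """Ideal Cauchy-Schwarz reward (17): R = (K/2) ||v_pred - v_post||^2,
    with both intensities given as Gaussian mixtures (w, m, P) and the
    squared L2-distance expanded via gm_inner in closed form."""
    return 0.5 * K * (gm_inner(*v_pred, *v_pred)
                      - 2 * gm_inner(*v_pred, *v_post)
                      + gm_inner(*v_post, *v_post))

def select_action(candidates, K=1.0):
    """Greedy single-step control: `candidates` maps each admissible
    command a (already restricted to the surveillance area) to the pair
    (v_pred, v_post(a)) produced by the GM-PHD prediction and the
    ideal-measurement update for that command."""
    return max(candidates, key=lambda a: cs_reward(*candidates[a], K))
```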
With these settings, it is expected that our control policy
should, intuitively speaking, move the sensor towards the
targets, and remain in their vicinity in order to obtain a high
detection probability. Fig. 4 depicts a typical sensor trajectory
which appears to be consistent with this intuitive expectation.
We proceed to illustrate the performance of the proposed strategy. First, we compare the performance of the Cauchy-Schwarz divergence based control strategy to that of an existing Rényi divergence based control strategy proposed in [26]. Since the Rényi divergence in general has no closed form solution and thus must be approximated by Sequential Monte Carlo (SMC), we also implement the Cauchy-Schwarz divergence using SMC approximation in order to enable a fair comparison. Second, the performance of the proposed GM implementation is then benchmarked against that of the SMC-based approach. When the objective function is approximated
Fig. 4. A typical sensor trajectory. Target start and stop positions are marked
by and ∇, respectively. The red target died at k = 19 whereas the green
target was born at k = 27. The sensor initially moved towards the targets
and remained in their vicinity, then moved again to the middle of the existing
targets and the new born target for optimal detection of all targets.
by SMC, the corresponding SMC-PHD filter [29] is used
for recursive propagation of the posterior intensity function.
All algorithms were implemented in MATLAB R2010b on a
laptop with an Intel Core i5-3360 CPU and 8GB of RAM. The
average run time for the Rényi divergence based strategy is
10.62 seconds (SMC-PHD filter implementation) while those
for the Cauchy-Schwarz based strategies are 10.68 seconds
(SMC-PHD filter implementation) and 3.21 seconds (GM-PHD filter implementation). It is evident that the closed form
Cauchy-Schwarz divergence based strategy is the fastest.
Fig. 5 shows the Optimal SubPattern Assignment (OSPA)
metric or miss distance [54] (with parameters p = 2,
c = 100m) averaged over 200 Monte Carlo runs for each
of the considered control strategies. The OSPA curves in
Figure 5 suggest that the closed form GM-PHD filter based
strategy outperforms its approximate SMC-PHD filter based
counterparts, while the performance of the two approximate SMC-PHD filter based strategies is virtually identical.

Fig. 5. Comparison of the averaged OSPA distance generated by different control strategies. While the SMC-PHD implementations for the Rényi divergence (dashed line) and the Cauchy-Schwarz divergence (starred line) yield similar results, they are outperformed by the GM-PHD implementation (solid line) due to the closed form solution for the Cauchy-Schwarz divergence and better filtering performance.
These numerical results suggest that the Cauchy-Schwarz
divergence can be at least as effective as the Rényi divergence
when used as a reward function for multi-target sensor control.
The results further suggest that the former has the distinct
advantage of the GM implementation which leads to superior
performance due to closed form solution and better filtering
capability.
V. CONCLUSIONS
In this paper, we have extended the notion of the Cauchy-Schwarz divergence to point processes, and have shown that for an appropriate choice of reference measure, the Cauchy-Schwarz divergence between the probability densities of two Poisson point processes is half the squared distance between their intensity functions. We have extended this result to mixtures of Poisson point processes and derived closed form
expressions for the Cauchy-Schwarz divergence when the intensity functions are Gaussian mixtures. The Cauchy-Schwarz divergence for probability densities is not necessarily invariant to the choice of reference measure. Nonetheless, the Cauchy-Schwarz divergence for the square roots of probability densities, or equivalently the Bhattacharyya distance for probability measures, is invariant to the choice of
reference measure. For Poisson point processes, our result
implies that the Bhattacharyya distance between the probability distributions is equal to the square of the Hellinger
distance between the intensity measures, which in turn is the
difference between the expected number of points per process
and the Bhattacharyya coefficient of their intensity measures.
We have illustrated an application of our result on a sensor
control problem for multi-target tracking where the system
state is modeled as a point process. Our result is an addition
to the list of interesting properties of Poisson point processes
and has important implications in the approximation of point
processes.
APPENDIX A
LINEAR GAUSSIAN SYSTEM MODEL
In a linear Gaussian multi-target model, each constituent
element xk−1 of the multi-target state Xk−1 at time k − 1
either continues to exist at time k with probability pS,k or
dies with probability 1 − pS,k , and conditional on its existence
at time k, transitions from xk−1 to xk with probability density
$$f(x_k | x_{k-1}) = \mathcal{N}(x_k; F_{k-1}x_{k-1}, Q_{k-1}). \tag{19}$$
The set of surviving targets at time k is thus a multi-Bernoulli point
process or RFS [26], [51]–[53]. New targets can arise at time
k either by spontaneous births, or by spawning from targets
at time k − 1. The set of birth targets and spawned targets are
modeled as Poisson point processes with respective Gaussian mixture intensity functions
$$\gamma_k(x) = \sum_{i=1}^{J_{\gamma,k}} w_{\gamma,k}^{(i)}\,\mathcal{N}\big(x; m_{\gamma,k}^{(i)}, P_{\gamma,k}^{(i)}\big),$$
$$\beta_{k|k-1}(x|\zeta) = \sum_{i=1}^{J_{\beta,k}} w_{\beta,k}^{(i)}\,\mathcal{N}\big(x; F_{\beta,k-1}^{(i)}\zeta + d_{\beta,k-1}^{(i)}, Q_{\beta,k-1}^{(i)}\big).$$

The multi-target state is hidden and is partially observed by a sensor driven by the control vector $a_{k-1}$ at time $k-1$. Each target evolves and generates observations independently of one another. A target with state $x_k$ is detected by the sensor with probability
$$p_{D,k}(x; a_{k-1}) = \sum_{j=0}^{J_{D,k}} w_{D,k}^{(j)}\,\mathcal{N}\big(x; m_{D,k}^{(j)}(a_{k-1}), P_{D,k}^{(j)}\big)$$
(or missed with probability $1 - p_{D,k}(x_k; a_{k-1})$) and, conditional on detection, generates a measurement $z_k$ according to the probability density
$$g_k(z_k|x_k) = \mathcal{N}(z_k; H_k x_k, R_k). \tag{20}$$
The set of detections corresponding to targets is thus a multi-Bernoulli point process [26], [51]–[53]. The sensor also registers a set of spurious measurements (clutter), independent of the detections, modeled as a Poisson point process with intensity $\kappa_k$. Thus, at each time step the measurement is a collection of detections $Z_k$, only some of which are generated by targets.

APPENDIX B
POSTERIOR INTENSITY PROPAGATION

In general, the posterior intensity function is propagated recursively in time via the Probability Hypothesis Density (PHD) filter [16]. For the linear Gaussian multi-target system described in Appendix A, the posterior intensity function is propagated via the Gaussian Mixture PHD (GM-PHD) filter [50] as follows¹.

¹Here, we use a slightly different technique from that in [55], which proposes an approximate propagation for the original GM-PHD filter in order to mitigate computational issues involving negative Gaussian mixture weights which arise due to a state dependent detection probability. For notational compactness we omit the time index on the state variable and the conditioning on the measurement history in expressions involving the posterior intensity function.

Prediction: If the posterior intensity at time $k-1$ is a Gaussian mixture of the form
$$v_{k-1}(x) = \sum_{i=1}^{J_{k-1}} w_{k-1}^{(i)}\,\mathcal{N}\big(x; m_{k-1}^{(i)}, P_{k-1}^{(i)}\big),$$
then the predicted intensity at time $k$ is also a Gaussian mixture and is given by
$$v_{k|k-1}(x) = v_{S,k|k-1}(x) + v_{\beta,k|k-1}(x) + \gamma_k(x),$$
where
$$v_{S,k|k-1}(x) = p_{S,k}\sum_{i=1}^{J_{k-1}} w_{k-1}^{(i)}\,\mathcal{N}\big(x; m_{S,k|k-1}^{(i)}, P_{S,k|k-1}^{(i)}\big),$$
$$m_{S,k|k-1}^{(i)} = F_{k-1} m_{k-1}^{(i)}, \qquad P_{S,k|k-1}^{(i)} = Q_{k-1} + F_{k-1} P_{k-1}^{(i)} [F_{k-1}]^T,$$
$$v_{\beta,k|k-1}(x) = \sum_{i=1}^{J_{k-1}}\sum_{j=1}^{J_{\beta,k}} w_{k-1}^{(i)} w_{\beta,k}^{(j)}\,\mathcal{N}\big(x; m_{\beta,k|k-1}^{(i,j)}, P_{\beta,k|k-1}^{(i,j)}\big),$$
$$m_{\beta,k|k-1}^{(i,j)} = F_{\beta,k-1}^{(j)} m_{k-1}^{(i)} + d_{\beta,k-1}^{(j)}, \qquad P_{\beta,k|k-1}^{(i,j)} = Q_{\beta,k-1}^{(j)} + F_{\beta,k-1}^{(j)} P_{k-1}^{(i)} \big[F_{\beta,k-1}^{(j)}\big]^T.$$

Update: If the predicted intensity and the detection probability are Gaussian mixtures of the form
$$v_{k|k-1}(x) = \sum_{i=1}^{J_{k|k-1}} w_{k|k-1}^{(i)}\,\mathcal{N}\big(x; m_{k|k-1}^{(i)}, P_{k|k-1}^{(i)}\big), \qquad p_{D,k}(x; a_{k-1}) = \sum_{j=0}^{J_{D,k}} w_{D,k}^{(j)}\,\mathcal{N}\big(x; m_{D,k}^{(j)}(a_{k-1}), P_{D,k}^{(j)}\big),$$
then the posterior intensity at time $k$ is given by
$$v_k(x; Z_k(a_{k-1})) = v_{M,k}(x; a_{k-1}) + \sum_{z\in Z_k(a_{k-1})} v_{D,k}(x; z),$$
where
$$v_{M,k}(x; a_{k-1}) = \sum_{i=1}^{J_{k|k-1}} w_{M,k}^{(i)}(a_{k-1})\,\mathcal{N}\big(x; m_{k|k-1}^{(i)}, P_{k|k-1}^{(i)}\big),$$
$$w_{M,k}^{(i)}(a_{k-1}) = \frac{w_{\mu,k}^{(i)}(a_{k-1})\,T_k(a_{k-1})}{\sum_{i=1}^{J_{k|k-1}} w_{\mu,k}^{(i)}(a_{k-1})},$$
$$w_{\mu,k}^{(i)}(a_{k-1}) = \Big[1 - p_{D,k}\big(m_{k|k-1}^{(i)}; a_{k-1}\big)\Big]\, w_{k|k-1}^{(i)},$$
$$T_k(a_{k-1}) = \sum_{i=1}^{J_{k|k-1}} w_{k|k-1}^{(i)} - \sum_{i=1}^{J_{k|k-1}}\sum_{j=0}^{J_{D,k}} w_{k|k-1}^{(i,j)}(a_{k-1}),$$
$$w_{k|k-1}^{(i,j)}(a_{k-1}) = w_{k|k-1}^{(i)}\, w_{D,k}^{(j)}\, q_{k|k-1}^{(i,j)}(a_{k-1}),$$
$$q_{k|k-1}^{(i,j)}(a_{k-1}) = \mathcal{N}\big(m_{D,k}^{(j)}(a_{k-1}); m_{k|k-1}^{(i)}, P_{k|k-1}^{(i)} + P_{D,k}^{(j)}\big),$$
and
$$v_{D,k}(x; z) = \sum_{i=1}^{J_{k|k-1}}\sum_{j=0}^{J_{D,k}} w_k^{(i,j)}(z)\,\mathcal{N}\big(x; m_{k|k}^{(i,j)}(z), P_{k|k}^{(i,j)}\big),$$
$$w_k^{(i,j)}(z) = \frac{w_{k|k-1}^{(i,j)}(a_{k-1})\, q_k^{(i,j)}(z)}{\kappa_k(z) + \sum_{i=1}^{J_{k|k-1}}\sum_{j=0}^{J_{D,k}} w_{k|k-1}^{(i,j)}(a_{k-1})\, q_k^{(i,j)}(z)},$$
$$q_k^{(i,j)}(z) = \mathcal{N}\big(z; H_k m_{k|k-1}^{(i,j)}, R_k + H_k P_{k|k-1}^{(i,j)} H_k^T\big),$$
$$m_{k|k-1}^{(i,j)} = m_{k|k-1}^{(i)} + K_{k|k-1}^{(i,j)}\big[m_{D,k}^{(j)}(a_{k-1}) - m_{k|k-1}^{(i)}\big],$$
$$P_{k|k-1}^{(i,j)} = \big[I - K_{k|k-1}^{(i,j)}\big] P_{k|k-1}^{(i)},$$
$$K_{k|k-1}^{(i,j)} = P_{k|k-1}^{(i)}\big[P_{k|k-1}^{(i)} + P_{D,k}^{(j)}\big]^{-1},$$
$$m_{k|k}^{(i,j)}(z) = m_{k|k-1}^{(i,j)} + K_k^{(i,j)}\big[z - H_k m_{k|k-1}^{(i,j)}\big],$$
$$P_{k|k}^{(i,j)} = \big[I - K_k^{(i,j)} H_k\big] P_{k|k-1}^{(i,j)},$$
$$K_k^{(i,j)} = P_{k|k-1}^{(i,j)} H_k^T\big(H_k P_{k|k-1}^{(i,j)} H_k^T + R_k\big)^{-1},$$
and by convention $q_{k|k-1}^{(i,0)} = 1$, $m_{k|k-1}^{(i,0)} = m_{k|k-1}^{(i)}$, and $P_{k|k-1}^{(i,0)} = P_{k|k-1}^{(i)}$.
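As an illustration of the prediction step above, here is a minimal Python sketch (our own, not the authors' implementation; spawned components, which follow the same pattern with $F_{\beta,k-1}^{(j)}$, $d_{\beta,k-1}^{(j)}$, $Q_{\beta,k-1}^{(j)}$, are omitted for brevity):

```python
import numpy as np

def gm_phd_predict(w, m, P, F, Q, p_S, w_birth, m_birth, P_birth):
    """GM-PHD prediction: propagate each surviving Gaussian component
    through the linear dynamics and append the birth components."""
    w_pred = [p_S * wi for wi in w] + list(w_birth)
    m_pred = [F @ mi for mi in m] + list(m_birth)
    P_pred = [Q + F @ Pi @ F.T for Pi in P] + list(P_birth)
    return w_pred, m_pred, P_pred
```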
REFERENCES
[1] H. G. Hoang, B.-N. Vo, B. T. Vo, and R. Mahler, “The Cauchy-Schwarz
divergence for Poisson point processes,” in Proc. IEEE Workshop on
Statistical Signal Processing (SSP 2014), June 2014, pp. 240–243.
[2] D. Daley and D. Vere-Jones, An introduction to the theory of point
processes. Springer-Verlag, 1988.
[3] D. Stoyan, D. Kendall, and J. Mecke, Stochastic Geometry and its
Applications. John Wiley & Sons, 1995.
[4] J. Kingman, Poisson Processes. Oxford University Press, 1993.
[5] N. Van Lieshout, Markov Point Processes and Their Applications.
Imperial College Press, 2000.
[6] J. Møller and R. Waagepetersen, Statistical Inference and Simulation for
Spatial Point Processes. Chapman & Hall CRC, 2004.
[7] A. Baddeley, I. Bárány, R. Schneider, and W. Weil, Stochastic Geometry:
Lectures Given at the C.I.M.E. Summer School Held in Martina Franca,
Italy, September 13-18, 2004. Springer, 2007.
[8] F. Baccelli and B. Błaszczyszyn, “Stochastic geometry and wireless
networks: Volume I Theory,” Foundation and Trends in Networking,
vol. 3, no. 3-4, pp. 249–449, Mar. 2009.
[9] D. Stoyan and A. Penttinen, “Recent applications of point process
methods in forestry statistics,” Statistical Science, vol. 15, no. 1, pp.
61–78, 2000.
[10] Y. Ogata, “Seismicity analysis through point-process modeling: A review,” Pure and applied geophysics, vol. 155, no. 2-4, pp. 471–507,
1999.
[11] V. Marmarelis and T. Berger, “General methodology for nonlinear
modeling of neural systems with Poisson point-process inputs,” Mathematical Biosciences, vol. 196, no. 1, pp. 1 – 13, 2005.
[12] D. L. Snyder, L. J. Thomas, and M. M. Ter-Pogossian, “A mathematical
model for positron-emission tomography systems having time-of-flight
measurements,” IEEE Trans. Nucl. Sci., vol. 28, no. 3, pp. 3575–3583,
1981.
[13] F. Baccelli, M. Klein, M. Lebourges, and S. A. Zuyev, “Stochastic geometry and architecture of communication networks,” Telecommunication
Systems, vol. 7, no. 1-3, pp. 209–227, 1997.
[14] M. Haenggi, “On distances in uniformly random networks,” IEEE Trans.
Inf. Theory, vol. 51, no. 10, pp. 3584–3586, 2005.
[15] M. Haenggi, J. Andrews, F. Baccelli, O. Dousse, and M. Franceschetti,
“Stochastic geometry and random graphs for the analysis and design of
wireless networks,” IEEE J. Sel. Areas Commun., vol. 27, no. 7, pp.
1029–1046, 2009.
[16] R. Mahler, “Multitarget Bayes filtering via first-order multitarget moments,” IEEE Trans. Aerosp. Electron. Syst., vol. 39, no. 4, pp. 1152–
1178, Oct 2003.
[17] S. S. Singh, B.-N. Vo, A. J. Baddeley, and S. A. Zuyev, “Filters for
spatial point processes.” SIAM J. Control and Optimization, vol. 48,
no. 4, pp. 2275–2295, 2009.
[18] F. Caron, P. Del Moral, A. Doucet, and M. Pace, “On the conditional distributions of spatial point processes.” Advances in Applied Probability,
vol. 43, no. 2, pp. 301–307, 2011.
[19] D. R. Cox and V. Isham, Point processes. Chapman & Hall, 1980.
[20] T. M. Cover and J. A. Thomas, Elements of Information Theory. New
York, NY, USA: Wiley-Interscience, 1991.
[21] S. M. Ali and S. D. Silvey, “A General Class of Coefficients of
Divergence of One Distribution from Another,” Journal of the Royal
Statistical Society. Series B (Methodological), vol. 28, no. 1, pp. 131–
142, 1966.
[22] W. Schmaedeke, “Information based sensor management,” in Proc. SPIE
Signal Processing, Sensor Fusion, and Target Recognition II, vol. 155,
1993, pp. 156–164.
[23] R. P. S. Mahler, “Global posterior densities for sensor management,” in
Proc. SPIE, vol. 3365, 1998, pp. 252–263.
[24] S. Singh, N. Kantas, B.-N. Vo, A. Doucet, and R. Evans, “Simulation
based optimal sensor scheduling with application to observer trajectory
planning,” Automatica, vol. 43, no. 5, pp. 817–830, 2007.
[25] A. O. Hero, C. M. Kreucher, and D. Blatt, “Information theoretic
approaches to sensor management,” in Foundations and applications
of sensor management, A. O. Hero, D. A. Castanón, D. Cochran, and
K. Kastella, Eds. Springer, 2008, ch. 3, pp. 33–57.
[26] B. Ristic, B.-N. Vo, and D. Clark, “A note on the reward function for
PHD filters with sensor control,” IEEE Trans. Aerosp. Electron. Syst.,
vol. 47, no. 2, pp. 1521–1529, 2011.
[27] H. G. Hoang, “Control of a mobile sensor for multi-target tracking
using Multi-Target/Object Multi-Bernoulli filter,” in Proc. International
Conference on Control, Automation and Information Sciences (ICCAIS
2012), Ho Chi Minh City, Vietnam, 2012, pp. 7–12.
[28] J. D. Victor, “Spike train metrics,” Current Opinion in Neurobiology,
vol. 15, no. 5, pp. 585–592, 2005.
[29] B.-N. Vo, S. Singh, and A. Doucet, “Sequential Monte Carlo methods
for multi-target filtering with random finite sets,” IEEE Trans. Aerosp.
Electron. Syst., vol. 41, no. 4, pp. 1224–1245, 2005.
[30] R. Mahler, Statistical Multisource-Multitarget Information Fusion.
Artech House, 2007.
[31] K. Kampa, E. Hasanbelliu, and J. C. Principe, “Closed-form CauchySchwarz PDF divergence for mixture of Gaussians,” in Proc. International joint conference on Neural Networks (IJCNN 2011). IEEE, 2011,
pp. 2578–2585.
[32] S. Kullback and R. A. Leibler, “On information and sufficiency,” The
Annals of Mathematical Statistics, vol. 22, no. 1, pp. 79–86, 1951.
[33] J. Lin, “Divergence measures based on the Shannon entropy,” IEEE
Trans. Inf. Theory, vol. 37, no. 1, pp. 145–151, 1991.
[34] F. Wang, T. Syeda-Mahmood, B. Vemuri, D. Beymer, and A. Rangarajan,
“Closed-form Jensen-Rényi divergence for mixture of Gaussians and
applications to group-wise shape registration,” in Medical Image Computing and Computer-Assisted Intervention MICCAI 2009. Springer,
2009, vol. 5761, pp. 648–655.
[35] R. Jenssen, D. Erdogmus, K. E. Hild, J. C. Principe, and T. Eltoft,
“Optimizing the Cauchy-Schwarz PDF distance for information theoretic, non-parametric clustering,” in Proc. 5th international conference
on Energy Minimization Methods in Computer Vision and Pattern
Recognition, 2005.
[36] T. Villmann, B. Hammer, F.-M. Schleif, T. Geweniger, T. Fischer, and
M. Cottrell, “Prototype based classification using information theoretic
learning,” in Neural Information Processing, I. King, J. Wang, L.-W.
Chan, and D. Wang, Eds. Springer Berlin Heidelberg, 2006, vol. 4233,
pp. 40–49.
[37] R. Jenssen, J. C. Principe, D. Erdogmus, and T. Eltoft, “The CauchySchwarz divergence and Parzen windowing: Connections to graph theory
and Mercer kernels,” Journal of the Franklin Institute, vol. 343, no. 6,
pp. 614 – 629, 2006.
[38] E. Hasanbelliu, L. Sanchez-Giraldo, and J. Principe, “A robust point
matching algorithm for non-rigid registration using the Cauchy-Schwarz
divergence,” in Proc. IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2011), 2011, pp. 1–6.
[39] K. DeMars, I. Hussein, M. Jah, and R. Erwin, “The Cauchy-Schwarz
divergence for assessing situational information gain,” in Proc. International Conference on Information Fusion (FUSION2012), 2012, pp.
1126–1133.
[40] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine
Learning. The MIT Press, 2005.
[41] F. Le Gall, “Powers of tensors and fast matrix multiplication,” in Proc.
39th International Symposium on Symbolic and Algebraic Computation,
ser. ISSAC ’14, 2014, pp. 296–303.
[42] J.-H. Lo, “Finite-dimensional sensor orbits and optimal nonlinear filtering,” IEEE Trans. Inf. Theory, vol. 18, no. 5, pp. 583–588, 1972.
[43] C. Ji, D. Merl, T. B. Kepler, and M. West, “Spatial mixture modelling
for unobserved point processes: Examples in immunofluorescence histology,” Bayesian Analysis, vol. 4, pp. 297–316, 2009.
[44] A. Kostas and S. Behseta, “Bayesian nonparametric modeling for
comparison of single-neuron firing intensities,” Biometrics, vol. 66, pp.
277–286, 2010.
[45] M. Taddy, “Autoregressive mixture models for dynamic spatial Poisson
processes: Application to tracking intensity of violent crime,” Journal
of the American Statistical Association, vol. 105, pp. 1403–1417, 2010.
[46] D. Phung and B.-N. Vo, “A random finite set model for data clustering,”
in Proc. International Conference on Information Fusion (FUSION
2014), July 2014, pp. 1–8.
[47] A. Bhattacharyya, “On a measure of divergence between two statistical
populations defined by their probability distribution,” Bulletin of the
Calcutta Mathematical Society, vol. 35, pp. 99–110, 1943.
[48] T. Kailath, “The divergence and Bhattacharyya distance measures in
signal selection,” IEEE Trans. Inf. Theory, vol. 15, no. 1, pp. 52–60,
1967.
[49] I. R. Goodman, R. Mahler, and H. T. Nguyen, Mathematics of Data
Fusion. Kluwer Academic Publishers, 1997.
[50] B.-N. Vo and W.-K. Ma, “The Gaussian mixture probability hypothesis
density filter,” IEEE Trans. Signal Process., vol. 54, no. 11, pp. 4091–
4104, 2006.
[51] R. Mahler, “Multitarget sensor management of dispersed mobile sensors,” in Theory and algorithms for cooperative systems, D. Grundel,
R. Murphey, and P. Pardalos, Eds. World Scientific Books, 2004, ch. 12,
pp. 239–310.
[52] H. G. Hoang and B. T. Vo, “Sensor management for multi-target tracking
via multi-Bernoulli filtering,” Automatica, vol. 50, no. 4, pp. 1135–1142,
2014.
[53] B. Ristic and B.-N. Vo, “Sensor control for multi-object state-space
estimation using random finite sets,” Automatica, vol. 46, no. 11, pp.
1812–1818, 2010.
[54] D. Schumacher, B.-T. Vo, and B.-N. Vo, “A consistent metric for
performance evaluation of multi-object filters,” IEEE Trans. Signal
Process., vol. 56, no. 8, pp. 3447–3457, 2008.
[55] M. Ulmke, O. Erdinc, and P. Willett, “GMTI tracking via the Gaussian
Mixture Cardinalized Probability Hypothesis Density filter,” IEEE Trans.
Aerosp. Electron. Syst., vol. 46, no. 4, pp. 1821–1833, 2010.