Sensors 18 04222
Sensors 18 04222
Sensors 18 04222
Article
An Iterative Nonlinear Filter Using Variational
Bayesian Optimization
Yumei Hu 1,2 , Xuezhi Wang 3 , Hua Lan 1,2, *, Zengfu Wang 1,2 , Bill Moran 4 and Quan Pan 1,2
1 School of Automation, Northwestern Polytechnical University, Xi’an 710072, China;
[email protected] (Y.H.); [email protected] (Z.W.); [email protected] (Q.P.)
2 Key Laboratory of Information Fusion Technology, Ministry of Education, Xi’an 710072, China
3 School of Engineering, RMIT University, Melbourne 3000, Australia; [email protected]
4 Department of Electrical and Electronic Engineering, University of Melbourne,
Melbourne, VIC 3010, Australia; [email protected]
* Correspondence: [email protected]; Tel: +86-1539-9035-213
Received: 11 October 2018 ; Accepted: 28 November 2018 ; Published: 1 December 2018
Abstract: We propose an iterative nonlinear estimator based on the technique of variational Bayesian
optimization. The posterior distribution of the underlying system state is approximated by a solvable
variational distribution approached iteratively using evidence lower bound optimization subject
to a minimal weighted Kullback-Leibler divergence, where a penalty factor is considered to adjust
the step size of the iteration. Based on linearization, the iterative nonlinear filter is derived in a
closed-form. The performance of the proposed algorithm is compared with several nonlinear filters
in the literature using simulated target tracking examples.
1. Introduction
Bayesian estimation is widely applied across many areas of engineering such as target tracking,
aerial surveillance, intelligent vehicles, and machine learning [1]. In linear Gaussian systems, the state
estimation can be optimally achieved using a Kalman filter as a closed form solution. However, many
real-world estimation problems are nonlinear, resulting in analytically intractable posterior probability
density function (PDF) for the state. In consequence, suboptimal approximation methods are sought to
solve the nonlinear estimation problems [2].
Many suboptimal techniques have been developed to solve nonlinear estimation problems.
These techniques may be divided into the following three categories. The first category, includes
the extended Kalman filter (EKF) [3], the iterated extended Kalman filter (IEKF) [4,5], and their
variants [6,7], solves the state estimation problem through replacing the nonlinear functions by their
linear approximations via the Taylor expansion. The second category involves stochastic sampling
methods. In the filtering process, a set of randomly sampled points with weights are adopted to
approximate the PDF of the underlying state. For example, the particle filter (PF) [8–10] is a sequential
Monte Carlo (SMC) stochastic sampling method, which approximates the PDF by the sampled particles
from a proposal distribution. PF can be applied to nonlinear non-Gaussian systems. Markov Chain
Monte Carlo (MCMC) is another popular stochastic method since it is able to achieve arbitrarily
high accuracy using a large number of particles, sometimes resulting in prohibitive computational
expenditure. The techniques in the third category use deterministic sampling methods. Nonlinear
state PDFs are approximated by a set of fixed points and weights that represent the location and spread
of the distribution. This category includes the unscented Kalman filter (UKF) [9,11,12], the cubature
Kalman filter (CKF) [13,14], and the central difference Kalman filter (CDKF) [15]. The UKF and the
CDKF approximate the nonlinear state transition function and the measurement function by the
unscented transform (UT) and the Stirling interpolation, respectively. The state and the corresponding
error covariance are then calculated based on the sampling points and the weights. The CKF uses the
third order spherical-radial cubature rule to approximate integrals numerically.
Where nonlinear filters involve optimisation, quasi-Newton approximations are used, along with
Kullback-Leibler (KL) divergence and α-divergence as objective functions. For example, the EKF
update step may use a Hessian correction term, resulting in improved performance [7,16]. Another
approach for estimation, in some ways can be regarded as the fourth category, uses the variational
Bayes (VB) technique. VB has a unified and principled architecture and can be used, as with other
methods mentioned earlier, for problems where analytic solutions cannot be found. The objective
of the VB approximation is to find the variational distribution best able, in the sense of the KL
divergence, to approximate the true PDF [17–19]. Various variational inference methods for parametric
distributions are discussed in [20], where the true PDF of hidden variables is assumed to be given by
a parametric model. Since VB usually runs faster than MCMC [20–22], it is widely applied in areas
such as statistics, finite element analysis, machine learning, etc. An important concern with the VB
inference is the accuracy of the approximation [23]. Paul [24] and Stephane et al. [25] introduced a
proximal point algorithm in the framework of the expectation-maximization (EM) and studied its
convergence. Khan et al. [18,26] adopted KL divergence as a measure of divergence to improve the
accuracy of VB inference by considering the geometry of the true posterior PDF. They applied their
algorithm to parameter estimation, data set classification and regression.
In this paper, we adopt KL divergence as a metric of posterior PDF approximation, and derive
an alternative nonlinear estimation algorithm based on the VB technique. The posterior PDF of the
underlying system state is approximated by a parameterized variational distribution and the difference
between the two distributions is minimized iteratively using a weighted KL divergence criterion. This
is carried out through optimization of the evidence lower bound (ELBO); the intractable integration
of the posterior PDF is converted to a solvable optimization problem. A penalty factor is applied to
ensure that the filter algorithm obtains a good trade-off between accuracy and computational overhead.
Numerical simulations of two typical target tracking scenarios and a benchmark nonlinear filtering
problem are presented. The simulation results show that the approximation of nonlinear stochastic
system state by the proposed algorithm is tighter than EKF and UKF. The computational cost and the
effect of different penalty factors on estimation accuracy are analyzed.
The rest of the paper is organized as follows. In Section 2, we introduce the general nonlinear
Bayesian filtering problem. The formulation of the ELBO is described in Section 3. In Section 4,
we propose the proximal iterative nonlinear filter. In Section 5, we present simulations of two target
tracking scenarios, where the performance of the proposed algorithm is compared with those of EKF
and UKF in terms of estimation accuracy and computational overhead. Lastly, conclusions are drawn
in Section 6.
2. Problem Formulation
Consider a general dynamic system with measurement as follows.
x k = f k ( x k −1 ) + ω k , (1)
zk = hk ( xk ) + vk , (2)
where f k (·) denotes the state transition function, and hk (·) denotes the mapping from the system state
to the measurement; ωk and vk are the process noise and the measurement noise, respectively. We
assume that ωk and vk are Gaussian and mutually independent, ωk ∼ N (0, Qk ) and vk ∼ N (0, Rk ).
Sensors 2018, 18, 4222 3 of 17
Assuming that the posterior PDF p ( xk−1 |zk−1 ) at time k − 1 is available, the PDF of the predicted
state is obtained by the Chapman-Kolmogorov equation.
Z
p ( x k | z k −1 ) = p ( xk | xk−1 ) p ( xk−1 |zk−1 ) dxk−1 . (3)
Then, at time k, the posterior PDF is obtained using the measurement zk by an application of the
Bayes rule:
p ( z k | x k ) p ( x k | z k −1 )
p ( xk |zk ) = R . (4)
p (zk | xk ) p ( xk |zk−1 ) dxk
For linear Gaussian systems, it is well known that the optimal state estimation xk|k and
the corresponding error covariance Pk|k under the criterion of minimum variance estimation are
obtained by: Z
xk|k = xk p ( xk |zk ) dxk , (5)
Z T
Pk|k = xk − xk|k xk − xk|k p ( xk |zk ) dxk . (6)
For nonlinear systems, the integral in Equation (4) is often intractable. Suboptimal approximations
for the posterior PDF are needed. Most existing suboptimal algorithms adopt linearization or sampling
techniques to approximate the posterior PDF p ( xk |zk ). We consider an iterative VB approach, in which
the true PDF is approximated by a variational distribution and is approached by iterative optimization
of the ELBO. The proposed algorithm converts the nontrivial integration to a closed-form optimization
and therefore improves estimation accuracy.
p ( xk , zk )
Z
log p (zk ) = log q ( xk |ψk ) dx
q ( xk |ψk ) k (7)
= L (ψk ) + DKL [ p( xk |zk )kq( xk |ψk )] ,
where DKL [ p( xk |zk )kq( xk |ψk )] is the KL divergence between the true posterior PDF p( xk |zk ) and the
variational distribution q( xk |ψk ); that is,
p ( xk |zk )
Z
DKL [ p( xk |zk )kq( xk |ψk )] = − q ( xk |ψk ) log dx . (8)
q ( xk |ψk ) k
p ( xk , zk )
Z
L (ψk ) = q ( xk |ψk ) log dx
q ( xk |ψk ) k (9)
= Eq [log p (zk | xk )] − DKL [ p( xk )kq( xk |ψk )] .
The variational
distribution q ( xk |ψk ) is assumed Gaussian with unknown parameter ψk =
xk|k , Pk|k (to be estimated), where xk|k is the mean and Pk|k is the covariance.
Please note that the poterior PDF p( xk |zk ) needs to be closely approximated by a known
distribution in nonlinear filtering. From Equation (7), evidently, the variational distribution q( xk |φk )
would be equal to the true posterior PDF p( xk |zk ) if the KL divergence were zero. Minimizing the KL
divergence, and thereby approximating the posterior PDF by a variational distribution, is equivalent
to maximizing the ELBO, i.e.,
Sensors 2018, 18, 4222 4 of 17
According to Equations (9) and (10), the nontrivial integration of Equation (7) is converted to the
problem of maximizing ELBO, which can be solved by the VB method.
where i denotes the iteration index and the penalty factor β i is used to adjust the optimization
step length. Roughly speaking, the ELBO is maximized when the KL divergence between the two
variational distributions q ( xk |ψk ) and q xk |ψki approaches zero. Equation (11) can be rewritten as
i
1 h
ψki+1 i
= arg max L (ψk ) − DKL q ( xk |ψk ) kq xk |ψk . (12)
ψk βi
Please note that one iteration of this proximal method is equivalent to moving a step in the
direction of the natural gradient [18]. The influence of the KL divergence on ψki+1 can be adjusted by
β i . The larger the β i , the weaker the influence of the KL divergence on ψki+1 , and vice versa. In [18],
it is assumed that β i = 1.
Under the Gaussian assumptions for the process noise and the measurement noise, the variational
distribution is of the form q ( xk |ψk ) ∼ N ( xk ; xk|k , Pk|k ). Given the prior of the state xk−1|k−1 at time
k − 1, we can assume p ( xk ) ∼ N xk ; xk|k−1 , Pk|k−1 . Accordingly, the ψki+1 is
n h i
ψki+1 = arg max Eq [log p (zk | xk )] − DKL N ( xk ; xk|k−1 , Pk|k−1 )kN xk ; xk|k , Pk|k
ψk
i (14)
1 h
i
− DKL q ( xk |ψk ) kq xk |ψk .
βi
Sensors 2018, 18, 4222 5 of 17
For the first term in Equation (14), the expectation Eq [log p (zk | xk )] related to xk and
Pk can be
approximated linearly using the gradients with respect to (w. r. t.) xk and Pk . Defining g xk|k , Pk|k ,
Eq [log p (zk | xk )], the gradients of g w. r. t. xk|k and Pk|k are
∇ x g xk|k , Pk|k = −αk , (15)
∇ P g xk|k , Pk|k = −γk . (16)
The expectation Eq [log p (zk | xk )] at time k is then maximized by gradient ascent in the variables
xk|k and Pk|k ; that is,
1
Eq [log p (zk | xk )] ≈ αk xk|k + γk Pk|k , (17)
2
where αik and γki at iteration i are given by
T
αik = − Hki R−
k
1
zk − hk xki |k , (18)
1 i T −1 i
γki = Hk Rk Hk , (19)
2
and Hki is the Jacobian matrix:
∂hk ( x )
Hki = | i . (20)
∂xT x= xk|k
In other words, the coefficients αik and γki are updated by Hki with xki |k in each iterative step.
The detailed calculations of αik and γki are given in Appendix A.
For Gaussian distributions N (ξ 1 ; µ1 , C1 ) and N (ξ 2 ; µ2 , C2 ) with the same dimension d, the KL
divergence is
1h i
DKL [N (ξ 1 ; µ1 , C1 ) kN (ξ 2 ; µ2 , C2 )] = − log C1 C2−1 − tr C1 C2−1 −(µ1 − µ2 )T C2−1 (µ1 − µ2 ) + d . (21)
2
where operators tr(·) and |·| denote the trace and the determinant of a matrix, respectively. By
Equations (17)–(21), Equation (13) can be rewritten as,
−1
1 1
ψki+1 =arg max αik xk|k + γki Pk|k − log Pk|k−1 Pki |k − tr Pk|k−1 Pk−|k1
ψk 2 2
−1 T
−tr Pk|k Pki |k − xk|k−1 − xk|k Pk−|k1 xk|k−1 − xk|k (22)
T −1
i i i
− xk|k − xk|k Pk|k xk|k − xk|k − d .
Then ψki+1 is maximized at a point ( x, P) which can be explicitly calculated. To find them, set the
partial derivatives of ψki+1 w.r.t. { x, P} to be zero, i.e.,
∂ψki+1 ∂ψki+1
| i+1 = 0, | i +1 = 0. (23)
∂x x= xk|k ∂P P= Pk|k
Sensors 2018, 18, 4222 6 of 17
1 i −1 −1
1 i −1 i
βi −1 β i −1
xki+|k1 = P + Pk|k i
Pk|k−1 xk|k−1 − αk + Pk|k xk|k
1 + β i k | k −1 1 + β i 1 + βi 1 + βi
(24)
−1 −1 −1
−1 i −1 i i i
= (1 − bi ) Pk|k−1 + bi Pk|k (1 − bi ) Pk|k−1 xk|k−1 − αk + bi Pk|k xk|k ,
1 i −1 β i −1 −1 −1 −1
−1
Pki+
|k
1
= Pk|k + i
Pk|k−1 + γk i i
= bi Pk|k + (1 − bi ) Pk|k−1 + γk , (25)
1 + βi 1 + βi
1
where bi = 1+ β i . Equations (24) and (25) show that the state estimate and the associated covariance
in the iteration are updated by αik and γki , respectively. As shown in Figure 1, the complete iteration
procedure consists of Equations (18), (19), (24) and (25).
xk |k 1 , Pk |k 1
zk
Update state xk |k and associated
covariance Pk |k by Eqs. (5) and (6)
Initialize iteration
xki |k xk |k and Pki|k Pk |k
Termination N
We note that, in principle, the computational cost in Equations (24) and (25) can be slightly
reduced by using the Matrix Inversion Lemma [27]. As a result, Equations (24) and (25) are derived as
Equations (26) and (27), respectively.
n h i o −1
−1
xki+|k1 = 1 i 1 i
bi Pk |k− bi Pk |k
1 1 i
1−bi Pk |k −1+ bi Pk |k
1 i
bi Pk |k ( 1 − b i) P x
k | k −1 k | k −1
− α i
k + b i P i
k|k
xki |k (26)
Sensors 2018, 18, 4222 7 of 17
1 bi −1 bi −1 −1 1
−1 −1 −1
Pki+
= |k
1
K − K i
Pk|k I4×4 + K i
Pk|k K −1 , (27)
1 − bi k 1 − bi k 1 − bi k 1 − bi k
−1
where Kk−1 = Pk−|k1−1 + γki = Pk−|k1−1 − Pk−|k1−1 γki I4×4 + Pk|k−1 γki Pk|k−1 .
For special case when β i = 1, Equations (24) and (25) can be rewritten as
−1 −1 −1
where c g = Pk−|k1−1 + Pki |k , c1 = c g Pk−|k1−1 and c2 = c g Pki |k−1 .
The flow diagram of the proposed PEKF-VB algorithm is shown in Figure 1 and the detailed
implementation of PEKF-VB is given in Algorithm 1.
∂ f (x)
k
where Fk = ∂x | x = x k −1| k −1 .
3: Compute the filtering grain Kk
∂h ( x )
where Hk = ∂x k
| x = x k | k −1 .
4: Update the state estimation xk|k and the associated error covariance Pk|k
xk∗|k = xk|k−1 − Kk zk − hk ( xk|k−1 ) , Pk∗|k = Pk|k−1 − Kk Hk Pk|k−1 .
4.3. Remarks
1. The VB method approximates the true posterior PDF by choosing from a parameterized
variational distribution. In each iteration of the PEKF-VB, the ELBO (9) increases. It follows
that the ELBO is a proper criterion for measuring the performance of variational optimization.
The ELBO of the proposed nonlinear filter is
(2π ) Dz | Rk || Pk|k−1 | 1
T
1
L (ψk ) = − log − tr Pk−|k1−1 xk|k − xk|k−1 xk|k − xk|k−1
2 | Pk|k | 2
1 T (30)
1 −1 Dx
− tr Pk|k−1 Pk|k − tr R− k
1
z k − Hk x k | k z k − Hk x k | k + Hk Pk | k Hk
T
+ ,
2 2 2
Sensors 2018, 18, 4222 8 of 17
where Dx and Dz denote the dimension of the state and the dimension of the measurement,
respectively. The derivation of the ELBO is given in Appendix B.
2. Apart from the KL divergence, we can use Calvo and Oller’s distance (COD) as the penalty
function in Equation (13); the corresponding filter is denoted by CODEKF. The COD of two
distributions f (ξ 1 ) = p (ξ 1 | µ1 , C1 ) and f (ξ 2 ) = p (ξ 2 | µ2 , C2 ) is [28],
" #1/2
n +1
d(ξ 1 , ξ 2 ) = 1/2 ∑ ln 2
λi , (31)
i =1
where n is the dimension of ξ i , i ∈ {1, 2}, λi are the eigenvalues of f (∆µ̄, D ) with µ̄ = T T C1−1/2 ∆µ,
D = T T C1−1/2 C2 C1−1/2 T which is diagonal, and ∆µ = µ2 − µ1 , TT T = I. We replace the KLD in
Equation (13) with Equation (31) as follows.
1 1
ψki+1 =arg max (αik xk|k + γki Pk|k ) − DKL [ p( xk )kq( xk |ψk )] − d(ξ 1 , ξ 2 ) . (32)
ψk 2 βi
3. Since both PEKF-VB and CODEKF involve iterations within the VB framework to minimize the
divergence between the posterior PDF and variational distribution, their complexity is increased
by the calculation of the Jacobian in each iteration.
4. In PEKF-VB, we use KL divergence to measure the similarity between two distributions. Under
Gaussian assumptions for the distributions, a closed-form solution of the variational distribution
has been derived. However, the VB framework with the KL divergence can also apply to
non-Gaussian distributions. If no closed-form exists, a Monte Carlo method can be used to
approximate the divergence. Other measures of dissimilarity between probability distributions,
such as the alpha-divergence, the Rényi-divergence and the alpha-beta divergence, can also
be used in the VB framework. See [29] and references therein. Unfortunately, in general,
no computationally tractable form of the variational distribution can be derived and a Monte
Carlo method has to be employed.
5. Numerical Simulations
In this section, we present two nonlinear estimation examples of 2D target tracking and a
benchmark nonlinear filtering problem to illustrate the performance of PEKF-VB and CODEKF.
We compare them with EKF and UKF. The performance is measured by root-mean-squared
error (RMSE) of the estimates and the computational overhead.
Example 1: Range-bearing tracking. In this scenario, the underlying target motion is described
by a constant turn (CT) model, with the state vector consists of 2D position and velocity components.
As shown in Figure 2a, the target moves with initial state x0 = (565.7, 29.99, 1166, −0.62)T along
a circular trajectory, and is observed by a range-bearing sensor. The state transition matrix Fk in
Equation (1) and the measurement function hk ( xk ) in Equation (2) are:
1 sin θ̇T /θ̇ 0 cos θ̇T − 1 /θ̇
0 cos θ̇T 0 − sin θ̇T
Fk = , (33)
0 1 − cos θ̇T /θ̇ 1 sin θ̇T /θ̇
0 sin vθ̇T 0 cos θ̇T
" p #
( x 2 + y2 )
hk|k ( xk ) = , (34)
artan (y/x )
where the turn rate θ̇ = −0.0333 rad/s, the covariances of the zero mean Gaussian white noises
ωk and υk are Qk = 1.5 GqGT and Rk = diag r2 , φ2 , respectively, where q = I2×2 , and G =
Sensors 2018, 18, 4222 9 of 17
" #T
T2 /2 T 0 0
, r = 35, and φ = 0.5π/180. At each run, the track is initialised using
0 0 T2 /2 T
the two-point method [1] with initial error covariance P0 = diag([600, 100, 600, 100]). We let β i = 1,
and T = 1 s. The target moves for 80 scans (periods). 1000 Monte Carlo runs are carried out.
The RMSE plots of EKF, UKF, CODEKF and PEKF-VB are showed in Figure 2b. The black, red,
green and blue curves are obtained by the PEKF-VB CODEKF, UKF and EKF, respectively. It is seen
that in terms of RMSE performance, PEKF-VB is slightly better than CODEF; both PEKF-VB and
CODEF are better than EKF and UKF. Table 1 provides the quantitative comparison of RMSE and
execution time of EKE, UKF and CODEKF, PEKF-VB. Figure 3b gives the relationship between iteration
index and the running time of PEKF-VB.
Table 1. The quantitative comparison of the mean RMSE (from 20 to 80 scan) and the running time.
3500 50
Ground truth EKF
Measurements 45 UKF
3000
PEKF-VB CODEKF
18 PEKF-VB
40
RMSE in position
2500
35
y (m)
17
2000
30
1500 16
25 42 44 46 48
1000 20
500 15
400 600 800 1000 1200 1400 1600 0 10 20 30 40 50 60 70 80
x (m) Scan
(a) (b)
Figure 2. Tracking with a CT model (a) Trajectory, measurements and estimation (b) RMSE obtained
by EKF, UKF, CODEKF and PEKF-VB.
×10-4
50 2 19
1 Time
45 2 RMSE
16.8 3
4 1.5 18.9
40 16.6
RMSE in position
5
Runing time
16.4 6
35
RMSE
16.2 1 18.8
30 16
15.8
25
63 63.5 64 64.5 65 0.5 18.7
20
15 0 18.6
0 10 20 30 40 50 60 70 80 1 2 3 4 5 6 7 8 9 10
Scan The number of iterations
(a) (b)
Figure 3. RMSE and running time of PEKF-VB (a) RMSE with different number of iterations (b) The
relationship between iteration and mean RMSE, running time.
In Figure 3a we compare the RMSE performance of PEKF-VB by varying the numbers of iterations.
Notice that the RMSE decreases with the increasing of the number of iterations. Figure 3b gives
Sensors 2018, 18, 4222 10 of 17
the computational overhead of PEKF-VB w.r.t the number of iterations. Both results agree with
our intuition.
To illustrate the convergence of PEKF-VB, we present the ELBO for different numbers of iterations
in Figure 4. Figure 4a illustrates the ELBO at the second scan, and Figure 4b shows the ELBO for all
scans. Clearly, the ELBO increases with the number of iterations increases, showing that the iteration
procedure in PEKF-VB converges.
0 0.2
-0.1 0
-0.2 -0.2
ELBO
ELBO
-0.3 -0.4
-0.4 -0.6
-0.5 -0.8
1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10
Iteration Iteration
(a) (b)
Figure 4. The curve in (a) is the ELBO of the second scan. The curves in (b) are the ELBO of all scans.
Example 2: Bearing-only tracking. In this scenario, a single target tracking using measurements
from a single bearing-only sensor is considered. While the sensor (ownership) measurement model
is nonlinear to the target state, the sensor has to maneuver relative to the target in order to observe
it [30,31]. Let xk = [ xk , yk , ẋk , ẏk ] be the state of the target at time k, where ( xk , yk ) and ( ẋk , ẏk ) are the
position and velocity, respectively. The target, with an initial range of 5 km (relative to the sensor) and
initial course of 220◦ in a clockwise direction (Set the positive axis of y is 0◦ ), is modeled by a constant
velocity model in Equation (35).
1 0 T 0
0 1 0 T
Fk = , (35)
0 0 1 0
0 0 0 1
where T = 1min. The speed of the target is 0.1235 km/min. The sensor starts moving with a fixed
speed of 0.1543 km/min and an initial course of 140◦ in a clockwise direction (Set the positive axis of y
is 0◦ ). Please note that for the bearing-only tracking problem, to be able to estimate the range of the
target, the sensor has to maneuver. Here we assume the sensor maneuvers from scan 14 to scan 17, and
then moves with constant velocity from scan 18 to scan 40. The motion model of the sensor is given by
Equation (36), where the turn rate θ̇ = 30◦ /min. Both the target and the sensor move for 40 scans and
their trajectories are shown in Figure 5a.
1 0 T 0
0
1 0 T
Sk = , 0 < k < 14, 18 ≤ k ≤ 40,
0 0 1 0
0 0 0 1
(36)
1 0 sin θ̇T /θ̇ cos θ̇T − 1 /θ̇
0 1 1 − cos θ̇T /θ̇ sin θ̇T /θ̇
Sk = , 14 ≤ k < 18.
0 0 cos θ̇T − sin θ̇T
0 0 sin θ̇T cos θ̇T
Sensors 2018, 18, 4222 11 of 17
xk0 − xs / y0k − ys ,
hk ( xk ) = atan (37)
where xk0 and y0k are the position of the target in Cartesian coordinate system, xs and ys are the position
of the sensor. The standard deviation of measurement noise is 1◦ . The initial position of the target is
randomly sampled at range r = 13 km with covariance P0
P0 = ABA0 , (38)
where
cos θ̇ sin θ̇ 0 0 r2 σm
2 0 0 0
− sin θ̇ cos θ̇ 0 0
0
∆r2 0 o
A= , B = . (39)
0 0 1 0 0 0 ∆v2 0
0 0 0 1 0 0 0 ∆v2
where σm = π/180 rad, ∆r = 2 km and ∆v = 61.7 m/min. We let β i = 1. Figure 5b shows the
estimated target trajectories obtained by the EKF, UKF, CODEKF and PEKF-VB for a single run.
3 3
Trajectory of sensor
Ground truth of target 2
2
1
1
Target
0
y (km)
y (km)
0
Sensor -1
-1 Ground truth
-2 Estimation of EKF
Estimation of UKF
-2 Estimation of CODEKF
-3
Estimation of PEKF-VB
-3 -4
0 1 2 3 4 5 0 2 4 6 8 10 12 14
x (km) x (km)
(a) (b)
Figure 5. Bearing-only tracking. (a) Tracking scenario (b) Estimation obtained by EKF, UKF, CODEKF
and PEKF-VB.
Based on 1000 Monte Carlo simulations, the RMSE comparison of EKF, UKF, CODEKF and
PEKF-VB is illustrated in Figure 6, where the number of iterations for PEKF-VB is 5.
• As we expected, with a fixed sensor trajectory, PEKF-VB has the best target observability from
sensor measurements and leads to a better RMSE performance than EKF and UKF. Under the VB
framework, the variational distribution approaches the real posterior PDF through the iteration of
the proximal filter.
• The RMSE performance of CODEKF is also better than those of EKF and UKF because, for
CODEKF, the Jacobian matrix of Equation (37) in each iteration is updated to minimize the COD.
However, the RMSE performance of CODEKF is slightly worse than that of PEKF-VB.
• In the first few scans, the performance of the four filters are comparable. This is because, in this
bearing-only tracking problem, the accumulative measurements in these scans do not provide
enough information to the four filters. The performance of CODEKF and of PEKF-VB suffers
when measurement data is very limited. As more measurements accumulate both CODEKF and
PEFK-VB extract more information via the iteration process, resulting in superior performance.
Sensors 2018, 18, 4222 12 of 17
9 0.18
EKF EKF
8 UKF 0.16 UKF
CODEKF CODEKF
PEKF-VB PEKF-VB
7 0.14
RMSE in position (km)
5 0.1
4 0.08
3 0.06
2 0.04
1 0.02
0 0
0 5 10 15 20 25 30 35 40 0 5 10 15 20 25 30 35 40
Scan Scan
(a) (b)
Figure 6. RMSE comparison of EKF, UKF, CODEKF and PEKF-VB in bearing-only tracking. (a) RMSE
in position (b) RMSE in velocity.
Figure 7 shows the metric values versus the number of iterations for PEKF-VB and CODEKF.
The KLD curve is close to zero after the fifth iteration. The enlarged plots marked from the sixth to the
tenth iteration shows that PEKF-VB converge faster than CODEKF.
×10-5
1 3.5
KLD COD
3
0.8
0.3
×10-8 2.5
1
0.6 0.2
2
COD
KLD
0.5 1.5
0.4 0.1
1
0.2 0 6 7 8 9 10
6 7 8 9 10 0.5
0 0
1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10
Iteration Iteration
(a) (b)
Figure 7. Value of metrics versus the number of iterations for PEKF-VB and CODEKF. (a) KL divergence
in PEKF-VB, (b) COD in CODEKF.
Example 3: A Strongly nonlinear filtering problem. For further verifying our proposed method,
we run EKF, UKF, CODEDK and PEKF-VB on the benchmark nonlinear problem in [32–34]
x k −1 25xk−1
xk = + + 8cos(1.2k) + νk−1 , (40)
2 1 + xk2−1
xk 2
zk = + ωk , (41)
20
where νk−1 and ωk are zero mean Gaussian noise with variances Qk−1 and Rk , respectively. We let
Qk = 0.0001, Rk = 1 and scan period T = 1. The simulation results are given in Figure 8, from which
we can see that CODEKF and PEKF-VB have very similar results and outperform EKF and UKF. This is
because that local linearization adopted by EKF-based filter are not a sufficient description of the
nonlinear nature of this example [34], while the VB iteration can make use of measurement information
as much as possible.
Sensors 2018, 18, 4222 13 of 17
20 25
EKF UKF CODEKF PEKF-VB EKF
15 UKF
CODEKF
20
PEKF-VB
10
5
15
Position
RMSE
0
10
-5
-10
5
-15
Ground truth Measurement
-20 0
0 10 20 30 40 50 0 10 20 30 40 50
Scan Scan
(a) (b)
Figure 8. Comparison of EKF, UKF, CODEKF and PEKF-VB (a) Filtering results (b) RMSE.
6. Conclusions
We have developed a proximal iterative nonlinear filter, in which the expectation of the posterior
PDF is approximated by a parameterised variational distribution that is iteratively optimized in the
VB framework. A weighted KL divergence is adopted as a proximal term in the iteration to ensure
the convergence can be achieved with a tight bound. The simulation results show that the proposed
algorithm is better than several existing algorithms in terms of estimation accuracy at the cost of
increased computational burden.
Author Contributions: Y.H. performed this research and Q.P. is the advisor. Conceptualization, H.L.;
methodology, Y.M., X.W., H.L.; software, Y.H. and X.W.; writing-original draft preparation, Y.H.; writing-review
and editing, Z.W., B.M.; supervision, H.L., Z.W., Q.P.; funding acquisition, Q.P.
Funding: This work was supported by Excellent Chinese and Foreign Youth Exchange Programme of China
Association for Science and Technology (No. 2017CASTQNJL046) and National Natural Science Foundation of
China (No. 61790552, 61501378, 61503305, 61873211).
Conflicts of Interest: The authors declare no conflict of interest.
The parameters αk in Equation (18) and γki in Equation (19) are derived according to Equation (A2)
and Equation (A3), respectively.
h i
αik = −∇ x Eq log p zk | xki |k
(A2)
" !#
1 1 T
i − 1 i
= −∇ x Eq log p exp − zk − h xk|k Rk zk − h xk|k .
(2π )1/Dz | Rk | 2
h i
γki = −∇ P Eq log p zk | xki |k
(A3)
" !#
1 1 T
i − 1 i
= −∇ P Eq log p exp − zk − h xk|k Rk zk − h xk|k .
(2π )1/Dz | Rk | 2
Sensors 2018, 18, 4222 14 of 17
By Bonnet’s theorem [35], the gradient of the expectation of f (ξ ) under a Gaussian distribution
N (ξ |µ, C ) w.r.t. the mean µ is the expectation of the gradient of f (ξ )
∇µ EN (ξ |µ,C) [ f (ξ )] = EN (ξ |µ,C) ∇ξ f (ξ ) . (A4)
∂hk ( x )
where the Jacobian matrix Hki = |
∂xT x = xki |k
.
According to Price’s theorem [36], the gradient of the expectation of f (ξ ) under a Gaussian
distribution N (ξ |µ, C ) w.r.t. the covariance C is the expectation of the Hessian of f (ξ ):
1
∇c EN (ξ |µ,C) [ f (ξ )] = EN (ξ |µ,C) ∇ξ, ξ f (ξ ) . (A6)
2
Similarly,
1 T
−1
γki = − Eq ∇ x,x i
zk − h xk|k Rk zk − h xk|ki
2
1 T T
i −1 −1 i (A7)
= − Eq ∇ x Hk Rk + Rk zk − h xk|k
2
1 i T −1 i
= Hk Rk Hk .
2
Since both system noise and measurement noise are Gaussian, the likelihood function p (yk | xk ),
the PDF p ( xk ) and the PDF q ( xk |ψk ) below are Gaussian
1 1
p(zk | xk ) = p exp − (zk − h ( xk ))T Rk −1 (zk − h ( xk )) , (A9)
(2π ) Dz | Rk | 2
1 1 T −1
p( xk ) = p exp − ( xk − f k ( xk−1 )) Qk ( xk − f k ( xk−1 )) , (A10)
(2π ) Dx | Qk | 2
T
1 1
q ( xk |ψk ) = q exp − xk − xk|k Pk−|k1 xk − xk|k . (A11)
(2π ) Dx | P | 2
k|k
Assume that the state transition matrix and the measurement matrix are obtained by linearization,
p(zk | xk ) and p( xk ) can be written as,
1 1
p(zk | xk ) ≈ p exp − (zk − Hk xk )T Rk −1 (zk − Hk xk ) , (A12)
(2π ) Dz | Rk | 2
1 1 T −1
p( xk ) ≈ p exp − ( xk − Fk−1 xk−1 ) Qk ( xk − Fk−1 xk−1 ) . (A13)
(2π ) Dx | Qk | 2
∂ f (x) ∂h( x )
where Fk−1 = ∂x | x = xk−1|k−1 and Hk = ∂x | x = xk|k .
Sensors 2018, 18, 4222 15 of 17
Assume that the state estimation is unbiased; that is, Eq [ xk−1 ] = xk−1|k−1 , Equation (A10) can be
written approximately as
1
p ( xk ) = q
(2π ) Dx | Fk−1 Pk−1|k−1 FkT−1 + Qk−1 |
T −1
1
× exp − xk − Fk−1 xk−1|k−1 Fk−1 Pk−1|k−1 FkT−1 + Qk−1 xk − Fk−1 xk−1|k−1 (A14)
2
T
1 1
= q exp − xk − xk|k−1 Pk−|k1−1 xk − xk|k−1 .
(2π ) Dx | Pk|k−1 | 2
Since estimation is unbiased, the ground truth xk at time k can be expressed as,
where Eq [ x̃k|k ] = 0. The first term Eq [log p(zk | xk )] in Equation (A8) becomes
1
Eq [log p(zk | xk )] ≈ −log (2π ) Dz | Rk |
2
1 T
−1
− tr Rk Eq zk − Hk xk|k + x̃k|k zk − Hk xk|k + x̃k|k
2
(A16)
1
= − log (2π ) Dz | Rk |
2
1 T
− tr R− k
1
z k − H x
k k|k z k − H x
k k|k + H P H
k k|k k
T
.
2
1 1 T
Dx −1
Eq [log p( xk )] ≈ − log (2π ) | Pk|k−1 | − tr Pk|k−1 Eq xk − xk|k−1 xk − xk|k−1
2 2
1 1 T
Dx −1
= − log (2π ) | Pk|k−1 | − tr Pk|k−1 Eq xk|k + x̃k xk|k + x̃k
2 2
T
T T
− xk|k + x̃k xk|k−1 − xk|k−1 xk|k + x̃k + xk|k−1 xk|k−1
1 1
(A17)
=− log (2π ) Dx | Pk|k−1 | − tr Pk−|k1−1 Pk|k
2 2
1 −1 h i
− tr Pk|k−1 Eq xk|k xkT|k − xk|k xkT|k−1 − xk|k−1 xkT|k + xk|k−1 xkT|k−1
2
1 1
= − log (2π ) Dx | Pk|k−1 | − tr Pk−|k1−1 Pk|k
2 2
1 T
−1
− tr Pk|k−1 x k | k − x k | k −1 x k | k − x k | k −1 .
2
(2π ) Dz | Rk || Pk|k−1 | 1
T
1 −1
L (ψk ) = − log − tr Pk|k−1 xk|k − xk|k−1 xk|k − xk|k−1
2 | Pk|k | 2
T (A19)
1 1 Dx
− tr Pk−|k1−1 Pk|k − tr R− k
1
z k − Hk x k|k z k − Hk x k|k + Hk P H
k|k k
T
+ .
2 2 2
References
1. Barshalom, Y.; Li, X.R. Estimation and Tracking: Principles, Techniques, and Software; Artech House: Norhood,
MA, USA, 1996.
2. Van Trees, H.L. Detection, Estimation, and Modulation Theory Part 1; John Wiley &sons, Inc.: New York City,
NY, USA, 2003.
3. Farina, A.; Ristic, B.; Benvenuti, D. Tracking a ballistic target: Comparison of several nonlinear filters.
IEEE Trans. Aerosp. Electron. Syst. 2002, 38, 854–867.
4. Hu, J.; Wang, Z.; Gao, H.; Stergioulas, L.K. Extended Kalman filtering with stochastic nonlinearities and
multiple missing measurements. Automatica 2012, 48, 2007–2015.
5. Hu, X.; Bao, M.; Zhang, X.; Guan, L.; Hu, Y. Generalized iterated Kalman filter and its performance
evaluation. IEEE Trans. Signal Process. 2015, 63, 3204–3217.
6. Ángel F., G.F.; Svensson, L.; Morelande, M.R.; Särkkä, S. Posterior linearization filter: Principles and
implementation using sigma points. IEEE Trans. Signal Process. 2015, 63, 5561–5573.
7. Tronarp, F.; Garcia-Fernandez, F.; Särkkä, S. Iterative filtering and smoothing in non-linear and non-Gaussian
systems using conditional moments. IEEE Signal Process. Lett. 2018, 25, 408–412.
8. Khan, Z.; Balch, T.; Dellaert, F. MCMC-based particle filtering for tracking a variable number of interacting
targets. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1805–1819.
9. Merwe, R.V.D.; Doucet, A.; Freitas, N.D.; Wan, E. The unscented particle filter. In Proceedings of the
International Conference on Neural Information Processing Systems, USA, CO, Denver, 4–6 December 2000;
pp. 563–569.
10. Cappe, O.; Godsill, S.J.; Moulines, E. An overview of existing methods and recent advances in sequential
Monte Carlo. Proc. IEEE 2007, 95, 899–924.
11. Julier, S.; Uhlmann, J.; Durrantwhyte, H.F. A new method for nonlinear transformation of means and
covariances in filters and estimates. IEEE Trans. Autom. Control 2000, 45, 477–482.
12. Gustafsson, F.; Hendeby, G. Some relations between extended and unscented Kalman filters. IEEE Trans.
Signal Process. 2012, 60, 545–555.
13. Arasaratnam, I.; Haykin, S. Cubature Kalman filters. IEEE Trans. Autom. Control 2009, 54, 1254–1269.
14. Arasaratnam, I.; Haykin, S.; Hurd, T.R. Cubature Kalman filtering for continuous-discrete systems: Theory
and simulations. IEEE Trans. Signal Process. 2010, 58, 4977–4993.
15. Merwe, R.V.D.; Wan, E.A. Sigma-point Kalman filters for integrated navigation. In Proceedings of the 60th
Annual Meeting of the Institute of Navigation, Dayton, OH, USA, 7–9 June 2004; pp. 641–654.
16. Gultekin, S.; Paisley, J. Nonlinear Kalman filtering with divergence minimization. IEEE Trans. Signal Process.
2017, 65, 6319–6331.
17. Beal, M.J. Variational Algorithms for Approximate Bayesian Inference. Ph.D. Thesis, Cambridge University,
Cambridge, UK, 2003.
18. Khan, M.E.; Baqué, P.; Fleuret, F.; Fua, P. Kullback-Leibler proximal variational inference. In Proceedings
of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015;
pp. 3402–3410.
19. Blei, D.M.; Kucukelbir, A.; McAuliffe, J.D. Variational inference: A review for statisticians. J. Am. Stat. Assoc.
2017, 112, 1–32.
20. Bishop, C.M. Pattern Pecognition and Machine Learning; Springer: New York, NY, USA, 2006.
21. Salimans, T.; Kingma, D.; Welling, M. Markov chain Monte Carlo and variational inference: Bridging the
gap. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015;
pp. 1218–1226.
Sensors 2018, 18, 4222 17 of 17
22. Mnih, A.; Rezende, D. Variational inference for Monte Carlo objectives. In Proceedings of the International
Conference on Machine Learning, New York City, NY, USA, 19–24 June 2016; pp. 2188–2196.
23. Sain, R.; Mittal, V.; Gupta, V. A comprehensive review on recent advances in variational Bayesian inference.
In Proceedings of the International Conference on Advances in Computer Engineering and Applications,
India, Ghaziabad, 19–20 March 2015; pp. 488–492.
24. Tseng, P. An analysis of the EM algorithm and entropy-like proximal point methods. Math. Oper. Res. 2004,
29, 27–44.
25. Chrétien, S.; Hero, A.O. Kullback proximal algorithms for maximum-likelihood estimation. IEEE Trans.
Inf. Theory 2000, 46, 1800–1810.
26. Khan, M.E.; Babanezhad, R.; Lin, W.; Schmidt, M.; Sugiyama, M. Convergence of proximal-gradient
stochastic variational inference under non-decreasing step-size sequence. J. Comp. Neurol. 2015, 319, 359–386.
27. Petersen, K.B.; Pedersen, M.S. The Matrix Cookbook. Technical University of Denmark: Denmark,
Copenhagen, 2012.
28. Calvo, M.; Oller, J.M. A distance between multivariate normal distributions based in an embedding into the
Siegel group. J. Multivar. Anal. 1990, 35, 223–242.
29. Regli, J.B.; Silva, R. Alpha-Beta Divergence For Variational Inference. arXiv 2018, arXiv:1805.01045.
30. Arulampalam, M.S.; Ristic, B.; Gordon, N.; Mansell, T. Bearings-only tracking of manoeuvring targets using
particle filters. EURASIP J. Adv. Signal Process. 2004, 2004, 1–15.
31. Wang, X.; Morelande, M.; Moran, B. Target motion analysis using single sensor bearings-only
measurements. In Proceedings of the International Congress on Image and Signal Processing, Tianjin,
China, 17–19 October 2009; pp. 2094–2099.
32. Gordon, N.J.; Salmond, D.J.; Smith, A.F.M. Novel approach to nonlinear/non-Gaussian Bayesian state
estimation. IEE Proc. F-Radar Signal Process. 1993, 140, 107–113.
33. Gustafsson, F. Particle filter theory and practice with positioning applications. IEEE Trans. Aerosp. Electron.
Syst. Mag. 2010, 25, 53–82.
34. Arulampalam, S.; Maskell, S.; Gordon, N.; Tim, C. A tutorial on particle filters for online
nonlinear/non-Gaussian Bayesian tracking. IEEE Trans. Signal Process. 2002, 50, 174–188.
35. Rezende, D.J.; Mohamed, S.; Wierstra, D. Stochastic backpropagation and approximate inference in deep
generative models. In Proceedings of the Machine Learning Research, China, Beijing, 21–26 June 2014;
pp. 1278–1286.
36. Price, R. A useful theorem for nonlinear devices having Gaussian inputs. IEEE Trans. Inf. Theory 1958, 4,
69–72.
c 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).