Smoothing, Filtering and Prediction:
Estimating the Past, Present and Future
Second Edition
Garry A. Einicke
Prime Publishing – Amazon.com
Second Edition published February 2019
Copyright © 2019 Garry A. Einicke
All rights reserved. No part of this publication may be reproduced or used in any
manner without acknowledgment and attribution of the author.
ISBN 978-0-6485115-1-9 Kindle format
ISBN 978-0-6485115-0-2 Printable pdf format
G. A. Einicke, Smoothing, Filtering and Prediction: Estimating
the Past, Present and Future (2nd ed.), Prime Publishing, 2019
6. Continuous-time Smoothing

6.1 Introduction
The previously-described minimum-mean-square-error and minimum-variance
filtering solutions operate on measurements up to the current time. If some
processing delay can be tolerated then improved estimation performance can be
realised through the use of smoothers. There are three state-space smoothing
technique categories, namely, fixed-point, fixed-lag and fixed-interval smoothing.
Fixed-point smoothing refers to estimating some linear combination of states at a
previous instant in time. In the case of fixed-lag smoothing, a fixed time delay is
assumed between the measurement and on-line estimation processes. Fixed-interval smoothing is for retrospective data analysis, where measurements
recorded over an interval are used to obtain the improved estimates. Compared to
filtering, smoothing has a higher implementation cost, as it has increased memory
and calculation requirements.
A large number of smoothing solutions have been reported since Wiener’s and
Kalman’s development of the optimal filtering results – see the early surveys [1] –
[2]. The minimum-variance fixed-point and fixed-lag smoother solutions are well
known. Two fixed-interval smoother solutions, namely the maximum-likelihood
smoother developed by Rauch, Tung and Striebel [3], and the two-filter Fraser-Potter formula [4], have been in widespread use since the 1960s. However, the
minimum-variance fixed-interval smoother is not well known. This smoother is
simply a time-varying state-space generalisation of the optimal Wiener solution. It
differs from the Rauch-Tung-Striebel and Fraser-Potter solutions, which may not
sit well with more orthodox practitioners.
The main approaches for continuous-time fixed-point, fixed-lag and fixed-interval
smoothing are canvassed here. It is assumed throughout that the underlying noise
processes are zero mean and uncorrelated. Nonzero means and correlated
processes can be handled using the approaches of Chapters 3 and 4. It is also
assumed here that the noise statistics and state-space model parameters are known
precisely. Note that techniques for estimating parameters and accommodating
uncertainty are addressed subsequently.
Some prerequisite concepts, namely time-varying adjoint systems, backwards
differential equations, Riccati equation comparison and the continuous-time
maximum-likelihood method are covered in Section 6.2. Section 6.3 outlines a
derivation of the fixed-point smoother by Meditch [5]. The fixed-lag smoother reported by Sage et al. [6] and Moore [7] is the subject of Section 6.4. Section 6.5
deals with the Rauch-Tung-Striebel [3], Fraser-Potter [4] and minimum-variance
fixed-interval smoother solutions [8] - [10]. As before, the approach here is to
accompany the developments, where appropriate, with proofs about performance
being attained. Smoothing is not a panacea for all ills. If the measurement noise is
negligible then smoothing (and filtering) may be superfluous. Conversely, if
measurement noise obliterates the signals then data recovery may not be possible.
Therefore, estimator performance is often discussed in terms of the prevailing
signal-to-noise ratio.
6.2 Prerequisites
6.2.1 Time-varying Adjoint Systems
Since fixed-interval smoothers employ backward processes, it is pertinent to introduce the adjoint of a time-varying continuous-time system. Let $\mathcal{G}$ denote a linear time-varying system

$\dot{x}(t) = A(t)x(t) + B(t)w(t)$,   (1)

$y(t) = C(t)x(t) + D(t)w(t)$,   (2)

operating on the interval [0, T]. Let w denote the set of w(t) over all time t, that is, w = {w(t), t ∈ [0, T]}. Similarly, let y = $\mathcal{G}$w denote {y(t), t ∈ [0, T]}. The adjoint of $\mathcal{G}$, denoted by $\mathcal{G}^H$, is the unique linear system satisfying

$\langle y, \mathcal{G}w\rangle = \langle \mathcal{G}^H y, w\rangle$   (3)

for all $y \in \mathbb{R}^q$ and $w \in \mathbb{R}^p$.
Lemma 1: The adjoint $\mathcal{G}^H$ of the system $\mathcal{G}$ described by (1) – (2), with $x(t_0) = 0$, having the realisation

$\dot{\zeta}(t) = -A^T(t)\zeta(t) - C^T(t)u(t)$,   (4)

$z(t) = B^T(t)\zeta(t) + D^T(t)u(t)$,   (5)

with $\zeta(T) = 0$, satisfies (3).
The proof follows mutatis mutandis from that of Lemma 1 of Chapter 3 and is set
out in [11]. The original system (1) – (2) needs to be integrated forwards in time,
whereas the adjoint system (4) – (5) needs to be integrated backwards in time.
Some important properties of backward systems are discussed in the next section.
The simplification D(t) = 0 is assumed below unless stated otherwise.
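As a concrete check of Lemma 1, the following sketch discretises a scalar instance of (1) – (2) with D = 0 and its adjoint (4) – (5) by Euler integration, and verifies the inner-product identity (3) numerically. The matrices, step size and horizon below are illustrative assumptions rather than values from the text.

```python
import numpy as np

# Hedged sketch: Euler discretisation of a scalar system G (1)-(2) with D = 0
# and of its adjoint G^H (4)-(5), followed by a numerical check of (3).
np.random.seed(0)
dt, N = 1e-3, 4000                                  # assumed step size and horizon
A, B, C = np.array([[-1.0]]), np.array([[1.0]]), np.array([[1.0]])

def G(w):
    """y = G w: integrate x' = A x + B w forwards from x(0) = 0."""
    x, y = np.zeros(1), np.zeros(N)
    for k in range(N):
        y[k] = (C @ x)[0]
        x = x + dt * (A @ x + B[:, 0] * w[k])
    return y

def GH(u):
    """z = G^H u: integrate -zeta' = A^T zeta + C^T u backwards from zeta(T) = 0."""
    zeta, z = np.zeros(1), np.zeros(N)
    for k in reversed(range(N)):
        zeta = zeta + dt * (A.T @ zeta + C.T[:, 0] * u[k])
        z[k] = (B.T @ zeta)[0]
    return z

w, u = np.random.randn(N), np.random.randn(N)
print(dt * np.dot(u, G(w)))    # <u, G w>
print(dt * np.dot(GH(u), w))   # <G^H u, w>: agrees to discretisation accuracy
```

The two printed inner products agree to within the Euler discretisation error, which illustrates why the adjoint must be integrated backwards from the terminal condition ζ(T) = 0.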
6.2.2 Backwards Differential Equations
The adjoint state evolution (4) is rewritten as

$-\dot{\zeta}(t) = A^T(t)\zeta(t) + C^T(t)u(t)$.   (6)

The negative sign of the derivative within (6) indicates that this differential equation proceeds backwards in time. The corresponding state transition matrix is defined below.

Lemma 2: The differential equation (6) has the solution

$\zeta(t) = \Phi^H(t, t_0)\zeta(t_0) + \int_{t_0}^{t}\Phi^H(s, t)C^T(s)u(s)\,ds$,   (7)

where the adjoint state transition matrix, $\Phi^H(t, t_0)$, satisfies

$-\frac{d\Phi^H(t, t_0)}{dt} = A^T(t)\Phi^H(t, t_0)$,   (8)

with boundary condition

$\Phi^H(t, t) = I$.   (9)

Proof: Following the proof of Lemma 1 of Chapter 3, by differentiating (7) and substituting (4) – (5), it is easily verified that (7) is a solution of (6).
The Lyapunov equation corresponding to (6) is described next because it is required in the development of backwards Riccati equations.

Lemma 3: In respect of the backwards differential equation (6), assume that u(t) is a zero-mean white process with $E\{u(t)u^T(\tau)\} = U(t)\delta(t - \tau)$ that is uncorrelated with $\zeta(t_0)$, namely, $E\{u(t)\zeta^T(t_0)\} = 0$. Then the covariances $P(t, \tau) = E\{\zeta(t)\zeta^T(\tau)\}$ and $\dot{P}(t, \tau) = \frac{dE\{\zeta(t)\zeta^T(\tau)\}}{dt}$ satisfy the Lyapunov differential equation

$-\dot{P}(t, \tau) = A^T(t)P(t, \tau) + P(t, \tau)A(t) + C^T(t)U(t)C(t)$.   (10)

Proof: The backwards Lyapunov differential equation (10) can be obtained by using (6) and (7) within $\frac{dE\{\zeta(t)\zeta^T(\tau)\}}{dt} = E\{\dot{\zeta}(t)\zeta^T(\tau) + \zeta(t)\dot{\zeta}^T(\tau)\}$ (see the proof of Lemma 2 in Chapter 3).
6.2.3 Comparison of Riccati Equations

The following Riccati equation comparison theorem is required subsequently to compare the performance of filters and smoothers.

Theorem 1 (Riccati Equation Comparison Theorem) [12], [8]: Let $P_1(t) \ge 0$ and $P_2(t) \ge 0$ denote solutions of the Riccati differential equations

$\dot{P}_1(t) = A_1(t)P_1(t) + P_1(t)A_1^T(t) - P_1(t)S_1(t)P_1(t) + B_1(t)Q_1(t)B_1^T(t)$   (11)

and

$\dot{P}_2(t) = A_2(t)P_2(t) + P_2(t)A_2^T(t) - P_2(t)S_2(t)P_2(t) + B_2(t)Q_2(t)B_2^T(t)$,   (12)

with $S_1(t) = C_1^T(t)R_1^{-1}(t)C_1(t)$, $S_2(t) = C_2^T(t)R_2^{-1}(t)C_2(t)$, where $A_1(t)$, $B_1(t)$, $C_1(t)$, $Q_1(t) \ge 0$, $R_1(t) \ge 0$, $A_2(t)$, $B_2(t)$, $C_2(t)$, $Q_2(t) \ge 0$ and $R_2(t) \ge 0$ are of appropriate dimensions. If

(i) $P_1(t_0) \ge P_2(t_0)$ for a $t_0 \ge 0$; and

(ii) $\begin{bmatrix} B_1(t)Q_1(t)B_1^T(t) & A_1(t) \\ A_1^T(t) & -S_1(t) \end{bmatrix} \ge \begin{bmatrix} B_2(t)Q_2(t)B_2^T(t) & A_2(t) \\ A_2^T(t) & -S_2(t) \end{bmatrix}$ for all $t \ge t_0$,

then

$P_1(t) \ge P_2(t)$   (13)

for all $t \ge t_0$.

Proof: Condition (i) of the theorem is the initial step of an induction argument. For the induction step, denote $\dot{P}_3(t) = \dot{P}_1(t) - \dot{P}_2(t)$, $P_3(t) = P_1(t) - P_2(t)$ and $\bar{A}^T(t) = A_1^T(t) - S_1(t)P_2(t) - 0.5S_1(t)P_3(t)$. Then

$\dot{P}_3(t) = \bar{A}(t)P_3(t) + P_3(t)\bar{A}^T(t) + \begin{bmatrix} I & P_2(t) \end{bmatrix}\left(\begin{bmatrix} B_1(t)Q_1(t)B_1^T(t) & A_1(t) \\ A_1^T(t) & -S_1(t) \end{bmatrix} - \begin{bmatrix} B_2(t)Q_2(t)B_2^T(t) & A_2(t) \\ A_2^T(t) & -S_2(t) \end{bmatrix}\right)\begin{bmatrix} I \\ P_2(t) \end{bmatrix}$,

which together with condition (ii) yields

$\dot{P}_3(t) \ge \bar{A}(t)P_3(t) + P_3(t)\bar{A}^T(t)$.   (14)

Lemma 5 of Chapter 3 and (14) imply $P_3(t) \ge 0$ and the claim (13) follows.
6.2.4 The Maximum-Likelihood Method

Rauch, Tung and Striebel famously derived their fixed-interval smoother [3] using a maximum-likelihood technique which is outlined as follows. Let $x(t) \sim \mathcal{N}(\mu, R_{xx})$ denote a continuous random variable having a Gaussian (or normal) distribution with mean $E\{x(t)\} = \mu$ and covariance $E\{(x(t) - \mu)(x(t) - \mu)^T\} = R_{xx}$. The continuous-time Gaussian probability density function of $x(t) \in \mathbb{R}^n$ is defined by

$p(x(t)) = \frac{1}{(2\pi)^{n/2}|R_{xx}|^{1/2}}\exp\left\{-0.5(x(t) - \mu)^T R_{xx}^{-1}(x(t) - \mu)\right\}$,   (15)

in which $|R_{xx}|$ denotes the determinant of $R_{xx}$. The probability that the continuous random variable x(t) with a given probability density function p(x(t)) lies within an interval [a, b] is given by the likelihood function (which is also known as the cumulative distribution function)

$P(a \le x(t) \le b) = \int_a^b p(x(t))\,dx$.   (16)

The Gaussian likelihood function for x(t) is calculated from (15) and (16) as

$f(x(t)) = \int \frac{1}{(2\pi)^{n/2}|R_{xx}|^{1/2}}\exp\left\{-0.5(x(t) - \mu)^T R_{xx}^{-1}(x(t) - \mu)\right\}dx$.   (17)

It is often more convenient to work with the log-probability density function

$\log p(x(t)) = -\log\left\{(2\pi)^{n/2}|R_{xx}|^{1/2}\right\} - 0.5(x(t) - \mu)^T R_{xx}^{-1}(x(t) - \mu)$   (18)

and the log-likelihood function

$\log f(x(t)) = -\log\left\{(2\pi)^{n/2}|R_{xx}|^{1/2}\right\} - 0.5\int(x(t) - \mu)^T R_{xx}^{-1}(x(t) - \mu)\,dx$.   (19)

Suppose that a given record of x(t) is assumed to belong to a Gaussian distribution that is a function of an unknown quantity θ. A statistical approach for estimating the unknown θ is the method of maximum likelihood. This typically involves finding an estimate $\hat{\theta}$ that either maximises the log-probability density function

$\hat{\theta} = \arg\max_{\theta}\,\log p(\theta\,|\,x(t))$   (20)

or maximises the log-likelihood function

$\hat{\theta} = \arg\max_{\theta}\,\log f(\theta\,|\,x(t))$.   (21)

So-called maximum-likelihood estimates can be found by setting either $\frac{\partial\log p(\theta\,|\,x(t))}{\partial\theta}$ or $\frac{\partial\log f(\theta\,|\,x(t))}{\partial\theta}$ to zero and solving for the unknown θ. Continuous-time maximum-likelihood estimation is illustrated by the two examples that follow.
Example 1. Consider the first-order autoregressive system

$\dot{x}(t) = a_0 x(t) + w(t)$,   (22)

where $\dot{x}(t) = \frac{dx(t)}{dt}$, w(t) is a zero-mean Gaussian process and $a_0$ is unknown. It follows from (22) that $\dot{x}(t) \sim \mathcal{N}(a_0 x(t), \sigma_w^2)$, namely,

$f(\dot{x}(t)) = \frac{1}{(2\pi)^{n/2}\sigma_w}\int_0^T \exp\left\{-0.5(\dot{x}(t) - a_0 x(t))^2\sigma_w^{-2}\right\}dt$.   (23)

Taking the logarithm of both sides gives

$\log f(\dot{x}(t)) = -\log\left\{(2\pi)^{n/2}\sigma_w\right\} - 0.5\,\sigma_w^{-2}\int_0^T(\dot{x}(t) - a_0 x(t))^2\,dt$.   (24)

Setting $\frac{\partial\log f(\dot{x}(t))}{\partial a_0} = 0$ results in $\int_0^T(\dot{x}(t) - a_0 x(t))x(t)\,dt = 0$ and hence

$\hat{a}_0 = \left(\int_0^T x^2(t)\,dt\right)^{-1}\int_0^T x(t)\dot{x}(t)\,dt$.   (25)
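The estimate (25) is easy to evaluate numerically. The sketch below simulates (22) with an Euler-Maruyama step, approximates the derivative by a first-order difference and the integrals by sums; the value of $a_0$, the step size and the record length are illustrative assumptions.

```python
import numpy as np

# Hedged sketch of the maximum-likelihood estimate (25) from sampled data.
np.random.seed(1)
a0, dt, N = -0.5, 1e-3, 200_000                 # assumed parameter and sampling
x = np.zeros(N)
for k in range(N - 1):                          # Euler-Maruyama simulation of (22)
    x[k + 1] = x[k] + dt * a0 * x[k] + np.sqrt(dt) * np.random.randn()

xdot = np.diff(x) / dt                          # first-difference approximation
a0_hat = np.sum(x[:-1] * xdot) / np.sum(x[:-1] ** 2)   # equation (25)
print(a0_hat)                                   # approaches a0 as N grows
```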
Example 2. Consider the third-order autoregressive system

$\dddot{x}(t) = a_2\ddot{x}(t) + a_1\dot{x}(t) + a_0 x(t) + w(t)$,   (26)

where $\dddot{x}(t) = \frac{d^3x(t)}{dt^3}$ and $\ddot{x}(t) = \frac{d^2x(t)}{dt^2}$. The above system can be written in a controllable canonical form as

$\begin{bmatrix} \dot{x}_1(t) \\ \dot{x}_2(t) \\ \dot{x}_3(t) \end{bmatrix} = \begin{bmatrix} a_2 & a_1 & a_0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}\begin{bmatrix} x_1(t) \\ x_2(t) \\ x_3(t) \end{bmatrix} + \begin{bmatrix} w(t) \\ 0 \\ 0 \end{bmatrix}$.   (27)

Assuming $\dot{x}_1(t) \sim \mathcal{N}(a_2 x_1(t) + a_1 x_2(t) + a_0 x_3(t), \sigma_w^2)$, taking logarithms, setting to zero the partial derivatives with respect to the unknown coefficients, and rearranging yields

$\begin{bmatrix} \hat{a}_0 \\ \hat{a}_1 \\ \hat{a}_2 \end{bmatrix} = \begin{bmatrix} \int_0^T x_3^2\,dt & \int_0^T x_2x_3\,dt & \int_0^T x_1x_3\,dt \\ \int_0^T x_2x_3\,dt & \int_0^T x_2^2\,dt & \int_0^T x_2x_1\,dt \\ \int_0^T x_1x_3\,dt & \int_0^T x_2x_1\,dt & \int_0^T x_1^2\,dt \end{bmatrix}^{-1}\begin{bmatrix} \int_0^T \dot{x}_1x_3\,dt \\ \int_0^T \dot{x}_1x_2\,dt \\ \int_0^T \dot{x}_1x_1\,dt \end{bmatrix}$,   (28)

in which state time dependence is omitted for brevity.
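For sampled data, the integrals in (28) become sums and the coefficient estimates follow from a 3 × 3 linear solve. The sketch below assumes the canonical states $x_1$, $x_2$, $x_3$ and the derivative $\dot{x}_1$ are available as equal-length arrays; the function name and its inputs are hypothetical.

```python
import numpy as np

# Hedged sketch of the normal equations (28); x1, x2, x3 and x1dot are
# assumed to be sampled arrays of equal length with period dt.
def estimate_a(x1, x2, x3, x1dot, dt):
    def integ(u, v):
        return dt * np.sum(u * v)               # quadrature for the time integrals
    M = np.array([[integ(x3, x3), integ(x2, x3), integ(x1, x3)],
                  [integ(x2, x3), integ(x2, x2), integ(x2, x1)],
                  [integ(x1, x3), integ(x2, x1), integ(x1, x1)]])
    b = np.array([integ(x1dot, x3), integ(x1dot, x2), integ(x1dot, x1)])
    return np.linalg.solve(M, b)                # [a0_hat, a1_hat, a2_hat]
```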
6.3 Fixed-Point Smoothing
6.3.1 Problem Definition
In continuous-time fixed-point smoothing, it is desired to calculate state estimates
at one particular time of interest, τ, 0 ≤ τ ≤ t, from measurements z(t) over the
interval t ∈ [0, T]. For example, suppose that a continuous measurement stream of
a tennis ball’s trajectory is available and it is desired to determine whether it
bounced within the court boundary. In this case, a fixed-point smoother could be
employed to estimate the ball position at the time of the bounce from the past and
future measurements.
A solution for the continuous-time fixed-point smoothing problem can be
developed from first principles, for example, see [5] - [6]. However, it is
recognised in [13] that a simpler solution derivation follows by transforming the
smoothing problem into a filtering problem that possesses an augmented state.
Following the nomenclature of [14], consider an augmented state vector having two components, namely, $x^{(a)}(t) = \begin{bmatrix} x(t) \\ \varsigma(t) \end{bmatrix}$. The first component, $x(t) \in \mathbb{R}^n$, is the state of the system $\dot{x}(t) = A(t)x(t) + B(t)w(t)$ and $y(t) = C(t)x(t)$. The second component, $\varsigma(t) \in \mathbb{R}^n$, equals x(t) at time t = τ, that is, $\varsigma(t) = x(\tau)$. The corresponding signal model may be written as

$\dot{x}^{(a)}(t) = A^{(a)}(t)x^{(a)}(t) + B^{(a)}(t)w(t)$,   (29)

$z(t) = C^{(a)}(t)x^{(a)}(t) + v(t)$,   (30)

where $A^{(a)}(t) = \begin{bmatrix} A(t) & 0 \\ \delta_{t-\tau}A(t) & 0 \end{bmatrix}$, $B^{(a)}(t) = \begin{bmatrix} B(t) \\ \delta_{t-\tau}B(t) \end{bmatrix}$ and $C^{(a)}(t) = \begin{bmatrix} C(t) & 0 \end{bmatrix}$, in which

$\delta_{t-\tau} = \begin{cases} 1 & \text{if } t \le \tau \\ 0 & \text{if } t > \tau \end{cases}$

is the Kronecker delta function. Note that the simplifications $A^{(a)}(t) = \begin{bmatrix} A(t) & 0 \\ 0 & 0 \end{bmatrix}$ and $B^{(a)}(t) = \begin{bmatrix} B(t) \\ 0 \end{bmatrix}$ arise for t > τ. The smoothing objective is to produce an estimate $\hat{\varsigma}(t)$ of $\varsigma(t)$ from the measurements z(t) over t ∈ [0, T].
6.3.2 Solution Derivation

Employing the Kalman-Bucy filter recursions for the system (29) – (30) results in

$\dot{\hat{x}}^{(a)}(t) = A^{(a)}(t)\hat{x}^{(a)}(t) + K^{(a)}(t)\left(z(t) - C^{(a)}(t)\hat{x}^{(a)}(t)\right) = \left(A^{(a)}(t) - K^{(a)}(t)C^{(a)}(t)\right)\hat{x}^{(a)}(t) + K^{(a)}(t)z(t)$,   (31)

where

$K^{(a)}(t) = P^{(a)}(t)(C^{(a)})^T(t)R^{-1}(t)$,   (32)

in which $P^{(a)}(t) \in \mathbb{R}^{2n\times 2n}$ is to be found. Consider the partitioning $K^{(a)}(t) = \begin{bmatrix} K(t) \\ K_{\varsigma}(t) \end{bmatrix}$, then for t > τ, (31) may be written as

$\begin{bmatrix} \dot{\hat{x}}(t\,|\,t) \\ \dot{\hat{\varsigma}}(t) \end{bmatrix} = \begin{bmatrix} A(t) - K(t)C(t) & 0 \\ -K_{\varsigma}(t)C(t) & 0 \end{bmatrix}\begin{bmatrix} \hat{x}(t\,|\,t) \\ \hat{\varsigma}(t) \end{bmatrix} + \begin{bmatrix} K(t) \\ K_{\varsigma}(t) \end{bmatrix}z(t)$.   (33)
Define the augmented error state as $\tilde{x}^{(a)}(t) = x^{(a)}(t) - \hat{x}^{(a)}(t)$, that is,

$\begin{bmatrix} \tilde{x}(t\,|\,t) \\ \tilde{\varsigma}(t) \end{bmatrix} = \begin{bmatrix} x(t) - \hat{x}(t\,|\,t) \\ x(\tau) - \hat{\varsigma}(t) \end{bmatrix}$.   (34)

Differentiating (34) and using $z(t) = C(t)x(t) + v(t)$ gives

$\begin{bmatrix} \dot{\tilde{x}}(t\,|\,t) \\ \dot{\tilde{\varsigma}}(t) \end{bmatrix} = \begin{bmatrix} A(t) - K(t)C(t) & 0 \\ -K_{\varsigma}(t)C(t) & 0 \end{bmatrix}\begin{bmatrix} \tilde{x}(t\,|\,t) \\ \tilde{\varsigma}(t) \end{bmatrix} + \begin{bmatrix} B(t) & -K(t) \\ 0 & -K_{\varsigma}(t) \end{bmatrix}\begin{bmatrix} w(t) \\ v(t) \end{bmatrix}$.   (35)

Denote $P^{(a)}(t) = \begin{bmatrix} P(t) & \Sigma^T(t) \\ \Sigma(t) & \Omega(t) \end{bmatrix}$, where $P(t) = E\{[x(t) - \hat{x}(t\,|\,t)][x(t) - \hat{x}(t\,|\,t)]^T\}$, $\Omega(t) = E\{[\varsigma(t) - \hat{\varsigma}(t)][\varsigma(t) - \hat{\varsigma}(t)]^T\}$ and $\Sigma(t) = E\{[\varsigma(t) - \hat{\varsigma}(t)][x(t) - \hat{x}(t\,|\,t)]^T\}$. Applying Lemma 2 of Chapter 3 to (35) yields the Lyapunov differential equation

$\begin{bmatrix} \dot{P}(t) & \dot{\Sigma}^T(t) \\ \dot{\Sigma}(t) & \dot{\Omega}(t) \end{bmatrix} = \begin{bmatrix} A(t) - K(t)C(t) & 0 \\ -K_{\varsigma}(t)C(t) & 0 \end{bmatrix}\begin{bmatrix} P(t) & \Sigma^T(t) \\ \Sigma(t) & \Omega(t) \end{bmatrix} + \begin{bmatrix} P(t) & \Sigma^T(t) \\ \Sigma(t) & \Omega(t) \end{bmatrix}\begin{bmatrix} A^T(t) - C^T(t)K^T(t) & -C^T(t)K_{\varsigma}^T(t) \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} B(t) & -K(t) \\ 0 & -K_{\varsigma}(t) \end{bmatrix}\begin{bmatrix} Q(t) & 0 \\ 0 & R(t) \end{bmatrix}\begin{bmatrix} B^T(t) & 0 \\ -K^T(t) & -K_{\varsigma}^T(t) \end{bmatrix}$.

Simplifying the above differential equation yields

$\dot{P}(t) = A(t)P(t) + P(t)A^T(t) - P(t)C^T(t)R^{-1}(t)C(t)P(t) + B(t)Q(t)B^T(t)$,   (36)

$\dot{\Sigma}(t) = \Sigma(t)\left(A^T(t) - C^T(t)K^T(t)\right)$,   (37)

$\dot{\Omega}(t) = -\Sigma(t)C^T(t)R^{-1}(t)C(t)\Sigma^T(t)$.   (38)

Equations (37) – (38) can be initialised with

$\Sigma(\tau) = \Omega(\tau) = P(\tau)$.   (39)

Thus, the fixed-point smoother estimate is given by

$\dot{\hat{\varsigma}}(t) = \Sigma(t)C^T(t)R^{-1}(t)\left(z(t) - C(t)\hat{x}(t\,|\,t)\right)$,   (40)

which is initialised with $\hat{\varsigma}(\tau) = \hat{x}(\tau\,|\,\tau)$. Alternative derivations of (40) are presented in [5], [8], [15]. The smoother (40) and its associated error covariances (36) – (38) are also discussed in [16], [17].
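A scalar Euler-integration sketch of the recursions (36) – (40) is given below. The plant parameters, step size and point of interest τ are illustrative assumptions; the filter and smoother quantities are propagated jointly as each measurement arrives.

```python
import numpy as np

# Hedged sketch of the fixed-point smoother (36)-(40), scalar time-invariant case.
np.random.seed(2)
A, B, C, Q, R = -1.0, 1.0, 1.0, 1.0, 1.0        # assumed model parameters
dt, N, k_tau = 1e-3, 20_000, 5_000              # k_tau indexes the fixed point tau

x, xh, P = 0.0, 0.0, 1.0
Sigma = Omega = vs_hat = None
for k in range(N):
    x += dt * A * x + B * np.sqrt(dt) * np.random.randn()    # plant
    z = C * x + np.sqrt(R / dt) * np.random.randn()          # measurement
    K = P * C / R                                            # filter gain
    innovation = z - C * xh
    if k == k_tau:
        Sigma, Omega, vs_hat = P, P, xh                      # initialisation (39)
    elif k > k_tau:
        vs_hat += dt * Sigma * C / R * innovation            # smoother update (40)
        Omega += dt * (-Sigma * C / R * C * Sigma)           # error covariance (38)
        Sigma += dt * (Sigma * (A - K * C))                  # cross term (37)
    xh += dt * (A * xh + K * innovation)                     # Kalman-Bucy filter
    P += dt * (A * P + P * A - P * C / R * C * P + B * Q * B)  # Riccati (36)
print(vs_hat, Omega)     # smoothed estimate of x(tau) and its error variance
```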
6.3.3 Performance

It can be seen that the right-hand side of the smoother error covariance (38) is non-positive and therefore Ω(t) must be monotonically decreasing. That is, the smoothed estimates improve with time. However, since the right-hand side of (36) varies inversely with R(t), the improvement reduces with decreasing signal-to-noise ratio. It is shown below that the fixed-point smoother improves on the performance of the minimum-variance filter.
Lemma 4: In respect of the fixed-point smoother (40),

$\Omega(t) \le P(t)$.   (41)

Proof: The initialisation (39) accords with condition (i) of Theorem 1. Condition (ii) of the theorem is satisfied since

$\begin{bmatrix} B(t)Q(t)B^T(t) & A(t) \\ A^T(t) & -C^T(t)R^{-1}(t)C(t) \end{bmatrix} \ge \begin{bmatrix} 0 & 0 \\ 0 & -C^T(t)R^{-1}(t)C(t) \end{bmatrix}$

and hence the claim (41) follows.
6.4 Fixed-Lag Smoothing

6.4.1 Problem Definition
For continuous-time estimation problems, as usual, it is assumed that the observations are modelled by $\dot{x}(t) = A(t)x(t) + B(t)w(t)$, $z(t) = C(t)x(t) + v(t)$, with $E\{w(t)w^T(\tau)\} = Q(t)\delta(t - \tau)$ and $E\{v(t)v^T(\tau)\} = R(t)\delta(t - \tau)$. In fixed-lag smoothing, it is desired to calculate state estimates at a fixed time lag behind the current measurements. That is, smoothed state estimates, $\hat{x}(t\,|\,t+\tau)$, are desired at time t, given data at time t + τ, where τ is a prescribed lag. In particular, fixed-lag smoother estimates are sought which minimise $E\{[x(t) - \hat{x}(t\,|\,t+\tau)][x(t) - \hat{x}(t\,|\,t+\tau)]^T\}$. It is found in [18] that the smoother yields practically all the improvement over the minimum-variance filter when the smoothing lag equals several time constants associated with the minimum-variance filter for the problem.
6.4.2 Solution Derivation

Previously, augmented signal models together with the application of the standard Kalman filter recursions were used to obtain the smoother results. However, as noted in [19], it is difficult to derive the optimal continuous-time fixed-lag smoother in this way because an ideal delay operator cannot easily be included within an asymptotically stable state-space system. Consequently, an alternate derivation based on that in [6] is outlined in the following. Recall that the gain of the minimum-variance filter is calculated as $K(t) = P(t)C^T(t)R^{-1}(t)$, where P(t) is the solution of the Riccati equation (36). Let $\Phi(\tau, t)$ denote the transition matrix of the filter error system $\dot{\tilde{x}}(t\,|\,t) = (A(t) - K(t)C(t))\tilde{x}(t\,|\,t) + B(t)w(t) - K(t)v(t)$, that is,

$\dot{\Phi}(\tau, s) = \left(A(\tau) - K(\tau)C(\tau)\right)\Phi(\tau, s)$   (42)

and $\Phi(s, s) = I$. It is assumed in [6], [17], [18], [20] that a smoothed estimate $\hat{x}(t\,|\,t+\tau)$ of x(t) is obtained as

$\hat{x}(t\,|\,t+\tau) = \hat{x}(t\,|\,t) + P(t)\xi(t, t+\tau)$,   (43)

where

$\xi(t, t+\tau) = \int_t^{t+\tau}\Phi^T(\lambda, t)C^T(\lambda)R^{-1}(\lambda)\left(z(\lambda) - C(\lambda)\hat{x}(\lambda\,|\,\lambda)\right)d\lambda$.   (44)

The formula (43) appears in the development of fixed-interval smoothers [21] – [22], in which case ξ(t) is often called an adjoint variable. From the use of Leibniz' rule, that is,

$\frac{d}{dt}\int_{a(t)}^{b(t)}f(t, s)\,ds = f(t, b(t))\frac{db(t)}{dt} - f(t, a(t))\frac{da(t)}{dt} + \int_{a(t)}^{b(t)}\frac{\partial f(t, s)}{\partial t}\,ds$,

it can be found that

$\dot{\xi}(t, t+\tau) = \Phi^T(t+\tau, t)C^T(t+\tau)R^{-1}(t+\tau)\left(z(t+\tau) - C(t+\tau)\hat{x}(t+\tau\,|\,t+\tau)\right) - C^T(t)R^{-1}(t)\left(z(t) - C(t)\hat{x}(t\,|\,t)\right) - \left(A(t) - K(t)C(t)\right)^T\xi(t, t+\tau)$.   (45)

Differentiating (43) with respect to t gives

$\dot{\hat{x}}(t\,|\,t+\tau) = \dot{\hat{x}}(t\,|\,t) + \dot{P}(t)\xi(t, t+\tau) + P(t)\dot{\xi}(t, t+\tau)$.   (46)

Substituting $\xi(t, t+\tau) = P^{-1}(t)\left(\hat{x}(t\,|\,t+\tau) - \hat{x}(t\,|\,t)\right)$ and expressions for $\dot{\hat{x}}(t\,|\,t)$, $\dot{P}(t)$, $\dot{\xi}(t, t+\tau)$ into (46) yields the fixed-lag smoother differential equation

$\dot{\hat{x}}(t\,|\,t+\tau) = A(t)\hat{x}(t\,|\,t+\tau) + B(t)Q(t)B^T(t)P^{-1}(t)\left(\hat{x}(t\,|\,t+\tau) - \hat{x}(t\,|\,t)\right) + P(t)\Phi^T(t+\tau, t)C^T(t+\tau)R^{-1}(t+\tau)\left(z(t+\tau) - C(t+\tau)\hat{x}(t+\tau\,|\,t+\tau)\right)$.   (47)
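The quantities in (43) – (44) can also be approximated offline once the filter outputs are stored: the transition matrix of (42) is accumulated over the lag window and the integral (44) is replaced by a sum. The scalar sketch below is an assumption-laden rendering for illustration, not an optimised implementation.

```python
import numpy as np

# Hedged sketch of the fixed-lag estimate via (43)-(44), scalar case.
def fixed_lag(z, A, B, C, Q, R, dt, lag):
    N = len(z)
    xh, Pa, innov, AKC = np.zeros(N), np.zeros(N), np.zeros(N), np.zeros(N)
    P = 1.0
    for k in range(N):                                   # forward filter pass
        Pa[k] = P
        K = P * C / R
        innov[k] = z[k] - C * xh[k]
        AKC[k] = A - K * C                               # dynamics of (42)
        if k < N - 1:
            xh[k + 1] = xh[k] + dt * (A * xh[k] + K * innov[k])
        P += dt * (A * P + P * A - P * C / R * C * P + B * Q * B)
    xs = xh.copy()
    for k in range(N - lag):                             # smoothing pass
        Phi, xi = 1.0, 0.0
        for j in range(k, k + lag):                      # quadrature for (44)
            xi += dt * Phi * C / R * innov[j]
            Phi += dt * AKC[j] * Phi                     # transition matrix (42)
        xs[k] = xh[k] + Pa[k] * xi                       # equation (43)
    return xs
```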
6.4.3 Performance

Lemma 5 [18]:

$P(t) - E\{[x(t) - \hat{x}(t\,|\,t+\tau)][x(t) - \hat{x}(t\,|\,t+\tau)]^T\} > 0$.   (48)

Proof: It is argued from the references of [18] for the fixed-lag smoothed estimate that

$E\{[x(t) - \hat{x}(t\,|\,t+\tau)][x(t) - \hat{x}(t\,|\,t+\tau)]^T\} = P(t) - P(t)\int_t^{t+\tau}\Phi^T(s, t)C^T(s)R^{-1}(s)C(s)\Phi(s, t)\,ds\,P(t)$.   (49)

Thus, (48) follows by inspection of (49). That is to say, the minimum-variance filter error covariance is greater than the fixed-lag smoother error covariance. It is also argued in [18] that (48) implies the error covariance decreases monotonically with the smoother lag τ.
6.5 Fixed-Interval Smoothing

6.5.1 Problem Definition
Many data analyses occur off-line. In medical diagnosis for example, reviews of
ultra-sound or CAT scan images take place some time after the measurements are made. In
principle, smoothing could be employed instead of filtering for improving the
quality of an image sequence.
Fixed-lag smoothers are elegant – they can provide a small performance
improvement over filters at moderate increase in implementation cost. The best
performance arises when the lag is sufficiently large, at the expense of increased
complexity. Thus, the designer needs to trade off performance, calculation cost
and delay.
Fixed-interval smoothers are a brute-force solution for estimation problems. They
provide improved performance without having to fine tune a smoothing lag, at the
cost of approximately twice the filter calculation complexity. Fixed-interval
smoothers involve two passes. Typically, a forward process operates on the
measurements. Then a backward system operates on the results of the forward
process.
The plants are again assumed to have state-space realisations of the form $\dot{x}(t) = A(t)x(t) + B(t)w(t)$ and $y(t) = C(t)x(t) + D(t)w(t)$. Smoothers are considered which operate on measurements $z(t) = y(t) + v(t)$ over a fixed interval t ∈ [0, T]. The performance criteria depend on the quantity being estimated, viz.:

- in input estimation, the objective is to calculate a $\hat{w}(t\,|\,T)$ that minimises $E\{[w(t) - \hat{w}(t\,|\,T)][w(t) - \hat{w}(t\,|\,T)]^T\}$;

- in state estimation, $\hat{x}(t\,|\,T)$ is calculated which achieves the minimum $E\{[x(t) - \hat{x}(t\,|\,T)][x(t) - \hat{x}(t\,|\,T)]^T\}$; and

- in output estimation, $\hat{y}(t\,|\,T)$ is produced such that $E\{[y(t) - \hat{y}(t\,|\,T)][y(t) - \hat{y}(t\,|\,T)]^T\}$ is minimised.
This section focuses on three continuous-time fixed-interval smoother formulations: the maximum-likelihood smoother derived by Rauch, Tung and Striebel [3], the Fraser-Potter smoother [4] and a generalisation of Wiener's optimal unrealisable solution [8] – [10]. Some additional historical background to [3] – [4] is described within [1], [2], [17].
6.5.2 The Maximum Likelihood Smoother
6.5.2.1 Solution Derivation
Rauch, Tung and Striebel [3] employed the maximum-likelihood method to develop a discrete-time smoother for state estimation and then used a limiting argument to obtain a continuous-time version. A brief outline of this derivation is set out here. Suppose that a record of filtered estimates, $\hat{x}(\tau\,|\,\tau)$, is available over a fixed interval τ ∈ [0, T]. Let $\hat{x}(\tau\,|\,T)$ denote smoothed state estimates at time 0 ≤ τ ≤ T to be evolved backwards in time from the filtered states $\hat{x}(\tau\,|\,\tau)$. The smoother development is based on two assumptions. First, it is assumed that $\dot{\hat{x}}(\tau\,|\,T)$ is normally distributed with mean $A(\tau)\hat{x}(\tau\,|\,T)$ and covariance $B(\tau)Q(\tau)B^T(\tau)$, that is, $\dot{\hat{x}}(\tau\,|\,T) \sim \mathcal{N}(A(\tau)\hat{x}(\tau\,|\,T), B(\tau)Q(\tau)B^T(\tau))$. The probability density function of $\dot{\hat{x}}(\tau\,|\,T)$ is

$p(\dot{\hat{x}}(\tau\,|\,T)\,|\,\hat{x}(\tau\,|\,T)) = \frac{1}{(2\pi)^{n/2}|B(\tau)Q(\tau)B^T(\tau)|^{1/2}}\exp\left\{-0.5\left(\dot{\hat{x}}(\tau\,|\,T) - A(\tau)\hat{x}(\tau\,|\,T)\right)^T\left(B(\tau)Q(\tau)B^T(\tau)\right)^{-1}\left(\dot{\hat{x}}(\tau\,|\,T) - A(\tau)\hat{x}(\tau\,|\,T)\right)\right\}$.

Second, it is assumed that $\hat{x}(\tau\,|\,T)$ is normally distributed with mean $\hat{x}(\tau\,|\,\tau)$ and covariance P(τ), namely, $\hat{x}(\tau\,|\,T) \sim \mathcal{N}(\hat{x}(\tau\,|\,\tau), P(\tau))$. The corresponding probability density function is

$p(\hat{x}(\tau\,|\,T)\,|\,\hat{x}(\tau\,|\,\tau)) = \frac{1}{(2\pi)^{n/2}|P(\tau)|^{1/2}}\exp\left\{-0.5\left(\hat{x}(\tau\,|\,T) - \hat{x}(\tau\,|\,\tau)\right)^TP^{-1}(\tau)\left(\hat{x}(\tau\,|\,T) - \hat{x}(\tau\,|\,\tau)\right)\right\}$.

From the approach of [3] and the further details in [6], setting

$0 = \frac{\partial\log\left[p(\dot{\hat{x}}(\tau\,|\,T)\,|\,\hat{x}(\tau\,|\,T))\,p(\hat{x}(\tau\,|\,T)\,|\,\hat{x}(\tau\,|\,\tau))\right]}{\partial\hat{x}(\tau\,|\,T)}$

results in

$0 = -\frac{\partial\left(\dot{\hat{x}}(\tau\,|\,T) - A(\tau)\hat{x}(\tau\,|\,T)\right)^T}{\partial\hat{x}(\tau\,|\,T)}\left(B(\tau)Q(\tau)B^T(\tau)\right)^{-1}\left(\dot{\hat{x}}(\tau\,|\,T) - A(\tau)\hat{x}(\tau\,|\,T)\right) - P^{-1}(\tau)\left(\hat{x}(\tau\,|\,T) - \hat{x}(\tau\,|\,\tau)\right)$.

Hence, the solution is given by

$\dot{\hat{x}}(\tau\,|\,T) = A(\tau)\hat{x}(\tau\,|\,T) + G(\tau)\left(\hat{x}(\tau\,|\,T) - \hat{x}(\tau\,|\,\tau)\right)$,   (50)

where

$G(\tau) = B(\tau)Q(\tau)B^T(\tau)\left[\frac{\partial\left(\dot{\hat{x}}(\tau\,|\,T) - A(\tau)\hat{x}(\tau\,|\,T)\right)^T}{\partial\hat{x}(\tau\,|\,T)}\right]^{-1}P^{-1}(\tau)$   (51)

is the smoother gain. Suppose that $\hat{x}(\tau\,|\,T)$, A(τ), B(τ), Q(τ), $P^{-1}(\tau)$ are sampled at integer multiples k of $T_s$ and are constant during the sampling interval. Using the Euler approximation $\dot{\hat{x}}(kT_s\,|\,T) = \frac{\hat{x}((k+1)T_s\,|\,T) - \hat{x}(kT_s\,|\,T)}{T_s}$, the sampled gain may be written as

$G(kT_s) = B(kT_s)T_s^{-1}Q(kT_s)B^T(kT_s)(I - AT_s)P^{-1}(kT_s)$.   (52)

Recognising that $T_s^{-1}Q(kT_s) = Q(\tau)$, see [23], and taking the limit as $T_s \to 0$ yields

$G(\tau) = B(\tau)Q(\tau)B^T(\tau)P^{-1}(\tau)$.   (53)
To summarise, the above fixed-interval smoother is realised by the following two-pass procedure.

(i) In the first pass, the (forward) Kalman-Bucy filter operates on measurements z(τ) to obtain state estimates $\hat{x}(\tau\,|\,\tau)$.

(ii) In the second pass, the differential equation (50) operates on the filtered state estimates $\hat{x}(\tau\,|\,\tau)$ to obtain smoothed state estimates $\hat{x}(\tau\,|\,T)$. Equation (50) is integrated backwards in time from the initial condition $\hat{x}(\tau\,|\,T) = \hat{x}(\tau\,|\,\tau)$ at τ = T.
Alternative derivations of this smoother appear in [6], [20], [23], [24].
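A minimal sketch of the two-pass procedure is given below for a scalar time-invariant problem: step (i) is a forward Kalman-Bucy filter with the Riccati equation (36), and step (ii) integrates (50) backwards using the gain (53). The parameters, initial covariance and step size are assumptions for illustration.

```python
import numpy as np

# Hedged sketch of the two-pass maximum-likelihood (RTS) smoother, scalar case.
def rts_smooth(z, A, B, C, Q, R, dt):
    N = len(z)
    xh, Pa, P = np.zeros(N), np.zeros(N), 1.0
    for k in range(N):                       # pass (i): forward Kalman-Bucy filter
        Pa[k] = P
        K = P * C / R
        if k < N - 1:
            xh[k + 1] = xh[k] + dt * (A * xh[k] + K * (z[k] - C * xh[k]))
        P += dt * (A * P + P * A - P * C / R * C * P + B * Q * B)
    xs = xh.copy()                           # pass (ii): integrate (50) backwards
    for k in range(N - 2, -1, -1):
        G = B * Q * B / Pa[k + 1]            # smoother gain (53)
        xs[k] = xs[k + 1] - dt * (A * xs[k + 1] + G * (xs[k + 1] - xh[k + 1]))
    return xs
```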
6.5.2.2 Alternative Form
For the purpose of developing an alternate form of the above smoother found in the literature, consider a fictitious forward version of (50), namely,

$\dot{\hat{x}}(t\,|\,T) = A(t)\hat{x}(t\,|\,T) + B(t)Q(t)B^T(t)P^{-1}(t)\left(\hat{x}(t\,|\,T) - \hat{x}(t\,|\,t)\right) = A(t)\hat{x}(t\,|\,T) + B(t)Q(t)B^T(t)\lambda(t\,|\,T)$,   (54)

where

$\lambda(t\,|\,T) = P^{-1}(t)\left(\hat{x}(t\,|\,T) - \hat{x}(t\,|\,t)\right)$   (55)

is an auxiliary variable. An expression for the evolution of $\lambda(t\,|\,T)$ is now developed. Writing (55) as

$\hat{x}(t\,|\,T) = \hat{x}(t\,|\,t) + P(t)\lambda(t\,|\,T)$   (56)

and taking the time differential results in

$\dot{\hat{x}}(t\,|\,T) = \dot{\hat{x}}(t\,|\,t) + \dot{P}(t)\lambda(t\,|\,T) + P(t)\dot{\lambda}(t\,|\,T)$.   (57)

Substituting $\dot{\hat{x}}(t\,|\,t) = A(t)\hat{x}(t\,|\,t) + P(t)C^T(t)R^{-1}(t)\left(z(t) - C(t)\hat{x}(t\,|\,t)\right)$ into (57) yields

$P(t)\dot{\lambda}(t\,|\,T) = P(t)C^T(t)R^{-1}(t)C(t)\hat{x}(t\,|\,t) - P(t)C^T(t)R^{-1}(t)z(t) + A(t)P(t)\lambda(t\,|\,T) + B(t)Q(t)B^T(t)\lambda(t\,|\,T) - \dot{P}(t)\lambda(t\,|\,T)$.   (58)

Using $\hat{x}(t\,|\,t) = \hat{x}(t\,|\,T) - P(t)\lambda(t\,|\,T)$ and $\dot{P}(t) = A(t)P(t) + P(t)A^T(t) - P(t)C^T(t)R^{-1}(t)C(t)P(t) + B(t)Q(t)B^T(t)$ within (58) and rearranging gives

$\dot{\lambda}(t\,|\,T) = C^T(t)R^{-1}(t)C(t)\hat{x}(t\,|\,T) - A^T(t)\lambda(t\,|\,T) - C^T(t)R^{-1}(t)z(t)$.   (59)

Equations (54) and (59) may be collected together as

$\begin{bmatrix} \dot{\hat{x}}(t\,|\,T) \\ \dot{\lambda}(t\,|\,T) \end{bmatrix} = \begin{bmatrix} A(t) & B(t)Q(t)B^T(t) \\ C^T(t)R^{-1}(t)C(t) & -A^T(t) \end{bmatrix}\begin{bmatrix} \hat{x}(t\,|\,T) \\ \lambda(t\,|\,T) \end{bmatrix} + \begin{bmatrix} 0 \\ -C^T(t)R^{-1}(t)z(t) \end{bmatrix}$.   (60)

Equation (60) is known as the Hamiltonian form of the Rauch-Tung-Striebel smoother [17].
6.5.2.3 Performance
In order to develop an expression for the smoothed error state, consider the backwards signal model

$\dot{x}(\tau) = A(\tau)x(\tau) + B(\tau)w(\tau)$.   (61)

Subtracting (50) from (61) results in

$\dot{x}(\tau) - \dot{\hat{x}}(\tau\,|\,T) = \left(A(\tau) + G(\tau)\right)\left(x(\tau) - \hat{x}(\tau\,|\,T)\right) - G(\tau)\left(x(\tau) - \hat{x}(\tau\,|\,\tau)\right) + B(\tau)w(\tau)$.   (62)

Let $\tilde{x}(\tau\,|\,T) = x(\tau) - \hat{x}(\tau\,|\,T)$ denote the smoothed error state and $\tilde{x}(\tau\,|\,\tau) = x(\tau) - \hat{x}(\tau\,|\,\tau)$ denote the filtered error state. Then the differential equation (62) can simply be written as

$\dot{\tilde{x}}(\tau\,|\,T) = \left(A(\tau) + G(\tau)\right)\tilde{x}(\tau\,|\,T) - G(\tau)\tilde{x}(\tau\,|\,\tau) + B(\tau)w(\tau)$,   (63)

where $\dot{\tilde{x}}(\tau\,|\,T) = \dot{x}(\tau) - \dot{\hat{x}}(\tau\,|\,T)$. Applying Lemma 3 to (63) and using $E\{\tilde{x}(\tau\,|\,\tau)w^T(\tau)\} = 0$ gives

$-\dot{\Sigma}(\tau\,|\,T) = \left(A(\tau) + G(\tau)\right)\Sigma(\tau\,|\,T) + \Sigma(\tau\,|\,T)\left(A(\tau) + G(\tau)\right)^T + B(\tau)Q(\tau)B^T(\tau)$,   (64)

where $\Sigma(\tau\,|\,T) = E\{\tilde{x}(\tau\,|\,T)\tilde{x}^T(\tau\,|\,T)\}$ is the smoother error covariance and $\dot{\Sigma}(\tau\,|\,T) = \frac{d\Sigma(\tau\,|\,T)}{d\tau}$. The smoother error covariance differential equation (64) is solved backwards in time from the initial condition

$\Sigma(t\,|\,T) = P(t\,|\,t)$   (65)

at t = T, where $P(t\,|\,t)$ is the solution of the Riccati differential equation

$\dot{P}(t) = \left(A(t) - K(t)C(t)\right)P(t) + P(t)\left(A^T(t) - C^T(t)K^T(t)\right) + K(t)R(t)K^T(t) + B(t)Q(t)B^T(t) = A(t)P(t) + P(t)A^T(t) - K(t)R(t)K^T(t) + B(t)Q(t)B^T(t)$.   (66)

It is shown below that this smoother outperforms the minimum-variance filter. For the purpose of comparing the solutions of forward Riccati equations, consider a fictitious forward version of (64), namely,

$\dot{\Sigma}(t\,|\,T) = \left(A(t) + G(t)\right)\Sigma(t\,|\,T) + \Sigma(t\,|\,T)\left(A(t) + G(t)\right)^T + B(t)Q(t)B^T(t)$,   (67)

initialised with

$\Sigma(t_0\,|\,T) = P(t_0\,|\,t_0) \ge 0$.   (68)

Lemma 6: In respect of the fixed-interval smoother (50),

$\Sigma(t\,|\,T) \le P(t\,|\,t)$.   (69)

Proof: The initialisation (68) satisfies condition (i) of Theorem 1. Condition (ii) of the theorem is met since

$\begin{bmatrix} B(t)Q(t)B^T(t) + K(t)R(t)K^T(t) & A(t) - K(t)C(t) \\ A^T(t) - C^T(t)K^T(t) & 0 \end{bmatrix} \ge \begin{bmatrix} B(t)Q(t)B^T(t) & A(t) + G(t) \\ A^T(t) + G^T(t) & 0 \end{bmatrix}$

for all $t \ge t_0$ and hence the claim (69) follows.
6.5.3 The Fraser-Potter Smoother
The Central Limit Theorem states that the mean of a sufficiently large sample of independent identically distributed random variables will be approximately normally distributed [25]. The same is true of partial sums of random variables. The Central Limit Theorem is illustrated by the first part of the following lemma. A useful generalisation appears in the second part of the lemma.

Lemma 7: Suppose that $y_1, y_2, \ldots, y_n$ are independent random variables and $W_1, W_2, \ldots, W_n$ are independent positive definite weighting matrices. Let $\mu = E\{y\}$, $u = y_1 + y_2 + \cdots + y_n$ and

$v = (W_1 + W_2 + \cdots + W_n)^{-1}(W_1y_1 + W_2y_2 + \cdots + W_ny_n)$.   (70)

(i) If $y_i \sim \mathcal{N}(\mu, R)$, i = 1 to n, then $u \sim \mathcal{N}(n\mu, nR)$;

(ii) If $y_i \sim \mathcal{N}(0, I)$, i = 1 to n, then $v \sim \mathcal{N}(0, I)$.

Proof:

(i) $E\{u\} = E\{y_1\} + E\{y_2\} + \cdots + E\{y_n\} = n\mu$. $E\{(u - \mu)(u - \mu)^T\} = E\{(y_1 - \mu)(y_1 - \mu)^T\} + E\{(y_2 - \mu)(y_2 - \mu)^T\} + \cdots + E\{(y_n - \mu)(y_n - \mu)^T\} = nR$.

(ii) $E\{v\} = (W_1 + W_2 + \cdots + W_n)^{-1}W_1E\{y_1\} + (W_1 + W_2 + \cdots + W_n)^{-1}W_2E\{y_2\} + \cdots + (W_1 + W_2 + \cdots + W_n)^{-1}W_nE\{y_n\} = 0$. $E\{vv^T\} = (W_1^T + W_2^T + \cdots + W_n^T)^{-1}(W_1^TW_1 + W_2^TW_2 + \cdots + W_n^TW_n)(W_1 + W_2 + \cdots + W_n)^{-1} = I$.

Fraser and Potter reported a smoother in 1969 [4] that combined state estimates from forward and backward filters using a formula similar to (70) truncated at n = 2. The inverses of the forward and backward error covariances, which are indicative of the quality of the respective estimates, were used as weighting matrices. The combined filter and Fraser-Potter smoother equations are

$\dot{\hat{x}}(t\,|\,t) = A(t)\hat{x}(t\,|\,t) + P(t\,|\,t)C^T(t)R^{-1}(t)\left(z(t) - C(t)\hat{x}(t\,|\,t)\right)$,   (71)

$\dot{\zeta}(t\,|\,t) = A(t)\zeta(t\,|\,t) + \Lambda(t\,|\,t)C^T(t)R^{-1}(t)\left(z(t) - C(t)\zeta(t\,|\,t)\right)$,   (72)

$\hat{x}(t\,|\,T) = \left(P^{-1}(t\,|\,t) + \Lambda^{-1}(t\,|\,t)\right)^{-1}\left(P^{-1}(t\,|\,t)\hat{x}(t\,|\,t) + \Lambda^{-1}(t\,|\,t)\zeta(t\,|\,t)\right)$,   (73)

where $P(t\,|\,t)$ is the solution of the forward Riccati equation $\dot{P}(t\,|\,t) = A(t)P(t\,|\,t) + P(t\,|\,t)A^T(t) - P(t\,|\,t)C^T(t)R^{-1}(t)C(t)P(t\,|\,t) + B(t)Q(t)B^T(t)$ and $\Lambda(t\,|\,t)$ is the solution of the backward Riccati equation $\dot{\Lambda}(t\,|\,t) = A(t)\Lambda(t\,|\,t) + \Lambda(t\,|\,t)A^T(t) - \Lambda(t\,|\,t)C^T(t)R^{-1}(t)C(t)\Lambda(t\,|\,t) + B(t)Q(t)B^T(t)$.

It can be seen from (72) that the backward state estimates, ζ(t), are obtained by simply running a Kalman filter over the time-reversed measurements. Fraser and Potter's approach is pragmatic: when the data is noisy, a linear combination of two filtered estimates is likely to be better than one filter alone. However, this two-filter approach to smoothing is ad hoc and is not a minimum-mean-square-error design.
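The two-filter construction (71) – (73) translates directly into code: one filter pass over z(t), one over the time-reversed record, and a covariance-weighted combination. The scalar sketch below, together with its initial covariance choice, is an assumption-laden illustration.

```python
import numpy as np

# Hedged sketch of the Fraser-Potter smoother (71)-(73), scalar case.
def fraser_potter(z, A, B, C, Q, R, dt):
    def kalman(z):
        N = len(z)
        xh, Pa, P = np.zeros(N), np.zeros(N), 1.0
        for k in range(N):
            Pa[k] = P
            if k < N - 1:
                xh[k + 1] = xh[k] + dt * (A * xh[k] + P * C / R * (z[k] - C * xh[k]))
            P += dt * (A * P + P * A - P * C / R * C * P + B * Q * B)
        return xh, Pa
    xf, Pf = kalman(z)                       # forward filter, equation (71)
    xb, Pb = kalman(z[::-1])                 # filter over time-reversed data (72)
    xb, Pb = xb[::-1], Pb[::-1]
    # inverse-covariance weighted combination, equation (73)
    return (xf / Pf + xb / Pb) / (1.0 / Pf + 1.0 / Pb)
```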
6.5.4 The Minimum-Variance Smoother
6.5.4.1 Problem Definition
The previously described smoothers are focussed on state estimation. A different signal estimation problem, shown in Fig. 1, is considered here. Suppose that observations $z = y_2 + v$ are available, where $y_2 = \mathcal{G}_2w$ is the output of a linear time-varying system and v is measurement noise. A solution $\mathcal{R}$ is desired which produces estimates $\hat{y}_1$ of a second reference system $y_1 = \mathcal{G}_1w$ in such a way to meet a performance objective. Let $\tilde{y} = y_1 - \hat{y}_1$ denote the output estimation error, and let $\mathcal{R}_{\tilde{y}\tilde{y}}$ denote the output error covariance. The optimum minimum-variance filter can be obtained by finding the causal solution that minimises $\|\mathcal{R}_{\tilde{y}\tilde{y}}\|_2^2$. Here, in the case of smoothing, the performance objective is to minimise $\|\mathcal{R}_{\tilde{y}\tilde{y}}\|_2^2$ without the constraint of causality.

Fig. 1. The general estimation problem. The objective is to produce estimates $\hat{y}_1$ of $y_1$ from measurements z.
6.5.4.2 Optimal Unrealisable Solutions
The minimum-variance smoother is a more recent innovation [8] – [10] and arises by generalising Wiener's optimal noncausal solution for the above time-varying problem. The solution is obtained using the same completing-the-squares technique that was previously employed in the frequency domain (see Chapters 1 and 2). It can be seen from Fig. 1 that the output estimation error is generated by

$\tilde{y} = \mathcal{R}_{ei}\,i$,   (74)

where $\mathcal{R}_{ei} = \begin{bmatrix} -\mathcal{R} & \mathcal{G}_1 - \mathcal{R}\mathcal{G}_2 \end{bmatrix}$ is a linear system that operates on the inputs $i = \begin{bmatrix} v \\ w \end{bmatrix}$.

Consider the factorisation

$\Delta\Delta^H = \mathcal{G}_2Q\mathcal{G}_2^H + R$,   (75)

in which the time-dependence of Q(t) and R(t) is omitted for notational brevity. Suppose that Δ is causal, namely, Δ and its inverse, $\Delta^{-1}$, are bounded systems that proceed forward in time. The system Δ is known as a Wiener-Hopf factor.

Lemma 8: Assume that the Wiener-Hopf factor inverse, $\Delta^{-1}$, exists over t ∈ [0, T]. Then the smoother solution

$\mathcal{R} = \mathcal{G}_1Q\mathcal{G}_2^H\Delta^{-H}\Delta^{-1} = \mathcal{G}_1Q\mathcal{G}_2^H(\Delta\Delta^H)^{-1} = \mathcal{G}_1Q\mathcal{G}_2^H(\mathcal{G}_2Q\mathcal{G}_2^H + R)^{-1}$   (76)

minimises $\|\mathcal{R}_{\tilde{y}\tilde{y}}\|_2^2 = \|\mathcal{R}_{ei}\mathcal{R}_{ei}^H\|_2^2$.

Proof: It follows from (74) that $\mathcal{R}_{ei}\mathcal{R}_{ei}^H = \mathcal{G}_1Q\mathcal{G}_1^H - \mathcal{G}_1Q\mathcal{G}_2^H\mathcal{R}^H - \mathcal{R}\mathcal{G}_2Q\mathcal{G}_1^H + \mathcal{R}\Delta\Delta^H\mathcal{R}^H$. Completing the square leads to $\mathcal{R}_{ei}\mathcal{R}_{ei}^H = \mathcal{R}_{ei1}\mathcal{R}_{ei1}^H + \mathcal{R}_{ei2}\mathcal{R}_{ei2}^H$, where

$\mathcal{R}_{ei2}\mathcal{R}_{ei2}^H = \left(\mathcal{G}_1Q\mathcal{G}_2^H\Delta^{-H} - \mathcal{R}\Delta\right)\left(\mathcal{G}_1Q\mathcal{G}_2^H\Delta^{-H} - \mathcal{R}\Delta\right)^H$   (77)

and

$\mathcal{R}_{ei1}\mathcal{R}_{ei1}^H = \mathcal{G}_1Q\mathcal{G}_1^H - \mathcal{G}_1Q\mathcal{G}_2^H(\Delta\Delta^H)^{-1}\mathcal{G}_2Q\mathcal{G}_1^H$.   (78)

By inspection of (77), the solution (76) achieves

$\|\mathcal{R}_{ei2}\mathcal{R}_{ei2}^H\|_2 = 0$.   (79)

Since $\mathcal{R}_{ei1}\mathcal{R}_{ei1}^H$ excludes the estimator solution $\mathcal{R}$, this quantity defines the lower bound for $\|\mathcal{R}_{ei}\mathcal{R}_{ei}^H\|_2$.
Example 3. Consider the output estimation case where $\mathcal{G}_1 = \mathcal{G}_2$ and

$\mathcal{R}_{OE} = \mathcal{G}_2Q\mathcal{G}_2^H(\mathcal{G}_2Q\mathcal{G}_2^H + R)^{-1}$,   (80)

which is of order $n^4$ complexity. Using $\mathcal{G}_2Q\mathcal{G}_2^H = \Delta\Delta^H - R$ leads to the $n^2$-order solution

$\mathcal{R}_{OE} = I - R(\Delta\Delta^H)^{-1}$.   (81)

It is interesting to note from (81) and

$\mathcal{R}_{ei1}\mathcal{R}_{ei1}^H = \mathcal{G}_2Q\mathcal{G}_2^H - \mathcal{G}_2Q\mathcal{G}_2^H(\mathcal{G}_2Q\mathcal{G}_2^H + R)^{-1}\mathcal{G}_2Q\mathcal{G}_2^H$   (82)

that $\lim_{R\to 0}\mathcal{R}_{OE} = I$ and $\lim_{R\to 0}\mathcal{R}_{ei1}\mathcal{R}_{ei1}^H = 0$. That is, output estimation is superfluous when measurement noise is absent.

Let $\{\mathcal{R}_{ei}\mathcal{R}_{ei}^H\}_+ = \{\mathcal{R}_{ei1}\mathcal{R}_{ei1}^H\}_+ + \{\mathcal{R}_{ei2}\mathcal{R}_{ei2}^H\}_+$ denote the causal part of $\mathcal{R}_{ei}\mathcal{R}_{ei}^H$. It is shown below that the minimum-variance filter solution can be found using the above completing-the-squares technique and taking causal parts.

Lemma 9: The filter solution

$\mathcal{R} = \left\{\mathcal{G}_1Q\mathcal{G}_2^H\Delta^{-H}\right\}_+\Delta^{-1}$   (83)

minimises $\|\{\mathcal{R}_{\tilde{y}\tilde{y}}\}_+\|_2^2 = \|\{\mathcal{R}_{ei}\mathcal{R}_{ei}^H\}_+\|_2^2$, provided that the inverses exist.

Proof: It follows from (77) that

$\{\mathcal{R}_{ei2}\mathcal{R}_{ei2}^H\}_+ = \left\{\left(\mathcal{G}_1Q\mathcal{G}_2^H\Delta^{-H} - \mathcal{R}\Delta\right)\left(\mathcal{G}_1Q\mathcal{G}_2^H\Delta^{-H} - \mathcal{R}\Delta\right)^H\right\}_+$.   (84)

By inspection of (84), the solution (83) achieves

$\|\{\mathcal{R}_{ei2}\mathcal{R}_{ei2}^H\}_+\|_2 = 0$.   (85)

It is worth pausing at this juncture to comment on the significance of the above results.

- The formulation (76) is an optimal solution for the time-varying smoother problem since it can be seen from (79) that it achieves the best-possible (minimum-error-variance) performance.

- Similarly, (83) is termed an optimal solution because it achieves the best-possible filter performance (85).

- By inspection of (79) and (85) it follows that the minimum-variance smoother outperforms the minimum-variance filter.

- In general, these optimal solutions are not very practical because of the difficulty in realising an exact Wiener-Hopf factor.

Practical smoother (and filter) solutions that make use of an approximate Wiener-Hopf factor are described below.
6.5.4.3 Optimal Realisable Solutions
Output Estimation
The Wiener-Hopf factor is modelled on the structure of the spectral factor which is described in Section 3.4.4. Suppose that R(t) > 0 for all t ∈ [0, T] and there exists $R^{1/2}(t) > 0$ such that $R(t) = R^{1/2}(t)R^{1/2}(t)$. An approximate Wiener-Hopf factor, $\hat{\Delta}$, is defined by the system

$\begin{bmatrix} \dot{x}(t) \\ \delta(t) \end{bmatrix} = \begin{bmatrix} A(t) & K(t)R^{1/2}(t) \\ C(t) & R^{1/2}(t) \end{bmatrix}\begin{bmatrix} x(t) \\ z(t) \end{bmatrix}$,   (86)

where $K(t) = P(t)C^T(t)R^{-1}(t)$ is the Kalman gain, in which P(t) is the solution of the Riccati differential equation

$\dot{P}(t) = A(t)P(t) + P(t)A^T(t) - P(t)C^T(t)R^{-1}(t)C(t)P(t) + B(t)Q(t)B^T(t)$.   (87)

The output estimation smoother (81) can be approximated as

$\mathcal{R}_{OE} = I - R(\hat{\Delta}\hat{\Delta}^H)^{-1} = I - R\hat{\Delta}^{-H}\hat{\Delta}^{-1}$.   (88)

An approximate Wiener-Hopf factor inverse, $\hat{\Delta}^{-1}$, within (88) is obtained from (86) and the Matrix Inversion Lemma, namely,

$\begin{bmatrix} \dot{\hat{x}}(t) \\ \alpha(t) \end{bmatrix} = \begin{bmatrix} A(t) - K(t)C(t) & K(t) \\ -R^{-1/2}(t)C(t) & R^{-1/2}(t) \end{bmatrix}\begin{bmatrix} \hat{x}(t) \\ z(t) \end{bmatrix}$,   (89)

where $\hat{x}(t) \in \mathbb{R}^n$ is an estimate of the state within $\hat{\Delta}^{-1}$. From Lemma 1, the adjoint of $\hat{\Delta}^{-1}$, which is denoted by $\hat{\Delta}^{-H}$, has the realisation

$\begin{bmatrix} -\dot{\xi}(t) \\ \beta(t) \end{bmatrix} = \begin{bmatrix} \left(A(t) - K(t)C(t)\right)^T & -C^T(t)R^{-1/2}(t) \\ K^T(t) & R^{-1/2}(t) \end{bmatrix}\begin{bmatrix} \xi(t) \\ \alpha(t) \end{bmatrix}$,   (90)

where $\xi(t) \in \mathbb{R}^n$ is the state within $\hat{\Delta}^{-H}$. Thus, the smoother (88) is realised by (89), (90) and

$\hat{y}(t\,|\,T) = z(t) - R(t)\beta(t)$.   (91)

Procedure 1. The above output estimator can be implemented via the following three steps.

Step 1. Operate $\hat{\Delta}^{-1}$ on the measurements z(t) using (89) to obtain α(t).

Step 2. In lieu of the adjoint system (90), operate (89) on the time-reversed transpose of α(t). Then take the time-reversed transpose of the result to obtain β(t).

Step 3. Calculate the smoothed output estimate from (91).

Example 4. Consider an estimation problem parameterised by a = −1, b = $\sqrt{2}$, c = 1, d = 0, $\sigma_w^2 = \sigma_v^2 = 1$, which leads to $p = k = \sqrt{3} - 1$ [26]. Smoothed output estimates may be obtained by evolving

$\dot{\hat{x}}(t) = -\sqrt{3}\,\hat{x}(t) + (\sqrt{3} - 1)z(t)$,   $\alpha(t) = -\hat{x}(t) + z(t)$,

time-reversing the α(t) and evolving

$\dot{\xi}(t) = -\sqrt{3}\,\xi(t) + (\sqrt{3} - 1)\alpha(t)$,   $\beta(t) = -\xi(t) + \alpha(t)$,

then time-reversing the β(t) and calculating

$\hat{y}(t\,|\,T) = z(t) - \beta(t)$.
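Procedure 1 and Example 4 can be rendered in a few lines: the same first-order system (89) is run forwards on z(t) and again on the time-reversed α(t), and (91) forms the output estimate. The Euler discretisation and the function name below are illustrative assumptions.

```python
import numpy as np

# Hedged sketch of Procedure 1 for Example 4 (a = -1, c = 1, Q = R = 1,
# k = sqrt(3) - 1), using simple Euler steps.
def mv_smooth(z, dt):
    k = np.sqrt(3.0) - 1.0
    def delta_inv(u):                        # system (89): alpha = -xh + u
        xh, out = 0.0, np.zeros(len(u))
        for i in range(len(u)):
            out[i] = -xh + u[i]
            xh += dt * (-np.sqrt(3.0) * xh + k * u[i])
        return out
    alpha = delta_inv(z)                     # Step 1: forward pass
    beta = delta_inv(alpha[::-1])[::-1]      # Step 2: adjoint via time reversal
    return z - beta                          # Step 3: equation (91) with R = 1
```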
Filtering

The causal part $\{\mathcal{R}_{OE}\}_+$ of the minimum-variance smoother (88) is given by

$\{\mathcal{R}_{OE}\}_+ = I - R\{\hat{\Delta}^{-H}\}_+\hat{\Delta}^{-1} = I - RR^{-1/2}\hat{\Delta}^{-1} = I - R^{1/2}\hat{\Delta}^{-1}$.   (92)

Employing (89) within (92) leads to the standard minimum-variance filter, namely,

$\dot{\hat{x}}(t\,|\,t) = \left(A(t) - K(t)C(t)\right)\hat{x}(t\,|\,t) + K(t)z(t)$,   (93)

$\hat{y}(t\,|\,t) = C(t)\hat{x}(t\,|\,t)$.   (94)
Input Estimation

As discussed in Chapters 1 and 2, input estimates can be found using $\mathcal{G}_1 = I$, and substituting $\hat{\Delta}$ for Δ within (76) yields the solution

$\mathcal{R}_{IE} = Q\mathcal{G}_2^H(\hat{\Delta}\hat{\Delta}^H)^{-1} = Q\mathcal{G}_2^H\hat{\Delta}^{-H}\hat{\Delta}^{-1}$.   (95)

As expected, the low-measurement-noise asymptote of this equaliser is given by $\lim_{R\to 0}\mathcal{R}_{IE} = \mathcal{G}_2^{-1}$. That is, at high signal-to-noise ratios the equaliser approaches $\mathcal{G}_2^{-1}$, provided the inverse exists.

The development of a differential equation for the smoothed input estimate, $\hat{w}(t\,|\,T)$, makes use of the following formula [27] for the cascade of two systems. Suppose that two linear systems $\mathcal{G}_1$ and $\mathcal{G}_2$ have state-space parameters $\begin{bmatrix} A_1 & B_1 \\ C_1 & D_1 \end{bmatrix}$ and $\begin{bmatrix} A_2 & B_2 \\ C_2 & D_2 \end{bmatrix}$, respectively. Then $\mathcal{G}_2\mathcal{G}_1$ is parameterised by

$\begin{bmatrix} A_1 & 0 & B_1 \\ B_2C_1 & A_2 & B_2D_1 \\ D_2C_1 & C_2 & D_2D_1 \end{bmatrix}$.

It follows that $\hat{w}(t\,|\,T) = Q\mathcal{G}_2^H\hat{\Delta}^{-H}\alpha(t)$ is realised by

$\begin{bmatrix} -\dot{\xi}(t) \\ -\dot{\lambda}(t) \\ \hat{w}(t\,|\,T) \end{bmatrix} = \begin{bmatrix} \left(A(t) - K(t)C(t)\right)^T & 0 & -C^T(t)R^{-1/2}(t) \\ C^T(t)K^T(t) & A^T(t) & C^T(t)R^{-1/2}(t) \\ Q(t)D^T(t)K^T(t) & Q(t)B^T(t) & Q(t)D^T(t)R^{-1/2}(t) \end{bmatrix}\begin{bmatrix} \xi(t) \\ \lambda(t) \\ \alpha(t) \end{bmatrix}$,   (96)

in which $\lambda(t) \in \mathbb{R}^n$ is an auxiliary state.

Procedure 2. Input estimates can be calculated via the following two steps.

Step 1. Operate $\hat{\Delta}^{-1}$ on the measurements z(t) using (89) to obtain α(t).

Step 2. In lieu of (96), operate the adjoint of (96) on the time-reversed transpose of α(t). Then take the time-reversed transpose of the result.
State Estimation

Smoothed state estimates can be obtained by defining the reference system $\mathcal{G}_1$ within (76) as

$\dot{\hat{x}}(t\,|\,T) = A(t)\hat{x}(t\,|\,T) + B(t)\hat{w}(t\,|\,T)$.   (97)

That is, a smoother for state estimation is given by (89), (96) and (97). In frequency-domain estimation problems, minimum-order solutions are found by exploiting pole-zero cancellations, see Example 13 of Chapter 1. Here in the time-domain, (89), (96), (97) is not a minimum-order solution and some numerical model order reduction may be required.

Suppose that C(t) is of rank n and D(t) = 0. In this special case, an $n^2$-order solution for state estimation can be obtained from (91) and

$\hat{x}(t\,|\,T) = C^{\dagger}(t)\hat{y}(t\,|\,T)$,   (98)

where

$C^{\dagger}(t) = \left(C^T(t)C(t)\right)^{-1}C^T(t)$   (99)

denotes the Moore-Penrose pseudoinverse.
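When C(t) has full column rank and D(t) = 0, the map (98) – (99) from smoothed outputs to smoothed states is a single pseudoinverse multiplication, as in the sketch below; the array shapes are assumptions.

```python
import numpy as np

# Hedged sketch of (98)-(99): recover state estimates from output estimates
# when C (p x n, rank n) has full column rank and D = 0. y_smoothed is
# assumed to be an (N, p) array of output estimates over the interval.
def state_from_output(C, y_smoothed):
    C_dagger = np.linalg.inv(C.T @ C) @ C.T      # Moore-Penrose pseudoinverse (99)
    return y_smoothed @ C_dagger.T               # xhat(t|T) = C† yhat(t|T) per (98)
```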
6.5.4.4 Performance
An analysis of minimum-variance smoother performance requires an identity which is described after introducing some additional notation. Let $\alpha = \mathcal{O}_0w$ denote the output of a linear time-varying system having the realisation

$\dot{x}(t) = A(t)x(t) + w(t)$,   (100)

$\alpha(t) = x(t)$,   (101)

where $w(t) \in \mathbb{R}^n$ and $A(t) \in \mathbb{R}^{n\times n}$. By inspection of (100) – (101), the output of the inverse system $w = \mathcal{O}_0^{-1}\alpha$ is given by

$w(t) = \dot{\alpha}(t) - A(t)\alpha(t)$.   (102)

Similarly, let $\beta = \mathcal{O}_0^Hu$ denote the output of the adjoint system $\mathcal{O}_0^H$, which from Lemma 1 has the realisation

$-\dot{\xi}(t) = A^T(t)\xi(t) + u(t)$,   (103)

$\beta(t) = \xi(t)$.   (104)

It follows that the output of the inverse system $u = \mathcal{O}_0^{-H}\beta$ is given by

$u(t) = -\dot{\beta}(t) - A^T(t)\beta(t)$.   (105)

The following identity is required in the characterisation of smoother performance:

$-P(t)A^T(t) - A(t)P(t) = P(t)\mathcal{O}_0^{-H} + \mathcal{O}_0^{-1}P(t)$,   (106)

where P(t) is an arbitrary matrix of compatible dimensions. The above equation can be verified by using (102) and (105) within (106). Using the above notation, the exact Wiener-Hopf factor satisfies

$\Delta\Delta^H = C(t)\mathcal{O}_0B(t)Q(t)B^T(t)\mathcal{O}_0^HC^T(t) + R(t)$.   (107)

It is observed below that the approximate Wiener-Hopf factor (86) approaches the exact Wiener-Hopf factor whenever the problem is locally stationary, that is, whenever A(t), B(t), C(t), Q(t) and R(t) change sufficiently slowly, so that $\dot{P}(t)$ of (87) approaches the zero matrix.

Lemma 10 [8]: In respect of the signal model (1) – (2) with D(t) = 0, E{w(t)} = E{v(t)} = 0, $E\{w(t)w^T(t)\} = Q(t)$, $E\{v(t)v^T(t)\} = R(t)$, $E\{w(t)v^T(t)\} = 0$ and the quantities defined above,

$\hat{\Delta}\hat{\Delta}^H = \Delta\Delta^H - C(t)\mathcal{O}_0\dot{P}(t)\mathcal{O}_0^HC^T(t)$.   (108)

Proof: The approximate Wiener-Hopf factor may be written as $\hat{\Delta} = C(t)\mathcal{O}_0K(t)R^{1/2}(t) + R^{1/2}(t)$. It is easily shown that $\hat{\Delta}\hat{\Delta}^H = C(t)\mathcal{O}_0\left(P(t)\mathcal{O}_0^{-H} + \mathcal{O}_0^{-1}P(t) + K(t)R(t)K^T(t)\right)\mathcal{O}_0^HC^T(t) + R(t)$ and using (106) together with (87) gives $\hat{\Delta}\hat{\Delta}^H = C(t)\mathcal{O}_0\left(B(t)Q(t)B^T(t) - \dot{P}(t)\right)\mathcal{O}_0^HC^T(t) + R(t)$. The result follows by comparing $\hat{\Delta}\hat{\Delta}^H$ and (107).

Consequently, the minimum-variance smoother (88) achieves the best-possible estimator performance, namely $\|\mathcal{R}_{ei2}\mathcal{R}_{ei2}^H\|_2 = 0$, whenever the problem is locally stationary.
Lemma 11 [8]: The output estimation smoother (88) satisfies

$\mathcal{R}_{ei2} = R(t)\left[(\Delta\Delta^H)^{-1} - \left(\Delta\Delta^H - C(t)\mathcal{O}_0\dot{P}(t)\mathcal{O}_0^HC^T(t)\right)^{-1}\right]$.   (109)

Proof: Substituting (88) into (77) yields

$\mathcal{R}_{ei2} = R(t)\left[(\Delta\Delta^H)^{-1} - (\hat{\Delta}\hat{\Delta}^H)^{-1}\right]$.   (110)

The result is now immediate from (108) and (110).

Conditions for the convergence of the Riccati differential equation solution (87) and hence the asymptotic optimality of the smoother (88) are set out below.

Lemma 12 [8]: Let $S(t) = C^T(t)R^{-1}(t)C(t)$. If (i) there exist solutions $P(t) \ge P(t + \delta t)$ of (87) for a $t > \delta t > 0$; and

(ii) $\begin{bmatrix} B(t)Q(t)B^T(t) & A(t) \\ A^T(t) & -S(t) \end{bmatrix} \ge \begin{bmatrix} B(t+\delta t)Q(t+\delta t)B^T(t+\delta t) & A(t+\delta t) \\ A^T(t+\delta t) & -S(t+\delta t) \end{bmatrix}$   (111)

for all $t > \delta t$, then

$\lim_{t\to\infty}\|\mathcal{R}_{ei2}\mathcal{R}_{ei2}^H\|_2 = 0$.   (112)

Proof: Conditions (i) and (ii) together with Theorem 1 imply $P(t) \ge P(t + \delta t)$ for all $t > \delta t$ and $\lim_{t\to\infty}\dot{P}(t) = 0$. The claim (112) is now immediate from Lemma 11.
6.5.5 Performance Comparison

The following scalar time-invariant examples compare the performance of the minimum-variance filter (92), maximum-likelihood smoother (50), Fraser-Potter smoother (73) and minimum-variance smoother (88) under Gaussian and non-Gaussian noise conditions.

Example 5 [9]. Suppose that A = −1 and B = C = Q = 1. Simulations were conducted using T = 100 s, a step size of 1 ms and 1000 realisations of Gaussian noise processes. The mean-square error (MSE) exhibited by the filter and smoothers as a function of the input signal-to-noise ratio (SNR) is shown in Fig. 2. As expected, it can be seen that the smoothers outperform the filter. Although the minimum-variance smoother exhibits the lowest mean-square error, the performance benefit diminishes at high signal-to-noise ratios.

Example 6 [9]. Suppose instead that the process noise is the unity-variance deterministic signal $w(t) = \sigma_{\sin(t)}^{-1}\sin(t)$, where $\sigma_{\sin(t)}^2$ denotes the sample variance of sin(t). The results of simulations employing the sinusoidal process noise and Gaussian measurement noise are shown in Fig. 3. Once again, the smoothers exhibit better performance than the filter. It can be seen that the minimum-variance smoother provides the best mean-square-error performance. The minimum-variance smoother appears to be less perturbed by non-Gaussian noises because it does not rely on assumptions about the underlying distributions.
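The experiment in Examples 5 and 6 can be reproduced in outline as follows: realisations of the plant are simulated, the measurement noise variance is swept to set the SNR, and the mean-square errors of the estimators are averaged. The sketch reuses the hypothetical rts_smooth() helper from the earlier two-pass sketch and shows only one estimator; the scaled-down realisation count and SNR grid are assumptions.

```python
import numpy as np

# Hedged sketch of the MSE-versus-SNR comparison of Example 5; rts_smooth()
# is the earlier two-pass sketch, and only one estimator is shown for brevity.
def compare(A=-1.0, B=1.0, C=1.0, Q=1.0, dt=1e-3, T=100.0, runs=50):
    N = int(T / dt)
    for snr_db in range(-10, 11, 5):
        R = Q / 10.0 ** (snr_db / 10.0)          # measurement noise sets the SNR
        mse = 0.0
        for _ in range(runs):
            x = np.zeros(N)
            for k in range(N - 1):               # simulate the plant
                x[k + 1] = x[k] + dt * A * x[k] + B * np.sqrt(dt) * np.random.randn()
            z = C * x + np.sqrt(R / dt) * np.random.randn(N)
            mse += np.mean((x - rts_smooth(z, A, B, C, Q, R, dt)) ** 2) / runs
        print(snr_db, mse)
```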
Fig. 2. MSE versus SNR for Example 5: (i) minimum-variance smoother, (ii) Fraser-Potter smoother, (iii) maximum-likelihood smoother and (iv) minimum-variance filter.

Fig. 3. MSE versus SNR for Example 6: (i) minimum-variance smoother, (ii) Fraser-Potter smoother, (iii) maximum-likelihood smoother and (iv) minimum-variance filter.

6.6 Chapter Summary
The fixed-point smoother produces state estimates at some previous point in time, that is,

$\dot{\hat{\varsigma}}(t) = \Sigma(t)C^T(t)R^{-1}(t)\left(z(t) - C(t)\hat{x}(t\,|\,t)\right)$,

where Σ(t) is the smoother error covariance.

In fixed-lag smoothing, state estimates are calculated at a fixed time delay τ behind the current measurements. This smoother has the form

$\dot{\hat{x}}(t\,|\,t+\tau) = A(t)\hat{x}(t\,|\,t+\tau) + B(t)Q(t)B^T(t)P^{-1}(t)\left(\hat{x}(t\,|\,t+\tau) - \hat{x}(t\,|\,t)\right) + P(t)\Phi^T(t+\tau, t)C^T(t+\tau)R^{-1}(t+\tau)\left(z(t+\tau) - C(t+\tau)\hat{x}(t+\tau\,|\,t+\tau)\right)$,

where Φ(t + τ, t) is the transition matrix of the minimum-variance filter.
Three common fixed-interval smoothers are listed in Table 1, which are for retrospective (or off-line) data analysis. The Rauch-Tung-Striebel (RTS) smoother and Fraser-Potter (FP) smoother are minimum-order solutions. The RTS smoother differential equation evolves backward in time, in which $G(\tau) = B(\tau)Q(\tau)B^T(\tau)P^{-1}(\tau)$ is the smoothing gain. The FP smoother employs a linear combination of forward state estimates and backward state estimates obtained by running a filter over the time-reversed measurements. The optimum minimum-variance solution, in which $\bar{A}(t) = A(t) - K(t)C(t)$, where K(t) is the predictor gain, involves a cascade of forward and adjoint predictions. It can be seen that the optimum minimum-variance smoother is the most complex and so any performance benefits need to be reconciled with the increased calculation cost.

Signals and system: $\dot{x}(t) = A(t)x(t) + B(t)w(t)$, $y(t) = C(t)x(t)$, $z(t) = y(t) + v(t)$. Assumptions: E{w(t)} = E{v(t)} = 0; $E\{w(t)w^T(t)\} = Q(t) > 0$ and $E\{v(t)v^T(t)\} = R(t) > 0$ are known; A(t), B(t) and C(t) are known.

RTS smoother. Assumptions: the filtered and smoothed states are normally distributed; $\hat{x}(t\,|\,t)$ previously calculated by the Kalman filter. Main result:
$\dot{\hat{x}}(\tau\,|\,T) = A(\tau)\hat{x}(\tau\,|\,T) + G(\tau)\left(\hat{x}(\tau\,|\,T) - \hat{x}(\tau\,|\,\tau)\right)$.

FP smoother. Assumptions: $\hat{x}(t\,|\,t)$ previously calculated by the Kalman filter. Main result:
$\hat{x}(t\,|\,T) = \left(P^{-1}(t\,|\,t) + \Lambda^{-1}(t\,|\,t)\right)^{-1}\left(P^{-1}(t\,|\,t)\hat{x}(t\,|\,t) + \Lambda^{-1}(t\,|\,t)\zeta(t\,|\,t)\right)$.

Optimal smoother. Main results:
$\begin{bmatrix} \dot{\hat{x}}(t) \\ \alpha(t) \end{bmatrix} = \begin{bmatrix} \bar{A}(t) & K(t) \\ -R^{-1/2}(t)C(t) & R^{-1/2}(t) \end{bmatrix}\begin{bmatrix} \hat{x}(t) \\ z(t) \end{bmatrix}$,
$\begin{bmatrix} -\dot{\xi}(t) \\ \beta(t) \end{bmatrix} = \begin{bmatrix} \bar{A}^T(t) & -C^T(t)R^{-1/2}(t) \\ K^T(t) & R^{-1/2}(t) \end{bmatrix}\begin{bmatrix} \xi(t) \\ \alpha(t) \end{bmatrix}$,
$\hat{y}(t\,|\,T) = z(t) - R(t)\beta(t)$.

Table 1. Continuous-time fixed-interval smoothers.
The output estimation error covariance for the general estimation problem can be written as $\mathcal{R}_{ei}\mathcal{R}_{ei}^H = \mathcal{R}_{ei1}\mathcal{R}_{ei1}^H + \mathcal{R}_{ei2}\mathcal{R}_{ei2}^H$, where $\mathcal{R}_{ei1}\mathcal{R}_{ei1}^H$ specifies a lower performance bound and $\mathcal{R}_{ei2}\mathcal{R}_{ei2}^H$ is a function of the estimator solution. The optimal smoother solution achieves $\|\mathcal{R}_{ei2}\mathcal{R}_{ei2}^H\|_2 = 0$ and provides the best mean-square-error performance, provided of course that the problem assumptions are correct. The minimum-variance smoother solution also attains best-possible performance whenever the problem is locally stationary, that is, when A(t), B(t), C(t), Q(t) and R(t) change sufficiently slowly.
6.7 Problems
Problem 1. Write down augmented state-space matrices $A^{(a)}(t)$, $B^{(a)}(t)$ and $C^{(a)}(t)$ for the continuous-time fixed-point smoother problem.

(i) Substitute the above matrices into $\dot{P}^{(a)}(t) = A^{(a)}(t)P^{(a)}(t) + P^{(a)}(t)(A^{(a)})^T(t) - P^{(a)}(t)(C^{(a)})^T(t)R^{-1}(t)C^{(a)}(t)P^{(a)}(t) + B^{(a)}(t)Q(t)(B^{(a)})^T(t)$ to obtain the component Riccati differential equations.

(ii) Develop expressions for the continuous-time fixed-point smoother estimate and the smoother gain.
Problem 2. The Hamiltonian equations (60) were derived from the forward version of the maximum-likelihood smoother (54). Derive the alternative form

$\begin{bmatrix} \dot{\hat{x}}(t\,|\,T) \\ \dot{\lambda}(t\,|\,T) \end{bmatrix} = \begin{bmatrix} A(t) & B(t)Q(t)B^T(t) \\ C^T(t)R^{-1}(t)C(t) & -A^T(t) \end{bmatrix}\begin{bmatrix} \hat{x}(t\,|\,T) \\ \lambda(t\,|\,T) \end{bmatrix} + \begin{bmatrix} 0 \\ -C^T(t)R^{-1}(t)z(t) \end{bmatrix}$

from the backward smoother (50). Hint: use the backward Kalman-Bucy filter and the backward Riccati equation.
Problem 3. It is shown in [6] and [17] that the intermediate variable within the Hamiltonian equations (60) is given by

$\lambda(t\,|\,T) = \int_t^T\Phi^T(s, t)C^T(s)R^{-1}(s)\left(z(s) - C(s)\hat{x}(s\,|\,s)\right)ds$,

where $\Phi(s, t)$ is the transition matrix of the Kalman-Bucy filter. Use the above equation to derive

$\dot{\lambda}(t\,|\,T) = C^T(t)R^{-1}(t)C(t)\hat{x}(t\,|\,T) - A^T(t)\lambda(t\,|\,T) - C^T(t)R^{-1}(t)z(t)$.
Problem 4. Show that the adjoint of the system having state-space parameters $\begin{bmatrix} A(t) & B(t) \\ C(t) & D(t) \end{bmatrix}$ is parameterised by $\begin{bmatrix} -A^T(t) & -C^T(t) \\ B^T(t) & D^T(t) \end{bmatrix}$.

Problem 5. Suppose that $\mathcal{O}_0$ is a system parameterised by $\begin{bmatrix} A(t) & I \\ I & 0 \end{bmatrix}$. Show that

$-P(t)A^T(t) - A(t)P(t) = P(t)\mathcal{O}_0^{-H} + \mathcal{O}_0^{-1}P(t)$.
Problem 6. The optimum minimum-variance smoother was developed by finding the solution that minimises $\|\mathcal{R}_{ei}\mathcal{R}_{ei}^H\|_2^2$. Use the same completing-the-square approach to find the optimum minimum-variance filter. (Hint: find the solution that minimises $\|\{\mathcal{R}_{ei}\mathcal{R}_{ei}^H\}_+\|_2^2$.)

Problem 7 [9]. Let a, b = 1, c and d = 0 denote the time-invariant state-space parameters of the plant $\mathcal{G}$. Denote the error covariance, gain of the Kalman filter and gain of the maximum-likelihood smoother by p, k and g, respectively. Show that

$H_1(s) = k(s - a + kc)^{-1}$,

$H_2(s) = cgk(-s - a + g)^{-1}(s - a + kc)^{-1}$,

$H_3(s) = kc(-a + kc)(s - a + kc)^{-1}(-s - a + kc)^{-1}$,

$H_4(s) = \left((-a + kc)^2 - (-a + kc - k)^2\right)(s - a + kc)^{-1}(-s - a + kc)^{-1}$

are the transfer functions of the Kalman filter, maximum-likelihood smoother, the Fraser-Potter smoother and the minimum-variance smoother, respectively.
Problem 8.

(i) Develop a state-space formulation of an approximate Wiener-Hopf factor for the case when the plant includes a nonzero direct feedthrough matrix (that is, D(t) ≠ 0).

(ii) Use the Matrix Inversion Lemma to obtain the inverse of the approximate Wiener-Hopf factor for the minimum-variance smoother.
6.8 Glossary

$p(x(t))$ – Probability density function of a continuous random variable x(t).

$x(t) \sim \mathcal{N}(\mu, R_{xx})$ – The random variable x(t) has a normal distribution with mean μ and covariance $R_{xx}$.

$f(x(t))$ – Cumulative distribution function or likelihood function of x(t).

$\hat{x}(t\,|\,t+\tau)$ – Estimate of x(t) at time t given data at fixed time lag τ.

$\hat{x}(t\,|\,T)$ – Estimate of x(t) at time t given data over a fixed interval T.

$\hat{w}(t\,|\,T)$ – Estimate of w(t) at time t given data over a fixed interval T.

$G(t)$ – Gain of the maximum-likelihood smoother developed by Rauch, Tung and Striebel.

$\mathcal{R}_{ei}$ – A linear system that operates on the inputs $i = \begin{bmatrix} v^T & w^T \end{bmatrix}^T$ and generates the output estimation error.

$C^{\dagger}(t)$ – Moore-Penrose pseudoinverse of C(t).

Δ – The Wiener-Hopf factor, which satisfies $\Delta\Delta^H = \mathcal{G}_2Q\mathcal{G}_2^H + R$.

$\hat{\Delta}$ – Approximate Wiener-Hopf factor.

$\hat{\Delta}^H$ – The adjoint of $\hat{\Delta}$.

$\hat{\Delta}^{-H}$ – The inverse of $\hat{\Delta}^H$.

$\{\hat{\Delta}^{-H}\}_+$ – The causal part of $\hat{\Delta}^{-H}$.
6.9 References

[1] J. S. Meditch, "A Survey of Data Smoothing for Linear and Nonlinear Dynamic Systems", Automatica, vol. 9, pp. 151 – 162, 1973.

[2] T. Kailath, "A View of Three Decades of Linear Filtering Theory", IEEE Transactions on Information Theory, vol. 20, no. 2, pp. 146 – 181, Mar. 1974.

[3] H. E. Rauch, F. Tung and C. T. Striebel, "Maximum Likelihood Estimates of Linear Dynamic Systems", AIAA Journal, vol. 3, no. 8, pp. 1445 – 1450, Aug. 1965.

[4] D. C. Fraser and J. E. Potter, "The Optimum Linear Smoother as a Combination of Two Optimum Linear Filters", IEEE Transactions on Automatic Control, vol. AC-14, no. 4, pp. 387 – 390, Aug. 1969.

[5] J. S. Meditch, "On Optimal Fixed Point Linear Smoothing", International Journal of Control, vol. 6, no. 2, pp. 189 – 199, 1967.

[6] A. P. Sage and J. L. Melsa, Estimation Theory with Applications to Communications and Control, McGraw-Hill Book Company, New York, 1971.

[7] J. B. Moore, "Fixed-Lag Smoothing Results for Linear Dynamical Systems", A.T.R., vol. 7, no. 2, pp. 16 – 21, 1973.

[8] G. A. Einicke, "Asymptotic Optimality of the Minimum-Variance Fixed-Interval Smoother", IEEE Transactions on Signal Processing, vol. 55, no. 4, pp. 1543 – 1547, Apr. 2007.

[9] G. A. Einicke, J. C. Ralston, C. O. Hargrave, D. C. Reid and D. W. Hainsworth, "Longwall Mining Automation, An Application of Minimum-Variance Smoothing", IEEE Control Systems Magazine, vol. 28, no. 6, pp. 28 – 37, Dec. 2008.

[10] G. A. Einicke, "A Solution to the Continuous-Time H-infinity Fixed-Interval Smoother Problem", IEEE Transactions on Automatic Control, vol. 54, no. 12, pp. 2904 – 2908, Dec. 2009.

[11] D. J. N. Limebeer, B. D. O. Anderson, P. Khargonekar and M. Green, "A Game Theoretic Approach to H∞ Control for Time-varying Systems", SIAM Journal on Control and Optimization, vol. 30, no. 2, pp. 262 – 283, 1992.

[12] G. Freiling, G. Jank and H. Abou-Kandil, "Generalized Riccati Difference and Differential Equations", Linear Algebra and Its Applications, pp. 291 – 303, 1996.

[13] L. E. Zachrisson, "On Optimal Smoothing of Continuous Time Kalman Processes", Information Sciences, vol. 1, pp. 143 – 172, 1969.

[14] U. Shaked and Y. Theodor, "H∞-Optimal Estimation: A Tutorial", Proc. 31st IEEE Conference on Decision and Control, Tucson, Arizona, pp. 2278 – 2286, Dec. 1992.

[15] J. S. Meditch, "On Optimal Linear Smoothing Theory", Information and Control, vol. 10, pp. 598 – 615, 1967.

[16] P. S. Maybeck, Stochastic Models, Estimation and Control, Vol. 2, Academic Press, New York, 1982.

[17] T. Kailath, A. H. Sayed and B. Hassibi, Linear Estimation, Pearson Prentice Hall, New Jersey, 2000.

[18] B. D. O. Anderson, "Properties of Optimal Linear Smoothing", IEEE Transactions on Automatic Control, vol. 14, no. 1, pp. 114 – 115, Feb. 1969.

[19] S. Chirarattananon and B. D. O. Anderson, "Stable Fixed-Lag Smoothing of Continuous-time Processes", IEEE Transactions on Information Theory, vol. 20, no. 1, pp. 25 – 36, Jan. 1974.

[20] A. Gelb, Applied Optimal Estimation, The Analytic Sciences Corporation, USA, 1974.

[21] L. Ljung and T. Kailath, "A Unified Approach to Smoothing Formulas", Automatica, vol. 12, pp. 147 – 157, 1976.

[22] T. Kailath and P. Frost, "An Innovations Approach to Least-Squares Estimation Part II: Linear Smoothing in Additive White Noise", IEEE Transactions on Automatic Control, vol. 13, no. 6, pp. 655 – 660, 1968.

[23] F. L. Lewis, L. Xie and D. Popa, Optimal and Robust Estimation With an Introduction to Stochastic Control Theory, Second Edition, CRC Press, Taylor & Francis Group, 2008.

[24] F. A. Badawi, A. Lindquist and M. Pavon, "A Stochastic Realization Approach to the Smoothing Problem", IEEE Transactions on Automatic Control, vol. 24, no. 6, Dec. 1979.

[25] J. A. Rice, Mathematical Statistics and Data Analysis, Second Edition, Duxbury Press, Belmont, California, 1995.

[26] R. G. Brown and P. Y. C. Hwang, Introduction to Random Signals and Applied Kalman Filtering, Second Edition, John Wiley & Sons, New York, 1992.

[27] M. Green and D. J. N. Limebeer, Linear Robust Control, Prentice-Hall, Englewood Cliffs, New Jersey, 1995.