Academia.eduAcademia.edu

Chapter 6 Continuous-time Smoothing

2019, Smoothing, Filtering and Prediction: Estimating the Past, Present and Future Second Edition

The previously-described minimum-mean-square-error and minimum-variance filtering solutions operate on measurements up to the current time. If some processing delay can be tolerated then improved estimation performance can be realised through the use of smoothers. There are three state-space smoothing technique categories, namely, fixed-point, fixed-lag and fixed-interval smoothing. Fixed-point smoothing refers to estimating some linear combination of states at a previous instant in time. In the case of fixed-lag smoothing, a fixed time delay is assumed between the measurement and on-line estimation processes. Fixed-interval smoothing is for retrospective data analysis, where measurements recorded over an interval are used to obtain the improved estimates. Compared to filtering, smoothing has a higher implementation cost, as it has increased memory and calculation requirements. A large number of smoothing solutions have been reported since Wiener’s and Kalman’s development of the optimal filtering results – see the early surveys [1] – [2]. The minimum-variance fixed-point and fixed-lag smoother solutions are well known. Two fixed-interval smoother solutions, namely the maximum-likelihood smoother developed by Rauch, Tung and Striebel [3], and the two-filter Fraser-Potter formula [4], have been in widespread use since the 1960s. However, the minimum-variance fixed-interval smoother is not well known. This smoother is simply a time-varying state-space generalisation of the optimal Wiener solution. It differs from the Rauch-Tung-Striebel and Fraser-Potter solutions, which may not sit well with more orthodox practitioners. The main approaches for continuous-time fixed-point, fixed-lag and fixed-interval smoothing are canvassed here. It is assumed throughout that the underlying noise processes are zero mean and uncorrelated. Nonzero means and correlated processes can be handled using the approaches of Chapters 3 and 4. It is also assumed here that the noise statistics and state-space model parameters are known precisely. Note that techniques for estimating parameters and accommodating uncertainty are addressed subsequently. Some prerequisite concepts, namely time-varying adjoint systems, backwards differential equations, Riccati equation comparison and the continuous-time maximum-likelihood method are covered in Section 6.2. Section 6.3 outlines a derivation of the fixed-point smoother by Meditch [5]. The fixed-lag smoother reported by Sage et al [6] and Moore [7], is the subject of Section 4. Section 5 deals with the Rauch-Tung-Striebel [3], Fraser-Potter [4] and minimum-variance fixed-interval smoother solutions [8] - [10]. As before, the approach here is to accompany the developments, where appropriate, with proofs about performance being attained. Smoothing is not a panacea for all ills. If the measurement noise is negligible then smoothing (and filtering) may be superfluous. Conversely, if measurement noise obliterates the signals then data recovery may not be possible. Therefore, estimator performance is often discussed in terms of the prevailing signal-to-noise ratio.

Smoothing, Filtering and Prediction: Estimating the Past, Present and Future Second Edition Garry A. Einicke Prime Publishing – Amazon.com Second Edition published February 2019 Copyright © 2019 Garry A. Einicke All rights reserved. No part of this publication may be reproduced or used in any manner without acknowledgment and attribution of the author. ISBN 978-0-6485115-1-9 Kindle format ISBN 978-0-6485115-0-2 Printable pdf format G. A. Einicke, Smoothing, Filtering and Prediction: Estimating the Past, Present and Future (2nd ed.), Prime Publishing, 2019 6. Continuous-time Smoothing 6.1 Introduction 137 The previously-described minimum-mean-square-error and minimum-variance filtering solutions operate on measurements up to the current time. If some processing delay can be tolerated then improved estimation performance can be realised through the use of smoothers. There are three state-space smoothing technique categories, namely, fixed-point, fixed-lag and fixed-interval smoothing. Fixed-point smoothing refers to estimating some linear combination of states at a previous instant in time. In the case of fixed-lag smoothing, a fixed time delay is assumed between the measurement and on-line estimation processes. Fixedinterval smoothing is for retrospective data analysis, where measurements recorded over an interval are used to obtain the improved estimates. Compared to filtering, smoothing has a higher implementation cost, as it has increased memory and calculation requirements. A large number of smoothing solutions have been reported since Wiener’s and Kalman’s development of the optimal filtering results – see the early surveys [1] – [2]. The minimum-variance fixed-point and fixed-lag smoother solutions are well known. Two fixed-interval smoother solutions, namely the maximum-likelihood smoother developed by Rauch, Tung and Striebel [3], and the two-filter FraserPotter formula [4], have been in widespread use since the 1960s. However, the minimum-variance fixed-interval smoother is not well known. This smoother is simply a time-varying state-space generalisation of the optimal Wiener solution. It differs from the Rauch-Tung-Striebel and Fraser-Potter solutions, which may not sit well with more orthodox practitioners. The main approaches for continuous-time fixed-point, fixed-lag and fixed-interval smoothing are canvassed here. It is assumed throughout that the underlying noise processes are zero mean and uncorrelated. Nonzero means and correlated processes can be handled using the approaches of Chapters 3 and 4. It is also assumed here that the noise statistics and state-space model parameters are known precisely. Note that techniques for estimating parameters and accommodating uncertainty are addressed subsequently. “Life has got a habit of not standing hitched. You got to ride it like you find it. You got to change with it. If a day goes by that don’t change some of your old notions for new ones,that is just about like trying to milk a dead cow.” Woodrow Wilson Guthrie Chapter 6 Continuous-Time Smoothing 138 Some prerequisite concepts, namely time-varying adjoint systems, backwards differential equations, Riccati equation comparison and the continuous-time maximum-likelihood method are covered in Section 6.2. Section 6.3 outlines a derivation of the fixed-point smoother by Meditch [5]. The fixed-lag smoother reported by Sage et al [6] and Moore [7], is the subject of Section 4. 
Section 5 deals with the Rauch-Tung-Striebel [3], Fraser-Potter [4] and minimum-variance fixed-interval smoother solutions [8] - [10]. As before, the approach here is to accompany the developments, where appropriate, with proofs about performance being attained. Smoothing is not a panacea for all ills. If the measurement noise is negligible then smoothing (and filtering) may be superfluous. Conversely, if measurement noise obliterates the signals then data recovery may not be possible. Therefore, estimator performance is often discussed in terms of the prevailing signal-to-noise ratio. 6.2 Prerequisites 6.2.1 Time-varying Adjoint Systems Since fixed-interval smoothers employ backward processes, it is pertinent to introduce the adjoint of a time-varying continuous-time system. Let  denote a linear time-varying system x (t )  A(t ) x(t )  B(t ) w(t ) , y (t )  C (t ) x(t )  D(t ) w(t ) , (1) (2) operating on the interval [0, T]. Let w denote the set of w(t) over all time t, that is, w = {w(t), t  [0, T]}. Similarly, let y =  w denote {y(t), t  [0, T]}. The adjoint of  , denoted by  H , is the unique linear system satisfying <y,  w> =<  H y, w> (3) for all y   q and w   p . Lemma 1: The adjoint  having the realisation H of the system  described by (1) – (2), with x(t0) = 0,  (t )   AT (t ) (t )  C T (t )u (t ) , z (t )  B (t ) (t )  D (t )u (t ) , T T (4) (5) with  (T )  0 , satisfies (3). “The simple faith in progress is not a conviction belonging to strength, but one belonging to acquiescence and hence to weakness.” Norbert Wiener G. A. Einicke, Smoothing, Filtering and Prediction: Estimating the Past, Present and Future (2nd ed.), Prime Publishing, 2019 139 The proof follows mutatis mutandis from that of Lemma 1 of Chapter 3 and is set out in [11]. The original system (1) – (2) needs to be integrated forwards in time, whereas the adjoint system (4) – (5) needs to be integrated backwards in time. Some important properties of backward systems are discussed in the next section. The simplification D(t) = 0 is assumed below unless stated otherwise. 6.2.2 Backwards Differential Equations The adjoint state evolution (4) is rewritten as  (t )  AT (t ) (t )  C T (t )u (t ) . (6) The negative sign of the derivative within (6) indicates that this differential equation proceeds backwards in time. The corresponding state transition matrix is defined below. Lemma 2: The differential equation (6) has the solution t  (t )   H (t , t0 ) (t0 )    H ( s, t )C T ( s)u ( s )ds , t0 (7) where the adjoint state transition matrix,  H (t , t0 ) , satisfies H  H (t , t )  d  (t , t0 )   AT (t ) H (t , t ) ,  0 0 dt (8) with boundary condition  H (t , t ) = I. (9) Proof: Following the proof of Lemma 1 of Chapter 3, by differentiating (7) and substituting (4) – (5), it is easily verified that (7) is a solution of (6). The Lyapunov equation corresponding to (6) is described next because it is required in the development of backwards Riccati equations. Lemma 3: In respect of the backwards differential equation (6), assume that u(t) is a zero-mean white process with E{u(t)uT(τ)} = U(t)δ(t – τ) that is uncorrelated with  (t0 ) , namely, E{u (t ) T (t0 )} = 0. 
Then the covariances P(t, τ) = dE{ (t ) T (t )} E{ (t ) T (t )} and P (t , ) = satisfy the Lyapunov differential dt equation “Progress always involves risk; you can’t steal second base and keep your foot on first base.” Frederick James Wilcox Chapter 6 Continuous-Time Smoothing 140  P (t , )  A(t )T P (t , )  P(t , ) A(t )  C T (t )U (t )C (t ) . (10) Proof: The backwards Lyapunov differential equation (10) can be obtained by dE{ (t ) T (t )} using (6) and (7) within = E{(t ) T (t ) +  ( )T ( k )} (see the dt proof of Lemma 2 in Chapter 3). 6.2.3 Comparison of Riccati Equations The following Riccati Equation comparison Theorem is required subsequently to compare the performance of filters and smoothers. Theorem 1 (Riccati Equation Comparison Theorem) [12], [8]: Let P1(t) ≥ 0 and P2(t) ≥ 0 denote solutions of the Riccati differential equations P1 (t )  A1 (t ) P1 (t )  P1 (t ) A1T (t )  P1 (t ) S1 (t ) P1 (t )  B1 (t )Q1 (t ) B1T (t )  B(t )Q (t ) BT (t ) (11) and P2 (t )  A2 (t ) P2 (t )  P2 (t ) A2T (t )  P2 (t ) S 2 (t ) P2 (t )  B2 (t )Q2 (t ) B2T (t )  B (t )Q (t ) B T (t ) (12) with S1(t) = C1T (t ) R11 (t )C1 (t ) , S2(t) = C2T (t ) R21 (t )C2 (t ) , where A1(t), B1(t), C1(t), Q1(t) ≥ 0, R1(t) ≥ 0, A2(t), B2(t), C2(t), Q2(t) ≥ 0 and R2(t) ≥ 0 are of appropriate dimensions. If (i) P1(t0) ≥ P2(t0) for a t0 ≥ 0 and  Q (t ) A1 (t )   Q (t ) A2 (t )  (ii)  T1 ≥  T2   for all t ≥ t0.  A1 (t )  S1 (t )   A2 (t )  S 2 (t )  Then P1(t) ≥ P2(t) (13) for all t ≥ t0. Proof: Condition (i) of the Theorem is the initial step of an induction argument. For the induction step, denote P3 (t ) = P1 (t ) − P2 (t ) , P3(t) = P1(t) − P2(t) and A(t )   A1T (t ) + S1 (t ) P2 (t ) − 0.5S1 (t ) P3 (t ) . Then “The price of doing the same old thing is far higher than the price of change.” William Jefferson (Bill) Clinton G. A. Einicke, Smoothing, Filtering and Prediction: Estimating the Past, Present and Future (2nd ed.), Prime Publishing, 2019 141 P3 (t )  A(t ) P3 (t )  P3 (t ) AT (t )   Q (t ) A1 (t )  Q2 (t ) A2 (t )    I  P2 (t )    1t  t     A1 (t )  S1 (t )   A2 (t )  S 2 (t )    P2 (t )  I which together with condition (ii) yields P3 (t )  A(t ) P3 (t )  P3 (t ) AT (t ) . (14) Lemma 5 of Chapter 3 and (14) imply P3 (t ) ≥ 0 and the claim (13) follows. 6.2.4 The Maximum-Likelihood Method Rauch, Tung and Streibel famously derived their fixed-interval smoother [3] using a maximum-likelihood technique which is outlined as follows. Let x(t) ~  (  , Rxx) denote a continuous random variable having a Gaussian (or normal) distribution within mean E{x(t)} = μ and covariance E{(x(t) – μ)(x(t) – μ)T} = Rxx. The continuous-time Gaussian probability density function of x(t)   n is defined by p( x(t ))  1 (2 ) n/ 2 Rxx 1/ 2 exp 0.5( x(t )   )T Rxx1 ( x(t )   ) , (15) in which |Rxx| denotes the determinant of Rxx. The probability that the continuous random variable x(t) with a given probability density function p(x(t)) lies within an interval [a, b] is given by the likelihood function (which is also known as the cumulative distribution function) b P(a  x (t )  b)   p ( x(t ))dx . (16) a The Gaussian likelihood function for x(t) is calculated from (15) and (16) as f ( x(t ))  1 (2 ) n/ 2 Rxx 1/ 2    exp 0.5( x(t )   )T Rxx1 ( x(t )   ) dx . 
(17) It is often more convenient to work with the log-probability density function log p( x(t ))   log (2 ) n / 2 Rxx 1/ 2  0.5( x(t )   )T Rxx1 ( x   )dx (18) and the log-likelihood function “Faced with the choice between changing one’s mind and proving that there is no need to do so, almost everyone gets busy on the proof.” John Kenneth Galbraith Chapter 6 Continuous-Time Smoothing 142 log f ( x(t ))   log (2 ) n / 2 Rxx   0.5 ( x(t )   )T Rxx1 ( x   )dx. 1/ 2 (19)  Suppose that a given record of x(t) is assumed to be belong to a Gaussian distribution that is a function of an unknown quantity θ. A statistical approach for estimating the unknown θ is the method of maximum likelihood. This typically involves finding an estimate ˆ that either maximises the log-probability density function ˆ  arg max log p( | x(t )) (20)  or maximises the log-likelihood function ˆ  arg max  log f ( | x (t )) . (21) So-called maximum likelihood estimates can be found by setting either  log p ( | x(t ))  log f ( | x(t )) or to zero and solving for the unknown θ.   Continuous-time maximum likelihood estimation is illustrated by the two examples that follow. Example 1. Consider the first-order autoregressive system x (t )   a0 x(t )  w(t ) , (22) dx(t ) , w(t) is a zero-mean Gaussian process and a0 is unknown. It dt follows from (22) that x (t ) ~  (a0 x(t ),  w2 ) , namely, where x (t ) = f ( x (t ))  1 (2 ) n / 2  w  T 0 exp 0.5( x (t )  a0 x(t )) 2  w2  dt . (23) Taking the logarithm of both sides gives T log f ( x (t ))   log (2 ) n / 2  w  0.5 w2  ( x (t )  a0 x(t )) 2 dt . (24) 0 Setting  log f ( x (t )) = 0 results in a0  T 0 ( x (t )  a0 x(t )) x(t )dt = 0 and hence “When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is probably wrong.” Arthur Charles Clarke 143 G. A. Einicke, Smoothing, Filtering and Prediction: Estimating the Past, Present and Future (2nd ed.), Prime Publishing, 2019 aˆ0    T 0 ( x 2 (t )dt  1  T 0 x (t ) x (t )dt . (25) Example 2. Consider the third-order autoregressive system  x (t )  a2  x(t )  a1 x (t )  a0 x (t )  w(t ) (26) d 3 x(t ) d 2 x(t ) and  x(t ) = . The above system can be written in a 3 dt dt 2 controllable canonical form as where  x (t ) =  x1 (t )   a2  x (t )    1  2    x3 (t )   0 a1 0 1  a0   x1 (t )   w(t )  0   x2 (t )    0  . 0   x3 (t )   0  (27) Assuming x1 (t ) ~  (a2 x1 (t )  a1 x2 (t )  a0 x3 (t ),  w2 ) , taking logarithms, setting to zero the partial derivatives with respect to the unknown coefficients, and rearranging yields 1 T T T  T x 2 dt x2 x3 dt  x1 x3 dt    x1 x3 dt  3   0 0  0   0   aˆ0  T T   T   aˆ     T x x dt 2  0 2 3 0 x2 dt 0 x2 x1dt   0 x1 x2 dt  ,  1  T   T   aˆ2  T T   x1 x3 dt  x2 x1dt  x12 dt    x1 x1dt  0 0  0   0  in which state time dependence is omitted for brevity. 6.3 (28) Fixed-Point Smoothing 6.3.1 Problem Definition In continuous-time fixed-point smoothing, it is desired to calculate state estimates at one particular time of interest, τ, 0 ≤ τ ≤ t, from measurements z(t) over the interval t  [0, T]. For example, suppose that a continuous measurement stream of a tennis ball’s trajectory is available and it is desired to determine whether it bounced within the court boundary. 
In this case, a fixed-point smoother could be employed to estimate the ball position at the time of the bounce from the past and future measurements. “Don’t be afraid to take a big step if one is indicated. You can’t cross a chasm in two small jumps.” David Lloyd George Chapter 6 Continuous-Time Smoothing 144 A solution for the continuous-time fixed-point smoothing problem can be developed from first principles, for example, see [5] - [6]. However, it is recognised in [13] that a simpler solution derivation follows by transforming the smoothing problem into a filtering problem that possesses an augmented state. Following the nomenclature of [14], consider an augmented state vector having  x (t )  n two components, namely, x(a)(t) =   . The first component, x(t)   , is the  (t )  state of the system x (t ) = A(t)x(t) + B(t)w(t) and y(t) = C(t)x(t). The second component,  (t )   n , equals x(t) at time t = τ, that is,  (t ) = x(τ). The corresponding signal model may be written as x ( a ) (t )  A( a ) (t ) x ( a ) (t )  B ( a ) (t ) w(t ) z (t )  C (t ) x (t )  v(t ) , (a) (a) (29) (30) 0  (a)  A(t )  B(t )  (a) where A(a) =  , B (t) =    and C (t) = [C(t) 0], in which  0  t A(t )   t B(t )  1 if t    t   is the Kronecker delta function. Note that the simplifications 0 if t    A(t ) 0   B (t )  A(a) =  and B(a)(t) =    arise for t > τ. The smoothing objective is to 0  0  0  produce an estimate ˆ(t ) of  (t ) from the measurements z(t) over t  [0, T]. 6.3.2 Solution Derivation Employing the Kalman-Bucy filter recursions for the system (29) – (30) results in xˆ ( a ) (t )  A( a ) (t ) xˆ ( a ) (t )  K ( a ) (t )  z (t )  C ( a ) (t ) xˆ (t | t )    A( a ) (t )  K ( a ) (t )C ( a ) (t )  x ( a ) (t )  K ( a ) (t ) z (t ) , (31) where K ( a ) (t )  P ( a ) (t )(C ( a ) )T (t ) R 1 (t ) , in which P(a)(t)   2 n  2n (32) is to be found. Consider the partitioning K ( a ) (t ) =  K (t )   K (t )  , then for t > τ, (31) may be written as   “If you don’t like change, you’re going to like irrelevance even less.” General Eric Shinseki G. A. Einicke, Smoothing, Filtering and Prediction: Estimating the Past, Present and Future (2nd ed.), Prime Publishing, 2019  xˆ(t | t )   A(t )  K (t )C (t ) 0   xˆ (t | t )   K (t )   z (t ) .  ˆ  0    (t )   K (t )    (t )    K (t )C (t ) 145 (33) Define the augmented error state as x ( a ) (t ) = x ( a ) (t ) − xˆ ( a ) (t ) , that is,  x (t | t )   x(t )   xˆ(t | t )    (t )    ( )    ˆ  .       (t )  (34) Differentiating (34) and using z(t) = C (t ) x (t | t ) + v(t) gives  x (t | t )   x (t | t )   xˆ (t | t )         (t )   0   ˆ(t )   A(t )  K (t )C (t ) 0   x (t | t )   B (t )  K (t )   w(t )    . 0    (t )   0  K (t )   v(t )    K (t )C (t ) (35)  P(t ) T (t )  T Denote P(a)(t) =   , where P(t) = E{{x(t) − xˆ(t | t )][ x(t ) − xˆ(t | t )] } ,   (t )  (t )  (t ) = E{[ (t ) − ˆ(t )][ (t ) − ˆ(t )]T } and (t ) = E{[ (t ) − ˆ(t )][ x(t ) − xˆ(t | t )]T } . Applying Lemma 2 of Chapter 3 to (35) yields the Lyapunov differential equation  P (t )  T (t )   A(t )  K (t )C (t ) 0   P (t ) T (t )         0   (t ) (t )   (t ) (t )    K (t )C (t )  P (t ) T (t )   AT (t )  C T (t ) K T (t ) C T (t ) K T (t )     0 0  (t ) (t )    0   BT (t ) 0   B(t )  K (t )  Q (t )   T . 
    K (t )   0 R (t )    K (t )  K T (t )   0 Simplifying the above differential equation yields P (t )  A(t ) P(t )  P(t ) AT (t )  P (t )C T (t ) R 1 (t )C (t ) P(t )  B (t )Q (t ) BT (t ) ,  (t )  (t )  AT (t )  C T (t ) K T (t )  ,  (t )   (t )C T (t ) R 1 (t )C (t )T (t ) .  (36) (37) (38) Equations (37) – (38) can be initialised with “The great virtue of my radicalism lies in the fact that I am perfectly ready, if necessary, to be radical on the conservative side” Theodore Roosevelt Chapter 6 Continuous-Time Smoothing 146 ( )  P( ) . (39) Thus, the fixed-point smoother estimate is given by ˆ(t )  (t )C T (t ) R 1 (t )  z (t )  C (t ) xˆ (t | t )  , (40) which is initialised with ˆ( ) = xˆ( ) . Alternative derivations of (40) are presented in [5], [8], [15]. The smoother (40) and its associated error covariances (36) – (38) are also discussed in [16], [17]. 6.3.3 Performance It can be seen that the right-hand-side of the smoother error covariance (38) is non-positive and therefore Ω(t) must be monotonically decreasing. That is, the smoothed estimates improve with time. However, since the right-hand-side of (36) varies inversely with R(t), the improvement reduces with decreasing signal-tonoise ratio. It is shown below the fixed-point smoother improves on the performance of the minimum-variance filter. Lemma 4: In respect of the fixed-point smoother (40), P(t )  (t ) . (41) Proof: The initialisation (39) accords with condition (i) of Theorem 1. Condition (ii) of the theorem is satisfied since A(t )  Q (t )   C T (t ) R 1 (t )C (t )0 0    AT (t ) C T (t ) R 1 (t )C (t )    0 0    and hence the claim (41) follows. 6.4 Fixed-Lag Smoothing 6.4.1 Problem Definition For continuous-time estimation problems, as usual, it assumed that the observations are modelled by x (t ) = A(t)x(t) + B(t)w(t), z(t) = C(t)x(t) + v(t), with E{w(t ) wT ( )} = Q(t ) (t   ) and E{v(t )vT ( )} = R(t ) (t   ) . In fixed-lag smoothing, it is desired to calculate state estimates at a fixed time lag behind the “Change will not come if we wait for some other person or some other time. We are the ones we’ve been waiting for. We are the change that we seek.” Barack Hussein Obama II G. A. Einicke, Smoothing, Filtering and Prediction: Estimating the Past, Present and Future (2nd ed.), Prime Publishing, 2019 147 current measurements. That is, smoothed state estimates, xˆ(t | t   ) , are desired at time t, given data at time t + τ, where τ is a prescribed lag. In particular, fixed-lag smoother estimates are sought which minimise E{[x(t) − xˆ(t | t   )][ x(t ) − xˆ(t | t   )]T } . It is found in [18] that the smoother yields practically all the improvement over the minimum-variance filter when the smoothing lag equals several time constants associated with the minimum-variance filter for the problem. 6.4.2 Solution Derivation Previously, augmented signal models together with the application of the standard Kalman filter recursions were used to obtain the smoother results. However, as noted in [19], it is difficult to derive the optimal continuous-time fixed-lag smoother in this way because an ideal delay operator cannot easily be included within an asymptotically stable state-space system. Consequently, an alternate derivation based on that in [6] is outlined in the following. Recall that the gain of the minimum-variance filter is calculated as K(t) = P(t )C T (t ) R 1 (t ) , where P(t) is the solution of the Riccati equation (36). 
Let  ( , t ) denote the transition matrix of the filter error system x (t | t ) = (A(t) – K (t )C (t )) x (t | t ) + B(t ) w(t ) − K (t )v(t ) , that is,  (t , s)   A( )  K ( )C ( )  (t , s )  (42) and ( s, s ) = I. It is assumed in [6], [17], [18], [20] that a smoothed estimate xˆ(t | t   ) of x(t) is obtained as xˆ (t | t   )  xˆ (t )  P(t ) (t , ) , (43) where  (t , t   )   t  t T ( , t )C T ( )R 1 ( )  z ( )  C ( ) xˆ ( |  )  d . (44) The formula (43) appears in the development of fixed interval smoothers [21] [22], in which case ξ(t) is often called an adjoint variable. From the use of Leibniz’ rule, that is, b(t )  d b (t ) db(t ) da (t ) f (t , s ) ds  f (t , b(t ))  f (t , a(t ))  f (t , s )ds ,  a ( t ) a ( t ) t dt dt dt “Change is like putting lipstick on a bulldog. The bulldog’s appearance hasn’t improved, but now it’s really angry.” Rosbeth Moss Kanter Chapter 6 Continuous-Time Smoothing 148 it can be found that  (t , t   )  T (t   )C T (t   ) R 1 (t   )  z (t   )  C (t   ) xˆ (t   ) | t   )  (45) C T (t ) R 1 (t )  z (t )  C (t ) xˆ (t | t )    A(t )  K (t )C (t )   (t , t   ) . T Differentiating (43) with respect to t gives xˆ (t | t   )  xˆ (t | t )  P (t ) (t , )  P(t )(t , ) . (46) Substituting  (t , ) = P 1 (t )  xˆ (t | t   )  xˆ (t | t )  and expressions for xˆ(t ) , P (t ) , (t , t   ) into (43) yields the fixed–lag smoother differential equation xˆ (t | t   )  A(t ) xˆ (t | t   )  B(t )Q (t ) BT (t ) P 1 (t )  xˆ (t | t   )  xˆ (t | t )   P (t )T (t   , t )C T (t   ) R 1 (t   )  z (t   )  C (t   ) xˆ (t   | t   )  . (47) 6.4.3 Performance Lemma 5 [18]: P(t) – E{[x(t) − xˆ(t | t   )][ x(t ) − xˆ(t | t   )]T } > 0. (48) Proof. It is argued from the references of [18] for the fixed-lag smoothed estimate that E{[ x(t )  xˆ (t | t   )][ x(t )  xˆ (t | t   )]T }   P(t )  P (t )  T ( s, t )C T ( s ) R 1 ( s )C ( s) ( s, t ) ds P (t ) . t (49) Thus, (48) follows by inspection of (49). That is to say, the minimum-variance filter error covariance is greater than fixedlag smoother error covariance. It is also argued in [18] that (48) implies the error covariance decreases monotonically with the smoother lag τ. “An important scientific innovation rarely makes its way by gradually winning over and converting its opponents: What does happen is that the opponents gradually die out.” Max Karl Ernst Ludwig Planck G. A. Einicke, Smoothing, Filtering and Prediction: Estimating the Past, Present and Future (2nd ed.), Prime Publishing, 2019 6.5 149 Fixed-Interval Smoothing 6.5.1 Problem Definition Many data analyses occur off-line. In medical diagnosis for example, reviews of ultra-sound or CAT scan images are delayed after the time of measurement. In principle, smoothing could be employed instead of filtering for improving the quality of an image sequence. Fixed-lag smoothers are elegant – they can provide a small performance improvement over filters at moderate increase in implementation cost. The best performance arises when the lag is sufficiently large, at the expense of increased complexity. Thus, the designer needs to trade off performance, calculation cost and delay. Fixed-interval smoothers are a brute-force solution for estimation problems. They provide improved performance without having to fine tune a smoothing lag, at the cost of approximately twice the filter calculation complexity. 
Fixed interval smoothers involve two passes. Typically, a forward process operates on the measurements. Then a backward system operates on the results of the forward process. The plants are again assumed to have state-space realisations of the form x (t ) = A(t)x(t) + B(t)w(t) and y(t) = C(t)x(t) + D(t)w(t). Smoothers are considered which operate on measurements z(t) = y(t) + v(t) over a fixed interval t  [0, T]. The performance criteria depend on the quantity being estimated, viz.,  in input estimation, the objective is to calculate a wˆ (t | T ) that minimises  E{[w(t) − wˆ (t | T )][ w(t ) − wˆ (t | T )]T } ; in state estimation, xˆ(t | T ) is calculated which achieves the minimum  E{[x(t) − xˆ(t | T )][ x(t ) − xˆ(t | T )]T } ; and in output estimation, yˆ(t | T ) is produced such that E{[y(t) − yˆ(t | T )][ y (t ) − yˆ(t | T )]T } is minimised. This section focuses on three continuous-time fixed-interval smoother formulations; the maximum-likelihood smoother derived by Rauch, Tung and Streibel [3], the Fraser-Potter smoother [4] and a generalisation of Wiener’s optimal unrealisable solution [8] – [10]. Some additional historical background to [3] – [4] is described within [1], [2], [17]. “The soft-minded man always fears change. He feels security in the status quo, and he has an almost morbid fear of the new. For him, the greatest pain is the pain of a new idea.” Martin Luther King Jr. Chapter 6 Continuous-Time Smoothing 150 6.5.2 The Maximum Likelihood Smoother 6.5.2.1 Solution Derivation Rauch, Tung and Streibel [3] employed the maximum-likelihood method to develop a discrete-time smoother for state estimation and then used a limiting argument to obtain a continuous-time version. A brief outline of this derivation is set out here. Suppose that a record of filtered estimates, xˆ( |  ) , is available over a fixed interval τ  [0, T]. Let xˆ( | T ) denote smoothed state estimates at time 0 ≤ τ ≤ T to be evolved backwards in time from filtered states xˆ( |  ) . The smoother development is based on two assumptions. First, it is assumed that  xˆ( | T ) is normally distributed with mean A( ) xˆ ( | T ) and covariance B(τ)Q(τ)BT(τ), that is,  xˆ( | T ) ~  ( A( ) xˆ ( | T ), B(τ)Q(τ)BT(τ)). The probability density function of  xˆ( | T ) is p( xˆ ( | T ) | xˆ ( | T ))  1 (2 ) n/ 2 B( )Q ( ) BT ( ) 1/ 2    exp 0.5( xˆ ( | T )  A( ) xˆ ( | T ))T ( B ( )Q ( ) BT ( )) 1 ( xˆ ( | T )  A( ) xˆ ( | T )) Second, it is assumed that xˆ( | T ) is normally distributed with mean xˆ( |  ) and covariance P(τ), namely, xˆ( | T ) ~ N ( xˆ ( |  ) , P(τ)). The corresponding probability density function is p ( xˆ ( | T ) | xˆ ( |  ))  1 (2 ) n/ 2 P( ) 1/ 2  exp 0.5( xˆ ( | T )  xˆ ( |  ))T P 1 (t )( xˆ ( | T )  xˆ ( |  )). From the approach of [3] and the further details in [6],  log p( xˆ ( | T ) | xˆ ( | T )) p ( xˆ ( | T ) | xˆ ( |  )) xˆ( | T )  log p( xˆ ( | T ) | xˆ ( | T ))  log p( xˆ (t | T ) | xˆ (t | t )) = + xˆ( | T ) xˆ(t | T ) 0= results in “If you want to truly understand something, try to change it.” Kurt Lewin G. A. Einicke, Smoothing, Filtering and Prediction: Estimating the Past, Present and Future (2nd ed.), Prime Publishing, 2019 0 151  ( xˆ ( | T )  A( ) xˆ ( | T ))T xˆ( | T ) ( B ( )Q( ) BT ( )) 1 (  xˆ ( | T )  A( ) xˆ ( | T ))  P 1 ( )( xˆ (t | T )  xˆ ( |  )) . 
Hence, the solution is given by  xˆ ( | T )  A( ) xˆ ( | T )  G( )  xˆ ( | T )  xˆ ( |  )  , (50) where G ( )   B( )Q( ) BT ( )  ( xˆ ( | T )  A( ) xˆ ( | T ))T 1 P ( ) xˆ( | T ) (51) is the smoother gain. Suppose that xˆ( | T ) , A(τ), B(τ), Q(τ), P−1(τ) are sampled at integer k multiples of Ts and are constant during the sampling interval. Using the xˆ ((k  1)Ts | T )  xˆ (kTs | T ) Euler approximation  xˆ( kTs | T ) = , the sampled gain Ts may be written as G (kTs )  B ( kTs )Ts1Q ( kTs ) BT ( kTs )( I  ATs ) P 1 (kTs ) . (52) Recognising that Ts1Q (kTs ) = Q(τ), see [23], and taking the limit as Ts → 0 and yields G ( )  B( )Q( ) BT ( ) P 1 ( ) . (53) To summarise, the above fixed-interval smoother is realised by the following twopass procedure. (i) In the first pass, the (forward) Kalman-Bucy filter operates on measurements z(τ) to obtain state estimates xˆ( |  ) . (ii) In the second pass, the differential equation (50) operates on the filtered state estimates xˆ( |  ) to obtain smoothed state estimates xˆ( | T ) . Equation (50) is integrated backwards in time from the initial condition xˆ( | T ) = xˆ( |  ) at τ = T. Alternative derivations of this smoother appear in [6], [20], [23], [24]. “There is a certain relief in change, even though it be from bad to worse. As I have often found in travelling in a stagecoach that it is often a comfort to shift one’s position, and be bruised in a new place.” Washington Irving Chapter 6 Continuous-Time Smoothing 152 6.5.2.2 Alternative Form For the purpose of developing an alternate form of the above smoother found in the literature, consider a fictitious forward version of (50), namely, xˆ (t | T )  A(t ) xˆ (t | T )  B(t )Q (t ) BT (t ) P 1 (t )  xˆ (t | T )  xˆ (t | t )   A(t ) xˆ (t | T )  B (t )Q(t ) BT (t ) (t | T ) , (54)  (t | T )  P 1 (t )( xˆ (t | T )  xˆ (t | t )) (55) where is an auxiliary variable. An expression for the evolution of  (t | T ) is now developed. Writing (55) as xˆ ( | T )  xˆ (t | t )  P (t ) (t | T ) (56) and taking the time differential results in xˆ (t | T )  xˆ (t | t )  P (t ) (t | T )  P(t )(t | T ) . (57) Substituting xˆ(t | t ) = A(t ) xˆ (t | t ) + P(t )C T (t ) R 1 ( z (t ) − C (t ) xˆ (t | t )) into (57) yields P(t )(t | T )  P(t )C T (t ) R 1C (t )  P(t )C T (t ) R 1 z (t )  A(t ) P(t ) (t )  B (t )Q(t ) BT (t )  P (t ) (t ). (58) P(t ) (t | T ) ,  P(t ) AT (t ) = A(t ) P (t ) – P(t )C T (t ) R 1 (t )C (t ) P (t ) (t | T ) + B(t )Q(t ) BT (t ) – P (t ) within (58) and rearranging gives Using xˆ(t | t ) = xˆ(t | T ) – (t | T )  C T (t ) R 1 (t )C (t ) xˆ (t | T )  AT (t ) (t )  C T (t ) R 1 (t ) z (t ) . (59) The filter (54) and smoother (57) may be collected together as  xˆ (t | T )   0 A(t ) B(t )Q (t ) BT (t )   xˆ(t | T )     T    T    1 1 T A (t )   (t | T )  C (t ) R (t ) z (t )    (t | T )   C (t ) R (t )C (t ) (60) “Discontent is the first step in the progress of a man or a nation.” Oscar Fingal O’Flahertie Wills Wilde G. A. Einicke, Smoothing, Filtering and Prediction: Estimating the Past, Present and Future (2nd ed.), Prime Publishing, 2019 153 Equation (60) is known as the Hamiltonian form of the Rauch-Tung-Striebel smoother [17]. 6.5.2.3 Performance In order to develop an expression for the smoothed error state, consider the backwards signal model  x ( )  A( ) x( )  B ( ) w( ) . 
(61) Subtracting (50) from (61) results in  x ( )  xˆ ( | T )  ( A( )  G ( ))( x( )  xˆ ( | T )) G ( )( x( )  xˆ ( |  ))  B ( ) w( ) . (62) Let x ( | T ) = x( ) − xˆ( | T ) denote the smoothed error state and x ( |  ) = x( ) − xˆ( |  ) denote the filtered error state. Then the differential equation (62) can simply be written as  x ( | T )  ( A( )  G ( ))( x ( | T )  G ( ) x ( |  )  B ( ) w( ) , where (63)  x ( | T ) = ( xˆ ( | T )  xˆ( )) . Applying Lemma 3 to (63) and using E{xˆ ( |  ) , wT ( )} = 0 gives  ( | T )  ( A( )  G ( ))( | T )  ( | T )( A( )  G ( ))T  B( )Q( ) B T ( ) , (64) where ( | T ) = E{xˆ ( | T ) , xˆT ( | T )} is the smoother error covariance and d ( | T )  ( | T ) = . The smoother error covariance differential equation (64) is d solved backwards in time from the initial condition ( | T )  P (t | t ) (65) at t = T, where P(t | t ) is the solution of the Riccati differential equation P (t )  ( A(t )  K (t )C (t )) P(t )  P(t )( AT (t )  C T (t ) K T (t ))  K (t ) R (t ) K T (t )  B (t )Q (t ) BT (t )  A(t ) P(t )  P(t ) AT (t )  K (t ) R(t ) K T (t )  B (t )Q(t ) BT (t ) . (66) “That which comes into the world to disturb nothing deserves neither respect nor patience.” Rene Char Chapter 6 Continuous-Time Smoothing 154 It is shown below that this smoother outperforms the minimum-variance filter. For the purpose of comparing the solutions of forward Riccati equations, consider a fictitious forward version of (64), namely,  (t | T )  ( A(t )  G (t ))(t | T )  (t | T )( A(t )  G (t ))T  B(t )Q(t ) BT (t ) (67) initialised with (t0 | T )  P (t0 | t0 )  0 . (68) Lemma 6: In respect of the fixed-interval smoother (50), P(t | t )  (t | T ) . (69) Proof: The initialisation (68) satisfies condition (i) of Theorem 1. Condition (ii) of the theorem is met since  B(t )Q(t ) BT (t )  K (t ) R (t ) K T (t )  AT (t )  C T (t ) K T (t )  A(t )  K (t )C (t )    B (t )Q (t ) BT (t )  T T 0   A (t )  G (t ) A(t )  G (t )   0  for all t ≥ t0 and hence the claim (69) follows. 6.5.3 The Fraser-Potter Smoother The Central Limit Theorem states that the mean of a sufficiently large sample of independent identically distributed random variables will be approximately normally distributed [25]. The same is true of partial sums of random variables. The Central Limit Theorem is illustrated by the first part of the following lemma. A useful generalisation appears in the second part of the lemma. Lemma 7: Suppose that y1, y2, …, yn are independent random variables and W1, W2, … Wn are independent positive definite weighting matrices. Let μ = E{y}, u = y1 + y2 + … + yn and v = (W1y1 + W2y2 + … + Wnyn) (W1 + W2 + … Wn)-1. (i) (ii) (70) If yi ~  (  , R), i = 1 to n, then u ~  (n , nR ); If yi ~  (0, I), i = 1 to n, then v ~  (0, I). Proof: (i) E{u} = E{y1} + E(y2) + … + E{yn} = nμ. E{(u − μ)(u − μ)T} = E{(y1 − μ)(y1 − μ)T} + E{(y2 − μ)(y2 − μ)T} + … + E{(yn − μ)(yn − μ)T} = nR. “Today every invention is received with a cry of triumph which soon turns into a cry of fear.” Bertolt Brecht 155 G. A. Einicke, Smoothing, Filtering and Prediction: Estimating the Past, Present and Future (2nd ed.), Prime Publishing, 2019 (ii) E{v} = W1(W1 + W2 + … + Wn)-1E{y1} + W2(W1 + W2 + … + Wn)-1E(y2) + … + Wn(W1 + W2 + … Wn)-1E{yn}) = 0. 
E{vvT} = E{(W1T + W2T + … + WnT )1W1T y1T y1W1 (W1 + W2 + … WN ) 1} + E{(W1T + W2T + … + WnT )1W2T y2T y2W2 (W1 + W2 + … WN ) 1} + … + E{(W1T + W2T + … + WnT ) 1WnT ynT ynWn (W1 + W2 + … WN ) 1} = (W1T + W2T + … + WnT ) 1 (W1T W1 + W2T W2 + … WnT Wn )(W1 + W2 + … Wn )1 = 1. Fraser and Potter reported a smoother in 1969 [4] that combined state estimates from forward and backward filters using a formula similar to (70) truncated at n = 2. The inverses of the forward and backward error covariances, which are indicative of the quality of the respective estimates, were used as weighting matrices. The combined filter and Fraser-Potter smoother equations are xˆ (t | t )  A(t ) xˆ (t | t )  P (t | t )C T (t ) R 1 (t )( z (t )  C (t ) xˆ (t | t )) , (t | t )  A(t ) (t | t )  (t | t )C T (t ) R 1 (t )( z (t )  C (t ) (t | t )) , xˆ (t | T )  ( P (t | t )   (t | t )) ( P (t | t ) xˆ (t | t )   (t | t ) (t | t )) , 1 1 1 1 1 (71) (72) (73) where P(t | t ) is the solution of the forward Riccati equation P (t | t ) = A(t ) P (t | t ) + P(t | t ) AT (t ) − P(t | t )C T (t ) R 1 (t )C (t ) P (t | t ) + B(t )Q(t ) BT (t ) and (t | t ) is the solution of the backward Riccati equation  (t | t ) = A(t )(t | t ) + (t | t ) AT (t ) − (t | t )C T (t ) R 1 (t )C (t )(t | t ) + B(t )Q(t ) BT (t ) . It can be seen from (72) that the backward state estimates, ζ(t), are obtained by simply running a Kalman filter over the time-reversed measurements. Fraser and Potter’s approach is pragmatic: when the data is noisy, a linear combination of two filtered estimates is likely to be better than one filter alone. However, this twofilter approach to smoothing is ad hoc and is not a minimum-mean-square-error design. 6.5.4 The Minimum-Variance Smoother 6.5.4.1 Problem Definition The previously described smoothers are focussed on state estimation. A different signal estimation problem shown in Fig. 1 is considered here. Suppose that observations z = y2 + v are available, where y2 =   w is the output of a linear timevarying system and v is measurement noise. A solution  is desired which “If there is dissatisfaction with the status quo, good. If there is ferment, so much the better. If there is restlessness, I am pleased. Then let there be ideas, and hard thought, and hard work.” Hubert Horatio Humphrey. Chapter 6 Continuous-Time Smoothing 156 produces estimates ŷ1 of a second reference system y1 =  w in such a way to meet a performance objective. Let y = y1 – ŷ1 denote the output estimation error. The optimum minimum-variance filter can be obtained by finding the solution that   T . Here, in the case of smoothing, the performance objective is to minimises yy 2  minimise yy H 2 . v w  y2    z  ŷ1    y1 y  Fig. 1. The general estimation problem. The objective is to produce estimates ŷ1 of y1 from measurements z. 6.5.4.2 Optimal Unrealisable Solutions The minimum-variance smoother is a more recent innovation [8] - [10] and arises by generalising Wiener’s optimal noncausal solution for the above time-varying problem. The solution is obtained using the same completing-the-squares technique that was previously employed in the frequency domain (see Chapters 1 and 2). It can be seen from Fig. 1 that the output estimation error is generated by y    i , where yi yi     2  1  (74) v  is a linear system that operates on the inputs i =   . 
 w Consider the factorisation  H   2 Q 2H  R , (75) “Restlessness and discontent are the first necessities of progress. Show me a thoroughly satisfied man and I will show you a failure.” Thomas Alva Edison G. A. Einicke, Smoothing, Filtering and Prediction: Estimating the Past, Present and Future (2nd ed.), Prime Publishing, 2019 157 in which the time-dependence of Q(t) and R(t) is omitted for notational brevity. Suppose that Δ is causal, namely Δ and its inverse, Δ−1, are bounded systems that proceed forward in time. The system Δ is known as a Wiener-Hopf factor. Lemma 8: Assume that the Wiener-Hopf factor inverse, Δ-1, exists over t  [0, T]. Then the smoother solution   1Q 2H   H  1  1Q 2H ( H ) 1  minimises yy H  1Q 2H ( 2 Q 2H  R) 1 . 2 = yi  H  yi 2 (76) . Proof: It follows from (74) that yi yiH = 1Q 2H − 1Q 2H H −  2 Q1 H +   H H . Completing the square leads to yi yiH = yi 1yiH1 + yi 2yiH2 , where yi 2yiH2  (   1Q 2H   H )(   1Q 2H   H ) H (77) yi 1yiH1  1Q1 H  1Q 2H ( H ) 1  2 Q1H . (78) and By inspection of (77), the solution (76) achieves yi 2yiH2 2 = 0. Since yi 1yiH1 2 (79) excludes the estimator solution  , this quantity defines the lower bound for yyiH 2 . Example 3. Consider the output estimation case where 1 =  2 and OE   2 Q 2H ( 2 Q 2H  R )1 , (80) which is of order n4 complexity. Using  2 Q 2H =  H − R leads to the n2-order solution OE  I  R ( H ) 1 . (81) “Whatever has been done before will be done again. There is nothing new under the sun.” Ecclesiastes 1:9 Chapter 6 Continuous-Time Smoothing 158 It is interesting to note from (81) and yi 1yiH1   2Q 2H   2 Q 2H ( 2 Q 2H  R) 1  2 Q 2H (82) that lim   I and lim yi 1yiH1  0 . That is, output estimation is superfluous R 0 R 0 when measurement noise is absent. Let {yi yiH } = {yi 1yiH1 } + {yi 2yiH2 } denote the causal part of yi yiH . It is shown below that minimum-variance filter solution can be found using the above completing-the squares technique and taking causal parts. Lemma 9: The filter solution { }  {1Q 2H   H  1}  {1Q 2H   H }  1   H } minimises { yy 2 (83) = {yi yiH } , provided that the inverses exist. 2 Proof: It follows from (77) that {yi 2yiH2 }  {(   1Q 2H   H )(   1Q 2H   H ) H } . (84) By inspection of (84), the solution (83) achieves {yi 2yiH2 } = 0. 2 (85) It is worth pausing at this juncture to comment on the significance of the above results.  The formulation (76) is an optimal solution for the time-varying smoother problem since it can be seen from (79) that it achieves the bestpossible (minimum-error-variance) performance.  Similarly, (83) is termed an optimal solution because it achieves the bestpossible filter performance (85).  By inspection of (79) and (85) it follows that the minimum-variance smoother outperforms the minimum-variance filter.  In general, these optimal solutions are not very practical because of the difficulty in realising an exact Wiener-Hopf factor. Practical smoother (and filter) solutions that make use of an approximate WienerHopf factor are described below. “There is nothing more difficult to take in hand, more perilous to conduct, or more uncertain in its success, than to take the lead in the introduction of a new order of things.” Niccolo Di Bernado dei Machiavelli G. A. 
Einicke, Smoothing, Filtering and Prediction: Estimating the Past, Present and Future (2nd ed.), Prime Publishing, 2019 159 6.5.4.3 Optimal Realisable Solutions Output Estimation The Wiener-Hopf factor is modelled on the structure of the spectral factor which is described Section 3.4.4. Suppose that R(t) > 0 for all t  [0, T] and there exist R1/2(t) > 0 such that R(t) = R1/2(t) R1/2(t). An approximate Wiener-Hopf factor, ̂ , is defined by the system  x (t )   A(t ) K (t ) R1/ 2 (t )   x(t )    (t )    , R1/ 2 (t )   z (t )    C (t ) (86) where K(t) = P(t )C T (t ) R 1 (t ) is the Kalman gain in which P(t) is the solution of the Riccati differential equation P (t )  A(t ) P(t )  P(t ) AT (t )  P (t )C T (t ) R 1 (t )C (t ) P(t )  B (t )Q (t ) BT (t ) . (87) The output estimation smoother (81) can be approximated as ˆ ˆ H ) 1 OE  I  R (  I  Rˆ  H ˆ 1 . (88) An approximate Wiener-Hopf factor inverse, ˆ 1 , within (88) is obtained from (86) and the Matrix Inversion Lemma, namely,  xˆ(t )   A(t )  K (t )C (t ) K (t )   x(t )  ,   1/ 2 1/ 2  R ( t ) C ( t ) R (t )   z (t )   (t )   (89) where xˆ(t )   n is an estimate of the state within ˆ 1 . From Lemma 1, the adjoint of ˆ 1 , which is denoted by ˆ  H , has the realisation  (t )   AT (t )  C T (t ) K T (t ) C T (t ) R 1/ 2 (t )    (t )     . K T (t ) R 1/ 2 (t )   (t )    (t )   (90) where  (t )   p is an estimate of the state within ˆ  H . Thus, the smoother (88) is realised by (89), (90) and “If I have a thousand ideas and only one turns out to be good, I am satisfied.” Alfred Bernhard Nobel Chapter 6 Continuous-Time Smoothing 160 yˆ(t | T )  z (t )  R(t )  (t ) . (91) Procedure 1. The above output estimator can be implemented via the following three steps. Step 1. Operate ˆ 1 on the measurements z(t) using (89) to obtain α(t). Step 2. In lieu of the adjoint system (90), operate (89) on the time-reversed transpose of α(t). Then take the time-reversed transpose of the result to obtain β(t). Step 3. Calculate the smoothed output estimate from (91). Example 4. Consider an estimation problem parameterised by a = – 1, b = 2 , c = 1, d = 0,  w2 =  v2 = 1, which leads to p = k = 3 – 1 [26]. Smoothed output estimates may be obtained by evolving xˆ (t )  3xˆ (t )  3z (t ) ,  (t )   xˆ (t )  z (t ) , time-reversing the  (t ) and evolving (t )  3 (t )  3 (t ) ,  (t )   (t )   (t ) , then time-reversing  (t ) and calculating yˆ(t | T )  z (t )   (t ) . Filtering The causal part {OE } of the minimum-variance smoother (88) is given by {OE }  I  R{ˆ  H } ˆ 1  I  RR 1/ 2 ˆ 1  I  R1/ 2 ˆ 1 . (92) Employing (89) within (92) leads to the standard minimum-variance filter, namely, xˆ (t | t )  ( A(t )  K (t )C (t )) xˆ (t | t )  K (t ) z (t ) yˆ (t | t )  C (t ) xˆ (t | t ) . (93) (94) “Ten geographers who think the world is flat will tend to reinforce each other’s errors….Only a sailor can set them straight.” John Ralston Saul 161 G. A. Einicke, Smoothing, Filtering and Prediction: Estimating the Past, Present and Future (2nd ed.), Prime Publishing, 2019 Input Estimation As discussed in Chapters 1 and 2, input estimates can be found using 1 = I, and substituting ̂ for Δ within (76) yields the solution ˆ ˆ H ) 1  Q 1ˆ  H ˆ 1 . IE  Q 21 ( 2 (95) As expected, the low-measurement-noise-asymptote of this equaliser is given by limIE   21 . 
That is, at high signal-to-noise-ratios the equaliser approaches R 0  21 , provided the inverse exists. The development of a differential equation for the smoothed input estimate, wˆ (t | T ) , makes use of the following formula [27] for the cascade of two systems. Suppose that two linear systems 1 and  2 have state-space parameters  A1 B1   A2 B2  C D  and C D  , respectively. Then  2 1 is parameterised by  1 1  2 2 A 0 B  1 1   B C A B D  . It follows that wˆ (t | T ) = Q H ˆ  H  (t ) is realised by 2 2 1  2 1  D2 C1 C2 D2 D1   (t )   AT (t )  C T (t ) K T (t ) 0 C T (t ) R 1/ 2 (t )    (t )      T T AT (t ) C T (t ) R 1/ 2 (t )    (t )  .   (t )    C (t ) K (t )   wˆ (t | T )   Q(t ) DT (t ) K T (t ) Q(t ) BT (t ) Q (t ) DT (t ) R 1/ 2 (t )   (t )     (96) in which  (t )   n is an auxiliary state. Procedure 2. Input estimates can be calculated via the following two steps. Step 1. Operate ˆ 1 on the measurements z(t) using (89) to obtain α(t). Step 2. In lieu of (96), operate the adjoint of (96) on the time-reversed transpose of α(t). Then take the time-reversed transpose of the result. State Estimation Smoothed state estimates can be obtained by defining the reference system 1 within (76) as “In questions of science, the authority of a thousand is not worth the humble reasoning of a single individual.” Galileo Galilei Chapter 6 Continuous-Time Smoothing 162 xˆ (t | T )  A(t ) xˆ (t | T )  B (t ) wˆ (t | T ) . (97) That is, a smoother for state estimation is given by (89), (96) and (97). In frequency-domain estimation problems, minimum-order solutions are found by exploiting pole-zero cancellations, see Example 13 of Chapter 1. Here in the timedomain, (89), (96), (97) is not a minimum-order solution and some numerical model order reduction may be required. Suppose that C(t) is of rank n and D(t) = 0. In this special case, an n2-order solution for state estimation can be obtained from (91) and xˆ (t | T )  C † (t ) yˆ (t | T ) , (98) C † (t )   C T (t )C (t )  C T (t ) (99) where 1 denotes the Moore-Penrose pseudoinverse. 6.5.4.4 Performance An analysis of minimum-variance smoother performance requires an identity which is described after introducing some additional notation. Let α =  0 w denote the output of linear time-varying system having the realisation x (t )  A(t ) x(t )  w(t )  (t )  x (t ) , (100) (101) where w(t)   n and A(t)   nn . By inspection of (100) – (101), the output of the inverse system w =  01 y is given by w(t )   (t )  A(t ) (t ) . (102) Similarly, let β =  0H u denote the output of the adjoint system  0H , which from Lemma 1 has the realisation “The mind likes a strange idea as little as the body likes a strange protein and resists it with similar energy. It would not perhaps be too fanciful to say that a new idea is the most quickly acting antigen known to science. If we watch ourselves honestly we shall often find that we have begun to argue against a new idea even before it has been completely stated.” Wilfred Batten Lewis Trotter G. A. Einicke, Smoothing, Filtering and Prediction: Estimating the Past, Present and Future (2nd ed.), Prime Publishing, 2019  (t )  AT (t ) (t )  u (t )  (t )   (t ) . 163 (103) (104) It follows that the output of the inverse system u =  0 H  is given by u (t )    (t )  AT (t )  (t ) . 
(105) The following identity is required in the characterisation of smoother performance  P (t ) AT (t )  A(t ) P(t )  P(t ) 0 H   01 P(t ) , (106) where P(t) is an arbitrary matrix of compatible dimensions. The above equation can be verified by using (102) and (105) within (106). Using the above notation, the exact Wiener-Hopf factor satisfies  H  C (t ) 0 B(t )Q (t ) BT (t ) 0H C T (t )  R(t ) . (107) It is observed below that the approximate Wiener-Hopf factor (86) approaches the exact Wiener Hopf-factor whenever the problem is locally stationary, that is, whenever A(t), B(t), C(t), Q(t) and R(t) change sufficiently slowly, so that P (t ) of (87) approaches the zero matrix. Lemma 10 [8]: In respect of the signal model (1) – (2) with D(t) = 0, E{w(t)} = E{v(t)} = 0, E{w(t)wT(t)} = Q(t), E{v(t)vT(t)} = R(t), E{w(t)vT(t)} = 0 and the quantities defined above, ˆ ˆ H   H  C (t ) P (t ) H C T (t ) .  0 0 (108) Proof: The approximate Wiener-Hopf factor may be written as ̂ = ˆ ˆ H = C (t ) ( P  H + C (t ) 0 K (t ) R1/ 2 (t ) + R1/ 2 (t ) . It is easily shown that  0 0 ˆ ˆH =  01 P + K (t ) R (t ) K T (t )) 0H C T (t ) and using (106) gives  C (t ) ( B (t )Q (t ) BT (t )  P (t ) H C T (t ) + R(t). The result follows by 0 0 ˆ ˆ H and (107). comparing  Consequently, the minimum-variance smoother (88) achieves the best-possible estimator performance, namely ei 2eiH2 = 0, whenever the problem is locally 2 stationary. Lemma 11 [8]: The output estimation smoother (88) satisfies “Every great advance in natural knowledge has involved the absolute rejection of authority.” Thomas Henry Huxley Chapter 6 Continuous-Time Smoothing 164 ei 2  R(t )[( H ) 1  ( H  C (t ) 0 P (t ) 0H C T (t )) 1 ] . (109) Proof: Substituting (88) into (77) yields ˆ ˆ H ) 1 ] . ei 2  R(t )[( H ) 1  ( (110) The result is now immediate from (108) and (110). Conditions for the convergence of the Riccati difference equation solution (87) and hence the asymptotic optimality of the smoother (88) are set out below. Lemma 12 [8]: Let S(t) =CT(t)R−1(t)C(t). If( i) there exist solutions P(t) ≥ P(t+δt) of (87) for a t > δt > 0; and (ii) A(t   t )  A(t )   Q (t   t )  Q (t ) (111)  AT (t )  S (t )  ≥  AT (t   )  S (t   )     t t  for all t > δt then lim yi 2yiH2 t  2 0. (112) Proof: Conditions (i) and (ii) together with Theorem 1 imply P(t) ≥ P(t+δt) for all t > δt and lim P (t ) = 0. The claim (112) is now immediate from Lemma 11. t  6.5.5 Performance Comparison The following scalar time-invariant examples compare the performance of the minimum-variance filter (92), maximum-likelihood smoother (50), Fraser-Potter smoother (73) and minimum-variance smoother (88) under Gaussian and nongaussian noise conditions. Example 5 [9]. Suppose that A = – 1 and B = C = Q = 1. Simulations were conducted using T = 100 s, t = 1 ms and 1000 realisations of Gaussian noise processes. The mean-square-error (MSE) exhibited by the filter and smoothers as a function of the input signal-to-noise ratio (SNR) is shown in Fig. 2. As expected, it can be seen that the smoothers outperform the filter. Although the minimumvariance smoother exhibits the lowest mean-square error, the performance benefit diminishes at high signal-to-noise ratios. “The definition of insanity is doing the same thing over and over again and expecting different results.” Albert Einstein G. A. 
Einicke, Smoothing, Filtering and Prediction: Estimating the Past, Present and Future (2nd ed.), Prime Publishing, 2019 165 Example 6 [9]. Suppose instead that the process noise is the unity-variance 1 2 deterministic signal w(t) = sin(t ) sin( t ) , where  sin( t ) denotes the sample variance of sin(t). The results of simulations employing the sinusoidal process noise and Gaussian measurement noise are shown in Fig. 3. Once again, the smoothers exhibit better performance than the filter. It can be seen that the minimumvariance smoother provides the best mean-square-error performance. The minimum-variance smoother appears to be less perturbed by nongaussian noises because it does not rely on assumptions about the underlying distributions. Fig. 2. MSE versus SNR for Example 4: (i) minimum-variance smoother, (ii) Fraser-Potter smoother, (iii) maximum-likelihood smoother and (iv) minimum-variance filter. 6.6 Fig. 3. MSE versus SNR for Example 5: (i) minimum-variance smoother, (ii) Fraser-Potter smoother, (iii) maximum-likelihood smoother and (iv) minimum-variance filter. Chapter Summary The fixed-point smoother produces state estimates at some previous point in time, that is, ˆ(t )  (t )C T (t ) R 1 (t )  z (t )  C (t ) xˆ (t | t )  , where Σ(t) is the smoother error covariance. In fixed-lag smoothing, state estimates are calculated at a fixed time delay τ behind the current measurements. This smoother has the form xˆ (t | t   )  A(t ) xˆ (t | t   )  B(t )Q (t ) BT (t ) P 1 (t )  xˆ (t | t   )  xˆ (t | t )   P (t )T (t   , t )C T (t   ) R 1 (t   )  z (t   )  C (t   ) xˆ (t   )  , where Ф(t + τ, t) is the transition matrix of the minimum-variance filter. “The first problem for all of us, mean and women, is not to learn but to unlearn.” Gloria Marie Steinem Chapter 6 Continuous-Time Smoothing 166 Three common fixed-interval smoothers are listed in Table 1, which are for retrospective (or off-line) data analysis. The Rauch-Tung-Streibel (RTS) smoother and Fraser-Potter (FP) smoother are minimum-order solutions. The RTS smoother differential equation evolves backward in time, in which G ( ) = Optimal smoother FP smoother RTS smoother Signals and system B( )Q( ) BT ( ) P 1 ( ) is the smoothing gain. The FP smoother employs a linear combination of forward state estimates and backward state estimates obtained by running a filter over the time-reversed measurements. The optimum minimumvariance solution, in which A(t ) = A(t )  K (t )C (t ) , in which K(t) is the predictor gain, involves a cascade of forward and adjoint predictions. It can be seen that the optimum minimum-variance smoother is the most complex and so any performance benefits need to be reconciled with the increased calculation cost. ASSUMPTIONS E{w(t)} = E{v(t)} = 0. E{w(t)wT(t)} = Q(t) > 0 and E{v(t)vT(t)} = R(t) > 0 are known. A(t), B(t) and C(t) are known. Assumes that the filtered and smoothed states are normally distributed. xˆ(t | t ) previously calculated by Kalman filter. xˆ(t | t ) previously calculated by Kalman filter. 
The output estimation error covariance for the general estimation problem can be written as $e_ie_i^H = e_{i1}e_{i1}^H + e_{i2}e_{i2}^H$, where $e_{i1}e_{i1}^H$ specifies a lower performance bound and $e_{i2}e_{i2}^H$ is a function of the estimator solution. The optimal smoother solution achieves $\|e_{i2}e_{i2}^H\|_2 = 0$ and provides the best mean-square-error performance, provided of course that the problem assumptions are correct. The minimum-variance smoother solution also attains the best-possible performance whenever the problem is locally stationary, that is, when $A(t)$, $B(t)$, $C(t)$, $Q(t)$ and $R(t)$ change sufficiently slowly.

6.7 Problems

Problem 1. Write down augmented state-space matrices $A^{(a)}(t)$, $B^{(a)}(t)$ and $C^{(a)}(t)$ for the continuous-time fixed-point smoother problem.
(i) Substitute the above matrices into
$$\dot P^{(a)}(t) = A^{(a)}(t)P^{(a)}(t) + P^{(a)}(t)\big(A^{(a)}\big)^T(t) - P^{(a)}(t)\big(C^{(a)}\big)^T(t)R^{-1}(t)C^{(a)}(t)P^{(a)}(t) + B^{(a)}(t)Q(t)\big(B^{(a)}\big)^T(t)$$
to obtain the component Riccati differential equations.
(ii) Develop expressions for the continuous-time fixed-point smoother estimate and the smoother gain.

Problem 2. The Hamiltonian equations (60) were derived from the forward version of the maximum-likelihood smoother (54). Derive the alternative form
$$\begin{bmatrix} \dot{\hat x}(t|T) \\ \dot\lambda(t|T) \end{bmatrix} = \begin{bmatrix} A(t) & B(t)Q(t)B^T(t) \\ C^T(t)R^{-1}(t)C(t) & -A^T(t) \end{bmatrix}\begin{bmatrix} \hat x(t|T) \\ \lambda(t|T) \end{bmatrix} + \begin{bmatrix} 0 \\ -C^T(t)R^{-1}(t)z(t) \end{bmatrix}$$
from the backward smoother (50). Hint: use the backward Kalman-Bucy filter and the backward Riccati equation.

Problem 3. It is shown in [6] and [17] that the intermediate variable within the Hamiltonian equations (60) is given by
$$\lambda(t|T) = \int_t^T \Phi^T(s,t)C^T(s)R^{-1}(s)\big(z(s) - C(s)\hat x(s|s)\big)\,ds,$$
where $\Phi(s,t)$ is the transition matrix of the Kalman-Bucy filter. Use the above equation to derive
$$\dot\lambda(t|T) = C^T(t)R^{-1}(t)C(t)\hat x(t|T) - A^T(t)\lambda(t|T) - C^T(t)R^{-1}(t)z(t).$$

Problem 4. Show that the adjoint of a system having state-space parameters $\begin{bmatrix} A(t) & B(t) \\ C(t) & D(t) \end{bmatrix}$ is parameterised by $\begin{bmatrix} -A^T(t) & C^T(t) \\ -B^T(t) & D^T(t) \end{bmatrix}$.

Problem 5. Suppose that $\Delta_0$ is a system parameterised by $\begin{bmatrix} A(t) & I \\ I & 0 \end{bmatrix}$. Show that $\dot P(t) - P(t)A^T(t) - A(t)P(t) = P(t)\Delta_0^{-H} + \Delta_0^{-1}P(t)$.

Problem 6. The optimum minimum-variance smoother was developed by finding the solution that minimises $\|y_2y_2^H\|_2$. Use the same completing-the-square approach to find the optimum minimum-variance filter. (Hint: find the solution that minimises $\|y_2y_2^T\|_2$.)
Problem 7 [9]. Let $a \in \mathbb{R}$, $b = 1$, $c \in \mathbb{R}$ and $d = 0$ denote the time-invariant state-space parameters of the plant. Denote the error covariance, the gain of the Kalman filter and the gain of the maximum-likelihood smoother by $p$, $k$ and $g$, respectively. Show that
$$H_1(s) = k(s - a + kc)^{-1},$$
$$H_2(s) = cgk(-s - a + g)^{-1}(s - a + kc)^{-1},$$
$$H_3(s) = kc(-a + kc)(s - a + kc)^{-1}(-s - a + kc)^{-1},$$
$$H_4(s) = \big((-a + kc)^2 - (-a + kc - k)^2\big)(s - a + kc)^{-1}(-s - a + kc)^{-1}$$
are the transfer functions of the Kalman filter, the maximum-likelihood smoother, the Fraser-Potter smoother and the minimum-variance smoother, respectively.

Problem 8. (i) Develop a state-space formulation of an approximate Wiener-Hopf factor for the case where the plant includes a nonzero direct feedthrough matrix (that is, D(t) ≠ 0). (ii) Use the matrix inversion lemma to obtain the inverse of the approximate Wiener-Hopf factor for the minimum-variance smoother.

6.8 Glossary

$p(x(t))$ — Probability density function of a continuous random variable $x(t)$.
$x(t) \sim \mathcal{N}(\mu, R_{xx})$ — The random variable $x(t)$ has a normal distribution with mean $\mu$ and covariance $R_{xx}$.
$f(x(t))$ — Cumulative distribution function or likelihood function of $x(t)$.
$\hat x(t|t+\tau)$ — Estimate of $x(t)$ at time $t$ given data at a fixed time lag $\tau$.
$\hat x(t|T)$ — Estimate of $x(t)$ at time $t$ given data over a fixed interval $T$.
$\hat w(t|T)$ — Estimate of $w(t)$ at time $t$ given data over a fixed interval $T$.
$G(t)$ — Gain of the fixed-interval smoother developed by Rauch, Tung and Striebel.
$e_i$ — A linear system that operates on the inputs $i = \begin{bmatrix} v^T & w^T \end{bmatrix}^T$ and generates the output estimation error $e$.
$\Delta$ — The Wiener-Hopf factor, which satisfies $\Delta\Delta^H = C\Delta_0 BQB^T\Delta_0^H C^T + R$.
$\hat\Delta$ — Approximate Wiener-Hopf factor.
$\hat\Delta^H$ — The adjoint of $\hat\Delta$.
$\hat\Delta^{-H}$ — The inverse of $\hat\Delta^H$.
$\{\hat\Delta^{-H}\}_+$ — The causal part of $\hat\Delta^{-H}$.
$C^{\#}(t)$ — Moore-Penrose pseudoinverse of $C(t)$.

6.9 References

[1] J. S. Meditch, "A Survey of Data Smoothing for Linear and Nonlinear Dynamic Systems", Automatica, vol. 9, pp. 151 – 162, 1973.
[2] T. Kailath, "A View of Three Decades of Linear Filtering Theory", IEEE Transactions on Information Theory, vol. 20, no. 2, pp. 146 – 181, Mar. 1974.
[3] H. E. Rauch, F. Tung and C. T. Striebel, "Maximum Likelihood Estimates of Linear Dynamic Systems", AIAA Journal, vol. 3, no. 8, pp. 1445 – 1450, Aug. 1965.
[4] D. C. Fraser and J. E. Potter, "The Optimum Linear Smoother as a Combination of Two Optimum Linear Filters", IEEE Transactions on Automatic Control, vol. AC-14, no. 4, pp. 387 – 390, Aug. 1969.
[5] J. S. Meditch, "On Optimal Fixed Point Linear Smoothing", International Journal of Control, vol. 6, no. 2, pp. 189 – 199, 1967.
[6] A. P. Sage and J. L. Melsa, Estimation Theory with Applications to Communications and Control, McGraw-Hill Book Company, New York, 1971.
[7] J. B. Moore, "Fixed-Lag Smoothing Results for Linear Dynamical Systems", A.T.R., vol. 7, no. 2, pp. 16 – 21, 1973.
[8] G. A. Einicke, "Asymptotic Optimality of the Minimum-Variance Fixed-Interval Smoother", IEEE Transactions on Signal Processing, vol. 55, no. 4, pp. 1543 – 1547, Apr. 2007.
[9] G. A. Einicke, J. C. Ralston, C. O. Hargrave, D. C. Reid and D. W. Hainsworth, "Longwall Mining Automation, An Application of Minimum-Variance Smoothing", IEEE Control Systems Magazine, vol. 28, no. 6, pp. 28 – 37, Dec. 2008.
[10] G. A. Einicke, "A Solution to the Continuous-Time H-infinity Fixed-Interval Smoother Problem", IEEE Transactions on Automatic Control, vol. 54, no. 12, pp. 2904 – 2908, Dec. 2009.
[11] D. J. N. Limebeer, B. D. O. Anderson, P. Khargonekar and M. Green, "A Game Theoretic Approach to H∞ Control for Time-varying Systems", SIAM Journal on Control and Optimization, vol. 30, no. 2, pp. 262 – 283, 1992.
[12] G. Freiling, G. Jank and H. Abou-Kandil, "Generalized Riccati Difference and Differential Equations", Linear Algebra and Its Applications, pp. 291 – 303, 1996.
[13] L. E. Zachrisson, "On Optimal Smoothing of Continuous Time Kalman Processes", Information Sciences, vol. 1, pp. 143 – 172, 1969.
[14] U. Shaked and Y. Theodor, "H∞-Optimal Estimation: A Tutorial", Proc. 31st IEEE Conference on Decision and Control, Tucson, Arizona, pp. 2278 – 2286, Dec. 1992.
[15] J. S. Meditch, "On Optimal Linear Smoothing Theory", Information and Control, vol. 10, pp. 598 – 615, 1967.
[16] P. S. Maybeck, Stochastic Models, Estimation and Control, Vol. 2, Academic Press, New York, 1982.
[17] T. Kailath, A. H. Sayed and B. Hassibi, Linear Estimation, Pearson Prentice Hall, New Jersey, 2000.
[18] B. D. O. Anderson, "Properties of Optimal Linear Smoothing", IEEE Transactions on Automatic Control, vol. 14, no. 1, pp. 114 – 115, Feb. 1969.
[19] S. Chirarattananon and B. D. O. Anderson, "Stable Fixed-Lag Smoothing of Continuous-time Processes", IEEE Transactions on Information Theory, vol. 20, no. 1, pp. 25 – 36, Jan. 1974.
[20] A. Gelb, Applied Optimal Estimation, The Analytic Sciences Corporation, USA, 1974.
[21] L. Ljung and T. Kailath, "A Unified Approach to Smoothing Formulas", Automatica, vol. 12, pp. 147 – 157, 1976.
[22] T. Kailath and P. Frost, "An Innovations Approach to Least-Squares Estimation Part II: Linear Smoothing in Additive White Noise", IEEE Transactions on Automatic Control, vol. 13, no. 6, pp. 655 – 660, 1968.
[23] F. L. Lewis, L. Xie and D. Popa, Optimal and Robust Estimation With an Introduction to Stochastic Control Theory, Second Edition, CRC Press, Taylor & Francis Group, 2008.
[24] F. A. Badawi, A. Lindquist and M. Pavon, "A Stochastic Realization Approach to the Smoothing Problem", IEEE Transactions on Automatic Control, vol. 24, no. 6, Dec. 1979.
[25] J. A. Rice, Mathematical Statistics and Data Analysis (Second Edition), Duxbury Press, Belmont, California, 1995.
[26] R. G. Brown and P. Y. C. Hwang, Introduction to Random Signals and Applied Kalman Filtering, Second Edition, John Wiley & Sons, Inc., New York, 1992.
[27] M. Green and D. J. N. Limebeer, Linear Robust Control, Prentice-Hall Inc., Englewood Cliffs, New Jersey, 1995.