Tutorial On Kalman Filter
Recognition Systems
Compiled by: Tadele Belay
MSc, Mechatronics
Presented to: Prof. Farid Melgani
University of Trento,
Italy
2013
Rudolf Emil Kalman:
o Born 1930 in Hungary
o BS and MS from MIT
o PhD 1957 from Columbia
o Filter developed in 1960-61
o Now retired
The Kalman filter, also known as linear quadratic
estimation (LQE), is an algorithm that uses a
series of measurements observed over time,
containing noise (random variations) and other
inaccuracies, and produces estimates of unknown
variables that tend to be more precise than those
based on a single measurement alone
The Kalman filter is:
an optimal estimator (the best, in the mean-squared-error sense)
recursive
Kalman filtering is popular because of:
Good results in practice due to optimality and structure.
Convenient form for online, real-time processing.
Easy to formulate and implement given a basic understanding.
Measurement equations need not be inverted.
What is it used for?
Tracking missiles
Tracking heads/hands/drumsticks
Extracting lip motion from video
Fitting Bezier patches to point data
Economics
Navigation
Other applications of Kalman filtering (or filtering in
general):
Your car GPS (predict and update location)
Surface-to-air missile (hitting the target)
Ship or rocket navigation (Apollo 11 used some
sort of filtering to make sure it didn't miss the
Moon!)
State Estimation

Given the linear dynamical system:

    x_{k+1} = A_k x_k + B_k u_k + v_k    .. (1)   (state equation)
    y_k = C_k x_k + w_k                  .. (2)   (measurement / observation equation)

where:
x is the n-dimensional state vector (unknown)
u is the m-dimensional input (control) vector (known)
y is the p-dimensional output (observation) vector (known, measured)
A, B, C are appropriately dimensioned system matrices (known)
v, w are zero-mean white Gaussian noise (known), with covariance matrices Q, R
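To make the notation concrete, here is a minimal sketch of one such model; the 1-D constant-velocity tracking setup and the noise levels are assumptions of mine, not from the slides.

    import numpy as np

    # Hypothetical example: track 1-D position/velocity, sampled every dt seconds.
    dt = 0.1
    A = np.array([[1.0, dt],
                  [0.0, 1.0]])        # state transition matrix (n x n)
    B = np.array([[0.5 * dt**2],
                  [dt]])              # control matrix (n x m): acceleration input
    C = np.array([[1.0, 0.0]])        # measurement matrix (p x n): position only
    Q = 0.01 * np.eye(2)              # process noise covariance (assumed)
    R = np.array([[0.25]])            # measurement noise covariance (assumed)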
[Block diagram: external controls and system error sources drive the system (a black box whose state is desired but not known); measuring devices, subject to measurement error sources, produce the observed measurements; the estimator turns these into an optimal estimate of the system state.]
In the state equation

    x_{k+1} = A_k x_k + B_k u_k + v_k

x_k is called the state of the system. It contains all of the information about the present state of the system, but we cannot measure x directly.
u_k is the input to the system; it is known.
v_k is called the process noise.

In the measurement equation

    y_k = C_k x_k + w_k

y_k is the measured output. It helps us obtain an estimate of x_k, but we cannot necessarily take the information from y_k at face value, because it is corrupted by noise.
w_k is called the measurement noise: a sequence of white, zero-mean Gaussian noise. The same holds for v_k.
How can we determine the best estimate of the state x?
We want an estimator that gives an accurate estimate
of the true state even though we cannot directly measure it.
What criteria should our estimator satisfy?
First, we want the average value of our state estimate
to be equal to the average value of the true state, i.e.,
no bias. Mathematically: E(estimate) = E(state).
Second, we want a state estimate that varies from the
true state as little as possible. Mathematically, we want
to find the estimator with the smallest possible error
variance.
It so happens that the Kalman filter is the estimator that
satisfies these two criteria.
But the Kalman filter solution does not apply unless we
can satisfy certain assumptions about the noise that
affects our system.
Assumptions are:
1. The state vector and the measurement noise are independent.
2. The noise is white, zero-mean, and Gaussian, and the system
noise and the measurement noise are uncorrelated with each other.
Background of the equations

Let us estimate a random variable from two Gaussian
observations. Suppose we are estimating the value of a
random variable x:

Sensor 1 returns value y_1, with variance σ_1^2
Sensor 2 returns value y_2, with variance σ_2^2

where each observation follows the Gaussian density

    p(y) = (1 / sqrt(2π σ^2)) exp( -(y - x)^2 / (2 σ^2) )

We can show that the combined estimate has a combined variance

    σ'^2 = σ_1^2 σ_2^2 / (σ_1^2 + σ_2^2)

which is the harmonic-mean combination of the two variances,
and the weighted mean is given as:

    y' = ( σ_2^2 / (σ_1^2 + σ_2^2) ) y_1 + ( σ_1^2 / (σ_1^2 + σ_2^2) ) y_2
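As a quick sketch of this two-sensor fusion rule (my own illustration; the function name and the numbers are made up for the example):

    def fuse(y1, var1, y2, var2):
        """Fuse two independent Gaussian measurements of the same quantity."""
        var = (var1 * var2) / (var1 + var2)            # combined variance
        y = (var2 * y1 + var1 * y2) / (var1 + var2)    # lower variance -> higher weight
        return y, var

    # A precise sensor (variance 1) and a noisy one (variance 4):
    print(fuse(10.0, 1.0, 12.0, 4.0))   # -> (10.4, 0.8): pulled toward y1, variance shrinks

Note that the fused estimate is more certain than either sensor alone, which is the point of the formulas above.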
Observations:
The new variance is smaller than either of the two variances
(at most equal to the smaller of them).
The new mean is a weighted sum of the old means,
with higher weight given to the lower-variance
measurement.
In the case where the two variances are equal, the
mean is (y_1 + y_2)/2.
Suppose that the measurements above are taken at times t_1
and t_2, and that x remains constant throughout.
After the first measurement, the estimate of x is

    x̂_1 = y_1

After the second measurement,

    x̂_2 = ( σ_2^2 / (σ_1^2 + σ_2^2) ) y_1 + ( σ_1^2 / (σ_1^2 + σ_2^2) ) y_2
        = y_1 + ( σ_1^2 / (σ_1^2 + σ_2^2) ) ( y_2 - y_1 )

a weighted average, which can be written as

    x̂_2 = x̂_1 + K_2 ( y_2 - x̂_1 ),    with  K_2 = σ_1^2 / (σ_1^2 + σ_2^2)
    x̂_2 = x̂_1 + K_2 ( y_2 - x̂_1 )

Here x̂_1 is the best prediction before the second observation,
and K_2 ( y_2 - x̂_1 ) is the correction term introduced by the
second observation.

If σ_2^2 << σ_1^2 (the second measurement has low variance
relative to the first), then K_2 ≈ 1 and x̂_2 ≈ y_2.
Conversely, if σ_2^2 >> σ_1^2, then K_2 ≈ 0 and the
correction term is small.
To go from the preceding discussion to the Kalman filter,
we just need to say that the system has dynamics, so
x_k → x_{k+1}. We then have two sources of information
about x_{k+1}:

1. Our estimate of x_k pushed through the transition model;
   it has variance

       σ_1^2 = σ_t^2 + σ_x^2

   (the transition variance σ_t^2 plus the variance σ_x^2
   of the current estimate).

2. A direct measurement of x_{k+1}; it has variance

       σ_2^2 = σ_y^2

[Graphical model: states x_k → x_{k+1} linked by the transition model with noise N(0, σ_t^2); each state x_k emits a measurement y_k with measurement noise variance σ_y^2.]
The new variance is the harmonic-mean combination of σ_1^2 and
σ_2^2, and the new estimate is

    x̂_{k+1} = x̂_{k+1}^- + K_{k+1} ( y_{k+1} - x̂_{k+1}^- )

where x̂_{k+1}^- is the estimate from the transition model (the
prediction), y_{k+1} is the estimate from the sensor (the
measurement), and K_{k+1} is the gain.

Note that:
As we move from k+1 to k+2, the prior x̂ now has the
new variance that we just computed.
So σ_x^2 has changed, but σ_t^2 and σ_y^2 remain the same.
K_{k+2}, which depends on σ_x^2, σ_t^2, and σ_y^2, changes,
but it can be pre-computed.
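A minimal sketch of this 1-D recursion (my own illustration, assuming a random-walk transition model with the variances named above):

    import numpy as np

    def kalman_1d(measurements, x0, var0, var_t, var_y):
        """Scalar Kalman filter for a random-walk state x_{k+1} = x_k + N(0, var_t)."""
        x, var_x = x0, var0
        estimates = []
        for y in measurements:
            var_pred = var_x + var_t            # sigma_1^2 = sigma_t^2 + sigma_x^2
            K = var_pred / (var_pred + var_y)   # gain K_{k+1}
            x = x + K * (y - x)                 # corrected estimate
            var_x = (1 - K) * var_pred          # corrected (new) variance
            estimates.append(x)
        return np.array(estimates)

    # Noisy readings of a slowly drifting quantity:
    rng = np.random.default_rng(0)
    truth = 5.0 + np.cumsum(rng.normal(0.0, 0.1, 50))
    ys = truth + rng.normal(0.0, 1.0, 50)
    print(kalman_1d(ys, x0=ys[0], var0=1.0, var_t=0.01, var_y=1.0)[-5:])

Note how var_x is overwritten on every step: this is exactly the remark above that σ_x^2 changes from step to step while σ_t^2 and σ_y^2 stay fixed.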
The computational origins of the filter

Recall the system:

    x_{k+1} = A_k x_k + B_k u_k + v_k    .. (1)
    y_k = C_k x_k + w_k                  .. (2)

Define the a priori state estimation error at step k and the
a posteriori estimation error as

    e_k^- = x_k - x̂_k^-        (a priori)
    e_k = x_k - x̂_k            (a posteriori)

The a posteriori estimate will be formed as a linear combination
of the a priori estimate and the new measurement (equation (6) below).
The probabilistic origins of the filter

The a priori and a posteriori error covariances are

    P_k^- = E[ e_k^- (e_k^-)^T ]    .. (3)
    P_k = E[ e_k e_k^T ]            .. (4)

The a posteriori state estimate x̂_k reflects the mean of the
state distribution, and the a posteriori error covariance P_k
reflects the variance of the state distribution:

    p( x_k | y_1, ..., y_k ) ~ N( x̂_k , P_k )    .. (5)
An update equation for the new estimate, combining the
old estimate with the measurement data, is thus:

    x̂_k = x̂_k^- + K_k ( y_k - C_k x̂_k^- )    .. (6)

where x̂_k^- is the prior estimate of x_k, K_k is the Kalman
gain, and

    i_k = y_k - C_k x̂_k^-    .. (7)

is the innovation or measurement residual. In words:

    filtered state estimate = predicted state estimate + gain × residual
Substitution of (2) into (6) gives:

    x̂_k = x̂_k^- + K_k ( C_k x_k + w_k - C_k x̂_k^- )    .. (8)

so the a posteriori error is

    x_k - x̂_k = ( I - K_k C_k )( x_k - x̂_k^- ) - K_k w_k    .. (9)

Substitution of (8) into (4) gives:

    P_k = E[ ( ( I - K_k C_k )( x_k - x̂_k^- ) - K_k w_k )
             ( ( I - K_k C_k )( x_k - x̂_k^- ) - K_k w_k )^T ]    .. (10)
Since the estimation error ( x_k - x̂_k^- ) is uncorrelated with
the measurement noise w_k, the expectation splits:

    P_k = ( I - K_k C_k ) E[ ( x_k - x̂_k^- )( x_k - x̂_k^- )^T ] ( I - K_k C_k )^T
          + K_k E[ w_k w_k^T ] K_k^T

Here E[ ( x_k - x̂_k^- )( x_k - x̂_k^- )^T ] = P_k^- is the prior
estimate of P_k, and E[ w_k w_k^T ] = R is the measurement error
covariance, so

    P_k = ( I - K_k C_k ) P_k^- ( I - K_k C_k )^T + K_k R K_k^T    .... (11)

Equation (11) is the error covariance update equation.
The diagonal of the covariance matrix contains the
mean squared errors, as shown:

    P_k = [ E(e_1 e_1)   ...   E(e_1 e_n)
            ...                ...
            E(e_n e_1)   ...   E(e_n e_n) ]    .... (12)

The sum of the diagonal elements of a matrix is the trace of the matrix;
for the error covariance matrix, the trace is the sum of the mean squared errors.
Propositions:
Let A and B be square matrices, and let C be a square symmetric
matrix. Then:

    d( trace(AB) ) / dA = B^T
    d( trace(A C A^T) ) / dA = 2AC

The trace of a product can be rewritten as:

    trace(X^T Y) = trace(X Y^T) = trace(Y X^T)

The trace is a linear map. That is,

    trace(A + B) = trace(A) + trace(B)
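A quick numerical sanity check of the first proposition, using a finite-difference gradient (an illustration of mine, not part of the original derivation):

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.normal(size=(3, 3))
    B = rng.normal(size=(3, 3))

    # Finite-difference gradient of trace(AB) with respect to each entry of A.
    eps = 1e-6
    grad = np.zeros_like(A)
    for i in range(3):
        for j in range(3):
            dA = np.zeros_like(A)
            dA[i, j] = eps
            grad[i, j] = (np.trace((A + dA) @ B) - np.trace(A @ B)) / eps

    print(np.allclose(grad, B.T, atol=1e-4))   # True: d trace(AB)/dA = B^T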
Therefore the mean squared error may be minimised by minimising
the trace of P_k. The trace of P_k is first differentiated with
respect to K_k, and the result is set to zero in order to find
the conditions of this minimum.

Expanding equation (11):

    P_k = P_k^- - K_k C P_k^- - P_k^- C^T K_k^T + K_k ( C P_k^- C^T + R ) K_k^T

Noting that the trace of a matrix is equal to the trace of its transpose,
this may be written as:

    T[P_k] = T[P_k^-] - 2 T[ K_k C P_k^- ] + T[ K_k ( C P_k^- C^T + R ) K_k^T ]    ... (13)

where T[P_k] is the trace of the matrix P_k.
Differentiating with respect to K_k gives:

    dT[P_k] / dK_k = -2 ( C P_k^- )^T + 2 K_k ( C P_k^- C^T + R )    .... (14)

Setting this to zero and re-arranging gives:

    ( C P_k^- )^T = K_k ( C P_k^- C^T + R )    .... (15)

Now solving for K_k gives:

    K_k = P_k^- C^T ( C P_k^- C^T + R )^{-1}    .... (16)

Equation (16) is the Kalman gain equation.
If we are sure about the measurements:
The measurement error covariance R decreases toward zero.
K_k grows and weights the residual more heavily than the
prediction.
Conversely, a larger R (indicating poorer-quality measurements)
shrinks K_k.
If we are sure about the prediction:
The prediction error covariance P_k^- decreases toward zero.
K_k shrinks, and the filter weights the prediction more heavily
than the residual.
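This behaviour is easy to see in the scalar case of equation (16), where K = P / (P + R); a small sketch of mine:

    P_prior = 1.0
    for R in (1e-3, 1.0, 1e3):   # very sure -> very unsure about the measurement
        K = P_prior / (P_prior + R)
        print(f"R = {R:g}:  K = {K:.4f}")
    # R -> 0 gives K -> 1 (trust the residual); large R gives K -> 0 (trust the prediction).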
Finally, substitution of equation (16) into (11), and
simplifying further, yields:

    P_k = P_k^- - P_k^- C^T ( C P_k^- C^T + R )^{-1} C P_k^-
        = P_k^- - K_k C P_k^-
        = ( I - K_k C ) P_k^-    ..... (17)

Equation (17) is the update equation for the error
covariance matrix with optimal gain. The three
equations (6), (16), and (17) develop an estimate of the
variable x_k. State projection is achieved using:

    x̂_{k+1}^- = A x̂_k    ..... (18)
To complete the recursion it is necessary to find an
equation which projects the error covariance matrix
into the next time interval, k + 1. This is achieved by
first forming an expression for the prior error:

    e_{k+1}^- = x_{k+1} - x̂_{k+1}^-
              = ( A x_k + v_k ) - A x̂_k
              = A ( x_k - x̂_k ) + v_k

Extending equation (4) to time k + 1:

    P_{k+1}^- = E[ e_{k+1}^- (e_{k+1}^-)^T ]
              = E[ ( A e_k )( A e_k )^T ] + E[ v_k v_k^T ]
              = A P_k A^T + Q

This completes the recursive filter.
The recursion in one loop, using the notation x̂_{k|k-1} for the
estimate of x_k given measurements up to time k-1:

Initial estimates: x̂_{0|-1} and P_{0|-1}.

For each new measurement y_k:

1. Kalman gain:        K_k = P_{k|k-1} C_k^T [ C_k P_{k|k-1} C_k^T + R ]^{-1}
2. Update estimate:    x̂_{k|k} = x̂_{k|k-1} + K_k [ y_k - C_k x̂_{k|k-1} ]
3. Update covariance:  P_{k|k} = [ I - K_k C_k ] P_{k|k-1}
4. Project into k+1:   x̂_{k+1|k} = A_k x̂_{k|k} + B_k u_k
                       P_{k+1|k} = A_k P_{k|k} A_k^T + Q

The measurements feed the updated state estimates, which feed
the projected estimates for the next step.
Theoretical Basis

Prediction (Time Update):

1. Project the state ahead:             x̂_{k+1}^- = A x̂_k + B u_k
2. Project the error covariance ahead:  P_{k+1}^- = A P_k A^T + Q

Correction (Measurement Update):

1. Compute the Kalman gain:                  K = P_k^- H^T ( H P_k^- H^T + R )^{-1}
2. Update the estimate with measurement z_k: x̂_k = x̂_k^- + K ( z_k - H x̂_k^- )
3. Update the error covariance:              P_k = ( I - K H ) P_k^-

Equivalently, in the k|k notation:

    x̂_{k+1|k} = A_k x̂_{k|k} + B_k u_k
    P_{k+1|k} = A_k P_{k|k} A_k^T + Q
    K_k = P_{k|k-1} C_k^T [ C_k P_{k|k-1} C_k^T + R ]^{-1}
    x̂_{k|k} = x̂_{k|k-1} + K_k [ y_k - C_k x̂_{k|k-1} ]
    P_{k|k} = [ I - K_k C_k ] P_{k|k-1}

with initial estimates x̂_{0|-1} = x̂_0 and P_{0|-1} = P_0.

(Here H and z_k denote the measurement matrix and the measurement
vector, i.e. the C_k and y_k of the earlier slides.)
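A compact sketch of these predict/correct equations (my own illustration; the constant-velocity model and the noise levels are assumed, not from the slides):

    import numpy as np

    def kalman_step(x, P, y, A, B, u, C, Q, R):
        """One predict + correct cycle of the linear Kalman filter."""
        # Prediction (time update)
        x_pred = A @ x + B @ u                      # project the state ahead
        P_pred = A @ P @ A.T + Q                    # project the covariance ahead
        # Correction (measurement update)
        S = C @ P_pred @ C.T + R                    # innovation covariance
        K = P_pred @ C.T @ np.linalg.inv(S)         # Kalman gain, eq. (16)
        x_new = x_pred + K @ (y - C @ x_pred)       # eq. (6)
        P_new = (np.eye(len(x)) - K @ C) @ P_pred   # eq. (17)
        return x_new, P_new

    # Assumed 1-D constant-velocity model: state [position, velocity], measure position.
    dt = 0.1
    A = np.array([[1.0, dt], [0.0, 1.0]]); B = np.zeros((2, 1))
    C = np.array([[1.0, 0.0]])
    Q = 0.01 * np.eye(2); R = np.array([[0.5]])
    x, P = np.zeros(2), np.eye(2)
    rng = np.random.default_rng(0)
    for k in range(50):
        y = np.array([0.1 * k * dt + rng.normal(0.0, 0.7)])   # noisy position reading
        x, P = kalman_step(x, P, y, A, B, np.zeros(1), C, Q, R)
    print(x)   # estimated [position, velocity]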
1D Example

[Figure: four panels showing the previous belief, the (wider) predicted belief after applying the dynamics, the measurement with its own uncertainty, and the (narrower) corrected belief after the update.]
The Kalman filter addresses the general problem of
trying to estimate the state of a discrete-time
controlled process that is governed by a linear
stochastic difference equation.
But what happens if the process to be estimated
and/or the measurement relationship to the process is
non-linear?
Extended Kalman Filter

Suppose the state-evolution and measurement equations are non-linear:

    x_{k+1} = f( x_k , u_k ) + v_k    (a)   (process model)
    y_k = c( x_k ) + w_k              (b)   (observation model)

The process noise v is drawn from N(0, Q), with covariance matrix Q.
The measurement noise w is drawn from N(0, R), with covariance matrix R.
Suppose the knowledge of x_k at time k is

    x_k ~ N( x̂_k , P_k )    (c)

Then x_{k+1} at time k + 1 follows

    x_{k+1} ~ N( x̂_{k+1} , P_{k+1} )    (d)

Predict using the process model.
The process model (a) is linearized at the current
estimate x̂_k using the first-order Taylor series expansion:

    x_{k+1} ≈ f( x̂_k , u_k ) + ∇f_x ( x_k - x̂_k ) + v_k    (e)

where ∇f_x is the Jacobian of the function f with respect to x,
evaluated at x̂_k. Higher-order terms are ignored.
Expanding (e),

    x_{k+1} ≈ ∇f_x x_k + [ f( x̂_k , u_k ) - ∇f_x x̂_k ] + v_k    (f)

which is a linear process model with F = ∇f_x and
U_k = f( x̂_k , u_k ) - ∇f_x x̂_k:

    x_{k+1} ≈ F x_k + U_k + v_k

Now applying the linear KF prediction formulas:

    x̂_{k+1}^- = F x̂_k + U_k = ∇f_x x̂_k + f( x̂_k , u_k ) - ∇f_x x̂_k = f( x̂_k , u_k )    (g)

    P_{k+1}^- = F P_k F^T + Q = ∇f_x P_k ∇f_x^T + Q    (h)
Update using the observation.
After the prediction step, the current estimate of the
state is x̂_{k+1}^-. The observation model (b) is linearized
at x̂_{k+1}^- using the Taylor series expansion:

    y_{k+1} ≈ c( x̂_{k+1}^- ) + ∇c_x ( x_{k+1} - x̂_{k+1}^- ) + w_{k+1}

Collecting the known terms on the left-hand side, and writing
C = ∇c_x,

    Y_{k+1} = y_{k+1} - c( x̂_{k+1}^- ) + C x̂_{k+1}^- ≈ C x_{k+1} + w_{k+1}    (i)
Now applying the linear KF update formula with the
linear observation model (i), we have:

    x̂_{k+1} = x̂_{k+1}^- + K ( Y_{k+1} - C x̂_{k+1}^- )
            = x̂_{k+1}^- + K ( y_{k+1} - c( x̂_{k+1}^- ) + C x̂_{k+1}^- - C x̂_{k+1}^- )
            = x̂_{k+1}^- + K ( y_{k+1} - c( x̂_{k+1}^- ) )

    P_{k+1} = P_{k+1}^- - K S K^T

where S is the innovation covariance and K is the Kalman gain, given as:

    S = C P_{k+1}^- C^T + R = ∇c_x P_{k+1}^- ∇c_x^T + R
    K = P_{k+1}^- C^T S^{-1} = P_{k+1}^- ∇c_x^T S^{-1}
One iteration of the EKF is composed of the following
consecutive steps (collecting the results above):

1. Predict the state:       x̂_{k+1}^- = f( x̂_k , u_k )
2. Predict the covariance:  P_{k+1}^- = ∇f_x P_k ∇f_x^T + Q
3. Innovation covariance:   S = ∇c_x P_{k+1}^- ∇c_x^T + R
4. Kalman gain:             K = P_{k+1}^- ∇c_x^T S^{-1}
5. Update the state:        x̂_{k+1} = x̂_{k+1}^- + K ( y_{k+1} - c( x̂_{k+1}^- ) )
6. Update the covariance:   P_{k+1} = P_{k+1}^- - K S K^T
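A minimal sketch of one EKF iteration (my illustration; the toy model f, the measurement c, and their hand-written Jacobians are assumed examples, not from the slides):

    import numpy as np

    def ekf_step(x, P, y, u, f, c, jac_f, jac_c, Q, R):
        """One predict + update cycle of the extended Kalman filter."""
        # Predict through the non-linear process model, eqs. (g)-(h).
        F = jac_f(x, u)                       # Jacobian of f at the current estimate
        x_pred = f(x, u)
        P_pred = F @ P @ F.T + Q
        # Update with the observation model, linearized at x_pred, eq. (i).
        C = jac_c(x_pred)                     # Jacobian of c at the predicted estimate
        S = C @ P_pred @ C.T + R              # innovation covariance
        K = P_pred @ C.T @ np.linalg.inv(S)   # Kalman gain
        x_new = x_pred + K @ (y - c(x_pred))
        P_new = P_pred - K @ S @ K.T
        return x_new, P_new

    # Assumed toy model: scalar state with f(x) = x + 0.1 sin(x), measured as c(x) = x^2.
    f = lambda x, u: x + 0.1 * np.sin(x)
    jac_f = lambda x, u: np.array([[1 + 0.1 * np.cos(x[0])]])
    c = lambda x: np.array([x[0] ** 2])
    jac_c = lambda x: np.array([[2 * x[0]]])
    x, P = np.array([1.0]), np.eye(1)
    x, P = ekf_step(x, P, y=np.array([1.2]), u=None, f=f, c=c,
                    jac_f=jac_f, jac_c=jac_c, Q=0.01 * np.eye(1), R=0.1 * np.eye(1))
    print(x, P)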
It is important to state that the EKF is not an optimal filter;
rather, it is implemented based on a set of approximations. Thus, the
matrices P(k|k) and P(k+1|k) do not represent the true covariances of
the state estimates.
The Kalman filter is derived for Gaussian, linear systems, and gives
the best estimate. It is a recursive data-processing algorithm:
it does not need to store all previous measurements and reprocess
all the data at each time step.
The extended Kalman filter is derived for non-linear systems and gives
an approximation of the optimal estimate. Contrary to the Kalman
filter, the EKF may diverge if the consecutive linearizations are
not a good approximation of the linear model over the whole
associated uncertainty domain.
The Kalman filtering equations provide an estimate of the
state and its error covariance recursively. The estimate
and its quality depend on the system parameters and the
noise statistics fed as inputs to the estimator.
One problem with the Kalman filter is its numerical stability. If
the process noise covariance Q_k is small, round-off error often
causes a small positive eigenvalue to be computed as a negative
number.
When the state transition and observation models (that is, the
predict and update functions) are highly non-linear, the
extended Kalman filter can give particularly poor performance.
This is because the covariance is propagated through a
linearization of the underlying non-linear model.
References

1. Kalman, R. E. 1960. "A New Approach to Linear Filtering and Prediction
Problems," Transactions of the ASME, Journal of Basic Engineering, pp.
35-45 (March 1960).
2. Welch, G. and Bishop, G. 2006. "An Introduction to the Kalman Filter,"
http://www.cs.unc.edu/~welch/media/pdf/kalman_intro.pdf
3. Shoudong Huang. "Understanding Extended Kalman Filter, Part III:
Extended Kalman Filter," April 23, 2010.
http://services.eng.uts.edu.au/~sdhuang/Extended%20Kalman%20Filter_Shoudong.pdf