Static and Dynamic Novelty Detection Methods For Jet Engine Health Monitoring
Novelty detection requires models of normality to be learnt from training data known to
be normal. The first model considered in this paper is a static model trained to detect
novel events associated with changes in the vibration spectra recorded from a jet engine.
We describe how the distribution of energy across the harmonics of a rotating shaft can
be learnt by a support vector machine model of normality. The second model is a
dynamic model partially learnt from data using an expectation–maximization-based
method. This model uses a Kalman filter to fuse performance data in order to
characterize normal engine behaviour. Deviations from normal operation are detected
using the normalized innovations squared from the Kalman filter.
Keywords: novelty detection; health monitoring; support vector machines;
Kalman filtering; expectation–maximization
1. Introduction
(a ) Introduction
Vibration information has traditionally been the main source of information for
identifying abnormal behaviour in a jet engine. Jet engines have a number of
rigorous pass-off tests before they can be delivered to the customer. The main
test is a vibration test over the full range of operating speeds. Vibration gauges
are attached to the casing of the engine (see figure 14 in appendix A for a block
diagram of a typical jet engine) and the speed of each shaft is measured using a
tachometer. The engine on the test bed is slowly accelerated from idle to full
speed and then gradually decelerated back to idle. As the engine accelerates, the
rotation frequency of the two (or three) shafts increases and so does the
frequency of the vibrations caused by the shafts. A tracked order is the amplitude
of the vibration signal in a narrow frequency band centred on a harmonic of the
rotation frequency of a shaft, measured as a function of engine speed. It tracks
the frequency response of the engine to the energy injected by the rotating shaft.
Most of the energy in the vibration spectrum is concentrated in the fundamental
tracked orders and their main harmonics. These, therefore, constitute the
‘vibration signature’ of the jet engine under test. It is very important to detect
departures from the normal or expected values of these tracked orders as they
will indicate an abnormal pattern of vibration.
In our previous work (Nairac et al. 1999), we investigated the vibration spectra
of a two-shaft jet engine, the Rolls-Royce Pegasus. In the available database,
there were vibration spectra recorded from 52 normal engines (the training data)
and from 33 engines with one or more unusual vibration features (the test data).
The shape of the tracked orders with respect to speed was encoded as a low-
dimensional vector by calculating a weighted average of the vibration amplitude
over six different speed ranges (giving an 18-dimensional vector for three tracked
orders). With so few engines available, the K-means clustering algorithm was
space F endowed with a dot product. The latter need not be the case for the input
domain 𝒳, which may be a general set. The connection between the input domain
and the feature space is established by a feature map Φ: 𝒳 → F, i.e. a map such
that some simple kernel (Boser et al. 1992; Vapnik 1995)

k(x, y) = Φ(x) · Φ(y),    (2.1)

such as the Gaussian

k(x, y) = exp(−‖x − y‖²/c),    (2.2)

provides a dot product in the image of Φ. In practice, we need not necessarily
worry about Φ, as long as a given k satisfies certain positivity conditions
(Vapnik 1995).
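The Gaussian kernel (2.2) is straightforward to compute in practice. The sketch below (illustrative code, not part of the original monitoring system) evaluates it for whole sets of points at once; the data and width are invented for the example.

```python
import numpy as np

def gaussian_kernel(X, Y, c):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / c) of equation (2.2),
    evaluated for every pair of rows of X and Y."""
    # Squared Euclidean distances via the expansion ||x||^2 - 2 x.y + ||y||^2.
    sq = (np.sum(X**2, axis=1)[:, None]
          - 2.0 * X @ Y.T
          + np.sum(Y**2, axis=1)[None, :])
    return np.exp(-sq / c)

X = np.array([[0.0, 0.0], [1.0, 0.0]])
K = gaussian_kernel(X, X, c=2.0)
# Diagonal entries are k(x, x) = 1; the off-diagonal entry is exp(-1/2).
```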
As F is a dot product space, we can use tools of linear algebra and geometry to
construct algorithms in F, even if the input domain 𝒳 is discrete. Below, we
derive our results in F, using the following shorthand notation:

xᵢ = Φ(xᵢ),    (2.3)

X = {x₁, …, x_ℓ}.    (2.4)

Indices i and j are understood to range over 1, …, ℓ (in compact notation:
i, j ∈ [ℓ]); similarly, n, p ∈ [t]. Boldface Greek letters denote ℓ-dimensional
vectors whose components are labelled using normal face typeset.
Using an algorithm proposed for the estimation of a distribution’s support
(Schölkopf et al. 2000b), we seek to separate X from the origin with a large
margin hyperplane committing few training errors. Projections on the normal
vector of the hyperplane then characterize the X-ness of test points, and the area
where the decision function takes the value 1 can serve as an approximation of
the support of X.
The decision function is found by minimizing a weighted sum of a support
vector type regularizer and an empirical error term depending on an overall
margin variable ρ and individual errors ξᵢ,

min_{w ∈ F, ξ ∈ ℝ^ℓ, ρ ∈ ℝ}  ½‖w‖² + (1/(νℓ)) Σᵢ ξᵢ − ρ,    (2.5)

subject to (w · xᵢ) ≥ ρ − ξᵢ,  ξᵢ ≥ 0.    (2.6)
The precise meaning of the parameter ν governing the trade-off between the
regularizer and the training error will become clear later. Since non-zero slack
variables ξᵢ are penalized in the objective function, we can expect that if w and ρ
solve this problem, then the decision function

f(x) = sgn((w · x) − ρ),    (2.7)

will be positive for many examples xᵢ contained in X, while the support vector
type regularization term ‖w‖ will still be small.
We next compute a dual form of this optimization problem. The details of the
calculation, which uses standard techniques of constrained optimization, can be
found in Schölkopf et al. (2000a). We introduce a Lagrangian and set the
derivatives with respect to w equal to zero, yielding in particular
w = Σᵢ αᵢ xᵢ.    (2.8)

All patterns {xᵢ : i ∈ [ℓ], αᵢ > 0} are called support vectors. The expansion (2.8)
turns the decision function (2.7) into a form which depends only on the dot
products, f(x) = sgn((Σᵢ αᵢ xᵢ · x) − ρ).
By multiplying out the dot products, we obtain a form that can be written as a
nonlinear decision function on the input domain 𝒳 in terms of a kernel (2.1) (cf. (2.3)).
Any terms in the argument of the sgn that do not depend on x may be absorbed into
the constant ρ, which we have not fixed yet. A short calculation then yields the
final form of the decision function,

f(x) = sgn(Σᵢ αᵢ k(xᵢ, x) − ρ).    (2.9)

To compute ρ, we employ the Karush–Kuhn–Tucker (KKT) conditions of the optimization
problem (e.g. Vapnik 1995). They state that for points xᵢ where 0 < αᵢ < 1/(νℓ),
the inequality constraint (2.6) becomes an equality (note that in general,
αᵢ ∈ [0, 1/(νℓ)]), and the argument of the sgn in the decision function should
equal 0, i.e. the corresponding xᵢ sits exactly on the hyperplane of separation.
The KKT conditions also imply that only those points xᵢ can have a non-zero
αᵢ for which the first inequality constraint in (2.6) is precisely met; therefore, the
support vectors xᵢ with αᵢ > 0 will often form but a small subset of X.
Substituting (2.8) (the derivative of the Lagrangian with respect to w) and the
corresponding conditions for ξ and ρ into the Lagrangian, we can eliminate the
primal variables to get the dual problem. A short calculation shows that it
consists of minimizing the quadratic form

W(α) = ½ Σᵢⱼ αᵢ αⱼ k(xᵢ, xⱼ),    (2.10)

subject to the constraints

0 ≤ αᵢ ≤ 1/(νℓ),  Σᵢ αᵢ = 1.    (2.11)
This convex quadratic program can be solved with standard quadratic
programming tools. Alternatively, one can employ the sequential minimal
optimization algorithm described in Schölkopf et al. (1999), which was found to
approximately scale quadratically with the training set size.
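To make the pieces concrete, the sketch below solves the dual (2.10)-(2.11) on synthetic data and evaluates the decision function (2.9). The Frank-Wolfe solver, the data and the parameter values are all illustrative assumptions made for this example; the implementation described in the text uses the SMO-style algorithm of Schölkopf et al. (1999) instead.

```python
import numpy as np

def gaussian_gram(X, Y, c):
    """Kernel matrix for the Gaussian kernel of equation (2.2)."""
    sq = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / c)

def one_class_dual(K, nu, n_iter=2000):
    """Approximately minimize W(alpha) = 0.5 alpha'K alpha subject to
    0 <= alpha_i <= 1/(nu*l) and sum_i alpha_i = 1, i.e. the dual
    (2.10)-(2.11), by the Frank-Wolfe method (illustrative only)."""
    l = K.shape[0]
    C = 1.0 / (nu * l)
    alpha = np.full(l, 1.0 / l)              # feasible starting point
    for t in range(n_iter):
        grad = K @ alpha
        # Linear minimization over the feasible set: put as much weight
        # as the box constraint allows on the smallest-gradient entries.
        s = np.zeros(l)
        budget = 1.0
        for i in np.argsort(grad):
            s[i] = min(C, budget)
            budget -= s[i]
            if budget <= 1e-12:
                break
        alpha = alpha + (2.0 / (t + 2.0)) * (s - alpha)
    # rho from margin support vectors (0 < alpha_i < C), cf. the KKT argument.
    scores = K @ alpha
    margin = (alpha > 1e-6) & (alpha < C - 1e-6)
    rho = scores[margin].mean() if margin.any() else scores[alpha > 1e-6].min()
    return alpha, rho

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 0.3, size=(80, 2))   # synthetic 'normal' data
c = 2.0
alpha, rho = one_class_dual(gaussian_gram(X_train, X_train, c), nu=0.1)

def decision(x):
    """f(x) = sgn(sum_i alpha_i k(x_i, x) - rho), equation (2.9)."""
    k_vec = gaussian_gram(X_train, np.atleast_2d(np.asarray(x, float)), c)[:, 0]
    return np.sign(k_vec @ alpha - rho)
```

A test point far from the training cloud is then assigned −1 by the decision function, while points resembling the training data receive +1.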
The SVM algorithm for novelty detection is applied to the distribution of vibration
energies across the vibration harmonics for the three shafts of the Rolls-Royce Trent
500. Typically, in a jet engine, the amplitude of vibration is a measure of how well
balanced an engine shaft is and does not necessarily give an indication of the health of
an engine. However, interactions between components of the engine give rise to
harmonics (known as ‘multiple orders’) or sub-harmonics (known as ‘fractional
orders’) of the shaft speed which can indicate a developing fault.
(a ) Data extraction
A number of Trent 500 engines were monitored for several months during
the Engine Development Programme and the vibration data were recorded
from each of the three shafts. As vibration spectra are extracted by the system
every 0.2 s (King et al. 2002), this represents a considerable amount of data.
To generate a set of training vectors of manageable size, only vectors which
differ from the previous accepted vector significantly are added to the training
set. When a jet engine maintains a constant condition for long periods of time,
only one training vector is generated but as the engine changes state (for
example, during acceleration), many vectors are generated.
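This subsampling rule can be sketched in a few lines; the Euclidean distance criterion and the threshold value below are illustrative assumptions, not the system's actual similarity criterion.

```python
import numpy as np

def subsample(vectors, threshold):
    """Keep a vector only if it differs from the previously *accepted*
    vector by more than `threshold` (Euclidean distance). Steady running
    then contributes few training vectors; manoeuvres contribute many."""
    accepted = [vectors[0]]
    for v in vectors[1:]:
        if np.linalg.norm(v - accepted[-1]) > threshold:
            accepted.append(v)
    return np.array(accepted)

# Constant engine condition followed by an acceleration-like ramp.
steady = np.zeros((50, 3))
ramp = np.linspace(0.0, 5.0, 20)[:, None] * np.ones(3)
train = subsample(np.vstack([steady, ramp]), threshold=0.4)
```

The steady segment contributes a single training vector, while the ramp contributes roughly one per step.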
Using the training dataset, the SVM algorithm finds the hyperplane that
separates the normal data from the origin in feature space with the largest
margin. The number of support vectors gives an indication of how well the
algorithm is generalizing (if all data points were support vectors, the algorithm
would have memorized the data). A Gaussian kernel was used with a width
c = 40.0 in equation (2.2). This value was chosen by starting with a small kernel
width (so that the algorithm memorizes the data), increasing the width and
stopping when similar results are obtained both on the training set and another
set of data kept apart for validation. The number of support vectors generated
depends both on the similarity criterion and the number of training patterns.
With of the order of 10³ training patterns generated from 54 engine runs (four
different engines), the number of support vectors varied from a minimum of 7
(for the low-pressure (LP) shaft) to a maximum of 36 (for the intermediate-
pressure (IP) shaft).
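The width-selection loop described above can be sketched as follows. To keep the example short, a kernel-mean score stands in for the full SVM: a very small width effectively memorizes the training set (training and held-out scores disagree badly), and the width is increased until the two agree. The data, candidate widths and the 5% agreement tolerance are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
X_train = rng.normal(0.0, 1.0, size=(200, 5))
X_val = rng.normal(0.0, 1.0, size=(200, 5))   # held-out normal data

def mean_kernel_score(X_ref, X_eval, c):
    """Average Gaussian-kernel similarity (equation (2.2)) of the
    evaluation points to the reference set. A small-width kernel scores
    the reference set itself far higher than held-out data."""
    sq = np.sum((X_eval[:, None, :] - X_ref[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / c).mean()

chosen = None
for c in [0.1, 1.0, 10.0, 40.0, 100.0]:       # increasing kernel widths
    s_train = mean_kernel_score(X_train, X_train, c)
    s_val = mean_kernel_score(X_train, X_val, c)
    if abs(s_train - s_val) / s_train < 0.05:  # 'similar results' criterion
        chosen = c
        break
```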
(b ) Data visualization
To illustrate the effectiveness of the trained SVM model, the data were
visualized in two dimensions, using the Neuroscale algorithm (Tipping & Lowe
1998). The mapping of a dataset onto a two-dimensional space for visualization
purposes is known in the pattern recognition literature as multi-dimensional
scaling. A well-known example of such a dimensionality-reduction mapping is Sammon's mapping (Sammon 1969).
Figure 1. Neuroscale visualization plot for the five-dimensional vectors of speed and tracked orders
for the IP shaft.
[Figure 2 plot: vibration amplitude (in/s pk) against time for the 0.5i, 1i, 1.5i, 2i and 3i tracked orders, with the contractual limit C marked.]
Figure 2. An example vibration pattern identified as novel (between the two vertical lines) by the
SVM model of normality for the IP shaft. The 1i and 0.5i tracked orders show a large increase in
vibration level during the interval marked by the two vertical lines.
Figure 3. Neuroscale visualization plot for the five-dimensional vectors of speed and tracked orders
for the LP shaft.
The second illustration of the SVM model of normality involves an abnormal pattern
of vibration for the LP shaft as the engine decelerated by 7%, 2 min after suffering a
foreign object damage event. Again, the Neuroscale visualization plot (figure 3)
shows that the test sequence data vectors (black crosses) are some distance away
[Figure 4 plot: vibration amplitude (in/s pk) against time for the 0.5l, 1l, 1.5l, 2l and 3l tracked orders, with the contractual limit C marked.]
Figure 4. A second example vibration pattern identified as novel (between the two vertical lines)
by the SVM model of normality for the LP shaft. The 1l and 2l tracked orders show a marked
decrease in vibration level towards the end of the interval marked by the two vertical lines.
from the (normal) training data shown as small grey circles, although the black
crosses are closer to the latter than in the IP shaft example. The short excursion
away from normality causes the SVM model to identify novelty for approximately
1 s, during the period indicated by the two vertical lines of figure 4.
4. Dynamic model
(a ) Introduction
A state-variable model is a dynamic model (usually linear), for which the
interrelation between engine ‘states’, inputs and outputs is described with a set
of (linear) differential equations. State-space descriptions provide a mathemat-
ically rigorous tool for system modelling and residual generation, which may be
used in fault detection (Chen & Patton 1998) or, as in this paper, novelty
detection. Residuals (the difference between the observations and predicted
observations from the model) need to be processed in order to identify the novel
event reliably and robustly. By assembling a set of linear models for a range of
power conditions, a piece-wise linear state-space model can be constructed. In
this paper, we present results from a linear dynamic model used at a given power
condition (greater than 70% of maximum engine power).
The Kalman filter is a recursive linear algorithm for estimating system states
(Gelb 1974). This section describes the use of a Kalman filter as a linear dynamic
model capable of fusing performance data to detect novel events. A further aspect
of this work, in common with §2 on static models, is the emphasis on learning from
data: some of the parameters of the linear dynamic model are learnt, using an
expectation–maximization (EM)-based method described in Ghahramani &
Hinton (1996), during a prior training phase with normal data only.
(b ) Model description
It is assumed that the system to be monitored can be described in its fault-free
condition by a linear, discrete-time, dynamic model described by equations (4.1)
and (4.2). Note that the variables which are to be related in the model must be
ones for which a linear relationship can be extracted.
The state equation of the system is given by
x(k + 1) = A(k)x(k) + w(k),    (4.1)
where x(k) is the state at time-step k; A(k) is the process model at time-step k;
and w(k) is a zero-mean Gaussian noise process, with covariance matrix Q.
x(k) is a hidden state—there is no direct access to it. However, observations
y(k) are made, which are assumed to be described by a measurement equation
relating the observations to the state.
The measurement equation is
y(k) = C(k)x(k) + v(k),    (4.2)
where y(k) is the observation vector; C(k) is the observation model at time-step
k; and v(k) is a zero-mean Gaussian noise process with covariance matrix R.
The model matrices A, C, Q and R are set during a training phase under
normal operating conditions (see below). During monitoring, excursions away
from the trained model can be identified using the Kalman filter innovations,
ν(k) = y(k) − C(k)x̂(k|k−1),    (4.3)

where x̂(k|k−1) is the estimate of the state at time-step k, given the knowledge
at time-step k K1. The Kalman filter runs through a cycle of prediction and
correction. The filter’s estimate of state and the error covariance are updated
when a measurement is obtained.
The Kalman filter prediction

x̂(k|k−1) = A(k)x̂(k−1|k−1),    (4.4)
has associated uncertainty, given by the state prediction covariance
P(k|k−1) = A(k)P(k−1|k−1)Aᵀ(k) + Q(k).    (4.5)
The Kalman gain is given by
K(k) = P(k|k−1)Cᵀ(k)S⁻¹(k),    (4.6)
where S(k) is the innovation covariance.
The innovation covariance S(k) is
S(k) = C(k)P(k|k−1)Cᵀ(k) + R(k).    (4.7)
The state estimate is
x̂(k|k) = x̂(k|k−1) + K(k)ν(k).    (4.8)
The associated state covariance P(k) is given by
P(k) = (I − K(k)C(k))P(k|k−1).    (4.9)
In the sections that follow, we present an approach for learning two of the four
model matrices A, C, Q and R from data. The models are speed-based
performance models of the engine, in which the relationships are learnt from
training sets of normal data. Changes in test data are then detected by
monitoring the normalized innovations squared (NIS) from the Kalman filter,
where NIS is defined as follows:
NIS(k) = νᵀ(k)S⁻¹(k)ν(k).    (5.1)
The innovations should be zero mean and white, with covariance consistent
with that calculated by the filter (Bar-Shalom & Li 1993).
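The filter cycle (4.3)-(4.9) and the NIS statistic (5.1) can be sketched directly. The random-walk model (A = C = I) and the simulated event below are illustrative assumptions, not the learnt engine model; only the diagonal R values are taken from the text.

```python
import numpy as np

def kf_step(x_est, P, y, A, C, Q, R):
    """One predict/correct cycle, equations (4.3)-(4.9), plus the NIS of
    equation (5.1)."""
    x_pred = A @ x_est                             # (4.4) state prediction
    P_pred = A @ P @ A.T + Q                       # (4.5) prediction covariance
    S = C @ P_pred @ C.T + R                       # (4.7) innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)            # (4.6) Kalman gain
    nu = y - C @ x_pred                            # (4.3) innovation
    x_new = x_pred + K @ nu                        # (4.8) corrected state
    P_new = (np.eye(len(x_new)) - K @ C) @ P_pred  # (4.9) corrected covariance
    nis = nu @ np.linalg.inv(S) @ nu               # (5.1) normalized innov. sq.
    return x_new, P_new, nis

A = np.eye(3)
C = np.eye(3)
Q = 0.01 * np.eye(3)
R = np.diag([0.1538, 0.0989, 0.0902])
x_est, P = np.zeros(3), np.eye(3)
nis_trace = []
for k in range(40):
    # Simulated observations: steady running, then a sudden divergence of
    # two 'shaft speeds' at k = 30, mimicking the radial driveshaft event.
    y = np.zeros(3) if k < 30 else np.array([0.0, -2.0, 2.0])
    x_est, P, nis = kf_step(x_est, P, y, A, C, Q, R)
    nis_trace.append(nis)
# NIS stays near zero while the observations match the model and jumps at k = 30.
```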
Whenever there is a departure from the normal behaviour captured in the
learnt model, there is a rise in NIS. This can be caused by gradual component
deterioration or by unexpected events which affect the relationship between the
performance parameters, or observations y, measured on the engine. To illustrate
this, we consider one such event, which occurred in a development Trent 500
engine on a test bed. A problem with the radial driveshaft caused the three shaft
speeds to undergo a sudden change from normal behaviour. In the main event
region, the speeds diverge for approximately 10 s: the high-pressure (HP) shaft
speed increases, the LP shaft speed remains approximately constant and the IP
shaft speed decreases. The engine is then shut down.
In the engine datasets, there are long periods of running at constant speed.
Accelerations and decelerations, by contrast, tend to cover a relatively small number
of data points. The data points are required to be sequential as their time history is
important. The datasets used for training models are balanced so that the different
types of operating regime are given as equal a weighting as possible. In this paper, we
also concentrate on models of normality for smooth acceleration and deceleration.
A separate model is required for sudden acceleration and deceleration, during which
the rates of change of speed are much greater than presented here.
The training data are shown in the eight sequences of figure 5. In both training
and testing, the shaft speed data used are in the region where the LP speed is
between 70 and 90% of maximum LP speed. Within this speed sub-range, the
assumption of model linearity is taken to be valid. Figure 6 presents the two
sequences of test data, with figure 6b covering the main event. Although the three
shafts are separate systems, the shaft speeds should change in tandem. This
condition is violated as a result of the radial driveshaft problem in the event
sequence of figure 6, between t = 30 and 50, when (i) the LP shaft speed does not
change, (ii) the IP shaft speed decreases, and (iii) the HP shaft speed increases.
Initial tests are carried out with a dynamic model of normality based solely on
a three-speed observation vector. In order to make learning possible, we
introduce the constraint that the measurement matrix C should be equal to the
identity matrix. Hence, equation (4.2) becomes
y(k) = x(k) + v(k).    (5.2)
The observation vector is therefore the state corrupted by noise. The
measurement noise covariance matrix R is assumed to be diagonal and set using
engine measurement uncertainty data sheet values provided by Rolls-Royce,
R = [ 0.1538   0.0      0.0
      0.0      0.0989   0.0
      0.0      0.0      0.0902 ].
The C and R values remain fixed throughout (no learning). The state transition
matrix, A, the process noise covariance, Q, and the initial state covariance, P(0),
are learnt using the EM-based algorithm of Ghahramani & Hinton (1996). The
state transition matrix is initialized to I and the process noise covariance to 0.1I.
The initial state is taken from an average over five points for one of the training
sequences, and the initial state covariance is set to I.
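The flavour of this parameter learning can be conveyed by a deliberately simplified sketch: with C = I and small measurement noise, the M-step of EM reduces to a least-squares regression of each state on its predecessor. The full algorithm of Ghahramani & Hinton (1996) alternates such an M-step with Kalman smoothing of the hidden states; the transition matrix and data below are synthetic, invented for illustration.

```python
import numpy as np

def fit_A_Q(X):
    """Least-squares estimate of the state transition matrix A and the
    process noise covariance Q from a state sequence X (rows = time-steps).
    This is the M-step of EM in the simplified, fully observed case
    (C = I, negligible measurement noise)."""
    X0, X1 = X[:-1], X[1:]
    # A minimizes ||X1 - X0 A^T||_F, i.e. A^T = argmin_B ||X0 B - X1||_F.
    A = np.linalg.lstsq(X0, X1, rcond=None)[0].T
    resid = X1 - X0 @ A.T
    Q = resid.T @ resid / len(resid)
    return A, Q

# Synthetic three-'shaft' sequence generated from a known, stable
# transition matrix close to the identity.
rng = np.random.default_rng(2)
A_true = np.array([[0.95, 0.04, 0.00],
                   [0.00, 0.97, 0.02],
                   [0.01, 0.00, 0.96]])
x = np.ones(3)
X = [x]
for _ in range(500):
    x = A_true @ x + rng.normal(0.0, 0.05, 3)   # process noise, std 0.05
    X.append(x)
A_est, Q_est = fit_A_Q(np.array(X))
```

The recovered A_est stays close to A_true and Q_est close to the true noise covariance 0.0025 I, mirroring the finding in the text that the learnt A remains approximately equal to I.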
The learning curve for the system is shown in figure 7. After 100 EM
iterations, the state transition matrix is found to be
A = [  0.9661    0.06601  −0.03673
      −0.02079   1.041    −0.02265
      −0.01780   0.03546   0.9799 ].
Thus, learning does not change the state transition matrix, A, significantly as it
remains approximately equal to I. The process noise covariance matrix becomes
Q = [ 0.01878   0.01166    0.01011
      0.01166   0.007916   0.006619
      0.01011   0.006619   0.005852 ].
Figure 5. Training data. The figure shows shaft speed data (acceleration and deceleration
manoeuvres) taken from several days prior to the event (a, four plots) and also from the event day
(b, four plots), but from sequences occurring well before the event. n1v, LP shaft speed; n2v, IP shaft
speed; n3v, HP shaft speed (all expressed as a percentage of the maximum speed for that shaft).
Figure 6. Test data. The figure shows shaft speed data taken from the event day. (a) Test data
recorded before the event. (b) Data acquired just before and at the time of the event (t = 30).
Figure 7. The log likelihood values over 100 EM iterations, for a system with a three-speed
observation vector, are shown.
Figure 8. (a–d ) The change in NIS for the training data sequences taken from an engine run several
days prior to the event. This test on training data is included as a check.
Figure 9. (a–d ) The change in NIS for training data sequences taken from the day of the event, but
well before the event. This test on training data is included as a check.
Figure 10. The changes in NIS for the two test sequences (three-dimensional observation model),
(a) prior to the radial driveshaft event and (b) at the time of the event. Note the two different
scales used for the NIS axis.
Figure 11. (a–d ) The change in NIS for the first four training sequences (several days before the
event) using the five-dimensional model (speeds plus tgt and p30).
Figure 12. (a–d ) The change in NIS for the last four training sequences (same days as the event but
well before its occurrence) using the five-dimensional model (speeds plus tgt and p30).
the eight sequences. NIS values are higher than for the three-speed model as a
result of the increased dimensionality of this model. Based on these results, it
would be possible to suggest a novelty detection threshold set at 37.0 for the five-
dimensional observation model (approximately 2.5 times the maximum value of
NIS on the training data).
Figure 13 shows the three speeds, p30, tgt and NIS for the two test sequences.
With the first test sequence (figure 13a), NIS only rises to 4.0. When the model is
applied to the event sequence which includes the diverging speeds (figure 13b),
the NIS value rises rapidly to 60.0 before subsequently decreasing. It is clear
that the five-dimensional observation model, which includes more information
about the engine than just the three shaft speeds, would have allowed an earlier
detection of the novel event (at t = 17:30:01, if a novelty threshold of 37.0 had
been used).
The limits for normal NIS values could be set in a training phase by working out
time-averaged values over data windows of several tens of seconds. These
statistically significant values could then be used for comparison in the testing
phase. However, in testing, point-by-point values of NIS (i.e. every second) would
still be used in preference to averages owing to the need to detect sudden changes.
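A sketch of this thresholding scheme follows. The window length, the 2.5 scaling factor (borrowed from the margin quoted for the five-dimensional model above) and the synthetic NIS values are all illustrative assumptions.

```python
import numpy as np

def nis_threshold(train_nis, window=30, factor=2.5):
    """Set a novelty threshold from training-phase NIS values: average NIS
    over fixed windows of `window` points (tens of seconds at a 1 Hz update
    rate), then scale the largest window average by `factor`."""
    n = len(train_nis) // window
    win_means = np.asarray(train_nis[: n * window]).reshape(n, window).mean(axis=1)
    return factor * win_means.max()

rng = np.random.default_rng(3)
# Under normality, NIS is approximately chi-squared distributed with as many
# degrees of freedom as the observation vector (here three).
train_nis = rng.chisquare(df=3, size=600)
thr = nis_threshold(train_nis)

# Testing stays point-by-point: a sudden event at the final step shows up
# in a single NIS value exceeding the threshold.
test_nis = np.concatenate([rng.chisquare(df=3, size=100), [60.0]])
alarms = np.flatnonzero(test_nis > thr)
```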
Figure 13. The change in NIS for the two test sequences (five-dimensional model), (a) prior to the
event and (b) at the time of the event. Note the two different scales used for the NIS axis.
6. Conclusions
temperature) were added and a new five-dimensional model learnt from the same
training sequences, the level of discrimination between the novelty index (NIS)
on the anomalous data and the normal training data increased. This is not
surprising as more information is being provided, but increased dimensionality
can make the learning of the model from data more difficult.
A complete description of the engine’s behaviour for novelty detection
purposes can be provided through the use of multiple models. It is then possible
to switch between different dynamic models according to speed range and the
rate of change of speed, depending on the type of acceleration or deceleration
manoeuvre (smooth, rapid or ‘slam’).
In the continuing application of this work on the Rolls-Royce Trent family of
jet engines (Bailey et al. 2004), it is planned that novel events will be identified
during flight (on the rare occasions on which they occur) and notified to the
maintenance engineering crew on the ground once the aircraft has landed.
The authors wish to thank Dr Visakan Kadirkamanathan for his helpful discussions on learning
dynamic models from data.
Appendix A
[Figure 14 diagram: pressure stations P0–P8 and temperature stations T0–T8 along the engine.]
Figure 14. Temperature and pressure notation of a typical turbo-jet engine. Picture courtesy of
Rolls-Royce plc (1986). Picture reproduced with the permission of Rolls-Royce plc.
References
Bailey, V., Utete, S., Bannister, P., Tarassenko, L., Honoré, G., Ong, M. & Nadeem, S. 2004 The
engine data store for DAME. In Proc. of the All-Hands UK e-science Meeting, Nottingham,
September 2004.
Bar-Shalom, Y. & Li, X.-R. 1993 Estimation and tracking. Artech House.
Boser, B. E., Guyon, I. M. & Vapnik, V. N. 1992 A training algorithm for optimal margin
classifiers. In Proc. of the 5th Annual ACM Workshop on Computational Learning Theory,
Pittsburgh, PA, pp. 144–152.
Chen, J. & Patton, R. J. 1998 Robust model-based fault diagnosis for dynamic systems. Boston,
MA: Kluwer Academic.
Gelb, A. (ed.) 1974 Applied optimal estimation. Cambridge, MA: MIT Press.
Ghahramani, Z. & Hinton G. E. 1996 Parameter estimation for linear dynamical systems. Report
no. CRG-TR-96-2, University of Toronto, Canada.
Hayton, P., Schölkopf, B., Tarassenko, L. & Anuzis, P. 2001 Support vector novelty detection
applied to jet engine vibration spectra. In Advances in neural information processing systems 13
(ed. T. K. Leen, T. G. Dietterich & V. Tresp), pp. 946–952. Cambridge, MA: MIT Press.
Kadirkamanathan, V., Li, P., Jaward, M. H. & Fabri, S. G. 2002 Particle filtering-based fault
detection in non-linear stochastic systems. Int. J. Syst. Sci. 33, 259–265. (doi:10.1080/
00207720110102566)
King, S. P., King, D. M., Anuzis, K., Astley, K., Tarassenko, L., Hayton, P. & Utete, S. 2002 The
use of novelty detection techniques for monitoring high-integrity plant. In Proc. of IEEE Int.
Conf. on Control Appl. 1. Glasgow. September 2002, pp. 221–226.
Korbicz, J. & Janczak, A. 2002 Artificial neural network models for fault detection and isolation of
industrial processes. Comput. Assist. Mech. Eng. Sci. 9, 55–69.
Murphy, K. 1998–2002 Bayes network Kalman filtering toolbox. http://www.ai.mit.edu/~murphyk,
now at http://www.cs.ubc.ca/~murphyk.
Markou, M. & Singh, S. 2003 Novelty detection: a review—part 1: statistical approaches. Signal
Process. 83, 2481–2497. (doi:10.1016/j.sigpro.2003.07.018)
Nairac, A., Townsend, N., Carr, R., King, S., Cowley, P. & Tarassenko, L. 1999 A system for the
analysis of jet engine vibration data. Integr. Comput. Aided Eng. 24, 53–65.
Rätsch, G., Mika, S., Schölkopf, B. & Müller, K. R. 2002 Constructing boosting algorithms from
SVMs: an application to one-class classification. IEEE Tran. Pattern Anal. Mach. Intell. 24,
1184–1199. (doi:10.1109/TPAMI.2002.1033211)
Rolls-Royce plc 1986 The jet engine. 4th edn. Derby, UK: Rolls-Royce plc.
Roweis, S. & Ghahramani, Z. 1999 A unifying review of linear Gaussian models. Neural Comput.
11, 305–345. (doi:10.1162/089976699300016674)
Sammon, J. W. 1969 A nonlinear mapping for data structure analysis. IEEE Trans. Comput. 18,
401–409.
Schölkopf, B., Platt, J., Shawe-Taylor, J., Smola, A. J. & Williamson, R. C. 1999 Estimating the
support of a high-dimensional distribution. Report no. TR MSR 99–87, Microsoft Research,
Redmond, WA.
Schölkopf, B., Platt J. & Smola, A. J. 2000a Kernel method for percentile feature extraction.
Report no. TR MSR 2000–22, Microsoft Research, Redmond, WA.
Schölkopf, B., Williamson, R. C., Smola, A. J., Shawe-Taylor, J. & Platt, J. C. 2000b Support
vector method for novelty detection. In Advances in neural information processing systems 12
(ed. S. A. Solla, T. K. Leen & K.-R. Müller), pp. 582–588. Cambridge, MA: MIT Press.
Tipping, M. E. & Lowe, D. 1998 Shadow targets: a novel algorithm for topographic projections by
radial basis functions. Neurocomputing 19, 211–222. (doi:10.1016/S0925-2312(97)00066-0)
Vapnik, V. 1995 The nature of statistical learning theory. Berlin, Germany: Springer.
Venkatasubramanian, V., Rengaswamy, R., Yin, K. & Kavuri, S. N. 2003 A review of process fault
detection and diagnosis—part I: quantitative model-based methods. Comput. Chem. Eng. 27,
293–311. (doi:10.1016/S0098-1354(02)00160-6)