Journal of Process Control 14 (2004) 467–485
www.elsevier.com/locate/jprocont

Statistical process monitoring with independent component analysis

Jong-Min Lee a, ChangKyoo Yoo b,1, In-Beum Lee a,*

a Department of Chemical Engineering, Pohang University of Science and Technology, San 31 Hyoja-Dong, Pohang 790-784, South Korea
b BIOMATH, Ghent University, Coupure Links 653, B-9000 Gent, Belgium

Received 17 January 2003; received in revised form 5 August 2003; accepted 9 September 2003
Abstract
In this paper we propose a new statistical method for process monitoring that uses independent component analysis (ICA). ICA is
a recently developed method in which the goal is to decompose observed data into linear combinations of statistically independent
components [1,2]. Such a representation has been shown to capture the essential structure of the data in many applications, including
signal separation and feature extraction. The basic idea of our approach is to use ICA to extract the essential independent components
that drive a process and to combine them with process monitoring techniques. I^2, I_e^2 and SPE charts are proposed as on-line monitoring charts, and contribution plots of these statistical quantities are also considered for fault identification. The proposed monitoring method was applied to fault detection and identification in both a simple multivariate process and the simulation benchmark of
the biological wastewater treatment process, which is characterized by a variety of fault sources with non-Gaussian characteristics.
The simulation results clearly show the power and advantages of ICA monitoring in comparison to PCA monitoring.
© 2003 Elsevier Ltd. All rights reserved.
Keywords: Process monitoring; Fault detection; Independent component analysis; Kernel density estimation; Wastewater treatment process
1. Introduction
Monitoring and diagnosis are gaining importance in
process system engineering due to the increased number
of variables measured in chemical and biological plants
and improvements in the controllability of these variables. An important aspect of the safe operation of
chemical processes is the rapid detection of faults, process upsets, or other special events, and the location and
removal of the factors causing such events. However,
hundreds of variables may be monitored in a single
operating unit, and these variables may be recorded
hundreds or thousands of times per day. In the absence
of an appropriate processing method, only limited information can be extracted from these data. Hence, a
tool is required that can project the high-dimensional
process space into a low-dimensional space amenable
to direct visualization, and that can also identify
key variables and important features of the data. The
need to analyze high-dimensional and correlated process
* Corresponding author. Tel.: +82-54-279-2274; fax: +82-54-279-3499.
E-mail address: [email protected] (I.-B. Lee).
1 Tel.: +32-9-264-6196; fax: +32-9-264-6220.

0959-1524/$ - see front matter © 2003 Elsevier Ltd. All rights reserved.
doi:10.1016/j.jprocont.2003.09.004
data has led to the development of many monitoring
schemes that use multivariate statistical methods based
on principal component analysis (PCA) and partial least
squares. These methods have been used and extended in
various applications [3–10].
It is well known that many of the variables monitored
in process systems are not independent. The measured
process variables may be combinations of independent
variables that are not directly measurable (referred to as
latent variables in multivariate analysis). Independent
component analysis (ICA) can extract these underlying
factors or components from multivariate statistical data.
ICA defines a generative model for the observed multivariate data, which are typically in the form of a large
database of samples. In this model, the data variables
are assumed to be linear mixtures of some unknown
latent variables, where the mixing matrix of coefficients
is also unknown. The latent variables, which are called
the independent components (ICs) of the observed data,
are assumed to be non-Gaussian and mutually independent. ICA seeks to extract these independent components as well as the mixing matrix of coefficients [11].
Although ICA can be looked upon as a useful extension of PCA, its objective differs from that of PCA. PCA is a
dimensionality reduction technique that reduces the
data dimension by projecting the correlated variables
onto a smaller set of new variables that are uncorrelated
and retain most of the original variance. However, its
objective is only to decorrelate variables, not to make
them independent. PCA can impose independence only up to second-order statistics (mean and variance), while constraining the direction vectors to be orthogonal; ICA, in contrast, has no orthogonality constraint and involves higher-order statistics, i.e., it not only decorrelates the data (second-order statistics) but also reduces higher-order statistical dependencies [12]. Hence, ICs reveal more useful information from the observed data than principal components (PCs).
1.1. Motivating examples
In this paper, scalars are written in italic lower case,
vectors are written in bold lower case and matrices are
written in bold capitals.
To illustrate the superiority of ICA over PCA, we applied the two types of analysis to a simple example system, similar to that used by Hyvärinen and Oja [11] and Lee [12], except that a modified mixing matrix was used in the present work. Let us consider two source variables that have the uniform distributions shown in Fig. 1(a). The source variables are linearly independent, i.e., the values of one source variable do not convey any information about the other source variable. These sources are linearly mixed as follows:
$$x = As, \qquad \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & 3 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} s_1 \\ s_2 \end{bmatrix}. \qquad (1)$$
Fig. 1. (a) Scatter plot of the original source data; (b) the mixtures and the axes of PCA and ICA; (c) the recovered source data using PCA; (d) the recovered source data using ICA [11,12].

Fig. 1(b) shows the scatter plot of the mixtures. Note that the random variables x_1 and x_2 are not independent, because it is possible to predict the value of one of them from the value of the other. When PCA is applied to these mixed variables, it gives two principal components. The axes of the first and second PCs (PC1, PC2) are shown in Fig. 1(b). The first PC is the axis capturing the highest variance in the data, and the second PC is the axis orthogonal to the first. Fig. 1(c) shows the PCA solution, which differs from the original because the two principal axes are still dependent. The ICA solution shown in Fig. 1(d), however, can recover the original sources, since ICA not only decorrelates the data but also rotates it such that the axes of u_1 and u_2 are parallel to the axes of s_1 and s_2 [12]. The axes of the first and second independent components (IC1, IC2) are shown in Fig. 1(b).

This simple example clearly demonstrates that if the latent variables follow a non-Gaussian distribution, the ICA solution extracts the original source signals to a much greater extent than the PCA solution. It is therefore natural to infer that monitoring based on the ICA solution may give better results than monitoring based on PCA. To date, little literature exists on the application of ICA techniques to the problem of process monitoring. In the present article, a continuous process monitoring method based on ICA is suggested. The basic idea of this approach is to extract the essential independent components that drive a process and to combine them with process monitoring techniques.

The remainder of this article is organized as follows. Conventional PCA monitoring is introduced in the next section, followed by a brief introduction to the ICA algorithm. The monitoring statistics of ICA are then suggested, and an explanation is given of the kernel density estimation used to calculate the confidence limits for non-Gaussian data. The superiority of process monitoring using ICA is illustrated by applying the proposed method both to a simple multivariate process example and to the wastewater simulation benchmark. Finally, conclusions are given.

2. PCA monitoring
PCA can handle high-dimensional, noisy, and correlated data by projecting the data onto a lower-dimensional subspace which contains most of the variance of the original data [5]. PCA decomposes the data matrix X_p ∈ R^{n×d} (where n is the number of samples and d is the number of variables) as the sum of the outer products of vectors t_i and p_i plus a residual matrix E_p:

$$X_p = TP^T + E_p = \sum_{i=1}^{a} t_i p_i^T + E_p, \qquad (2)$$
where t_i is a score vector, which contains information about the relationships between samples, and p_i is a loading vector, which contains information about the relationships between variables. Note that the score vectors are orthogonal and the loading vectors are orthonormal. Projection onto the principal component space reduces the original set of variables to a latent variables (LVs), where a is the number of retained components.
The portion of the measurement space corresponding to the lowest d − a singular values can be monitored by using the squared prediction error (SPE), also called the Q statistic [13]. The SPE is defined as the sum of squares of each row (sample) of E_p; for example, for the kth sample vector in X_p, x(k) ∈ R^d:

$$\mathrm{SPE}(k) = e(k)^T e(k) = x(k)^T (I - P_a P_a^T)\, x(k), \qquad (3)$$
where e(k) is the kth sample vector of E_p, P_a is the matrix of the first a loading vectors retained in the PCA model, and I is the identity matrix. The upper confidence limit for the SPE can be computed from its approximate distribution:

$$\mathrm{SPE}_\alpha = H_1 \left[ \frac{c_\alpha \sqrt{2 H_2 h_0^2}}{H_1} + 1 + \frac{H_2 h_0 (h_0 - 1)}{H_1^2} \right]^{1/h_0}, \qquad (4)$$

where c_α is the standard normal deviate corresponding to the upper (1 − α) percentile, λ_j is the eigenvalue associated with the jth loading vector, H_i = Σ_{j=a+1}^{d} λ_j^i for i = 1, 2, 3, and h_0 = 1 − 2H_1 H_3 / (3H_2^2).
A measure of the variation within the PCA model is given by Hotelling's T^2 statistic. T^2 at sample k is the sum of the normalized squared scores, and is defined as

$$T^2(k) = t(k)^T \Lambda^{-1} t(k), \qquad (5)$$

where Λ^{-1} is the diagonal matrix of the inverses of the eigenvalues associated with the retained principal components. The upper confidence limit for T^2 is obtained using the F-distribution:

$$T^2_{a,n,\alpha} = \frac{a(n-1)}{n-a} F_{a,\,n-a,\,\alpha}, \qquad (6)$$
where n is the number of samples in the data and a is the
number of principal components.
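As a concrete sketch of Eqs. (4) and (6) (our illustration, assuming NumPy and SciPy are available; `pca_limits` is a hypothetical helper, not code from the paper), both limits can be computed from the eigenvalues of the normal-operation covariance matrix:

```python
import numpy as np
from scipy import stats

def pca_limits(X, a, alpha=0.01):
    """Confidence limits for Hotelling's T^2 (Eq. 6) and SPE (Eq. 4).

    X : (n, d) data matrix from normal operation.
    a : number of retained principal components.
    """
    n, d = X.shape
    # Eigenvalues of the covariance matrix, sorted in descending order.
    eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]

    # T^2 limit, Eq. (6): a(n-1)/(n-a) * F_{a, n-a, alpha}.
    t2_lim = a * (n - 1) / (n - a) * stats.f.ppf(1 - alpha, a, n - a)

    # SPE limit, Eq. (4), computed from the discarded eigenvalues.
    H1, H2, H3 = (np.sum(eigvals[a:] ** i) for i in (1, 2, 3))
    h0 = 1 - 2 * H1 * H3 / (3 * H2 ** 2)
    c_alpha = stats.norm.ppf(1 - alpha)   # standard normal deviate
    spe_lim = H1 * (c_alpha * np.sqrt(2 * H2 * h0 ** 2) / H1
                    + 1 + H2 * h0 * (h0 - 1) / H1 ** 2) ** (1 / h0)
    return t2_lim, spe_lim
```

New samples whose T^2 or SPE value exceeds the corresponding limit are flagged as out of control.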
3. ICA monitoring

3.1. Independent component analysis

Independent component analysis (ICA) is a statistical and computational technique for revealing hidden factors that underlie sets of random variables, measurements, or signals. ICA was originally proposed to solve the blind source separation problem, which involves recovering independent source signals (e.g., different voice, music, or noise sources) after they have been linearly mixed by an unknown matrix, A [14].

The following ICA algorithm is based on the formalism presented in the survey article of Hyvärinen and Oja [11]. In the ICA algorithm, it is assumed that d measured variables x_1, x_2, ..., x_d can be expressed as linear combinations of m (≤ d) unknown independent components s_1, s_2, ..., s_m. The independent components and the measured variables have means of zero. The relationship between them is given by

$$X = AS + E, \qquad (7)$$

where X = [x(1), x(2), ..., x(n)] ∈ R^{d×n} is the data matrix (in contrast to PCA, ICA employs the transposed data matrix), A = [a_1, ..., a_m] ∈ R^{d×m} is the unknown mixing matrix, S = [s(1), s(2), ..., s(n)] ∈ R^{m×n} is the independent component matrix, E ∈ R^{d×n} is the residual matrix, and n is the number of samples. Here, we assume d ≥ m (when d = m, the residual matrix E becomes the zero matrix). The basic problem of ICA is to estimate both the mixing matrix A and the independent components S from only the observed data X. Alternatively, we could define the objective of ICA as follows: to find a demixing matrix W whose form is such that the rows of the reconstructed matrix Ŝ, given as

$$\hat{S} = WX, \qquad (8)$$

become as independent of each other as possible. This formulation is not really different from the previous one, since after estimating A, its inverse gives W when d equals m.

From now on, we assume d equals m unless otherwise specified. For mathematical convenience, we define the independent components to have unit variance. This makes the independent components unique, up to their signs [11]. The initial step in ICA is whitening, also known as sphering, which eliminates all the cross-correlation between random variables. Consider a d-dimensional random vector x(k) at sample k with covariance R_x = E(x(k) x^T(k)), where E represents the expectation. The eigendecomposition of R_x is given by

$$R_x = U \Lambda U^T. \qquad (9)$$

The whitening transformation is expressed as

$$z(k) = Q x(k), \qquad (10)$$

where Q = Λ^{-1/2} U^T. One can easily verify that R_z = E(z(k) z^T(k)) is the identity matrix under this transformation. After the transformation we have

$$z(k) = Q x(k) = Q A s(k) = B s(k), \qquad (11)$$

where B is an orthogonal matrix, as verified by the following relation:

$$E\{z(k) z^T(k)\} = B\, E\{s(k) s^T(k)\}\, B^T = B B^T = I. \qquad (12)$$

We have therefore reduced the problem of finding an arbitrary full-rank matrix A to the simpler problem of finding an orthogonal matrix B, since B has fewer parameters to estimate as a result of the orthogonality
constraint. Then, from Eq. (11), we can estimate s(k) as follows:

$$\hat{s}(k) = B^T z(k) = B^T Q x(k). \qquad (13)$$

From Eqs. (8) and (13), the relation between W and B can be expressed as

$$W = B^T Q. \qquad (14)$$
To calculate B, each column vector b_i is initialized and then updated so that the ith independent component ŝ_i = b_i^T z attains maximal non-Gaussianity. Hyvärinen and Oja [11] showed, using the central limit theorem, that non-Gaussianity represents independence. There are two common measures of non-Gaussianity: kurtosis and negentropy. Kurtosis is sensitive to outliers. Negentropy, on the other hand, is based on the information-theoretic quantity of (differential) entropy. Entropy is a measure of the average uncertainty in a random variable, and the differential entropy H of a random variable y with density f(y) is defined as

$$H(y) = -\int f(y) \log f(y)\, dy. \qquad (15)$$
A Gaussian variable has maximum entropy among all
random variables with equal variance [11]. In order to
obtain a measure of non-Gaussianity that is zero for a
Gaussian variable, the negentropy J is defined as follows:
$$J(y) = H(y_{\mathrm{gauss}}) - H(y), \qquad (16)$$
where ygauss is a Gaussian random variable with the same
variance as y. Negentropy is nonnegative and measures
the departure of y from Gaussianity [15]. However, estimating negentropy using Eq. (16) would require an
estimate of the probability density function. To estimate
negentropy efficiently, Hyvärinen and Oja [11] suggested simpler approximations of negentropy as follows:

$$J(y) \approx \left[ E\{G(y)\} - E\{G(v)\} \right]^2, \qquad (17)$$
where y is assumed to be of zero mean and unit variance,
v is a Gaussian variable of zero mean and unit variance,
and G is any non-quadratic function. By choosing G
wisely, one obtains good approximations of negentropy.
Hyvärinen and Oja [11] suggested a number of functions for G:

$$G_1(u) = \frac{1}{a_1} \log \cosh(a_1 u), \qquad (18)$$

$$G_2(u) = -\exp(-a_2 u^2 / 2), \qquad (19)$$

$$G_3(u) = u^4, \qquad (20)$$

where 1 ≤ a_1 ≤ 2 and a_2 ≈ 1. Among these three functions, G_1 is a good general-purpose contrast function and was therefore selected for use in the present study. The non-quadratic function G is described in detail in the paper of Hyvärinen [16].
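To make Eq. (17) concrete, the following sketch (assuming NumPy; `negentropy_G1` is an illustrative helper, not code from the paper) estimates the approximation with the G_1 contrast for a uniform and a Gaussian sample; the non-Gaussian uniform variable should yield the larger value:

```python
import numpy as np

def negentropy_G1(y, a1=1.0, n_ref=100_000, seed=0):
    """Approximate negentropy J(y) ~ [E{G1(y)} - E{G1(v)}]^2, Eq. (17),
    with the contrast G1(u) = (1/a1) log cosh(a1 u) of Eq. (18).
    y is standardized to zero mean and unit variance; v is a standard
    Gaussian reference sample of size n_ref."""
    y = (y - np.mean(y)) / np.std(y)
    G = lambda u: np.log(np.cosh(a1 * u)) / a1
    v = np.random.default_rng(seed).standard_normal(n_ref)
    return (np.mean(G(y)) - np.mean(G(v))) ** 2

# A uniform variable is non-Gaussian, so its estimate should exceed the
# (ideally zero) estimate obtained for a Gaussian sample.
rng = np.random.default_rng(1)
j_uniform = negentropy_G1(rng.uniform(-1.0, 1.0, 50_000))
j_gauss = negentropy_G1(rng.standard_normal(50_000))
```

Because the estimate is a squared difference of sample means, it is nonnegative and vanishes (up to sampling noise) for Gaussian data.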
Based on this approximate form of the negentropy, Hyvärinen [17] introduced a very simple and highly efficient fixed-point algorithm for ICA, calculated over sphered zero-mean vectors z. This algorithm calculates one column of the matrix B and allows the identification of one independent component; the corresponding IC can then be found using Eq. (13). The algorithm is repeated to calculate each independent component. The
algorithm is as follows:
1. Choose m, the number of ICs to estimate. Set counter i ← 1.
2. Take a random initial vector b_i of unit norm.
3. Let b_i ← E{z g(b_i^T z)} − E{g'(b_i^T z)} b_i, where g is the first derivative and g' is the second derivative of G, and G takes the form of Eq. (18), (19) or (20).
4. Do the following orthogonalization: b_i ← b_i − Σ_{j=1}^{i−1} (b_i^T b_j) b_j.
5. Normalize b_i ← b_i / ||b_i||.
6. If b_i has not converged, go back to step 3.
7. If b_i has converged, output the vector b_i. Then, if i ≤ m, set i ← i + 1 and go back to step 2.
Note that the final vector b_i (i = 1, ..., m) given by the algorithm equals one of the columns of the (orthogonal) mixing matrix B. After calculating B, we can obtain ŝ(k) and the demixing matrix W from Eqs. (13) and (14), respectively. For more details on the FastICA algorithm, see Hyvärinen and Oja [11], Hyvärinen [17,18], Hyvärinen et al. [19], and Li and Wang [20].
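The steps above can be sketched in NumPy as follows (an illustration of the deflation procedure, not the authors' implementation; the demo mixing matrix is our own assumption):

```python
import numpy as np

def fastica(X, m, max_iter=200, tol=1e-6, seed=0):
    """Deflation FastICA sketch following steps 1-7 above.
    X: d x n zero-mean data (samples in columns). Uses the G1 contrast,
    so g(u) = tanh(u) and g'(u) = 1 - tanh(u)**2."""
    d, n = X.shape
    # Whitening, Eqs. (9)-(11): z = Qx with Q = Lambda^{-1/2} U^T.
    lam, U = np.linalg.eigh(X @ X.T / n)
    Q = np.diag(lam ** -0.5) @ U.T
    Z = Q @ X
    rng = np.random.default_rng(seed)
    B = np.zeros((d, m))
    for i in range(m):
        b = rng.standard_normal(d)
        b /= np.linalg.norm(b)                          # step 2
        for _ in range(max_iter):
            wz = b @ Z
            # Step 3: b <- E{z g(b^T z)} - E{g'(b^T z)} b
            b_new = (Z * np.tanh(wz)).mean(axis=1) - (1 - np.tanh(wz) ** 2).mean() * b
            b_new -= B[:, :i] @ (B[:, :i].T @ b_new)    # step 4: orthogonalize
            b_new /= np.linalg.norm(b_new)              # step 5: normalize
            converged = abs(abs(b_new @ b) - 1) < tol   # step 6 (sign-invariant)
            b = b_new
            if converged:
                break
        B[:, i] = b                                     # step 7
    W = B.T @ Q                                         # Eq. (14)
    return W, B, Q

# Demo: recover two uniform (non-Gaussian) sources from a linear mixture.
rng = np.random.default_rng(2)
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 5000))
A = np.array([[1.0, 3.0], [1.0, 2.0]])   # assumed mixing matrix
Xc = A @ S
Xc = Xc - Xc.mean(axis=1, keepdims=True)
W, B, Q = fastica(Xc, m=2)
S_hat = W @ Xc   # each row should match a source up to sign and order
```

Recovered components match the true sources only up to permutation and sign, which is the well-known indeterminacy of ICA.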
3.2. Ordering and dimension reduction of ICA
In chemical and biological processes, the measured variables are quantitative (e.g., temperature, pressure, and flow rate) and qualitative (e.g., key component concentration). Dimension reduction in ICA is based on the idea that these measured variables are mixtures of some independent variables [21]. An important
part of ICA monitoring is the selection of a small
number of dominant components from the list of all
independent components. This procedure has at least
two advantages [20]:
1. Robust performance: the dominant components reveal the majority of information about the stochastic mechanism that gives rise to the observed series. A model built on these components will perform robustly in ICA monitoring without being distracted by trivial details.
2. Reduction of analysis complexity: gaining a good understanding of the mechanism behind ICA monitoring sometimes entails interpreting the physical meaning of the independent components, which is a nontrivial task. Concentrating on the dominant components facilitates this analysis.
One approach to choosing the dominant components
is to separate the selection process into two steps:
Step 1 List all the independent components in the appropriate order.
Step 2 Select the first few components in the list as the
dominant ones.
In PCA, the order of the score vectors is determined
by their variance. Therefore, data dimension can be reduced by selecting dominant score vectors. However, the
ordering of components is very difficult in ICA and
there is no standard criterion. A number of methods
have been suggested to determine the component order.
For example, the components can be sorted according
to their non-Gaussianity [18]. Alternatively, Back and Weigend [22] decided the component order according to the L1 norm of each individual component. Cardoso and Souloumiac [23] used a Euclidean norm to sort the rows of the demixing matrix W according to their contribution across all signals. Other criteria, such as the variance or data reconstruction criterion, have also been suggested to decide the order of independent components [21,24]. However, these methods are based only on the mean squared error (MSE) data reconstruction criterion, and the computation becomes prohibitively complex when the number of variables is large.
In the present study we used the Euclidean (L2) norm to sort the rows of the demixing matrix W, because this method is very simple and gives good results in ICA monitoring. Hence, the order of the ICs is decided by the L2 norm of each w_i, the ith row of W [23]: arg max_i ||w_i||_2. That is, the ICs are sorted using the L2 norm in order to show only those ICs that cause dominant changes in the process.
After the ordering of the ICs, it is important to select
the optimal number of ICs in order to achieve good
monitoring and prediction; selecting too many ICs will
cause a magnification of noise and poor process monitoring performance. The data dimension can be reduced
by selecting a few rows of W based upon the assumption that the rows with the largest sum of squared coefficients have the greatest effect on the variation of S. This approach is based on the idea that the dominant process variation can be monitored by considering the cumulative sums of only the first few dominant ICs [24]. We
used a graphical technique to determine the number of
ICs, similar to the SCREE test of PCA. Fig. 2 gives a representative plot of the percentage of the L2 norm of each row of the sorted demixing matrix W against the IC number. The sorted demixing matrix is obtained from the normal operating data of the wastewater treatment process (WWTP, Section 4.2). Note that the L2 norms of the last four ICs are much smaller than the rest, indicating a break of some kind between the first three ICs and the remaining four. The model constructed based on the ICs in Fig. 2 would therefore include three ICs.
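The ordering just described can be sketched as follows (assuming NumPy; the cumulative-fraction cutoff used here for selection is our own assumption, whereas the paper selects the number of ICs by visual, SCREE-like inspection):

```python
import numpy as np

def order_and_select(W, frac=0.85):
    """Sort the rows of the demixing matrix W by their Euclidean (L2)
    norm, dominant rows first, and keep the smallest number of rows
    whose norms account for a fraction `frac` of the total norm (an
    illustrative selection rule, not the paper's SCREE-style test)."""
    norms = np.linalg.norm(W, axis=1)
    order = np.argsort(norms)[::-1]           # dominant rows first
    W_sorted = W[order]
    share = np.cumsum(norms[order]) / norms.sum()
    a = int(np.searchsorted(share, frac) + 1)
    return W_sorted[:a], W_sorted[a:], order  # W_d, W_e, row ordering

# Toy demixing matrix: the middle row clearly dominates.
W = np.array([[0.1, 0.2], [3.0, 4.0], [1.0, 0.0]])
Wd, We, order = order_and_select(W, frac=0.8)
```

W_d and W_e then play the roles of the dominant and excluded parts of W used in the monitoring statistics of the next section.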
3.3. Process monitoring statistics with ICA
On-line monitoring of measurement variables is carried out with the aim of continuously analyzing and
interpreting the measurements in order to detect and
isolate disturbances and faults. The implementations of
the monitoring statistics of ICA are similar to those of
the monitoring statistics of PCA. The ICA model is
Fig. 2. Plot of percent L2 norm of each row of W against IC number.
based on historical data collected during normal operation, i.e., when product is being manufactured and only
common cause variation is present. Future process behavior is then compared against this 'normal' or 'in-control' representation.
In the normal operating condition, designated X_normal, W as well as Ŝ_normal are obtained from the FastICA algorithm (Ŝ_normal = W X_normal) under the assumption that the number of variables is equal to the number of independent components. The matrices B, Q, and A used in Eq. (11) are also obtained by whitening and the FastICA algorithm. As mentioned in the previous section, the data dimension can be reduced by selecting a few rows of W based upon the assumption that the rows with the largest sum of squared coefficients have the greatest effect on the variation of Ŝ. The selected a rows of W constitute a reduced matrix W_d (the dominant part of W), and the remaining rows of W constitute a reduced matrix W_e (the excluded part of W). We can construct a reduced matrix B_d by selecting the columns of B whose indices correspond to the indices of the rows selected from W. B_d can also be computed directly using Eq. (14), i.e., B_d = (W_d Q^{-1})^T. The remaining columns of B constitute the matrix B_e. Then, new independent data vectors, ŝ_newd(k) and ŝ_newe(k), can be obtained by transforming the new data for sample k, x_new(k), through the demixing matrices W_d and W_e, i.e., ŝ_newd(k) = W_d x_new(k) and ŝ_newe(k) = W_e x_new(k), respectively.
In PCA, two types of statistics are calculated from the process model in normal operation: the D-statistic for the systematic part of the process variation and the Q-statistic for the residual part of the process variation. Similarly, these statistics can be applied to ICA monitoring. The D-statistic for sample k, also known as the I^2 statistic, is the sum of the squared independent scores and is defined as follows:

$$I^2(k) = \hat{s}_{\mathrm{newd}}(k)^T \hat{s}_{\mathrm{newd}}(k). \qquad (21)$$
The Q-statistic for the nonsystematic part of the common cause variation of new data, also known as the SPE statistic, can be visualized in a chart with confidence limits. The SPE statistic at sample k is defined as follows:

$$\mathrm{SPE}(k) = e(k)^T e(k) = (x(k) - \hat{x}(k))^T (x(k) - \hat{x}(k)), \qquad (22)$$

where x̂(k) can be calculated as follows:

$$\hat{x}(k) = Q^{-1} B_d \hat{s}(k) = Q^{-1} B_d W_d x(k). \qquad (23)$$
Simoglou et al. [25] proposed a second T^2 statistic for monitoring the state of the system, based on the excluded canonical variate analysis (CVA) states. Taking a similar approach, we propose a second I^2 metric (I_e^2) based on the d − a excluded independent components (ŝ_newe(k)). Monitoring the non-systematic part of the measurements provides an additional fault detection tool, which can detect special events entering the system. The I_e^2 statistic has the further advantage that it can compensate for the error that results when an incorrect number of ICs is selected for the dominant part. The use of the I^2 and I_e^2 statistics allows the entire space spanned by the original variables to be monitored through a new basis. The I_e^2 statistic is defined as follows:

$$I_e^2(k) = \hat{s}_{\mathrm{newe}}(k)^T \hat{s}_{\mathrm{newe}}(k). \qquad (24)$$
When a data-driven process monitoring technique is executed, we assume that normal operating data conform to some fixed distribution. Hence, we need to specify the distribution of normal operating data and find its control limit before monitoring new data on-line. In PCA monitoring, the confidence limit is based on a specified distribution, such as those shown in Eqs. (4) and (6), under the assumption that the latent variables follow a Gaussian distribution. In ICA monitoring, however, the independent components over some period do not conform to a multivariate Gaussian distribution; hence, the confidence limits of the I^2, I_e^2 and SPE statistics cannot be determined directly from a particular approximate distribution, and an alternative method is needed. The confidence limits of the three statistics can be obtained by kernel density estimation, which is explained in the next section.
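Putting Eqs. (21)-(24) together, the three monitoring statistics for a new sample can be sketched as follows (assuming NumPy; `ica_statistics` and its packaging are our own, not code from the paper):

```python
import numpy as np

def ica_statistics(x_new, Wd, We, Bd, Q):
    """I^2, Ie^2 and SPE of Eqs. (21)-(24) for one new sample x_new
    (a d-vector). Wd/We are the dominant/excluded rows of W, Bd the
    matching columns of B, and Q the whitening matrix."""
    s_d = Wd @ x_new                       # dominant scores, Wd x_new(k)
    s_e = We @ x_new                       # excluded scores, We x_new(k)
    I2 = float(s_d @ s_d)                  # Eq. (21)
    Ie2 = float(s_e @ s_e)                 # Eq. (24)
    x_hat = np.linalg.solve(Q, Bd @ s_d)   # Eq. (23): Q^{-1} Bd s_hat
    e = x_new - x_hat
    SPE = float(e @ e)                     # Eq. (22)
    return I2, Ie2, SPE
```

Each of the three values is then compared against its own control limit, obtained by the kernel density estimation described next.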
3.4. Confidence bounds
Once a model has been developed that reflects the
normal operation region, it is necessary to detect any
departure of the d-dimensional process from its standard behavior. That is, we must calculate the limit value
to determine whether the process is in control or not. In
PCA monitoring, Hotelling's T^2 analysis and the SPE charts are effective tools for extracting the critical features of the data. These analyses are based on the assumption that the probability density functions of the latent variables follow a multivariate Gaussian distribution. However, contrary to this assumption, Martin and Morris [26] reported, on the basis of multivariate normality tests on the scores, that the latent variables in many industrial processes rarely have a multivariate Gaussian distribution. Hence, the use of Hotelling's T^2 analysis and the SPE chart may be inaccurate and misleading [26]. An alternative approach to defining the nominal operating regions is to use data-driven techniques such as non-parametric empirical density estimates obtained by kernel estimation [26,27].
Note that the latent variables in ICA monitoring do not follow a Gaussian distribution; hence, the confidence limits of the I^2 and SPE statistics cannot be determined directly from a particular approximate distribution. We therefore use kernel density estimation to calculate the confidence limits of the I^2, I_e^2 and SPE statistics in ICA monitoring.
A univariate kernel estimator with kernel K is defined by

$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right), \qquad (25)$$
where x is the data point under consideration, xi is an
observation value from the data set, h is the window
width (also known as the smoothing parameter), n is the
number of observations, and K is the kernel function.
The kernel estimator is therefore a sum of 'bumps' located at the observations. The kernel function K determines the shape of the bumps and satisfies the condition

$$\int_{-\infty}^{\infty} K(x)\, dx = 1. \qquad (26)$$
There are a number of possible kernel functions. In
practice, the form of the kernel function is not very
important, and the Gaussian kernel function is the most
commonly used [28]. The Gaussian kernel is also employed in the present study.
In ICA monitoring, two-dimensional plots of scores
in the ICs plane are used to define the nominal operating
condition. In this case the kernel estimator is defined by

$$\hat{f}(x, y) = \frac{1}{n h_1 h_2} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h_1}, \frac{y - y_i}{h_2}\right), \qquad (27)$$
where h1 and h2 are the smoothing parameters, x and y are
coordinates in the plane formed from two independent
components, and xi and yi are the independent component coordinates in the normal operating condition.
Many measures have been proposed for the estimation of h, the window width or smoothing parameter.
The problem of choosing how much to smooth is of crucial importance in density estimation. If h is too large we 'oversmooth', erasing detail; if h is too small we 'undersmooth' and fail to filter out spurious detail.
Several methods exist that automatically choose an optimal (in some sense) value of h, although a subjective
choice is often equally valid. One simplistic method for
automatically choosing h is to assume some underlying
distribution, for example the standard normal density,
and then estimate a smoothing parameter based upon
this assumption. However, if this assumption is not
valid, 'oversmoothing' often results [29]. The more advanced methods for selecting h are based on cross-validation, for example least squares cross-validation (LSCV) and biased cross-validation [28]. Here, we use LSCV to select h. For more details regarding kernel density estimation, refer to the books of Silverman [28] and Wand and Jones [29].
The control limits used in ICA monitoring charts can be obtained using kernel density estimation as follows. First, the I^2, I_e^2 or SPE values from normal operating data are required. Then, the univariate kernel density estimator is used to estimate the density function of the normal I^2, I_e^2 or SPE values. The point occupying 99% of the area of the density function can then be obtained; it becomes the control limit for the normal operating data (I^2, I_e^2 or SPE values).

One major advantage of the confidence region obtained using kernel density estimation is that it follows the data more closely, and is less likely to incorporate regions of unknown operation, than confidence regions obtained on the basis of Hotelling's T^2 statistics [26].
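The limit-setting procedure above can be sketched as follows (assuming NumPy; note that the bandwidth here defaults to Silverman's rule of thumb as a simplification, whereas the paper selects h by least squares cross-validation):

```python
import numpy as np

def kde_control_limit(stats_normal, coverage=0.99, h=None, grid=1000):
    """Control limit from a univariate Gaussian-kernel density estimate,
    Eq. (25): the point below which `coverage` of the estimated density
    lies. `stats_normal` holds I^2, Ie^2 or SPE values from normal
    operating data."""
    x = np.asarray(stats_normal, dtype=float)
    n = x.size
    if h is None:
        h = 1.06 * x.std() * n ** (-1 / 5)   # Silverman's rule of thumb
    lo, hi = x.min() - 3 * h, x.max() + 3 * h
    g = np.linspace(lo, hi, grid)
    # f_hat(g) = (1/(n h)) sum_i K((g - x_i)/h) with a Gaussian kernel K.
    u = (g[:, None] - x[None, :]) / h
    f = np.exp(-0.5 * u ** 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))
    cdf = np.cumsum(f)
    cdf /= cdf[-1]                            # numerical CDF on the grid
    return g[np.searchsorted(cdf, coverage)]
```

For standard normal data the returned 99% limit should sit slightly above the theoretical 2.33, because kernel smoothing mildly inflates the spread.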
3.5. Contribution plots
In the previous section it was stated that process faults are detected by computing three multivariate control charts. However, the monitoring charts do not identify a particular fault; they simply indicate the presence of a variation in the process that is not included within the common-cause variations captured in the normal operating condition (NOC) data. Such anomalous variations usually correlate with a problem in the process. The monitoring charts
give no information on what is wrong with the process,
or which process variables caused the process to be out
of control. Once a fault is detected by the statistical
monitoring method, the key approach to fault isolation
using the ICA model is the use of contribution plots. By
interrogating the underlying process model at the point
where an event has been detected, contribution plots
may reveal the group of process variables that most
influence the model or the residuals [30–33].
Let us now consider the T^2 and I^2 statistics of PCA and ICA. Large score values result in large values of T^2 and I^2, which are detected, and the corresponding object is isolated. However, the variable loadings do not provide a way to identify the variables that contribute to large T^2 and I^2 values. It is important to understand that the loadings indicate how the variables are correlated in the model given by a set of calibration objects. On the other hand, an individual object can deviate significantly from the bulk of the calibration objects. If there are many variables, say tens of variables, it is not easy to identify those variables that contributed to the large T^2 and I^2 values. Therefore, it is necessary to introduce the concept of variable contribution.
In PCA, the variable contributions to the T 2 -value of
an object k are computed using the following equation
[33]:
Variable contribution for object k (PCA):

$$\mathbf{xc}(k) = \mathbf{t}(k)\,\Lambda^{-1/2}\,\mathbf{P}^{T} = \mathbf{x}(k)\,\mathbf{P}\,\Lambda^{-1/2}\,\mathbf{P}^{T}, \qquad (28)$$

where Λ is a diagonal matrix whose diagonal elements are the eigenvalues.
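A small numerical check of Eq. (28) can be sketched as follows. This is illustrative only; the random data, the choice of three PCs, and all variable names are our assumptions, not the authors' code. The squared contributions sum to Hotelling's T2 because the loading columns are orthonormal.

```python
import numpy as np

def t2_contributions(x, P, eigvals):
    """Variable contributions to Hotelling's T2 as in Eq. (28):
    c = x P Lambda^{-1/2} P^T, whose squared elements sum to T2."""
    L_inv_sqrt = np.diag(1.0 / np.sqrt(eigvals))
    t = x @ P                                  # scores of the observation
    return t @ L_inv_sqrt @ P.T

# Toy check on random correlated data (hypothetical example).
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)       # autoscale
eigvals, vecs = np.linalg.eigh(np.cov(X, rowvar=False))
order = np.argsort(eigvals)[::-1][:3]          # retain three PCs
P, lam = vecs[:, order], eigvals[order]
x = X[0]
c = t2_contributions(x, P, lam)
T2 = (x @ P) @ np.diag(1.0 / lam) @ (x @ P)
print(np.allclose(np.sum(c**2), T2))  # True
```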
[Fig. 3 flowchart: a training procedure (data scaling and whitening; obtain the ICA model from normal operation data; develop the confidence limits of the I2, Ie2 and SPE charts using kernel density estimation) feeds an on-line monitoring procedure (for new data, calculate the I2, Ie2 and SPE values; if I2, Ie2 or SPE exceeds its confidence limit, a fault is flagged in the deterministic part of the ICA model, the excluded part of the ICA model, or the residual space, respectively; diagnose the fault by comparing variable contributions and inform the operator).]
Fig. 3. Process monitoring scheme of the proposed ICA method.
In ICA, the variable contributions of xðkÞ for I 2 ðkÞ
and Ie2 ðkÞ can be obtained using the following equations,
respectively.
$$\mathbf{xc}_{d}(k) = \frac{\mathbf{Q}^{-1}\mathbf{B}_{d}\,\hat{\mathbf{s}}_{\mathrm{new},d}(k)}{\left\| \mathbf{Q}^{-1}\mathbf{B}_{d}\,\hat{\mathbf{s}}_{\mathrm{new},d}(k) \right\|}\,\left\| \hat{\mathbf{s}}_{\mathrm{new},d}(k) \right\|, \qquad (29)$$

$$\mathbf{xc}_{e}(k) = \frac{\mathbf{Q}^{-1}\mathbf{B}_{e}\,\hat{\mathbf{s}}_{\mathrm{new},e}(k)}{\left\| \mathbf{Q}^{-1}\mathbf{B}_{e}\,\hat{\mathbf{s}}_{\mathrm{new},e}(k) \right\|}\,\left\| \hat{\mathbf{s}}_{\mathrm{new},e}(k) \right\|. \qquad (30)$$
Eqs. (29) and (30) are generated based on the fact that
the sum of the squared variable contribution values
should equal the I 2 and Ie2 values, respectively, i.e.,
xcd ðkÞT xcd ðkÞ ¼ I 2 ðkÞ and xce ðkÞT xce ðkÞ ¼ Ie2 ðkÞ. Similarly, variable contributions can also be computed for
the SPE statistic, i.e., the variable contribution of the
residuals. Generally, the aberrant variables will have the
largest residuals. The residual at sample k, SPEðkÞ, is
defined as the sum of the squares of eðkÞ. Thus, the
vector eðkÞ contains information on the individual prediction errors of each process variable at sample k. By
plotting eðkÞ as a bar graph, the contributions to SPEðkÞ
can be viewed. The relative size of the bars indicates the
contribution from each variable to the prediction error,
or the lack of fit of a sample to the model. In some
situations, especially those with varying SPE statistic, it
is a good idea to use a mean of the contribution to SPE
over some period. This is done by replacing the vector
eðkÞ by a vector expressing the mean error over a period
of length l.
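Eq. (29) and the property that the squared contributions sum to I2(k) can be checked numerically as sketched below. The matrices standing in for Q⁻¹ and Bd and the IC scores are random placeholders, not quantities from the paper.

```python
import numpy as np

def i2_contributions(s_hat_d, Q_inv, B_d):
    """Variable contributions to I2 as in Eq. (29): the back-projected
    direction Q^{-1} B_d s_hat, rescaled so the squared elements
    sum to I2(k) = s_hat_d(k)^T s_hat_d(k)."""
    v = Q_inv @ B_d @ s_hat_d            # back-projection into variable space
    return v / np.linalg.norm(v) * np.linalg.norm(s_hat_d)

# Hypothetical dimensions: 5 measured variables, 2 dominant ICs.
rng = np.random.default_rng(2)
Q_inv = rng.standard_normal((5, 5))      # stand-in for the inverse whitening matrix
B_d = rng.standard_normal((5, 2))        # stand-in for the dominant part of B
s_hat_d = rng.standard_normal(2)         # estimated dominant IC scores
c = i2_contributions(s_hat_d, Q_inv, B_d)
print(np.isclose(c @ c, s_hat_d @ s_hat_d))  # True: squares sum to I2
```

The Ie2 contributions of Eq. (30) follow by substituting the excluded part Be and its scores.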
Our proposed strategy is depicted in Fig. 3. First, the
calibration model is built by ICA and kernel density
estimation and then the system is monitored through
two steps. As for conventional PCA monitoring, if the I 2
statistic exceeds the limit, it indicates that a process
change in the model space has occurred, if the Ie2 statistic
exceeds the limit, it indicates that a process change in the
excluded model space has occurred, and if the SPE (Q) statistic of the residual space exceeds its confidence limit, it indicates the occurrence of changes that violate the ICA
model. We use contribution plots to identify and isolate
the nature of process faults.
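The decision logic just described can be sketched as a hypothetical helper; the statistic values and limits below are made up for illustration.

```python
def diagnose(i2, ie2, spe, limits):
    """Compare each monitoring statistic with its KDE-based confidence
    limit and name the subspace(s) in which a fault is flagged."""
    faults = []
    if i2 > limits["I2"]:
        faults.append("deterministic part of ICA model")
    if ie2 > limits["Ie2"]:
        faults.append("excluded part of ICA model")
    if spe > limits["SPE"]:
        faults.append("residual space")
    return faults or ["normal operation"]

print(diagnose(12.0, 1.0, 0.5, {"I2": 9.2, "Ie2": 3.0, "SPE": 2.0}))
# ['deterministic part of ICA model']
```

In practice each flagged subspace would then be examined through its contribution plot.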
4. Illustrative examples (comparison of ICA with PCA)
PCA uses only the information contained in the covariance matrix of the data vector x, whereas ICA also uses information on the distribution of x that is not contained in the covariance matrix. Hence, the use of ICA
for monitoring may give more refined results because it uses the independent components rather than
the principal components. In this paper, the FastICA
algorithm developed by Hyvärinen and Oja [11] was
applied to the detection and diagnosis of faults during
monitoring.
4.1. A simple multivariate process
Let us consider the following simple multivariate
process, which is a modified version of the system suggested by Ku et al. [4].
$$\mathbf{z}(k) = \begin{bmatrix} 0.118 & -0.191 & 0.287 \\ 0.847 & 0.264 & 0.943 \\ -0.333 & 0.514 & -0.217 \end{bmatrix} \mathbf{z}(k-1) + \begin{bmatrix} 1 & 2 \\ 3 & -4 \\ -2 & 1 \end{bmatrix} \mathbf{u}(k-1), \qquad (31)$$

$$\mathbf{y}(k) = \mathbf{z}(k) + \mathbf{v}(k), \qquad (32)$$

where u is the correlated input:

$$\mathbf{u}(k) = \begin{bmatrix} 0.811 & -0.226 \\ 0.477 & 0.415 \end{bmatrix} \mathbf{u}(k-1) + \begin{bmatrix} 0.193 & 0.689 \\ -0.320 & -0.749 \end{bmatrix} \mathbf{w}(k-1). \qquad (33)$$
The input w is a random vector whose elements are uniformly distributed over the interval (−2, 2). The
output y is equal to z plus the random noise vector v, each
element of which has zero mean and a variance of 0.1. Both the
input u and the output y are measured, but z and w are not.
Normal data with 200 samples are used for analysis. The
data vector for analysis consists of x(k) = [y^T(k) u^T(k)]^T.
All five variables (y1, y2, y3, u1, u2) are scaled to zero
mean and unit variance to prevent less important variables with large magnitudes from overshadowing important
variables with small magnitudes.
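Under the stated assumptions, the process of Eqs. (31)–(33) can be simulated as sketched below. The matrix signs are reconstructed (the extraction above lost them), so treat them as assumptions; the function and variable names are ours.

```python
import numpy as np

# State, input, and input-driving matrices of Eqs. (31)-(33);
# signs reconstructed, so treat the exact values as assumptions.
A = np.array([[0.118, -0.191, 0.287],
              [0.847,  0.264, 0.943],
              [-0.333, 0.514, -0.217]])
B = np.array([[1.0, 2.0], [3.0, -4.0], [-2.0, 1.0]])
C = np.array([[0.811, -0.226], [0.477, 0.415]])
D = np.array([[0.193, 0.689], [-0.320, -0.749]])

def simulate(n=200, seed=0):
    rng = np.random.default_rng(seed)
    z, u = np.zeros(3), np.zeros(2)
    X = np.empty((n, 5))
    for k in range(n):
        w = rng.uniform(-2, 2, size=2)             # unmeasured uniform input
        z = A @ z + B @ u                          # Eq. (31), uses u(k-1)
        u = C @ u + D @ w                          # Eq. (33)
        y = z + rng.normal(0.0, np.sqrt(0.1), 3)   # Eq. (32), noise variance 0.1
        X[k] = np.concatenate([y, u])              # x(k) = [y^T(k) u^T(k)]^T
    return (X - X.mean(axis=0)) / X.std(axis=0)    # zero mean, unit variance

X = simulate()
print(X.shape)  # (200, 5)
```

The step and ramp disturbances below can then be injected by offsetting the first element of w from sample 50 onwards.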
Conventional PCA implicitly assumes that the observations at one time are statistically independent of
observations at any past time; that is, it assumes both serial independence within each measured variable series and statistical independence between the different measured variable series at
past time instants. However, the dynamics of a typical
chemical or biological process cause the measurements
to be time dependent, which means that the data may
exhibit both cross-correlation and auto-correlation. PCA
methods can be extended to the modeling and monitoring of dynamic systems by augmenting each observation vector with the previous l observations [4]. Here,
we consider only static PCA and static ICA, although
dynamic PCA and dynamic ICA with time-lagged
variables could also be considered.
In PCA, a model with three principal components is developed, which captures about 87.5% of the process variance. The disturbances used for monitoring and diagnosis are as follows:
• Disturbance 1: A step change of w1 by 3 is introduced
at sample 50.
• Disturbance 2: w1 is linearly increased from sample
50 to 149 by adding 0.05(k − 50) to the w1 value of
each sample in this range, where k is the sample number.
The T 2 and SPE charts for PCA monitoring of the
process with disturbance 1 are shown in Fig. 4. The 99%
Fig. 4. PCA monitoring charts: T 2 and SPE plots of the data for disturbance 1 with three PCs.
Fig. 5. ICA monitoring charts: I 2 , Ie2 and SPE plots of the data for disturbance 1 with two ICs.
confidence limits are also shown. It is evident from these
charts that PCA cannot detect the disturbance and
captures only the dominant randomness. Applying ICA to the same process data, however, gives the results
presented in Fig. 5, which show markedly better disturbance detection than PCA. The 99%
confidence limits of I2, Ie2 and SPE are obtained from
kernel density estimation of normal operating condition
data. As shown in Fig. 5, I2 exceeds the confidence limit
from sample 51, a delay of only one sample. The
superiority of ICA monitoring over the PCA-based approach is also evident in the plots shown in Fig. 6: the regions of the process that are out of control are easily
seen in the ICs plot, whereas most samples lie within
the region of normal operation in the PCs plot, despite
the presence of the fault.
The T 2 and SPE charts for PCA monitoring of the
process with disturbance 2 are shown in Fig. 7. With
PCA, the T 2 and SPE charts again fail to detect the
disturbance, although a few T 2 values exceed the 99%
control limit. With ICA (Fig. 8), the disturbance is well
detected by the I 2 chart from about the 78th sample
Fig. 6. (a) Plot of two principal component values and (b) plot of two independent component values for disturbance 1. The normal operating region
(95% and 99%) is indicated with dashed lines.
Fig. 7. PCA monitoring charts: T 2 and SPE plots of the data for disturbance 2 with three PCs.
Fig. 8. ICA monitoring charts: I 2 , Ie2 and SPE plots of the data for disturbance 2 with two ICs.
onwards. In addition, the I2 value rapidly returns to
the normal operating level after the disturbance ends at sample 150. Furthermore, the I2
values increase linearly during disturbance 2, mirroring
the characteristics of the disturbance;
hence, ICA may also aid disturbance isolation. For
disturbance 2, the plots of the PCs and ICs along with the
normal operating region are depicted in Fig. 9. Inspection of these plots reveals that the fault does not
appear in the PCs plot, whereas the ICs plot successfully
separates the out-of-control data from the normal operating data.
Fig. 9. (a) Plot of two principal component values and (b) plot of two independent component values for disturbance 2. The normal operating region
(95% and 99%) is indicated with dashed lines.
4.2. Wastewater treatment process
Advanced monitoring and control strategies for
WWTP have attracted much recent interest as a consequence of the increasing stringency of environmental
regulations. However, some specific features of this
process have yet to be fully addressed. First, most changes
in this biological process are slow, and recovery from
failures can be time-consuming and expensive; for example, it can take several months for the process to recover from an abnormal operation. Therefore, early
detection of developing abnormalities is especially important for this process. Secondly, most WWTPs are
subject to large diurnal fluctuations in the flow rate and
composition of the feed stream, and these
biological processes consequently exhibit periodic characteristics. Since
the variables of such processes tend to fluctuate widely
over a cycle, their mean and variance do not remain
constant with time. Because of this, conventional multivariate statistical process monitoring (MSPM) methods like PCA, which implicitly assume a stationary
underlying process, may lead to numerous false alarms
and missed faults. Better treatment performance can be
expected from advanced monitoring and control strategies that account for the non-Gaussianity of the periodic patterns in the biological process [34].
The ICA monitoring algorithm proposed here was
tested for its ability to detect various disturbances in
simulated data obtained from a benchmark simulation of
the WWTP. This simulation model combines nitrification with predenitrification, the most commonly used
process for nitrogen removal. The activated sludge
model No. 1 (ASM1) and a ten-layer settler model were
used to simulate the biological reactions and the settling
process, respectively. Fig. 10 shows the flow diagram of
the modeled WWTP system. The plant was designed to
treat an average flow of 20,000 m³ d⁻¹ with an average
biodegradable COD concentration of 300 mg l⁻¹. The
plant consists of a 5-compartment bioreactor (6000 m³)
and a secondary settler (6000 m³). For the sludge con-
[Fig. 10 layout: influent Q0, Z0 enters a five-unit bioreactor (Units 1–5) followed by a 10-layer secondary settler; streams include the internal recycle Qa, Za, the settler feed Qf, Zf, the effluent Qe, Ze, the underflow Qu, Zu, the sludge recycle Qr, Zr and the wastage Qw, Zw.]
Fig. 10. Process layout for the simulation benchmark.
centration of 3 kg m⁻³ this corresponds to a sludge load
of approximately 0.33 kg BOD₅ (kg sludge)⁻¹ day⁻¹, which
is quite critical at 15 °C, so the effluent composition is
sensitive to the applied control strategy. The first two
compartments of the bioreactor are not aerated whereas
the others are aerated. All the compartments are considered to be ideally mixed whereas the secondary settler
is modeled as a one-dimensional series of 10 layers.
For more detailed information about this benchmark,
refer to the website of the COST working group (http://
www.ensic.u-nancy.fr/COSTWWTP).
Influent data and operation parameters developed by
a working group on the benchmarking of wastewater
treatment plants, COST 624, were used in the simulation
[35]. The data covered a period of two weeks. The
training model was based on a normal operation period
of one week of dry weather, and the data set for the last 7
days was used for validation. The sampling time
was 15 min. The data used were the influent file and
the noisy outputs suggested by the benchmark. We
selected seven variables, listed in Table 1, from among
the many variables used in the benchmark to build the
monitoring system. These variables were chosen because
they are typically monitored and are important in
real WWTP systems.
Two types of disturbances were tested using the
proposed method, external disturbances and internal
disturbances. External disturbances are defined as those
imposed upon the process from the outside and are
detectable when monitoring the influent characteristics.
Internal disturbances are caused by changes within the
process that affect the process behavior. For the external
disturbance, two short storm events were simulated,
while deterioration of nitrification was simulated to
study an internal disturbance [34,36].
The density estimate and normal probability plot of
the second score vector ðt2 Þ calculated by applying PCA
to the simulation benchmark of the normal operating
data (Fig. 11(a) and (b)) make it clear that the values of
t2 do not follow a Gaussian distribution. Thus, the calculation of T 2 and SPE charts based on the assumption
that the data are Gaussian distributed may lead to poor
monitoring performance.
4.2.1. External disturbance (storm events)
We applied the proposed monitoring method to a
process in which two sudden storm events occur after a
long period of dry weather. This example demonstrates
how external disturbances appear within the proposed
method. The pattern of measurement variables during
the storm weeks was taken from the storm condition in
the benchmark. The values of the measurement variables during the storm weeks are presented in Fig. 12
(first storm: samples 850–865 and second storm: samples
1050–1110). The effects of the first storm can be seen at
around sample 850 and the effects of the second storm
can be seen at around sample 1050. During the storm
events, the average influent flow rate increases from
Table 1
Variables used in the monitoring of the benchmark model

No.  Symbol   Meaning
1    SNH,in   Influent ammonium concentration
2    Qin      Influent flow rate
3    TSS4     Total suspended solids (reactor 4)
4    SO,3     Dissolved oxygen concentration (reactor 3)
5    SO,4     Dissolved oxygen concentration (reactor 4)
6    KLa5     Oxygen transfer coefficient (reactor 5)
7    SNO,2    Nitrate concentration (reactor 2)
Fig. 11. (a) Density estimate; (b) normality check of second score ðt2 Þ obtained from PCA.
Fig. 12. X -block variables during the storm weeks.
20,000 m³ d⁻¹ to 60,000 m³ d⁻¹. Because such a sudden increase
in the influent flow rate affects other internal and
external variables by changing the microbiological
reaction rates, it also changes the covariance structure.
ICA monitoring was applied to this disturbance
case. To reduce the dimension, two independent components were extracted. Fig. 13 shows the independent
component score plot for IC1 and IC2 during the storm
weeks, with confidence bounds (95% and 99%) calculated
using kernel density estimation. Around samples 850
and 1050, which correspond to the first and second storms, the
projected independent components depart from the
normal operating condition.
Fig. 14 shows the I2, Ie2 and SPE charts over the period that includes the storm events. Two ICs are selected
for monitoring the systematic part of the process. The
confidence limits for the I2, Ie2 and SPE charts were
obtained from kernel density estimation of the I2 values,
Ie2 values and SPE values of normal operating data. The
I2 and Ie2 measures display significant deviations at
around samples 850 and 1050, while the SPE measure detects
them only at around sample 1050. Since the I2, Ie2 and SPE
Fig. 13. Plot of two independent component values (samples 830–880 and samples 1020–1140) for storm events. The normal operating region (95%
and 99%) is indicated with dashed lines.
Fig. 14. ICA monitoring charts: I 2 , Ie2 and SPE plots spanning the regions where the storm events (around sample 850 and 1050) occur in the
benchmark simulation.
charts produce a single statistic at each sampling time,
they detect only the existence and the time of occurrence of
out-of-control situations.
To better detect abnormalities in the process, contribution plots were utilized in conjunction with the
multivariate charts. As discussed above, contribution
plots are a fast way to investigate an event and to isolate
the variables causing the deviation. Fig. 15 displays the
variables responsible for the deviation in the I 2 measure,
Ie2 measure and SPE measure at sample 855. The analysis
of the contributions to the I 2 and Ie2 measures during the
storm events indicates that variable 2 (Qin), variable 3
Fig. 15. Variables contributing to the deviation in I 2 , Ie2 and SPE at sample 855.
(TSS4), and variable 7 (SNO,2) are primarily responsible
for these deviations. This result can also be verified by
inspection of Fig. 12. From these contribution plots, an
'out-of-control' situation is identified when the contributions of some variables are larger than anticipated.
Identifying the variables that have experienced the greatest change, in conjunction with the expert
knowledge of the production engineer and operator,
makes it possible to relate a particular sequence of
changes to a particular process malfunction, and can provide sufficient information to allow
operational personnel to narrow down the potential
causes of the process problem.
In the case of external disturbances such as storm
events, conventional PCA with two PCs also gives good
monitoring results.
Table 2
Variance captured by the PCA model

PC number  Eigenvalue of cov(X)  % Variance captured this PC  % Variance captured total
1          4.600                 65.71                        65.71
2          1.630                 23.34                        89.05
3          0.479                 6.85                         95.89
4          0.149                 2.13                         98.02
5          0.077                 1.10                         99.12
6          0.055                 0.79                         99.91
7          0.006                 0.09                         100.00
4.2.2. Internal disturbance (nitrification rate decrease)
The internal disturbance was imposed by decreasing
the nitrification rate in the biological reactor through a
decrease in the specific growth rate of the autotrophs
(μA). The autotrophic growth rate was decreased rapidly
from 0.5 to 0.4 day⁻¹ at sample 288 and then linearly
decreased from 0.4 to 0.2 day⁻¹ until sample 480, as illustrated in Fig. 16.
The PCA model is able to capture most of the variability of the X-block in two PCs, as shown in Table 2.
However, it is clear from the T2 and SPE charts shown
in Fig. 17 that the PCA method with two PCs cannot
detect the internal disturbance because the periodic and
non-Gaussian features of the wastewater plant dominate. In the SPE chart of PCA, the trace appears to start
increasing after observation 300. However, despite the
presence of the fault, most samples are below the confidence limit, giving the process operator an incorrect
picture of the process status. In contrast to the PCA
result, the I2, Ie2 and SPE charts of ICA monitoring with
99% confidence limits, given in Fig. 18, show that the I2
chart successfully detects the internal disturbance. The
I2 measure increases rapidly around sample 288 and
reveals a diurnal variation, which indicates the detection
of successive faults. Furthermore, the pattern of the internal
disturbance (step + linear) is well reflected in the I2
chart.
Fig. 19 shows the score plot for PC1 and PC2 and the
ICs plot for IC1 and IC2 from sample 270 to 370. In the
PCs plot, most samples are within the normal operating
region despite the onset of the fault at sample 288, indicating that PCA fails to detect the small internal disturbance. In contrast, samples affected by the internal
disturbance are easily detected in the ICs plot. At these
affected samples, the ICs escape from their confidence
bounds, indicating that the internal disturbance has
distorted the internal mutual relations between the
variables and thus that the process is not in the normal operation mode. The process does not return to the normal
operation mode within the remaining time of the test
period. Fig. 20 shows the contribution plot for the I2
measure at sample 320, from which we can conclude that the internal
variables 4 (SO,3) and 5 (SO,4) make the largest contribution to the I2 statistic. If it were necessary to avoid the
influence of a single large variable, we could construct the
contribution plots using the mean contribution over
some period.
Fig. 16. The form of the decrease in nitrification rate in the biological reactor (internal disturbance).
Fig. 17. PCA monitoring charts: T 2 and SPE plots spanning the region of deteriorating nitrification in the benchmark simulation.
Fig. 18. ICA monitoring charts: I 2 , Ie2 and SPE plots spanning the region of deteriorating nitrification in the benchmark simulation.
5. Conclusions
This paper proposes a new approach to process
monitoring that uses ICA to achieve multivariate statistical process control. The approach provides a new
statistic, the I2 statistic, to describe the state of the data.
This statistic is an alternative to Hotelling’s T 2 statistic
used in PCA. In addition, methods for the calculation of
the confidence limits, ordering and dimension reduction
of the independent components are described, along
with the use of contribution plots displaying the relative
contributions of the different variables. The proposed
strategy was shown to be able to detect and isolate the
effect of multivariate disturbances. ICA monitoring
Fig. 19. Plot of two independent component values for samples 270–370 for the system in which the nitrification rate is decreased from sample 288.
The normal operating region (95% and 99%) is indicated with dashed lines.
Fig. 20. Variables contributing to the deviation in I 2 at sample 320.
gives better results than the conventional
PCA method because ICA imposes statistical independence on the individual components beyond second-order statistics and has no orthogonality constraint.
The ICA monitoring method was applied to fault detection and diagnosis in both a simple multivariate
process and the simulation benchmark of the biological
WWTP, which is characterized by a variety of fault
sources with non-Gaussian characteristics. In both case
studies, the proposed method showed better monitoring
performance than PCA, especially when the latent
variables had non-Gaussian distributions.
A number of factors deserve special consideration
when using ICA for process monitoring. One is computational load. The computational requirements of
ICA seem large in comparison to PCA. However, once
the control limit of the normal operating data has been
determined, the on-line monitoring procedures of ICA
are simple because the demixing matrix (W) has already
been determined in the modeling procedure. Another
issue that should be considered in ICA monitoring is
related to the kernel density estimation. In the present
work we have assumed that the underlying distribution
of normal operating data does not change over time.
However, this underlying distribution can potentially
change. In fact, a small change in the distribution will
have a negligible effect on the control limits of I 2 , Ie2 and
SPE. When the underlying distribution changes substantially, the ICA model should be updated. Adaptive
ICA monitoring should be investigated in future research.
Acknowledgement
This work was supported by a grant (no. R01-2002000-00007-0) from Korea Science & Engineering
Foundation.
References
[1] J.-F. Cardoso, Blind signal separation: statistical principles, Proc.
IEEE 86 (10) (1998) 2009–2025.
[2] P. Comon, Independent component analysis, a new concept,
Signal Process. 36 (1994) 287–314.
[3] P. Nomikos, J.F. MacGregor, Monitoring batch processes using
multiway principal component analysis, AIChE J. 40 (8) (1994)
1361–1375.
[4] W. Ku, R.H. Storer, C. Georgakis, Disturbance detection and
isolation by dynamic principal component analysis, Chemom.
Intell. Lab. Syst. 30 (1995) 179–196.
[5] B.M. Wise, N.B. Gallagher, The process chemometrics approach
to process monitoring and fault detection, J. Process Control 6 (6)
(1996) 329–348.
[6] D. Dong, T.J. McAvoy, Nonlinear principal component analysis based on principal curves and neural networks, Comp. Chem.
Eng. 20 (1) (1996) 65–78.
[7] B.R. Bakshi, Multiscale PCA with application to multivariate
statistical process monitoring, AIChE J. 44 (7) (1998) 1596–1610.
[8] C. Rosen, G. Olsson, Disturbance detection in wastewater
treatment plants, Water Sci. Technol. 37 (12) (1998) 197–205.
[9] P. Teppola, Multivariate process monitoring of sequential process
data––a chemometric approach, Ph.D. thesis, Lappeenranta Univ.
of Tech., Finland, 1999.
[10] W. Li, H.H. Yue, S. Valle-Cervantes, S.J. Qin, Recursive PCA for adaptive
process monitoring, J. Process Control 10 (2000) 471–486.
[11] A. Hyvärinen, E. Oja, Independent component analysis: algorithms and applications, Neural Networks 13 (4–5) (2000) 411–430.
[12] T. Lee, Independent Component Analysis: Theory and Applications, Kluwer Academic Publishers, Boston, USA, 1998.
[13] J.E. Jackson, G.S. Mudholkar, Control procedures for residuals
associated with principal component analysis, Technometrics 21
(1979) 341–349.
[14] R.N. Vigário, Extraction of ocular artifacts from EEG using
independent component analysis, Electroencephal. Clinical Neurophysiol. 103 (1997) 395–404.
[15] T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical
Learning, Springer, New York, USA, 2001.
[16] A. Hyvärinen, New approximations of differential entropy for
independent component analysis and projection pursuit, Adv.
Neural Inform. Process. Syst. 10 (1998) 273–279.
[17] A. Hyvärinen, Fast and robust fixed-point algorithms for independent component analysis, IEEE Trans. Neural Networks 10
(1999) 626–634.
[18] A. Hyvärinen, Survey on independent component analysis,
Neural Comput. Surveys 2 (1999) 94–128.
[19] A. Hyvärinen, J. Karhunen, E. Oja, Independent Component
Analysis, John Wiley & Sons, Inc, New York, USA, 2001.
[20] R.F. Li, X.Z. Wang, Dimension reduction of process dynamic
trends using independent component analysis, Comp. Chem. Eng.
26 (2002) 467–473.
[21] Y.M. Cheung, L. Xu, An empirical method to select dominant
independent components in ICA time series analysis, Proc. Int.
Joint Conf. Neural Networks (1999) 3883–3887.
[22] A.D. Back, A.S. Weigend, A first application of independent
component analysis to extracting structure from stock returns,
Int. J. Neural Sys. 8 (4) (1997) 473–484.
[23] J.-F. Cardoso, A. Souloumiac, Blind beamforming for non-Gaussian signals, IEE Proc. F 140 (6) (1993) 362–370.
[24] Y.M. Cheung, L. Xu, Independent component ordering in ICA
time series analysis, Neurocomputing 41 (2001) 145–152.
[25] A. Simoglou, E.B. Martin, A.J. Morris, Statistical performance
monitoring of dynamic multivariate process using state space
modeling, Comp. Chem. Eng. 26 (6) (2002) 909–920.
[26] E.B. Martin, A.J. Morris, Non-parametric confidence bounds for
process performance monitoring charts, J. Process Control 6 (6)
(1996) 349–358.
[27] Q. Chen, R.J. Wynne, P. Goulding, D. Sandoz, The application of
principal component analysis and kernel density estimation to
enhance process monitoring, Control Eng. Pract. 8 (2000) 531–
543.
[28] B.W. Silverman, Density Estimation for Statistics and Data
Analysis, Chapman & Hall, UK, 1986.
[29] M.P. Wand, M.C. Jones, Kernel Smoothing, Chapman & Hall,
London, UK, 1995.
[30] J.F. MacGregor, C. Jaeckle, C. Kiparissides, M. Koutoudi,
Process monitoring and diagnosis by multiblock PLS methods,
AIChE J. 40 (5) (1994) 826–838.
[31] C. Rosen, Monitoring wastewater treatment systems, MS Thesis,
Lund Univ., Sweden, 1998.
[32] J.A. Westerhuis, S.P. Gurden, A.K. Smilde, Generalized contribution plots in multivariate statistical process monitoring, Chemom. Intell. Lab. Syst. 51 (2000) 95–114.
[33] P. Teppola, S.-P. Mujunen, P. Minkkinen, T. Puijola, P. Pursiheimo, Principal component analysis, contribution plots and feature
weights in the monitoring of sequential process data from a paper
machine’s wet end, Chemom. Intell. Lab. Syst. 44 (1998) 307–
317.
[34] C.K. Yoo, Process monitoring and control of biological wastewater treatment process, Ph.D. Thesis, POSTECH, Korea, 2002.
[35] M.N. Pons, H. Spanjers, U. Jeppsson, Towards a benchmark for
evaluating control strategies in wastewater treatment plants by
simulation, ESCAPE 9, Budapest, 1999.
[36] C.K. Yoo, S.W. Choi, I.-B. Lee, Dynamic monitoring method for
multiscale fault detection and diagnosis in MSPC, Ind. Eng.
Chem. Res. 41 (2002) 4303–4317.