
Statistical process monitoring with independent component analysis

2004, Journal of Process Control

In this paper we propose a new statistical method for process monitoring that uses independent component analysis (ICA). ICA is a recently developed method in which the goal is to decompose observed data into linear combinations of statistically independent components [1,2]. Such a representation has been shown to capture the essential structure of the data in many applications, including signal separation and feature extraction. The basic idea of our approach is to use ICA to extract the essential independent components that drive a process and to combine them with process monitoring techniques. I^2, Ie^2 and SPE charts are proposed as on-line monitoring charts, and contribution plots of these statistical quantities are also considered for fault identification. The proposed monitoring method was applied to fault detection and identification in both a simple multivariate process and the simulation benchmark of the biological wastewater treatment process, which is characterized by a variety of fault sources with non-Gaussian characteristics. The simulation results clearly show the power and advantages of ICA monitoring in comparison to PCA monitoring.

Journal of Process Control 14 (2004) 467–485
www.elsevier.com/locate/jprocont

Statistical process monitoring with independent component analysis

Jong-Min Lee (a), ChangKyoo Yoo (a,b), In-Beum Lee (a,*)

(a) Department of Chemical Engineering, Pohang University of Science and Technology, San 31 Hyoja-Dong, Pohang 790-784, South Korea
(b) BIOMATH, Ghent University, Coupure Links 653, B-9000 Gent, Belgium

Received 17 January 2003; received in revised form 5 August 2003; accepted 9 September 2003

© 2003 Elsevier Ltd. All rights reserved.

Keywords: Process monitoring; Fault detection; Independent component analysis; Kernel density estimation; Wastewater treatment process
1. Introduction

Monitoring and diagnosis are gaining importance in process systems engineering due to the increased number of variables measured in chemical and biological plants and improvements in the controllability of these variables. An important aspect of the safe operation of chemical processes is the rapid detection of faults, process upsets, or other special events, and the location and removal of the factors causing such events. However, hundreds of variables may be monitored in a single operating unit, and these variables may be recorded hundreds or thousands of times per day. In the absence of an appropriate processing method, only limited information can be extracted from these data. Hence, a tool is required that can project the high-dimensional process space into a low-dimensional space amenable to direct visualization, and that can also identify key variables and important features of the data. The need to analyze high-dimensional and correlated process data has led to the development of many monitoring schemes that use multivariate statistical methods based on principal component analysis (PCA) and partial least squares. These methods have been used and extended in various applications [3–10]. It is well known that many of the variables monitored in process systems are not independent. The measured process variables may be combinations of independent variables that are not directly measurable (referred to as latent variables in multivariate analysis). Independent component analysis (ICA) can extract these underlying factors or components from multivariate statistical data.

* Corresponding author. Tel.: +82-54-279-2274; fax: +82-54-279-3499. E-mail address: [email protected] (I.-B. Lee).
doi:10.1016/j.jprocont.2003.09.004
ICA defines a generative model for the observed multivariate data, which are typically given as a large database of samples. In this model, the data variables are assumed to be linear mixtures of some unknown latent variables, and the mixing matrix of coefficients is also unknown. The latent variables, which are called the independent components (ICs) of the observed data, are assumed to be non-Gaussian and mutually independent. ICA seeks to extract these independent components as well as the mixing matrix of coefficients [11]. Although ICA can be looked upon as a useful extension of PCA, its objective differs from that of PCA. PCA is a dimensionality reduction technique that reduces the data dimension by projecting the correlated variables onto a smaller set of new variables that are uncorrelated and retain most of the original variance. However, its objective is only to decorrelate the variables, not to make them independent. PCA can only impose independence up to second-order statistical information (mean and variance) while constraining the direction vectors to be orthogonal, whereas ICA has no orthogonality constraint and involves higher-order statistics; i.e., it not only decorrelates the data (second-order statistics) but also reduces higher-order statistical dependencies [12]. Hence, ICs reveal more useful information from observed data than principal components (PCs).

1.1. Motivating examples

In this paper, scalars are written in italic lower case, vectors are written in bold lower case and matrices are written in bold capitals. To illustrate the superiority of ICA over PCA, we applied the two types of analysis to a simple example system, similar to that used by Hyvärinen and Oja [11] and Lee [12], except that a modified mixing matrix was used in the present work. Let us consider two source variables that have the uniform distributions shown in Fig. 1(a).
The source variables are statistically independent, i.e., the values of one source variable do not convey any information about the other source variable. These sources are linearly mixed as follows:

x = As,  [x1; x2] = [1 3; 1 2][s1; s2].   (1)

Fig. 1(b) shows the scatter plot of the mixtures. Note that the random variables x1 and x2 are not independent, because it is possible to predict the value of one of them from the value of the other. When PCA is applied to these mixed variables, it gives two principal components. The axes of the first and second PCs (PC1, PC2) are shown in Fig. 1(b). The first PC is the axis capturing the highest variance in the data and the second PC is the axis orthogonal to the first PC. Fig. 1(c) shows the PCA solution, which differs from the original because the two principal axes are still dependent. However, the ICA solution shown in Fig. 1(d) can recover the original sources, since ICA not only decorrelates the data but also rotates it such that the axes of u1 and u2 are parallel to the axes of s1 and s2 [12]. The axes of the first and second independent components (IC1, IC2) are shown in Fig. 1(b). This simple example clearly demonstrates that if the latent variables follow a non-Gaussian distribution, the ICA solution extracts the original source signals to a much greater extent than the PCA solution. Therefore, it is natural to infer that monitoring based on the ICA solution may give better results than monitoring based on PCA. To date, little literature exists on the application of ICA techniques to the problem of process monitoring. In the present article, a continuous process monitoring method based on ICA is suggested. The basic idea of this approach is to extract essential independent components that drive a process and to combine them with process monitoring techniques.
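The distinction drawn above — decorrelation alone cannot undo the rotation that separates mixed non-Gaussian sources — can be checked numerically. The numpy sketch below is illustrative only: the mixing-matrix entries are an assumption, and a fourth-order cross-moment is used as a simple witness of the dependence that survives PCA-style whitening.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent, uniformly distributed (non-Gaussian) sources.
s = rng.uniform(-1.0, 1.0, size=(2, 5000))

# Illustrative 2x2 mixing matrix (an assumption, standing in for the
# modified mixing matrix used in the paper's example).
A = np.array([[1.0, 3.0],
              [1.0, 2.0]])
x = A @ s                                  # observed mixtures, shape (2, n)

# PCA-style whitening: decorrelate and scale to unit variance.
lam, U = np.linalg.eigh(np.cov(x))
Q = np.diag(lam ** -0.5) @ U.T
z = Q @ x

# The whitened signals are uncorrelated (identity covariance) ...
assert np.allclose(np.cov(z), np.eye(2), atol=1e-2)

# ... but not independent: for independent unit-variance signals this
# fourth-order cross-moment would be ~1; the residual dependence is
# exactly what ICA's extra rotation removes.
cross = np.mean(z[0] ** 2 * z[1] ** 2) - 1.0
print(abs(cross) > 0.05)
```

The nonzero cross-moment shows that second-order decorrelation leaves higher-order structure behind, which is the basis for preferring ICA when the sources are non-Gaussian.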
The remainder of this article is organized as follows. Conventional PCA monitoring is introduced in the next section, followed by a brief introduction to the ICA algorithm. The monitoring statistics of ICA are then suggested, and an explanation is given of the kernel density estimation used to calculate the confidence limits for non-Gaussian data. The superiority of process monitoring using ICA is illustrated by applying the proposed method both to a simple multivariate process example and to the wastewater simulation benchmark. Finally, conclusions are given.

Fig. 1. (a) Scatter plot of the original source data; (b) the mixtures and the axes of PCA and ICA; (c) the recovered source data using PCA; (d) the recovered source data using ICA [11,12].

2. PCA monitoring

PCA can handle high-dimensional, noisy, and correlated data by projecting the data onto a lower-dimensional subspace which contains most of the variance of the original data [5]. PCA decomposes the data matrix Xp ∈ R^{n×d} (where n is the number of samples and d is the number of variables) as the sum of outer products of vectors t_i and p_i plus a residual matrix Ep:

Xp = T P^T + Ep = Σ_{i=1}^{a} t_i p_i^T + Ep,   (2)

where t_i is a score vector, which contains information about the relationships between samples, and p_i is a loading vector, which contains information about the relationships between variables. Note that the score vectors are orthogonal and the loading vectors are orthonormal. Projection into the principal component space reduces the original set of variables to a latent variables (LVs). The portion of the measurement space corresponding to the lowest d − a singular values can be monitored by using the squared prediction error (SPE), also called the Q statistic [13].
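The PCA decomposition of Eq. (2) can be reproduced with a singular value decomposition. The following numpy sketch uses simulated data (all dimensions and values are illustrative) and checks the orthogonality properties of the scores and loadings, as well as the equivalence of the residual-based and projection-based forms of the squared prediction error.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated normal-operation data: n samples of d correlated variables
# driven by a latent factors plus noise (illustrative values).
n, d, a = 200, 5, 2
latent = rng.normal(size=(n, a))
Xp = latent @ rng.normal(size=(a, d)) + 0.05 * rng.normal(size=(n, d))
Xp -= Xp.mean(axis=0)                      # PCA assumes centred data

# SVD gives the decomposition Xp = T P^T + Ep of Eq. (2).
U, svals, Vt = np.linalg.svd(Xp, full_matrices=False)
T = (U * svals)[:, :a]                     # score vectors (orthogonal)
P = Vt.T[:, :a]                            # loading vectors (orthonormal)
Ep = Xp - T @ P.T                          # residual matrix

# Loading vectors are orthonormal; score vectors are orthogonal.
assert np.allclose(P.T @ P, np.eye(a), atol=1e-8)
gram = T.T @ T
assert np.allclose(gram - np.diag(np.diag(gram)), 0.0, atol=1e-6)

# SPE of sample k: squared residual norm, equal to the projection form
# x(k)^T (I - P P^T) x(k) because I - P P^T is a projector.
k = 0
spe = Ep[k] @ Ep[k]
spe_direct = Xp[k] @ (np.eye(d) - P @ P.T) @ Xp[k]
assert np.isclose(spe, spe_direct)
```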
The SPE is defined as the sum of squares of each row (sample) of Ep; for example, for the kth sample vector in Xp, x(k) ∈ R^d:

SPE(k) = e(k)^T e(k) = x(k)^T (I − P_a P_a^T) x(k),   (3)

where e(k) is the kth sample vector of Ep, P_a is the matrix of the first a loading vectors retained in the PCA model, and I is the identity matrix. The upper confidence limit for the SPE can be computed from its approximate distribution

SPE_α = θ1 [ c_α sqrt(2 θ2 h0^2)/θ1 + 1 + θ2 h0 (h0 − 1)/θ1^2 ]^{1/h0},   (4)

where c_α is the standard normal deviate corresponding to the upper (1 − α) percentile, λ_j is the eigenvalue associated with the jth loading vector, θ_i = Σ_{j=a+1}^{d} λ_j^i for i = 1, 2, 3, and h0 = 1 − 2θ1θ3/(3θ2^2).

A measure of the variation within the PCA model is given by Hotelling's T^2 statistic. T^2 at sample k is the sum of the normalized squared scores, and is defined as

T^2(k) = t(k)^T Λ^{-1} t(k),   (5)

where Λ^{-1} is the diagonal matrix of the inverses of the eigenvalues associated with the retained principal components. The upper confidence limit for T^2 is obtained using the F-distribution:

T^2_{a,n,α} = [a(n − 1)/(n − a)] F_{a,n−a,α},   (6)

where n is the number of samples in the data and a is the number of principal components.

3. ICA monitoring

3.1. Independent component analysis

Independent component analysis (ICA) is a statistical and computational technique for revealing hidden factors that underlie sets of random variables, measurements, or signals. ICA was originally proposed to solve the blind source separation problem, which involves recovering independent source signals (e.g., different voice, music, or noise sources) after they have been linearly mixed by an unknown matrix, A [14].

The following ICA algorithm is based on the formalism presented in the survey article of Hyvärinen and Oja [11]. In the ICA algorithm, it is assumed that d measured variables x1, x2, ..., xd can be expressed as linear combinations of m (≤ d) unknown independent components s1, s2, ..., sm. The independent components and the measured variables have means of zero. The relationship between them is given by

X = AS + E,   (7)

where X = [x(1), x(2), ..., x(n)] ∈ R^{d×n} is the data matrix (in contrast to PCA, ICA employs the transposed data matrix), A = [a1, ..., am] ∈ R^{d×m} is the unknown mixing matrix, S = [s(1), s(2), ..., s(n)] ∈ R^{m×n} is the independent component matrix, E ∈ R^{d×n} is the residual matrix, and n is the number of samples. Here, we assume d ≥ m (when d = m, the residual matrix E becomes the zero matrix). The basic problem of ICA is to estimate both the mixing matrix A and the independent components S from only the observed data X. Alternatively, we could define the objective of ICA as follows: to find a demixing matrix W such that the rows of the reconstructed matrix Ŝ, given as

Ŝ = WX,   (8)

become as independent of each other as possible. This formulation is not really different from the previous one, since after estimating A, its inverse gives W when d equals m. From now on, we assume d equals m unless otherwise specified. For mathematical convenience, we define the independent components to have unit variance. This makes the independent components unique, up to their signs [11].

The initial step in ICA is whitening, also known as sphering, which eliminates all the cross-correlation between random variables. Consider a d-dimensional random vector x(k) at sample k with covariance R_x = E(x(k) x^T(k)), where E represents expectation. The eigendecomposition of R_x is given by

R_x = U Λ U^T.   (9)

The whitening transformation is expressed as

z(k) = Q x(k),   (10)

where Q = Λ^{-1/2} U^T. One can easily verify that R_z = E(z(k) z^T(k)) is the identity matrix under this transformation.
After the transformation we have

z(k) = Q x(k) = Q A s(k) = B s(k),   (11)

where B is an orthogonal matrix, as verified by the following relation:

E{z(k) z^T(k)} = B E{s(k) s^T(k)} B^T = B B^T = I.   (12)

We have therefore reduced the problem of finding an arbitrary full-rank matrix A to the simpler problem of finding an orthogonal matrix B, since B has fewer parameters to estimate as a result of the orthogonality constraint. Then, from Eq. (11), we can estimate s(k) as follows:

ŝ(k) = B^T z(k) = B^T Q x(k).   (13)

From Eqs. (8) and (13), the relation between W and B can be expressed as

W = B^T Q.   (14)

To calculate B, each column vector b_i is initialized and then updated so that the ith independent component ŝ_i = b_i^T z has maximal non-Gaussianity. Using the central limit theorem, Hyvärinen and Oja [11] showed that non-Gaussianity represents independence. There are two common measures of non-Gaussianity: kurtosis and negentropy. Kurtosis is sensitive to outliers. Negentropy, on the other hand, is based on the information-theoretic quantity of (differential) entropy. Entropy is a measure of the average uncertainty in a random variable, and the differential entropy H of a random variable y with density f(y) is defined as

H(y) = −∫ f(y) log f(y) dy.   (15)

A Gaussian variable has maximum entropy among all random variables of equal variance [11]. In order to obtain a measure of non-Gaussianity that is zero for a Gaussian variable, the negentropy J is defined as follows:

J(y) = H(y_gauss) − H(y),   (16)

where y_gauss is a Gaussian random variable with the same variance as y. Negentropy is nonnegative and measures the departure of y from Gaussianity [15]. However, estimating negentropy using Eq. (16) would require an estimate of the probability density function.
To estimate negentropy efficiently, Hyvärinen and Oja [11] suggested simpler approximations of negentropy as follows:

J(y) ≈ [E{G(y)} − E{G(v)}]^2,   (17)

where y is assumed to be of zero mean and unit variance, v is a Gaussian variable of zero mean and unit variance, and G is any non-quadratic function. By choosing G wisely, one obtains good approximations of negentropy. Hyvärinen and Oja [11] suggested a number of functions for G:

G1(u) = (1/a1) log cosh(a1 u),   (18)

G2(u) = −exp(−a2 u^2/2),   (19)

G3(u) = u^4,   (20)

where 1 ≤ a1 ≤ 2 and a2 ≈ 1. Among these three functions, G1 is a good general-purpose contrast function and was therefore selected for use in the present study. The non-quadratic function G is described in detail in the paper of Hyvärinen [16].

Based on this approximate form of the negentropy, Hyvärinen [17] introduced a very simple and highly efficient fixed-point algorithm for ICA, calculated over sphered zero-mean vectors z. This algorithm calculates one column of the matrix B and allows the identification of one independent component; the corresponding IC can then be found using Eq. (13). The algorithm is repeated to calculate each independent component, as follows:

1. Choose m, the number of ICs to estimate. Set counter i ← 1.
2. Take a random initial vector b_i of unit norm.
3. Let b_i ← E{z g(b_i^T z)} − E{g'(b_i^T z)} b_i, where g is the first derivative and g' is the second derivative of G, and G takes the form of Eq. (18), (19) or (20).
4. Do the following orthogonalization: b_i ← b_i − Σ_{j=1}^{i−1} (b_i^T b_j) b_j.
5. Normalize b_i ← b_i / ||b_i||.
6. If b_i has not converged, go back to step 3.
7. If b_i has converged, output the vector b_i. Then, if i ≤ m, set i ← i + 1 and go back to step 2.

Note that the final vector b_i (i = 1, ..., m) given by the algorithm equals one of the columns of the (orthogonal) mixing matrix B. After calculating B, we can obtain ŝ(k) and the demixing matrix W from Eqs. (13) and (14), respectively.
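The fixed-point iteration above can be sketched in a few lines of numpy. This is a minimal deflation-based implementation using the G1 (log cosh) contrast, not the authors' code; the mixing matrix and all data below are illustrative assumptions.

```python
import numpy as np

def fastica_deflation(x, m, a1=1.0, max_iter=200, tol=1e-6, seed=0):
    """Deflation-based FastICA sketch (steps 1-7 above), G = log cosh."""
    rng = np.random.default_rng(seed)
    # Whitening (sphering): z = Q x has identity covariance.
    lam, U = np.linalg.eigh(np.cov(x))
    Q = np.diag(lam ** -0.5) @ U.T
    z = Q @ x
    d = z.shape[0]
    B = np.zeros((d, m))
    for i in range(m):
        b = rng.normal(size=d)
        b /= np.linalg.norm(b)
        for _ in range(max_iter):
            wz = b @ z
            g = np.tanh(a1 * wz)                 # g  = G1'
            g_prime = a1 * (1.0 - g ** 2)        # g' = G1''
            b_new = (z * g).mean(axis=1) - g_prime.mean() * b
            # Deflation: orthogonalize against already-found columns.
            b_new -= B[:, :i] @ (B[:, :i].T @ b_new)
            b_new /= np.linalg.norm(b_new)
            converged = abs(abs(b_new @ b) - 1.0) < tol  # up to sign
            b = b_new
            if converged:
                break
        B[:, i] = b
    W = B.T @ Q                                  # demixing matrix, Eq. (14)
    return W, B, Q

# Recover two uniform (non-Gaussian, unit-variance) sources.
rng = np.random.default_rng(3)
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 10000))
A = np.array([[1.0, 3.0], [1.0, 2.0]])           # illustrative mixing matrix
W, B, Q = fastica_deflation(A @ s, m=2)

# W A should be close to a signed permutation: one entry of magnitude
# ~1 per row, the rest ~0 (sources recovered up to order and sign).
mix_abs = np.abs(W @ A)
assert np.all(np.sort(mix_abs, axis=1)[:, -1] > 0.9)
assert np.all(np.sort(mix_abs, axis=1)[:, :-1] < 0.15)
```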
For more details on the FastICA algorithm, see Hyvärinen and Oja [11], Hyvärinen [17,18], Hyvärinen et al. [19], and Li and Wang [20].

3.2. Ordering and dimension reduction of ICA

In chemical and biological processes, the measured variables are quantitative (e.g., temperature, pressure, and flow rate) and qualitative (e.g., key component concentration). Dimension reduction in ICA is based on the idea that these measured variables are a mixture of some independent variables [21]. An important part of ICA monitoring is the selection of a small number of dominant components from the list of all independent components. This procedure has at least two advantages [20]:

1. Robust performance: The dominant components reveal the majority of information about the stochastic mechanism that gives rise to the observed series. A model built on these components will perform robustly in ICA monitoring, without being distracted by trivial details.
2. Reduction of analysis complexity: Gaining a good understanding of the mechanism behind ICA monitoring sometimes entails interpreting the physical meaning of the independent components, which is a nontrivial task. Concentrating on the dominant components facilitates this analysis.

One approach to choosing the dominant components is to separate the selection process into two steps:

Step 1. List all the independent components in the appropriate order.
Step 2. Select the first few components in the list as the dominant ones.

In PCA, the order of the score vectors is determined by their variance. Therefore, the data dimension can be reduced by selecting dominant score vectors. However, the ordering of components is very difficult in ICA and there is no standard criterion. A number of methods have been suggested to determine the component order. For example, the components can be sorted according to their non-Gaussianity [18].
Alternatively, Back and Weigend [22] decided the component order according to the L1 norm of each individual component. Cardoso and Souloumiac [23] used a Euclidean norm to sort the rows of the demixing matrix W according to their contribution across all signals. Other criteria, such as the variance or data reconstruction criterion, have also been suggested to decide the order of independent components [21,24]. However, these methods are based only on the mean squared error (MSE) data reconstruction criterion, and the computation becomes prohibitively complex when the number of variables is large. In the present study we used a Euclidean norm (L2) to sort the rows of the demixing matrix W, because this method is very simple and gives good results in ICA monitoring. Hence, the order of the ICs is decided by the L2 norm of each w_i, the ith row of W [23]: arg max_i ||w_i||_2. That is, the ICs are sorted using an L2 norm in order to show only those ICs that cause dominant changes in the process.

After the ordering of the ICs, it is important to select the optimal number of ICs in order to achieve good monitoring and prediction; selecting too many ICs will cause a magnification of noise and poor process monitoring performance. The data dimension can be reduced by selecting a few rows of W, based upon the assumption that the rows with the largest sum of squared coefficients have the greatest effect on the variation of S. This approach is based on the idea that the dominant process variation can be monitored by considering the cumulative sums of only the first few dominant ICs [24]. We used a graphical technique to determine the number of ICs, similar to the SCREE test of PCA. Fig. 2 gives a representative plot of the percentage of the L2 norm of each row of the sorted demixing matrix W against the IC number. The sorted demixing matrix is obtained from the normal operating data of the wastewater treatment process (WWTP, Section 4.2).
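The L2-norm ordering of the rows of W can be sketched as follows; the matrix entries below are illustrative stand-ins for a demixing matrix obtained from FastICA on normal operating data.

```python
import numpy as np

# Illustrative demixing matrix W (rows correspond to candidate ICs).
W = np.array([[0.2, 4.0, 0.1],
              [3.0, 0.5, 2.0],
              [0.1, 0.2, 0.3]])

# Sort rows by L2 norm, largest first, and report each row's share of
# the total norm (the quantity plotted in the SCREE-like test of Fig. 2).
norms = np.linalg.norm(W, axis=1)
order = np.argsort(norms)[::-1]
W_sorted = W[order]
pct = 100.0 * norms[order] / norms.sum()

# Keep the first a rows as the dominant part Wd; the rest form We.
a = 2
Wd, We = W_sorted[:a], W_sorted[a:]
print(order, np.round(pct, 1))
```

A sharp drop in the percentage norm after the first few rows suggests the break point at which to truncate, mirroring the visual criterion applied to Fig. 2.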
Note that the L2 norms of the last four ICs are much smaller than the rest, indicating a break of some kind between the first three ICs and the remaining four. The model constructed from Fig. 2 would therefore include three ICs.

Fig. 2. Plot of percent L2 norm of each row of W against IC number.

3.3. Process monitoring statistics with ICA

On-line monitoring of measurement variables is carried out with the aim of continuously analyzing and interpreting the measurements in order to detect and isolate disturbances and faults. The implementations of the monitoring statistics of ICA are similar to those of the monitoring statistics of PCA. The ICA model is based on historical data collected during normal operation, i.e., when product is being manufactured and only common-cause variation is present. Future process behavior is then compared against this 'normal' or 'in-control' representation.

From the normal operating data, designated X_normal, W as well as Ŝ_normal are obtained from the FastICA algorithm (Ŝ_normal = W X_normal), under the assumption that the number of variables is equal to the number of independent components. The matrices B, Q, and A used in Eq. (11) are also obtained by whitening and the FastICA algorithm. As mentioned in the previous section, the data dimension can be reduced by selecting a few rows of W, based upon the assumption that the rows with the largest sum of squared coefficients have the greatest effect on the variation of Ŝ. The selected a rows of W constitute a reduced matrix W_d (the dominant part of W), and the remaining rows of W constitute a reduced matrix W_e (the excluded part of W). We can construct a reduced matrix B_d by selecting the columns of B whose indices correspond to the indices of the rows selected from W. B_d can also be computed directly using Eq.
(14), i.e., B_d = (W_d Q^{-1})^T. The remaining columns of B constitute the matrix B_e. Then, new independent data vectors ŝ_new,d(k) and ŝ_new,e(k) can be obtained by transforming the new data for sample k, x_new(k), through the demixing matrices W_d and W_e, i.e., ŝ_new,d(k) = W_d x_new(k) and ŝ_new,e(k) = W_e x_new(k), respectively.

In PCA, two types of statistics are calculated from the process model in normal operation: the D-statistic for the systematic part of the process variation and the Q-statistic for the residual part of the process variation. Similarly, these statistics can be applied to ICA monitoring. The D-statistic for sample k, also known as the I^2 statistic, is the sum of the squared independent scores and is defined as follows:

I^2(k) = ŝ_new,d(k)^T ŝ_new,d(k).   (21)

The Q-statistic for the nonsystematic part of the common-cause variation of new data, also known as the SPE statistic, can be visualized in a chart with confidence limits. The SPE statistic at sample k is defined as follows:

SPE(k) = e(k)^T e(k) = (x(k) − x̂(k))^T (x(k) − x̂(k)),   (22)

where x̂(k) can be calculated as follows:

x̂(k) = Q^{-1} B_d ŝ(k) = Q^{-1} B_d W_d x(k).   (23)

Simoglou et al. [25] proposed a second T^2 statistic for monitoring the state of the system, which is based on the excluded canonical variate analysis (CVA) states. Taking a similar approach, we propose a second I^2 statistic (Ie^2) based on the d − a excluded independent components ŝ_new,e(k). Monitoring the non-systematic part of the measurements provides an additional fault detection tool, which can detect special events entering the system. The Ie^2 statistic has the further advantage that it can compensate for the error that results when an incorrect number of ICs is selected for the dominant part. The use of the I^2 and Ie^2 statistics allows the entire space spanned by the original variables to be monitored through a new basis.
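A minimal sketch of the three monitoring statistics follows, assuming Q, B and the dominant/excluded partition have already been obtained as described above. All matrices here are randomly generated stand-ins, not a fitted model.

```python
import numpy as np

rng = np.random.default_rng(4)
d, a = 4, 2     # d variables, a dominant ICs (illustrative)

# Stand-ins for the whitening matrix Q and the orthogonal matrix B
# (in practice both come from whitening + FastICA on normal data).
Q = np.diag(rng.uniform(0.5, 2.0, d)) @ np.linalg.qr(rng.normal(size=(d, d)))[0]
B, _ = np.linalg.qr(rng.normal(size=(d, d)))
W = B.T @ Q                                  # Eq. (14)

# Partition into dominant and excluded parts (first a rows assumed
# already sorted by L2 norm).
Wd, We = W[:a], W[a:]
Bd, Be = B[:, :a], B[:, a:]

def monitoring_stats(x_new):
    """I^2, Ie^2 and SPE for one new sample, Eqs. (21)-(23)."""
    s_d = Wd @ x_new                         # dominant IC scores
    s_e = We @ x_new                         # excluded IC scores
    x_hat = np.linalg.inv(Q) @ Bd @ s_d      # reconstruction, Eq. (23)
    e = x_new - x_hat
    return s_d @ s_d, s_e @ s_e, e @ e

x_new = rng.normal(size=d)
I2, Ie2, SPE = monitoring_stats(x_new)

# Sanity check: with all d ICs retained the reconstruction is exact,
# since Q^{-1} B B^T Q = I.
assert np.allclose(np.linalg.inv(Q) @ B @ (W @ x_new), x_new)
print(I2 >= 0 and Ie2 >= 0 and SPE >= 0)
```

In an on-line setting each incoming sample would be scaled, passed through `monitoring_stats`, and the three values compared against their kernel-density control limits.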
The Ie^2 statistic is defined as follows:

Ie^2(k) = ŝ_new,e(k)^T ŝ_new,e(k).   (24)

When a data-driven process monitoring technique is executed, we assume that normal operating data conform to some fixed distribution. Hence, we need to specify the distribution of normal operating data and find its control limit before monitoring new data on-line. In PCA monitoring, the confidence limit is based on a specified distribution, such as those shown in Eqs. (4) and (6), under the assumption that the latent variables follow a Gaussian distribution. In ICA monitoring, however, the independent components over some period do not conform to a multivariate Gaussian distribution; hence, the confidence limits of the I^2, Ie^2 and SPE statistics cannot be determined directly from a particular approximate distribution, and an alternative method is needed. The confidence limits of these three statistics can be obtained by kernel density estimation, which is explained in the next section.

3.4. Confidence bounds

Once a model has been developed that reflects the normal operation region, it is necessary to detect any departure of the d-dimensional process from its standard behavior. That is, we must calculate the limit value that determines whether the process is in control or not. In PCA monitoring, Hotelling's T^2 analysis and the SPE charts are effective tools for extracting the critical features of the data. These analyses are based on the assumption that the probability density functions of the latent variables follow a multivariate Gaussian distribution. However, contrary to this assumption, Martin and Morris [26] reported, on the basis of tests for multivariate normality on the scores, that the latent variables in many industrial processes rarely have a multivariate Gaussian distribution. Hence, the use of Hotelling's T^2 analysis and the SPE chart may be inaccurate and misleading [26].
An alternative approach to defining the nominal operating regions is to use data-driven techniques such as non-parametric empirical density estimates based on kernels [26,27]. Note that the latent variables in ICA monitoring do not follow a Gaussian distribution; hence, the confidence limits of the I^2 and SPE statistics cannot be determined directly from a particular approximate distribution. We therefore use kernel density estimation to calculate the confidence limits of the I^2, Ie^2 and SPE statistics in ICA monitoring. A univariate kernel estimator with kernel K is defined by

f̂(x) = (1/(n h)) Σ_{i=1}^{n} K((x − x_i)/h),   (25)

where x is the data point under consideration, x_i is an observation value from the data set, h is the window width (also known as the smoothing parameter), n is the number of observations, and K is the kernel function. The kernel estimator is therefore a sum of 'bumps' located at the observations. The kernel function K determines the shape of the bumps and satisfies the condition

∫_{−∞}^{∞} K(x) dx = 1.   (26)

There are a number of possible kernel functions. In practice, the form of the kernel function is not very important, and the Gaussian kernel function is the most commonly used [28]. The Gaussian kernel is also employed in the present study. In ICA monitoring, two-dimensional plots of scores in the IC plane are used to define the nominal operating condition. In this case the kernel estimator is defined by

f̂(x, y) = (1/(n h1 h2)) Σ_{i=1}^{n} K((x − x_i)/h1, (y − y_i)/h2),   (27)

where h1 and h2 are the smoothing parameters, x and y are coordinates in the plane formed from two independent components, and x_i and y_i are the independent component coordinates in the normal operating condition. Many measures have been proposed for the estimation of h, the window width or smoothing parameter. The problem of choosing how much to smooth is of crucial importance in density estimation.
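The univariate estimator of Eq. (25) with a Gaussian kernel, together with a control limit taken at the point below which 99% of the estimated density's area lies, can be sketched as follows. The window width h is fixed by hand here (the paper selects it by LSCV), and the chi-square-like sample standing in for normal-operation I^2 values is illustrative.

```python
import numpy as np

def kde_control_limit(values, h, alpha=0.99, grid_size=2000):
    """Control limit from a univariate Gaussian-kernel density estimate:
    the point below which a fraction `alpha` of the density's area lies."""
    values = np.asarray(values, dtype=float)
    grid = np.linspace(values.min() - 3 * h, values.max() + 3 * h, grid_size)
    # f_hat(x) = (1/nh) * sum_i K((x - x_i)/h) with a Gaussian kernel K.
    u = (grid[:, None] - values[None, :]) / h
    f_hat = np.exp(-0.5 * u ** 2).sum(axis=1) / (
        len(values) * h * np.sqrt(2.0 * np.pi))
    cdf = np.cumsum(f_hat)                   # numerical CDF on the grid
    cdf /= cdf[-1]
    return float(grid[np.searchsorted(cdf, alpha)])

# Illustrative "normal operation" I^2 values: chi-square-like, hence
# skewed and non-Gaussian, which is the case KDE is meant to handle.
rng = np.random.default_rng(5)
i2_normal = rng.chisquare(df=3, size=2000)
limit = kde_control_limit(i2_normal, h=0.4)
print(round(limit, 2))
```

New samples whose statistic exceeds `limit` would be flagged as out of control; the same routine applies unchanged to Ie^2 and SPE values.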
If h is too large we 'oversmooth', erasing detail. If h is too small we 'undersmooth', and fail to filter out spurious detail. Several methods exist that automatically choose an optimal (in some sense) value of h, although a subjective choice is often equally valid. One simplistic method for automatically choosing h is to assume some underlying distribution, for example the standard normal density, and then estimate a smoothing parameter based upon this assumption. However, if this assumption is not valid, 'oversmoothing' often results [29]. The more advanced methods for selecting h are based on cross-validation, for example least squares cross-validation (LSCV) and biased cross-validation [28]. Here, we use LSCV to select h. For more details regarding kernel density estimation, refer to the books of Silverman [28] and Wand and Jones [29].

The control limits used in the ICA monitoring charts can be obtained using kernel density estimation as follows. First, the I^2, Ie^2 or SPE values from normal operating data are collected. Then, the univariate kernel density estimator is used to estimate the density function of these normal I^2, Ie^2 or SPE values. The point below which 99% of the area of the density function lies is taken as the control limit for normal operating data. One major advantage of the confidence region obtained using kernel density estimation is that it follows the data more closely, and is less likely to incorporate regions of unknown operation, than confidence regions obtained on the basis of Hotelling's T^2 statistics [26].

3.5. Contribution plots

In the previous section it was stated that process faults are detected by computing three multivariate control charts. However, the monitoring charts do not identify a particular fault; they simply indicate the presence of a variation in the process that is not included within the common-cause variations captured in the normal operating condition (NOC) data.
Such anomalous variations usually correlate with a problem in the process. The monitoring charts give no information on what is wrong with the process, or on which process variables caused the process to go out of control. Once a fault is detected by the statistical monitoring method, the key approach to fault isolation using the ICA model is the use of contribution plots. By interrogating the underlying process model at the point where an event has been detected, contribution plots may reveal the group of process variables that most influence the model or the residuals [30–33].

Let us now consider the T^2 and I^2 statistics of PCA and ICA. Large score values result in large values of T^2 and I^2, which are detected, and the corresponding object is isolated. However, variable loadings do not provide a way to identify the variables that contribute to large T^2 and I^2 values. It is important to understand that the loadings indicate how the variables are correlated in the model given by a set of calibration objects. On the other hand, an individual object can deviate significantly from the bulk of the calibration objects. If there are many variables, say tens of variables, it is not easy to identify those that contributed to the large T^2 and I^2 values. Therefore, it is necessary to introduce the concept of variable contribution. In PCA, the variable contributions to the T^2 value of an object k are computed using the following equation [33]:

Variable contribution for object k (PCA) = t(k) Λ^{-1/2} P^T = x(k) P Λ^{-1/2} P^T,   (28)

where Λ is a diagonal matrix whose diagonal elements are the eigenvalues.
Fig. 3. Process monitoring scheme of the proposed ICA method. (The scheme comprises a training procedure, in which the data are scaled and whitened, the ICA model is obtained from normal operating data, and the confidence limits of the I², Ie² and SPE charts are developed using kernel density estimation, followed by an on-line monitoring procedure, in which the I², Ie² and SPE values of each new sample are compared with their confidence limits; a violation indicates a fault in the deterministic part of the ICA model, in the excluded part, or in the residual space, respectively, which is then diagnosed by comparing variable contributions and reported to the operator.)

In ICA, the variable contributions of x(k) to I²(k) and Ie²(k) can be obtained using the following equations, respectively:

$$x_{cd}(k) = \frac{Q^{-1}B_{d}\,\hat{s}_{\mathrm{new}d}(k)}{\left\lVert Q^{-1}B_{d}\,\hat{s}_{\mathrm{new}d}(k)\right\rVert}\,\left\lVert \hat{s}_{\mathrm{new}d}(k)\right\rVert, \qquad (29)$$

$$x_{ce}(k) = \frac{Q^{-1}B_{e}\,\hat{s}_{\mathrm{new}e}(k)}{\left\lVert Q^{-1}B_{e}\,\hat{s}_{\mathrm{new}e}(k)\right\rVert}\,\left\lVert \hat{s}_{\mathrm{new}e}(k)\right\rVert. \qquad (30)$$

Eqs. (29) and (30) are constructed so that the sum of the squared variable contribution values equals the I² and Ie² values, respectively, i.e., $x_{cd}(k)^{T}x_{cd}(k) = I^{2}(k)$ and $x_{ce}(k)^{T}x_{ce}(k) = I_{e}^{2}(k)$. Similarly, variable contributions can also be computed for the SPE statistic, i.e., the variable contributions of the residuals. Generally, the aberrant variables will have the largest residuals. The residual at sample k, SPE(k), is defined as the sum of the squares of e(k); thus, the vector e(k) contains information on the individual prediction error of each process variable at sample k. By plotting e(k) as a bar graph, the contributions to SPE(k) can be viewed. The relative size of the bars indicates the contribution from each variable to the prediction error, or the lack of fit of the sample to the model.
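The normalization in Eq. (29) can be verified with a short numpy sketch. The matrices below are random stand-ins for the whitening matrix Q and the dominant-IC part B_d of the ICA model, and the IC estimate ŝ is random as well; with a real model, ŝ would come from the demixing step.

```python
# Sketch of Eq. (29): ICA variable contributions to I²(k). Scaling the
# reconstruction Q⁻¹ B_d ŝ(k) to unit length and multiplying by ‖ŝ(k)‖
# makes the squared contributions sum to I²(k) = ŝ(k)ᵀ ŝ(k).
# Assumption: Q, B_d and ŝ are random stand-ins, not a fitted model.
import numpy as np

rng = np.random.default_rng(2)
m, d = 5, 2                         # 5 measured variables, 2 dominant ICs
Q = rng.standard_normal((m, m))     # stand-in whitening matrix (invertible w.p. 1)
B_d = rng.standard_normal((m, d))   # stand-in columns of B for the dominant ICs
s_hat = rng.standard_normal(d)      # stand-in estimated dominant ICs, ŝ_newd(k)

recon = np.linalg.inv(Q) @ B_d @ s_hat
x_cd = recon / np.linalg.norm(recon) * np.linalg.norm(s_hat)   # Eq. (29)
I2 = s_hat @ s_hat                  # I²(k) = ŝᵀŝ
```

Eq. (30) is identical in form, with the excluded-IC quantities B_e and ŝ_newe in place of the dominant ones.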
In some situations, especially those with a varying SPE statistic, it is a good idea to use the mean of the contributions to SPE over some period. This is done by replacing the vector e(k) with a vector expressing the mean error over a period of length l.

Our proposed strategy is depicted in Fig. 3. First, the calibration model is built by ICA and kernel density estimation, and the system is then monitored on-line. As in conventional PCA monitoring, if the I² statistic exceeds its limit, a process change in the model space has occurred; if the Ie² statistic exceeds its limit, a process change in the excluded model space has occurred; and if the Q-statistic (SPE) of the residual space exceeds its confidence limit, changes that violate the ICA model have occurred. We use contribution plots to identify and isolate the nature of process faults.

4. Illustrative examples (comparison of ICA with PCA)

PCA uses only the information contained in the covariance matrix of the data vector x, whereas ICA also uses information on the distribution of x that is not contained in the covariance matrix. Hence, the use of ICA for monitoring may give more sophisticated results because it uses independent components rather than principal components. In this paper, the FastICA algorithm developed by Hyvärinen and Oja [11] was applied to the detection and diagnosis of faults during monitoring.

4.1. A simple multivariate process

Let us consider the following simple multivariate process, which is a modified version of the system suggested by Ku et al. [4].
$$z(k) = \begin{bmatrix} 0.118 & -0.191 & 0.287 \\ 0.847 & 0.264 & 0.943 \\ -0.333 & 0.514 & -0.217 \end{bmatrix} z(k-1) + \begin{bmatrix} 1 & 2 \\ 3 & -4 \\ -2 & 1 \end{bmatrix} u(k-1), \qquad (31)$$

$$y(k) = z(k) + v(k), \qquad (32)$$

where u is the correlated input:

$$u(k) = \begin{bmatrix} 0.811 & -0.226 \\ 0.477 & 0.415 \end{bmatrix} u(k-1) + \begin{bmatrix} 0.193 & 0.689 \\ -0.320 & -0.749 \end{bmatrix} w(k-1). \qquad (33)$$

The input w is a random vector whose elements are uniformly distributed over the interval (−2, 2). The output y is equal to z plus a random noise vector v; each element of v has zero mean and a variance of 0.1. Both the input u and the output y are measured, but z and w are not. Normal data with 200 samples are used for analysis. The data vector for analysis is x(k) = [yᵀ(k) uᵀ(k)]ᵀ. All five variables (y₁, y₂, y₃, u₁, u₂) are scaled to zero mean and unit variance to prevent less important variables with large magnitudes from overshadowing important variables with small magnitudes.

Conventional PCA implicitly assumes that the observations at one time are statistically independent of observations at any past time; that is, each measured variable series is assumed to be serially independent and statistically independent of the other measured variable series at past time instants. However, the dynamics of a typical chemical or biological process cause the measurements to be time dependent, which means that the data may have both cross-correlation and auto-correlation. PCA methods can be extended to the modeling and monitoring of dynamic systems by augmenting each observation vector with the previous l observations [4]. Here, we consider only static PCA and static ICA, although dynamic PCA and dynamic ICA with time-lagged variables could be considered. In PCA, a three-principal-component model is developed, which captures about 87.5% of the variance of the process. The disturbances for monitoring and diagnosis are as follows:

• Disturbance 1: A step change of w₁ by 3 is introduced at sample 50.
• Disturbance 2: w₁ is linearly increased from sample 50 to 149 by adding 0.05(k − 50) to the w₁ value of each sample in this range, where k is the sample number.

The T² and SPE charts for PCA monitoring of the process with disturbance 1 are shown in Fig. 4, together with the 99% confidence limits.

Fig. 4. PCA monitoring charts: T² and SPE plots of the data for disturbance 1 with three PCs.

It is evident from these charts that PCA cannot detect the disturbance and captures only the dominant randomness. However, applying ICA to the same process data gives the results presented in Fig. 5, which show markedly better disturbance detection than PCA. The 99% confidence limits of I², Ie² and SPE are obtained from kernel density estimation of normal operating condition data.

Fig. 5. ICA monitoring charts: I², Ie² and SPE plots of the data for disturbance 1 with two ICs.

As shown in Fig. 5, I² exceeds the confidence limit from sample 51, a delay of only one sample. The superiority of ICA monitoring over the PCA-based approach is also evident in the plots shown in Fig. 6: the regions of the process that are out of control are easily seen in the ICs plot, whereas most samples lie within the region of normal operation in the PCs plot, despite the presence of the fault. The T² and SPE charts for PCA monitoring of the process with disturbance 2 are shown in Fig. 7. With PCA, the T² and SPE charts again fail to detect the disturbance, although a few T² values exceed the 99% control limit. With ICA (Fig.
8), the disturbance is well detected by the I² chart from about the 78th sample onwards. In addition, the I² value rapidly decreases to the normal operating condition after the disturbance stops at sample 150. Furthermore, the I² values increase linearly during disturbance 2, which corresponds to the characteristics of the disturbance. Hence, ICA may be useful in disturbance isolation. For disturbance 2, the plots of the PCs and ICs along with the normal operating region are depicted in Fig. 9. Inspection of these plots reveals that the fault does not appear in the PCs plot, whereas the ICs plot successfully separates the out-of-control data from the normal operating data.

Fig. 6. (a) Plot of two principal component values and (b) plot of two independent component values for disturbance 1. The normal operating region (95% and 99%) is indicated with dashed lines.

Fig. 7. PCA monitoring charts: T² and SPE plots of the data for disturbance 2 with three PCs.

Fig. 8. ICA monitoring charts: I², Ie² and SPE plots of the data for disturbance 2 with two ICs.

Fig. 9. (a) Plot of two principal component values and (b) plot of two independent component values for disturbance 2. The normal operating region (95% and 99%) is indicated with dashed lines.

4.2. Wastewater treatment process

Advanced monitoring and control strategies for WWTP have attracted much recent interest as a consequence of the increasing stringency of environmental regulations.
However, some specific features of this process have yet to be fully addressed. First, most changes in this biological process are slow, and recovery from failures can be time-consuming and expensive; for example, it can take several months for the process to recover from an abnormal operation. Therefore, early detection of developing abnormalities is especially important for this process. Secondly, most WWTPs are subject to large diurnal fluctuations in the flow rate and composition of the feed stream. Consequently, these biological processes exhibit periodic characteristics, with the flow rate and composition of the feed waste stream showing strong diurnal fluctuations. Since the variables of such processes tend to fluctuate widely over a cycle, their mean and variance do not remain constant with time. Because of this, conventional multivariate statistical process monitoring (MSPM) methods like PCA, which implicitly assume a stationary underlying process, may lead to numerous false alarms and missed faults. Better treatment performance can be expected from advanced monitoring and control strategies that account for the non-Gaussianity of the periodic patterns in the biological process [34].

The ICA monitoring algorithm proposed here was tested for its ability to detect various disturbances in simulated data obtained from a benchmark simulation of the WWTP. This simulation model combines nitrification with predenitrification, the most commonly used process for nitrogen removal. The activated sludge model No. 1 (ASM1) and a ten-layer settler model were used to simulate the biological reactions and the settling process, respectively. Fig. 10 shows the flow diagram of the modeled WWTP system.

Fig. 10. Process layout for the simulation benchmark.

The plant was designed to treat an average flow of 20,000 m³ d⁻¹ with an average biodegradable COD concentration of 300 mg l⁻¹. The plant consists of a 5-compartment bioreactor (6000 m³) and a secondary settler (6000 m³). For the sludge concentration of 3 kg m⁻³ this corresponds to a sludge load of approximately 0.33 kg BOD₅ (kg sludge)⁻¹ day⁻¹, which is quite critical at 15 °C, so the effluent composition is sensitive to the applied control strategy. The first two compartments of the bioreactor are not aerated, whereas the others are. All the compartments are considered to be ideally mixed, whereas the secondary settler is modeled as a one-dimensional series of 10 layers. For more detailed information about this benchmark, refer to the website of the COST working group (http://www.ensic.u-nancy.fr/COSTWWTP). Influent data and operation parameters developed by a working group on the benchmarking of wastewater treatment plants, COST 624, were used in the simulation [35]. The data covered a period of two weeks. The training model was based on a normal operation period of one week of dry weather, and the data set for the last seven days was used for validation. The sampling time was 15 min. The data used were the influent file and the noisy outputs suggested by the benchmark. We selected seven variables, listed in Table 1, from among the many variables used in the benchmark to build the monitoring system. These variables were chosen because they are typically monitored and are important variables in real WWTP systems. Two types of disturbances were tested using the proposed method: external disturbances and internal disturbances. External disturbances are imposed upon the process from the outside and are detectable when monitoring the influent characteristics. Internal disturbances are caused by changes within the process that affect the process behavior.
For the external disturbance, two short storm events were simulated, while deterioration of nitrification was simulated to study an internal disturbance [34,36]. The density estimate and normal probability plot of the second score vector (t₂) calculated by applying PCA to the simulation benchmark of the normal operating data (Fig. 11(a) and (b)) make it clear that the values of t₂ do not follow a Gaussian distribution. Thus, the calculation of T² and SPE charts based on the assumption that the data are Gaussian distributed may lead to poor monitoring performance.

4.2.1. External disturbance (storm events)

We applied the proposed monitoring method to a process in which two sudden storm events occur after a long period of dry weather. This example demonstrates how external disturbances appear within the proposed method. The pattern of the measurement variables during the storm weeks was taken from the storm condition in the benchmark. The values of the measurement variables during the storm weeks are presented in Fig. 12 (first storm: samples 850–865; second storm: samples 1050–1110). The effects of the first storm can be seen at around sample 850 and those of the second storm at around sample 1050.

Table 1
Variables used in the monitoring of the benchmark model

No.  Symbol    Meaning
1    S_NH,in   Influent ammonium concentration
2    Q_in      Influent flow rate
3    TSS_4     Total suspended solids (reactor 4)
4    S_O,3     Dissolved oxygen concentration (reactor 3)
5    S_O,4     Dissolved oxygen concentration (reactor 4)
6    K_L a_5   Oxygen transfer coefficient (reactor 5)
7    S_NO,2    Nitrate concentration (reactor 2)

Fig. 11.
(a) Density estimate; (b) normality check of the second score (t₂) obtained from PCA.

Fig. 12. X-block variables during the storm weeks.

During the storm events, the average influent flow rate increases from 20,000 m³ d⁻¹ to 60,000 m³ d⁻¹. Because such a sudden increase in the influent flow rate affects other internal and external variables through changes in the microbiological reaction rates, it also changes the covariance structure. ICA monitoring was applied to this disturbance case. To reduce the dimension, two independent components were extracted. Fig. 13 shows the independent component score plot of IC1 versus IC2 during the storm weeks, with confidence bounds (95% and 99%) calculated using kernel density estimation. Around samples 850 and 1050, which correspond to the first and second storms, the projected independent components depart from the normal operating condition. Fig. 14 shows the I², Ie² and SPE charts over the period that includes the storm events. Two ICs are selected for monitoring the systematic part of the process. The confidence limits for the I², Ie² and SPE charts were obtained from kernel density estimation of the I², Ie² and SPE values of normal operating data. The I² and Ie² measures display significant deviations at around samples 850 and 1050, while the SPE measure detects the disturbance only at around sample 1050.

Fig. 13.
Plot of two independent component values (samples 830–880 and samples 1020–1140) for the storm events. The normal operating region (95% and 99%) is indicated with dashed lines.

Fig. 14. ICA monitoring charts: I², Ie² and SPE plots spanning the regions where the storm events (around samples 850 and 1050) occur in the benchmark simulation.

Since the I², Ie² and SPE charts produce a single statistic at each sampling time, they detect only the existence and time of occurrence of out-of-control situations. To better diagnose abnormalities in the process, contribution plots were utilized in conjunction with the multivariate charts. As discussed above, contribution plots are a fast way to investigate an event and to isolate the variables causing the deviation. Fig. 15 displays the variables responsible for the deviations in the I², Ie² and SPE measures at sample 855.

Fig. 15. Variables contributing to the deviation in I², Ie² and SPE at sample 855.

The analysis of the contributions to the I² and Ie² measures during the storm events indicates that variable 2 (Q_in), variable 3 (TSS_4) and variable 7 (S_NO,2) are primarily responsible for these deviations. This result can also be confirmed by inspection of Fig. 12. From these contribution plots, an 'out-of-control' situation is identified when the contributions of some variables are larger than anticipated.
The identification of the variables which have experienced the greatest change, in conjunction with the expert knowledge of the production engineer and operator, makes it possible to relate a particular sequence of changes to a particular process malfunction. This can provide sufficient information to allow operational personnel to narrow down the potential causes of the process problem. In the case of external disturbances such as storm events, conventional PCA with two PCs also gives good monitoring results.

4.2.2. Internal disturbance (nitrification rate decrease)

The internal disturbance was imposed by decreasing the nitrification rate in the biological reactor through a decrease in the specific growth rate of the autotrophs (μ_A). The autotrophic growth rate was decreased rapidly from 0.5 to 0.4 day⁻¹ at sample 288 and then linearly decreased from 0.4 to 0.2 day⁻¹ until sample 480, as illustrated in Fig. 16.

Fig. 16. The form of the decrease in nitrification rate in the biological reactor (internal disturbance).

The PCA model is able to capture most of the variability of the X-block in two PCs, as shown in Table 2.

Table 2
Variance captured by the PCA model

PC number  Eigenvalue of cov(X)  % Variance captured by this PC  % Variance captured total
1          4.600                 65.71                            65.71
2          1.630                 23.34                            89.05
3          0.479                  6.85                            95.89
4          0.149                  2.13                            98.02
5          0.077                  1.10                            99.12
6          0.055                  0.79                            99.91
7          0.006                  0.09                           100.00

However, it is clear from the T² and SPE charts shown in Fig. 17 that the PCA method with two PCs cannot detect the internal disturbance because the periodic and non-Gaussian features of the wastewater plant dominate. In the SPE chart of PCA, the trace appears to start increasing after observation 300; however, despite the presence of the fault, most samples remain below the confidence limit, giving the process operator an incorrect picture of the process status.

Fig. 17. PCA monitoring charts: T² and SPE plots spanning the region of deteriorating nitrification in the benchmark simulation.

In contrast to the PCA result, the I², Ie² and SPE charts of ICA monitoring with 99% confidence limits, given in Fig. 18, show that the I² chart successfully detects the internal disturbance. The I² measure increases rapidly around sample 288 and reveals a diurnal variation, which indicates the detection of successive faults. Furthermore, the pattern of the internal disturbance (step + linear) is well reflected in the I² chart. Fig. 19 shows the score plot of PC1 versus PC2 and the ICs plot of IC1 versus IC2 from sample 270 to 370. In the PCs plot, most samples are within the normal operating region despite the onset of the fault at sample 288, indicating that PCA fails to detect the small internal disturbance. In contrast, samples affected by the internal disturbance are easily detected in the ICs plot: at these samples, the ICs escape from their confidence bounds, indicating that the internal disturbance has distorted the mutual relations between the variables and thus that the process is not in the normal operation mode. The process does not return to the normal operation mode within the remaining time of the test period. Fig. 20 shows the contribution plot for the I² measure at sample 320. From this contribution plot, we can conclude that the internal variables 4 (S_O,3) and 5 (S_O,4) make the largest contributions to the I² statistic. If it were necessary to avoid the influence of a single large variable, we could construct the contribution plots using the mean contribution over some period.
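The "% variance captured" columns of Table 2 follow directly from the listed eigenvalues of cov(X): each PC's share is its eigenvalue divided by the total, and the cumulative sum gives the running total. A quick check (small discrepancies against the published second decimal arise because the eigenvalues are quoted to only three decimals):

```python
# Reproducing the "% variance captured" columns of Table 2 from the
# eigenvalues of cov(X) as listed in the table.
import numpy as np

eigvals = np.array([4.600, 1.630, 0.479, 0.149, 0.077, 0.055, 0.006])
pct = 100.0 * eigvals / eigvals.sum()   # % variance captured by each PC
cum = np.cumsum(pct)                    # cumulative % variance captured
# The first two PCs together capture about 89% of the variance,
# matching the two-PC model used for monitoring.
```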
Fig. 18. ICA monitoring charts: I², Ie² and SPE plots spanning the region of deteriorating nitrification in the benchmark simulation.

Fig. 19. Plot of two independent component values for samples 270–370 for the system in which the nitrification rate is decreased from sample 288. The normal operating region (95% and 99%) is indicated with dashed lines.

Fig. 20. Variables contributing to the deviation in I² at sample 320.

5. Conclusions

This paper proposes a new approach to process monitoring that uses ICA to achieve multivariate statistical process control. The approach provides a new statistic, the I² statistic, to describe the state of the data; this statistic is an alternative to Hotelling's T² statistic used in PCA. In addition, methods for the calculation of the confidence limits and for the ordering and dimension reduction of the independent components are described, along with the use of contribution plots displaying the relative contributions of the different variables. The proposed strategy was shown to be able to detect and isolate the effects of multivariate disturbances. ICA monitoring gives more sophisticated results than the conventional method using PCA because ICA imposes statistical independence on the individual components beyond second-order statistics and has no orthogonality constraint. The ICA monitoring method was applied to fault detection and diagnosis in both a simple multivariate process and the simulation benchmark of the biological WWTP, which is characterized by a variety of fault sources with non-Gaussian characteristics.
In both case studies, the proposed method showed better monitoring performance than PCA, especially when the latent variables had non-Gaussian distributions. A number of factors deserve special consideration when using ICA for process monitoring. One is the computational load: the computational requirements of ICA are large in comparison to those of PCA. However, once the control limits for the normal operating data have been determined, the on-line monitoring procedures of ICA are simple because the demixing matrix (W) has already been determined in the modeling procedure. Another issue that should be considered in ICA monitoring relates to the kernel density estimation. In the present work we have assumed that the underlying distribution of normal operating data does not change over time. However, this underlying distribution can potentially change. A small change in the distribution will have a negligible effect on the control limits of I², Ie² and SPE, but when the underlying distribution changes substantially, the ICA model should be updated. Adaptive ICA monitoring should be investigated in future research.

Acknowledgement

This work was supported by a grant (no. R01-2002-000-00007-0) from the Korea Science & Engineering Foundation.

References

[1] J.-F. Cardoso, Blind signal separation: statistical principles, Proc. IEEE 86 (10) (1998) 2009–2025.
[2] P. Comon, Independent component analysis, a new concept, Signal Process. 36 (1994) 287–314.
[3] P. Nomikos, J.F. MacGregor, Monitoring batch processes using multiway principal component analysis, AIChE J. 40 (8) (1994) 1361–1375.
[4] W. Ku, R.H. Storer, C. Georgakis, Disturbance detection and isolation by dynamic principal component analysis, Chemom. Intell. Lab. Syst. 30 (1995) 179–196.
[5] B.M. Wise, N.B. Gallagher, The process chemometrics approach to process monitoring and fault detection, J. Process Control 6 (6) (1996) 329–348.
[6] D. Dong, T.J.
McAvoy, Nonlinear principal component analysis based on principal curves and neural networks, Comp. Chem. Eng. 20 (1) (1996) 65–78.
[7] B.R. Bakshi, Multiscale PCA with application to multivariate statistical process monitoring, AIChE J. 44 (7) (1998) 1596–1610.
[8] C. Rosen, G. Olsson, Disturbance detection in wastewater treatment plants, Water Sci. Technol. 37 (12) (1998) 197–205.
[9] P. Teppola, Multivariate process monitoring of sequential process data: a chemometric approach, Ph.D. thesis, Lappeenranta Univ. of Tech., Finland, 1999.
[10] W. Li, H. Yue, S.V. Cervantes, J. Qin, Recursive PCA for adaptive process monitoring, J. Process Control 10 (2000) 471–486.
[11] A. Hyvärinen, E. Oja, Independent component analysis: algorithms and applications, Neural Networks 13 (4–5) (2000) 411–430.
[12] T. Lee, Independent Component Analysis: Theory and Applications, Kluwer Academic Publishers, Boston, USA, 1998.
[13] J.E. Jackson, G.S. Mudholkar, Control procedures for residuals associated with principal component analysis, Technometrics 21 (1979) 341–349.
[14] R.N. Vigário, Extraction of ocular artifacts from EEG using independent component analysis, Electroencephal. Clin. Neurophysiol. 103 (1997) 395–404.
[15] T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning, Springer, New York, USA, 2001.
[16] A. Hyvärinen, New approximations of differential entropy for independent component analysis and projection pursuit, Adv. Neural Inform. Process. Syst. 10 (1998) 273–279.
[17] A. Hyvärinen, Fast and robust fixed-point algorithms for independent component analysis, IEEE Trans. Neural Networks 10 (1999) 626–634.
[18] A. Hyvärinen, Survey on independent component analysis, Neural Comput. Surveys 2 (1999) 94–128.
[19] A. Hyvärinen, J. Karhunen, E. Oja, Independent Component Analysis, John Wiley & Sons, New York, USA, 2001.
[20] R.F. Li, X.Z. Wang, Dimension reduction of process dynamic trends using independent component analysis, Comp.
Chem. Eng. 26 (2002) 467–473.
[21] Y.M. Cheung, L. Xu, An empirical method to select dominant independent components in ICA time series analysis, Proc. Int. Joint Conf. Neural Networks (1999) 3883–3887.
[22] A.D. Back, A.S. Weigend, A first application of independent component analysis to extracting structure from stock returns, Int. J. Neural Syst. 8 (4) (1997) 473–484.
[23] J.-F. Cardoso, A. Souloumiac, Blind beamforming for non-Gaussian signals, IEE Proc. F 140 (6) (1993) 362–370.
[24] Y.M. Cheung, L. Xu, Independent component ordering in ICA time series analysis, Neurocomputing 41 (2001) 145–152.
[25] A. Simoglou, E.B. Martin, A.J. Morris, Statistical performance monitoring of dynamic multivariate processes using state space modeling, Comp. Chem. Eng. 26 (6) (2002) 909–920.
[26] E.B. Martin, A.J. Morris, Non-parametric confidence bounds for process performance monitoring charts, J. Process Control 6 (6) (1996) 349–358.
[27] Q. Chen, R.J. Wynne, P. Goulding, D. Sandoz, The application of principal component analysis and kernel density estimation to enhance process monitoring, Control Eng. Pract. 8 (2000) 531–543.
[28] B.W. Silverman, Density Estimation for Statistics and Data Analysis, Chapman & Hall, UK, 1986.
[29] M.P. Wand, M.C. Jones, Kernel Smoothing, Chapman & Hall, London, UK, 1995.
[30] J.F. MacGregor, C. Jaeckle, C. Kiparissides, M. Koutoudi, Process monitoring and diagnosis by multiblock PLS methods, AIChE J. 40 (5) (1994) 826–838.
[31] C. Rosen, Monitoring wastewater treatment systems, M.S. thesis, Lund Univ., Sweden, 1998.
[32] J.A. Westerhuis, S.P. Gurden, A.K. Smilde, Generalized contribution plots in multivariate statistical process monitoring, Chemom. Intell. Lab. Syst. 51 (2000) 95–114.
[33] P. Teppola, S.-P. Mujunen, P. Minkkinen, T. Puijola, P. Pursiheimo, Principal component analysis, contribution plots and feature weights in the monitoring of sequential process data from a paper machine's wet end, Chemom. Intell. Lab. Syst. 44 (1998) 307–317.
[34] C.K. Yoo, Process monitoring and control of biological wastewater treatment process, Ph.D. thesis, POSTECH, Korea, 2002.
[35] M.N. Pons, H. Spanjers, U. Jeppsson, Towards a benchmark for evaluating control strategies in wastewater treatment plants by simulation, ESCAPE 9, Budapest, 1999.
[36] C.K. Yoo, S.W. Choi, I.-B. Lee, Dynamic monitoring method for multiscale fault detection and diagnosis in MSPC, Ind. Eng. Chem. Res. 41 (2002) 4303–4317.