Chemical Process Performance Evaluation


Chemical Process
Performance Evaluation

CHEMICAL INDUSTRIES

A Series of Reference Books and Textbooks

Founding Editor
HEINZ HEINEMANN
Berkeley, California

Series Editor
JAMES G. SPEIGHT
Laramie, Wyoming

1. Fluid Catalytic Cracking with Zeolite Catalysts, Paul B. Venuto and E. Thomas Habib, Jr.
2. Ethylene: Keystone to the Petrochemical Industry, Ludwig Kniel, Olaf Winter, and Karl Stork
3. The Chemistry and Technology of Petroleum, James G. Speight
4. The Desulfurization of Heavy Oils and Residua, James G. Speight
5. Catalysis of Organic Reactions, edited by William R. Moser
6. Acetylene-Based Chemicals from Coal and Other Natural Resources, Robert J. Tedeschi
7. Chemically Resistant Masonry, Walter Lee Sheppard, Jr.
8. Compressors and Expanders: Selection and Application for the Process Industry, Heinz P. Bloch, Joseph A. Cameron, Frank M. Danowski, Jr., Ralph James, Jr., Judson S. Swearingen, and Marilyn E. Weightman
9. Metering Pumps: Selection and Application, James P. Poynton
10. Hydrocarbons from Methanol, Clarence D. Chang
11. Foam Flotation: Theory and Applications, Ann N. Clarke and David J. Wilson
12. The Chemistry and Technology of Coal, James G. Speight
13. Pneumatic and Hydraulic Conveying of Solids, O. A. Williams
14. Catalyst Manufacture: Laboratory and Commercial Preparations, Alvin B. Stiles
15. Characterization of Heterogeneous Catalysts, edited by Francis Delannay
16. BASIC Programs for Chemical Engineering Design, James H. Weber
17. Catalyst Poisoning, L. Louis Hegedus and Robert W. McCabe
18. Catalysis of Organic Reactions, edited by John R. Kosak
19. Adsorption Technology: A Step-by-Step Approach to Process Evaluation and Application, edited by Frank L. Slejko
20. Deactivation and Poisoning of Catalysts, edited by Jacques Oudar and Henry Wise
21. Catalysis and Surface Science: Developments in Chemicals from Methanol, Hydrotreating of Hydrocarbons, Catalyst Preparation, Monomers and Polymers, Photocatalysis and Photovoltaics, edited by Heinz Heinemann and Gabor A. Somorjai
22. Catalysis of Organic Reactions, edited by Robert L. Augustine
23. Modern Control Techniques for the Processing Industries, T. H. Tsai, J. W. Lane, and C. S. Lin
24. Temperature-Programmed Reduction for Solid Materials Characterization, Alan Jones and Brian McNichol
25. Catalytic Cracking: Catalysts, Chemistry, and Kinetics, Bohdan W. Wojciechowski and Avelino Corma
26. Chemical Reaction and Reactor Engineering, edited by J. J. Carberry and A. Varma
27. Filtration: Principles and Practices: Second Edition, edited by Michael J. Matteson and Clyde Orr
28. Corrosion Mechanisms, edited by Florian Mansfeld
29. Catalysis and Surface Properties of Liquid Metals and Alloys, Yoshisada Ogino
30. Catalyst Deactivation, edited by Eugene E. Petersen and Alexis T. Bell
31. Hydrogen Effects in Catalysis: Fundamentals and Practical Applications, edited by Zoltan Paal and P. G. Menon
32. Flow Management for Engineers and Scientists, Nicholas P. Cheremisinoff and Paul N. Cheremisinoff
33. Catalysis of Organic Reactions, edited by Paul N. Rylander, Harold Greenfield, and Robert L. Augustine
34. Powder and Bulk Solids Handling Processes: Instrumentation and Control, Koichi Iinoya, Hiroaki Masuda, and Kinnosuke Watanabe
35. Reverse Osmosis Technology: Applications for High-Purity-Water Production, edited by Bipin S. Parekh
36. Shape Selective Catalysis in Industrial Applications, N. Y. Chen, William E. Garwood, and Frank G. Dwyer
37. Alpha Olefins Applications Handbook, edited by George R. Lappin and Joseph L. Sauer
38. Process Modeling and Control in Chemical Industries, edited by Kaddour Najim
39. Clathrate Hydrates of Natural Gases, E. Dendy Sloan, Jr.
40. Catalysis of Organic Reactions, edited by Dale W. Blackburn
41. Fuel Science and Technology Handbook, edited by James G. Speight
42. Octane-Enhancing Zeolitic FCC Catalysts, Julius Scherzer
43. Oxygen in Catalysis, Adam Bielanski and Jerzy Haber
44. The Chemistry and Technology of Petroleum: Second Edition, Revised and Expanded, James G. Speight
45. Industrial Drying Equipment: Selection and Application, C. M. van't Land
46. Novel Production Methods for Ethylene, Light Hydrocarbons, and Aromatics, edited by Lyle F. Albright, Billy L. Crynes, and Siegfried Nowak
47. Catalysis of Organic Reactions, edited by William E. Pascoe
48. Synthetic Lubricants and High-Performance Functional Fluids, edited by Ronald L. Shubkin
49. Acetic Acid and Its Derivatives, edited by Victor H. Agreda and Joseph R. Zoeller
50. Properties and Applications of Perovskite-Type Oxides, edited by L. G. Tejuca and J. L. G. Fierro
51. Computer-Aided Design of Catalysts, edited by E. Robert Becker and Carmo J. Pereira
52. Models for Thermodynamic and Phase Equilibria Calculations, edited by Stanley I. Sandler
53. Catalysis of Organic Reactions, edited by John R. Kosak and Thomas A. Johnson
54. Composition and Analysis of Heavy Petroleum Fractions, Klaus H. Altgelt and Mieczyslaw M. Boduszynski
55. NMR Techniques in Catalysis, edited by Alexis T. Bell and Alexander Pines
56. Upgrading Petroleum Residues and Heavy Oils, Murray R. Gray
57. Methanol Production and Use, edited by Wu-Hsun Cheng and Harold H. Kung
58. Catalytic Hydroprocessing of Petroleum and Distillates, edited by Michael C. Oballa and Stuart S. Shih
59. The Chemistry and Technology of Coal: Second Edition, Revised and Expanded, James G. Speight
60. Lubricant Base Oil and Wax Processing, Avilino Sequeira, Jr.
61. Catalytic Naphtha Reforming: Science and Technology, edited by George J. Antos, Abdullah M. Aitani, and Jose M. Parera
62. Catalysis of Organic Reactions, edited by Mike G. Scaros and Michael L. Prunier
63. Catalyst Manufacture, Alvin B. Stiles and Theodore A. Koch
64. Handbook of Grignard Reagents, edited by Gary S. Silverman and Philip E. Rakita
65. Shape Selective Catalysis in Industrial Applications: Second Edition, Revised and Expanded, N. Y. Chen, William E. Garwood, and Francis G. Dwyer
66. Hydrocracking Science and Technology, Julius Scherzer and A. J. Gruia
67. Hydrotreating Technology for Pollution Control: Catalysts, Catalysis, and Processes, edited by Mario L. Occelli and Russell Chianelli
68. Catalysis of Organic Reactions, edited by Russell E. Malz, Jr.
69. Synthesis of Porous Materials: Zeolites, Clays, and Nanostructures, edited by Mario L. Occelli and Henri Kessler
70. Methane and Its Derivatives, Sunggyu Lee
71. Structured Catalysts and Reactors, edited by Andrzej Cybulski and Jacob A. Moulijn
72. Industrial Gases in Petrochemical Processing, Harold Gunardson
73. Clathrate Hydrates of Natural Gases: Second Edition, Revised and Expanded, E. Dendy Sloan, Jr.
74. Fluid Cracking Catalysts, edited by Mario L. Occelli and Paul O'Connor
75. Catalysis of Organic Reactions, edited by Frank E. Herkes
76. The Chemistry and Technology of Petroleum: Third Edition, Revised and Expanded, James G. Speight
77. Synthetic Lubricants and High-Performance Functional Fluids: Second Edition, Revised and Expanded, Leslie R. Rudnick and Ronald L. Shubkin
78. The Desulfurization of Heavy Oils and Residua, Second Edition, Revised and Expanded, James G. Speight
79. Reaction Kinetics and Reactor Design: Second Edition, Revised and Expanded, John B. Butt
80. Regulatory Chemicals Handbook, Jennifer M. Spero, Bella Devito, and Louis Theodore
81. Applied Parameter Estimation for Chemical Engineers, Peter Englezos and Nicolas Kalogerakis
82. Catalysis of Organic Reactions, edited by Michael E. Ford
83. The Chemical Process Industries Infrastructure: Function and Economics, James R. Couper, O. Thomas Beasley, and W. Roy Penney
84. Transport Phenomena Fundamentals, Joel L. Plawsky
85. Petroleum Refining Processes, James G. Speight and Baki Özüm
86. Health, Safety, and Accident Management in the Chemical Process Industries, Ann Marie Flynn and Louis Theodore
87. Plantwide Dynamic Simulators in Chemical Processing and Control, William L. Luyben
88. Chemical Reactor Design, Peter Harriott
89. Catalysis of Organic Reactions, edited by Dennis G. Morrell
90. Lubricant Additives: Chemistry and Applications, edited by Leslie R. Rudnick
91. Handbook of Fluidization and Fluid-Particle Systems, edited by Wen-Ching Yang
92. Conservation Equations and Modeling of Chemical and Biochemical Processes, Said S. E. H. Elnashaie and Parag Garhyan
93. Batch Fermentation: Modeling, Monitoring, and Control, Ali Cinar, Gulnur Birol, Satish J. Parulekar, and Cenk Undey
94. Industrial Solvents Handbook, Second Edition, Nicholas P. Cheremisinoff
95. Petroleum and Gas Field Processing, H. K. Abdel-Aal, Mohamed Aggour, and M. Fahim
96. Chemical Process Engineering: Design and Economics, Harry Silla
97. Process Engineering Economics, James R. Couper
98. Re-Engineering the Chemical Processing Plant: Process Intensification, edited by Andrzej Stankiewicz and Jacob A. Moulijn
99. Thermodynamic Cycles: Computer-Aided Design and Optimization, Chih Wu
100. Catalytic Naphtha Reforming: Second Edition, Revised and Expanded, edited by George J. Antos and Abdullah M. Aitani
101. Handbook of MTBE and Other Gasoline Oxygenates, edited by S. Halim Hamid and Mohammad Ashraf Ali
102. Industrial Chemical Cresols and Downstream Derivatives, Asim Kumar Mukhopadhyay
103. Polymer Processing Instabilities: Control and Understanding, edited by Savvas Hatzikiriakos and Kalman B. Migler
104. Catalysis of Organic Reactions, John Sowa
105. Gasification Technologies: A Primer for Engineers and Scientists, edited by John Rezaiyan and Nicholas P. Cheremisinoff
106. Batch Processes, edited by Ekaterini Korovessi and Andreas A. Linninger
107. Introduction to Process Control, Jose A. Romagnoli and Ahmet Palazoglu
108. Metal Oxides: Chemistry and Applications, edited by J. L. G. Fierro
109. Molecular Modeling in Heavy Hydrocarbon Conversions, Michael T. Klein, Ralph J. Bertolacini, Linda J. Broadbelt, Ankush Kumar, and Gang Hou
110. Structured Catalysts and Reactors, Second Edition, edited by Andrzej Cybulski and Jacob A. Moulijn
111. Synthetics, Mineral Oils, and Bio-Based Lubricants: Chemistry and Technology, edited by Leslie R. Rudnick
112. Alcoholic Fuels, edited by Shelley Minteer
113. Bubbles, Drops, and Particles in Non-Newtonian Fluids, Second Edition, R. P. Chhabra
114. The Chemistry and Technology of Petroleum, Fourth Edition, James G. Speight
115. Catalysis of Organic Reactions, edited by Stephen R. Schmidt
116. Process Chemistry of Lubricant Base Stocks, Thomas R. Lynch
117. Hydroprocessing of Heavy Oils and Residua, edited by James G. Speight and Jorge Ancheyta
118. Chemical Process Performance Evaluation, Ali Cinar, Ahmet Palazoglu, and Ferhan Kayihan
Chemical Process
Performance
Evaluation

Ali Cinar
Illinois Institute of Technology
Chicago, Illinois, U.S.A.

Ahmet Palazoglu
University of California
Davis, California, U.S.A.

Ferhan Kayihan
Integrated Engineering Technologies
Tacoma, Washington, U.S.A.

CRC Press
Taylor & Francis Group
Boca Raton London New York

CRC Press is an imprint of the
Taylor & Francis Group, an informa business
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2007 by Taylor & Francis Group, LLC

CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed in the United States of America on acid-free paper
10 9 8 7 6 5 4 3 2 1

International Standard Book Number-10: 0-8493-3806-9 (Hardcover)
International Standard Book Number-13: 978-0-8493-3806-9 (Hardcover)

This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.

No part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data

Cinar, Ali.
Chemical process performance evaluation / Ali Cinar, Ahmet Palazoglu, Ferhan Kayihan.
p. cm. -- (Chemical industries ; 117)
Includes bibliographical references and index.
ISBN 0-8493-3806-9 (alk. paper)
1. Chemical process control--Statistical methods. 2. Chemical industry--Quality control--Statistical methods. I. Palazoglu, Ahmet. II. Kayihan, Ferhan, 1948- III. Title. IV. Series.

TP155.75.C55 2007
660'.281--dc22 2006051787

Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com

To MINE, BEDIRHAN AND TO THE MEMORY OF MY PARENTS
(A. CINAR)

To MINE, AYCAN, OMER AND MY PARENTS
(A. PALAZOGLU)

To GULSEVIN, ARKAN, TARHAN AND TO THE MEMORY OF MY PARENTS
(F. KAYIHAN)

FOR THEIR LOVE, SUPPORT AND INSPIRATION.
Preface
As the demand for profitability and competitiveness increases in the global
marketplace, industrial manufacturing operations face a growing pressure to
maintain safety, flexibility and environmental compliance. This is a result
of pushing the operational boundaries to maximize productivity that may
sometimes compromise the safe and rational operational practices. To min-
imize costly plant shut-downs and to diminish the probability of accidents
and catastrophic events, an industrial plant is kept under close surveillance
by computerized process supervision and control systems that collect data
from process units and analyze the data to assess process status. Over
the years, analysis and diagnosis methods have evolved from simple control
charts to more sophisticated statistical techniques and signal processing
capabilities. The goal of this book is to introduce the reader to the fun-
damentals and applications of a variety of process performance evaluation
approaches, including process monitoring, controller performance monitor-
ing and fault diagnosis. The material covered represents a culmination of
decades of theoretical and practical research carried out by the authors and
is based on the early notes that supported several short courses that the
authors gave over the years. It is intended as advanced study material for
graduate students and can be used as a textbook for undergraduate or grad-
uate courses on process monitoring. By emphasizing the balance between
the practice and the theory of statistical monitoring and fault diagnosis, it
would also be an excellent reference for industrial practitioners, as well as
a resource for training courses.
The reader is expected to have a rudimentary knowledge of statistics and
have an awareness of general monitoring and control concepts such as fault
detection, diagnosis and feedback control. The book will be constructed
upon these basic building blocks, introducing new concepts and techniques
when necessary. The early chapters of the book present the reader with the
use of multivariate statistics and various tools that one can use for process
monitoring and diagnosis. This includes a chapter on empirical process
modeling and another chapter on the modeling of process signals. In later
chapters, several fault diagnosis methods and the means to discriminate
between sensor faults and process upsets are discussed in detail. Then, the
statistical modeling techniques are extended to the assessment of control
performance. The book concludes with an extensive discussion on the use
of data analysis techniques for the special case of web and sheet processes.
Several case studies are included to demonstrate the implementation of
the discussed methods and hopefully to motivate the readers to explore
these ideas further in solving their own specific problems. The focus of this
book is on continuous processes. However, there are a number of process
applications, especially in pharmaceuticals and specialty chemicals, where
the batch mode of operation is used. The monitoring of such processes has
been discussed in detail in another book by Cinar et al. [41].
For further information on the authors, the readers are referred to the individual Web pages: Ali Cinar, www.chee.iit.edu/~cinar/, Ahmet Palazoglu, www.chms.ucdavis.edu/research/web/pse/ahmet/, and Ferhan Kayihan, ietek.net/. Furthermore, for supplementary materials and corrections, the readers can access the publisher's Web site www.crcpress.com.¹

We are indebted to all our students and colleagues who, over the years, set the challenges and provided the enthusiasm that helped us tackle such an exciting and rewarding set of problems. Specifically, we would like to thank our students S. Beaver, J. DeCicco, F. Doymaz, S. Kendra, F. Kosebalaban-Tokatli, A. Negiz, A. Norvilas, A. Raich, W. Sun, E. Tatara, C. Undey and J. Wong, who have conducted the research related to the techniques discussed in the book. We thank our colleagues, Y. Arkun, F. J. Doyle III, K. A. McDonald, T. Ogunnaike, J. A. Romagnoli and D. Smith for many years of fruitful discussions, colored with lots of fun and good humor. We also would like to acknowledge CRC Press / Taylor & Francis for supporting this book project. This has been a wonderful experience for us and we hope that our readers share our excitement about the future of the field of process monitoring and evaluation.

Ali Cinar
Ahmet Palazoglu
Ferhan Kayihan

¹ Under the menu Electronic Products located on the left side of the screen, click on Downloads & Updates. A list of books in alphabetical order with Web downloads will appear. Locate this book by a search, or scroll down to it. After clicking on the book title, a brief summary of the book will appear. Go to the bottom of this screen and click on the hyperlinked 'Download' that is in a zip file.

Contents

Nomenclature

1 Introduction
  1.1 Motivation and Historical Perspective
  1.2 Outline

2 Univariate Statistical Monitoring Techniques
  2.1 Statistics Concepts
  2.2 Univariate SPM Techniques
    2.2.1 Shewhart Control Charts
    2.2.2 Cumulative Sum (CUSUM) Charts
    2.2.3 Moving Average Monitoring Charts for Individual Measurements
    2.2.4 Exponentially Weighted Moving Average Chart
  2.3 Monitoring Tools for Autocorrelated Data
    2.3.1 Monitoring with Charts of Residuals
    2.3.2 Monitoring with Detecting Changes in Model Parameters
  2.4 Limitations of Univariate Monitoring Techniques
  2.5 Summary

3 Multivariate Statistical Monitoring Techniques
  3.1 Principal Components Analysis
  3.2 Canonical Variates Analysis
  3.3 Independent Component Analysis
  3.4 Contribution Plots
  3.5 Linear Methods for Diagnosis
    3.5.1 Clustering
    3.5.2 Discriminant Analysis
    3.5.3 Fisher's Discriminant Analysis
  3.6 Nonlinear Methods for Diagnosis
    3.6.1 Neural Networks
    3.6.2 Kernel-Based Techniques
    3.6.3 Support Vector Machines
  3.7 Summary

4 Empirical Model Development
  4.1 Regression Models
  4.2 PCA Models
  4.3 PLS Regression Models
  4.4 Input-Output Models of Dynamic Processes
  4.5 State-Space Models
  4.6 Summary

5 Monitoring of Multivariate Processes
  5.1 SPM Methods Based on PCA
  5.2 SPM Methods Based on PLS
  5.3 SPM Using Dynamic Process Models
  5.4 Other MSPM Techniques
  5.5 Summary

6 Characterization of Process Signals
  6.1 Wavelets
    6.1.1 Fourier Transform
    6.1.2 Continuous Wavelet Transform
    6.1.3 Discrete Wavelet Transform
  6.2 Filtering and Outlier Detection
    6.2.1 Simple Filters
    6.2.2 Wavelet Filters
    6.2.3 Robust Filter
  6.3 Signal Representation by Fuzzy Triangular Episodes
  6.4 Development of Markovian Models
    6.4.1 Markov Chains
    6.4.2 Hidden Markov Models
  6.5 Wavelet-Domain Hidden Markov Models
  6.6 Summary

7 Process Fault Diagnosis
  7.1 Fault Diagnosis Using Triangular Episodes and HMMs
    7.1.1 CSTR Simulation
    7.1.2 Vacuum Column
  7.2 Fault Diagnosis Using Wavelet-Domain HMMs
    7.2.1 pH Neutralization Simulation
    7.2.2 CSTR Simulation
  7.3 Fault Diagnosis Using HMMs
    7.3.1 Case Study of HTST Pasteurization Process
  7.4 Fault Diagnosis Using Contribution Plots
  7.5 Fault Diagnosis with Statistical Methods
  7.6 Fault Diagnosis Using SVM
  7.7 Fault Diagnosis with Robust Techniques
    7.7.1 Robust Monitoring Strategy
    7.7.2 Pilot-Scale Distillation Column
  7.8 Summary

8 Sensor Failure Detection and Diagnosis
  8.1 Sensor FDD Using PLS and CVSS Models
  8.2 Real-Time Sensor FDD Using PCA-Based Techniques
    8.2.1 Methodology
    8.2.2 Case Study
  8.3 Summary

9 Controller Performance Monitoring
  9.1 Single-Loop Controller Performance Monitoring
  9.2 Multivariable Controller Performance Monitoring
  9.3 CPM for MPC
  9.4 Summary

10 Web and Sheet Processes
  10.1 Traditional Data Analysis
    10.1.1 MD/CD Decomposition
    10.1.2 Time Dependent Structure of Profile Data
  10.2 Orthogonal Decomposition of Profile Data
    10.2.1 Gram Polynomials
    10.2.2 Principal Components Analysis
    10.2.3 Flatness of Scanner Data
  10.3 Controller Performance
    10.3.1 MD Control Performance
    10.3.2 Model-Based CD Control Performance
  10.4 Summary

Bibliography

Index
Nomenclature

Symbols

a             Number of principal components retained for a PC model
a_ij          Transition probability between states i and j
A, B          State and input coefficient matrices in continuous state-space systems
b             Inner relation regression coefficient in PLS
b_j           Probability distribution for observation j
…             Quadratic discrimination score for the ith population
C_A           Concentration of species A
…             Total contribution of variable x_j to T²
…             Contribution of variable x_j to the normalized score t_i/s_i
C, D          State and input coefficient matrices in the output equation of state-space systems
d(x,y)        Distance between x and y
d_i(x)        Linear discriminant score for the ith population
E             Residuals matrix (n x m)
e(k)          Prediction error (residual) at time k
e_ab          Episode of a signal between points a and b
F             Residuals matrix of quality variables in PLS
F             Feature space
F, G          State and input coefficient matrices in discrete-time state-space systems
F_L(d), F_H(d)  Soft-thresholding and hard-thresholding wavelet filters
F_W(d)        Wiener wavelet filter
J             Cost function, CPM performance measure
K(u,v)        Kernel function
M             Sphering matrix in ICA
M             Control horizon in MPC
m             Number of process variables in a data set
n             Number of samples in a data set
O             An observable output sequence in a HMM
P             Loadings matrix (m x a)
p             Loadings vector (m x 1)
p_i           PC loading i, ordered eigenvector i of X^T X
P             Prediction horizon in MPC
Q             Weight matrix of quality variables in PLS
q             Flow rate
q             Number of quality variables in a data set
q             Shift operator in time series models
q⁻¹           Backward shift operator in time series models
Q, R          Positive definite weight matrices in MPC
R             Residuals block matrix in multipass sensor FDD
R_i           Range of variable i
r_i           Residual based on the PC model for fault i
r_1           Autocorrelation at lag 1
r_xy          Crosscorrelation between x and y
rs_index      Sensor index of residuals
…             Residual contribution index for the jth variable with confidence level α
S             Covariance matrix
s             A Markov state
S_i           Score distance based on the PC model for fault i
s_i²          Variance of variable i
…             Scores contribution index for the jth variable with confidence level α
S_B           Between-class scatter matrix
S_W           Within-class scatter matrix
S_T           Total scatter matrix
T             Scores matrix (n x a)
t             Scores vector (n x 1)
T             Length of observation sequence in a HMM
T             Temperature
T²            Hotelling's T² statistic
TRAN          Matrix defined in Eq. 7.2
U             Scores matrix of quality variables in PLS
v             An observation symbol in a HMM
v_p           Plant noise
…             Output sensor noise
w             FDA vectors to maximize scatter between classes
w(t − τ)      A STFT window function centered at τ
…             Disturbance coefficient matrices to the state variables and outputs, respectively
W             Weight matrix of process variables in PLS
W             Projection matrix
x̄             Sample mean of variable x
X             Process variables data matrix (n x m)
Y             Quality variables data matrix (n x q)
z(k)          A discrete signal evaluated at time instant k
z(t)          A continuous signal evaluated at time t

Greek Characters

β             Low-pass filter constant
β             Vector of regression coefficients
Δ             Magnitude of step change
ε             Random variation (uncorrelated zero-mean Gaussian), measurement error
η_#           CPM performance measures (#: hist, des)
…             Ridge parameter
λ             A HMM
λ             Forgetting factor
λ_i           ith eigenvalue
ω             Frequency
π             Initial HMM state distribution
ω_1, ..., ω_g  Classes of events such as distinct operation modes
Σ             Covariance matrix
σ             Standard deviation
θ             Model parameters vector
…             Euclidian angle between points a and b with vertex at the origin
…             Mahalanobis angle between a and b with vertex at the origin
τ             Target for the mean, first-order system time constant
…             MPC cost function
φ             Autoregressive parameter, residual Mahalanobis angle
φ(k)          MPC cost function at time k
φ: X → F      Nonlinear map from input space X to feature space F
ψ(t)          A wavelet function
ψ_{s,u}(t)    A wavelet function with dilation parameter s and translation parameter u

Subscripts

0             Initial conditions
c             Coolant
f             Feed
min           Minimum value of a variable
max           Maximum value of a variable
r             Reference state/value
s             Steady-state

Superscripts

T             Transpose of a matrix

Abbreviations

AIC           Akaike information criteria
ANN           Artificial neural network
AR            Autoregressive
ARIMA         Autoregressive integrated moving average
ARL           Average run length
ARMA          Autoregressive moving average
ARMAX         Autoregressive moving average with exogenous inputs
ARX           Autoregressive model with exogenous inputs
ASM           Abnormal situation management
ASR           Automatic speech recognition
BESI          Backward elimination sensor identification
BJ            Box-Jenkins
BSSIR         Backward substitution for sensor identification and reconstruction
CC            Correlation coefficient
CL            Centerline of SPM chart
CLP           Closed-loop potential
CPCA          Consensus principal components analysis
CPM           Controller performance monitoring
CQI           Continuous quality improvement
CSTR          Continuous stirred tank reactor
CUMPRESS      Cumulative prediction sum of squares
CUSUM         Cumulative sum
CV            Canonical variate
CVA           Canonical variates analysis
CVSS          Canonical variate state space (models)
CWT           Continuous wavelet transform
DCS           Distributed control system
DMC           Dynamic matrix control
DWT           Discrete wavelet transform
ECM           Expected cost of misclassification
EM            Expectation maximization
EWMA          Exponentially weighted moving average
FDA           Fisher's discriminant analysis
FDD           Fault detection and diagnosis
FFT           Fast Fourier transform
FPE           Final prediction error
FT            Fourier transform
GUI           Graphical user interface
HMM           Hidden Markov model
HMT           Hidden Markov tree
HPCA          Hierarchical principal components analysis
HPLS          Hierarchical partial least squares
HTST          High-temperature short-time pasteurization
ICA           Independent component analysis
KBS           Knowledge-based system
KDE           Kernel density estimation
LCL           Lower control limit
LFCM          Liquid-fed ceramic melter
LQG           Linear quadratic Gaussian (control problem)
LV            Latent variable
LWL           Lower warning limit
MA            Moving average
MBPCA         Multiblock principal components analysis
MBPLS         Multiblock partial least squares
MIMO          Multi-input multi-output
MM            Moving median filter
MPC           Model predictive control
MSE           Mean square error
MSPM          Multivariate statistical process monitoring
MV            Multivariate
MVC           Minimum variance control
NAR           Nonlinear autoregressive
NARMAX        Nonlinear ARMAX
NLPCA         Nonlinear principal components analysis
NLTS          Nonlinear time series
NO            Normal operation
NOR           Normal operating region
O-NLPCA       Orthogonal nonlinear principal components analysis
OE            Output error
PC            Principal component
PCA           Principal components analysis
PCD           Parameter change detection (method)
PCR           Principal components regression
PLS           Partial least squares (projection to latent structures)
PRESS         Prediction sum of squares
RSVS          Redundant sensor voting system
RTKBS         Real-time knowledge-based systems
RVWLS         Recursive variable weighted least squares
RWLS          Recursive weighted least squares
SFCM          Slurry-fed ceramic melter
SISO          Single-input single-output
SNR           Signal-to-noise ratio
SPC           Statistical process control
SPE           Squared prediction error
SPM           Statistical process monitoring
SQC           Statistical quality control
STFT          Short-time Fourier transform
SV            Singular values or support vectors
SVD           Singular value decomposition
SVM           Support vector machine
UCL           Upper control limit
UWL           Upper warning limit
WT            Wavelet transform
1

Introduction

Today, a number of process and controller performance monitoring tech-


niques can provide an inexpensive, algorithmic means to assure and main-
tain process quality and safety without resorting to costly investments in
hardware. These techniques also help maximize hardware utilization and
efficiency. This book represents a compilation and overview of such tech-
niques to help the reader gain a healthy understanding of the fundamentals
and the current developments and get a glimpse of what the future may
hold. This book is intended to be a resource and a reference source for
those who are interested in evaluating the potential of these techniques for specific applications and in learning their strengths and limitations.
The goal of statistical process monitoring (SPM) is to detect the oc-
currence and the nature of operational changes that cause a process to
deviate from its desired target. The methodology for detecting changes is
based on statistical techniques that deal with the collection, classification,
analysis and interpretation of data. This, then, needs to be followed by
process diagnosis that aims at locating the root cause of the process change
and enables the process operators to take necessary actions to correct the
situation, thereby returning the process back to its desired operation.
The detection and diagnosis tasks can be carried out on the process
measurements to obtain critical insights into the performance of not only
the process itself but also the automatic control system that is deployed
to assure normal operation. Today, the integration of such tasks into the
process control software associated with Distributed Control Systems (D-
CS) is in progress. The technologies continue to advance, especially in the
incorporation of multivariate statistics as well as recent developments in
signal processing methods such as wavelets and hidden Markov models.
This chapter will first present the motivations behind the application of
various statistical techniques to process measurements along with a histor-
ical view of the key technological developments in this area. This will be
followed by an overview of each chapter to help guide the reader.

1.1 Motivation and Historical Perspective

Traditional statistical process control (SPC) has focused on monitoring quality variables based on reports from the quality control laboratory and, if the quality variables are outside the range of their specifications, making adjustments to recover their desired levels (hence controlling the process). Often, on-line analyzers/sensors may not be available or may be costly for certain quality attributes (e.g., saltiness of potato chips, trace impurity content of an aqueous stream, number average molecular weight of a polymer) and could require analytical tests that yield results in hours or days. Today, for swift and robust detection of abnormal process operation, the process variables, which are much more frequently and directly measured, are used to infer process status. In other words, system temperatures, pressures and stream flow rates can be used as indicators of certain product properties in an indirect but often reliable manner. An added advantage of the use of process variables is their direct link to process faults, reducing the time for fault diagnosis.

With the ever-increasing recognition of the consequences of plant accidents on the plant personnel and the surrounding communities [216], the use of process variables in determining the process status has become an integral element of abnormal situation management (ASM) practices. Naturally, statistical techniques have been in the forefront of tools that have been employed by plant operators to avoid plant failures and catastrophic events. A consortium, called ASM, led by Honeywell and several chemical and petrochemical companies (www.asmconsortium.com) was established in 1992 and continues to offer technology solutions on alarm management and decision support systems.

From a historical perspective, with the introduction of univariate control charts by Walter A. Shewhart [267] of Bell Labs, statistical quality control (SQC) has become an essential element of quality assurance efforts in the manufacturing industry. It was W.E. Deming who championed Shewhart's use of statistical measures for quality monitoring and established a series of quality management principles that resulted in substantial business improvements both in Japan and the U.S. [52].

The leading edge research conducted at Kodak during the 1970s and 1980s resulted in J.E. Jackson's landmark papers and book [120, 121, 122] that reformulated the SQC concepts within the context of multivariate statistics. The key element of these techniques was the Principal Components Analysis (PCA) that was introduced much earlier by K. Pearson in 1901 [225, 226] and H. Hotelling in 1933 [113]. In fact, the history of PCA can be traced back to the 1870s when E. Beltrami and C. Jordan first formulated the singular value decomposition. PCA reveals the key directions in the data set that exhibit the largest variance, by exploiting the cross correlations among the set of variables considered. The manifestation of multivariate statistics in regression modeling has been the development of partial least squares (PLS) by H. Wold [331] and later by S. Wold and H. Martens [85]. These concepts have been introduced to the chemical engineering community by J.F. MacGregor who led the deployment of key technological advances in continuous and batch monitoring to a variety of industrial applications [146, 153]. These efforts were complemented by the development of performance indexes that quantify the effectiveness of control systems by Harris [103].

One of the most influential books on the subject of PCA was by I.T. Jolliffe [128], who recently published a new edition [129] of his book. The book by Smilde et al. [276] is the most recent contribution to the literature on multivariate statistics, with special emphasis on chemical systems. Two books coauthored by R. Braatz [38, 260] review a number of fault detection and diagnosis techniques for chemical processes. Cinar [41] coauthored a book on monitoring of batch fermentation and fault diagnosis in batch process operations.

The use of mathematical and statistical modeling methods to relate chemical data sets to the state of the chemical system is referred to as chemometrics. A key figure in the development of chemometrics and its application to industrial problems has been B.R. Kowalski [18, 147, 319], who led the Center for Process Analytical Chemistry (CPAC) that was established in 1984. To aid qualitative and quantitative analysis of chemical data, Eigenvector Technologies Inc., a developer of independent commercial software, has provided a number of software solutions, primarily as a Matlab® Toolbox [328].

The industrial importance of monitoring technologies in the sheet and web forming processes has been emphasized chiefly by DuPont in their polymer manufacturing activities and by Weyerhaeuser in papermaking. Among many academic contributions towards the fundamental development of both control and monitoring methodologies for sheet processes, the works of Rawlings and Chien [244], Rigopoulos et al. [250, 251], Jiao et al. [124], Featherstone and Braatz [73] and Skelton et al. [275] are particularly significant.

There is a substantial body of work, with a new emphasis, now originating from China and Singapore, as well as from academic institutions in Taiwan, Korea and Hong Kong, that aims to respond to the ever-increasing demands on quality assurance in the expanding local manufacturing industries (see, for example, [28, 84]).

Many industrial corporations espoused continuous quality improvement (CQI) using six-sigma principles [4] which establish management strategies to
maintain product quality levels. The material presented in this book provides the framework and the tools to implement six-sigma on multivariate processes.
1.2 Outline
The book follows a rational presentation structure, starting with the fun-
damentals of univariate statistical techniques and a discussion on the im-
plementation issues in Chapter 2. After stating the limitations of univari-
ate techniques, Chapter 3 focuses on a number of multivariate statistical
techniques that permit the evaluation of process performance and provide
diagnostic insight. To exploit the information content of process measure-
ments even further, Chapter 4 introduces several modeling strategies that
are based on the utilization of input-output process data. Chapter 5 pro-
vides statistical process monitoring techniques for continuous processes and
three case studies that demonstrate the techniques.
Complementary to the statistical techniques presented before, Chapter
6 reviews a number of process signal modeling methods that originally e-
merged from the signal processing community, and shows how they can
be utilized in the context of process monitoring and diagnosis. Chapter 7
presents several case studies that show how the techniques can be imple-
mented. The special case of sensor failures and their detection and diagnosis
is considered worthy of a separate chapter (Chapter 8).
When a failure occurs during operation, the cause can be attributed not
only to the process equipment, or the sensor network but also to the con-
troller. Controller performance monitoring (CPM), considered as a subset
of plantwide process monitoring and diagnosis activities, deserves a separate
discussion. Thus, Chapter 9 provides an overview of controller performance
monitoring tools and offers a case study to illustrate the key concepts.
The final chapter (Chapter 10) focuses on web and sheet forming pro-
cesses. It demonstrates how the statistical techniques can be applied to
evaluate process and control performance for quality assurance and to ac-
quire fundamental insight towards the operation of such processes.
The Nomenclature section defines the variables and special characters as
well as the acronyms used in the book. The reader is cautioned that, given
the breadth of the subjects covered, to sustain a consistent nomenclature in
the book and still be able to maintain fidelity to the traditional (historical)
use of nomenclature for various techniques is a difficult if not an impossible
task. Yet, the use of various indices and variable definitions should be
clear within the context of each technique, and every attempt is made to
eliminate potential conflicts. In addition, given the uniqueness of web and sheet processes, the nomenclature in Chapter 10 should be regarded as mostly independent of the rest of the book.

The reader should consult the Publisher's Web site www.crcpress.com for supplementary materials and updates.
2

Univariate Statistical
Monitoring Techniques

Traditional approaches in process performance evaluation rely on charac-


teristics and time trends of critical process variables such as controlled
variables and manipulated variables. Ranges of variation of these variables,
their frequency of reaching hard constraints, or any abnormal trends in their
behavior have been used by many experienced plant personnel to track pro-
cess performance. Variances of these variables and their histograms have
also been used. More formal techniques for process performance evalua-
tion rely on the extension of statistical process control (SPC) to continuous
processes.
The first applications of SPC were in discrete parts manufacturing.
When the measured dimensions of a machined part were significantly differ-
ent from their desirable values (exceeding the tolerances), the manufactur-
ing operation was stopped, adjustments were made and the manufacturing
unit was restarted. Work stoppage for adjustment had a cost in terms of
lost production time and parts manufactured during startup that do not
meet the specifications. Consequently, manufacturing was interrupted to
'control' the process when the cost of off-specification production exceeded
the cost of adjustment. The statistical techniques and graphical tools to
assess this trade-off were called statistical process control. Adjustments in
continuous processes such as distillation, reforming or catalytic cracking in
refineries do not necessitate work stoppage, but the material and/or energy
flow to the process is adjusted incrementally. Hence, there are no contribu-
tions to the cost of adjustment from work stoppage. Adjustments are made
frequently by using automatic control techniques such as feedback and/or
feedforward control [253]. To discriminate such control from SPC, the term
engineering process control has been used in the SPC community. In fact,
the task of performance evaluation has become 'monitoring' the operation of
the process (which may be regulated using automatic control techniques) to

determine if the process is performing as desired. Consequently, the terms statistical process monitoring (SPM) and automatic control are used in this book.

Process monitoring is implemented as a periodically repeated hypothesis testing that checks if

• the mean value of a process variable has not shifted away from its target value, and

• the spread of a process variable has not changed significantly.

Simple graphical procedures (monitoring charts) are used to emulate hypothesis testing.

Some statistics concepts such as mean, range, and variance, test of hypothesis, and Type I and Type II errors are introduced in Section 2.1. Various univariate SPM techniques are presented in Section 2.2. The critical assumptions in these techniques include independence and identical distribution (iid) of data. The independence assumption is violated if data are autocorrelated. Section 2.3 illustrates the pitfalls of using such SPM techniques with strongly autocorrelated data and outlines SPM techniques for autocorrelated data. Section 2.4 presents the shortcomings of using univariate SPM techniques for multivariate data.

2.1 Statistics Concepts

One or more observations may be made at each sampling instant. The collection of all observations from a population at a specific sampling time is called a sample. Significant variation in process behavior is detected by monitoring changes in the location (central tendency) by inspecting the sample mean, median, or mode, and in the sample spread (scatter) by inspecting the sample range or standard deviation. Process variables may have different types of probability distributions. However, if a variable is influenced by many inputs having different probability distributions, then the probability distribution of the process variable approaches Normal (Gaussian) distribution asymptotically. The central limit theorem justifies the Normality assumption: Consider the independent random variables x_1, x_2, ..., x_m with mean μ_i and variance σ_i², i = 1, ..., m. If y = x_1 + x_2 + ... + x_m, then the distribution of

    (y − Σ_{i=1}^{m} μ_i) / (Σ_{i=1}^{m} σ_i²)^{1/2}    (2.1)

approaches N(0,1) as m approaches infinity. Here, N(0,1) denotes the Normal probability distribution with mean 0 and variance 1.
The population and sample statistics for
One or more observations may be made at each sampling instant. The variables that have a Normal distribution are given in Table 2.1.
collection of all observations from a popl1lat'ion at a specific sampling time In chemical processes, often a single measurement of a process or a
is called a sample. Significant variation in process behavior is detected produ~t va~ia~le is made at a sampling instant. The lack of multiple ob-
by monitoring changes in the location (central tendency) by inspecting s~rvatIOns Illm~s the use of classical Shewhart charts (Section 2.2.1). The
the sample mean, median, or mode, and in the sample spTead (scatter) smgle observatIOn at each sampling time and the existence of random mea-
by inspecting the sample range or standard deviation. Process variables surement errors have made SPM techniques based on cumulative sums.
may have different types of probability distributions. However, if a vari- moving averages and moving ranges attractive for performance evaluation.
able is influenced by many inputs having different probability distribu- Often decisions have to be made about populations based all the infor-
tions, then the probability distribution of the process variable approaches mation from a sample. A statistical hypothesis is an assumption or a guess
Normal (Gaussian) distribution asymptotically. The central limit theo- a~out the population. It is expressed as a statement about the parameters
rem justifies the Normality assumption: Consider the independent random of the probability distributions of the populations. Procedures that enable
variables Xl, .12, . " ,;E m with mean P'i and variance (J, , i = 1,'" ,nL If decision making whether to accept or reject a hypothesis are called tests of
y = .1:1 + :r:2 + ... + .1 m then the distribution of hypothe,:es. For example, if the equality of the mean of a variable (p.) to a
m
value a IS to be tested, the hypotheses are:
1
-j""m 2
(y - LIl,) (2.1 ) Null hypothesis: Ho : Jl =a
0i=1 (J., i=l Alternate hypothesis: fh: 11 i= a
approaches N(O,l) as m approaches infinity. Here, lV(O,l) denotes the .!wo kinds of errors may be committed when testing a hypothesis: re-
Normal probability distribution with mean 0 and variance 1. Jectmg a hypothesis when it is true, and accepting a hypothesis when it is
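The probabilities of these two kinds of errors can be evaluated with the Normal cumulative distribution function. The sketch below is a hypothetical illustration (standard library only; the critical value a, the hypothesized mean, the shifted mean, and σ are made-up numbers) for a test that rejects H0 when the observed mean falls below a critical value a, as depicted in Figure 2.1.

    from math import erf, sqrt

    def norm_cdf(x, mu=0.0, sigma=1.0):
        """P(X <= x) for X ~ N(mu, sigma^2)."""
        return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

    mu0, mu1, sigma, a = 10.0, 9.0, 0.5, 8.9   # hypothetical; reject H0 if x < a

    alpha = norm_cdf(a, mu0, sigma)            # reject H0 although mu = mu0 holds
    beta = 1.0 - norm_cdf(a, mu1, sigma)       # fail to reject after shift to mu1

    print(f"alpha = {alpha:.4f}, beta = {beta:.4f}")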
10 Chapter 2. Univariate Statistical Monitoring Techniques 2.2. Univariate SPM Techniques 11

false. The first is called Type I or ex error. It is considered as the producer's techniques this is not possible and other approaches such as computation
risk since the manufacturer thinks that a product with acceptable proper- of average run lengths (Section 2.2.1) are used to estimate ex and ,3 errors.
ties is not acceptable to ship to customers and discards it. The second error
is called Type II or ,3 error. This is the consumer's risk because a .defective
product has not been detected and is sent to the customer. ThIS can be 2.2 Univariate SPM Techniques
summarized as, The SPlVI techniques used for monitoring a single variable include Shew-
hart, cumulative sum (CUSUlVI), moving average (lVIA), and exponentially
Type I (ex) error P{Teject H o I H o is tT'ue} weighted moving average (EvVlVIA) charts. Shewhart charts consider only
(Producer's risk): the current observation in assessing the process performance (Figure 2.2).
CUSUlVI and lVIA charts give an equal weight to all observations that they
Type II UJ) error P{fail to Teject H o I H o is false} use in performance assessment. While CUSUlVI charts consider all mea-
(Consumer's risk):
surements since the beginning of the campaign, lVIA charts use a sliding
In the development of the SPlVI chart, first ex is selected to compute window that discards old measurements. EWlVIA charts use a 'functional
the confidence limit for testing the hypothesis. Then, a test procedure is sliding window' by gradually forgetting past values and emphasizing the
desiO"ned to obtain a small value for if possible. ex is a function of sample information in more recent observations.
size:nd is reduced as sample size increases. Figure 2.1 displays graphically Since in most chemical processes each measurement is made only once at
the ex and 3 errors for a variable that has Normal distribution. In the upper each sampling time (no repeated measurements), all univariate monitoring
plot, the a~ea under the curve to the left of the line denoting the value x a charts will be developed for single observations except for Shewhart charts.
is the ex error. In the lower plot, the mean of x has shifted from ·1:1 to X2·
The area to the right of the line x = a denotes the ,3 error.
Shewhart Chart CUSUM Chart
Critical Value

_ Reject Ho if x < a ---i---


Time
L 11111111111111111 ..
Time
Specified
~ Moving Average Chart EWMA Chart

Sampling
Distribution of x
assuming HI true at
!-l =x2 Time
111111 ..
Time
lUll
Figure 2.2. Schematic representation of univariate SPC charts.

Figure 2.1. Type I (ex) and Type II (;3) errors.


2.2.1 Shewhart Control Charts
The value for ex error can be computed for simple SPC charts such Shewhart charts indicate that a special (assignable) cause of variation is
as Shewhart charts using theoretical derivations. For more complex SPC present when the sample data point plotted is outside the control limits. A
graphical test of hypothesis is performed by plotting the sample mean, and the range or standard deviation, and comparing them against their control limits. A Shewhart chart is designed by specifying the centerline (CL), the upper control limit (UCL) and the lower control limit (LCL).

[Figure 2.3. A dot diagram of individual observations of a variable (open circles: individual points; filled circles: sample means).]

Two Shewhart charts (sample mean and standard deviation or the range) are plotted simultaneously. Sample means are inspected to assess between samples variation (process variability over time) by plotting the Shewhart mean chart (the x̄ chart; x̄ represents the average (mean) of x). However, one has to make sure that there is no significant change in within sample variation which may give an erroneous impression of changes in between samples variation. The mean values at times k − 2 and k − 1 in Figure 2.3 look similar, but within sample variation at time k − 1 is significantly different than that of the sample at time k − 2. Hence, it is misleading to state that between sample variation is negligible and the process level is constant. Within sample variations of samples at times k − 2 and k are similar; consequently, the difference in variation between samples is meaningful.

The Range chart (R chart) or the standard deviation chart (S chart) is used to monitor within sample process variation or spread (process variability at a given time). The process spread must be in-control for proper interpretation of the x̄ chart. The x̄ chart must be used together with a spread chart.

The assumptions of Shewhart charts are:

• The distribution of the data is approximately Normal.

• The sample group sizes are equal.

• All sample groups are weighted equally.

• The observations are independent.

If only one observation is available, individual values can be used to develop the x chart (rather than the x̄ chart) and the range chart is developed by using the 'moving range' concept discussed in Subsection 2.2.3.

Describing Variation. The location or central tendency of a variable is described by its mean, median or mode. The spread or scatter of a variable is described by its range or standard deviation. For small sample sizes (n < 6, n = number of observations in a sampling time), the range chart or the standard deviation chart can be used. For larger sample sizes, the efficiency of computing the variance from the range is reduced drastically. Hence, the standard deviation charts should be used when n > 10.

Selection of Control Limits. Three parameters affect the control limit selection:

i. the estimate of average level of the variable,

ii. the variable spread expressed as range or standard deviation, and

iii. a constant based on the probability of Type I error, α.

The '3σ' (σ denoting the standard deviation of the variable) control limits are the most popular control limits. The constant 3 yields a Type I error probability of 0.00135 on each side (α = 0.0027). The control limits expressed as a function of population standard deviation σ are:

    UCL = Target + 3σ,    LCL = Target − 3σ    (2.3)

The x chart considers only the current data value in assessing the status of the process. Run rules have been developed to include historical information such as trends in data. The run rules sensitize the chart, but they also increase the false alarm probability. The warning limits are useful in developing additional run rules in order to increase the sensitivity of Shewhart charts. The warning limits are established at '2-sigma' level, which corresponds to α/2 = 0.02275. Hence,

    UWL = Target + 2σ,    LWL = Target − 2σ    (2.4)
14 Chapter 2. Univariate Statistical Monitoring Techniques 2.2. Univariate SPM Techniques 15

If r run rules are used simultaneously and rule i has a Type I error The standard deviation of R is estimated by using the standard devia-
probability of O'i, the overall Type I error probability O'total is tion of RlrJ, d3 :
r
(2.8)
O'total = 1- II (1 - O'i) (2.5)
i=l The control limits of the R chart are
If 3 rules are used simultaneously and O'i = 0.05, then 0' = 0.143. For
UCL, LCL =
- R
R ± 3d3 - (2.9)
O'i = 0.01, one would have 0' = 0.0297. d2
Run rules, also known as 'Western Electric Rules [323], enable decision Defining
making based on trends in data. A process is declared out-of-control if any d
run rules are met. Some of the run rules are: D3 = 1- 3-3 and (2.10)
d2
• One point outside the control limits. the control limits of the R chart become

• Two of three consecutive points outside the 2rJ warning limits but and LCL = RD 3 . (2.11 )
still inside the control limits. D 4 and D 3 for various values of m are given in Table 2.2.
• Four of five consecutive points outside the 1rJ limits. The x chart
The estimator for the mean process level (centerline) is X. Since the
• Eight consecutive points on one side of the centerline.
estimate of the standard deviation of the mean pTOcess level rJ is RI d 2 ,
• Eight consecutive points forming a rv,n up or a r'un down.
(2.12)
• A nonrandom or unusual pattern in the data.
Patterns in data could be any systematic behavior such as shifts in process The control limits for an x chart based on Rare
level, cyclic (periodic) behavior, stratification (points clustering around the
centerline), trends or drifts. UCL,LCL (2.13)
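Two of the run rules above, together with Eq. (2.5), can be sketched as follows. This is a hypothetical illustration, not code from the book (NumPy assumed; the data and the per-rule α values are made up): the helper flags a point that is beyond the 3σ limits or when two of three consecutive points fall outside the same 2σ warning limit, and the last lines evaluate the combined false alarm probability of Eq. (2.5).

    import numpy as np

    def run_rule_alarms(x, center, sigma):
        """Flag: one point beyond 3-sigma, or 2 of 3 consecutive beyond the
        same 2-sigma warning limit."""
        alarms = np.zeros(len(x), dtype=bool)
        for k in range(len(x)):
            if abs(x[k] - center) > 3 * sigma:
                alarms[k] = True
            recent = x[max(0, k - 2):k + 1]
            if (np.sum(recent > center + 2 * sigma) >= 2
                    or np.sum(recent < center - 2 * sigma) >= 2):
                alarms[k] = True
        return alarms

    x = np.array([0.1, 1.9, 2.3, 2.1, -0.5, 3.4, 0.3])
    print(run_rule_alarms(x, center=0.0, sigma=1.0))

    # Eq. (2.5): overall Type I error of r rules used simultaneously
    alphas = [0.0027, 0.0105]                   # hypothetical per-rule values
    alpha_total = 1.0 - np.prod([1.0 - a for a in alphas])
    print(alpha_total)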
The Mean and Range Charts and the values for A 2 are listed in Table 2.2.
Development of the x and R charts starts with the R chart. Since the
control limits of the x chart depends on process variability, its limits are The Mean and Standard Deviation Charts
not meaningful before R is in-control. The S chart is preferable for monitoring variation when the sample size
The Range Chart is large or varying from sample to sample. Although S2 is an unbiased es-
Range is the difference between the maximum and minimum observa- timate of rJ 2 , the sample standard deviation S is not an unbiased estimator
tions in a sample. If there are n samples of size m, then of rJ. For a variable with a Normal distribution, S estimates C4rJ, where C4
is a parameter that depends on the sample size m. The standard deviation
of Sis rJV1 - d. \Vhen rJ is to be estimated from past data of n samples,
(2.6)
1 n

The random variable RI rJ is called the relative mnge. The parameters of


S=;:;:2: Si (2.14)
i=l
its distribution depend on sample size m, with the mean being d 2 (Table
and S/ C4 is an unbiased estimator of rJ. The exact values for C4 are given
2.2). For example, d2 1.683 for m = 3. An estimate of rJ (the estimates
in Table 2.2. An approximate relation based on sample size m is
are denoted by a hat ~) can be computed from the range data by using
4(m - 1)
(2.7) C4 C::' (2.15)
4m-3
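The following minimal sketch (Python; the subgroup size, the simulated data, and all numeric settings are assumptions for illustration) shows how the Table 2.2 constants turn subgroup data into \bar{x} and R chart limits per Eqs. 2.6-2.13.

```python
import numpy as np

# Control chart constants from Table 2.2 for group size m = 5.
A2, d2, D3, D4 = 0.577, 2.326, 0.0, 2.114

# Hypothetical data: 25 subgroups of size m = 5.
rng = np.random.default_rng(0)
samples = rng.normal(loc=100.0, scale=2.0, size=(25, 5))

R = samples.max(axis=1) - samples.min(axis=1)   # subgroup ranges
Rbar = R.mean()                                  # centerline of the R chart
xbarbar = samples.mean()                         # grand mean (x-bar chart centerline)

# R chart limits (Eqs. 2.10-2.11) and x-bar chart limits (Eq. 2.13).
ucl_R, lcl_R = D4 * Rbar, D3 * Rbar
ucl_x, lcl_x = xbarbar + A2 * Rbar, xbarbar - A2 * Rbar

sigma_hat = Rbar / d2                            # sigma estimate per Eq. 2.7
print(f"R chart:     LCL={lcl_R:.3f}  CL={Rbar:.3f}  UCL={ucl_R:.3f}")
print(f"x-bar chart: LCL={lcl_x:.3f}  CL={xbarbar:.3f}  UCL={ucl_x:.3f}")
print(f"sigma estimate: {sigma_hat:.3f}")
```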

Table 2.2. Control chart constants for various values of group size m. (Columns A_2, d_2, D_3, D_4 are used with the \bar{x} and R charts; columns A_3, c_4, B_3, B_4 with the \bar{x} and S charts.)

m     A_2     d_2     D_3     D_4     A_3     c_4      B_3     B_4
2     1.880   1.128   0       3.267   2.659   0.7979   0       3.267
3     1.023   1.693   0       2.574   1.954   0.8862   0       2.568
4     0.729   2.059   0       2.282   1.628   0.9213   0       2.266
5     0.577   2.326   0       2.114   1.427   0.9400   0       2.089
6     0.483   2.534   0       2.004   1.287   0.9515   0.030   1.970
7     0.419   2.704   0.076   1.924   1.182   0.9594   0.118   1.882
8     0.373   2.847   0.136   1.864   1.099   0.9650   0.185   1.815
9     0.337   2.970   0.184   1.816   1.032   0.9693   0.239   1.761
10    0.308   3.078   0.223   1.777   0.975   0.9727   0.284   1.716
11    0.285   3.173   0.256   1.744   0.927   0.9754   0.321   1.679
12    0.266   3.258   0.283   1.717   0.886   0.9776   0.354   1.646
13    0.249   3.336   0.307   1.693   0.850   0.9794   0.382   1.618
14    0.235   3.407   0.328   1.672   0.817   0.9810   0.406   1.594
15    0.223   3.472   0.347   1.653   0.789   0.9823   0.428   1.572
16    0.212   3.532   0.363   1.637   0.763   0.9835   0.448   1.552
17    0.203   3.588   0.378   1.622   0.739   0.9845   0.466   1.534
18    0.194   3.640   0.391   1.608   0.718   0.9854   0.482   1.518
19    0.187   3.689   0.403   1.597   0.698   0.9862   0.497   1.503
20    0.180   3.735   0.415   1.585   0.680   0.9869   0.510   1.490
21    0.173   3.778   0.425   1.575   0.663   0.9876   0.523   1.477
22    0.167   3.819   0.434   1.566   0.647   0.9882   0.534   1.466
23    0.162   3.858   0.443   1.557   0.633   0.9887   0.545   1.455
24    0.157   3.895   0.451   1.548   0.619   0.9892   0.555   1.445
25    0.153   3.931   0.459   1.541   0.606   0.9896   0.565   1.435

For the \bar{x} and R charts: UCL_{\bar{x}}, LCL_{\bar{x}} = \bar{\bar{x}} \pm A_2\bar{R}; UCL_R = D_4\bar{R}; LCL_R = D_3\bar{R}; \hat{\sigma} = \bar{R}/d_2; D_3 = 1 - 3d_3/d_2; D_4 = 1 + 3d_3/d_2. For the \bar{x} and S charts: UCL_{\bar{x}}, LCL_{\bar{x}} = \bar{\bar{x}} \pm A_3\bar{S}; UCL_S = B_4\bar{S}; LCL_S = B_3\bar{S}; \hat{\sigma} = \bar{S}/c_4.

The S Chart

The control limits of the S chart are

UCL, LCL = \bar{S} \pm 3 \frac{\bar{S}}{c_4}\sqrt{1 - c_4^2}    (2.16)

Defining the constants

B_3 = 1 - \frac{3}{c_4}\sqrt{1 - c_4^2} \quad \text{and} \quad B_4 = 1 + \frac{3}{c_4}\sqrt{1 - c_4^2}    (2.17)

the limits of the S chart are expressed as

UCL = B_4\bar{S} \quad \text{and} \quad LCL = B_3\bar{S}    (2.18)

The values for B_3 and B_4 are listed in Table 2.2.

The \bar{x} Chart

When \hat{\sigma} = \bar{S}/c_4, the control limits for the \bar{x} chart are

UCL, LCL = \bar{\bar{x}} \pm \frac{3\bar{S}}{c_4\sqrt{m}}    (2.19)

Defining the constant A_3 = 3/(c_4\sqrt{m}), the limits of the \bar{x} chart become

UCL = \bar{\bar{x}} + A_3\bar{S} \quad \text{and} \quad LCL = \bar{\bar{x}} - A_3\bar{S}    (2.20)

with the values of A_3 given in Table 2.2.

Average Run Length

The average run length (ARL) is the average number of samples (or sample averages) plotted in order to get an indication that the process is out-of-control. ARL can be used to compare the efficacy of various SPC charts and methods. ARL(0) is the in-control ARL, i.e., the ARL to generate an out-of-control signal even though in reality the process remains in-control. The ARL to detect a shift in the mean of magnitude \Delta\sigma is represented by ARL(\Delta), where \Delta is a constant and \sigma is the standard deviation of the variable. A good chart must have a high ARL(0) (for example, ARL(0) = 400 indicates that there is one false alarm on the average out of 400 successive samples plotted) and a low ARL(\Delta) (bad news is displayed as soon as possible).

For a Shewhart chart, the ARL is calculated from

ARL = E[R] = \frac{1}{p}    (2.21)

where p is the probability that a sample exceeds the control limits, R is the run length and E[\cdot] denotes the expected value. For an \bar{x} chart with 3\sigma limits, the probability that a point will be outside the control limits even though the process is in control is p = 0.0027. Consequently, the ARL(0) is ARL = 1/p = 1/0.0027 \approx 370. For other types of charts such as CUSUM, it is difficult or impossible to derive ARL(0) values based on theoretical arguments. Instead, the magnitude of the level change to be detected is selected and Monte Carlo simulations are carried out to compute the run lengths, their averages and variances.
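As a concrete illustration of the Monte Carlo approach, the sketch below (Python; the shift sizes and replication counts are assumptions) estimates the in-control ARL of a Shewhart chart with 3\sigma limits. The empirical average should approach 1/0.0027 \approx 370.

```python
import numpy as np

def shewhart_run_length(rng, shift=0.0, limit=3.0, max_n=100_000):
    """Count samples until an iid N(shift, 1) observation exceeds +/- limit."""
    for k in range(1, max_n + 1):
        if abs(rng.normal(shift, 1.0)) > limit:
            return k
    return max_n  # truncated run (rare for the settings used here)

rng = np.random.default_rng(1)
runs = [shewhart_run_length(rng) for _ in range(2000)]        # in-control runs
print(f"estimated ARL(0): {np.mean(runs):.0f}  (theory: {1/0.0027:.0f})")

runs_shift = [shewhart_run_length(rng, shift=1.0) for _ in range(2000)]
print(f"estimated ARL(1): {np.mean(runs_shift):.1f}")
```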

2.2.2 Cumulative Sum (CUSUM) Charts

The cumulative sum (CUSUM) chart incorporates all the information in a data sequence to highlight changes in the process average level. The values to be plotted on the chart are computed by subtracting the overall mean \mu_0 from the data and then accumulating the differences. The quantity

S_i = \sum_{j=1}^{i} (x_j - \mu_0)    (2.22)

is plotted against the sample number i. CUSUM charts are more effective than Shewhart charts in detecting small process shifts, since they combine information from several samples. When several observations are available at each sampling time (sample size m > 1), the observation x_j is replaced by the sample average \bar{x}_j at time j. The CUSUM values can be computed recursively:

S_i = (x_i - \mu_0) + S_{i-1}    (2.23)

If the process is in-control at the target value \mu_0, the CUSUM S_i should meander randomly in the vicinity of 0. If the process mean is shifted, an upward or downward trend will develop in the plot. Visual inspection of changes of slope indicates the sample number (and consequently the time) of the process shift. Even when the mean is on target, the CUSUM S_i may wander far from the zero line and give the appearance of a signal of change in the mean. Control limits in the form of a V-mask were employed when CUSUM charts were first proposed in order to decide that a statistically significant change in slope has occurred and the trend of the CUSUM plot is different than that of a random walk. CUSUM plots generated by a computer became more popular in recent years and the V-mask has been replaced by upper and lower confidence limits of one-sided CUSUM charts.

One-sided CUSUM charts are developed by plotting

S_i = \sum_{j=1}^{i} [x_j - (\mu_0 + K)]    (2.24)

where K is the reference value to detect an increase in the mean level. If S_i becomes negative for \mu_1 > \mu_0, it is reset to zero. When S_i exceeds the decision interval H, a statistically significant increase in the mean level is declared. Values for K and H can be computed from the relations:

K = \frac{\delta}{2}, \quad H = \frac{d\delta}{2}    (2.25)

Given the \alpha and \beta probabilities, the size of the shift in the mean to be detected (\Delta), and the standard deviation of the average value of the variable \bar{x} (\sigma_{\bar{x}}), the parameters in Eq. 2.25 are:

\delta = \frac{\Delta}{\sigma_{\bar{x}}} \quad \text{and} \quad d = \left(\frac{2}{\delta^2}\right) \ln\left(\frac{1-\beta}{\alpha}\right)    (2.26)

A two-sided CUSUM chart can be generated by running two one-sided CUSUM charts simultaneously with the upper and lower reference values. The recursive formulae for high and low side shifts that include resetting to zero are

S_H(i) = \max[0, x_i - (\mu_0 + K) + S_H(i-1)]
S_L(i) = \max[0, (\mu_0 - K) - x_i + S_L(i-1)]    (2.27)

respectively. The starting values are usually set to zero, S_H(0) = S_L(0) = 0. When S_H(i) or S_L(i) exceeds the decision interval H, the process is out-of-control. ARL-based methods are usually utilized to find the chart parameter values H and K. The rule of thumb for ARL(\delta) for detecting a shift of magnitude \delta in the mean when \delta \neq 0 and \delta > K is

ARL(\delta) = 1 + \frac{H}{\delta - K}    (2.28)
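The two-sided tabular CUSUM of Eq. 2.27 is compact enough to state directly in code. The sketch below (Python; the target, K, H, and the simulated shift are assumptions for illustration) flags the first sample at which either cumulative sum crosses the decision interval.

```python
import numpy as np

def tabular_cusum(x, mu0, K, H):
    """Two-sided CUSUM per Eq. 2.27; returns (S_H, S_L, first alarm index or None)."""
    sh = sl = 0.0
    SH, SL = [], []
    alarm = None
    for i, xi in enumerate(x):
        sh = max(0.0, xi - (mu0 + K) + sh)   # high-side shift statistic
        sl = max(0.0, (mu0 - K) - xi + sl)   # low-side shift statistic
        SH.append(sh); SL.append(sl)
        if alarm is None and (sh > H or sl > H):
            alarm = i
    return np.array(SH), np.array(SL), alarm

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(10.0, 1.0, 50),    # in-control segment
                    rng.normal(11.0, 1.0, 50)])   # 1-sigma step at sample 50
SH, SL, alarm = tabular_cusum(x, mu0=10.0, K=0.5, H=5.0)
print("first out-of-control signal at sample:", alarm)
```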
2.2.3 Moving Average Monitoring Charts for Individual Measurements

Moving average (MA) charts are developed by selecting a data window length (l) that includes the consecutive samples used for computing the moving average. As a new sample value is reported, the data window is moved by one sampling time increment, deleting the oldest data point and including the most recent one. In MA charts, averages of the consecutive data groups of size l are plotted. The control limit computations are based on averages and standard deviation values computed from moving ranges. Since each MA point has (l - 1) data points in common with its neighbors, the successive MAs are highly autocorrelated (autocorrelation is presented in Section 2.3). This autocorrelation is ignored in the usual construction of these charts. The MA control charts should not be used with strongly autocorrelated data. The MA charts detect small drifts efficiently (better than the \bar{x} chart) and they can be used when the original data do not have a Normal distribution. The disadvantages of the MA charts are slow response to sudden shifts in level and the generation of autocorrelation in computed values.

Three approaches can be used for estimating S for individual measurements:

1. If a rational blocking of data exists, compute an estimate of S based on it. It is advisable to compare this estimate with the estimates obtained by using the other methods to check for discrepancies.

2. The overall S estimate. Use all the data together to calculate an overall standard deviation. This estimate of S will be inflated by the between-sample variation. Thus, it is an upper bound for S. If there are changes in process level, compute S for each segment separately, then combine them by using

S_w = \sqrt{ \frac{\sum_{i=1}^{h} (m_i - 1) S_i^2}{\sum_{i=1}^{h} (m_i - 1)} }    (2.29)

where h is the number of segments with different process levels and m_i is the number of observations in each segment.

3. Estimation of S by moving ranges of l successive data points. Use differences of successive observations as if they were ranges of n observations. A plot of S for group size l versus l will indicate if there is between-sample variation. If the plot is flat, the between-sample variation is insignificant. This approach should not be used if there is a trend in data. If there are missing observations, all groups containing them should be excluded from computations.

The procedure for estimating S by moving ranges is:

1. Calculate moving ranges of size l, l = 2, 3, \ldots, using 25 to 100 observations. The computation procedure is:

MR(k) = |\max(x_i) - \min(x_i)|, \quad i = (k - l + 1), \ldots, k    (2.30)

2. Calculate the mean of the ranges for each l.

3. Divide the result of Step 2 by d_2 (Table 2.2) (for each l).

4. Tabulate and plot results for all l.

Process Level Monitoring by Moving Average (MA) Charts

In a moving average chart, the averages of consecutive groups of size l are computed and plotted. The control limit computations are based on these averages. Several original data points at the start and end of the chart are excluded, since there are not enough data to compute the moving average at these times. The procedure for developing the MA chart consists of the following steps:

1. Compute the moving average MA(k) of span l at time k as

MA(k) = \frac{x(k) + x(k-1) + \cdots + x(k-l+1)}{l}    (2.31)

2. Compute the variance of MA(k):

V(MA(k)) = \frac{1}{l^2} \sum_{i=k-l+1}^{k} V(x_i) = \frac{\sigma^2}{l}    (2.32)

Hence, \hat{\sigma} = \bar{S}/c_4 or \hat{\sigma} = \bar{MR}/d_2, using \bar{MR} for \bar{R}. The values for the parameters c_4 and d_2 are listed in Table 2.2.

3. Compute the control limits with the centerline at \bar{x}:

UCL, LCL = \bar{x} \pm \frac{3\bar{S}}{c_4\sqrt{l}} \quad \text{or} \quad UCL, LCL = \bar{x} \pm \frac{3\bar{MR}}{d_2\sqrt{l}}    (2.33)

In general, the span l and the magnitude of the shift to be detected are inversely related.

Spread Monitoring by Moving Range Charts

In a moving range chart, the range of two consecutive sample groups of size l is computed and plotted. For l \geq 2,

MR(k) = |\max(x_i) - \min(x_i)|, \quad i = (k - l + 1), \ldots, k    (2.34)

1. Select the range size l. Often l = 2.

2. Obtain estimates of \bar{MR} and \hat{\sigma} = \bar{MR}/d_2 by using the moving ranges MR(k) of length l. For a total of n samples:

\bar{MR} = \frac{1}{n - l + 1} \sum_{k=1}^{n-l+1} MR(k)    (2.35)

3. Compute the control limits with the centerline at \bar{MR}:

UCL = D_4\bar{MR} \quad \text{and} \quad LCL = D_3\bar{MR}    (2.36)

The values for the parameters D_3 and D_4 are listed in Table 2.2 and \hat{\sigma}_R = d_3\bar{MR}/d_2, where d_2 and d_3 depend on l.
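A short sketch of both procedures (Python; the window length and the simulated data are assumptions) computes the moving averages of Eq. 2.31, the MA limits of Eq. 2.33 via the moving-range estimate of \sigma, and the moving range chart limits of Eq. 2.36 for l = 2.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(50.0, 1.5, 200)          # hypothetical individual measurements
l = 5                                   # MA span
d2_2, D3_2, D4_2 = 1.128, 0.0, 3.267    # Table 2.2 constants for l = 2

# Moving averages of span l (Eq. 2.31); the first l-1 points are excluded.
ma = np.convolve(x, np.ones(l) / l, mode="valid")

# Moving ranges of length 2 and the sigma estimate MRbar/d2 (Eq. 2.35).
mr = np.abs(np.diff(x))
mr_bar = mr.mean()
sigma_hat = mr_bar / d2_2

# MA chart limits (Eq. 2.33) and moving range chart limits (Eq. 2.36).
center = x.mean()
ucl_ma = center + 3 * sigma_hat / np.sqrt(l)
lcl_ma = center - 3 * sigma_hat / np.sqrt(l)
ucl_mr, lcl_mr = D4_2 * mr_bar, D3_2 * mr_bar

print(f"MA chart: LCL={lcl_ma:.2f} CL={center:.2f} UCL={ucl_ma:.2f}")
print(f"MR chart: LCL={lcl_mr:.2f} CL={mr_bar:.2f} UCL={ucl_mr:.2f}")
print("MA points outside limits:", np.sum((ma > ucl_ma) | (ma < lcl_ma)))
```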

2.2.4 Exponentially Weighted Moving Average Chart

The exponentially weighted moving average (EWMA) z(k) is defined as

z(k) = w x(k) + (1 - w) z(k-1)    (2.37)

where 0 < w \leq 1 is a constant weight, x(k) is the sample at time k, and the starting value at k = 1 is z(0) = \bar{x}. EWMA attaches a higher weight to more recent data and has a fading memory where old data are discarded from the average. Since the EWMA is a weighted average of several consecutive observations, it is insensitive to nonnormality in the distribution of the data. It is a very useful chart for plotting individual observations (m = 1). If the x(k) are independent random variables with variance \sigma^2, the variance of z(k) is

\sigma_z^2(k) = \sigma^2 \left(\frac{w}{2-w}\right) \left[1 - (1-w)^{2k}\right]    (2.38)

The last term in brackets in Eq. 2.38 quickly approaches 1 as k increases and the variance reaches a limiting value. Often the asymptotic expression for the variance is used for computing the control limits. The weight constant w determines the memory of EWMA, the rate of decay of past sample information. For w = 1, the chart becomes a Shewhart chart. As w \to 0, EWMA approaches CUSUM. A good value for most cases is in the range 0.2 \leq w \leq 0.3. A more appropriate value of w for a specific application can be computed by considering the ARL for detecting a specific magnitude of level shift, or by searching for the w which minimizes the prediction error for a historical data set by an iterative least squares procedure. Fifty or more observations should be utilized in such procedures. EWMA is also known as geometric moving average, exponential smoothing, or first-order filter (Section 6.2.1).

The upper and lower control limits for an EWMA chart are calculated as

UCL(k) = \mu_0 + 3\sigma_z(k)
CL = \mu_0    (2.39)
LCL(k) = \mu_0 - 3\sigma_z(k)
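The EWMA recursion and its time-varying limits are easy to state in code. The sketch below (Python; the weight w, the starting value, and the data are assumptions) implements Eqs. 2.37-2.39.

```python
import numpy as np

def ewma_chart(x, mu0, sigma, w=0.25):
    """EWMA z(k) per Eq. 2.37 with 3-sigma limits per Eqs. 2.38-2.39."""
    z = np.empty_like(x)
    zprev = mu0                      # starting value z(0); the target is used here
    for k, xk in enumerate(x):
        zprev = w * xk + (1 - w) * zprev
        z[k] = zprev
    k = np.arange(1, len(x) + 1)
    var_z = sigma**2 * (w / (2 - w)) * (1 - (1 - w) ** (2 * k))   # Eq. 2.38
    ucl = mu0 + 3 * np.sqrt(var_z)
    lcl = mu0 - 3 * np.sqrt(var_z)
    return z, ucl, lcl

rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(0.0, 1.0, 60), rng.normal(0.75, 1.0, 40)])
z, ucl, lcl = ewma_chart(x, mu0=0.0, sigma=1.0)
alarms = np.where((z > ucl) | (z < lcl))[0]
print("first EWMA alarm at sample:", alarms[0] if alarms.size else None)
```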
2.3 Monitoring Tools for Autocorrelated Data

Whenever there are inertial elements (capacity) in a process, such as storage tanks, reactors or separation columns, the observations from such processes exhibit serial correlation over time. Successive observations are related to each other. Characteristics of process disturbances in continuous processes include:

• Changes in level: typical forms of disturbance trajectories include step changes and exponential (overdamped) variations usually observed, for example, in feed composition, temperature or impurity levels,

• Drifts, ramps, or meandering trajectories that describe catalyst deactivation or fouling of heat transfer surfaces,

• Random variations such as erratic pump or control valve behavior.

The process mean \mu(k) at time k varies over time with respect to the target or nominal value for the mean T:

\mu(k) - T = \phi(\mu(k-1) - T) + e(k) + \delta(k_0)\Delta    (2.40)

where e(k) is an iid random variation in the mean, the 'driving force' for random disturbances, \phi is an autoregressive parameter with -1 \leq \phi \leq 1, and \Delta is the magnitude of an abrupt (step) or sustained incremental (ramp) level change in the variable introduced at time k_0. The serial correlation is mathematically described by the autoregressive term \phi.

The strength of correlation dies out as the number of sampling intervals between observations increases. In other words, as the sampling interval increases, the correlation between successive samples decreases. In some industrial monitoring systems, a large sampling interval is selected in order to reduce correlation. The penalty for this mode of operation is loss of information about the dynamic behavior of the process. Such policies for circumventing the effects of autocorrelation in data should be avoided.

Statistics for Correlated Data

The correlation between observations made at different times (autocorrelation) is described mathematically by computing the autocorrelation function, the degree of correlation between observations made k time units apart (k = 1, 2, \ldots). The correlation coefficient is a measure of the linear association between two variables. It does not describe a cause-and-effect relation. The autocorrelation depends on the sampling interval. Most statistical and mathematical software packages include routines for computing correlation and autocorrelation.

The sample correlation function between two variables x and y is denoted by r_{x,y} and it is equal to:

r_{x,y} = \frac{\sum_{k=1}^{n} (x(k) - \bar{x})(y(k) - \bar{y})}{\sqrt{\sum_{k=1}^{n} (x(k) - \bar{x})^2 \; \sum_{k=1}^{n} (y(k) - \bar{y})^2}}    (2.41)

where \bar{x} and \bar{y} are the sample means for x and y, respectively.

If the variable y is variable x shifted by l sampling times, the correlation between time-shifted values of the same variable is described by

r_l = \frac{\sum_{k=1}^{n-l} (x(k) - \bar{x})(x(k+l) - \bar{x})}{\sum_{k=1}^{n} (x(k) - \bar{x})^2}    (2.42)

Since the time series of only one variable is involved and the time lag l between the two time series is the parameter that changes, the autocorrelation coefficient is represented by r_l. The upper limit of the summation in the numerator varies with l. In order to have an equal number of data in both series, n - l values are used in the summation.

The plot of autocorrelation r_l versus lag l is called the autocorrelation function or correlogram. Usually the autocorrelations for l = n/5 lags are computed. Confidence intervals on individual sample autocorrelations can be computed for hypothesis testing: the approximate 95% confidence interval for an individual r_l, based on the assumption that all sample autocorrelations are equal to zero, is \pm 2/\sqrt{n}.

A simple procedure is used for determining the number of lags l with non-zero autocorrelation:

• Compute the first l = n/5 autocorrelations.

• Compute the confidence interval \pm 2/\sqrt{n}.

• Check if any autocorrelation coefficient is outside the confidence limit. Visual inspection of the plot of the autocorrelation function and numerical comparison of the autocorrelation coefficients with the confidence limits are the popular methods for the assessment of autocorrelation in data.

In general, the magnitude of r_l decreases as l increases.
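The lag-by-lag check against the \pm 2/\sqrt{n} band is mechanical. The following sketch (Python; the AR(1) test signal is an assumption) computes r_l per Eq. 2.42 and reports which lags fall outside the band.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Sample autocorrelation r_l per Eq. 2.42 for l = 1..max_lag."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = np.sum(xc**2)
    return np.array([np.sum(xc[:-l] * xc[l:]) / denom for l in range(1, max_lag + 1)])

rng = np.random.default_rng(5)
n, phi = 500, 0.7
x = np.zeros(n)
for k in range(1, n):                       # AR(1) series with phi = 0.7
    x[k] = phi * x[k - 1] + rng.normal()

r = autocorrelation(x, max_lag=n // 5)
band = 2 / np.sqrt(n)                       # approximate 95% confidence limit
significant = np.where(np.abs(r) > band)[0] + 1
print("lags with significant autocorrelation:", significant[:10], "...")
```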
Effects of Autocorrelation on SPC Methods

Consider a process described by

x(k) = \mu(k) + \epsilon_1(k)
\mu(k) - T = \phi(\mu(k-1) - T) + \epsilon_2(k) + \delta(k_0)\Delta    (2.43)

where \epsilon_1(k) and \epsilon_2(k) are iid random variables, \phi is the autoregressive parameter with -1 \leq \phi \leq 1, \mu(k) is the process mean at time k, and T is the target or nominal value for the mean. Here, \epsilon_1(k) denotes the inherent variability in the process due to causes such as measurement errors, and \epsilon_2(k) the random variation in the mean, the 'driving force' for the disturbances.

Simulation studies were conducted to determine the effect of low and high values of \phi and low and high values of \sigma_{\epsilon_1} on the ARL of CUSUM and EWMA charts [104]. The chart parameters were, for CUSUM, K = 0.5 and H = 5, and for EWMA, w = 0.18 and UCL = 2.9\sigma. The ARLs for a step change in the mean introduced at k_0 = 1 with a magnitude of \Delta = (1 - \phi)\Delta^* (hence, the ultimate mean shift magnitude is \Delta^*) were tabulated.

Table 2.3. ARL for detecting a fault of magnitude \delta by CUSUM and EWMA charts for two levels of \phi.

                          CUSUM                    EWMA
\delta   \sigma_{\epsilon_1}   \phi=0.25   \phi=0.75   \phi=0.25   \phi=0.75
0        0.9       383     188     355     186
0        0.1       130     35      136     36
0.5      0.9       37      37      37      37
0.5      0.1       32      27      31      27
1        0.9       11      16      10.7    14
1        0.1       11      16      4.3     15.6
2        0.9       4.4     7       4.2     6.7
2        0.1       4.6     8       4.2     7.5

A subset of the ARL results from this study, listed in Table 2.3, indicates that the in-control ARLs are very sensitive to the presence of autocorrelation, but the detection capabilities of CUSUM and EWMA for true shifts are not significantly affected. In the absence of autocorrelation, the ARL(0) for CUSUM is 465 and that for EWMA is 452. The ARL(0) for low levels of autocorrelation (\phi = 0.25) are 383 and 355, respectively, and they drop drastically to 188 and 186 for high levels of autocorrelation (\phi = 0.75), increasing the false alarm rates by a factor of 2.5.

The effects of autocorrelation on monitoring charts have also been reported by other researchers for Shewhart [186] and CUSUM [343, 6] charts. Modification of the control limits of monitoring charts by assuming that the process can be represented by an autoregressive time series model (see Section 4.4 for terminology) of order 1 or 2, and use of recursive Kalman filter techniques for eliminating autocorrelation from process data, have also been proposed [66].

Two alternative methods for monitoring processes with autocorrelated data are discussed in the following sections. One method relies on the existence of a process model that can predict the observations and computes

the residuals between the predicted and measured values at each sampling time. As described in Section 2.3.1, it assumes that the residuals will have a Normal distribution with zero mean, and consequently regular SPM charts could be used on the residuals to monitor process behavior. The second method uses a process model as well, but here the model is updated at each sampling time using the latest observations. As outlined in Section 2.3.2, it is assumed that model parameters will not change significantly while there are no drastic changes in the process. Hence, SPM is implemented by monitoring the changes in the parameters of this recursive model.

2.3.1 Monitoring with Charts of Residuals

Autocorrelation in data affects the accuracy of the charts developed based on the iid assumption. One way to reduce the impact of autocorrelation is to estimate the value of the observation from a model and compute the error between the measured and estimated values. The errors, also called residuals, are assumed to have a Normal distribution with zero mean. Consequently, regular SPM charts such as Shewhart or CUSUM charts could be used on the residuals to monitor process behavior. This method relies on the existence of a process model that can predict the observations at each sampling time. Various techniques for empirical model development are presented in Chapter 4. The most popular modeling technique for SPM has been time series models [1, 202], outlined in Section 4.4, because they have been used extensively in the statistics community, but in reality any dynamic model could be used to estimate the observations. If a good process model is available, the prediction errors (residuals) e(k) = y(k) - \hat{y}(k) can be used to monitor the process status. If the model provides accurate predictions, the residuals have a Normal distribution and are independently distributed with mean zero and constant variance (equal to the prediction error variance).
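As an illustration of the residuals approach, the sketch below (Python; the AR(1) model, its coefficient, and the step disturbance are all assumptions) predicts each observation with a fitted AR(1) model and applies ordinary 3\sigma Shewhart limits to the one-step-ahead prediction errors.

```python
import numpy as np

rng = np.random.default_rng(6)
n, phi_true = 300, 0.8
y = np.zeros(n)
for k in range(1, n):                    # autocorrelated process, AR(1)
    y[k] = phi_true * y[k - 1] + rng.normal(0, 0.5)
y[200:] += 1.5                           # step disturbance at k = 200

# Fit an AR(1) model on the in-control segment by least squares.
yc = y[:150]
phi_hat = np.sum(yc[1:] * yc[:-1]) / np.sum(yc[:-1] ** 2)

# One-step-ahead residuals e(k) = y(k) - phi_hat * y(k-1).
e = y[1:] - phi_hat * y[:-1]
sigma_e = e[:149].std(ddof=1)            # residual sigma from calibration data

alarms = np.where(np.abs(e) > 3 * sigma_e)[0] + 1
print(f"phi_hat = {phi_hat:.3f}; first residual alarm at sample:",
      alarms[0] if alarms.size else None)
```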
Conventional Shewhart, CUSUM, and EWMA SPM charts can be developed for the residuals [1, 202, 173, 259]. Data points that are out-of-control or unusual patterns on such charts indicate that the model does not represent the process any more. Often this implies that the original variable x(k) is out-of-control. However, the model may continue to represent the process when x(k) is out-of-control. In this case, the residuals chart does not signal this behavior.

To reduce the burden of model development, use of EWMA equations has been proposed as a forecasting model [202]. The accuracy of predictions will depend on the representation capability of the EWMA model for a specific process [70, 176, 261]. If the observations from a process are positively correlated and the process mean does not drift too quickly, then the EWMA predictor would provide a good one-step-ahead forecast. If the EWMA model is a good predictor, then the sequence of prediction errors e(k) should be uncorrelated.

Considering the fact that e(k) indicates only the degree of disparity between observations collected and their model predictions, the residuals charts may not be reliable for signaling significant variations in the process mean. Plots of residuals are good at detecting upsets such as events that affect observations directly (for example, sampling and measurement errors). They may perform poorly in detecting shifts in the mean, especially when the correlation is high and positive. One alternative is to develop a Shewhart chart for the EWMA prediction errors and use it along with a Shewhart chart of the original data. This way, the chart of the original observations gives a clearer picture of process dynamics (the process is out-of-control if the confidence interval excludes the target), while the residuals chart displays process information after accounting for autocorrelation in data (the residuals may remain small if the model continues to describe the process behavior accurately).

2.3.2 Monitoring with Detection of Changes in Model Parameters

An alternative SPM framework for autocorrelated data is developed by monitoring variations in time series model parameters that are updated at each new measurement instant. Parameter change detection with recursive weighted least squares was used to detect changes in the parameters and the order of a time series model that describes stock prices in financial markets [263]. Here, the recursive least squares is extended with adaptive forgetting.

Consider an autocorrelated process described by an autoregressive model AR(p),

y(k) = \phi_1 y(k-1) + \phi_2 y(k-2) + \cdots + \phi_p y(k-p) + \phi_{p+1} + \epsilon(k)    (2.44)

where \epsilon(k) is an uncorrelated zero-mean Gaussian process with variance \sigma_\epsilon^2 and \phi_{p+1} is the constant term (bias) parameter. The parameter change detection (PCD) method monitors the magnitude of changes in model parameters \phi(k) and signals an out-of-control status when the changes are greater than a specified threshold value. The estimate \hat{\phi}_{p+1}(k) for a general AR(p) model contains the process variable level \bar{y}_k implicitly as

\hat{\phi}_{p+1}(k) = \bar{y}_k \left(1 - \sum_{i=1}^{p} \hat{\phi}_i(k)\right)    (2.45)

The first step in the PCD monitoring scheme is to establish the null hypothesis H_0. An AR model is developed from calibration data. The model information includes the model parameter vector \hat{\phi}_0(n), the inverse covariance matrix P_0(n), and the noise (disturbance) variance \sigma_\epsilon^2. Based on this information, the mean and variance of the model parameters are computed. The test against the alternate hypothesis involves updating of model parameters recursively at each measurement instant through recursive variable weighted least squares (RVWLS) with an adaptive forgetting filter (Eqs. 2.46 - 2.49) as new measurement information becomes available. The RVWLS with adaptive forgetting algorithm is summarized next.

For the AR(p) model Eq. 2.44, the (p+1) \times 1 column vector x(k) is defined as x(k) = [y(k-1) \; y(k-2) \; \cdots \; y(k-p) \; 1]^T, where [\cdot]^T denotes the transpose. RVWLS with adaptive forgetting is given by Eqs. 2.46 - 2.49:

\hat{\phi}(k) = \hat{\phi}(k-1) + \frac{P(k-1)x(k)}{\lambda(k-1) + x(k)^T P(k-1) x(k)} \, \epsilon(k)    (2.46)

P(k) = \frac{1}{\lambda(k-1)} \left[ P(k-1) - \frac{P(k-1)x(k)x(k)^T P(k-1)}{\lambda(k-1) + x(k)^T P(k-1)x(k)} \right]    (2.47)

\lambda(k) = 1 - x(k)^T K(k)    (2.48)

K(k) = \frac{P(k-1)x(k)}{\lambda + x(k)^T P(k-1)x(k)}    (2.49)

The unit delay of the forgetting factor \lambda in Eqs. 2.46 - 2.49 is necessary to avoid the solution of a quadratic equation at each time step for \lambda(k). This improves the steady-state performance of the filter and allows tracking when model parameters are changing. A \lambda value close to 1 averages out the effect of \epsilon(k), while a \lambda close to 0 tracks parameter variation in time more quickly. The steady-state performance of the RVWLS when the parameters are not time-varying deteriorates due to the estimation noise if the value of \lambda is kept away from unity. A good compromise is 0.95 < \lambda < 1.0, which is not suitable to track fast changes in the parameters. Therefore, a scheme is needed to make \lambda small when the parameters are varying and make it close to 1 at other times.

Detection, Estimation and Discrimination

Assume that n observations are available to form the calibration data set. The parameter estimates \hat{\phi}_0(n) and the variance estimate \hat{\sigma}_\epsilon^2 of the noise process \epsilon(k) are computed. Under the null hypothesis H_0, the distribution of the parameter estimates after time n becomes \hat{\phi}(k) \sim N(\hat{\phi}_0(n), P_0(n)\sigma_\epsilon^2), k \geq n. The sequential change detection algorithm is based on

P\left(|\tilde{\phi}_i(k)|, |\tilde{\phi}_i(k+1)|, \ldots, |\tilde{\phi}_i(k+n_c)| > r\sqrt{P_{0,ii}(n)}\,\sigma_\epsilon\right) \leq 0.5    (2.50)

where \tilde{\phi}_i(k) = \hat{\phi}_i(k) - \hat{\phi}_{0,i}(n), i = 1, \ldots, p+1, k > n, and P_{0,ii}(n) represents the ith diagonal element of the inverse covariance matrix P_0(n). The design parameters n_c and r depend on the AR parameters. The parameter r is a positive valued threshold that is adjusted to reduce false alarms. The parameter n_c represents the length of a run necessary for declaring the process to be out-of-control. The stopping time for the sequential detection is the time when n_c successive parameter estimates are outside the limits in either the positive or negative direction. The most common value for the run length n_c is 7. Once a change is detected, estimation is performed by reducing the value of the forgetting factor \lambda(k) to a small value \lambda_0 at that time step and then setting \lambda = 1 until the filter is converged. Updated parameter estimates are utilized to distinguish between a level and a structure change in the underlying AR model. Proper values must be selected for n, \lambda, \lambda_0, n_c, and r to design the SPM charts. The run length (RL) distributions under change and no-change conditions are used for assessing the performance of the SPM schemes and for selecting the values of the PCD method parameters.

The filter is initialized using the null hypothesis. Change detection is done by using the stopping rule suggested by Eq. 2.50. Two indicators are utilized to summarize the conclusions reached by the detection phase. One indicator signals if a change is detected in model parameters and, if so, which parameter has changed. The second indicator signals the direction of change (positive or negative). Determining the values of the two indicators concludes the detection phase of the PCD method.

If the alternate hypothesis is accepted at the detection phase, estimation of the change by the PCD method is initiated by reducing the forgetting factor to a small value at the detection instant. This will cause the filter to converge quickly to the new values of model parameters. Shewhart charts for each model parameter are used for observing the newly identified values of the model parameters. At this point the out-of-control decision made at the detection phase can be reassessed. If the identified values of the parameters are inside the range defined by the null hypothesis, then the detection decision can be reversed and the alarm is declared false.

The discrimination phase of the method runs in parallel with the estimation phase. It tries to find out whether the change experienced is in the autoregressive parameters or in the constant term (level) of the autocorrelated process variable. The parameter estimates from the estimation phase are used to estimate the level parameter \bar{y}_k (Eq. 2.45). If the alternate hypothesis is accepted, the change experienced involves variation
in the process mean. If the null hypothesis is accepted, then the change
experienced does not involve the level of the process variable. If the null
hypothesis is accepted, and a subset of the AR model parameters except
the constant term parameter show signs of change, it is deduced that the
AR process exhibits only a structure change. If the alternate hypothesis is
accepted and a subset of the identified AR parameters (including the con-
stant term parameter) are out-of-control, then a combined structure and
level change is experienced.
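A compact sketch of the recursive update (Python; a fixed forgetting factor is used in place of the adaptive \lambda(k) of Eq. 2.48, and all tuning values are assumptions) shows how Eqs. 2.46-2.47 track AR parameters sample by sample; the resulting parameter trajectories are what the Shewhart charts of the PCD method would monitor.

```python
import numpy as np

def rls_ar(y, p, lam=0.98, delta=100.0):
    """Recursive least squares for an AR(p)-plus-bias model (cf. Eqs. 2.46-2.47).

    A fixed forgetting factor lam replaces the adaptive lambda(k) of
    Eq. 2.48; delta initializes P as delta * I.
    """
    theta = np.zeros(p + 1)                    # [phi_1..phi_p, bias]
    P = delta * np.eye(p + 1)
    history = []
    for k in range(p, len(y)):
        x = np.concatenate([y[k - p:k][::-1], [1.0]])   # regressor with bias term
        err = y[k] - x @ theta                          # one-step prediction error
        gain = P @ x / (lam + x @ P @ x)
        theta = theta + gain * err
        P = (P - np.outer(gain, x @ P)) / lam
        history.append(theta.copy())
    return np.array(history)

rng = np.random.default_rng(7)
y = np.zeros(400)
for k in range(1, 400):
    phi = 0.6 if k < 250 else 0.8                       # structure change at k = 250
    y[k] = phi * y[k - 1] + 0.5 + rng.normal(0, 0.3)

traj = rls_ar(y, p=1)
print("phi_1 estimate before/after change:",
      traj[200, 0].round(3), traj[-1, 0].round(3))
```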
Example. The PCD method is used for monitoring a laboratory-scale spray dryer operation where fine aluminum oxide powder is produced by drying dilute solutions of water and aluminum oxide. On-line particle size and velocity, inlet hot air and exhaust air temperatures were measured. The SPM scheme based on on-line temperature measurements checks if the process is operating under the selected settings and producing the desired particle size distribution [213]. AR(3) models are used for both temperatures. The exhaust air temperature is modeled by

T(k) = 0.5885 T(k-1) + 0.2385 T(k-2) + 0.1595 T(k-3) + 1.5384 + e(k)    (2.51)

with the standard deviation of e(k) equal to 0.4414 for the in-control data used in developing the model (hypothesis H_0) and 0.4915 for the data with the slurry pump speed disturbance (Figure 2.4).

[Figure 2.4. Process data (circles) and model predictions (solid line) for the exit air temperature from the spray dryer.]

Figure 2.4 shows new process data where the slurry pump speed was deliberately increased to 150% of its original value at the end of 90 sec, while keeping all remaining process variables at their desired settings. Due to the increased load for evaporation, the exit temperature of the air drops below its desired level. Figure 2.4 also illustrates how well the AR(3) models generated under H_0 perform in predicting the responses, despite the slurry pump speed disturbance. Good prediction is expected, since the AR model has a root at 0.99 for the exit temperature, acting as an integrator. The residual Shewhart charts for level and spread obtained from H_0 (the AR(3) model) perform poorly. Residual CUSUM charts signal out-of-control status for level and spread (Figure 2.5). The level residual CUSUM (Figure 2.5a) first signals a positive deviation (a false alarm).

[Figure 2.5. CUSUM monitoring charts of exit temperature residuals: (a) Level (mean), (b) Spread.]

The performance of the PCD method is displayed with Shewhart charts of parameters for the same disturbance (Figure 2.6), with solid lines describing the 95% control limits and the dashed lines describing the symmetric PCD scheme detection thresholds. The first AR parameter of the exit temperature model (\hat{\phi}_1) is diagnosed as changing in the positive direction by the PCD method at 111.5 sec (Figure 2.6, top left). The level residual CUSUM (Figure 2.5a) first detects an out-of-control status in the positive

direction at 96 sec and then detects a negative shift at 102 sec. However, the behavior of the constant parameter in Figure 2.7 clearly indicates a bias shift in the negative direction.

[Figure 2.6. Shewhart charts for model parameters \hat{\phi}_1, \hat{\phi}_2, \hat{\phi}_3, and \hat{\phi}_4 (constant), respectively.]

To diagnose the kind of disturbance(s) experienced by the exit and inlet temperatures, the charts based on the implicit levels are depicted in Figure 2.7. The implicit level points calculated are shown by circles. While the level parameter remains essentially the same for the inlet temperature (not shown), the implicit level of the exit temperature changes drastically after 102 sec. As a result, only a structure change is detected for the inlet temperature, while changes in both level and structure are detected for the exit temperature.

[Figure 2.7. Diagnostic chart of dryer air exit temperature based on the implicit level parameter.]

2.4 Limitations of Univariate Monitoring Techniques

In the era of single-loop control systems in chemical processing plants, there was little infrastructure for monitoring multivariable processes by using multivariate statistical techniques. A limited number of process and quality variables were measured in most plants, and use of univariate SPM tools for monitoring critical process and quality variables seemed appropriate. The installation of computerized data acquisition and storage systems, the availability of inexpensive sensors for typical process variables such as temperature, flow rate, and pressure, and the development of advanced chemical analysis systems that can provide reliable information on quality variables at high frequencies increased the number of variables measured at high frequencies and enabled a data-rich plant environment. This change accentuated the limitations of univariate SPM techniques for monitoring multivariable processes. The critical limitation is the exclusion of the correlation among various variables from the quantitative information provided by univariate SPM tools.

The outcome of this limitation is illustrated by monitoring a two-variable process (Figure 2.8). Shewhart charts of variables x_1 and x_2 are plotted along with the x_1 - x_2 biplot. The biplot shows, on the x_1 versus x_2 plane, the observed values of x_1 and x_2 for each sampling time. The sampling time stamps are not printed, to simplify the picture. Note the single data point marked by a circled cross. According to their Shewhart charts, both variables are in-control at all times. However, the biplot provides a different assessment. If one were to use the confidence limits of the Shewhart charts, which form a rectangle that makes the borders of the biplot, the assessment is identical. But if it is assumed that the two-variable process has a multivariate Normal distribution, then the confidence limits are represented by the ellipse that is mostly inside the rectangle of the biplot. However, most of the area inside the rectangle is outside the ellipse, and the ends of the ellipse extend beyond the corners of the rectangle. Based on the multivariate confidence limits, data points outside the ellipse are out-of-control. Hence, the data point marked by a circled cross indicates an out-of-control situation. In contrast, the portions of the ellipse outside the rectangle (upper left and lower right regions in the biplot) are in-control. While defective products (represented by the data point marked by a circled cross) would be shipped out as conforming to the specifications if univariate charts were used, good products with x_1, x_2 characteristics that are inside the ellipse but outside the rectangle would be discarded as defective.

The elliptical confidence region is generated by slicing the probability distribution 'bell' in Figure 2.9 by a plane parallel to the x_1, x_2 base plane

of the figure. The probability distributions of x_1 or x_2 are the familiar 'bell-shaped curves' obtained by projecting the three-dimensional bell to the f(x_1, x_2) - x_1 or f(x_1, x_2) - x_2 vertical planes, respectively. Their confidence limits yield the familiar Shewhart chart limits. But the slicing of the bell at a specific confidence level, given by the value of f(x_1, x_2), yields an ellipse. The lengths of the major and minor axes of the ellipse are functions of the variances of x_1 and x_2, while their slopes are determined by the covariance of x_1 and x_2.

[Figure 2.8. Monitoring of a two-variable process by two univariate Shewhart charts and a biplot of x_1 vs x_2.]

[Figure 2.9. The plot of the probability distribution function of a two-variable (x_1, x_2) process.]

The shortcomings of using univariate charts for monitoring multivariable processes include too many false alarms, too many missed alarms, and the difficulty of visualizing and interpreting 'the big picture' about the process status. Plant personnel are expected to form an opinion about the process status by integrating and interpreting information from a large number of charts that ignore the correlation between the variables.

The appeal of multivariate process monitoring techniques is based on their ability to capture the correlation information neglected by univariate monitoring techniques. Simple charts (no more complicated than Shewhart charts) can summarize the status of the process. While the mathematical and statistical techniques used are more complex, most multivariate process monitoring software shields these computations from the user and provides easy-to-interpret graphs for monitoring a process.

2.5 Summary

Various univariate statistical process monitoring techniques are discussed in this chapter. The philosophy and implementation of Shewhart charts are presented first. Then, cumulative sum (CUSUM) charts are introduced for monitoring processes with individual measurements and for detecting small changes in the mean. Moving average (MA) charts are presented and extended to exponentially weighted moving average (EWMA) charts that

attach more importance to recent data. Most chemical processes generate autocorrelated data. The impact of strong autocorrelation on univariate SPM techniques is reviewed and two SPM techniques for autocorrelated data are introduced. Finally, the limitations of univariate SPM techniques for monitoring multivariable processes are discussed. The statistical foundations for multivariate SPM techniques are introduced in Chapter 3, various empirical multivariable model development techniques are presented in Chapter 4, and the multivariable SPM methods for continuous processes are discussed in Chapter 5.

3

Multivariate Statistical Monitoring Techniques

Many process performance evaluation techniques are based on multivariate statistical methods. Various statistical methods that provide the founda-
tions for model development, process monitoring and diagnosis are present-
ed in this chapter. Section 3.1 introduces principal components analysis and
partial least squares. Canonical variates analysis and independent compo-
nents analysis are discussed in Sections 3.2 and 3.3. Contribution plots that
indicate process variables that have made large contributions to significant
changes in monitoring statistics are presented in Section 3.4. Statistical
methods used for diagnosis of source causes of process abnormalities de-
tected are introduced in Section 3.5. Nonlinear methods for monitoring
and diagnosis are introduced in Section 3.6.

3.1 Principal Components Analysis


Principal Components Analysis (PCA) is a multivariable statistical tech-
nique that can extract the strong correlations of a data set through a set of
empirical orthogonal functions. Its historic origins may be traced back to
the works of Beltrami in Italy (1873) and Jordan in France (1874) who inde-
pendently formulated the singular value decomposition (SVD) of a square
matrix. However, the first practical application of PCA may be attributed
to Pearson's work in biology [226] following which it became a standard
multivariate statistical technique [3, 121, 126, 128].
PCA techniques can be used either as a detrending (filtering) tool for
efficient data analysis and visualization or as a model-building structure
to describe the expected variation under normal operation (NO). For a
particular process, the NO data set covers targeted operating conditions during satisfactory performance. The PCA model is based on this representative


data set. The model can be used to detect outliers in data, provide data reconciliation and monitor deviations from NO that indicate excessive variation from the normal target or unusual patterns of variation. Operation under various known upsets can also be modeled if sufficient historical data are available to develop automated diagnosis of source causes of abnormal process behavior [242].

Principal components (PC) are a new set of coordinate axes that are orthogonal to each other. The first PC indicates the direction of largest variation in data, the second PC indicates the largest variation unexplained by the first PC in a direction orthogonal to the first PC (Figure 3.1). The number of PCs is usually less than the number of measured variables.

[Figure 3.1. PCs of three-dimensional data set projected on a single plane. From [242], reproduced with permission. Copyright © 1996 AIChE.]

PCA involves the orthogonal decomposition of the set of process measurements along the directions that explain the maximum variation in the data. For a continuous process, the elements of the n \times m data matrix X_D are x_{D,ij}, where i = 1, \ldots, n indicates the number of samples and j = 1, \ldots, m indicates the number of variables. To remove magnitude and variance biases in data, X_D is mean-centered and variance-scaled to get X. Each column of X represents the time series of a process measurement with mean 0 and variance 1, reflecting equal importance of each variable. If a priori knowledge about the relative importance of the variables is available, select variables can be given a slightly higher scaling weight than that corresponding to unit variance scaling [25, 94]. The directions extracted by the orthogonal decomposition of X are the eigenvectors p_i of X^T X, or the PC loadings:

X = t_1 p_1^T + t_2 p_2^T + \cdots + t_a p_a^T + E    (3.1)

where E is the n \times m matrix of residuals. The dimension a is chosen such that most of the significant process information is taken out of E, and E represents random error. If the directions are extracted sequentially, the first eigenvector is aligned in the direction of maximum data variance and the second one, while being orthogonal to the first, is aligned in the direction of maximum variance of the residual, and so forth. The residual is obtained at each step by subtracting the variance already explained by the PC loadings already selected, and used as the 'data matrix' for the computation of the next PC loading.

The eigenvalues of the covariance matrix of X define the corresponding amount of variance explained by each eigenvector. The projection of the measurements (observations) onto the eigenvectors defines new points in the measurement space. These points constitute the score matrix T, whose columns are the t_i given in Eq. 3.1. The relationship between T, P, and X can also be expressed as

T = XP, \quad X = TP^T + E    (3.2)

where P is an m \times a matrix whose jth column is the jth eigenvector of X^T X, and T is an n \times a score matrix.

The PCs can be computed by spectral decomposition [126], computation of eigenvalues and eigenvectors, or singular value decomposition. The covariance matrix S (S = X^T X/(n-1)) of data matrix X can be decomposed by spectral decomposition as

S = P \Lambda P^T    (3.3)

where P is a unitary matrix¹ whose columns are the normalized eigenvectors of S and \Lambda is a diagonal matrix that contains the ordered eigenvalues \lambda_i of S. The scores T are computed by using the relation T = XP.

¹ A unitary matrix A is a complex matrix in which the inverse is equal to the conjugate of the transpose: A^{-1} = A^*. Orthogonal matrices are unitary. If A is a real unitary matrix, then A^{-1} = A^T.

Singular value decomposition of the data matrix X is given as

X = U \Sigma V^T    (3.4)

where the columns of U are the normalized eigenvectors of XX^T, the columns of V are the normalized eigenvectors of X^T X, and \Sigma is a 'diagonal' matrix having as its elements the singular values, or the positive square roots of the magnitude-ordered eigenvalues of X^T X.

For an n \times m matrix X, U is n \times n, V is m \times m and \Sigma is n \times m. Let the rank of X be denoted as r, r \leq \min(m, n). The first r rows of \Sigma make an r \times r diagonal matrix; the remaining n - r rows are filled with zeros. Term-by-term comparison of the second equation in Eq. 3.2 and Eq. 3.4 yields

P = V \quad \text{and} \quad T = U\Sigma    (3.5)

For a data set that is described well by two PCs, the data can be displayed in a plane. The data are scattered as an ellipse whose axes are in the direction of the PC loadings in Figure 3.1. For a higher number of variables, data will be scattered as an ellipsoid.

The selection of the appropriate number of PCs, or the maximum significant dimension a, is critical for developing a parsimonious PCA model [120, 126, 258]. A quick method for computing an approximate value for a is to add PCs to the model until the percent of the cumulative variation explained by including additional PCs becomes small. The percent cumulative variation is given as

\% \, \text{Cumulative Variance} = 100 \, \frac{\sum_{i=1}^{a} \lambda_i}{\sum_{i=1}^{m} \lambda_i}    (3.6)

A more precise method that requires more computational time is cross-validation [155, 332]. It is implemented by excluding part of the data, performing PCA on the remaining data, and computing the prediction error sum of squares (PRESS) using the data retained (excluded from model development). The process is repeated until every observation is left out once. The order a is selected as that which minimizes the overall PRESS. Two additional criteria for choosing the optimal number of PCs have also been proposed by Wold [332] and Krzanowski [155], related to cross-validation. Wold [332] proposed checking the ratio

R = \frac{PRESS_a}{RSS_{a-1}}    (3.7)

where RSS_a is the residual sum of squares based on the PCA model after adding the ath principal component. When R exceeds unity upon addition of another PC, it suggests that the ath component did not improve the prediction power of the model and it is better to use a - 1 components. Another approach is based on SCREE plots, which indicate the dimension at which the smooth decrease in the magnitude of the covariance matrix eigenvalues appears to level off to the right of the plot [253].

PCA is simply an algebraic method of transforming the coordinate system of a data set for more efficient description of variability. The convenience of this representation is in the equivalence of data to measurable and meaningful physical quantities like temperatures, pressures and compositions. In statistical analysis and modeling, the quantification of data variance is of great importance. PCA provides a direct method of orthogonal decomposition onto a new set of basis vectors that are aligned with the directions of maximum data variance.

The empirical formulations proposed for the automated selection of a usually give good results in finding an a that captures the dominant correlations or variance in the data set with a minimum number of PCs. But this is essentially a practical matter, dependent on the particular problem and the appropriate balance between parsimony and information detail. One approach is demonstrated in the following example.

Example. Let m = 20 and n = 1000 and generate X_{D1} by Gaussian random assignment of simulated data. Let X_1 be the corresponding mean-centered and variance-scaled data set, which is essentially free of any structured correlation among the variables. PCA analysis of X_1 identifies the orthogonal eigenvectors U_1 and the associated eigenvalues \{\lambda_1, \ldots, \lambda_{20}\} while separating the marginally different variance contributions along each PC. For this case a = 0 and the complete data representation is basically the random contributions, X_1 = X_{1R} = E. Now generate a new set of data X_{D2} by a combination of X_{D1} and time-variant multiple (five) correlated functions within m = 20. X_2 is the corresponding mean-centered and variance-scaled version of X_{D2}. Note that mean-centering along the rows of X_{D2} removes any possibility of retaining a static correlation structure in X_2. Thus, X_2 has only random components and time-dependent correlated variabilities contributing towards the overall variance of the data set. Figure 3.2 shows the comparison of the two cases in terms of both eigenvalue and variance characteristics associated with sequential PCs. Eigenvalues are presented in a scaled form as \lambda_i/\lambda_1 and the variance contributions are plotted as fractional cumulative values as in Eq. 3.6. The random nature of X_1 is evident in the similarity of eigenvalue magnitudes, where each subsequent value is only marginally smaller than the previous one. As a result, contributions to overall variance with additional modes essentially form a linear trend, confirming the similarity in variabilities explained through each PC. On the other hand, the parts of the plots showing the characteristics of X_2 reflect the distinct difference between the first three eigenvalues compared to the rest. The scaled eigenvalue plot shows that the asymptotic trend (slope) of the initial higher values, when compared to the smaller values, differentiates the first three eigenvalues from the rest, suggesting that a \approx 3. With a = 3, most of the total variance can be captured. Note that starting with a + 1, the relative contributions of additional PCs can not be clearly differentiated from the contributions of higher orders. For some practical cases, the distinction between dominant and random modes may not be as clear as this example demonstrates. However, combined with specific process knowledge, the two plots presented here are always useful in selecting the appropriate a.
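The whole pipeline (scaling, SVD per Eq. 3.4, scores per Eq. 3.5, and the cumulative variance of Eq. 3.6) fits in a few lines. The sketch below (Python; the simulated data echo the example's dimensions but the correlated trends are assumptions) is one way to carry it out.

```python
import numpy as np

rng = np.random.default_rng(8)
n, m = 1000, 20
XD = rng.normal(size=(n, m))
t = np.linspace(0, 10, n)
XD[:, :5] += np.column_stack([np.sin(t + i) for i in range(5)])  # 5 correlated trends

# Mean-center and variance-scale each variable (column) to get X.
X = (XD - XD.mean(axis=0)) / XD.std(axis=0, ddof=1)

# SVD of X (Eq. 3.4): loadings P = V, scores T = U * Sigma (Eq. 3.5).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
P = Vt.T
T = U * s

# Eigenvalues of the covariance matrix and % cumulative variance (Eq. 3.6).
eigvals = s**2 / (n - 1)
cum_var = 100 * np.cumsum(eigvals) / eigvals.sum()
print("scaled eigenvalues:", np.round(eigvals / eigvals[0], 3)[:6])
print("% cumulative variance for a = 1..6:", np.round(cum_var[:6], 1))
```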

[Figure 3.2. Scaled eigenvalues (left) and cumulative contributions of sequential PCs towards total variance for two simulated data sets. The first data set has only normally distributed random numbers (circles) while the second one has time-dependent correlated variables in addition to random noise (diamonds).]

Partial Least Squares

Partial Least Squares (PLS), projections to latent structures, develops a biased model between two blocks of variables X and Y. PLS selects latent variables so that the variation in X which is most predictive of Y is extracted. The PLS approach was developed in the 1970s by H. Wold for analyzing social sciences data by estimating model parameters using Nonlinear Iterative Partial Least Squares (NIPALS). The method was further developed in the 1980s by S. Wold and H. Martens for more complex data structures in science and technology applications. PLS works on the sample covariance matrix (X^T Y)(Y^T X) [338]. The original PLS methodology provides a linear multivariate model. The modeling algorithm is described in Section 4.3. Nonlinear extensions can be developed by using variable transformations in the X and/or Y blocks if the nonlinearity is within these blocks, or by using a nonlinear functional form in the so-called inner relation if the nonlinearity is between the X block and the Y block [61].

3.2 Canonical Variates Analysis

Canonical correlation analysis identifies and quantifies the associations between two sets of variables [126]. Canonical correlation analysis is conducted by using canonical variates. Consider n observations of two random vectors x and y of dimensions p and q, forming data sets X_{p \times n} and Y_{q \times n} with Cov(X) = \Sigma_{11}, Cov(Y) = \Sigma_{22}, and Cov(X, Y) = \Sigma_{12}. Also \Sigma_{12} = \Sigma_{21}^T and, without loss of generality, p \leq q.

For coefficient vectors a and b, form the linear combinations u = a^T x and v = b^T y. Then, for the first pair u_1, v_1, the maximum correlation

\max_{a,b} \, \text{Corr}(u_1, v_1) = \rho_1    (3.8)

is attained by the linear combination (first canonical pair)

u_1 = e_1^T \Sigma_{11}^{-1/2} x, \quad v_1 = f_1^T \Sigma_{22}^{-1/2} y    (3.9)

The kth pair of canonical variates, k = 2, 3, \ldots, p,

u_k = e_k^T \Sigma_{11}^{-1/2} x, \quad v_k = f_k^T \Sigma_{22}^{-1/2} y    (3.10)

maximizes Corr(u_k, v_k) = \rho_k among those linear combinations uncorrelated with the preceding k - 1 canonical variables. Here \rho_1^2, \rho_2^2, \ldots, \rho_p^2 are the eigenvalues of the covariance matrix \Sigma_{11}^{-1/2}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1/2} and e_1, e_2, \ldots, e_p are the associated p \times 1 eigenvectors. The \rho_i^2 are also the eigenvalues of \Sigma_{22}^{-1/2}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1/2}, with the corresponding q \times 1 eigenvectors f_1, f_2, \ldots, f_p. Detailed discussions of canonical variates and canonical correlation analysis are provided in most multivariate statistical analysis books [126].

Canonical variates will be used in the formulation of subspace state-space models in Section 4.5.
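A direct numerical route to the canonical correlations is to form the matrix named above and take its eigenvalues. The sketch below (Python; the simulated data and the shared latent signal are assumptions) does exactly that.

```python
import numpy as np

rng = np.random.default_rng(9)
n, p, q = 2000, 3, 4
common = rng.normal(size=n)                        # shared latent signal
X = rng.normal(size=(n, p)); X[:, 0] += common
Y = rng.normal(size=(n, q)); Y[:, 0] += common

Xc, Yc = X - X.mean(0), Y - Y.mean(0)
S11 = Xc.T @ Xc / (n - 1)
S22 = Yc.T @ Yc / (n - 1)
S12 = Xc.T @ Yc / (n - 1)

def inv_sqrt(S):
    """Symmetric inverse square root via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

# Eigenvalues of S11^(-1/2) S12 S22^(-1) S21 S11^(-1/2) are rho_i^2.
M = inv_sqrt(S11) @ S12 @ np.linalg.inv(S22) @ S12.T @ inv_sqrt(S11)
rho2 = np.sort(np.linalg.eigvalsh(M))[::-1]
print("canonical correlations:", np.round(np.sqrt(np.clip(rho2, 0, None)), 3))
```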
3.3 Independent Component Analysis

Independent Component Analysis (ICA) is a signal processing method for transforming multivariate data into statistically independent components expressed as linear combinations of observed variables [91, 119, 134]. Consider a process with m zero-mean variables x = (x_1 \; x_2 \; \cdots \; x_m)^T. The zero-mean independent variables s = (s_1 \; s_2 \; \cdots \; s_l)^T are defined by

x = As    (3.11)

3.3 Independent Component Analysis

Independent Component Analysis (ICA) is a signal processing method for transforming multivariate data into statistically independent components expressed as linear combinations of observed variables [91, 119, 134]. Consider a process with m zero-mean variables x = (x_1 x_2 ... x_m)^T. The zero-mean independent variables s = (s_1 s_2 ... s_l)^T are defined by

x = A s   (3.11)

where A is the mixing matrix of dimension m x l that will be determined. For n samples, Eq. 3.11 becomes

X = A S   (3.12)

where the dimensions of X and S are m x n and l x n, respectively. The mathematical problem to solve is the estimation of S and A from X. A separating matrix W (l x m) is calculated to achieve this, so that the components of the reconstructed data matrix Y = WX become as independent as possible from each other. The limitations of ICA are:

1. The signs, powers and orders of the independent components (ICs) cannot be estimated.

2. Only non-Gaussian ICs can be estimated; at most one of them can be Gaussian.

These limitations have little impact on their use in process monitoring, because the estimation in limitation (1) is crucial only when an exact reconstruction of the ICs is necessary, and if the original signals are Gaussian, arbitrarily selecting one of the ICs as Gaussian yields ICs that are useful for monitoring [138].

To perform ICA, the measured variables x_i are first transformed to uncorrelated, unit-variance variables z_j, a step called sphering or prewhitening. This can be implemented by PCA. The relationship between z and s is expressed as

z = M x = M A s = B s   (3.13)

where M is the sphering matrix and B = MA. Since the s_i are mutually independent and the z_j are mutually uncorrelated,

E[z z^T] = B E[s s^T] B^T = I   (3.14)

if the covariance of s, E[s s^T], is an identity matrix. Hence, B is an orthogonal matrix according to Eq. 3.14. Since M is determined by PCA, estimation of A is reduced to the estimation of the orthogonal matrix B.

Kurtosis, or the fourth-order cumulant, is used in computing B. The fourth-order cumulant κ_4(u) of a zero-mean variable u is

κ_4(u) = E[u^4] - 3 (E[u^2])^2   (3.15)

The columns of B are obtained by minimizing or maximizing κ_4(b^T z) under the constraint ||b|| = 1 by using a gradient method [51, 138]. A learning algorithm based on the gradient method has the form

b(l+1) = b(l) ± μ { E[4 (b(l)^T z)^3 z] - 12 ||b(l)||^2 b(l) + 2 λ b(l) }   (3.16)

where μ denotes a learning-rate parameter, λ a Lagrangian multiplier, and l an iteration index. A fixed-point algorithm can be used instead of a learning algorithm for finding the local extrema of the fourth-order cumulant [138]. The fixed points b of Equation 3.16 satisfy

E[4 (b^T z)^3 z] - 12 ||b||^2 b + 2 λ b = 0   (3.17)

and are obtained by iteration:

b(l+1) = E[(b(l)^T z)^3 z] - 3 b(l)   (3.18)

The fixed-point algorithm for ICA is summarized by Kano et al. [138]:

1. Transform the measured variables x to unit-variance uncorrelated variables z using Eq. 3.13. PCA can accomplish this transformation.

2. Start with a random initial vector b_i(0) of unit norm ||b_i(0)|| = 1. For i ≥ 2, b_i(0) is projected using

b_i(0) = b_i(0) - B_{i-1} B_{i-1}^T b_i(0)   (3.19)

and then it is normalized so that ||b_i(0)|| = 1. Start with l = 0.

(a) b_i is updated using

b_i(l+1) = E[(b_i(l)^T z)^3 z] - 3 b_i(l)   (3.20)

The expected value is estimated by using a large number of samples.

(b) b_i(l+1) is projected using

b_i(l+1) = b_i(l+1) - B_{i-1} B_{i-1}^T b_i(l+1)   (3.21)

and normalized so that ||b_i(l+1)|| = 1.

(c) If |b_i(l+1)^T b_i(l)| is close enough to 1, go to the next step; otherwise let l = l + 1 and go back to Step (a).

3. Let b_i = b_i(l+1), i = i + 1 and go back to Step 2. This iteration ends when i = l.

4. The independent components Y are obtained from

Y = B^T Z   (3.22)

where B = [b_1 b_2 ... b_l].
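A minimal Python sketch of the fixed-point iteration above (Eqs. 3.19-3.22) is given below; it assumes the data have already been whitened (Step 1) and uses a deflation scheme to extract the vectors b_i one at a time. Names and defaults are illustrative.

import numpy as np

def fixed_point_ica(Z, n_ics, tol=1e-6, max_iter=200, seed=0):
    """Kurtosis-based fixed-point ICA on whitened data Z (m x n),
    following Steps 2-4 above (deflation scheme)."""
    rng = np.random.default_rng(seed)
    m, _ = Z.shape
    B = np.zeros((m, 0))
    for _ in range(n_ics):
        b = rng.normal(size=m)
        b /= np.linalg.norm(b)                          # random unit-norm start
        for _ in range(max_iter):
            b_old = b
            b = (Z * (b @ Z) ** 3).mean(axis=1) - 3 * b  # update, Eq. 3.20
            b -= B @ (B.T @ b)                          # projection, Eqs. 3.19/3.21
            b /= np.linalg.norm(b)                      # renormalize
            if abs(b @ b_old) > 1 - tol:                # Step (c): convergence
                break
        B = np.column_stack([B, b])
    return B.T @ Z, B                                   # independent components, Eq. 3.22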

3.4 Contribution Plots

Multivariate process monitoring techniques use measurements of process variables to detect significant deviations in process operation from the desired or normal operation (NO) and trigger the need to determine the special causes affecting the process. Multivariate monitoring charts such as T^2 and SPE charts (Section 5.1) indicate when the process goes out of control, but they do not provide information on the source causes of abnormal process operation. The engineers and plant operators need to determine the actual problem once an out-of-control situation is indicated. Miller et al. [197, 198] introduced the concepts of variable contributions and contribution plots to address this need. Contribution plots indicate the process variables that have contributed significantly to inflating the T^2-statistic (or D), the squared prediction error SPE-statistic (or Q), and the scores. The fault diagnosis activity is completed by using process knowledge (of plant personnel or a knowledge-based system) to relate these process variables to various equipment failures and disturbances.

Contributions of process variables to the T^2-statistic. Two different approaches for calculating variable contributions to the T^2-statistic have been proposed. The first approach, introduced by Miller et al. [198] and by MacGregor et al. [146, 177], calculates the contribution of each process variable to a separate score. T^2 can be written as

T^2 = Σ_{i=1}^{m} t_i^2 / s_i^2   (3.23)

where, as before, t_i denotes the scores, λ_i the eigenvalues of S, m the number of variables, and s_i^2 the variance of t_i (the ith ordered eigenvalue of S). Each score can be written as

t_i = p_i^T (x - x̄) = Σ_{j=1}^{m} p_{i,j} (x_j - x̄_j)   (3.24)

where p_i is the loading, the eigenvector of S corresponding to λ_i, and p_{i,j}, x_j, and x̄_j are associated with the jth variable. The contribution of each variable x_j to the score of PC i is given by the corresponding term in Eq. 3.24:

cont_{i,j} = p_{i,j} (x_j - x̄_j)   (3.25)

Considering that variables with high levels of contribution that are of the same sign as the score are responsible for driving T^2 to higher values, only those variables are included in the analysis [146]. For example, only variables with negative contributions are selected if the score is negative.

The overall contribution of each variable is computed by summing over all scores with high values. For each score with a high value (using a threshold value of 2.5, for example) the variable contributions are calculated [146]. Then, the values over all the l high scores are summed for contributions that have the same sign as the score:

1. For all l high scores (l ≤ m):

i. Compute the contribution of variable x_j to the normalized score (t_i/s_i)^2:

cont_{i,j} = (t_i / s_i^2) p_{i,j} (x_j - x̄_j)   (3.26)

ii. Set cont_{i,j} to zero if it is negative (sign opposite to the score t_i).

2. Calculate the total contribution of variable x_j:

CONT_j = Σ_{i=1}^{l} cont_{i,j}   (3.27)

The second approach was proposed by Nomikos [217] and implemented on batch process data. This approach calculates the contributions of each process variable to the T^2-statistic itself (Eq. 3.28) rather than the contributions to separate scores.

Contributions of process variables to the SPE-statistic. The contribution to the SPE-statistic is calculated using the individual residuals. The contribution of variable j to the SPE at time k is

cont_{k,j}^{SPE} = (x_{k,j} - x̂_{k,j})^2   (3.29)

For a data set of length n:

CONT_j^{SPE} = (x_j - x̂_j)^T (x_j - x̂_j) = Σ_{i=1}^{n} (e_{i,j})^2   (3.30)

where x̂_j is the vector of predicted values of the (centered and scaled) measured variable j (with n observations) and e_j denotes the residuals.

It is always good practice to check individual process variable plots for those variables diagnosed as responsible for flagging an out-of-control situation.
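A compact Python sketch of these contribution calculations is given below, assuming a PCA model with loading matrix P (columns p_i) and score variances s2 has already been fitted; the threshold and all names are illustrative.

import numpy as np

def t2_contributions(x, xbar, P, s2, threshold=2.5):
    """Variable contributions summed over high normalized scores
    (Eqs. 3.24, 3.26 and 3.27)."""
    t = P.T @ (x - xbar)                                     # scores t_i, Eq. 3.24
    high = np.abs(t / np.sqrt(s2)) > threshold               # scores flagged as high
    cont = (t[high] / s2[high])[:, None] * P.T[high] * (x - xbar)  # Eq. 3.26
    cont[np.sign(cont) != np.sign(t[high])[:, None]] = 0.0   # keep same-sign terms only
    return cont.sum(axis=0)                                  # CONT_j, Eq. 3.27

def spe_contributions(x, x_hat):
    """Contribution of each variable to the SPE at one time instant (Eq. 3.29)."""
    return (x - x_hat) ** 2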

When the number of variables is large, analyzing contribution plots and the corresponding variable plots to reason about the faulty condition may become tedious and challenging. This analysis can be automated and linked with real-time diagnosis [219, 304] by using knowledge-based systems.

3.5 Linear Methods for Diagnosis

Fault diagnosis determines the source cause(s) of abnormal process operation. The fault may be one of many that are already known from previous experience, or a new one. The fault diagnosis activity usually compares the performance of the process (trajectories of process variables) under the current fault to process behavior under various faults (fault signatures) to determine the current fault. A combination of statistical techniques and process knowledge should first be used to catalog process behaviors (fault signatures) from historical data. Pattern-matching methods for this activity have been proposed [270, 271, 273]. It is important to consider the effects of the data compression methods used for storing historical data when such data are used for pattern matching and cataloging of faults [272]. The identification of fault signatures for faults that have not been determined by plant personnel may necessitate unsupervised learning. This can be achieved by clustering (Section 3.5.1). Once data clusters associated with various faults have been determined, discrimination and classification are used for fault diagnosis [63, 79]. Two linear statistical techniques, discriminant analysis (Section 3.5.2) and Fisher's discriminant analysis (Section 3.5.3), are introduced to illustrate the strengths and limitations of these techniques. Neural networks have also been used for fault classification and diagnosis [252, 311, 312]. NN-based classification is useful when a small number of faults in a closed set are to be diagnosed, but for more complex cases with multiple faults or new faults NNs do not provide a reliable framework and they may converge to local optima during training. Support vector machines (SVMs) provide another nonlinear technique for event classification and fault diagnosis (Section 3.6.3).

3.5.1 Clustering

Searching the data for groupings (classes) according to some characteristics is an important exploratory process. Cluster analysis performs grouping (classification) on the basis of similarity measures [126]. Items and cases are usually clustered by indicating proximity using some measure of distance or angle. Variables are usually grouped on the basis of measures of association such as correlation coefficients.

The distance d(x, y) between two items x = [x_1 x_2 ... x_m]^T and y = [y_1 y_2 ... y_m]^T can be expressed as the Euclidean distance,

d(x, y) = sqrt( (x - y)^T (x - y) )   (3.31)

or the statistical distance (or Mahalanobis distance),

d(x, y) = sqrt( (x - y)^T S^{-1} (x - y) )   (3.32)

where S is the covariance matrix, or the Minkowski metric,

d(x, y) = [ Σ_{i=1}^{m} |x_i - y_i|^r ]^{1/r}   (3.33)

for some r ≥ 1. Other distance measures include the Canberra metric and the Czekanowski coefficient [126]. Clustering can be hierarchical, such as the grouping of species and subspecies in biology, or nonhierarchical, such as the grouping of items. For fault diagnosis, nonhierarchical clustering is used to group data into k clusters corresponding to k known faults.

k-means clustering is a popular nonhierarchical clustering method that assigns each item to the cluster having the nearest centroid (mean). It was proposed by MacQueen [178], and consists of the following steps (a minimal sketch follows the list):

1. Partitioning the items into k initial clusters or specifying k initial mean values as seed points.

2. Proceeding through the list of items by assigning each item to the cluster whose mean is nearest (using a distance measure, usually the Euclidean distance).

3. Recalculating the mean for the cluster receiving the new item and the cluster losing the item.

4. Repeating Steps 2 and 3 until no more reassignments take place.
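A bare-bones Python version of MacQueen's steps is sketched below (batch variant, Euclidean distance); it does not handle empty clusters or the time-series caveats discussed next, and all names are illustrative.

import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Basic k-means on X (n x m); returns cluster labels and centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # Step 1: seed points
    for _ in range(max_iter):
        # Step 2: assign each item to the nearest centroid (Euclidean distance)
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Step 3: recalculate the cluster means
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 4: stop when the assignments (and hence the means) no longer change
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids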

The traditional hierarchical and nonhierarchical (e.g., k-means) clustering algorithms [69] have a number of drawbacks that require caution in their implementation for time series data. The hierarchical clustering algorithms assume an implicit parent-child relationship between the members of a cluster, which may not be relevant for time series data. However, they can provide good initial estimates of patterns that may exist in the data set. The k-means algorithm requires an estimate of the number of clusters (i.e., k), and its solution depends on the initial assignments, as the optimization can get stuck in local minima. Furthermore, time series data are inherently autocorrelated, which violates the key assumption of independent data elements made by traditional clustering algorithms. Beaver and Palazoglu [14, 16] proposed an agglomerative k-means algorithm that overcomes these drawbacks and can also present the results in terms of a dendrogram, thus facilitating the selection of the final cluster solution depending on the desired level of resolution. The algorithm is referred to as k-PCA Models as it uses dynamic PCA as the prototype model for time series data. It is applied to data collected from the operation of a pilot plant that exhibits a cyclic dynamic response [15] and shows how the periods of faulty and normal operation can be distinguished from one another.

Displaying multivariate data in a low-dimensional space can be useful for visual clustering of items. For example, plotting the scores of the first few pairs of principal components as biplots of the first versus the second or the third principal component can cluster normal process operation and operation under various faults. Examples of biplots and their interpretation for fault diagnosis are presented in Chapter 7.

Pattern-matching methods to catalog process behaviors (fault signatures) from historical data have been proposed [271, 270, 273]. For high-dimensional data, distance measures may not be enough to describe the locations of specific clusters with respect to one another. Angle measures provide additional information [243, 154].

3.5.2 Discriminant Analysis

Statistical discrimination and classification separate distinct sets of objects (or events), and allocate new objects (or events) into previously defined groups of objects, respectively [126]. Discrimination uses discrimination criteria called discriminants for converting salient features of objects from several known populations into quantitative information that separates these populations as much as possible. Classification sorts new objects or events into previously labeled classes by using rules derived to optimally assign new objects to the labeled classes. A good classification procedure should yield few misclassifications. The probability of occurrence of an event may be greater if it belongs to a population that has a greater likelihood of occurrence. A good classification rule should take these 'prior probabilities of occurrence' into consideration and account for the costs associated with misclassification.

Consider a data set with g distinct events such as normal process operation and operation under g - 1 different faults. The operation type (class) is determined on the basis of m measured variables x = [x_1 x_2 ... x_m]^T that are random variables. Denote the classes by π_i, i = 1, ..., g, their prior probabilities by p_i, i = 1, ..., g, and their probability density functions by f_i(x). Assume that the f_i(x) are multivariate Normal density functions with population and sample means μ_i and x̄_i, respectively, and population and sample covariances Σ_i and S_i, respectively. The cost of misclassification is c(k|i), the cost of allocating an object to π_k (for k = 1, ..., g) when it belongs to π_i (for i = 1, ..., g). If R_k is the set of x's classified as π_k, the probability of classifying an event as π_k when it actually belongs to π_i is

P(k|i) = P(classifying event as π_k | π_i) = ∫_{R_k} f_i(x) dx,   i, k = 1, ..., g   (3.34)

with P(i|i) = 1 - Σ_{k=1, k≠i}^{g} P(k|i). The notation P(a|b) indicates the conditional probability of observing a, premised on the presence of b. The conditional expected cost of misclassification (ECM) of an event in π_1 to any other class is

ECM(π_1) = Σ_{k=2}^{g} P(k|1) c(k|1)   (3.35)

This conditional expected cost of misclassifying an event belonging to π_1 occurs with prior probability p_1 (the probability of π_1). The overall expected cost of misclassification is computed by multiplying each ECM(π_i) with its prior probability and summing over all classes:

ECM = p_1 ECM(π_1) + ... + p_g ECM(π_g)
    = p_1 Σ_{k=2}^{g} P(k|1) c(k|1) + p_2 Σ_{k=1, k≠2}^{g} P(k|2) c(k|2) + ... + p_g Σ_{k=1}^{g-1} P(k|g) c(k|g)
    = Σ_{i=1}^{g} p_i ( Σ_{k=1, k≠i}^{g} P(k|i) c(k|i) )   (3.36)

The determination of the optimal classification procedure becomes the selection of mutually exclusive and exhaustive classification regions R_1, R_2, ..., R_g such that the ECM in Eq. 3.36 is minimized [126]. The classification regions that minimize Eq. 3.36 are defined by allocating x to that population π_k, k = 1, ..., g, for which

Σ_{i=1, i≠k}^{g} p_i f_i(x) c(k|i)   (3.37)

is smallest [3, 126] for given prior probabilities, density functions, and misclassification costs (when they are not equal). If all misclassification costs are equal, the event indicated by data x will be assigned to that population π_k for which the sum Σ_{i=1, i≠k} p_i f_i(x) is smallest. Hence, the omitted term p_k f_k(x) is largest, and the minimum ECM rule for equal misclassification costs becomes [126]:

Allocate x to π_k if p_k f_k(x) > p_i f_i(x) for all i ≠ k.

This classification rule is identical to the rule that maximizes the 'posterior' probability P(π_k|x) (the probability that x comes from π_k given that x was observed), where

P(π_k|x) = p_k f_k(x) / Σ_{i=1}^{g} p_i f_i(x) = (prior x likelihood) / Σ[(prior) x (likelihood)]   (3.38)

with k = 1, ..., g. If the populations follow Normal distributions with mean vectors μ_i, covariance matrices Σ_i, and generalized variance |Σ_i| (the determinant of the covariance), f_i(x) is defined as

f_i(x) = (2π)^{-p/2} |Σ_i|^{-1/2} exp( -(1/2)(x - μ_i)^T Σ_i^{-1} (x - μ_i) )   (3.39)

for i = 1, ..., g. If all misclassification costs are equal, then x is allocated to π_k if

ln p_k f_k(x) = ln p_k - (p/2) ln(2π) - (1/2) ln|Σ_k| - (1/2)(x - μ_k)^T Σ_k^{-1} (x - μ_k) = max_i ln p_i f_i(x)   (3.40)

The constant (p/2) ln(2π) is the same for all populations and can be ignored in discriminant analysis. The quadratic discrimination score for the ith population, d_i^Q(x), is defined as [126]

d_i^Q(x) = -(1/2) ln|Σ_i| - (1/2)(x - μ_i)^T Σ_i^{-1} (x - μ_i) + ln p_i,   i = 1, ..., g   (3.41)

The generalized variance |Σ_i|, the prior probability p_i, and the Mahalanobis distance contribute to the quadratic score d_i^Q(x). Using the discriminant scores, the minimum total probability of misclassification rule for Normal populations and unequal covariance matrices becomes [126]:

Allocate x to π_k if d_k^Q(x) is the largest of all d_i^Q(x), i = 1, ..., g.

A simplification is possible if the population covariance matrices Σ_i are equal for all i. Then Σ_i = Σ and Eq. 3.41 reduces to

d_i(x) = ln p_i - (1/2) ln|Σ| - (1/2) x^T Σ^{-1} x + μ_i^T Σ^{-1} x - (1/2) μ_i^T Σ^{-1} μ_i   (3.42)

Since the second and third terms are independent of i, they are the same for all d_i^Q(x) and can be ignored in classification. Since the remaining terms consist of a constant for each i (ln p_i - (1/2) μ_i^T Σ^{-1} μ_i) and a linear combination of the components of x, a linear discriminant score is defined as

d_i(x) = μ_i^T Σ^{-1} x - (1/2) μ_i^T Σ^{-1} μ_i + ln p_i   (3.43)

In practice, the population means and covariances (μ_i and Σ_i) are unknown. Computations are based on historical data sets of classified observations, and the sample means (x̄_i) and covariance matrices (S_i) are used in Eq. 3.41. An estimate of d_i(x) can be computed based on the pooled estimate of Σ [126]:

d̂_i(x) = x̄_i^T S_pl^{-1} x - (1/2) x̄_i^T S_pl^{-1} x̄_i + ln p_i,   i = 1, ..., g   (3.44)

where

S_pl = [ (n_1 - 1) S_1 + ... + (n_g - 1) S_g ] / (n_1 + ... + n_g - g)   (3.45)

and n_g denotes the data length (number of observations) in class g. The minimum total probability of misclassification rule for Normal populations with equal covariance matrices becomes [126]:

Allocate x to π_k if d̂_k(x) is the largest of all d̂_i(x), i = 1, ..., g.
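A short Python sketch of the estimated linear discriminant scores (Eqs. 3.44-3.45) and the associated allocation rule follows; the data layout and names are illustrative assumptions.

import numpy as np

def linear_discriminant_scores(x, class_data, priors):
    """Estimated scores d_i(x) of Eq. 3.44 with the pooled covariance of Eq. 3.45.
    class_data: list of (n_i x m) arrays, one per class; priors: list of p_i."""
    n_total = sum(len(Xi) for Xi in class_data)
    g = len(class_data)
    S_pl = sum((len(Xi) - 1) * np.cov(Xi, rowvar=False) for Xi in class_data)
    S_pl /= (n_total - g)                           # pooled estimate, Eq. 3.45
    S_inv = np.linalg.inv(S_pl)
    scores = []
    for Xi, p in zip(class_data, priors):
        xbar = Xi.mean(axis=0)
        scores.append(xbar @ S_inv @ x - 0.5 * xbar @ S_inv @ xbar + np.log(p))
    return np.array(scores)     # allocate x to the class with the largest score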

3.5.3 Fisher's Discriminant Analysis

Fisher suggested transforming the multivariate observations x to another coordinate system that enhances the separation of the samples belonging to each class π_i [74]. Fisher's discriminant analysis (FDA) is optimal in terms of maximizing the separation among the set of classes. Suppose that there is a set of n (= n_1 + n_2 + ... + n_g) m-dimensional (number of process variables) samples x_1, ..., x_n belonging to classes π_i, i = 1, ..., g. The total scatter of the data points (S_T) consists of two types of scatter, the within-class scatter S_W and the between-class scatter S_B. The objective of the transformation proposed by Fisher is to maximize S_B while minimizing S_W. Fisher's approach does not require that the populations have Normal distributions, but it implicitly assumes that the population covariance matrices are equal, because a pooled estimate of the common covariance matrix (S_pl) is used (Eq. 3.45).

FDA for data belonging to two classes

The transformation is based on a weighted sum of the observations x. In the case of two classes, the linear combination of the samples (x) takes the values z_11, ..., z_1n1 for the observations from the first population π_1 and the values z_21, ..., z_2n2 for the observations from the second population π_2. Denote the weight vector that transforms x to z by w. FDA is illustrated for the case of two Normal populations with a common covariance matrix in Figure 3.3. First consider separation using either the x_1 or the x_2 axis. The distributions sketched along the abscissa and the ordinate indicate that several observations belonging to one class (π_1) are mixed with observations belonging to the other class (π_2). The linear discriminant function z = w^T x defines the line in the upper portion of Figure 3.3 onto which the observations are projected in order to maximize the ratio of the between-class scatter to the within-class scatter [63, 126].

Figure 3.3. Fisher's discriminant technique for two populations (g = 2), π_1 (*) and π_2 (o), with equal covariances. The upper panel shows the data points and their projections; the lower panel shows the probability density functions of the projections, with the classification boundary at 0.5(ȳ_1 + ȳ_2) separating the regions "classify as π_2" and "classify as π_1".

The separation of the two sets of z's can be assessed in terms of the difference between z̄_1 and z̄_2 expressed in standard deviation units,

separation = |z̄_1 - z̄_2| / s_z   (3.46)

where s_z^2 is the pooled estimate of the variance,

s_z^2 = [ Σ_{j=1}^{n_1} (z_1j - z̄_1)^2 + Σ_{j=1}^{n_2} (z_2j - z̄_2)^2 ] / (n_1 + n_2 - 2)   (3.47)

The linear combination that maximizes the separation is [126]

ẑ = ŵ^T x = (x̄_1 - x̄_2)^T S_pl^{-1} x   (3.48)

which maximizes the ratio

(w^T x̄_1 - w^T x̄_2)^2 / (w^T S_pl w) = (w^T d)^2 / (w^T S_pl w)   (3.49)

over all possible coefficient vectors w, where d = (x̄_1 - x̄_2). The maximum of the ratio in Eq. 3.49 is T^2 = (x̄_1 - x̄_2)^T S_pl^{-1} (x̄_1 - x̄_2) [126]. For two populations with equal covariances, FDA corresponds to the particular case of the minimum ECM rule discussed in Section 3.5.2. The first terms in Eqs. 3.43 and 3.44 are the linear function obtained by FDA that maximizes the univariate between-class scatter relative to the within-class scatter (Eq. 3.48) [126].

The allocation rule for a new observation x_0 to class π_1 or π_2 based on FDA is [126]:

Allocate x_0 to π_1 if

(x̄_1 - x̄_2)^T S_pl^{-1} x_0 ≥ (1/2)(x̄_1 - x̄_2)^T S_pl^{-1} (x̄_1 + x̄_2)   (3.50)

Allocate x_0 to π_2 otherwise.
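The two-class discriminant and allocation rule (Eqs. 3.48 and 3.50) reduce to a few lines of Python; the sketch below is illustrative and assumes the two classes are supplied as (n_i x m) sample arrays.

import numpy as np

def fisher_two_class(X1, X2):
    """Two-class Fisher discriminant: weight vector (Eq. 3.48) and the
    classification midpoint used in the allocation rule (Eq. 3.50)."""
    n1, n2 = len(X1), len(X2)
    x1bar, x2bar = X1.mean(axis=0), X2.mean(axis=0)
    S_pl = ((n1 - 1) * np.cov(X1, rowvar=False)
            + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    w = np.linalg.solve(S_pl, x1bar - x2bar)    # w_hat = S_pl^{-1}(x1bar - x2bar)
    midpoint = 0.5 * w @ (x1bar + x2bar)        # right-hand side of Eq. 3.50
    return w, midpoint

# Allocation rule (Eq. 3.50): assign x0 to pi_1 if w @ x0 >= midpoint, else to pi_2.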

Separation of Many Classes (g > 2)

The generalization of the within-class scatter matrix S_W for g classes is

S_W = Σ_{i=1}^{g} (n_i - 1) S_i   (3.51)

where n_i denotes the number of observations in class i and

S_i = (1/(n_i - 1)) Σ_{j=1}^{n_i} (x_ij - x̄_i)(x_ij - x̄_i)^T   (3.52)

represents the covariance matrix, with x̄_i the mean vector, for class i [63]. S_W / (n_1 + n_2 + ... + n_g - g) = S_pl is an estimate of Σ. The w that maximizes w^T S_B w / w^T S_pl w also maximizes w^T S_B w / w^T S_W w.

Define the between-class scatter matrix S_B and the total scatter matrix S_T as [63, 118]:

S_B = Σ_{i=1}^{g} n_i (x̄_i - x̄)(x̄_i - x̄)^T   (3.53)

S_T = Σ_{i=1}^{g} Σ_{j=1}^{n_i} (x_ij - x̄)(x_ij - x̄)^T   (3.54)

where x̄ is the total mean vector

x̄ = (1/n) Σ_{i=1}^{g} n_i x̄_i = (1/n) Σ_{i=1}^{g} Σ_{j=1}^{n_i} x_ij   (3.55)

and n = Σ_{i=1}^{g} n_i denotes the total number of observations in all classes. Equation 3.54 can be rewritten by adding -x̄_i + x̄_i to each term and rearranging the sums so that the total scatter is the sum of the within-class scatter and the between-class scatter [63]:

S_T = Σ_{i=1}^{g} Σ_{j=1}^{n_i} (x_ij - x̄_i + x̄_i - x̄)(x_ij - x̄_i + x̄_i - x̄)^T
    = Σ_{i=1}^{g} Σ_{j=1}^{n_i} (x_ij - x̄_i)(x_ij - x̄_i)^T + Σ_{i=1}^{g} n_i (x̄_i - x̄)(x̄_i - x̄)^T
    = S_W + S_B   (3.56)

The first FDA vector w_1, which maximizes the scatter between classes (S_B) while minimizing the scatter within classes (S_W), is obtained as

max_{w≠0} (w^T S_B w) / (w^T S_W w)   (3.57)

under the assumption that S_W is invertible [63, 36]. The second FDA vector is calculated to maximize the scatter between classes while minimizing the scatter within classes among all axes perpendicular to the first FDA vector (w_1). Additional FDA vectors are determined if necessary by using the same maximization objective and orthogonality constraint. These FDA vectors w_a form the columns of an optimal W; they are the generalized eigenvectors corresponding to the largest eigenvalues in

S_B w_a = λ_a S_W w_a   (3.58)

where the magnitude-ordered eigenvalues λ_a indicate the degree of overall separability among the classes obtained by linearly transforming the data onto w_a [63, 36]. The eigenvalues in Eq. 3.58 can be computed as the roots of the characteristic polynomial det(S_B - λ_a S_W) = 0, followed by solving (S_B - λ_a S_W) w_a = 0 directly for the eigenvectors w_a [63].

Classification with FDA

FDA is used to diagnose faults by modifying the quadratic discrimination score for the ith population defined in Eq. 3.41 within the FDA framework, such that

d_i^Q(x_0) = ln p_i - (1/2)(x_0 - x̄_i)^T W_a (W_a^T S_i W_a)^{-1} W_a^T (x_0 - x̄_i) - (1/2) ln[ det(W_a^T S_i W_a) ]   (3.59)

where W_a contains the first a FDA vectors [36]. The allocation rule is:

Allocate x_0 to π_k if d_k^Q(x_0) is the largest of all d_i^Q(x_0), i = 1, ..., g.

The classification rule is used in conjunction with Bayes' rule [126, 36], so that the posterior probability (Eq. 3.38) that the class membership of the observation x_0 is i is computed under the assumption Σ_{k=1}^{g} P(π_k|x) = 1. This assumption may lead to a situation where the observation is classified wrongly to one of the fault cases that were used to develop the FDA discriminant when an unknown fault occurs. Chiang et al. [36] proposed several screening procedures to detect unknown faults. One of them involves an FDA-related T^2-statistic computed before applying Eq. 3.59,

T_{i,a}^2 = (x_0 - x̄_i)^T W_a (W_a^T S_i W_a)^{-1} W_a^T (x_0 - x̄_i)   (3.60)

so that it can be used to determine whether the observation is associated with fault class i. The threshold for T_{i,a}^2 is defined as

T_{α,a}^2 = [ a(n - 1)(n + 1) / (n(n - a)) ] F_α(a, n - a)   (3.61)

where F_α(a, n - a) denotes the F distribution with a and n - a degrees of freedom [126]. Chiang et al. [36] introduced another class of data that are collected under NO to allow the class information in the known fault data to improve the ability to detect faults. The first step then becomes the detection of an out-of-control situation. A threshold for the NO class is developed based on Eq. 3.61 for detection; if T_{NO,a}^2 ≥ T_{α,a}^2, there is an out-of-control situation. One then proceeds with the calculation of thresholds for each class i using Eq. 3.61. If T_{i,a}^2 ≥ T_{α,a}^2 for all i = 1, ..., g, then the observation x_0 does not belong to any fault class i, and it is most likely associated with an unknown fault.

If T_{i,a}^2 ≤ T_{α,a}^2 for some fault class i, then x_0 belongs to a known fault class. Once this is determined, Fisher's discriminant score in Eq. 3.59 can be used to assign x_0 to the fault class π_i with the highest d_i^Q(x_0) of all d_i^Q(x_0), i = 1, ..., g.

FDA and PCA can also be combined to avoid assigning an unknown fault to one of the known fault classes [118, 36, 260]. PCA is widely used for fault detection, as discussed in Chapter 5. Chiang et al. [36] proposed two algorithms incorporating FDA and PCA. In the first algorithm (PCA/FDA), PCA is used to detect unknown faults and FDA to diagnose faults (by assigning them to fault classes). The NO class and the classes with fault conditions are used to develop the PCA model. When a new observation x_0 becomes available, the T_a^2 value is calculated based on PCA as

T_a^2 = x_0^T P_a Λ_a^{-1} P_a^T x_0   (3.62)

where Λ_a is the (a x a) diagonal matrix containing the eigenvalues and P_a holds the loading vectors. A set of threshold values based on NO and the known fault classes is calculated using Eq. 3.61. If T_a^2 ≤ T_{α,a}^2, it is concluded that this is a known class (either NO or faulty) and the FDA assignment rule is used to diagnose the fault class (or the NO class if it is in control).

The second combined algorithm (FDA/PCA) deploys FDA initially to determine the most probable fault class i. Then it uses the PCA T^2-statistic to find out if the observation x_0 is truly associated with fault class i.

3.6 Nonlinear Methods for Diagnosis

This section introduces artificial neural networks, kernel-based techniques and support vector machines to establish the basis of the monitoring techniques to be discussed in subsequent chapters.

3.6.1 Neural Networks

Artificial neural networks (ANNs) can be used for modeling nonlinear systems, classification and fault diagnosis. ANNs have been inspired by the way the human brain works as an information-processing system in a highly complex, nonlinear and massively parallel fashion. Other names for ANNs include parallel distributed processors, connectionist models (or networks), self-organizing systems, neuro-computing systems and neuromorphic systems. ANNs have a large number of highly interconnected nodes, also called processing elements or artificial neurons. The first computational model of a biological neuron, the binary threshold unit, was proposed by McCulloch and Pitts in 1943 [192]. Interest in ANNs was gradually revived in the 1980s when Rumelhart et al. [257] popularized a much faster learning procedure called back-propagation, which could train a multi-layer perceptron to compute any desired function.

ANNs are nonlinear 'black-box' systems. This nonlinearity is distributed throughout the network. ANNs have the ability to adapt, or learn, in response to variations in their environment through training. They can be retrained to deal with minor changes in the operational and/or environmental conditions. When operating in a non-stationary environment, ANNs can be designed to adjust their synaptic weights in real time. This is valuable in adaptive pattern classification and adaptive control. ANNs perform multivariable pattern recognition tasks very well. They can learn from examples (training) by constructing an input-output mapping for the system of interest. In the pattern classification case an ANN can be designed to provide information about similar and unusual patterns. Training and pattern recognition must be performed by using a closed set of patterns; all possible patterns to be recognized should be present in the data set. A properly designed and implemented ANN is usually capable of robust computation. Its performance degrades gracefully under adverse operating conditions and when some of its connections are severed. ANNs have some serious limitations as well. Training ANNs may take a long time when structurally complex ANNs or inappropriate optimization algorithms are used. ANNs may not produce reliable results if the size of the input-output data set is small. Their accuracy for modeling and classification improves when large amounts of historical data rich in variations are available. Training may cause the network to be accurate in some operating zones, but inaccurate in others. While trying to minimize the error during training, the optimization may get trapped in local minima. Like all data-based techniques, there is no guarantee of complete reliability or accuracy. In fault diagnosis applications, for example, ANNs may misdiagnose some faults 1% of the time, and other faults in the same domain 25% of the time. It is hard to determine a priori (when the back-propagation algorithm is used) which faults will be prone to higher levels of misdiagnosis. There are also practical problems related to training data set selection [152, 165].

The basic structure of ANNs typically includes multi-layered, interconnected neurons (or computational units) that nonlinearly relate input-output data. A nonlinear model of a neuron, which forms the core of ANNs, is characterized by three basic attributes (Figure 3.4):

1. A set of connections (synaptic weights) describing the amount of influence a node has on nodes in the next layer; a positive weight causes one unit to excite another, while a negative weight causes one unit to inhibit another. The signal x_j at the input synapse j connected to neuron k in Figure 3.4 is multiplied by the weight w_kj (see Eq. 3.63).

2. A summation operator for the input signals, weighted by the respective synapses of the neuron.

3. An activation function with limits on the amplitude of the output of a neuron. The amplitude range is usually given in a closed interval [0, 1] or [-1, 1]. The activation function φ(·) defines the output y_k of a neuron (see Eq. 3.65) in terms of the activation potential v_k (see Eq. 3.64). Typical activation functions include the unit step change and sigmoid functions.

Figure 3.4. A nonlinear model of a single neuron as illustrated in [107]: the inputs x_j, weighted by the synaptic weights w_kj (with a fixed input x_0 = +1 supplying the bias), are combined in a summing junction and passed through the activation function to produce the output y_k.

A neuron k can be described by the following set of equations [107]:

u_k = Σ_{j=0}^{m} w_kj x_j   (3.63)

v_k = u_k + b_k   (3.64)

and

y_k = φ(v_k)   (3.65)

where x_1, x_2, ..., x_j, ..., x_m are the input signals; w_k1, w_k2, ..., w_kj, ..., w_km are the synaptic weights of neuron k; u_k is the linear combiner output of the input signals; b_k is the bias; v_k is the activation potential (or induced local field); φ(·) is the activation function; and y_k is the output signal of the neuron. The bias is an external parameter providing an affine transformation to the output u_k of the linear combiner.

Several activation functions are used as appropriate to the task at hand:

1. Threshold Function. Also known as the McCulloch-Pitts model [192]:

φ(v) = 1 if v ≥ 0;  0 if v < 0   (3.66)

2. Piecewise-linear Function.

φ(v) = 1 if v ≥ +1/2;  v if +1/2 > v > -1/2;  0 if v ≤ -1/2   (3.67)

where the amplification factor inside the linear region of operation is assumed to be unity.

3. Sigmoid Function. This S-shaped function is by far the most common form of activation function used. A typical expression is

φ(v) = 1 / (1 + e^{-av})   (3.68)

where a is the slope parameter.

4. Hyperbolic Tangent Function. This is a form of sigmoid function, but it produces values in the range [-1, +1] instead of [0, 1]:

φ(v) = tanh(v) = (e^v - e^{-v}) / (e^v + e^{-v})   (3.69)

Processing units (neurons) are linked to each other to form a network associated with a learning algorithm. A neural network can be formed with any kind of topology (architecture). In general, three kinds of network topologies are used [107]:

• Single-layer feedforward networks include an input layer of source nodes that projects onto an output layer of neurons (computation nodes), but not vice versa. They are also called feedforward networks. Since the computation takes place only on the output layer nodes, the input layer does not count as a layer (Figure 3.5(a)).

• Multi-layer feedforward networks contain an input layer connected to one or more layers of hidden neurons (hidden units) and an output layer (Figure 3.5(b)). The hidden units internally transform the data representation to extract higher-order statistics. The input signals are applied to the neurons in the first hidden layer, the output signals of that layer are used as inputs to the next layer, and so on for the rest of the network. The output signals of the neurons in the output layer reflect the overall response of the network to the activation pattern supplied by the source nodes in the input layer. This type of network is especially useful for pattern association (i.e., mapping input vectors to output vectors).

• Recurrent networks differ from feedforward networks in that they have at least one feedback loop. An example of this type of network is given in Figure 3.5(c), which is one of the earliest recurrent networks, called the Jordan network [131]. The activation values of the output units are fed back into the input layer through a set of extra units called the state units. Learning takes place in the connections between input and hidden units as well as between hidden and output units. Recurrent networks are useful for pattern sequencing (i.e., following the sequences of the network activation over time). The presence of feedback loops has a profound impact on the learning capability of the network and on its performance [107]. Applications to chemical process modeling and identification have been reported [32, 310, 345].

Techniques for network architecture selection for feedforward networks have been proposed [151, 166, 234, 317, 318]. Once the network architecture is specified, an input-output data set is used to train the network. This involves the computation of appropriate values for the weights associated with each interconnection. The data are propagated forward through the network to generate an output to be compared with the actual output, and based on the error magnitudes the weights are adjusted to minimize the error. The overall procedure of training can be seen as learning by the network from its environment through an interactive process of adjustments applied to its weights and bias levels. A number of learning rules, such as error-correction, memory-based, Hebbian, competitive and Boltzmann learning, have been proposed [107] to adjust network weights.

There are two learning paradigms that determine how a network relates to its environment. In supervised learning (learning with a teacher), a teacher provides output targets for each input pattern and corrects the network's errors explicitly. The teacher has knowledge of the environment (in the form of a historical set of input-output data), so that the neural network is provided with the desired response when a training vector is available. The desired response represents the optimum action to be performed to adjust the neural network weights under the influence of the training vector and the error signal. The error signal is the difference between the desired response (historical value) and the actual response (computed value) of the network. This corrective algorithm is repeated iteratively until a preset convergence criterion is reached. One of the most widely used supervised training algorithms is error back-propagation, or the generalized delta rule [257, 321]. The alternative is learning without a teacher, in which the network must find the regularities in the training data by itself. This paradigm has two subgroups: reinforcement learning and unsupervised learning. In reinforcement learning/neurodynamic programming, learning the relationship between inputs and outputs is performed through continued interaction with the environment to minimize a scalar index of performance [19]. In unsupervised learning, or self-organized learning, there is no external teacher to oversee the learning process. Once the network is tuned to the statistical regularities of the input data, it forms internal presentations for encoding the input automatically [17, 107].

There are many educational and commercial software packages available for the development and deployment of ANNs. Some of these packages, such as Gensym's NeurOn-Line® Studio, include data preprocessing modules to filter or scale data and eliminate outliers [89].
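As an illustration of the supervised, error-corrective training described above, the Python sketch below trains a one-hidden-layer sigmoid network by plain gradient descent (error back-propagation); biases and many practical refinements are omitted, and all names and defaults are assumptions.

import numpy as np

def train_mlp(X, Y, n_hidden=5, lr=0.1, epochs=2000, seed=0):
    """Minimal error back-propagation for a one-hidden-layer sigmoid network."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))
    W2 = rng.normal(scale=0.5, size=(n_hidden, Y.shape[1]))
    sig = lambda v: 1.0 / (1.0 + np.exp(-v))
    for _ in range(epochs):
        H = sig(X @ W1)                  # forward pass: hidden layer
        Y_hat = sig(H @ W2)              # forward pass: network output
        E = Y - Y_hat                    # error signal (desired - actual)
        d2 = E * Y_hat * (1 - Y_hat)     # output-layer delta
        d1 = (d2 @ W2.T) * H * (1 - H)   # delta propagated back to hidden layer
        W2 += lr * H.T @ d2              # weight adjustments driven by
        W1 += lr * X.T @ d1              # the back-propagated error
    return W1, W2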

Autoassociative Neural Networks

Autoassociative neural networks provide a special five-layer network structure (Figure 3.6) that can implement nonlinear PCA (NLPCA) by reducing variable dimensionality and producing a feature space map that retains the maximum possible amount of information from the original data set [150]. Autoassociative neural networks use conventional feedforward connections and sigmoidal or linear nodal transfer functions.

The network has three hidden layers, including a 'bottleneck' layer which is of a smaller dimension than either the input layer or the output layer. The network is trained to perform an identity mapping by approximating the input information at the output layer. Since there are fewer nodes in the bottleneck layer than in the input or output layers, the bottleneck nodes implement data compression and encode the essential information in the inputs for reconstruction in subsequent layers. In the NLPCA framework and terminology, autoassociative neural networks seek to provide a mapping of the form

T = G(X)   (3.70)

where G is a nonlinear vector function composed of f individual nonlinear functions G = [G_1 G_2 ... G_f], analogous to the loading vectors P. The inverse transformation that reconstructs the input information and restores the original dimensionality of the data is implemented by a second nonlinear vector function H = [H_1 H_2 ... H_m], where m is the number of variables:

X̂_j = H_j(T) and E = X - X̂   (3.71)

where the residual E indicates the loss of information, which is minimized by the selection of the functions G and H. The number of bottleneck nodes is similar to the number of principal components retained for the selection of the subspace dimension that retains the relevant information in the data.

The limitations of autoassociative NNs for implementing NLPCA are discussed by Malthouse [183]. Their use in process monitoring problems is reported in [55, 61].

3.6.2 Kernel-Based Techniques

If one does not wish to bias the boundaries of the NO region of a system, kernel density estimation (KDE) can be used to find the contours underneath the joint probability density of the PC pair, starting from the one that captures most of the information. Below, a brief review of KDE is presented first; it will be used as part of the robust monitoring technique discussed in Section 7.7. Then, the use of kernel-based methods for formulating nonlinear Fisher's discriminant analysis (FDA) is discussed.

A kernel is a function K such that for all u, v ∈ X

K(u, v) = ⟨φ(u) · φ(v)⟩   (3.72)

where φ is a mapping from the input space X to an (inner product) feature space F and ⟨·,·⟩ denotes the inner product. Some of the popular kernels are:

• Linear support vector machines: K(u, v) = v^T u

• Nonlinear support vector machines: K(u, v) = (τ + v^T u)^d

• Radial basis function kernel: K(u, v) = exp(-||u - v||_2^2 / σ^2)

• Multi-layer perceptron: K(u, v) = tanh(κ_1 v^T u + κ_2)

Kernels are symmetric functions. Mercer's theorem provides a characterization of kernels: a symmetric function K(u, v) is a kernel function if and only if the matrix

K = (K(u_i, v_j))_{i,j=1}^{n}   (3.73)

is positive semi-definite (has nonnegative eigenvalues) [42]. Mercer's condition holds for all σ values in the radial basis function kernels and positive values of τ in polynomial kernels, but not for all positive choices of κ_1 and κ_2 in multi-layer perceptron kernels [287].

Kernel Density Estimation

The density function f of a random quantity x gives a natural description of the distribution of a data set x and allows probabilities (P) associated with x to be found as follows [268]:

P{a < x < b} = ∫_a^b f(x) dx   for all a < b   (3.74)

A set of observed data points is assumed to be available as samples from an unknown probability density function. Density estimation is the construction of an estimate of the density function from the observed data. In parametric approaches, one assumes that the data belong to one of a known family of distributions and the required function parameters are estimated. This approach becomes inadequate when one wants to approximate a multi-modal function, or for cases where the process variables exhibit nonlinear correlations [127]. Moreover, for most processes, the underlying distribution of the data is not known and most likely does not follow a particular class of density function. Therefore, one has to estimate the density function using a nonparametric (unstructured) approach.

The histogram is perhaps the most common yet the simplest density estimator. While its computational ease and graphical representation are a benefit for univariate signals, the visualization of higher-dimensional data becomes problematic. To construct the histogram, one has to choose an origin and a bin width. The selection of these parameters determines the degree of smoothing inherent in the procedure. An alternative density estimate is the naive estimator, which can be unsatisfactory as its bin width still needs to be established to produce a density estimate. Despite the simplicity of the histogram and naive estimates, their discontinuous representation of the density function causes difficulty if the derivatives of the estimate or a smooth representation of the estimate are required [268]. Thus, a kernel or a wavelet density estimation method may be preferred [265, 262].

The kernel estimate with kernel K is defined by

f̂(x) = (1/(nh)) Σ_{i=1}^{n} K( (x - X_i) / h )   (3.75)

Here, h denotes the window width, also referred to as the smoothing parameter. The quality of a density estimate is primarily determined by the choice of the parameter h, and only secondarily by the choice of the kernel K [265, 21]. For applications, the kernel K is often selected as a symmetric probability density function, e.g., the Normal density.

To decide how much to smooth is a critical step in density estimation. A number of alternative measures exist to estimate h [265]. The appropriate choice is, in fact, influenced by the purpose for which the density estimate is to be used. For robust monitoring (see Section 7.7), h is selected using least squares cross-validation [256, 21].

Kernel-Based FDA

One way to extend discriminant analysis to nonlinear cases is to use kernel-based FDA. Kernel methods embed data into a feature vector space and detect linear relationships in that space. The linear relations include regression, classification and principal components. If the feature space is selected 'properly', pattern recognition can be easy. Kernel-based algorithms are structured as two modules: a kernel function that implements the embedding of the data into the feature space, and a learning algorithm that learns in the linear feature space. Kernel methods exploit information about the inner products between data items. The inner products in feature space can be very complex, but if the kernel is given, there is no need to specify what features of the data are being used. Usually the kernel function type, such as polynomials, radial basis functions or splines, is selected in advance according to the nature of the application, and its parameters are computed for the specific problem information. Mercer's theorem is used to characterize whether a symmetric function is a kernel. A Bayesian framework has been developed for SVM classifiers, Gaussian processes and kernel FDA [306].

3.6.3 Support Vector Machines

When linear classification tools do not provide reliable fault diagnosis, nonlinear techniques are needed. Neural-network-based classification has been implemented for over a decade for cases where a small number of faults in a closed set are to be diagnosed [252, 63]. A shortcoming of NN-based fault diagnosis is the possibility of converging to local optima during training. Support vector machines (SVMs) with kernel-based learning methods provide another powerful alternative. SVMs are learning systems based on statistical learning theory [308] that use a space of linear functions in a high-dimensional feature space F for classification problems. Support vectors are representative training data points that provide the best hyperplanes for separating the various classes in the data. The aim of support vector (SV) classification is to devise a computationally efficient way of learning 'good' separating hyperplanes in the feature space [42].

To learn nonlinear relations with a linear machine, a set of nonlinear features is selected and the data are 'rewritten' in the new representation. This is achieved by applying a fixed nonlinear mapping of the data to a feature space where the linear machine can be used. The set of hypotheses considered are functions of the form

f(x) = Σ_{i=1}^{g} w_i φ_i(x) + b   (3.76)

where the w_i are weights, g is the dimension of the feature space, b is the bias, and φ : X → F is a nonlinear map from the input space X to some feature space F [42]. The nonlinear machine can thus be constructed in two steps: (1) a fixed nonlinear mapping transforms the data into the feature space F, and (2) a linear machine is used to classify them in the feature space.

Consider a system with k pattern classes. The general pattern recognition problem with k classes is to construct a decision function given l independent and identically distributed samples of an unknown function, (x_1, y_1), ..., (x_l, y_l), where x_i (in the attribute space X) is of length d and y_i is of length k. The decision function f(x, α) is chosen from a set of functions selected a priori and is defined by the parameter α for the problem at hand. To select α, the loss L(y, f(x, α)) is minimized. For example, for the binary pattern recognition problem (k = 2), a hyperplane is constructed to separate the two classes labeled y ∈ {-1, 1} so that the distance between the hyperplane and the nearest point (the margin) is maximized. This yields the following optimization problem:

min J_p(w, ξ) = (1/2)⟨w, w⟩ + C Σ_{i=1}^{l} ξ_i
such that y_i(⟨w · x_i⟩ + b) ≥ 1 - ξ_i,  i = 1, ..., l
ξ_i ≥ 0,  i = 1, ..., l   (3.77)

where w is the weight vector and the ξ_i are the slack variables. The regularization parameter C adjusts the trade-off between the two terms of the objective function J_p in Eq. 3.77. The first term represents model complexity and the second term model accuracy, which is related to the classification error in the training data. For small values of C, the model does not have enough detail to describe the data; large values of C cause over-fitting. Methods for selecting optimal values of C have been developed by taking into account the kernel function used, the noise level and the characteristics of the feature space [130].

Linear learning machines can be expressed in a dual representation, enabling expression of the hypotheses as a linear combination of the training points (x_i), so that the decision rule can be evaluated by using just the inner products between the test point (x) and the training points:

f(x) = Σ_{i=1}^{l} α_i y_i ⟨φ(x_i) · φ(x)⟩ + b   (3.78)
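In practice the optimization of Eq. 3.77 (or the dual formulation that follows) is rarely coded by hand. The Python sketch below uses scikit-learn's SVC with a radial basis function kernel on synthetic two-class data to show the role of the regularization parameter C; all data and settings are illustrative assumptions.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 0.5, size=(50, 2)),   # class -1
               rng.normal(+1.0, 0.5, size=(50, 2))])  # class +1
y = np.r_[-np.ones(50), np.ones(50)]

clf = SVC(kernel="rbf", C=10.0, gamma=1.0)  # C trades model complexity vs. training error
clf.fit(X, y)
print("support vectors per class:", clf.n_support_)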

Kernels are used to compute the inner product ⟨φ(x_i) · φ(x)⟩ directly in the feature space as a function of the input points, merging the two steps of the nonlinear learning machine.

The dual solution to this problem is:

max J_D(α) = Σ_{i=1}^{l} α_i - (1/2) Σ_{i,j=1}^{l} y_i y_j α_i α_j K(x_i, x_j)
such that Σ_{i=1}^{l} α_i y_i = 0,  0 ≤ α_i ≤ C,  i = 1, ..., l   (3.79)

where φ denotes a mapping φ : X → F and the kernel function K(u, v) = ⟨φ(u) · φ(v)⟩ has been introduced instead of the linear relationship in Eq. 3.77. The resulting decision function is

f(x) = sgn( Σ_{i=1}^{l} α_i y_i K(x_i, x) + b )   (3.80)

In both the dual solution and the decision function, only the inner product in the attribute space and the kernel function based on the attributes appear, but not the elements of the very high dimensional feature space. The constraints in the dual solution imply that only the attributes closest to the hyperplane, the so-called SVs, are involved in the expressions for the weights w. Data points that are not SVs have no influence, and slight variations in them (for example, caused by noise) will not affect the solution. ξ_i provides a more quantitative leverage against noise in data that may prevent linear separation in feature space [42]. Imposing the requirement that the kernel satisfies Mercer's conditions ((K(x_i, x_j))_{i,j=1}^{l} must be positive semi-definite) means that the matrix (y_i y_j K(x_i, x_j))_{i,j=1}^{l} is also positive semi-definite. Consequently, the optimization in Eq. 3.79 is convex and has a unique solution that can be found efficiently, ruling out the problem of local minima encountered in training neural networks [42].

The k-class pattern recognition problem with SVMs was initially solved by using one-against-the-rest and one-against-one classifiers. Recently, k-class SVMs have been proposed [324]. The optimization problem in Eq. 3.79 is generalized to yield a decision function of the form

(3.81)

where the inner product ⟨x_i · x⟩ can be replaced with the kernel function K(x_i, x_j).

3.7 Summary

Various statistical methods that provide the foundations for model development, process monitoring and fault diagnosis are presented in this chapter. Linear techniques such as principal components analysis, partial least squares, canonical variates analysis and independent components analysis enable the development of powerful multivariate techniques for the detection of abnormal process operation. Once an abnormality is detected and validated, its source cause must be determined. One approach is the use of contribution plots, which indicate the process variables that have made large contributions to significant changes in the monitoring statistics. When these variables are identified, process knowledge is used to pin down the source cause of the abnormality. The other alternative is the use of statistical classification methods such as Fisher's discriminant analysis for the diagnosis of source causes directly. Both detection and diagnosis techniques can be developed using a nonlinear approach. Nonlinear methods such as neural networks, kernel density estimation and support vector machines are introduced in the last section to provide insight into the deployment of such monitoring and diagnosis tools.

Figure 3.5. Three fundamentally different network architectures: (a) single-layer feedforward network, with an input layer of source nodes and an output layer of neurons; (b) multi-layer feedforward network, with an input layer of source nodes, a layer of hidden neurons and a layer of output neurons; (c) recurrent network as in [131].

Figure 3.6. Network architecture for determination of f nonlinear factors using an autoassociative neural network, with input, mapping, bottleneck, de-mapping and output layers implementing the G and H functions. σ indicates nodes with sigmoidal functions, * indicates nodes with sigmoidal or linear functions [150].


4

Empirical Model
Development

Process models may be developed by using either first principles such as


material and energy balances, or process input and output information (da-
ta). First principles (fundamental) models describe the internal dynamics
of the process based on physical, chemical or biological laws, and explain
the behavior of the process. But the cost of model development is high.
They may be biased by the views and speculations of the model developer,
and are limited by the lack of information about specific model parameters.
Often, some physical, chemical or transport parameters are computed
using empirical relations, or they are derived from experimental data. In
either case, there is some uncertainty about their accuracy. As details are
added to the model, it may become too complex and too large to run model
computations on the computer within an acceptable amount of time. Fun-
damental models developed may be too large for computations that are fast
enough to be used in process monitoring and control activities. These activ-
ities require fast update of model predictions so that regulation of process
operation can be made in a timely manner.
The alternative model development paradigm is based on developing
relations based on process data. Input-output models are much less expen-
sive to develop. However, they only describe the relationships between the
process inputs and outputs, and their utility is limited to features that are
included in the available data sets. There are numerous well-established
techniques for linear input-output model development. Methods for devel-
opment of linear models are easier to implement and more popular. Since
most monitoring and control techniques are based on the linear framework,
use of linear models is a natural choice. The design of experiments to collect
data and the amount of data available have an impact on the accuracy and
predictive capability of the model developed. Data collection experiments
should be designed such that all key features of the process are excited in


the frequency ranges of interest. These models can be used for interpolation but they should not be used for extrapolation.

Nonlinear empirical models are more accurate over a wider range of operating conditions and they are more appealing for processes with strong nonlinearities. Various nonlinear input-output model development techniques have been proposed during the last fifty years, but they have not been widely accepted. The model structures are dependent on the type of nonlinearities in the data. Since the model may have terms that are composed of combinations of inputs and/or outputs, exciting and capturing the interactions among variables is crucial. Hence, the use of routine operational data for model development, without any consideration of exciting the key features of the model, may yield good fits to the data, but provide models that have poor predictive ability. The amount of data needed for model development is the smallest for first-principles models, moderate for linear input-output models, and the largest for nonlinear input-output models.

As manufacturing processes have become increasingly instrumented in recent years, more variables are being measured and data are being recorded more frequently. This yields data overload, and most of the useful information may be hidden in large data sets. The correlated or redundant information in these process measurements must be refined to retain the essential information about the process. Process knowledge must be extracted from measurement information, and presented in a form that is easy to display and interpret. Various methods based on multivariate statistics, systems theory and artificial intelligence are presented in this chapter for data-based input-output model development.

Models are developed to satisfy different types of objectives. One case is the interpretation and modeling of one block of data such as measurements of process variables. Principal components analysis (PCA) may be useful for this to retain essential process information while reducing the size of the data set. A second case is the development of a relationship between two groups of data such as process variables and product variables, i.e., the regression problem. PCA regression or partial least squares (PLS) regression techniques would be good candidates for addressing this problem. Discrimination and classification are activities also related to process monitoring that lead to fault diagnosis. One can consider PCA- and PLS-based techniques as well as artificial neural networks (ANN) and knowledge-based systems for such problems. Since all these techniques are based on process data, the reliability of data is critical for obtaining dependable results from the implementation of these techniques.

ANNs (Section 3.6.1) provide one framework for nonlinear model development. Extensions of PCA and PLS to develop nonlinear models have also been proposed. Several nonlinear time series modeling techniques have been reported. Nonlinear system science methods provide a different framework for nonlinear model development and model reduction. This chapter focuses on linear data-based modeling techniques. References are provided for their extensions to the nonlinear framework.

Various multivariate regression techniques are outlined in Section 4.1. Section 4.2 introduces PCA-based regression and its extension to capture dynamic variations in data. PLS regression is discussed in Section 4.3. Input-output modeling of dynamic processes with time series models is introduced in Section 4.4 and state-space modeling techniques are presented in Section 4.5.

4.1 Regression Models

Models between groups of variables such as process measurements x (m x 1) and quality variables y (q x 1) can be developed by using various regression techniques. Here, the subscripts indicate the vector dimensions (number of variables). If n samples have been collected for each group of variables, the data matrices are X (n x m) and Y (n x q). The existence of a model provides the opportunity to predict process or product variables and compare the measured and predicted values. The residuals between the predicted and measured values of the variables can be used to develop various SPM techniques (residuals-based univariate SPM was discussed in Section 2.3.1) and tools for identification of variables that have contributed to the out-of-control signal.

Consider a process with two measured variables (m = 2) and one quality variable (q = 1) that are related by a linear model. The term linear is used to indicate that the equation that relates the regressors x = [x_1 x_2]^T to the response (dependent) variable y_1 is a linear function of the equation parameters β. The model equation that can be used for predicting new values of y_1 given values of x is

    ŷ_1 = β_0 + β_1 x_1 + β_2 x_2                                  (4.1)

where β_0 is the constant (intercept) term. The relationship may be more complex: interactions of the variables (x_1 x_2) or polynomial terms of the regressors (for example, x_1^2 or x_2^2) can also be included in the model. For example, a second-order model with interaction for the same variables as above is:

    ŷ_1 = β_0 + β_1 x_1 + β_2 x_2 + β_11 x_1^2 + β_22 x_2^2 + β_12 x_1 x_2    (4.2)

The interaction implies that the effect caused by changing one regressor variable depends on the level of the other regressor variable in the term.
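Both Eq. 4.1 and Eq. 4.2 are linear in the parameters β, so they can be fitted by ordinary least squares once the quadratic and interaction terms are appended to the regressor matrix. The following sketch (in Python with NumPy; the data are synthetic stand-ins for measured process and quality variables) illustrates the idea:

```python
import numpy as np

# Hypothetical data: two regressors and one quality variable.
rng = np.random.default_rng(0)
x1 = rng.uniform(-1.0, 1.0, 50)
x2 = rng.uniform(-1.0, 1.0, 50)
y1 = 1.0 + 2.0*x1 - 0.5*x2 + 0.8*x1*x2 + rng.normal(0.0, 0.1, 50)

# Regressor matrix for Eq. 4.2: constant, x1, x2, x1^2, x2^2, x1*x2.
Z = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1*x2])
beta, *_ = np.linalg.lstsq(Z, y1, rcond=None)  # least squares estimates of beta
y1_hat = Z @ beta                              # predictions from the fitted model
```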

The response surfaces of models with such nonlinearities are no longer linear, but as long as the equation is linear in the regression coefficients β, it is considered a linear regression model.

The model equation for multivariable linear regression can be generalized and written in compact form by using matrices. For each of the q dependent variables y, with n data sets,

    y_(n x 1) = Z_(n x (m+1)) β_((m+1) x 1) + E_(n x 1)            (4.3)

where E is the random error which accounts for measurement error and effects of other variables not explicitly considered in the model, and

    Z = [ 1  z_11  z_12  ...  z_1m ]
        [ 1  z_21  z_22  ...  z_2m ]                               (4.4)
        [ :   :     :          :   ]
        [ 1  z_n1  z_n2  ...  z_nm ]

with the first column of Z being a multiplier of the constant term in β. It is assumed that E(E) = 0 and Var(E) = σ^2 I. The value of the dependent variable for a new observation at z_0 is ŷ = z_0 β.

The equations for the various dependent variables y can be developed separately since it is assumed that they do not affect each other. The equations can be expressed in compact form as

    Y_(n x q) = Z_(n x (m+1)) β_((m+1) x q) + E_(n x q)            (4.5)

with

    β = [β_(1) β_(2) ... β_(q)],   E = [ε_(1) ε_(2) ... ε_(q)]     (4.6)

and the covariance matrix Σ = [σ_ij]; the observations from different trials are uncorrelated [126]. Multivariable linear regression is usually used for steady-state data, but by adding lagged values of variables one can extend it to time-varying data. However, time series models and state-space models are more useful for developing dynamic system models.

Colinearity among process variables can have a significant impact on the accuracy of the multivariable regression model and predictions. Colinearity causes numerical difficulties in computing the inverses (X^T X)^(-1) or (Z^T Z)^(-1) because some columns of X are almost identical and consequently the determinant is almost zero. This causes uncertainty and sensitivity in the estimates of β. The standard errors of the estimates of the regression coefficients β associated with the colinear regressors become very large. Colinearity can be detected by standardizing all predictor variables (mean-centered, unit-variance) and computing correlations and coefficients of determination:

    x̃_ij = (x_ij - x̄_j)/d_j,   d_j^2 = Σ_(i=1..n) (x_ij - x̄_j)^2,   i = 1,...,n,  j = 1,...,m    (4.7)

There is a significant degree of colinearity among some predictor variables if the following conditions hold:

1. The correlation between any two predictors exceeds 0.95 (only colinearity between two predictors can be assessed).

2. The coefficient of determination R_j^2 of each predictor variable j regressed on all the other predictor variables exceeds 0.90, or the variance inflation factor VIF_j = (1 - R_j^2)^(-1) exceeds 10 (variable j is colinear with one or more of the other predictors). VIF_j is the (j,j)th diagonal element of the matrix (X̃^T X̃)^(-1) where X̃ = [x̃_ij]; R_j^2 can be computed from the relationship between R_j^2 and VIF_j.

3. Some of the eigenvalues of the correlation matrix X̃^T X̃ are less than 0.05. Large elements of the corresponding eigenvectors identify the predictor variables involved in the colinearity.

4. The determinant of X̃^T X̃ has a value between 0 and 1. In this case, the smaller the value of the determinant, the higher the degree of colinearity.

5. One or more eigenvalues of X̃^T X̃ having values near 0 implies the presence of colinearity.

Regression techniques that can deal with colinear data include stepwise regression, ridge regression, principal components regression, and partial least squares (PLS) regression. The last two approaches are discussed in Sections 4.2 and 4.3.

Stepwise regression
Stepwise regression is one of the early techniques that can deal with colinear data [108, 203]. Predictor variables are added to or deleted from the prediction (regression) equation one at a time. Stepwise variable selection procedures are useful when a large number of candidate predictors is available. It is expected that only one of the strongly colinear variables will be included in the model. Major disadvantages of stepwise regression are the limitations in identifying alternative candidate subsets of predictors, and the inability to guarantee the optimality of the final model.
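A minimal sketch of the colinearity diagnostics listed above, assuming a data matrix X with observations in rows; the standardization follows Eq. 4.7, so that X̃^T X̃ is the correlation matrix of the predictors:

```python
import numpy as np

def colinearity_diagnostics(X):
    """Return VIFs, eigenvalues, and determinant of the correlation matrix,
    three of the colinearity indicators listed above."""
    d = np.sqrt(((X - X.mean(axis=0))**2).sum(axis=0))   # d_j of Eq. 4.7
    Xs = (X - X.mean(axis=0)) / d                        # standardized predictors
    R = Xs.T @ Xs                                        # correlation matrix
    vif = np.diag(np.linalg.inv(R))                      # VIF_j = (1 - R_j^2)^(-1)
    return vif, np.linalg.eigvalsh(R), np.linalg.det(R)
```

VIF values above 10, eigenvalues near zero, or a determinant close to zero correspond to the colinearity conditions listed above.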

Ridge Regression
The regression coefficients are biased by introducing a parameter along the diagonal of Z^T Z [109]. The computation of the regression coefficients β in Eq. 4.3 is modified by introducing a ridge parameter k:

    β_R = (Z^T Z + kI)^(-1) Z^T y                                  (4.8)

Standardized ridge estimates β̂_j with j = 1,...,m are calculated for a range of values of k and plotted versus k. This plot is called a ridge trace. The β estimates usually change dramatically when k is initially incremented by a small amount from 0. As k is increased, the trace stabilizes. A k value that stabilizes all β coefficients is selected and the final values of β are estimated.

A good estimate of the k value is obtained using

    k = m MSE / (β̂^T β̂)                                           (4.9)

where β̂ are the least squares estimates for the standardized predictor variables, and MSE is the least squares mean squared error, SSE/(n - m - 1).

Ridge regression estimators are biased. The trade-off for stabilization and variance reduction in the regression coefficient estimators is the bias in the estimators and the increase in the squared error.

Nonlinear regression models are generated when one or more of the coefficients β appear in a nonlinear term [12] such as

    y = β_0 + β_1 e^(β_2 x) + ε                                    (4.10)

Chemical reaction rate terms are a familiar example to most chemists and chemical engineers. Sometimes it is possible to make the equation linear by using a transformation such as taking the log. Otherwise, the computation of the regression parameters becomes more complex.

4.2 PCA Models

Principal components regression (PCR) is one of the techniques to deal with ill-conditioned data matrices. It regresses the dependent variables, such as quality measurements, on the principal components scores of the regressor variables, such as the measured variables (flow rates, temperatures) of the process. The implementation starts by representing the data matrix X with its scores matrix T using the transformation T = XP. The number of principal components to retain in the model is determined as in PCA, to reduce the effect of noise and to optimize the predictive power of the PCR model. This is generally done by using cross-validation. Then, the regression equation becomes

    y = Tβ + E                                                     (4.11)

where the optimum matrix of regression coefficients β is obtained as

    β = (T^T T)^(-1) T^T y                                         (4.12)

In contrast to the inversion of X^T X when some of the x are colinear, the inversion of T^T T does not cause any problems due to the mutual orthogonality of the scores. Score vectors corresponding to small eigenvalues can be left out in order to avoid colinearity problems. Since principal components regression is a two-step method, there is a risk that useful predictive information would be discarded with a principal component that is excluded. Hence, caution must be exercised when leaving out vectors corresponding to small eigenvalues. If regression based on the original variables x is preferred, the most important variables can be selected by inspecting the variables that contribute to the first few loadings and avoiding those that provide duplicate information.

To include information about process dynamics, lagged variables can be included in X. The (auto)correlograms of all x variables should be developed first to determine how many lagged values are relevant for each variable. Then the data matrix should be augmented accordingly and used to determine the principal components that will be used in the regression step.

Nonlinear extensions of PCA have been proposed by using autoassociative neural networks discussed in Section 3.6.1 (an illustrative example is provided in Section 7.7.1) or by using principal curves and surfaces [106, 161].
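Before turning to PLS, a compact sketch of the two biased-regression alternatives just described, ridge regression (Eq. 4.8) and PCR (Eqs. 4.11-4.12); the ridge parameter k and the number of retained components a are assumed to come from a ridge trace or cross-validation, and X is assumed mean-centered:

```python
import numpy as np

def ridge_fit(Z, y, k):
    """Ridge estimates (Eq. 4.8): beta = (Z'Z + kI)^(-1) Z'y."""
    return np.linalg.solve(Z.T @ Z + k*np.eye(Z.shape[1]), Z.T @ y)

def pcr_fit(X, y, a):
    """PCR (Eqs. 4.11-4.12): regress y on the scores of the first a PCs."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    P = Vt[:a].T                     # loadings of the first a PCs
    T = X @ P                        # scores; T'T is diagonal, so inversion is stable
    beta = np.linalg.solve(T.T @ T, T.T @ y)   # Eq. 4.12
    return beta, P
```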

4.3 PLS Regression Models

Partial least squares (PLS) regression develops a biased regression model between X and Y. In the context of chemical process operations, X usually denotes the process variables and Y the quality variables. PLS selects latent variables so that the variation in X which is most predictive of the product quality data Y is extracted. PLS works on the sample covariance matrix (X^T Y)(Y^T X) [86, 87, 111, 172, 188, 334, 338]. Measurements of m process variables taken at n different times are arranged into an (n x m) process data matrix X. The q quality variables are given by the corresponding (n x q) matrix Y. PLS modeling works better when the data are fairly symmetrically distributed and have fairly constant 'error variance' [67]. Both X and Y data blocks are usually centered and scaled to unit variance because in PLS the influence of a variable on the model parameters increases with the variance of the variable. The PLS model can be built by using the nonlinear iterative partial least squares (NIPALS) algorithm. The PLS model consists of outer relations (X and Y blocks individually) and an inner relation (linking both blocks) (Figure 4.1). The outer relations for the X and Y blocks are, respectively,

    X = TP^T + E = Σ_(i=1..a) t_i p_i^T + E                        (4.13)

    Y = UQ^T + F = Σ_(i=1..a) u_i q_i^T + F                        (4.14)

where E and F represent the residuals matrices. Linear combinations of the x vectors are calculated from the latent variable scores t_i = w_i^T x, and those for the y vectors from u_i = q_i^T y, so that they maximize the covariance between X and Y explained at each dimension. The w_i and q_i are the weight vectors, and the p_i are the loading vectors of X. The number of latent variables can be determined by cross-validation [332] or more pragmatic techniques discussed in Section 3.1.

Figure 4.1. The matrix relationships in PLS as shown by [67]. T and U are PLS scores matrices of X and Y blocks, respectively, P contains the X loadings, W and Q are weight matrices for X and Y blocks, respectively, and E and F are residual matrices of X and Y blocks.

For the first latent variable, PLS decomposition is started by selecting y_j, an arbitrary column of Y, as the initial estimate for u_1. Usually, the column of Y with greatest variance is chosen. Starting in the X data block for the first latent variable:

    w_1^T = u_1^T X / ||u_1^T u_1||,   t_1 = X w_1 / ||w_1^T w_1||  (4.15)

In the Y data block:

    q_1^T = t_1^T Y / ||t_1^T t_1||,   u_1 = Y q_1 / ||q_1^T q_1||  (4.16)

Convergence is checked by comparing t_1 in Eq. 4.15 with the t_1 from the previous iteration. If their difference is smaller than a prespecified threshold, one proceeds to Eq. 4.17 to calculate the X data block loadings p_1, and the weights w_1 are rescaled using the converged u_1. Otherwise, u_1 from Eq. 4.16 is used for another iteration. If Y is univariate, Eqs. 4.16 can be omitted and q_1 = 1. The loadings of the X data block are computed and the scores and weights are rescaled:

    p_1^T = t_1^T X / ||t_1^T t_1||,   p_1n = p_1o / ||p_1o||       (4.17)

    t_1n = t_1o ||p_1o||,   w_1n = w_1o ||p_1o||                    (4.18)

where the subscript o refers to old and n to new values. The regression coefficient b_1 for the inner relation is computed using

    b_1 = u_1^T t_1 / ||t_1^T t_1||                                 (4.19)

When the scores, weights, and loadings have been determined for a latent variable (at convergence), the X- and Y-block matrices are adjusted to exclude the variation explained by that latent variable. Equations 4.20 and 4.21 illustrate the computation of the residuals after the first latent variable and weights have been determined:

    E_1 = X - t_1 p_1^T                                             (4.20)

    F_1 = Y - b_1 t_1 q_1^T                                         (4.21)

The entire procedure is repeated for finding the next latent variable and weights, starting with Eq. 4.15. The variations in the data matrices X and Y explained by the earlier latent variables are excluded from X and Y by replacing them in the next iteration with their residuals that contain unexplained variation.
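A compact sketch of the NIPALS iteration of Eqs. 4.15-4.21 for one latent variable, assuming centered and scaled X and Y blocks; the rescaling of Eqs. 4.17-4.18 is folded into the normalizations here, and a full implementation would loop over latent variables, deflating the blocks as in Eqs. 4.20-4.21:

```python
import numpy as np

def nipals_lv(X, Y, tol=1e-10, max_iter=500):
    """One PLS latent variable by NIPALS (Eqs. 4.15-4.21)."""
    u = Y[:, [int(np.argmax(Y.var(axis=0)))]]   # start from Y column with largest variance
    for _ in range(max_iter):
        w = X.T @ u / (u.T @ u); w /= np.linalg.norm(w)   # Eq. 4.15
        t = X @ w                                         # Eq. 4.15
        q = Y.T @ t / (t.T @ t); q /= np.linalg.norm(q)   # Eq. 4.16
        u_new = Y @ q                                     # Eq. 4.16
        converged = np.linalg.norm(u_new - u) < tol
        u = u_new
        if converged:
            break
    p = X.T @ t / (t.T @ t)                   # loadings, Eq. 4.17
    b = (u.T @ t / (t.T @ t)).item()          # inner relation coefficient, Eq. 4.19
    E = X - t @ p.T                           # X-block residuals, Eq. 4.20
    F = Y - b * (t @ q.T)                     # Y-block residuals, Eq. 4.21
    return t, w, p, q, b, E, F
```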

After the convergence of the first set of latent variables to their final values, X and Y are replaced with the residuals E_1 and F_1, respectively, and all subscripts are incremented by 1.

Several enhancements have been made to the PLS algorithm [48, 93, 169, 184, 336, 333, 339]. Commercial software is available for developing PLS models [328, 269].

Nonlinear PLS Models
To model nonlinear relationships between X and Y, their projections should be nonlinearly related to each other [336]. One alternative is the use of a polynomial function such as

    u_i = c_0i + c_1i t_i + c_2i t_i^2 + ε_i                        (4.22)

where i represents the model dimension, c_0i, c_1i, and c_2i are constants, and ε_i is a vector of errors (innovations). This quadratic function can be generalized to other nonlinear functions of t_i:

    u_i = f(t_i) + ε_i                                              (4.23)

where f(·) may be a polynomial, exponential, or logarithmic function.

Another structure for expressing a nonlinear relationship between X and Y is splines [333] or smoothing functions [75]. Splines are piecewise polynomials joined at knots (denoted by z_j) with continuity constraints on the function and all its derivatives except the highest. Splines have good approximation power, high flexibility and smooth appearance as a result of the continuity constraints. For example, if cubic splines are used for representing the inner relation:

    u = b_0 + b_1 t + b_2 t^2 + b_3 t^3 + Σ_(j=1..s) b_(j+3) (t - z_j)_+^3    (4.24)

where the s knot locations and the model coefficients b_i are the free parameters of the spline function. There are l + s + 1 coefficients where l is the order of the polynomial. The term b_(j+3) (t - z_j)_+^3 denotes a function with the value 0 or b_(j+3) (t - z_j)^3 depending on the value of t:

    (t - z_j)_+^3 = (t - z_j)^3  if t > z_j,   0  if t <= z_j       (4.25)

The desirable number of knots and degrees of the polynomial pieces can be estimated using cross-validation. An initial value for s can be n/7 or sqrt(n) for n > 100, where n is the number of data points. Quadratic splines can be used for data without inflection points, while cubic splines provide a general approximation for most continuous data. To prevent over-fitting data with higher-order polynomials, models of lower degree and higher number of knots should be considered for lower prediction errors and improved stability [333]. B-splines provide an attractive alternative to quadratic and cubic splines when the number of knots is large [49]. Other nonlinear PLS models that rely on nonlinear inner relations have been proposed [61, 96, 288]. Nonlinear relations within X or Y can also be modeled.

4.4 Input-Output Models of Dynamic Processes

Time series models have been popular in many fields ranging from modeling stock prices to climate. They can be cast as a regression problem where the regressor variables are the previous values of the same variable and past values of inputs. They are 'black box' models that describe the relationship of the present value of the output to external variables but do not provide any knowledge about the physical description of the processes they represent. It will be assumed that data are collected using a fixed sampling rate (the sampling time between any two consecutive samples is identical).

Time series models relate the current value of the observed variable to

• Past values of the observed variable: autoregressive terms (up to order p), AR

• Integrated AR terms (up to order d), I

• Past values of the prediction error or past values of the predicted values: moving average terms (up to order r), MA

• Past values of control signals and known disturbances: exogenous variables, X.

The prediction error e(k) is the difference between the observed and the predicted values at a specific time, e(k) = y(k) - ŷ(k). An autoregressive integrated moving average model is represented as ARIMA(p, d, r). For example, ARIMA(0,1,1) = IMA(1,1) indicates:

    y(k) - y(k-1) = e(k) + θ e(k-1)
    y(k) = y(k-1) + e(k) + θ e(k-1)                                 (4.26)

where θ is a parameter for the MA term. Many processes can be approximated by an ARIMA(p, d, r) with p, r <= 2 and d = 0 or 1. Time series models are often developed by using a data set consisting of individual observations over time.
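As a small illustration of Eq. 4.26, the one-step-ahead prediction of an IMA(1,1) model is ŷ(k) = y(k-1) + θ e(k-1), since the current white-noise shock cannot be predicted. In the sketch below the series is simulated, and θ is assumed known rather than estimated:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n = 0.6, 200
e = rng.normal(0.0, 1.0, n)

# Simulate Eq. 4.26: y(k) = y(k-1) + e(k) + theta*e(k-1)
y = np.zeros(n)
for k in range(1, n):
    y[k] = y[k-1] + e[k] + theta*e[k-1]

# One-step-ahead predictions with reconstructed prediction errors e_hat.
y_hat, e_hat = np.zeros(n), np.zeros(n)
for k in range(1, n):
    y_hat[k] = y[k-1] + theta*e_hat[k-1]
    e_hat[k] = y[k] - y_hat[k]
```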

[Figure 4.2: flow diagram. Design the experiment and collect data; decide whether the data should be filtered; reconcile and present the data; choose the model structure; fit the model to data; test whether the model is acceptable, returning to the data (data not OK) or to the model structure (model structure not OK) if it is not.]

Figure 4.2. Model identification strategy suggested by Ljung [171].

Model development (also called system identification) involves several critical activities including design of experiments and collection of data, data pretreatment, model fitting, model validation and acceptability of the model for its use. A vast literature has been developed over the last 50 years on various aspects of model identification [99, 170, 174, 246, 278]. A schematic diagram in Figure 4.2, where the ovals represent human activities and decision-making steps and the rectangles represent computer-based computations and decisions, illustrates the links between the critical activities. A short list complements the process outlined in Figure 4.2:

• Design and perform experiments, collect data,

• Plot data and autocorrelation functions, postulate structure of time series model,

• Estimate model parameters:
  - Form one-step-ahead predictor,
  - Compute 'least squares' estimates,

• Validate model.

Model identification is an iterative process. There are several software packages with modules that automate time series model development. When a model is developed to describe data that have stochastic variations, one has to be cautious about the degree of fit. By increasing model complexity (adding extra terms) a better fit can be obtained. But the model may describe part of the stochastic variation in that particular data set which will not occur identically in other data sets. Consequently, although the fit to the 'training' data may be improved, the prediction errors may get worse.

Inputs, outputs and disturbances will be denoted as u, y, and d, respectively. For multivariable processes where u_1(k), u_2(k), ..., u_m(k) are the m inputs, the input vector u(k) at time k is written as a column vector. Similarly, the p outputs are defined by a column vector:

    u(k) = [ u_1(k) ]      y(k) = [ y_1(k) ]
           [   :    ]             [   :    ]                        (4.27)
           [ u_m(k) ]             [ y_p(k) ]

Disturbances d(k) and residuals e(k) are also represented by column vectors with appropriate dimensions in a similar manner.

A general linear discrete-time model for a variable y(k) can be written as

    y(k) = η(k) + w(k)                                              (4.28)

where w(k) is a disturbance term such as measurement noise and η(k) is the noise-free output

    η(k) = G(q, θ) u(k)                                             (4.29)

with the rational function G(q, θ) and input u(k). q is called the shift operator and q^(-1) the backward shift operator, such that

    y(k-1) = q^(-1) y(k)                                            (4.30)

and θ represents the model parameters such as f_i and b_i in Eq. 4.31. The function G(q, θ) relates the inputs to noise-free outputs whose values are not known because the outputs are corrupted by measurement noise.

Assume that relevant information for the current value of the output y(k) is provided by past values of y(k) within a window of length n_y (number of previous sampling times) and by past values of u(k) for n_u previous instants. The relationship between these variables is

    η(k) + f_1 η(k-1) + ... + f_ny η(k - n_y)
        = b_1 u(k) + b_2 u(k-1) + ... + b_nu u(k - (n_u - 1))       (4.31)

where f_i, i = 1, 2, ..., n_y and b_i, i = 1, 2, ..., n_u are parameters to be determined from data. Writing Eq. 4.31 by using two polynomials in q,

    η(k) (1 + f_1 q^(-1) + ... + f_ny q^(-n_y))
        = u(k) (b_1 + b_2 q^(-1) + ... + b_nu q^(-(n_u - 1)))       (4.32)

and defining the polynomials

    F(q) = 1 + f_1 q^(-1) + ... + f_ny q^(-n_y)
    B(q) = b_1 + b_2 q^(-1) + ... + b_nu q^(-(n_u - 1))             (4.33)

Equation 4.31 can be written in compact form as

    η(k) = G(q, θ) u(k)   with   G(q, θ) = B(q)/F(q)                (4.34)

If there is a delay of n_k sampling times in the effects of the inputs on the output, Eq. 4.31 is modified as

    η(k) + f_1 η(k-1) + ... + f_ny η(k - n_y)
        = b_1 u(k - n_k) + b_2 u(k - (n_k + 1)) + ... + b_nu u(k - (n_u + n_k - 1))    (4.35)

The disturbance term can be expressed in the same way,

    w(k) = H(q, θ) ε(k)                                             (4.36)

where ε(k) is white noise and

    H(q, θ) = C(q)/D(q) = (1 + c_1 q^(-1) + ... + c_nc q^(-n_c)) / (1 + d_1 q^(-1) + ... + d_nd q^(-n_d))    (4.37)

The model (Eq. 4.28) can then be written as

    y(k) = G(q, θ) u(k) + H(q, θ) ε(k)                              (4.38)

where the parameter vector θ contains the coefficients b_i, c_i, d_i and f_i of the transfer functions G(q, θ) and H(q, θ). The model structure is described by five parameters: n_y, n_u, n_k, n_c, and n_d. Since the model is based on polynomials, its structure is finalized when the parameter values are selected. These parameters and the coefficients are determined by fitting candidate models to data and minimizing some criterion based on reduction of the prediction error and parsimony of the model.

The model in Eq. 4.38 is known as the Box-Jenkins (BJ) model [23]. It has several special cases:

• Output Error (OE) model. When the properties of disturbances are not modeled and the noise model H(q) is chosen to be identity (n_c = 0 and n_d = 0), the noise source w(k) is equal to ε(k), the difference (error) between the actual output and the noise-free output.

• AutoRegressive Moving Average model with eXogenous inputs (ARMAX). If the same denominator is used for G and H,

    A(q) = F(q) = D(q)                                              (4.39)

Hence, Eq. 4.38 becomes

    A(q) y(k) = B(q) u(k) + C(q) ε(k)                               (4.40)

where A(q) y(k) is the autoregressive term, C(q) ε(k) is the moving average of white noise, and B(q) u(k) represents the contribution of the external inputs. Use of a common denominator is reasonable if the dominating disturbances enter the process together with the inputs.

• AutoRegressive model with eXogenous inputs (ARX). A special case of ARMAX is obtained by letting C(q) = 1 (n_c = 0).

These models are used for prediction of the output given the values of inputs and outputs at previous sampling times. Since white noise cannot be predicted, its current value ε(k) is excluded from the prediction equations. Predicted values are denoted by a ^ (hat) over the variable symbol. To emphasize that predictions are based on a specific parameter set θ, the nomenclature is further extended to ŷ(k | θ).

The computation of the parameters θ is usually cast as a minimization problem of the prediction errors e(k, θ) = y(k) - ŷ(k | θ) over a given set of data. For n data points,

    θ̂ = arg min_θ (1/n) Σ_(k=1..n) e^2(k, θ)                       (4.41)

where arg min denotes the minimizing argument. This criterion must be modified to prevent over-fitting of the data. The objective function in Eq. 4.41 can always be reduced by adding more parameters to the model. The resulting model may fit the data used for model development very well, including part of the noise in the data. But the over-fitting may cause large prediction errors when new data are used with the model.

Several criteria have been proposed to balance model fit and model complexity. Two of them are given here to illustrate how accuracy and parsimony are balanced:

• Akaike's Information Criterion (AIC):

    min_(l,θ) (1 + 2l/n) Σ_(i=1..n) e^2(i, θ)                      (4.42)

where l is the number of parameters estimated (the dimension of θ).

• Final Prediction Error (FPE):

    min_(l,θ) [(1 + l/n)/(1 - l/n)] Σ_(i=1..n) e^2(i, θ)           (4.43)

The merits and limitations of these and other criteria are discussed in the literature [170, 278].

Nonlinear Time Series Models
The linear model structures discussed in this section can handle mild nonlinearities. They can also result from linearization around an operating point. Simple alternatives can be considered for developing linear models with better predictive capabilities than a traditional ARMAX model for nonlinear processes. If the nature of the nonlinearity is known, a transformation of the variable can be utilized to improve the linear model. A typical example is the knowledge of the exponential relationship of temperature in reaction rate expressions; hence, the logarithmic relationship between the rate constant and temperature can be utilized instead of the actual temperature as a regressor. The second method is to build a recursive linear model. By updating the model parameters frequently, mild nonlinearities can be accounted for. The rate of change of the process and the severity of the nonlinearities are critical factors for the success of this approach. Another approach is based on the estimation of nonlinear systems by using multiple linear models [11, 82, 83].

Time series modeling is extended to nonlinear models by using a variety of structures. These models have the capability to describe pathological dynamic behavior and to provide accurate predictions over a wider range of operating conditions compared to linear models. ANNs were introduced in Section 3.6.1. Various other nonlinear model development paradigms include Volterra kernels [185, 315], cascade (block-oriented) models [97, 157, 187, 314], polynomial models, threshold models [297], and models based on spline functions. Polynomial models include bilinear models [72, 201], state-dependent models [233], nonlinear autoregressive moving average models with exogenous inputs (NARMAX) [30, 31, 167, 231], nonlinear polynomial models with exponential [98] and trigonometric functions (NPETM), and multivariate adaptive regression splines (MARS) [76]. A unified nonlinear model development framework is not available, and the search for the appropriate nonlinear structure is part of the model development effort. Use of a nonlinear model development paradigm which is not compatible with the types of nonlinearities that exist in the data can have a significant negative effect on model development effort and model accuracy.

A new methodology has been proposed for developing multivariable additive NARX (Nonlinear AutoRegressive with eXogenous inputs) models based on subspace modeling concepts [50]. The model structure is similar to that of a Generalized Additive Model (GAM) and is estimated with a nonlinear Canonical Variates Analysis (CVA) algorithm called CANALS. The system is modeled by partitioning the data into two groups of variables. The first is a collection of 'future' outputs; the second is a collection of past inputs and outputs, and 'future' inputs. Then, future outputs are predicted in terms of past and present inputs and outputs. This approach is similar to linear subspace state-space modeling [159, 211, 307]. The appeal of linear and nonlinear subspace state-space modeling is the ability to develop models with error prediction for a future window of outputs (window length selected by the user) and with a well-established procedure that minimizes trial-and-error and iterations. An illustrative example of such modeling is presented based on a simulated continuous chemical reactor that exhibits multiple steady states in the outputs for a fixed level of the input [50].

4.5 State-Space Models

State variables are the minimum set of variables that are necessary to describe completely the state of a system. The n state variables of a system at time t are represented as x(t) = [x_1(t) x_2(t) ... x_n(t)]^T. In quantitative terms, given the values of the state variables x(t) at time t_0 and the values of the inputs u(t) (Eq. 4.27) for t >= t_0, the values of the outputs y(t) can be computed for t > t_0. All process variables of interest can be included in a model as state variables, while the measured variables can form the set of output variables. This way, the model can be used to compute all process variables based on measured values of the output variables and the state-space model.

In this section, classical state-space models are discussed first. They provide a versatile modeling framework that can be linear or nonlinear, continuous- or discrete-time, to describe a wide variety of processes. State variables can be defined based on physical variables, mathematical solution convenience, or ordered importance in describing the process. Subspace models are discussed in the second part of this section. They order state variables according to the magnitude of their contributions in explaining the variation in data.

State-space models also provide the structure for developing state estimators where one can estimate corrected values of the state variables, given process input and output variables and estimated values of the process outputs.

State-space models relate the variation in state variables over time to their values in the immediate past and to inputs with differential or difference equations. Algebraic equations are then used to relate output variables to state variables and inputs at the same time instant. Consider a system of first-order differential equations (Eq. 4.44) describing the change in state variables and a system of output equations (Eq. 4.45) relating the outputs to the state variables:

    dx/dt = ẋ(t) = f(x(t), u(t))                                   (4.44)

    y(t) = h(x(t), u(t))                                            (4.45)

If x(t) and u(t) are known at time t_0, ẋ(t_0) can be computed using Eq. 4.44. For an infinitesimally small interval δt, one can compute x(t_0 + δt) using Euler's method:

    x(t_0 + δt) = x(t_0) + δt · f(x(t_0), u(t_0))                  (4.46)

Then, the output y(t_0 + δt) can be computed using x(t_0 + δt) and Eq. 4.45. This computation sequence can be repeated to compute values of x(t) and y(t) for t > t_0 if the corresponding values of u(t) are given for subsequent values of time such as t_0 + 2δt, ..., t_0 + kδt. The model composed of Eqs. 4.44-4.45 is called the state-space model; the vector x(t), the state vector; and its components x_i(t), the state variables. The dimension of x(t), n, is the model order.

State-space models can also be developed for discrete-time systems. Let the current time be denoted as k and the next time instant where input values become available as k + 1. The equivalents of Eqs. 4.44-4.45 in discrete time are

    x(k+1) = f(x(k), u(k)),   k = 0, 1, 2, ...                      (4.47)

    y(k) = h(x(k), u(k))                                            (4.48)

For the current time k, the state at time k + 1 is now computed by the difference equations 4.47-4.48. Usually, the time interval δt = t(k+1) - t(k) between the two discrete times is a constant equal to the sampling time.

Linear State-Space Models
The functional relations f(x, u) and h(x, u) in Eqs. 4.44-4.45 or Eqs. 4.47-4.48 can be restricted to be linear. The linear continuous state-space model becomes

    ẋ(t) = A x(t) + B u(t)
    y(t) = C x(t) + D u(t)                                          (4.49)

where the dimensions of the coefficient matrices are A (n x n), B (n x m), C (p x n) and D (p x m), respectively.

The linear discrete-time model for k = 0, 1, 2, ... is

    x(k+1) = F x(k) + G u(k)
    y(k) = C x(k) + D u(k)                                          (4.50)

Matrices A and B are related to matrices F and G as

    F = e^(AT),   G = ( ∫_0^T e^(Aτ) dτ ) B                         (4.51)

where the sampling interval T = δt is assumed to be equal for all values of k. Since the coefficient matrices have constant elements, these models are called linear time-invariant models. Mild nonlinearities in the process can often be described better by making the matrices in the model equations (Eqs. 4.49 and 4.50) time dependent. This is indicated by symbols such as A(t) or F(k).

Disturbances
Some disturbances can be measured, but the presence of others is only recognized because of their influence on process and/or output variables. The state-space model needs to be augmented to incorporate the effects of disturbances on state variables and outputs. Following Eq. 4.28, the state-space equations can be written as

    ẋ(t) = f(x(t), u(t), w(t))
    y(t) = h(x(t), u(t), w(t))                                      (4.52)

where w(t) denotes the disturbances. It is necessary to describe w(t) in order to compute how the state variables and outputs behave in the presence of disturbances. If the disturbances are known and measured, their description can be appended to the model. For example, the linear state-space model can be written as

    ẋ(t) = A x(t) + B u(t) + W_1 w_1(t)
    y(t) = C x(t) + D u(t) + W_2 w_2(t)                             (4.53)

where w_1(t) and w_2(t) are disturbances affecting the state variables and outputs, respectively, and W_1 and W_2 are the corresponding coefficient matrices. This model structure can also be used to incorporate modeling uncertainties (represented by w_1(t)) and measurement noise (represented by w_2(t)).

Another alternative is to develop a model for unknown disturbances to describe w(t) as the output from a dynamic system with a known input u_w(t) that has a simple functional form:

    ẋ_w(t) = f_w(x_w(t), u_w(t))
    w(t) = h_w(x_w(t), u_w(t))                                      (4.54)

where the subscript w indicates state variables, inputs and functions of the disturbance(s). Typical choices for the input forms may be an impulse, white noise or infrequent random step changes. Use of fixed impulse and step changes leads to deterministic models, while white noise or random impulse and step changes yield stochastic models [171]. The disturbance model is appended to the state and output model to build an augmented dynamic model with known inputs.

Linearization of Nonlinear Systems
The behavior of a nonlinear process can be approximately described by a linear model in the vicinity of a known operating point, developed by linearizing the nonlinear model. The nonlinear terms of the model are expanded by using the linear terms of a Taylor series, and the equations are written in terms of deviations of the process variables from the operating point (the so-called deviation variables) to obtain the linear model. The model can then be expressed in state-space form [253].

Consider the state-space model, Eqs. 4.44-4.45, and assume that it has a stable stationary solution (a steady state) at x = x_ss, u = u_ss:

    f(x_ss, u_ss) = 0                                               (4.55)

If f(x, u) has continuous partial derivatives in the neighborhood of the stationary solution x = x_ss, u = u_ss, then for l = 1, ..., n:

    f_l(x, u) = f_l(x_ss, u_ss) + Σ_(i=1..n) (∂f_l/∂x_i)(x_ss, u_ss)(x_i - x_ss,i)
                + Σ_(j=1..m) (∂f_l/∂u_j)(x_ss, u_ss)(u_j - u_ss,j) + r_l    (4.56)

where (∂f_l/∂x_i)(x_ss, u_ss) indicates that the partial derivative with respect to x_i is evaluated at (x_ss, u_ss), and r_l denotes the higher-order nonlinear terms that are assumed to be negligible. Define Jacobian matrices A and B that have the partial derivatives in Eq. 4.56 as their elements:

    A = [ ∂f_1/∂x_1  ...  ∂f_1/∂x_n ]      B = [ ∂f_1/∂u_1  ...  ∂f_1/∂u_m ]
        [    :                :     ]          [    :                :     ]    (4.57)
        [ ∂f_n/∂x_1  ...  ∂f_n/∂x_n ]          [ ∂f_n/∂u_1  ...  ∂f_n/∂u_m ]

with the partial derivatives being evaluated at (x_ss, u_ss). Using Eq. 4.55, Eq. 4.56 can be written in compact form as

    f(x, u) = A(x - x_ss) + B(u - u_ss) + r(x - x_ss, u - u_ss)     (4.58)

Neglecting the higher-order terms r(x - x_ss, u - u_ss) and defining the deviation variables

    x̃ = x - x_ss,   ũ = u - u_ss                                   (4.59)

Eq. 4.44 can be written as

    dx̃/dt = A x̃ + B ũ                                             (4.60)

The output equation is developed in a similar manner:

    ỹ = C x̃ + D ũ                                                 (4.61)

where the elements of C and D are the partial derivatives ∂h_i/∂x_j with i = 1,...,p and j = 1,...,n, and ∂h_i/∂u_j with i = 1,...,p and j = 1,...,m, respectively. Hence, the linearized equations are of the same form as the original state-space equations in Eq. 4.49. Linearization of discrete-time nonlinear models follows the same procedure and yields linear difference equations similar to Eq. 4.50.
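The Jacobians of Eq. 4.57 can be formed analytically or, as sketched below, by central finite differences around the steady state (x_ss, u_ss); f is any vectorized right-hand side of Eq. 4.44:

```python
import numpy as np

def linearize(f, x_ss, u_ss, eps=1e-6):
    """Finite-difference Jacobians A = df/dx and B = df/du at (x_ss, u_ss), Eq. 4.57."""
    n, m = len(x_ss), len(u_ss)
    A, B = np.zeros((n, n)), np.zeros((n, m))
    for i in range(n):
        dx = np.zeros(n); dx[i] = eps
        A[:, i] = (f(x_ss + dx, u_ss) - f(x_ss - dx, u_ss)) / (2*eps)
    for j in range(m):
        du = np.zeros(m); du[j] = eps
        B[:, j] = (f(x_ss, u_ss + du) - f(x_ss, u_ss - du)) / (2*eps)
    return A, B
```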

Subspace State-Space Models
Subspace state-space models are developed by using techniques that determine the largest directions of variation in the data to build models. Two subspace methods, PCA and PLS, have already been introduced in Sections 4.2 and 4.3. Usually, they are used with steady-state data, but they can also be used to develop models for dynamic relations by augmenting the appropriate data matrices with lagged values of the variables. In recent years, dynamic model development techniques that rely on subspace concepts have been proposed [158, 159, 307, 313]. Subspace methods are introduced in this section to develop state-space models for process monitoring and closed-loop control.

Consider a simple state-space model without external inputs u(k):

    x(k+1) = F x(k) + H ε(k)
    y(k) = C x(k) + ε(k)                                            (4.62)

where x(k) is the state variable vector of dimension n at time k and y(k) is the observation vector with p output measurements. The stochastic input ε(k) is the serially uncorrelated innovation vector having the same dimension as y(k) and covariance E[ε(k) ε(k+l)^T] = Δ if l = 0, and 0 otherwise. This representation would be useful for process monitoring activities where 'appropriate' state variables (usually the first few state variables) are used to determine if the process is operating as expected. The statistics used in statistical process monitoring (SPM) charts assume no correlation over time between measurements. If state-space models are developed such that the state variables and residuals are uncorrelated at zero lag, the statistics can be safely applied to these calculated variables instead of the measured process outputs. Several techniques, balanced realization [5], PLS realization [210], N4SID [307], and the canonical variate realization [158, 209], can be used for developing these models.

Subspace algorithms generate the process model by successive approximation of the memory, or the state variables, of the process by determining successively functions of the past that have the most information for predicting the future [159]. In the canonical variate (CV) realization approach, canonical variates analysis (Section 3.2) is used to develop the state-space models [158], where the first state variable contains the largest amount of information about the process dynamics, and the second state variable is orthogonal to the first (does not repeat the information explained in the previous state variable) and describes the largest amount of the remaining process variation. The first few significant state variables can often be used to describe the greatest variation in the process. The system order n is determined by inspecting the dominant singular values (SV) of a covariance matrix (the ratio of a specific SV to the sum of all the SVs [5]) generated by singular value decomposition (SVD), or by an information theoretic approach such as the Akaike Information Criterion (AIC) [158] introduced in Section 4.4.

The data used in subspace state-space model development consist of the time series data of the output and input variables. For illustration, assume a case with only output data, where the objective is to build a model of the form of Eq. 4.62. Since the whole data set is already known, it can be partitioned as past and future with respect to any sampling time. Defining a past data window of length K and a future data window of length J that are shifted from the beginning to the end of the data set, stacked vectors of data are formed. The Hankel matrix (Eq. 4.64) is used to develop subspace models. It expresses the covariance between future and past stacked vectors of output measurements. Defining the stacked vectors of future (y_k^(J+)) and past (y_k^(K-)) data with respect to the current sampling time k as

    y_k^(J+) = [ y(k)     ]      y_k^(K-) = [ y(k-1) ]
               [ y(k+1)   ]                 [ y(k-2) ]               (4.63)
               [   :      ]                 [   :    ]
               [ y(k+J-1) ]                 [ y(k-K) ]

the Hankel matrix (note that H_JK is different than the H matrix in Eq. 4.62) is

    H_JK = E[y_k^(J+) (y_k^(K-))^T] = [ Λ_1  Λ_2    ...  Λ_K     ]
                                      [ Λ_2  Λ_3    ...  Λ_(K+1) ]   (4.64)
                                      [  :                 :     ]
                                      [ Λ_J  Λ_(J+1) ... Λ_(J+K-1) ]

where Λ_q is the autocovariance of the y(k)'s which are q time periods apart and E[·] denotes the expected value of a stochastic variable. The non-zero singular values of the Hankel matrix determine the order of the system, i.e., the dimension of the state variables vector. The non-zero and dominant singular values of H_JK are chosen by inspection of the singular values or by metrics such as AIC.
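A sketch of forming the stacked vectors of Eq. 4.63 and estimating the Hankel matrix of Eq. 4.64 by time averaging from a single output record y (an N x p array); the decay of the singular values then suggests the system order (the CV realization discussed next additionally applies the scaling of Eq. 4.65 before the SVD):

```python
import numpy as np

def hankel_singular_values(y, J, K):
    """Estimate H_JK (Eq. 4.64) from stacked future/past outputs (Eq. 4.63)
    and return its singular values for order selection."""
    N, p = y.shape
    ks = range(K, N - J + 1)
    Yf = np.column_stack([y[k:k+J].ravel() for k in ks])        # future stacks y_k^(J+)
    Yp = np.column_stack([y[k-K:k][::-1].ravel() for k in ks])  # past stacks y_k^(K-)
    H = Yf @ Yp.T / len(ks)                                     # Eq. 4.64 by time averaging
    return np.linalg.svd(H, compute_uv=False)
```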

Canonical variate realization requires that the covariances of the future and past stacked observations be conditioned against any singularities by taking their square roots. The Hankel matrix is scaled by using R_K and R_J defined in Eq. 4.66. The scaled Hankel matrix (H̄_JK) and its singular value decomposition are given as

    H̄_JK = (R_J)^(-1/2) H_JK (R_K)^(-1/2) = U Σ V^T                (4.65)

where

    R_J = E[y_k^(J+) (y_k^(J+))^T],   R_K = E[y_k^(K-) (y_k^(K-))^T]    (4.66)

U has dimensions pJ x a and contains the a left singular vectors of H̄_JK, Σ is a x a and contains the singular values (SV), and V is Kp x a and contains the a right singular vectors of the decomposition. The SVD matrices in Eq. 4.65 include only the SVs and singular vectors corresponding to the a state variables retained in the model. The full SV matrix Σ has dimension Jp x Kp and contains all SVs in descending order. If the process noise is small, all SVs smaller than the a-th SV are effectively zero and the corresponding state variables are excluded from the model.

The state variables are given as

    x_k = Σ^(1/2) V^T (R_K)^(-1/2) y_k^(K-)                         (4.67)

Once x(k) (or, for the continuous case, x(t)) is known, F, H (or A, B), C defined in Eq. 4.62, and the stochastic input covariance Δ can be constructed [209]. The covariance matrix of the state vector based on the CV decomposition, E[x(k) x(k)^T] = Σ, reveals that the x(k) are independent at zero lag.

The subspace state-space model that includes external inputs is of the form:

    x(k+1) = F x(k) + G u(k) + H_1 w(k)
    y(k) = C x(k) + D u(k) + H_2 v(k)                               (4.68)

where F, G, C, D, H_1 and H_2 are system matrices, and w and v are zero-mean noise vectors that have Normal distributions. The model, Eq. 4.68, can be developed by using CV realization or other methods such as N4SID [307]. When CV realization is used, these models are called canonical variate state-space (CVSS) models.

Extensions to Nonlinear State-Space Models
Various extensions of the linear state-space approach have been proposed for developing nonlinear models [227, 274]. An extension of linear CVA for finding nonlinear state-space models was proposed by Larimore [160], where use of the alternating conditional expectation (ACE) algorithm [24] was suggested as the nonlinear CVA method. Their examples used linear CVA to model a system by augmenting the linear system with polynomials of past outputs.

Subspace modeling can be cast as a reduced rank regression (RRR) of collections of future outputs on past inputs and outputs, after removing the effects of future inputs. CVA performs this regression. In the case of a linear system, an approximate Kalman filter sequence is recovered from this regression. The state-space coefficient matrices are recovered from the state sequence. The nonlinear approach extends this regression to allow for possible nonlinear transformations of the past inputs and outputs, and the future inputs and outputs, before RRR is performed. The model structure consists of two submodels. The first is a multivariable dynamic model for a set of latent variables; the second relates these latent variables to the outputs. The latent variables are linear combinations of nonlinear transformations of past inputs and outputs. These nonlinear transformations or functions are found using CANALS [305]. Using nonlinear CVA to fit dynamic models is not new. The ACE algorithm was used to visually infer nonlinear functions for single-output additive models [29]. DeCicco and Cinar [50] proposed a CANALS-based approach where the nonlinear functions estimated are directly utilized for prediction. Also, a collection of multiple future outputs is considered, which leads to the latent variables model structure. The latent variables are then linked to the outputs using linear-projection-type nonlinear model structures such as projection pursuit regression [77], or a linear model through least squares regression.

4.6 Summary

Several input-output model development techniques that extract dynamic relations from process data are discussed in this chapter. Methods based on multivariate statistics, systems theory and artificial intelligence are presented. Various multivariate regression techniques are outlined first, to provide the foundation for the discussion on PCA-based regression and its extension to capture dynamic variations in data. Next, PLS regression is introduced, with a similar extension to capture dynamic variations. Then, input-output modeling of dynamic processes with time series models is introduced. The last modeling framework presented is state-space modeling, which enables the extraction of arbitrary variables (state variables) that describe the dynamics of the system, while relating the input and output variables. Since most chemical processes are nonlinear, the extensions of these modeling paradigms to the nonlinear framework are also introduced. Extensions of PCA and PLS to develop nonlinear models, nonlinear time series modeling techniques and nonlinear state-space modeling techniques are briefly introduced, and references are provided for each method.
5

Monitoring of
Multivariate Processes

Multivariate SPM (MSPM) methods are gaining acceptance in monitoring continuous processes because multivariate monitoring charts provide more
accurate information about the process, give warnings earlier than the sig-
nals of univariate charts, and are easy to compute and interpret. MSPM
relies on the statistical distance concept, which is a generalization of the Student t statistic. First discussed in [226] and later proposed independently in [112] and [179], it provides a useful statistic for representing the devia-
tion of the process from its desired state. If the process has a few variables,
the statistical distance statistic T 2 can be computed by using all variables
and its charts can be plotted for MSPM [190]. If the number of variables is
large and there is significant colinearity among some of them, the PCA or
PLS can be used. If the data used for chart development are process vari-
ables, MSPM charts are based on principal components (PC). When both
process and quality variables are used, and the two blocks of data need to
be related as well, the MSPM charts are based on the latent variables (LV)
of PLS. Both sets of charts summarize the information about the status
of the process by using two statistics, Hotelling's T^2 and the squared prediction error (SPE). The details are discussed in Sections 5.1 and 5.2. The charts are simply the plots of T^2 or SPE values computed by using the information collected at each sampling time on the time axis. The T^2 chart
indicates the distance of the current operation from the desired operation
as captured by the PCs or LVs included in the development of the PCA
or the PLS model of the process. Since only the first few PCs or LVs that
capture most of the variation in the data are used to build the model, the
model is a somewhat accurate but incomplete description of the process.
The SPE chart captures the magnitude of the error caused by deviations resulting from events that are not described by the model. The T^2 chart indicates a deviation based on process behavior that can be explained by


the model, while the SPE chart indicates a significant deviation that cannot be explained by the model (the prediction error is inflated). The T^2 and SPE charts must be used as a pair, and if either chart indicates a significant deviation from expected operation, the presence of an abnormal process operation must be declared.

If the process is out-of-control, the next step is to find the source cause of the deviation (fault diagnosis) and then to remedy the situation. Fault diagnosis can be conducted by associating process behavior patterns to specific faults, or by relating the process variables that have significant deviations from their expected values to various equipment that can cause such deviations, as discussed in Chapter 7. If the latter approach is used, univariate charts readily provide the information about process variables with significant deviations. Since multivariate monitoring charts summarize the information from many process variables, the variables that inflate the T^2 or SPE statistics must be determined. This is usually done by using contribution plots (Sections 3.4 and 7.4).

To include the information about process dynamics in the models, the data matrix can be augmented with lagged values of the data vectors, or model identification techniques such as subspace state-space modeling can be used (Section 4.5). Negiz and Cinar [209] have proposed the use of state variables developed with canonical-variates-based realization to implement SPM for multivariable continuous processes. Another approach is based on the use of Kalman filter residuals [326]. MSPM with dynamic process models is discussed in Section 5.3. The last section (Section 5.4) of the chapter gives a brief survey of other approaches proposed for MSPM.

5.1 SPM Methods Based on PCA

Multivariate SPM methods with PCs can employ various types of monitoring charts. If only a few PCs can describe the process behavior in a satisfactory manner, biplots could be used as visual aids that are easy to interpret. Such biplots can be generated by projecting the data onto two-dimensional surfaces such as PC1 versus PC2, PC1 versus SPE, and PC2 versus SPE, as illustrated in Figure 5.1.

Figure 5.1. The multivariate monitoring space. (a) Three-dimensional representation, (b) Two-dimensional representation.

Data representing normal operation (NO) and various faults are clustered in different regions, providing the opportunity to diagnose source causes as well [153]. Score biplots are used to detect any departure from the in-control region defined by the confidence limits calculated from the reference set. The axis lengths of the confidence ellipsoids in the direction of the ith principal component are given by [126]

    ± [ S(i,i) a(n-1) F_(a,n-a,α) / (n(n-a)) ]^(1/2)                (5.1)

where S is the estimated covariance matrix of the scores, F_(a,n-a,α) is the F distribution value with a and n-a degrees of freedom at significance level α, n is the number of samples in the reference set, and a is the number of PCs retained in the model. Inspection of many biplots becomes inefficient and difficult to interpret when a large number of PCs are needed to describe the process. Monitoring charts based on squared residuals (SPE) and T^2 then become more useful. By appending the control limit (UCL) to such plots, a multivariate SPM chart as easy to interpret as a Shewhart chart is obtained.

Sometimes, plots of individual PC scores can be used for preliminary analysis of the variables that contribute to an out-of-control signal. The control limits for new t scores, under the assumption of Normality at significance level α at any time interval k, are given by [100]

    ± t_(n-1,α/2) s_ref (1 + 1/n)^(1/2)                             (5.2)

where n and s_ref are the number of observations and the estimated standard deviation of the t-score sample at sampling time k (the mean is always 0), and t_(n-1,α/2) is the critical value of the Studentized variable with n-1 degrees of freedom at significance level α/2.
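A sketch of the score control limits of Eq. 5.2, assuming a reference set of scores t_ref for one PC and using SciPy for the t-distribution critical value:

```python
import numpy as np
from scipy import stats

def score_limits(t_ref, alpha=0.01):
    """Control limits for new scores of one PC (Eq. 5.2)."""
    n = len(t_ref)
    s_ref = np.std(t_ref, ddof=1)              # scores have zero mean by construction
    lim = stats.t.ppf(1 - alpha/2, n - 1) * s_ref * np.sqrt(1 + 1/n)
    return -lim, lim
```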

Hotelling's T^2 charts
Hotelling's T^2 plot detects small shifts and deviations from normal operation defined by the model. Since it includes the contributions of all variables, it can become significant faster than the deviation of an individual variable. The T^2 statistic based on process variables at sampling time k is

    T^2(k) = (x(k) - x̄)^T S^(-1) (x(k) - x̄)                        (5.3)

where x̄ and S are estimated from process data. If the individual observation vector x(k) is independent of x̄ and S, then T^2 follows an F distribution with m and n-m degrees of freedom (m measured variables, n sample size) [190]:

    T^2 ~ [ m(n+1)(n-1) / (n(n-m)) ] F_(m,n-m)                      (5.4)

If the observation vector x is not independent of the estimators x̄ and S, but is included in their computation, then T^2 follows a Beta distribution with m/2 and (n-m-1)/2 degrees of freedom [190]:

    T^2 ~ [ (n-1)^2 / n ] B_(m/2,(n-m-1)/2)                         (5.5)

The T^2 charts based on PCs use

    T^2 = t_a^T S^(-1) t_a                                          (5.6)

and follow an F or a Beta distribution under the same conditions leading to Eqs. 5.4 and 5.5, with a and n-a degrees of freedom for the F distribution, and a/2 and (n-a-1)/2 degrees of freedom for the Beta distribution, assuming that the data follow a multivariate Normal distribution [121, 120]. As before, a denotes the number of PCs, t_a is a vector containing the scores from the first a PCs [121], and S is the (a x a) estimated covariance matrix, which is diagonal due to the orthogonality of the t scores [298]. The T^2 based on PCs can also be calculated at each sampling time k as [121]

    T^2(k) = Σ_(i=1..a) t_i^2(k) / s_i^2                            (5.7)

where the PC scores t_i have variance λ_i (or estimated variance s_i^2 from the scores of the reference set), which is the ith largest eigenvalue of the covariance matrix S. The argument (k) that indicates the explicit dependence on sampling time will be omitted from the T^2 equations in the remainder of the book without loss of generality. If tables for the Beta distribution are not readily available, this distribution can be approximated by using [298]

    B_(a/2,(n-a-1)/2,α) ≈ (a/(n-a-1)) F_(a,n-a-1,α) / [1 + (a/(n-a-1)) F_(a,n-a-1,α)]    (5.8)

A variant of the T^2 statistic is the D statistic:

    D(k) = t_a^T S^(-1) t_a · n/(n-1)^2 ~ B_(a/2,(n-a-1)/2)          (5.9)

Squared Prediction Error (SPE) charts
Squared Prediction Error (SPE) charts show deviations from NO based on variations that are not captured by the model. Recall Eq. 3.2, which can be rearranged to compute the prediction error (residual) E:

    X = TP^T + E,   E = X - X̂                                      (5.10)

where X̂ = TP^T denotes the estimate of the data X. The location of the projection of an observation (at sampling time k) on the a-dimensional PC space is given by its scores t_a(k). The orthogonal distance of the observation x(k) from the projection space is the prediction error e(k), which is squared to compute SPE(k). The e(k) gives a measure of how close the observation at time k is to the a-dimensional space:

    SPE(k) = e(k)^T e(k) = Σ_(j=1..m) e_j^2(k) = Σ_(j=1..m) [x_j(k) - x̂_j(k)]^2    (5.11)

where x̂_j(k) is computed from the PCA model. SPE is also called the Q-statistic.
Chapter 5. Monitoring of Multivariate Processes 5.2. SPM Methods Based on PLS 105
104

The other parameters are

(5.15)

()i'Scan be estimated from the estimated covariance matrix of residuals


(residual matrix used in Eq. 5.11) for use in Eq. 5.13 to develop control
limits on Q for comparing residuals. A simplified approximation for Q-
limits has also been suggested in [68] by rewriting Box's equation (Eq.
5.12) by setting ()~ :=:::; B1 ()3

(5.16)

SP E values for new data at time k are calculated using

SPE(k) = 2)xj(k) - (k))2 (5.17) Figure 5.2. T 2 and S P E charts based on PCA for monitoring a continuous
j=l polymerization reactor. A increase in reactor feed temperature is in-
troduced for 60 min at the time instant indicated by a vertical bar on the
2
These SPE(k) values computed using Eq. 5.17 follow the X (chi-squared) plot.
distribution [22]). This distribution can be well approximated at each time
interval using Box's equation in Eq. 5.12 (or its modified version in Eq.
5.16). A 5% increase in reactor feed temperature was introduced and main-
tained for 60 rnin before returning the feed stream to normal operating
Example The performance of univariate and multivariate process mon-
conditions. The multivariate charts (Figure 5.2) are the first to detect the
itoring charts are illustrated in Figures 5.2 and 5.3 for the polymerization
disturbance to the reactor operation. The T 2 statistic exceeds the 99% con-
of vinyl acetate in a CSTR. The simulation uses a model developed by Tey-
fidence interval 25 min after the disturbance was introduced. and the SPE
mour [291], consisting of four ordinary differential equations for the reactor
statistic 20 min after the disturbance, a few minutes earlier than the T 2
temperature, solvent volume fraction, monomer volume fraction and the
chart. The initiator concentration in the reactor exceeds the statistical lim-
initiator concentration in the reactor, and three differential equations for
its of the Shewhart chart (Figure 5.3) after 35 rnin. Reactor temperature
the molecular weight moments of the reactor. The moments are functions
and conversion readings exceed the statistical limits after approximately 40
of polymer chain reaction kinetics and probabilities of polymer chain prop-
mm and the polydispersity measurement exceeds the univariate limit after
agation. They are used for calculating various polymer molecular weights,
44 min.
polydispersity and conversion. The 'measured' variables are polydispersi-
ty, reactor temperature, conversion and the reactor initiator concentration.
The five input variables are the reactor cooling jacket temperature, the ini-
tiator concentration in the feed stream, the feed stream temperature, the 5.2 SPM Methods Based on PLS
feed solvent volume fraction and the residence time. The four monitored
output variables are assumed to be available via analytical methods at one Large amounts of process data, such as temperatures and flow rates. are
minute intervals for the physical system. The assumption is valid for the collected at high frequency by process data collection svstems. Information
reactor temperature, conversion and initiator concentration, though the on product quality variables is collected less frequently since these mea-
polydispersity measurement in a physical system may take up to 30 m'in surements are expensive. Although it is possible to measure some quality
or more to obtain via analytical monitoring techniques. The manipulated variables on-line by means of sophisticated devices, measurements arc gen-
variables are modified by adding random fluctuations to each of the inputs. erally made off~line in the quality control laboratory and often involve time
Disturbances may be added by changing the values of input variables. lags between data collection and receiving analysis results. Process data
106 Chapter 5. Monitoring of Multivariate Processes
5.2. SPM Methods Based on PLS 107

include the range of process variables that yield desired product quality. If
the PLS model is developed for monitoring certain process conditions, the
reference data set should include data collected under these conditions.
Since PLS technique is sensitive to outliers and scaling, outliers should
be removed and data should be scaled prior to modeling. After data pre-
treatment, the number of latent variables (PLS dimensions) to be retained
in the model is determined. Cumulative prediction sum of squares (CUM-
PRESS) versus the number of latent variables or prediction sum of squares
(PRESS) versus the number of latent variables plots are used for this pur-
pose. It is usually enough to consider the first few PLS dimensions for
monitoring activities, while more PLS dimensions are needed for prediction
in order to improve the accuracy of predictions.
The squared prediction error (SP E) can be calculated for the X and
the Y block models
m

SPEX,k = L (5.18)
j=l
q

SPEY,k = L(Ykj Ykj)2 (5.19)


j=l
Figure 5.3. Shewhart charts for monitoring a continuous polymerization where .f: and yare predicted observations in X and Y using the PLS model,
reactor. A increase in reactor feed temperature is introduced for 60 respectively, k and j are the indexes for observations and variables in X or
min. Y, respectively.
and Ykj in Eqs. 5.18 and 5.19 are calculated for new observations
as follows:
contain valuable information about both the quality of the product and the m
performance of the process operation. PLS models provide the quantitative
relations for estimating product quality from process data. They can also
=L
j=l
(5.20)
be used used to quickly detect process upsets and unexpected behavior.
Cross correlations and colinearity among process variables severely limit Process
the use of traditional linear regression techniques. PLS, as a projection Product Quality
Variables
method. offers a suitable solution for modeling such data. Variables
The first step in the development of a PLS model is to group the process 12 . m 12 k
variables as X and the product quality variables as Y (Figure 5.4). This 1
1
selection is dependent on the measurements available and the objectives of 2
(j)
c (j)
2
monitoring. The reference set used to develop the multivariate monitoring o c
chart will determine the variations considered to be part of normal oper- ~c X .Q
ro y
ation and ideally includes all variations leading to desired process perfor- Q)
(j)
c
Q)
mance. If routine variations in the reference set are too small, the resulting ..0 (j)

model used for process monitoring will cause frequent alarms, and if it in-
o ..0
o
n '---- ....J
cludes a data set that contains large variations, the sensitivity for detecting n '----__....J
abnormal operation will be poor. The reference data set selected should
Figure 5.4. Arrangement of data in PLS for SPM as suggested in [41].
108 Chapter 5. Monitoring of Multivariate Processes 5.3. SPM Using Dynamic Process Models 109

a
can represent the dynamics of the process. The subspace models can be
(5.21 )
developed by using the methodology described in Section 4.5. Commercial
i=l
and open-source software are available for developing subspace state space
(5.22) models using canonical variate (CV) realization and N4SID approaches.
where Wi,j denotes the weights, Pi,) the loadings for the X block (process MSPM techniques use the state-variables x( k) of the subspace state-space
variables) of the PLS model, the scores of new observations, and b models to generate the T 2 values and the residuals e (k) = y (k) - Y(k) be-
tween the measured and estimated values of the model outputs to generate
the regression coefficient for the inner relations.
2
Multivariate monitoring charts based on Hotelling's statistic (T ) and the SPE values at time k. The control limits of the charts are identical to
squared prediction errors (SPEx and SP Ey) are constructed using the those given in Section 5.l.
PLS models. Hotelling's T 2 statistic for a new independent t vector is [298] The residuals are used in monitoring with the normalized S P E chart
(SPE N ) in this example. At time k, SPEN(k) is
a(n 2 1)
Fa,n-a (5.23) (5.26)
n (n-a )

where S is the estimated covariance matrix of PLS model scores, a the where :E e and e are the covariance matrix and the mean vector of residuals
number of latent variables retained in the model and Fa,n-a the F distri- respectively, which are determined for in-control data. S P E given here is
bution value. The control limits on S P E charts can be calculated by an called as normalized since the S P E (k) values are scaled with their in-control
approximation of the X2 distribution given as S P Eo; = 9X~o; [22]. This mean and variance. SPEN is distributed as
equation is well approximated as [68, 122, 218] m(n 2 -1)
SPEN rv ( ) Fm,n-m (5.27)
nn-m
(5.24)
The in-control residual mean vector e is almost zero and in-control residual
covariance matrix :E e is diagonal.
2
where 9 is a weighting factor and h degrees of freedom for the X distribu- Example The performance of MSPM charts based on CV state-space
2
tion. These can be approximated as 9 = v / (2m) and h = 2m /v, where v models is illustrated by monitoring a high-temperature short-time (HTST)
is the variance and m the mean of the S P E values from the PLS model. pasteurization system [143, 211]. Pasteurization is a heat treatment process
Biplots of scores (t i vs tH1, for i = 1,'" ,a) can also be developed. of foods to secure destruction of pathogenic bacteria without markedly
The control limits at significance level 0</2 for a new independent t score affecting the physical and chemical properties of the end product. In HTST
under the assumption of Normality at any sampling time are pasteurization of milk, the standard time-temperature combination is 72°G
(161 °F) with a residence (holding) time of 15 sec before the pasteurized
(5.25)
milk is cooled. The process (Figure 5.5) consists of a plate heat exchanger,
a centrifugal pump, a flow diversion valve, a boiler and a homogenizer.
where n, Sest are the number of observations and the estimated standard
There are two regulatory valves, the steam injection valve to the boiler and
deviation of the score sample at the chosen time interval and is
the hot water flow valve in preheater section.
the critical value of the Student's t test with n 1 degrees of freedom at
The incoming raw product passing through the regenerator section goes
significance level 0</2 [100, 218]. The use of PLS models will be illustrated
first to the preheater section where it exchanges heat with hot water for
in Section 8.1 for sensor failure detection. controlling raw product temperature entering the homogenizer. After the
homogenizer, raw milk flows to the main heat exchanger and follows the
5.3 SPM Using Dynamic Process Models same procedure as in the generic pasteurization plant.
The primary source of heat is hot water. The hot water is heated by
MSPM techniques rely on the model of the process. If the process has sig- direct steam injection in the hot water heater. Three PID controllers are
nificant dynamic variations, state-space and subspace state-space models used to control product temperature. The first control loop regulates the
110 Chapter 5. Monitoring of Multivariate Processes 5.3. SPM Using Dynamic Process Models 111

outputs (mA). Hot water temperature, holding tube inlet temperature of


pasteurized product, holding tube outlet temperature of pasteurized prod-
uct and preheater outlet temperature of raw product are the output vari-
ables of the process. PID controller of the steam valve regulates the holding
tube inlet temperature of product, and PID controller of the preheater hot
water valve regulates the preheater outlet temperature of raw product.
FDV

Data for model formation are collected under open-loop conditions by


Hot Water Return
exciting the process with pseudo-random binary sequence (PRBS) signals
that are sent to the control valves. PRBS allows the control valves (steam
Preheatcr
Valve
valve and preheater hot water valve) to switch between two different signal
levels depending on a switching probability, P. Two different PRBS series
are used for two actuators. First a series of random numbers r with a
uniform distribution are generated. The signal that is sent to process at
time k changes depending on the value of T(k) and P. At time k, the signal
S(k) stays at the same level as the previous signal S(k-l) if the value of the
random number T(k) is less than the switching probability P. Otherwise
S(k) will switch to the other level. To collect open loop data of the process,
uniformly distributed, different PRBS (5000 x 1) were generated with P of
Hot water temperature
0.94 for each actuator. For steam control valve and preheater control valve,
Balance Tank
Homogenizer! Pump
Holding temperature
tube-out temperature
the actuator command generated by the above procedure changed between
product temperature
Steam valve 6 11 mA and 10 15 mA, respectively. The number of state variables in
Pn:hcalcr valve
_ _ pasteurized product - - hot W<Ller
the state-space model used for process monitoring was chosen as 12. The
design parameters, which are backward and forward time windows to build
Figure 5.5. Diagram of the NCFST pilot HTST pasteurization plant. the time-lagged data matrix, were chosen as 15 in model determination.
Reprinted from [143]. Copyright © 2001 with permission from Elsevier.

Three types of faults were implemented: sensor faults, actuator faults


raw product temperature leaving the preheater. The second loop controls and combination faults (single sensor-single actuator faults and multiple
product temperature entering the holding tube. The last loop controls the sensors-single actuator faults) [143]. Experiments were conducted with d-
temperature of the pasteurized product leaving the cooler. The raw product ifferent fault magnitudes and duration. Actuator faults to the steam valve
temperature at the exit of the preheater is controlled by manipulating the are used for illustration. The faults are caused by keeping the controllers ac-
flow of hot water through the preheat heater exchanger. The product tem- tive and sending a constant signal to the actuators instead of the controller
perature at the holding tube inlet is controlled by manipulating the stearn signal for a specific time period. The controller output is put in a data file.
flow rate into the hot water heat exchanger. The cooler product tempera- Therefore, the system will not know about the actuator abnormality until
ture is controlled by manipulating the flow rate of cold water through the controller notices deviations in the controlled variables.
cooler heat exchanger. The flow diversion valve is controlled by pasteur-
ized milk temperature at the holding tube exit. The measured variables Table 5.1 shows the time and duration of the faults and the detection
are hot water, holding tube inlet, holding tube outlet, and preheater exit times of T 2 and SPE charts. The T 2 chart signals the abnormal situation
temperatures and the steam valve and preheater valve signals. for last two failures that have larger magnitudes. SP EN chart shows all
The variables used in process modeling and fault diagnosis implemen- the failures (Figure 5.6). The arrows and numbers in the figures indicate
tation are four temperature measurements (DC) and two PID controller the faults (first column of Table 5.1) and their time of occurrence.
112 Chapter 5. Monitoring of Multivariate Processes 5.4. Other MSPM Techniques 113

state equations derived by using subspace model identification [211] is used


Table 5.1. Steam valve fault: Times and magnitudes of faults, performance
successfully in abnormal situation management [132, 133] and larger-scale
of SPM charts in terms of sampling times elapsed before detection (NA:
simulation problems such as the Tennessee Eastman process [168]. Multi-
No Alarm generated).
scale PCA by using wavelet decomposition (Section 6.1) has been proposed
for monitoring processes with multi-scale data [8, 344].
Fault Time (sec) Valve Signal (rnA) Duration (sec) T2 SPE N Independent component analysis (ICA) is proposed as an alternative
1 301 7.0 20 7 7 to PCA for MSPM. Various studies indicate that ICA-based MSPM tools
2 521 7.5 20 NA 25 are more successful for non-Gaussian data [162]. Several papers have been
3 741 11.0 20 40 11 published recently to illustrate the strengths and limitations of ICA for
4 961 12.0 20 19 1 MSPM [137, 138, 163, 164].
MSPM methods have been extended to processes that operate in multi-
ple modes. Multi-group and multi-block PLS have been proposed to moni-
tor with a single model a number of similar products manufactured across
different unit processes [189]. Dynamic PCA (DPCA) is used for a two-
step clustering method for process states in agile chemical plants [280].
Process states are first classified into modes corresponding to transitions
and steady-states. DPCA is then used to compare different modes and
transitions and to cluster them using similarity measures. Support vector
machines (SVM) have been used for simultaneous fault detection and op-
eration mode identification in multi-mode operations [40]. SVM is used
for classification together with an entropy-based variable selection method
to discriminate between data clusters corresponding to multiple operational
modes and abnormal data corresponding process faults. Angle-based classi-
fication and fault diagnosis techniques introduced by Raich and Cinar [243]
have been extended to monitor processes with multiple operating modes
[348].
When a process consists of several units that should be monitored in-
dividually along with the whole process, multi-block techniques [322] such
Figure 5.6. Steam valve fault: T 2 of state Variables and SPEN chart with as consensus PCA (CPCA) [335], hierarchical PCA (HPCA) [337], multi-
99% (-) and 95% (- -) confidence limits. Reprinted from [143]. Copyright block PLS (MBPLS) [319, 335] or hierarchical PLS (HPLS) [335] can be
© 2001 with permission from Elsevier. used. In multi-block algorithms, the descriptor variables (for PCA) and the
response variables (for PLS) are divided into several blocks so as to obtain
local information (block scores) as well as global information from process
5.4 Other MSPM Techniques data. The CPCA and MBPLS algorithms normalize the block loadings and
super loadings, while the HPCA and HPLS algorithms normalize the block
MSPM techniques based on PCA and PLS are gaining popularity and re- scores and super scores [237, 322].
placing traditional process performance assessment activities that rely on Moving-window PCA (M\VPCA) has been proposed to monitor time-
univariate tools such as Shewhart charts. PCA techniques have been used varying processes where both the PCA model and the statistical confidence
to monitor an LDPE reactor operation [145], high speed polyester film intervals of the monitoring charts are adapted [316]. MWPCA provides
production [320], Tennessee Eastman simulated process [168, 242, 342] and recursive adaptation within the moving window to adapt the mean and
sheet forming processes [249]. Several new concepts introduced in the 1990s variance of process variables, the correlation matrix, and the PCA model
are being extended. MSPM of strongly autocorrelated processes based on by recomputing the decomposition. MWPCA is compared to recursive
114 Chapter 5. Monitoring of Multivariate Processes

PCA and its performance is illustrated using the fluid catalytic cracking
unit (FCCD) challenge problem [193].
6
5.5 Summary
Multivariate SPM (MSPM) methods based on PCS, PLS, and state-space
models are presented in this chapter. Multivariate monitoring charts pro- Characterization of
vide more accurate information about the process and give warnings earlier
than the signals of univariate charts. MSPM relies on the statistical dis-
tance concept T 2 that can be computed by using all variables if the process
Process Signals
has a few variables. MSPM techniques based on PCA (Section 5.1) or PLS
(Section 5.2) are preferred if the process has a large number of variables and
there is significant colinearity among some of them. PCA is used if only
process variables are used in the development of MSPM charts. \iVhen both Interpretation of a process signal solely based on its temporal evolution
process and quality variables are used, and the two blocks of data need to is often risky. Subtle changes in signal characteristics and key transitions
be related as well, the MSPM charts are based on the latent variables of may be missed leading to incorrect assessment of process status. In some
PLS. In both cases, the status of the process is summarized by using two cases, one can attempt to extract more information from a process signal by
statistics, the Hotelling's T 2 and the squared prediction error (SPE). The transforming it into a domain that might help to accentuate key features of
charts plot T 2 or SPE values computed by using the information collected the signal. One such approach is the use of the Fourier transform (FT) to
at each sampling time on the time axis. The T 2 chart indicates the dis- determine the frequency content of a signal. Yet, it would also be interesting
tance of the current operation from the desired operation. The SP E chart to understand if the frequency characteristics of the signal may be changing
captures the magnitude of the error caused by deviations resulting from in time. In the next section (Section 6.1), wavelet transform (vVT) will be
2
events that are not described by the PCA or PLS-based model. The T briefly introduced to show how both frequency and temporal features of a
and SPE charts are used together, and if either chart indicates a significant signal can be localized. This will be followed in Section 6.2 by a discussion
deviation from the expected operation, the presence of an abnormality in on signal denoising based on wavelet transforms and a hybrid strategy that
process operation must be declared. can also deal with outliers that are often present in real-world signals. The
To include the information about process dynamics in the models, the subsequent sections will introduce methods that help model process signals
data matrix can be augmented with lagged values of data vectors, or model for later use in monitoring applications. First, in Section 6.3, triangular
identification techniques such as subspace state-space modeling can be used episodes will be discussed as a means of obtaining a symbolic representation
(Section 5.3). Other approaches proposed for MSPM are summarized in from an otherwise numerical time series data. A more elaborate strategy
Section 5.4). based on a doubly stochastic model, namely the hidden Markov models
If process monitoring detects an abnormality in process operation, the (HMMs), will be introduced in Section 6.4.2 and the chapter will conclude in
next steps are to find the source cause of the deviation (fault diagnosis) and Section 6.5 with the modeling of wavelet coefficients using the HMM paving
then to remedy the situation. Fault diagnosis is achieved by associating the way for a trend analysis methodology to be introduced in Chapter 7.
process behavior patterns to specific faults (Chapter 7) or by relating the
process variables that have significant deviations from their expected values
to various equipment that can cause such deviations. If the latter approach 6.1 ~avelets
is used. the variables that inflate T 2 or S P E statistics must be determined
since multivariate monitoring charts summarize the information from many In recent years, wavelet transform (WT) has been developed as a novel,
process variables. This is usually done by using contribution plots (Sections to use the term somewhat loosely, 'extension' of the traditional Fourier
3.4 and 7.4). transform (FT) as a means of capturing transitions in the frequency content

115
116 Chapter 6. Characterization of Process Signals 6.1. Wavelets 117

of a signal in time. In signal processing, wavelets are used as a major tool For this finite sequence, the discrete Fourier transform (DFT) is defined as
to analyze non-stationary signals [44, 182] and also well studied for signal n-l
denoising and compression purposes [56, 90, 229]. In process applications,
following the first studies summarized in the book by Motard and Joseph
Zk= L k=O,I, ... ,n-l (6.4)
,7=0
[205], there emerged also fine examples of wavelet applications in process
monitoring [8, 9], denoising [59] and compression [10, 200]. To follow the It is noted that Zk is associated with the frequency .fk == kin. There
historical development of \VT, it is best to start with a brief review of FT are efficient algorithms for computing Zk using the fast Fourier transform
and its early extensions. Then, continuous and discrete WT are introduced (FFT) [220].
separately and illustrated by examples. \Vhen evaluated using real-valued inputs (data), FFT gives output-
s (spectrum) whose positive and negative frequencies are redundant. It
turns out that they are complex conjugates of each other, meaning that
6.1.1 Fourier Transform
their real parts are equal and their imaginary parts are negatives of each
An important feature of a process signal is its periodicity, or, in other words, other. Note that the original signal sequence Zj can be reconstructed from
its frequency content. Fourier theory indicates that it is possible to sepa- its DFT by,
n-l
rate individual frequency components from stationary signals (i.e., stochas-
tic signals whose statistical characteristics do not change over time) and Zj = 1 '\"' Zke21rijkjn j = 0, 1, ... , n - 1 (6.5)
nL....J
make a transformation from the amplitude-time domain to the amplitude- K:=O

frequency domain. This transformation is known as the FouTieT TmnsfoTm Example Two signals are constructed using pure sinusoids with three
(FT). The Fourier transform of a continuous stationary signal z(t) for a different frequencies. The first signal is defined as,
given frequency w in radians is defined as,
h(t) = sin(2.05· 21ft) + sin(O.I· 21ft) + sin(1.5· 21ft) (6.6)
+=
Z(w) = j_= z(t)e-iwtdt (6.1) This is a stationary signal because all of the frequencies are present through-
out the duration of the signal. The second signal is a non-stationary signal
Often, one uses the frequency representation of FT as follows: that contains the same frequencies but at different time intervals, leading
to discontinuities at the points of transition:
+=
Zen = ,_= z(t)e-i2rrftdt (6.2) sin(2.05· 21ft) 0:::; t :::; 20
/ h(t) = sin(O,1 . 21ft) 20:::; t :::; 35 (6.7)
{ sin(1.5 . 21ft) 35:::; t :::; 50
In effect, FT expresses a periodic signal in terms of sinusoidal basis
functions. The spectrum obtained by the transformation shows the overall Figure 6.1 displays these signals along with their FT. One can easily observe
strength with which any frequency w is contained in z(t). When applied to that while the two signals are vastly different, their FT is quite similar,
aperiodic signals, it is required that the signal has finite energy, i,e" underscoring the inappropriateness of using FT for non-stationary signals.

(6.3) The idea of preserving temporal information while obtaining the fre-
quency spectra of any function led to the extension of the standard FT.
For periodic signals, the basis functions (exponential building blocks) can This extension of FT is known as the Gabor transform or the short-time
be related harmonically, while for aperiodic signals, one can only say that Fourier transform (STFT) in signal processing [235], The purpose is to
they are infinitesimally close in frequency. transform non-stationary signals, so that time and frequency information
In practical applications, a process signal is sampled to yield a sequence is preserved. Since a non-stationary signal can be viewed as composed of
of discrete values, i.e., z(t) -+ z(k) where k = 0,1, ... , n 1. Thus, the segments of stationary signals of certain length, the idea here is to decom-
discrete signal can be expressed as a sequence, {z(k) = ZO,Zl, ... ,Zn-l}. pose the non-stationary signal into small segments and perform FT of each
118 Chapter 6. Characterization of Process Signals 6.1. Wavelets 119

poor frequency resolution, and with a wide window, good frequency resolu-
tion but poor time resolution is achieved. Naturally, the main drawback of
STFT is that once the window size is selected, it is fixed for all frequencies.
Example Figure 6.2 depicts the STFT of the two signals introduced in
the previous example. A 21-point Hanning window function is used for
this example, and the window functions overlapped half of the previous
window when translated. One can now observe the difference in the signal
characteristics as the non-stationary signal yields a STFT that sharply
delineates the transition period.

Figure 6.1. Two signals and their corresponding FT.

segment. For this purpose, a window function needs to be chosen. Ideally,


the width of this window must be equal to the portion of the signal where
it does not violate stationarity conditions. The STFT can be defined as
follows:
Z(T, f) = I: z(t)w(t T)e-i27fftdt (6.8)

where w(t - T) is a window function centered at T (see [182] for various


window functions and their comparative merits). It can be seen that STFT Figure 6.2. Two signals and their corresponding STFT.
is a convolution of the signal with the window function. In STFT, the
narrower the window, the better the time resolution, but the poorer the
frequency resolution, and vice versa. This problem stems from the Heisen- 6.1.2 Continuous Wavelet Transform
berg's uncertainty principle which states that one cannot know the exact
time-frequency representation of a signal as no signal has finite time du- Historically, the first wavelet function is attributed to Haar [95] when he
ration and finite frequency bandwidth simultaneously. One can only know replaced the sinusoidal basis functions of FT with an orthonormal function.
the time intervals in which a certain band of frequencies exist, which turns I/J(t), given as, .
out to be a resolution problem. The FT does not have any resolution issues
in the frequency domain by the fact that window used in its kernel is the
1 a ::; t < 0.5
e-27fift function which lasts for all time. In STFT, the window is of fi-
-1 0.5 ::; t <1 (6.9)
nite length, thus the frequency resolution becomes poorer. In other words,
a t tt [0, 1]
with a narrow window selection, STFT provides good time resolution but The most important difference between the Haar basis and the sinusoids
120 Chapter 6. Characterization of Process Signals 6.1. Wavelets 121

is that e- jwt has infinite support, which means that it stretches out to These properties are desirable when representing signals through a wavelet
infinity, while the Haar basis has compact support since it only has nonzero series. In addition [44],
values between 0 and l.
The Wavelet Transform (WT) was formally introduced in the late 1970s • The function should decrease quickly towards 0 as its argument ap-
by Morlet [45], a geophysicist with Elf-Acquitane in France. Morlet's com- proaches infinity, and
pany was searching for oil by sending impulses into the ground and ana-
• The function is null outside a segment of the Real line, R.
lyzing their echoes. As sound waves travel through different materials at
diflerent speeds, geologists can infer what kind of material lies under the The equation for the continuous wavelet transform (CWT) can be ex-
surface. As Morlet analyzed these signals with FT and STFT, he was not pressed as,
satisfied with the constant window sizes in STFT in providing him with
CWT(s, u) = If:T
1 ;00
Z(t)1/J(S, u)dt (6.12)
the much needed frequency resolution. Morlet proposed a new transform
function by taking a cosine wave windowed by a Gaussian (Figure 6.3):
V lsi -00
where CWT(s, u) are the wavelet coefficients and '1/;(05, "11) is the family of
7jJ(t) = Cexp ( _ t;) cos(5t) (6.10) wavelets where sand u represent the dilation (scaling) and translation
(shifting) parameters, respectively. The family of wavelets represent the
By compressing this function in time, Morlet was able to obtain a higher translations and the dilations of the mother wavelet 7jJ(t) and can be ex-
frequency resolution and spread it out to obtain a lower frequency reso- pressed in the form:
lution. To localize time, he shifted these waves in time. He called his
transform the 'wavelets of constant shape' and today, after a substantial 05, U E R, s i= 0 (6.13)
number of studies in its properties, the transform is simply referred to as
the "Wavelet transform. The Morlet wavelet is defined by two parameters:
the amount of compression, called the scale, and the location in time. The best known wavelets are the Daubechies wavelets (dbc) and the Coif-
man wavelets (coif c). In both cases, c is the number of vanishing moments
of the functions. Daubechies also suggested the 'symlets' as the nearly sym-
metric wavelet family as a modification of the db family. The family 'Haar'
is the well-known Haar basis [95]" Figure 6.4 shows a number of wavelet
functions. As can be seen, the Haar functions are discontinuous and may
not provide good approximation for smooth functions.
In general, one would be interested in not only analyzing the signal but
also in reconstructing (synthesizing) the original signal using the wavelet
coefficients. While the mother wavelet can be any function for the former
exercise, it has to satisfy more conditions to provide the latter. The wavelet
functions are designed to have large number of moments (zero-crossings),
Figure 6.3. The Morlet wavelet. thus, the expansion of functions on such wavelet bases needs much fewer
terms than the Taylor expansion. This property leads to very sparse de-
There are several families of wavelets, proposed by different authors. compositions of functions, which facilitates the applications such as filtering
Those developed by Daubechies [46] are extensively used in engineering and data compression. For ideal signal reconstruction, the wavelets should
applications. "Wavelets from these families are orthogonal and compactly satisfy the orthogonality condition if the same wavelet is to be used for
supported, they possess different degrees of smoothness and have the max- both analysis and synthesis. FOr more details on the properties of 'WT, the
imum number of vanishing moments for a given smoothness. In particular, reader can consult a number of excellent books [228, 235].
a function f (t) has c vanishing moments if Figure 6.5 depicts the frequency coverage of FT, STFT and WT. While
.I tnf(t)dt = 0, n = 0, 1, ... , c 1 (6.11)
FT provides information on the power of frequencies present in the signal,
STFT can follow the signal in fixed windows and show the presence and
122 Chapter 6. Characterization of Process Signals 6.1. Wavelets 123

21
Haar

1[
Haar
:[
~
'"o
0, 0' .~t--t--t-t-t-t-+---J
) 0' "2'" 1-Yf-l-+-'+-4-4-'--f-L+Y
-1
-1 0 2 -1 0 2
.t I-I-t-t-t-t-t-J '"
:[
0'
_;~ d_b4_ _
Frequency Time Time

)
0 2 024 6
Figure 6.5. The frequency coverage of FT, STFT and WT.

:F\i~m, I
-2'-----~-~~-~-~--'
o 2 3 4 5
_:~ o 2 3 4 5
the appearance and disappearance of the band of wavelet coefficients at
higher scales. One can also see clearly the difference in the two frequencies
at the lower scales.
2,---------------;
coif3

A
'I
-2
0 5 10 15

Figure 6.4. The scaling (left) and wavelet (right) functions for four wavelets,
Haar, Daubechies 4, Symlet 3 and Coifiet 3.

the power of frequencies present in each window. Furthermore, while the CWT of Signal 1 CWT of Signal 2

tiling of STFT is linear, the tiling of WT is logarithmic [235], indicating


that the building blocks in two decompositions are different, and frequency
localization in WT is proportional to the frequency level. Thus, for vVT,
time localization gets finer in the highest frequencies. The multiresolution
decomposition concept that will be discussed next allows for an arbitrary
tiling of the time-frequency plane.

Example The CWT of the signals defined in Eq. 6.6 and Eq. 6.7 are 100 200 300 400 500

shown in Figure 6.6. Here, the Daubechies 4 (db4) wavelet is used. The
CWT yields a three-dimensional representation similar to that of STFT
Figure 6.6. Two signals and their corresponding CWT.
and easily discriminates between the two signals, correctly pinpointing the
times at which the signal frequency changes. For h (t), since all frequency
components are present for the entire duration, two prominent bands are
observed spanning the time axis at different scales. Note that since there are 6.1.3 Discrete Wavelet Transform
two higher-frequency components of the signal, the band at the lovver scales
appears fuzzy as one frequency masks the other. For h(t), the appearance The CWT results in wavelet coefficients at every possible scale. Thus,
and disappearance of the low-frequency behavior are delineated clearly by there is a significant amount of redundancy in the computation. But there
124 Chapter 6. Characterization of Process Signals 6.1. Wavelets 125

is an easy way to obtain \VT, which is called the discrete wavelet transform
(DWT). DWT is a special case of the WT and is based on dyadic scaling
and translating. FOr most practical applications, the wavelet dilation and
translation parameters are discretized dyadically (s = 2j , u = 2.1 k;).
A process signal z(t) can be represented through DWT as follows [44]:

jo
z(t) = I: ,k(t) + I: I: dj,k Pj,k(t)
1 (6.14)
k j=-= k
DECOMPOSITION
with

Ck == J z(t)cPjo,k(t)dt

dj,k == J (t)dt

Here, the wavelet function, ?jJ(t) , and the scaling function, cP(t) (see Figure
6.4), are defined as

(t) 2- j / 2 '1jJ(T j t - k;) (6.15)


cP.1o,k(t) Tj/2 cP (T jOt k), j,k E Z (6.16) RECONSTRUCTION

In this representation, integer j indexes the scale or resolution of analy-


Figure 6.7. The signal decomposition and reconstruction using FIR filters.
sis, i.e., smaller j corresponds to a higher resolution, and jo indicates the
Note that ,9 and 11 are the dual filters and S is the reconstructed signal.
coarsest scale or the lowest resolution. k indicates the time location of
the analysis. For a wavelet cP(t) centered at time zero and frequency wo,
the wavelet coefficient dj,k measures the signal content around time 2j k scaling functions can be expressed as:
and frequency 2-jwo. The scaling coe,fficient Ck measures the local mean
around time 2jo k. The DWT represents a function by a countable set of ?jJ(t) = hI: h(k)?jJ(2t - k) (6.17)
wavelet coefficients, which correspond to points on a 2-D grid of discrete k
points in the scale-time domain.
cP(t) = hI:g(k)'ljJ(2t - k) (6.18)
Mallat [182] proposed an algorithm, referred to as the multiresolution k
signal decomposition (MSD), to efficiently perform D WT. Its basic idea is to
use a low-pass filter (see Section 6.2.1) and a high-pass filter to decompose This approach greatly facilitates the calculation of wavelet and scaling co-
a dyadic-length discrete signal (time series) into low frequency and high efficients as typically implemented in the Matlab® Wavelet Toolbox [199].
frequency components, respectively. As shown in Figure 6.7, for a signal One can associate the scaling coefficients with the signal approximation,
S consisting of 128 points, one also performs a down-sampling operation and the wavelet coefficients as the signal detail.
to reduce the number of points in each scale by half. It is noted that, for Due to the down-sampling procedure during decomposition, the number
discrete signals, the upper limit for the scales is bounded by the maximum of resulting wavelet coefficients (i.e., approximations and details) at each
number of available details in the signal. level is exactly the same as the number of input points for this level. It is
One can show that the relationship between high-pass and low-pass sufficient to keep all detail coefficients and the final approximation coeffi-
finite impulse response (FIR) filters and the corresponding wavelet and cient (at the coarsest level) to be able to reconstruct the original data. The
126 Chapter 6. Characterization of Process Signals 6.2. Filtering and Outlier Detection 127

signal reconstruction involves the reverse procedure and up-sampling which


inserts zeros in between signal values from the previous level (see Figure
6.7).
Example The DWT of the two signals defined in Eq. 6.6 and Eq. 6.7 are

'f2SZ\Z0\J
shown in Figures 6.8 and 6.9, respectively. Some random noise (zero mean,
unit variance) is also added to each signal to distort the original features
slightly. Figures show the approximate and the detail coefficients of the
MSD for a three level decomposition. Following observations are made: ·2 5 10 15 2;J 25 30 35 40 45 50
,:~
-i.. 5 10 15 20 25 30 35 4-() 45 50

• The detail coefficients show the strength of the signal component


removed at each scale level. Especially in Figure 6.9, one can clearly
see how the first level removes the noise components followed by signal
components with distinct frequency behavior. 'M?VSN
-2 5 10 15 20 25 30 35 40 45 50

• Each approximate signal level depicts a coarser approximation of the


signal, with the last level (level 3) showing the key underlying signal
feature (mean).
,:~
'" 5 10 15 20 25 30 35 40 '\5 50

• As one can see in the detail signals, each level represents a band-pass
filtered signal, thus comprising a range of characteristic frequencies. Figure 6.8. The DvVT of noisy signal 1 using Daubechies 4 wavelet.

• While high-frequency noise is expected to be removed at the first


decomposition level, one can see that the effect of noise persists in all • An important property of wavelet bases is their lack of translation-
scales. al invariance. In other words, when a pattern is translated, its de-
scriptors are not only translated but also modified. This is a direct
There are a few issues that any user should be aware of in applying the consequence of the down-sampling procedure and leads to distorted
WT to the signals of interest. Below, some of these issues are highlighted reconstruction of the underlying signal features. A possible solution is
(the reader is referred to the references mentioned earlier for a more detailed to omit down-sampling, resulting in a redundant family of coefficients.
discussion) :

• It is not a straightforward task to come up with a procedure that 6.2 Filtering and Outlier Detection
would lead to the best mother wavelet for a given class of signals. N-
evertheless, exploiting several characteristics of the wavelet function, The measurements of process signals inherently contain noise that consist-
one can determine which family of wavelets would be more appropri- s of random signal disturbances interfering with the actual signal. The
ate for a specific application. signal noise can be due to variations in voltage, current or the measure-
ment technique itself. If the signal-to-noise ratio (SNR) is small, one may
• As a general rule, all orthogonal wavelets lack symmetry. This be- encounter misleading or biased results in subsequent data analysis steps.
comes an issue in applications such as image processing where sym- Thus, denoising (noise filtering) is a crucial step in signal analysis that is
metric wavelets are preferable. The symmetric wavelets also facilitate aimed at removing random signal behavior and producing a clean signal
the handling of image boundaries. that contains relevant process characteristics. Numerous techniques have
• Dealing with boundaries becomes an issue in wavelet analysis of finite been proposed for filtering process signals, going back to the seminal paper
array of data. These edge effects or singularities can be avoided by by Kalman [135] and others, [221, 309].
using adaptive filters near the signal boundaries. In addition to noise, process data may also contain outliers (gross er-
128 Chapter 6. Characterization of Process Signals 6.2. Filtering and Outlier Detection 129

Signal 2 signal and fj(k) is the noisy signal that one observes. The filter equation
represents a first-order difference equation given by,
y(k) = (3fj(k) + (1 (3)y(k 1) (6.19)
Here, fJ(k) represents the estimate of the true signal. Further, (3 is the filter
constant, or, in other words, the filtering bandwidth. By a judicious choice
of (3, one can remove high-frequency noise components from the signal and
retain the relevant signal characteristics. Figure 6.10 shows the frequency
response of a first-order filter and how the bandwidth changes as a function
of

<0 ,:~ 71
·2 5 10 i5 20 25 30 15 40 45 50

Figure 6.9. The DvVT of noisy signal 2 using Daubechies 4 wavelet.

rors) that may comprise up to 10 % of the data points in low-quality data


sets [101]. In practical applications, one might consider the observations ex-
ceeding five standard deviations as outliers, and in univariate data sets, the
outliers can be easily identified by visual inspection. However, for higher di- Figure 6.10. The frequency response of a low-pass filter with different filter
mensional data sets, this task becomes cumbersome and often intractable. constants. ,3 = 0.1 (solid), (3 = 0.5 (dash), /3 = 0.9 (dashdot)
Due to their influence on the actual signal characteristics, outliers need
to be removed in a systematic manner, often through methodologies that One drawback of the first-order filter is associated with the slope of the
perform this task in an unsupervised manner [26, 57]. frequency response that indicates how sharp the cut-off is for high frequency
components. Since the frequency response attenuates rather slowly at high
frequencies, this would create a somewhat ineffective denoising performance
6.2.1 Simple Filters if the SNR is relatively low.
In the signal processing literature, there are a number of filtering techniques Example Figure 6.11 shows a noisy signal and the effect of ,3 on the
that can be adopted for a variety of purposes. Here, only two of them will denoising performance. As one can observe, the closer the value of ,3 is to
be considered, chiefly in the context of denoising, namely the low-pass filter 1, the less effective the denoising is, since the bandwidth becomes larger,
and the median filter. The goal is to introduce the reader to the concept of almost reproducing the original signal. Yet, smaller values of ,3 may also
filtering and also prepare the groundwork for wavelet filtering techniques be ineffective as the filter tends to remove relevant signal characteristics as
to be discussed next. the bandwidth gets smaller, resulting in the removal of signal components
The so-called low-pass filter removes the high frequency components of at moderate to low frequencies.
a siO"nal and was referred to earlier in Section 6.1.3 (see also EvVMA charts Since the test signal is known in this case, the performance of the fil-
b
in Section 2.2.4). Suppose that y(k), with k = 1, ... n, represents the true tering methods can be evaluated by measuring the fidelity of the denoised
Chapter 6. Characterization of Process Signals 6.2. Filtering and Outlier Detection 131
130

adjustable parameter for MM, and defines the quality of denoising as will

w~ Actual Signal be illustrated in the example below.


Example Figure 6.12 shows the performance of the MM filter for three
: 10 different choices of the window length. It can be seen that a smaller window
is almost as good as the low-pass filter with fJ = 0.5 and longer window

':~
00 ~ 100 1~ ~ 2~ 300 3~ ~O ~O WO
lengths actually produce a signal estimate much closer to the actual signal
as the lower MSEs indicate.

l;;;:=;~:;~;::~ f;~/\r;v~~
: Do 50 100 150 200 250 300 3.50 400 450 500

':~
.~
, beta=O.5,mse=6S

. OJ 50 100 1:;.1) 200 250 30-0 350 400 450 500

E~~
< beta=O.9. mse=281

:.,0 un 50 100 150 200 250 300 350 400 4:50 500

Figure 6.11. The actual test signal, the test signal with noise and its de-
noised estimates using three different filter constants.
:~
/0
10
50 100 -:5D 200 250 300 . 350 400 450 500

signal to the original signal. The mean square error (1\1 SE) can be calcu-
.~
°0 50 100 "ISO 200 250 300 350 4(10 ,c1-50 500

lated as,
MSE= (6.20) Figure 6.12. The actual test signal, the test signal with noise and its
n denoised estimates using the moving median filter with different window
As shown in Figure 6.11, the lowest lVISE is associated with (3 = 0.5 and lengths.
one can judge the quality of denoising visually as well. . .
Another simple filter is the moving median (MM) filter first developed
by Tukey [299]. In this filtering technique, the median of a ':rindow ~on­
taininO" an odd number of observations is calculated as the wmdow shdes 6.2.2 Wavelet Filters
over the entire signal. As a result, the original signal is freed from noise as Wavelet-based denoising methods involve taking the discrete wavelet trans-
well as from outliers. Davies [47] showed that the MM filter could handle form (DvVT) of a signal, passing the resulting wavelet coefficients through
signals that have moderate or high SNR, or conta~inatedw~th ~lOise, which a thresholding step and then taking the inverse DWT (Figure 6.13). If a
comes from asymmetric distributions. The MM filter equatlOn IS expressed signal has its energy concentrated in a small number of wavelet dimensions,
as, the magnitude of its coefficients will be relatively large compared to noise
y(k) = med (f)[l - w/2]' ... , y[l + w/2]) (6.21) components that have their energy spread over a large number of coeffi-
with I = w/2 + 1,w/2 + 2, ... n w/2 and w + 1 is t~e
window le:lgth. cients. This implies that, during thresholding (or shrinkage), the wavelet
Similar to the filter constant of the low-pass filter, the wmdow length IS the transform will remove the low amplitude noise or the undesired signal com-
132 Chapter 6. Characterization of Process Signals 6.2. Filtering and Outlier Detection 133

ponents in the wavelet domain, and an inverse wavelet transform will then coefficients for the first two levels of the wavelet decomposition are selec-
reconstruct the desired signal with little loss of relevant features. tively denoised. As the number of decomposition levels (scales) increases,
the reconstructed signals tend to become smoother, losing some of the rel-
.evant features. Thus, the MSE significantly improves compared with the
Thresholding & Inverse 1NaveJet
performance of the simple filters. One can also see that the SureShrink
..... Transform
Discrete Wavelet
(OWT) f-+ Shrinkage f-+ Transform (IWT) 1+ with db8 performs the best for this application.

Figure 6.13. Wavelet-based denoising strategy.


F;~~JV~~
°0 50 100 150 200250 300 350 400 450 500

"~
The following thresholding methods can be defined:
1. The har-d-thresholding filter, ·PH , selects wavelet coefficients that 5

exceed a certain threshold and sets the others to zero:


°0 50 1\10 150 200 250 300 350 400 450 500

:~
Idl;::: T (6.22)
otherwise
°0 50 100 150 200 250 300 350 400 450 500
? The soft-thr-eshold'ing filter, F L , is similar to the hard-thresholding
filter, but it also shrinks the wavelet coefficients above the threshold, o~
5

d- T Idl;::: T °0 50 100 150 200 250 300 350 400 ..50 SOQ

Idl < T
"~
FL(d) = 0 (6.23)
{ d+T
Idl:'S: T 5

°0 50 100 150 200 250 300 350 400 450 500


The soft-thresholding is often preferred as the hard-thresholding has dis-
continuities that introduce artifacts to the denoised signal. The next step
is to determine the threshold value, T. Figure 6.14. The actual test signal, the test signal with noise and its de-
Donoho and Johnstone [56] suggest T = J2(J"~log(n) for thresholding noised estimates using VisuShrink and SureShrink.
(also called the universal threshold). Here, (J"~ is the estimate of the noise
variance and n is the length of the time series. If soft-thresholding is used
in conjunction with this threshold, then the estimates with high probability
are as smooth as the original ones and with small values for their risks in 6.2.3 Robust Filter
both the bias and the variance. This approach is referred to as VisuShr-ink The performance of denoising methods tends to deteriorate in the presence
by Donoho and Johnston [56], in reference to the good visual quality of the of outliers. Doymaz et al. [59] proposed a robust filtering strategy that
reconstruction obtained by choosing the appropriate threshold and simple uses a median filter (MM) (see Section 6.2.1) in tandem with the coefficient
'shrinkage' of wavelet coefficients. However, it may overly smooth the signal denoising method [7]. Here, this strategy is briefly reviewed and its key
for large n. Another shrinkage approach is referred to as Sur-eShr-ink: that benefits for denoising are pointed out.
uses a hybrid of the universal threshold and the SURE threshold along The robust filtering strategy is depicted in Figure 6.15 in which the
with soft-thresholding, and is derived from minimizing Stein's unbiased primary goal of MM is to remove outliers so that the wavelet denoising
risk estimator [282]. step can be more effective. It should be recognized that while MM removes
Example Figure 6.14 displays the denoising performance of VisuShrink outliers, it also removes some noise elements thus complementing the sub-
and SureShrink strategies using two different wavelets. Here, the wavelet sequent denoising step. This is an important issue, even in the absence
Chapter 6. Characterization of Process Signals 6.3. Signal Representation by Fuzzy Triangular Episodes 135
134

of outliers, since the decision regarding the length of the moving window Table 6.1. 1\;18 E values for various filtering strategies.
becomes less critical and choosing it as 3 often suffices.
Wiener Thresholding Robust Filtering
4.422 0.4768
Dlscrete Wavelet
..... Transform (DWT) f---+
DWT on Wavelet
Coefficients
Gaussian white noise, N(O, 1), and 1% outliers were added using Poisson
~ distribution. The results [59] show that the robust filtering approach per-
forms quite well. Figure 6.16 depicts the visual performance of the robust
Thresholding &
Shrinkage filtering strategy while Table 6.1 demonstrates how 1'v18E is minimized
with this tandem approach. Using just a simple MM filter, the signal is
~ freed of outliers and the noise quite satisfactorily, while the wavelet 'Wiener
IVVT onWave!et Inverse Wavelet
thresholding suffers from the presence of outliers significantly. The tandem
Coefficients f-+ Transform (IWT) f.+ approach appears to achieve the minimum 1\18E.

60 r---.,.--~-.,.---~~--~-~
40
Figure 6.15. The schematic of the robust filtering strategy.
40

The denoising step in Figure 6.15 uses a novel filtering scheme that uses
two wavelet shrinkage stages. In addition to the traditional thresholding
and shrinkage, the coefficient denoising method uses a second thresholding
and shrinkage step for the wavelet coefficients. It has been proven that this
strategy has better denoising performance [7] when the coefficient denoising -20 L-~_~_~_~_-.<J

02 04 0.6 0.8 0.55


step uses the Wiener filter which requires the knowledge of the statistics
of the signal and the noise. An approximation of the optimal Wiener filter
can be obtained using the diagonal elements as, Figure 6.16. The robust filtering technique performed on the Bumps bench-
mark signal (grey solid) with noise and outliers. The dashed signal is the
d2
Fw(d) = d2. 2 (6.24) signal estimate. Reprinted from [59]. Copyright © 2001 with permission
+ (j
from Elsevier.
The noise variance (j2 can be estimated from the wavelet coefficients of the
noisy signal and then used in Eq. 6.24 [90]. It is known that the larger the
variations in the input data power (relative to noise variance), the greater
the loss in performance due to simple thresholding compared to optimal
6.3 Signal Representation by Fuzzy Triangu-
\;Viener weighting. In general, if the signal is smooth, it will have a larger lar Episodes
energy spread over the scaling coefficients, resulting in a substantial per-
formance loss. Thus, the coefficient denoising approach, by taking double The analysis of process signals may be facilitated if the time series data can
transformation and using \Viener thresholding in the space of coefficients, be cast into a symbolic form. The relevant trends and generic data features
spreads the energy less over the detail coefficients, resulting in better perfor- can then be extracted and monitored using this qualitative representation.
mance than simply using the 'Wiener shrinkage. Naturally, this procedure Such a transformation is often carried out by defining a set of primitives (al-
cannot be continued further, because the signal would then start to lose its phabet) that define a visual characteristic of the signal [78, 142,247]. Here,
fundamental features. the methodology proposed by Stephanopoulos and coworkers is discussed
Example The Bumps benchmark signal is contaminated with additive [9, 34, 35]. They treated the problem of trend representation graphically
136 Chapter 6. Characterization of Process Signals 6.3. Signal Representation by Fuzzy Triangular Episodes 137

by utilizing a declarative language based on the notion that at the extrema


Concave Downward Concave Downward Concave Upward
or inflection points, the first or second derivatives, respectively, are zero. Concave Upward
Monotonic Increase ililonotonrc Decrease MonotOnic Decrease Iv!onotonic Increase
Thus , an episode , ErLa, b"j! is described as any part of a signal or process trend
given by
E[a,b] = {(t a, Ya), (tb' Yb,)} (6.25)
+
+
with a constant sign of the first and the second derivatives, + +

. (88t
s7gn - Y)
[a,b]
= constant; sign ( 8--~ )
8t
2y
[a,b]
= constant (6.26) Linear Increase linear Decrease Constant

Here, the time series segment is defined by the time duration of the episode, G
[t a , tb], and the signal magnitude, [Ya, Yb] (Figure 6.17). For each episode, a [dy]= + [dy]=
triangle is created, where one side of the triangle is constructed by drawing [ddy]= 0 [ddy]= 0 [dy]=
a line between the two end points of the episode. The other sides are drawn
by connecting the tangents of these endpoints, up to the point where the Figure 6.18. Definition of primitive shapes. Reprinted from [340]. Copy-
slopes intersect. It is noted that this is a semi-qualitative representation right © 1998 with permission from Elsevier.
because the positions of the endpoints (duration, magnitude) as well as the
slopes of the tangents to the curve at the endpoints are also retained.
wavelet analysis and scale-space filtering in conjunction with the triangular
representation. vVong et al. [340] suggest simple wavelet denoising as a
~
y(t)
I I I I i prelude to episode construction.
[dxj=+ To obtain a fully symbolic representation of the time series, Wong et
Vi
[ddxj=+

I I ,/
W i
I oJ [340] propose a fuzzification procedure. In fuzzy logic [346], the basic
premise is to characterize the mapping of a set of inputs to a set of outputs
Ys I It --~"- - -[-- by using a set of if-then rules. An attractive feature of this approach is
its ability to convert numeric data into linguistic variables. A membership
time function is used to define how well a variable belongs to the output based
on the degree of membership between 0 and 1. When a set of inputs is to
Figure 6.17. Definition of an episode. Reprinted from [340]. Copyright be mapped to a set of outputs, a combination of if-then rules, membership
© 1998 with permission from Elsevier. functions and logical operators are used in order to create a fuzzy model.
Based on the if-then rules and the logical operators (AND, OR, etc.), these
Cheung and Stephanopoulos [34] were able to reduce the time series inputs can be combined to generate an output for each rule.
into a semi-qualitative form using seven primitive shapes that consist of In Wong et al. [340], the quantitative values of magnitude and duration
four triangles and three straight lines (Figure 6.18). of a triangular episode are fuzzified using symbolic variables (small, medi-
A drawback of this method is its sensitivity to high-frequency noise um, and large) expressed by two membership functions (Figure 6.19). vVith
in the time series, thus a filtering step becomes necessary. Cheung and this technique, each triangle or line in Figure 6.18 can be transformed into
Stephanopoulos [35] overcome this problem by using a filtering process nine different qualitative triangles or lines. Figure 6.20 shows the triangle,
described as qualitative scaling. In this geometric approach, the origi- A, now expressed as nine new triangles. For example, smA represents a
nal sequence of letters is sequentially reduced by approximating a sub- symbol with the characteristic shape described by the letter A that is s-
sequence of letters within the original sequence by a trapezoid. Bakshi mall, s, in magnitude and medium, m, in duration. All letters are similarly
and Stephanopoulos [9] point out that this is a heuristic formulation and fuzzified with the exception of G which represents a straight line with no
lacks computational speed. They offer an alternative strategy by utilizing change in magnitude. Hence, one only has three new fuzzified lines, sG,
138 Chapter 6. Characterization of Process Signals 6.4. Development of Markovian Models 139

mG. and IG. Now. a new alphabet emerges with 57 symbolic characters. on this subject. The most notable applications of HMMs are found in the
The new alphabet· is more versatile because it allows the comparison of fields of automatic speech recognition (ASR) [117, 240, 239] and bioinfor-
the sequences based on the size of the characters as well as their shap~. matics [65, 144]. In ASR, the goal is to differentiate among a vocabulary of
This symbolic representation will be the basis for a process trend analysIs spoken words while recognizing the same words spoken by different people.
strategy that will be introduced in Section 7.1. In biological sequence matching, one attempts to match unknown sequences
of amino acids to a known family of proteins. Given the ability of HMMs
'\ to model time series data, a number of studies on fault detection and trend
0.9
{J,,8
\
small \ /
/ \ ;1/
\ large
analysis have been reported [277, 286, 340].
Before introducing the HMMs, it is imperative to understand the con-

A~;"J\
0.7
0.6 cept of Markov chains and the probability models. Next subsection provides
0.3 a brief account of these topics.
0.4
0.3
0.2
\. \ "-. 6.4.1 Markov Chains
A (first-order) Markov process is defined as a finite-state probability model
in which only the current state and the probability of each possible state
Figure 6.19. Membership functions for the duration and .magnitu~e.of prim- change is known. Thus, the probability of making a transition to each state
itive shapes. Reprinted from [340]. Copyright © 1998 WIth penmsslOn from of the process, and thus the trajectory of states in the future, depends
only upon the current state. A Markov process can be used to model
Elsevier.
random but dependent events. Given the observations from a sequence of
events (Markov chain), one can determine the probability of one element of
the chain (state) being followed by another, thus constructing a stochastic
V ~ ~ model of the system being observed. For instance, a first-order Markov
ssA smA s/A chain can be defined as
[7
msA mlA (6.27)

Here, we used the shorthand notation qt to denote the value of q at time t


IsA ImA IIA as q(t) in an attempt to simplify the subsequent expressions. Here, t is used
to denote the traditional use of the time instant in the Markovian modeling
literature. The notation P(::rly) indicates the conditional probability of
Figure 6.20. Extending the alphabet to include th.e :uzzified trian~le A.
observing ::r, premised on the presence of y. The change of the states is
Reprinted from [340]. Copyright © 1998 with permIssIon from ElseVIer.
captured by transition probabilities, ai), given by

= Si), lSi,jSM (6.28)


6.4 Development of Markovian Models where jl;f represents the number of states. The transition probabilities
satisfy the following relationships:
The Hidden Markov Model (HMM) is a powerful statistical tool for mod-
elino· a sequence of data elements called the observation vectors. As such,
ai) > 0 (6.29)
extrbaction of patterns in time series data can be facilitated by a judicious
IVI
selection and training of HMMs. In this section, a brief overview will be pre-
LOi) 1
sented and the interested reader can find more details in numerous tutorials
)=1
140 Chapter 6. Characterization of Process Signals 6.4. Developll1ent of Markovian Models 141

Figure 6.21 shows a cyclic three-state Markov chain, J.L {51, 52, 53}' Exall1ple The classic example of a Markov chain is the weather pattern
Given an initial state and the matrix of transition probabilities, one can modeling [240]. Again, consider a three-state Markov model in which the
not only estimate the state of the chain at any future instant but can also states, characterizing the weather on any given day t, are given as follows:
determine the probability of observing a certain sequence, using the state State 1: rainy; State 2: cloudy; State 3: sunny. The state transition
transition matrix. The examples below demonstrate these cases. probability matrix is defined as,

0.40 0.30 0.30 ]


A = 0.20 0.60 0.20 (6.31)
[ 0.10 0.10 0.80

The goal is to determine the probability of observing the sequence 'cloudy-


rainy-sunny' for the next three days, if the weather today is sunny. In
other words, one is interested in the probability of the observation sequence,
0= {53, 52, 51, 53}.

P( OliVI odel) P(53' 52, 5 1,5311\11 odel)


P(53)· P(52153)' P(5Ij52)' P(5315I)

1· (0.2) . (0.3) . (0.1)


0.006(0.6%) (6.32)

Here, the notation 71i = P(ql = 5i) is used for the initial state probability.

Figure 6.21. A three-state Markov model.


6.4.2 Hidden Markov Models
Exall1ple An analysis is presented where the consequences of brand Hidden Markov models (HMMs) are doubly stochastic in nature. In other
switching between three different brands of laundry detergent, X, Y and words, the sequence of states, 5 = 5 1,52,53, ... , 5M, of a Markov chain are
Z are explored. A market survey is conducted to estimate the following unobservable yet still are defined by the state transition matrix. In addition.
transition matrix for the probability of moving between brands each month:

0.80 0.15 0.05]


each state of the Markov chain is associated with a discrete output symbol
probability that generates an observable output sequence (outcome), °
01, 02, ... , 0T with length T. HMMs are finite because the number of states.
=

A = 0.05 0.90 0.05 (6.30)


[ 0.25 1\;1, as well as the number of observable symbols V = VI, V2, ... , V L of an
0.70 0.05 output alphabet, i.e., L, remain fixed for a particular model. Since it is only
the outcome, not the state visible to an external observer and the states are
For the first (current) month, market shares are given as 40%,25% and 35%
'hidden' to an outside observer, such a system is referred to as the Hidden
for brands X, Y and Z respectively. This establishes the initial condition,
Markov Model.
51 = [0.40,0.25,0.35]. The expected market shares after three months have
elapsed will be estimated. Hence, after one month has elapsed, the state of Exall1ple The concept of HMMs can be best explained by the urn-and-ball
the system is given as 52 = 5 1A = [0.38,0.36,0.26] and after three months example discussed by Rabiner [240]. Consider a collection of urns where
2 each urn contains a different proportion of colored balls which defines the
have elapsed the state of the system is given as 54 = 53A = 52 A
[0.318,0.5084,0.1736]. Note that the elements of 54 add to one as required. probability of drawing a specific colored ball from that urn. The data are
Hence, the market shares after three months have elapsed are given as generated by drawing a colored ball from an urn, and then based on that
31 50.84% and 17.36% for brands X, Y and Z, respectively. selection, a new urn is chosen and a another ball is drawn. The process
142 Chapter 6. Characterization of Process Signals 6.4. Development of Markovian Models 143

is continued until a sequence of balls is generated. In this process, the where


sequence of the chosen urns is not announced (thus, hidden) and only the
sequence of balls is known (observed). {P(Sil t = In
A {aij} {P(qt+1 = Sjlqt = Sin (6.34)
B {bj(ln = {P(vlattlqt = Sjn

An initial state distribution, 7T, defines the probabilities of beginning the


observation in each state. The matrices A and B are the probability density
distributions of the state transitions and the observation symbols, respec-
tively.
The number of states is usually unknown, but some physical intuition
about the system can provide a basis for defining 1\;[. Naturally, a small
number of states usually results in poor estimation of the data, while a large
number of states improves the estimation but leads to extended training
times. The quality of the HMM can be gauged by considering the residu-
als of the model or the correlation coefficients of observed and estimated
values of the variables. The residuals are expected to have a Normal distri-
bution (N(O, (]"2)) if there is no systematic information left in them. Hence,
the normality of the residuals can provide useful information about model
performance in representing the data.
The number of observation symbols is more definitive as it corresponds
directly to the possible outcomes of the system being observed (e.g., the
number of different colors that the balls would have in the urns). There are
Figure 6.22. A three-state HMM showing the state transitions and the out- three key problems that need to be solved: training (learning), evaluation
put probabilities. Reprinted from [340]. Copyright © 1998 with permission and state estimation.
from Elsevier. For the evaluation problem, the probability of an observation sequence
0= 01,02, ... , 0T is determined, given the model A, P(OIA). This probabil-
To illustrate the construction and properties, the three-state Markov ity can be found using the forward part of the inductive forward-backward
model depicted in Figure 6.21 is extended to express a HMM as given algorithm (Baum-Welch algorithm [13]), which is initialized by
in Figure 6.22. The key difference is that each state now has a set of
observation symbols, along with the probability of observing that symbol, (6.35)
01, from a given alphabet of symbols, V, in that state i, P(oIISi).

Example For the ball-and-urn example, the observations clearly are the where at (i) is the forward variable,
balls that are drawn and each state is represented by an urn. Thus, the
(6.36)
observation probability corresponds uniquely to the specific urn from which
a ball is drawn. The three-state HMM given in Figure 6.22 can be used to Then, using the forward inductive equation, the induction step is per-
model the observed symbols (balls) by estimating a set of HMM parameters. formed:
A HMM, denoted as A, can be uniquely described (parameterized) by
lVI. the number of states, L, the number of observation symbols, and three [~at(i)aij] bj(ot+d (6.37)
probability measures, 7T, A, and B.
where at(j) is the probability of being in state j and observing the partial
A (7T,A,B) (6.33)
symbol sequence 0 = 01,02, ... , 0t up to time t, given the HMM, A. Then,
144 Chapter 6. Characterization of Process Signals 6.5. Wavelet-Domain Hidden Markov Models 145

we have In the state estimation problem, the aim is to find a state sequence that
M best explains the real observations. For this, a new variable I is defined in
P(OI.\) = L aT(i) (6.38) terms of the forward (a) and backward (.8) variables of the Baum-Welch
i=l algorithm:
which yields the final result. This algorithm establishes the basis for the (6.46)
classification of faults as will be illustrated in Sections 7.1 and 7.2.
In the training pmblem, the model parameters are estimated that best
where
describe the observation sequence. In other words, the observation sequence IVI
is used to train the HMM by adjusting the model parameters. This training L~(t(i) = 1 (6.47)
is again accomplished through the Baum-Welch algorithm [13] that uses the i=]
maximum likelihood estimation approach to adjust the parameters, K, A, The new variable, ~(f( i) represents the probability of being in state i at time
and B in order to maximize P(OI.\). The backward part of the forward- t. The size of the matrix I is T x 1\1£. One can then find the most likelv
backward algorithm is used in the training step and initialized with state at time t using It (i): "
(6.39)
qt = arg maxbt (i)], (6.48)
l::;i::;T
where ,6k (i) is the backward variable,
·While this provides the most likely states for each t, there may be a problem
(6.40) with the state sequence obtained from this algorithm, as the algorithm
ignores the probability of occurrence of sequences of states. This can be
The inductive backward equation is given by remedied by using the Viterbi algorithm. Further details of these algorithms
can be found in [240]. Many algorithms are also developed as Matlab®
M
(6.41 ) toolboxes, a notable one being the toolbox by Thorvaldsen [295] that focuses
(i) = Laijbj(ot+d.6t+l(j)
on the solution of problems in bioinformatics.
i=]

The combination of the two inductive parts are essential in the re-estimation
of the parameters of the Hl\IM. By maximizing the auxiliary function 6.5 Wavelet-Domain Hidden Markov Models
In DvVT, the scaling coefficients are decomposed iteratively at each scale
Q(.\I'\) = L P(QIO, .\)log[P(O, .\1'\)] (6.42)
(Figure 6.7), clearly showing the dependency between adjacent scales. For
Q
orthogonal wavelet decomposition, it is expected that the wavelet coeffi-
the re-estimation formulas cients are uncorrelated bet\veen scales. However, for most practical ap-
plications, there is a residual dependency after the signal decomposition,
a] ( i),6 1 (i) (6.43) even though the dependency of the wavelet coefficients may be local. This
L~--l] at (i)aijb j (ot+l),6t+l(j) means that, for scaling and wavelet coefficients, there exists a dependency
aij (6.44) within and acmss the scales. This is consistent with the clustering and
L~=-ll at (i),6 t;(i) persistence properties of the wavelet coefficients [43] that state that for a
L~--]~5.t.Ot=OI at (i)8t (j) large (small) wavelet coefficient, the adjacent coefficients are also likely to
(6.45) be large (small) and that such values propagate across scales.
Given the dependency of the wavelet coefficients, one still has to find the
can be derived. The use of three re-estimation formulas guarantees that the appropriate framework for modeling their probability density functions. A
new probability P(OI,\), using the estimated parameters, is greater than or Gaussian model is not appropriate since the wavelet decomposition tends
equal to the prior probability P(OI).,)· to produce a large number of small coefficients and a small number of
6.6. Summary 147
146 Chapter 6. Characterization of Process Signals

exists from the root to the leaves through the branches. The HMT model is
lar"e coefficients the very property that one takes advantage of in data specified via the parameters (for the node i), fL7', oT and the initial,
co;pression and'denoising. Alternatively, the marginal probability. of.each P(Si = m), and the transition probabilities, ar;'·n = P(Si = mISp(i) = n).
coefficient can be represented by a mixture density. Instead of assIg~m~ a Here m and n denote the two states. The subscript p( i) refers to the parent
statistical model to wavelet coefficients, Crouse et al. [43] suggest assIgmng node, hence Sp(i) is the parent state. Consequently, the HMT model is
a set of states to each coefficient and then associating a probability density defined via the following parameters,
function with each state, f (w IS). Here, one can choose a two-state model
in which the coefficients can belong either to a high-variance state, f(wlS = • 7Tl = P(SI = m) is the probability mass function for the first node.
1). or to a low-variance state, f( wlS = 2). This yields a two-state zero-~ean • P(Si = mISp(i) = n) is the conditional probability that Si is
Gaussian mixture model. It should be noted that to enhance the fidelIty of in state m given its parent Sp(i) is in state n.
the fit [279], more complex mixture models (even with nonzer? means) can
also be used naturally at the expense of increased computatlOnal burden. • fL7' and CTr' are the conditional mean and the standard deviation,
This fram8\~ork also allows the use of non-Gaussian densities [245]. respectively, of the wavelet coefficient Wi at the ith node, given Si is
Next, the dependencies among the wavelet coefficients need to be ~e­ in state m, with f(WilSi = m).
fined. Given the persistence and clustering properties alluded to e~rlIer, The training problem determines the set of model parameters given
it appears logical to assume J\larkovian dependencies between the ~dJac~nt above for an observed set of wavelet coefficients. In other words, one first
state variables (not the wavelet coefficients). Such. a str~c~ure gIves nse obtains the wavelet coefficients for the time series data that we are interest-
to the hidden Markov trees (HMTs). Note that thIS statIstIcal model, .as ed in and then, the model parameters that best explain the observed data
suggested by Crouse et al. [43], can also be used to describe depe~denCles are found by using the maximum likelihood principle. The expectation
among the scaling coefficients (albeit with nonzero ~eans). F?r thl~ la~ter maximization (EM) approach that jointly estimates the model parameters
case. Gaussian mixture models can reasonably explam the salIent dIstr~bu­ and the hidden state probabilities is used. This is essentially an upward and
tion~ of the scaling coefficients. Here, the modeling of wavelet coeffiClent downward EM method, which is extended from the Baum-Welch method
will be considered for simplicity. developed for the chain structure HMM [43, 286].
For a limited amount of training data, to avoid over-fitting, a robust
training result can be achieved by assuming an identical distribution for a
1 ------------------~ t certain number of nodes, referred to as tying. Tying can be applied with-
=============~::i~~
:3.~
r

1
~-~~~-----------~
()-.-.

.M
4 -.r'I I
in and across the scales and increases the number of training data for a
certain distribution in the model by simplifying the model structure. Sim-
-------7~~,-------,-------~
..'~------i ply put, tying indicates that a certain number of nodes share the same
/I'\, 1 /1'\ I
/" I \,.... I ,/ I \. I
statistical distribution, the same number of states and the same distribu-
5>d : 6h{ : 7g : 8~ :
tion parameters. In signal denoising, one is interested in the shrinkage of
___ J,,~---~---4~~---l---J~~---~----f~~--~
/1\ t \ 1j\
I 11\ I I I noisy components, and the noise components are assumed to be identically
1: .
f I \\ I
1/
.l \:.
\lll\~
I l I \. I l
/1\1
I \. 1
distributed, therefore tying can significantly help capture such statistical
9
l\~
i' :lOh../-\:
ill : I
116
ffi : 13.~:
<1\ 12r)
l~
f
:14· 15 16.~. :
,,·, : 9 :
Q
/w,: / 'Iii \:,1 \~
Ili\II.1
l \:
features. In trend analysis, however, the signal trend plays a more impor-
tant role in characterizing the process failure, and thus, tying may distort
I
!
I
I
I
I
I I
l
I I the trend characteristics and will not be employed in the studies presented
in this book.
f
Figure 6.23. Tree structure HMM in wavelet domain as suggested in [43].
6.6 Summary
In this chapter, several signal characterization and modelling methods have
As shown in Figure 6.23, Crouse et al. [43] proposed a model, where the been introduced and discussed. It was shown that the wavelet transforma-
underlying backbone becomes a tree structure, and the Markov property
148 Chapter 6. Characterization of Process Signals

tion provides a time-frequency localization of a signal, allowing for the de-


tection of varying signal characteristics manifested by changing frequency
behavior over time. The use of wavelet transformation in signal denois-
ing has been demonstrated, especially in the context of outlier removal
7
and robust filtering. While this chapter focused on one-dimensional signal-
s. wavelet transformation can also be extended to two-dimensional signals
(i.e., images) where one can perform similar denoising and feature extrac-
tion tasks. In Chapter 10, the denoising implementation will be illustrated Process Fault Diagnosis
in an example concerning the full sheet profile in a paper machine. Wavelet
transformation and subsequent feature extraction of image data have been
studied for many years [285] and a recent direction is the study of nanoscale
features in atomic force microscopy (AFM) images [27, 80, 180]. The use The widespread availability of Distributed Control Systems (DCS) not only
of hidden Markov models allows the representation of process signals via provides the framework for advanced control applications but also greatly
probabilistic models and, when combined with triangular episodes and the facilitates the continuous monitoring of chemical processes to maintain safe
discrete wavelet transformation, facilitates the expression of specific signal and profitable plant operations. In most facilities, the plant operators are
characteristics. This forms the basis of trend detection and fault diagnosis asked to manage the operation in such a way as to ensure optimal produc-
strategies to be discussed in Chapter 7, next. tion levels, while attending to occasional alarm situations that may result
from equipment malfunctions. It is critical to identify such abnormal situ-
ations in a timely manner as there may be a potential for a safety hazard
that may affect not only the plant and its personnel but also the surround-
ing communities. Most operators traditionally relied on personal expertise
for such a task, and in some cases, the events exceeded the capabilities of
any human operator, thus leaving the plant vulnerable to costly shutdowns
and, in the worst case scenario, to possibly fatal accidents [216]. Today,
human expertise is complemented by computerized support systems that
comprise various data analysis and interpretation strategies that can pro-
vide guidance to the plant personnel for handling abnormal situations. The
key component of such a system is faiult detection and diagnosis (FDD) that
monitors the occurrence of process failures and identifies their root causes.

7.1 Fault Diagnosis Using Triangular Episodes


and HMMs
This section builds on the techniques described in Sections 6.2 and 6.3 to
offer a strategy for process trend analysis (Figure 7.1). The problem can
be stated simply as follows: given a set of known models of the process
operating conditions, determine the likelihood of a new set of observations.
The first step of the analysis is the training where Hidden Markov Mod-
els (HMMs) representing various operating behaviors are trained using la-
beled historical data from the process. In this section, three broad operat-

149
150 Chapter 7. Process Fault Diagnosis 7.1. Fault Diagnosis Using Triangular Episodes and HMMs 151

relation among the adjacent windows and creates the final assignment.
FS(k)=fuzzified sequence @ time k
r----+g,...!Sequence Matching
HMM

Time Correlated

Representation Classification

Figure 7.1. The trend analysis strategy using HMMs. The process i~for­
mation (measurement) at a time instant k is first expressed as a fuz~lfied
sequence (FS) and then processed through a classification step. Repnnted
Figure 7.2. A left-to-right HMM. Reprinted from [340]. Copyright © 1998
from [340]. Copyright © 1998 with permission from Elsevier.
with permission from Elsevier.

inC"b classes. namely normal abnormal and intermediate (transition between


I ., ,
The first HMM-based classifier (sequence matching) uses a subclass of
normal and abnormal), are used for labeling. The training step starts by HMMs called the left-to-right HMMs [240] (Figure 7.2). These models only
segmenting the labeled time series in windows, and an overlapping movi~g allow transitions to themselves or to the states to their right and the model
window (slice) is defined for the time series signal that will be analyzed for must begin in the first state. The output for each state will be based on the
process trends. The moving window enables the expression of the process 53-character alphabet of the fuzzy triangular representation. Starting from
trend for a discrete set of windows. The choice of the window length and the first state, a sequence of varying lengths can be modeled using different
the overlap period is problem dependent and will be discussed in the case combinations of state transitions, yielding a set of HMMs to represent the
studies. Next, the time series in the selected window is subjected to de- model classes (e.g., normal, abnormal and intermediate).
noising to eliminate any random behavior and to facilitate the subseque~t The structure of the second HMM-based classifier (time-correlated) is
construction of triangular episodes. Any filtering technique can be used m the three-state HMM given in Figure 7.3. Instead of using the fuzzy trian-
this step. The smoothed signal is then converted into semi-qualitative form gular episode alphabet as the output from the state, this classifier directly
by using the triangular episodes. Finally, this semi-qualitative sequenc~ is uses the probability of the sequence that was calculated using the sequence
transformed into a purely qualitative form by fuzzification of the quantlta- matching HMM and determines the probability of the class based on this
tive descriptors of the triangular episodes. With the data now being purely current window and the window of data just prior to it. The information
symbolic in the form of a sequence of letters, two tandem H~IM-based from the current window is utilized as well as the temporal information
classification methods are trained to determine whether the wmdow can from the past windows to calculate a corrected probability based upon the
be assigned to normal, abnormal or intermediate classes. The first HJ\IM- knowledge of how the entire sequence has propagated up to the current
based step classifies each time series segment (window) as being explained window [277].
by normal, abnormal or intermediate HMMs. In this step, each window is Once the relevant HMMs are trained, the trend analysis is carried out
treated as if it were independent of windows that come before it or that on the newly observed time series in real-time. The time series is windowed
follow it. The second HMM-based classifier accounts for the temporal cor- and smoothed before the signal in the window can be represented in the
152 Chapter 7. Process Fault Diagnosis 7.1. Fault Diagnosis Using Triangular Episodes and HMMs 153

fluctuation in the temperature of the inlet feed stream and maintain the
outlet concentration at a predetermined quality boundary. An abnormal
situation occurs when an unmeasured disturbance in the inlet feed con-
centration develops in addition to the inlet feed temperature disturbance.
The data collected consists of 50 simulations corresponding to 25 different
steady-state operating points of normal and abnormal simulations. The
simulations were about 1 hr long and the data were sampled at 0.1 min
intervals.
In half of the simulations, a step increase of 5% in the inlet feed temper-
ature was introduced; this change is deemed to be normal and can be easily
handled by the feedback control system. In the remaining 25 simulations,
this step change was used in addition to a 5% increase in the feed concen-
tration, resulting in an abnormal process trend that cannot be handled by
Figure 7.3. The three-state HMM, where the subscripts i, n and a denote in- the control system and leading to off-specification product.
termediate, normal and abnormal conditions, respectively. Reprinted from
[340]. Copyright © 1998 with permission from Elsevier.

symbolic form. Then, the sequence of letters in the window is classified


using the EM algorithm through two classifiers. Consequently, when an
unknown sequence is evaluated by each HMM classifier, the class of the
sequence will correspond to the model with the greatest probability (Figure liUllllc:ll . ~~-~~
-._.~

7.1).
The method will be illustrated by two case studies next.

.J,f' !lC'
-"
7.1.1 CSTR Simulation
The method has been demonstrated on a continuous stirred tank reactor
(CSTR) simulation to identify an abnormal inlet concentration disturbance
[340]. The jacketed CSTR, in which an exothermic reaction takes place,
is under level and temperature control. An important process variable is
the coolant flow rate through the jacket, that is related to the amount of
heat produced in the CSTR, and it indirectly characterizes the state of the
process. This variable will be monitored in this classification scheme. Figure 7.4. A training set for the CSTR example. The first window at
The trend analysis strategy will be shown to be able to differentiate be- the top shows the entire time series while the others indicate the moving
tween normal and abnormal responses of the coolant flow rate and is similar window. Thicker line indicates the abnormal trend. Reprinted from [340].
to the example used in the paper by Whiteley and Davis [325]. Here, three Copyright © 1998 with permission from Elsevier.
categories of classification are considered: normal, intermediate and abnor-
mal. An intermediate class represents a window of data that can move An example of the training classification is displayed in Figure 7.4 where
into the normal or abnormal classes in the next window and no definitive normal and abnormal situation simulations are also superimposed. The sec-
decision between normal and abnormal can be made during that specific ond window shows data progressing at a normal steady-state mode. \\Then
time period. For normal operation, the system is able to handle a a change in the system occurs, the coolant flow rate reacts to compensate
7.1. Fault Diagnosis Using Triangular Episodes and HMMs 155
154 Chapter 7. Process Fault Diagnosis

is expected at this step since the windowed sequences are difficult to dis-
for this deviation from the original steady-state. Initially, because there is tinguish between abnormal and normal situations due to the similarity of
no indication of whether this response is normal or abnormal based on the the local responses.
current information (third window), this window is considered to be in an
intermediate mode. Not until later can one differentiate between normal

t
and abnormal trends.

:I
Normal
Class
J
20 40 60 80 100
Normal
Class
.5

0 I'
20 100
Intermediate
Class
:1 t \~II
'~
.

:
20 40 60 80 100
Intermediate
Class
:1 N\
1
0 20 80 100
Abnormal
Class :1
0 20
if
if
II
40 60 80
1
100
I
Abnormal I
.5
Class I time (windows)
I
0 80 100
0 20 40 60
time (windows) Figure 7.6. The normalized probabilities after the time-correlated HMM
step. Reprinted from [340]. Copyright © 1998 with permission from Else-
vier.
Figure 7.5. The normalized probabilities after the sequence matching H-
MM step. Reprinted from [340]. Copyright © 1998 with permission from The time-correlated HMM is utilized to associate the individual se-
Elsevier. quences and eliminate the ambiguous assignments. 'When the time-correlated
~M~1 is. applied to the sequence, there is a dramatic increase in the pre-
As part of the analysis, each signal was converted into individual win- dIctIOn tIme-correlated probabilities of the true classes (Figure 7.6).
dows with a length of 6.4 min and the window moves in 36 sec intervals.
This results in 81 windows to yield a total of 4050 windows for the overall
simulation. All 4050 windows were converted into symbolic sequences using 7.1.2 Vacuum Column
fuzzified triangular representation as discussed before; 3159 of the sequences The vacuum column studied here is associated with the lubrication unit in
were used to train the HMMs, and the rest for evaluation. Three sequence ~he r..~izus~ima Refinery of the Japan Energy Corporation [341]. The goal
matching HMMs were created and trained using sequences belonging to IS to Identlf~ the weeping condition where the liquid is drained through
the particular event class. The evaluation set of sequences consists of five the perfor~tIOns due to low gas flow rates and hence causes instability in
normal and six abnormal simulations. Figure 7.5 displays the normalized the operatIOn of the column. The analysis uses temperature measurements
probabilities generated from the three sequence matching HMMs, from one from the tray 12 from the bottom of the column, T127, which has been
of the abnormal event simulations. The dark solid lines represent the true det~rmine? by the operators to capture the weeping dynamics. Figure 7.7
probability while the dashed lines represent the probability calculated by depIcts thIS temperature measurement corresponding to the normal condi-
the HMM. The y-axis for each plot is the probability (0 to 1) of belonging tion and the weeping conditions.
to that class and the x-axis represents the time in terms of the number of The window length for the temperature time series is taken as 64 min
windows. The simulation begins in the normal state and the disturbance and the wi~dow moves in 4 min intervals. For the test case, initially,
is introduced at window 17. The abnormal coolant flow rate should be de- the process IS assumed to be operating normally. At the 54th window, a
tected at window 26. The erratic behavior of the probability assignments
156 Chapter 7. Process Fault Diagnosis 7.2. Fault Diagnosis Using Wavelet-Domain HMMs 157

260 Two sets of HMMs were trained associated with normal and weeping
8r- 258
"'/~l.;JirH't/~.,,,,">V M 'V-~'l''''' "-I"'
,~

'~~~ "'-At \'J/~i/,'\


event classes using historical plant data and the test signal is evaluated
N
......
as before. Figure 7.8 shows the probabilities generated from the sequence
I- 256 matching HMM for the normal and the weeping conditions. Classification
254 after the sequence matching HMMs indicates a 8.5% misclassification of
o 200 400 600
the true class of the process versus the predicted class from the sequence
matching HMM. This percentage is calculated by associating a class to
260
Q: each window by choosing the sequence matching HMM with the higher
r-
N
258 probability and then comparing the results with the known classification.
f= 256

254 Start of Start of 2nd


0
I Normal co ndltlons weeping condition

time (min) /
/
1""'"."....-;--------h--.:'~- ---'::..,.._ __,
08
Figure 7.7. The temperature measurement at tray 12 for the vacuum col- Normal 0.6
umn. The top figure represents the normal conditions and the bottom figure Condition 0.4
represents the weeping conditions. 0.2
o0.l--~50(l-----,-1.",00'.-.,.."",----..",!::;;-~:h_--"-'_='""""""...",~--.~-_._I

weeping condition is detected that lasts until window 218 when the normal Weeping
operation is recovered. A second weeping condition begins at window 273 Condition
and lasts for about 164 window lengths.

Start of 1st Start of Start of 2nd


weeping condition Normai conditions weeping condition

Figure 7.9. The final probability of classification after the temporal corre-
lation HMM step.

When the time correlated HMM is introduced and the probabilities are
re-calculated, the results show a significant improvement (Figure 7.9). The
misclassification rate is reduced to 3.9%.

~::~~; jjllli lJ n D
o0.f-'l-"""-5""O'-"10*'0c---c1ti5"'0--"200 250 300
f'
350 400 450
7.2 Fault Diagnosis Using Wavelet-Domain H-
MMs
time (windows) A trend analysis strategy is proposed that takes advantage of the wavelet-
domain hidden Markov trees (HMTs) for constructing statistical models
of wavelets Section 6.5). Figure 7.10 depicts the strategy that can
Figure 7.8. The probability of classification after the sequence matching
be used to detect and classify faulty (abnormal) situations. As before, in
HMM step. the training phase, time series data collected under various conditions are
158 Chapter 7. Process Fault Diagnosis 7.2. Fault Diagnosis Using Wavelet-Domain HMMs 159

used to develop models. The monitoring phase, then, considers the o~-l~ne
signal(s) and determines the model that best explains it, thus classIfymg
the event associated with the model. 1st variable

2nd variabte

variable

Figure 7.11. A three-variable HMT showing both scaling and wavelet co-
efficient trees. From [286], reproduced with permission. Copyright © 2003
AIChE.

ate trend analysis. To illustrate the concept, a three variable multi-tree


structure is depicted in Figure 7.11. For each measured variable, there are
two tree structures joined together, one constructed by the scaling coeffi-
cients (light nodes) and the other by the wavelet coefficients (dark nodes).
The root nodes of each tree from a single variable are connected together.
Figure 7.10. The trend analysis strategy using wavelet-domain HMMs. The joint structure can be used for any single variable modeling, which
From [286], reproduced with permission. Copyright © 2003 AIChE. includes all frequency (scale) components in this specific variable. To limit
the computational complexity, only the tree of scaling coefficients is used.
In process monitoring, magnitude of a variable contains more information
The time series data are represented in the wavelet domain in the form
of the process, which is mainly described by its corresponding scaling co-
of scaling and wavelet coefficients. Ideally, modeling all these coe.fficients
efficients. Also scaling coefficients at each scale represent a smooth version
can O'lean all the information regarding the observed process operatmg con-
ditiobn. but will result in a large model tree structure, and increas~ ~he
of the signal with a different resolution, therefore it is not difficult to argue
that the scaling coefficients are sufficient for most of the process monitoring
computational effort in the training phase. It must be not~d t~at for dIffer- applications.
ent operating conditions, the scaling coefficients (approximatIon~ ~nd the
wavelet coefficients (detail) play different roles. Thus, for a specific tr~nd The root nodes of each tree structure are connected, corresponding to
analysis application, a different set of coefficients may be ~hosen, leadmg each variable under consideration. In each single tree, the deterministic
to a trade-off between classification accuracy and computatIonal cost. Un- trend information and the random factors are all accounted for. The ratio-
doubtedly, such a decision can be made a prior-i based on the nature of the nale behind using the multivariate tree structure is to be able to capture
fault. the correlations among variables. Here, the connection among variables is
The HMT model can also be extended from the single-tree structure arbitrary, and the apparent parent-child connection does not really imply
to the multi-tree structure (MHMT), which is then used in the multivari- the parent-child dependence, but it is just a way to model the relation be-
160 Chapter 7. Process Fault Diagnosis 7.2. Fault Diagnosis Using Wavelet-Domain HMMs 161

tween two nodes. In principle, the multi-tree structure can be expanded • Furthermore, in the multivariable problem, while three to five vari-
indefinitely, but the computational complexity may restrict the number of ables can be handled relatively easily, one reaches a computational
variables. bottleneck for larger problems. This can be possibly resolved by con-
The EM algorithm for the single-variable case also applies in a straight- sidering some of the new developments in HMM training algorithms
forward manner to the multi-tree structure, since the binary structure re- [254, 71].
mains unchanged in this structure expansion. For on-line process monitor-
Two case studies are presented next to illustrate this strategy.
ing, moving window is again used in the multivariable case. The complexity
of the computation increases by the number of trees compared to the single
tree at the same window size, but the multivariable system contains more 7.2.1 pH Neutralization Simulation
structural information, which makes it possible to reduce the window size, The simulation of a pH neutralization process has been previously studied
and therefore, keep the computational complexity the same as or less than by Galan et al. [81]. An acid stream (HCl solution) and an alkaline stream
the single variable case. In principle, one can use different window sizes (NaOH and N aHC0 3 solution) are fed to a 2.5 L constant volume, well-
for variables, but the same number of data points needs to be used to each mixed tank, where the pH is measured through a sensor located directly in
variable to keep the same w-eight of contribution to the MHMT from each the tank. The pH value is maintained at 7.0 by a PI controller.
variable. Using different window sizes results in having a monitoring delay
corresponding to the time of the longest window size used in the model, 06 r--~r------.---~--r-~-~~--~, ---r---
but it may reduce the computational complexity if a longer window size is ,,
,
needed for some of the variables. Longer window size is usually suggested 3 4 5 6
0.4
,:, 7
for the slow dynamics, so fewer data points within a certain time period ,,
would be enough to characterize the variable trend. Similarly, a smaller ,,,
window size is preferred for the fast dynamics, so more data points within pH ,
a certain time period would be needed. The monitoring strategy is carried ~~
,,
out in the same manner as in the single-variable case. ,,
While on-line implementation of the trend detection strategy is indeed ,,,
,
feasible, the training of HMT models may be rather time consuming and ,i
inefficient. We note the following on the implementation issues: "",,,,..t'..,
,,
• The presence of local minima encountered in the EM algorithm limits ,,,
,,
the complexity of the problems that can be tackled.
3000 3500 4000
• The amount of data required for training is problem dependent, and
especially when process events have somewhat similar features, more time
data would be required for training to ensure the development of
models that can capture the subtleties associated with each event. Figure 7.12. The test pH response showing regions of operation. From
[286], reproduced with permission. Copyright © 2003 AIChE.
• A smaller window size reduces the computational burden in training
and can improve detection time, yet the classification performance When an unexpected event occurs, the controller may not maintain
may deteriorate as trends may not have developed distinct features the pH value within the allowed range of operation and its performance
in a short time. degrades, thus resulting in an abnormal (faulty) operating condition. Here,
• Moreover, combining detail and approximation coefficients to build four distinct situations, other than the normal operating condition, will be
HMT models would be a natural next step in process trend detec- considered as follows:
tion, but this extension is hampered by computational difficulties as • Abnormal Condition I (AI): The pH value shows a sustained deviation
mentioned before. of more than =fO.5 (region 2 in Figure 7.12), which could result from
162 Chapter 7. Process Fault Diagnosis 7.2. Fault Diagnosis Using Wavelet-Domain HMMs 163

a large and sudden change in either the flow or the pH value of the 70
acid stream as a result of changes in the upstream process. r- r---
60

• Intermediate (I): The pH value indicates deviations (region 3 in Fig-


~. 50
ure 7.12), which could result from the same source as Abnormal I, but 'iii
the deviation remains within =F0.5. This region may act as a warning ~ 40
-a
(buffer zone) for an imminent change in normal operating conditions. C
';:: 30

• Abnormal II (All): The pH value exhibits high amplitude, high fre- 20


quency oscillations (region 4 in Figure 7.12), which could be the out-
0
come of a sensor failure or other equipment malfunction, such as pump
cavitation. 0
o 500 1000 1500 2000 -
2c.OO 3000 3500

• Abnormal III (AlII): The pH value increases slowly and reaches a lime

maximum point, then comes back to 7.0 slowly (region 6 in Figure


7.12), which could be the result of a temporary sensor drift. Figure 7.13. The variation of window sizes as a result of the adaptive
algorithm. From [286], reproduced with permission. Copyright © 2003
All other operation around pH 7 are considered as normal (N). AIChE.
A moving window is used to analyze the data as before, and especially
since only a limited number of data points can be considered at one time
To keep the analysis simple, two window lengths are considered, a 32-
for the wavelet decomposition. If a short window size is chosen, one may
point window and a 64-point window. After training, five sets of models
capture process changes quickly, but the window may not contain enough
for different operating conditions under 32- and 64-point windows are ob-
information to sufficientlv reflect the current process state, thus generating
tained. Spectral analysis based on Thomson's multitaper method [292] was
ambiguous classification;. Longer window sizes can consider more informa-
used to differentiate the short- and long-term signal behavior, then the ap-
tion, which is helpful to recognize the process trend, but may lead to large
propriate window size is chosen to analyze the test signal, since the high
time delays for the detection and classification of trends. Here, an a~ap­
frequency components are more important in short-term behavior and the
tive window size is implemented that uses a short window size for rapIdly
low-frequency components dominate in long-term behavior. ·Window size
changing data and a long window size for longer lasting phenomena, ba~ed
selection for the test signal is depicted in Figure 7.13.
on the spectral analysis of the signal. The window is moved every samplmg
Figure 7.14 depicts the classification result from likelihood determina-
time. Sixteen separate simulation runs are used, fifteen for training purpos-
tion, comparing the true and calculated probabilities. It can be observed
es and the last one (depicted in Figure 7.12) for testing the methodology.
that the HMT model yields the correct classification for most part of the
There are 1200 data points in each simulation set for training and 3755
test signal. Following observations can be made:
data points in the simulation for testing. The data were sampled every 45
sec.
• The abnormal condition AI (disturbance) and the abnormal condition
In this example, the Haar wavelet is used with a single scaling tree and
AlII (sensor drift) can be recognized clearly (Figures 7.14c, 7.14d).
the scaling coefficients are modeled using a two-component (iVI 2) HMT
model with nonzero mixture means. The models were trained using multi- • The instances of misclassification between sensor noise (All) and in-
ple observations (without tying). The wavelet coefficients were not modeled termediate operating condition (I) (Figures 7.14b, 7.14e) are notable.
since the approximate signal contains more distinguishing features among As the level of noisy signal momentarily matches the level of the
the studied abnormalities, as the primary goal in this study is to detect if signal in the intermediate region, the method results in misclassifica-
the pH deviation is beyond the tolerable limit ±0.5. In other words, the de-
tion. Yet, since these misclassification instances are rather isolated,
cision depends more on the information provided by the scaling coefficients the overall trend can still be inferred. Nevertheless, the model may
than the wavelet coefficient. need further training to eliminate such instances.
164 Chapter 7. Process Fault Diagnosis 7.2. Fault Diagnosis Using Wavelet-Domain HMMs 165

probabilities sensor drift


Any deviations from the assumed normal operating condition are con-

probabillMS of normal op""tng condlllOn '1_~_~~ D= I (ol


sidered as abnormal operating conditions, which would need the operator's
attention. Here, four abnormal operating conditions are defined and two

o
.. rr:1,
O ~lli- U _
05 ,
0
500 1000 1500 2000
. . -
2500 3000 3500 t
manipulated variables, qj and qe, are monitored. Tested cases are summa-
rized below.

o 500 1000 EOO 2000 2500 3000 3500

prcbabilities of intermediate operatingconditiol1 .:0 probabilities of acid flow rate disturbance

(cllj
• In response to a sudden increase in inlet concentration CAj (AI), qj
decreases and qc increases.

05
[IllLL
o 500
i 'I: \
1000 1500 2QOO 2500 30GO
'
3500 t
o 500 1000 1500 2000 2500

prohabilities of pH sensor noise


3000 3500 .
• In response to a sudden decrease in inlet concentration CAj (All), qj
increases and qe decreases.

• A sudden decrease in the pre-exponential factor (AlII) due to an

·:11
o 500
OIl
1000
I
1500 2000 2500
I (.'l,
3.000 3500
unmeasured component variation in the inlet flow, both qj and qe
decrease.

• The flow sensor for qj drifts high (AIV) without affecting the process,
Figure 7.14. The classification results for different classes of operating con- so other variables, including qe, remain unchanged.
ditions. Solid lines represent the true probabilities while the light dashed
Three simulations of each operating condition are carried out for model
lines are the calculated probabilities. From [286], reproduced with permis-
training. One simulation under each operating condition is used to test
sion. Copyright © 2003 AIChE. the monitoring result. Each test data set includes the transient process
from normal operating condition to the abnormal operating condition. The
• The brief misclassification between the normal condition (N) and the results use an 8-point window and the Haar wavelet. The test results are
sensor noise (All) (Figure 7.14a, 7.14e) is due to the switching of the shown in Figures 7.15, 7.16, 7.17, and 7.18. To simplify the model structure
window size from 32 to 64, and as the number of data points increases and therefore to reduce the computational effort, only scaling coefficients
suddenly, this causes the method to consider the new signal section are considered in this case study. Following remarks are in order:
as noisy. As the window moves forward, this misclassification error is
• The method classifies AI correctly (Figure 7.15) with a small delay.
corrected immediately. During the delay, AI is temporally misclassified as AIV due to the
similar response of these two abnormalities in the beginning.
7.2.2 CSTR Simulation
• The method classifies All correctly when the process nears steady-
A constant volume, continuous stirred tank reactor (CSTR) is simulated to state (Figure 7.16). There is a brief period of misclassification during
demonstrate the multivariate trend analysis strategy. In the CSTR, a sin- the transient between All and AlII due to the similar responses of
gle irreversible, exothermic reaction, A -+ B, is assumed to occur and the the two monitored variables. With more process information, the mis-
model equations are given in [232]. The disturbances are the feed concen- classification can be possibly avoided. As in (a), there is a temporary
tration and temperature, CAj and 11, respectively, and the control outputs misclassification between All and AIV.
are the tank concentration and temperature, CA and T, respectively. The
two outputs are controlled by two PI controllers via feed and coolant flow • The method classifies AlII correctly when process nears steady-state
rates, qj and qe, respectively. White Gaussian noise is added to the outputs (Figure 7.17). There is a short period of misclassification between N
to simulate the real process signal. The sampling interval is assumed to be and AlII during the transient, which is considered as N initially.
O.lmin. The normal operating condition (N) is taken as the steady-state,
CAs = CA(O) = 8.235 x 1O-2 mo l/ L;Ts = T(O) = 441.81K. • The method classifies AIV correctly (Figure 7.18).
166 Chapter 7. Process Fault Diagnosis
7.3. Fault Diagnosis Using HMMs 167

50
, 60
, 70 80 90
, 100 min

IAIV)

10 20 30 40
,
1
50 60 70 80 90 100 min
Figure 7.15. The classification of the AI abnormality. Each figure corre- •

sponds to the probability assignment made with respect to a mo.del .class.


The solid line is the expected probability and the light dashed lIlle IS the Figure 7.16. The classification of the All abnormality. From [286], repro-
calculated probability. From [286], reproduced with permission. Copyright duced with permission. Copyright © 2003 AIChE.
© 2003 AIChE.
whether the mean values of HMM state variables for each measured pro-
From the results above, one can conclude that the MHMT method for cess variable have changed significantly from their in-control values. Then,
trend analysis correctly classifies the different process operating conditions. by monitoring the changing trends in the HMM states, one can identify the
There are a few ambiguities in the transient region, which result from the faults that caused the variation in behavior. Here, the whole data set that
similar responses of manipulated variables to different process events. In contains all the faults that need to be diagnosed is used for developing the
other words, the information contained in two variables is not enough to HMM. The following section demonstrates this strategy.
immediately discern the transient part of the process. To eliminate such
ambio-uities additional variables may be needed in the HMT model.. Sun et
b ,
al. [286] have shown the extension of the method to the multivanate case 7.3.1 Case Study of HTST Pasteurization Process
for the CSTR simulation example. The process of HTST pasteurization (Figure 5.5) is described in detail in
Section 5.3. The variables used here are four temperature measurements
(OC) and two PID controller outputs (rnA). The hot water temperature,
7.3 Fault Diagnosis Using HMMs preheater outlet temperature of raw product, holding tube inlet temper-
Hidden Markov models (HMMs) provide a powerful framework for recogniz- ature of pasteurized product and holding tube outlet temperature of pas-
ing patterns in data and diagnosing process faults as shown in the previous teurized product are the output variables of the process (variables 1-4,
sections. Here, another procedure is introduced that is based on the state respectively). The input variables of the process are the PID controller
estimation problem (see Section 6.4.2). The procedure determines first output to the steam valve (variable 5) that regulates the holding tube inlet
temperature of product and the PID controller ontput to preheater hot wa-
168 Chapt er 7. Proces s Fault Diagn osis 7.3. Fault Diagn osis Using HMM s
169

(Nl
1
,;1·· . .
o
,:1 '1 '1
30
i
40 50 50
i
70 80 90
i

(AI)
100 min

1
o0~_..:.;10::. . ._~20;:----,,3;:-0 _...;.4;:-0_...;.5:,:.0_ _5:.,.0_ _7..,.0_ _3..,.0_--t90_~100 min 35 min

~"'I 1

,:r ,''''I
J- --- --- --- --- --- 1
o
':f .
10

I:...
20 30 40 50
.--... .<.-- ... - - .
&0 70 3D 90
m . - - • • •, : : ;
100 min
J
30 35

10 20 30 40 25

o~ f'
50 50 70 80 90 100 min

," ~AI~ 1
00 10 20 30 40 50 60 70
o 5 10 15 20 25 min
80 90100 min

Figure 7.17. The classification of the AlII abnorm ality. From [286], Figure 7.18. The classification of the AIV abnorm ality. From [286],
repro- repro-
duced with permiss ion. Copyri ght © 2003 AIChE .
duced with permission. Copyri ght © 2003 AIChE .

The aim of the fault diagnosis strateg y is to determ ine whethe r the
ter valve (variable 6) that regulat es the prehea ter outlet temper ature mean
of raw values of HMM state variabl es for each process variabl e change signific
produc t. A series of sensor and actuato r faults were tested on the antly
process, compar ed to in-cont rol HMM states. The increas e in probab ilities
and the experim ents were conduc ted with different fault magnit of these
udes and states is investi gated when corresp onding faults occur during
duratio ns. operati on.
After constru ction of a HMM for the given data sequence, the mean
Sensor failures are deliber ately introdu ced using the process control values
of M state variables (1-£ matrix ) are checked for an abnorm al increas
compu ter software. To accomp lish this, a real numbe r is introdu e or
ced to decreas e in the state variables. Then, the probab ility values which
the actual sensor reading , which is transm itted to the compu ter these
from the particu lar states take during a specific time period are examin ed
process. Instead of the actual sensor reading , the altered 'readin g' using the
is sent to I matrix (Eq. 6.46) to determ ine the time and duratio n of
the PID controllers. As the control lers compu te the control action faults.
based on First, the perform ance of HMMs in represe nting HTST data is assesse
the false sensor reading , the process receives a false correct ive action d
and using the model residua ls and the correla tion coefficients of observe
the fault origina ted by the sensors propag ates throug h the system d and
. The estima ted values of process variables. This is done by checking
magnit ude of sensor faults varies betwee n -0.83° 0 and 0.83°0 , the nor-
and their mality of residua ls of some import ant process variabl es (e.g., holding
duratio n change s betwee n 2 sec and 30 sec. tube
inlet temper ature and steam valve setting , variabl es 3 and 5, respect
To implem ent actuato r faults, the control lers are turned off for a specific ive-
ly). It appear s that the residua ls are autoco rrelated most of the
time period which results in a constan t signal being sent to the time. The
actuato rs. normal ity proper ty is affected by the extrem e values of faulty signals
During the implem entatio n of this class of faults, when the control since
led vari- the model may not perform well to estimat e the measur ements at
ables deviate from the set-poi nts, the control ler and the actuato times of
r cannot fault implem entatio n. If the observa tion sequence contain s many
respon d until the implem entatio n is over. outliers ,
the residuals will likely not belong to Norma l distribu tion.
170 Chapter 7. Process Fault Diagnosis 7.3. Fault Diagnosis Using HMMs 171

K= 50, T= 130 K= 50, T= 130


2rr-~~~~~~---,----.~----'
Table 7.1. Performance of HMMs with different number of states.

!~~~~W
M SPE r (holding tube inlet) r (steam valve) '"
30 112.87 0.9667 0.9374 ~" o~
Cil
>
50 60.95 0.9890 0.9825 -1

70 33.69 0.9971 0.9969


10 20 30 40 50 10 20 30 40 50
90 28.57 0.9972 0.9939 K= 50, T= 130 K= 50, T= 130

As noted before, the number of HMM states (M) is an indicator of model


performance. Low M values are not considered for HTST pasteurization
data because they possess poor predictive capabilities. On the other hand,
10 20 30 40 50
large NI values lead to increased computational effort and can cause over- K=50, T=130
10 20
K=50,
30
T= 130
40 50

fitting of the data. Table 7.1 shows the squared prediction errors (S P E) of
a data set by HMM with different AI values. Table 7.1 also depicts the r
values (correlation coefficient) for the holding tube inlet temperature and
the steam valve opening. When a large M is used, some states become
too specific for certain process behavior. For example, each temperature
increase in holding tube inlet sensor can be represented by different states 10 20 30 40 50
even though there are other faults with the same magnitude causing similar
reactions in the system. Real process data may show strong autocorrelation, Figure 7.19. Mean values of HMM states for the normal operating condi-
cross correlation and also include noise. In those cases, HMMs may require tion.
high numbers of states to truly capture the process behavior. For the HTST
data, 50 states were selected for modeling.
Sensor faults are introduced in the holding tube inlet temperature sen- data sequences with steam valve actuator faults are displayed in Figure
sor. This is the controlled variable of the process, thus, any deviation in 7.20 (Case I). Mean values of states for steam valve signals (variable 5)
its measurements causes the controllers to respond to that change and in- vary between -5 and + 7 approximately. Therefore, HMM state variables
fluence process operation. The actuator faults are introduced in the steam can be monitored to determine their behavior with respect to their mean
control valve. This is the manipulated variable, thus, any fault would cause values for particular process variables under various operating conditions.
all process variables to behave abnormally depending on the magnitude and Table 7.2 depicts the magnitude, duration and time of occurrence of six
duration of the fault. When there is a fault in the holding tube inlet sensor, different faults implemented to the steam valve for the first case study.
the steam valve that supplies the heating medium into the system responds The length of data sequence T used for HMM development is 114. Figure
right away. The hot product is then introduced to the holding tube. For 7.20 shows the mean values of HMlVI states for the data sequence collected
consumer health purposes, the temperature at the exit of the holding tube under faults implemented to the steam valve as given in Table 7.2. State 36
is critical and it is very strongly correlated with the holding tube inlet tem- of steam valve signals (variable 5) in Figure 7.20 has the highest mean value
perature. Any kind of fault related to both holding tube inlet sensor and among the HlVIM states and state 44 has the lowest mean value. Figure
steam valve may compromise consumer health and thus needs to be closely 7.21 shows the probability values associated with these particular states.
monitored. It is clearly seen that state 44 indicates faults in which the steam valve
Figure 7.19 displays the mean values of HMM states of process variables signals are low, i.e., 6 mA. On the other hand, state 36 indicates the faults
for data collected under normal operating conditions. For example, mean in which steam valve signals are higher, 12 mA. HMM state 6 has large
values of states for steam valve signals (variable 5) change between -2 and values among the states of both hot water temperature sensor (variable 1)
+2 approximately. Mean values of HMM states of process variables for and holding tube inlet temperature sensor (variable 3) (Figure 7.20). This
172 Chapter 7. Process Fault Diagnosis 7.3. Fault Diagnosis Using HMMs 173

K=50, T=114 K=50, T=114

I~~
(a)
2
I
0:[ j
10 20 30 10 20 30 40 50
0
0 200 400
I600 800
II
1000

I
1200

K= 50, T= 114 K= 50, T=114 ,


~~,~
(b)
2 2
(')

'e<J" ,
I
II
\ IWJ! ;°:1
:0

>
-1 I I
0 200 400 600 800 1000 1200
-2 30 40 50
40 50 10 20
K= 50, T= 114 ,

"''"
:0
.@

10 20 30 40
5

50
<0

:g'"
'>::
~ -1
0

-2
fN~vM
10 20 30 40 50
1°:1 0 200 400 600 800 1000
[j 1200

Figure 7.21. Probabilities of states 44, 36 and 6 for steam valve actuator
Figure 7.20. Mean values of HMM states for steam valve actuator fault. fault.

state gets a low mean value for steam valve signals (variable 5 in Figure 43 during process operation. State 43 indicates the faults with magnitudes
7.20). As seen in Figure 7.21, state 6 gets high probability value after fault of 0.83°0, which cause stronger responses than the faults numbered 1, 3,
6 (magnitude of 12 rnA with duration of 8 sec in Table 7.2), w~ich causes and 5 with fault magnitudes of 0.39°0 in Table 7.3. State 35 of steam
an increase in both temperature sensors. This is a typical behavIOr when a valve (variable 5), which has the largest value, indicates all faults except
fault occurs in the steam valve of the HTST pasteurization system. Since the first one (Figure 7.23b). The probabilities become 1 during the time
the steam valve opens up to 12 rnA and introduces large amounts of steam intervals after the faults occur. This state contributes to situations where
into the heater, the hot water temperature and consequently the product the steam valve opens and injects additional steam just after the closing
temperature in holding tube inlet increase. action because of increasing temperature in the holding tube inlet sensor.
In the second case study, the faults in holding tube inlet temperature The second largest mean value belongs to HMM state 13 among the hold-
sensor (variable 3) are investigated (Case II, Table 7.3). For ~he HMM ing tube inlet temperature states. The final plot in Figure 7.23c shows the
development, the length of data sequence, T, is taken as 131. FIgure 7.22 probability values of state 13 during the operation. This state indicates
shows the mean values of HMM states for six process variables. State 43 faults 2, 3, 4, 5 and 6. The first fault in the holding tube inlet is hard to
of steam valve signals (variable 5) has the lowest value among all HMM detect since it does not cause significant deviation in any of the process
states and the highest value for the holding tube inlet temperature sen- variables. In the case of holding tube inlet sensor faults, the same HMM
sor (variable 3). Whenever a fault occurs in the holding tube inlet sensor, states contribute to changes in the holding tube inlet temperature measure-
the steam valve responds promptly. Consequently, the same HMM state ments and steam valve actuator behavior. This situation is not observed
variable represents the changes in both variables. Since the faults in th.e in HMM of data sequences with the faults in steam valve (Case I).
sensor show positive magnitudes, they cause a reduction in signal magm- The HMM strategy can be modified by considering a moving window
tudes to the steam valve. Figure 7.23a shows the probability values of state to detect changes with low magnitudes and duration. It has been shown
7.4. Fault Diagnosis Using Contribution Plots 175
174 Chapter 7. Process Fault Diagnosis

K= 50, T= 114 K= 50, T= 114

Iv \ I 1
Table 7.2. Steam valve faults.
,:1 I
Fault
1
2
3
4
Fault Time (sec)
266
418
570
724
Steam Valve Signal (mA)
6.0
12.0
6.0
12.0
Duration (sec)
2
2
4
4
:5
<1l

~
0
-1

-2

10 20 30
K= 50, T= 114
40 50
It ~ V~ NirJ t
10 20 30
K= 50, T= 114
40 50

~:I
IV \jWv~0A \ i>r:I ~l'1n~JvV\~~J\(\
5 878 6.0 8 j
6 1036 12.0 8
A, J 1

Table 7.3. Holding tube inlet temperature sensor faults.


~ 0 I
I -2
,v \I
I
I
\
I
10 20 30 40 50 10 20 30 40 50
Fault Fault Time (sec) Temperature Signal (OC) Duration (sec) K= 50, T= 114 K= 50, T= 114

1
2
63
215
+0.39
+0.83
2
2 U?
i o~
1[ l \/1~ 6
1n

~::I v~w~ I~~


'"
:0 II 1\ 1\ /1
3 367 +0.39 4 <1l
.~

4 521 +0.83 4 >

5 675 +0.39 8
10 20 30 40 50 10 20 30 40 50
6 833 +0.83 8
Figure 7.22. Mean values of HMM states for holding tube inlet temperature
by Tokatli and Cinar [296] that such a strategy performs better than fault sensor fault.
diagnosis methods that are based on parity-space as well as state-space
identification [143]. The contribution of output (process) variable Yj on state variable Xi at
time k is

conti j = -'-TRANi jY~ (7.1)
7.4 Fault Diagnosis Using Contribution Plots 'Esti,i ' 'JK

!RAN is used to calculate state variable vector x by utilizing Yj{ which


When T 2 or SPE charts exceed their control limits to signal abnormal pro- IS composed of K past values of the process variables. Based on Eq. 4.67,
cess operation, variable contributions can be analyzed to determine which matrix TRAN is given as
variable(s) caused the inflation of the monitoring statistic and initiated
the alarm. The variables identified provide valuable information to plant (7.2)
personnel who are responsible for associating these process variables with
process equipment or external disturbances that will influence these vari- The total contribution of variable Yj:
ables, and diagnosing the source causes for the abnormal plant behavior. n
2
The procedure and equations for developing the contribution plots was p- CONTjT = L......t
" " ' conti,j (7.3)
resented in Section 3.4. i=1
The decomposition technique given in [146] can be extended to the T 2
where j = 1, ... ,p number of process variables and n is the number of state
and SPEN values of state variables. The state variables are calculated by
variables. Unlike the computation of score variables in PCA, the state
Eq. 4.67 in which the past data vector is used. When the T 2 or SPE chart
variables at each time are calculated by using not only the present value
of the state variables gives an out-of-control signal, contribution plots can
of the process variables, but also the past values of the process variables
be inspected to find the responsible variable for that signal.
176 Chapter 7. Process Fault Diagnosis 7.4. Fault Diagnosis Using Contribution Plots 177

HMM with K=50, T=114 The contribution of each process variable is determined and plotted on a
bar graph to decide which variable(s) caused the inflation of the SPEN
value at that particular time.
I Investigating the dynamic pattern of contribution plots is more effec-
0:1 I I 400 500
I 600 700 800
I 900
lI
1000
tive in fault diagnosis rather than the single snapshot of contributions of
variables at a particular time. The contributions can be plotted over time,
0 100 200 300
Time (Seconds) following an alarm signal in MSPM charts. The variation of the contribu-
11 I
tions over time can also be summarized by plotting the sum of the contribu-
I tions over a time period [301]. Rapid detection of the variables responsible
I
i 1
"'
05 j for inflating the monitoring statistics is necessary because the contribu-
tions smear over time as the effects of the abnormality spreads over other
01
0 100
I
200 300
I
400 500
I 600
I700 800
I 900 1000
variables. On the other hand, inspection of contributions over a period is
desirable to filter out instantaneous spurs caused by measurement noise or
Time (Seconds)
errors. In the test case summarized below a sequence of contribution plots

~ 0:[
~
I

I
I

II
. following the detection of an abnormal situation is given to illustrate the
smearing over time.
II
I Example Consider the HTST pasteurization example discussed in Sec-
0
0 100
I
200 300
L
400 500
I 600
I700 800
1\
900
I
1000
tion 5.3. In all contribution plots, the six process variables are hot water,
preheater, holding tube inlet and holding tube outlet temperatures, and
control signals to steam valve and preheater valve shown as the first. sec-
Figure 7,23, Probabilities of state 43, 35, and 13 for holding tube temper- ond, third, fourth, fifth and sixth process variable in the horizontal' axis.
ature sensor fault. respectively. The vertical axis is the total contribution of each process vari~
able. The same fault is repeated at different times for the same duration
but with increasing magnitude. For the steam valve fault, T 2 chart did
over the past data window K, The total contribution of each variable on
not alarm the faults or alarmed the faults later than the SPE charts or
state variable Xj is calculated by summing up all the contribution of the
the alarm signal persisted for shorter periods of time (Table 5.1). The T 2
variable coming from its past values. This procedure is repeated for all
chart alarmed the third and fourth faults later than the SPEN chart. The
state variables Xi (i 1, ... , n).
contribution plots of T 2 (Figures 7.24 and 7.25) showed that the holding
The computed values of the contributions of each process variable and
tube inlet temperature sensor and the hot water temperature sensor (vari-
its past values on all the state variables are plotted on a bar chart. The pro-
able 1) caused the alarms. Since the out-of-control situation in T 2 chart is
cedure is repeated for all process variables Y j (j = 1, ... ,p). Their contri-
for 2 and 3 sampling times, the information gathered from the contribution
butions are plotted on the same bar plot to decide which variable(s) caused
plots did not help to diagnose the fault in the steam valve. They did not
the out-of-control alarm in the multivariate T 2 chart of state variables. Use
provide information about the other process variables either. The variable
of state variables in SPM and their contribution plots are introduced and
contributions on SPEN for the first and second faults in the steam valve
illustrated in [211] and [219], respectively.
(Figure 7.26 and 7.27) showed that the contribution of hot water temper-
The contribution of process variable Yj on the SPEN statistics at time
ature sensor (variable 1) leads for 2 or 3 sampling times, then the holding
k is determined by dividing the squared error associated with jth variable
tube inlet temperature (variable 3) follows it. However, the contribution
by the S P EN value at time k:
plots did not show the steam valve (variable 5) as a contributor even in the
later sampling times.
CONT sPE = [( eJ -e)2/I:
J ej,j
]
(7.4) In the third fault at time 741 in steam valve fault, the contribution
J SPEN
plots of SPEN showed the holding tube inlet temperature sensor as the
where ej and I:ej,j are the mean and variance of jth error term, respectively. cause of the alarms (Figure 7.28). In the fourth fault at time 961 in steam
178 Chapter 7. Process Fault Diagnosis
7.5. Fault Diagnosis with Statistical Methods 179

(a)
(a)

o'"
.c t

o'"
.c
"F-
C
o "F-
C
"'
C
.12
o
S "'
C
.Q
:§ S
E .0
'C
o
o E
o
o
~ <i'i

(b) ~

(e)

3
Process Variables Process Variables

Figure 7.24. Contribution plots of T 2 for the steam valve fault 3 (Table ~igure 7.25. Contribution plots of T 2 for the steam valve fault 4. Sampling
5.1). Sampling time of the snapshot after the fault is introduced (a) 40, tIme .of the snapshot after the fault is introduced (a) 19, (b) 20, (c) 21.
(b) 41. Reprinted from [143]. Copyright © 2001 with permission from Repnnted from [143]. Copyright © 2001 with permission from Elsevier.
Elsevier.

d.iagnosis b~ ~sing real-t.ime knowledge-based systems (KBS). The integra-


valve fault I, just after the fault was introduced, SP EN chart gave an tIOn of s~atJstlcal detectIOn tools, contribution plots and fault diagnosis by
out-of-control alarm which was caused by holding tube inlet and holding a supervIsory KBS has been illustrated for both continuous [219] and batch
tube outlet temperature sensors according to the contribution plots (Figure processes [302, 303].
7.29). Obviously, this can not be caused by the fault in the steam valve.
After 6 sampling times, contribution plots showed that the reason for the
alarms in SP EN chart after time 961 is the hot water temperature sensor 7.5 Fault Diagnosis with Statistical Methods
and then the holding tube inlet temperature, which is the expected result
of a fault in the stearn valve. Contribut.ion pl~ts presented in Section 7.4 provide an indirect approach
Fault diagnosis of the HTST pasteurization system has also been con- to fault dIagnosIs by first determining process variables that have inflated
ducted by using parity relations [143], providing a comparative illustration the d~tection statistics. These variables are then related to equipment
of the use of HMMs (Section 7.3.1), contribution plots and parity space. and dIsturbances. A direct approach would associate the trends in process
The parity-space-based diagnosis issued alarms for the faults at the same data to faults explicitly. HMMs discussed in the first three sections of
time or after the multivariate charts in this case study and indicated the this chapter is one way of implementing this approach. Use of statistical
reasons behind the out-of-control alarms [143]. A fault diagnosis system discriminant analysis and classification techniques discussed in this section
that uses several of these techniques simultaneously and integrates their and in Section 7.6 provides alternative methods for implementino' direct
fault diagnosis. b
findings by using a decision maker seems more powerful than any single
technique used. \\Then a process can be represented by a few PCs, the biplots of PCs
Analysis of contribution plots can be automated and linked with fault and S P E provide a visual aid to identify data clusters that indicate normal
operation or operation under a specific fault (Figure 5.1). An integrated
180 Chapter 7. Process Fault Diagnosis 7.5. Fault Diagnosis with Statistical Methods 181

1 1,
- I

JJ - (b) (a) (b)


(a)

"10
0.5 ul'

":W
0.. I 0..
(f) 1 (f)

§"'
Q)
J5
0 I
L...L---l-=I:l.-~ -l
0
1 . r---]

3
c
0

"'
Q)
J5
OJ
1
0=
3 4 1
=
4
OJ
.~ .~ 1 1
> (d)
>
(e)
"'g; "''"
1

JnD=
1 (d)

,:1 I
Q)

0
I
"0..2 "2 05

0=
a. 1
'0 '0
"'c
.2
'5
0
ii c"'
.2
'5
.0
1
I ":1 0
.0
E ."E 1
c 0
o ()
() (f) 1[ (e) 1 (f)
I

Do
0.5 0.5

':1 =
0]

==0
1 2 3 1

Process Variables Process Variables

Figure 7.26. Contribution plots of SP EN for steam valve fault 1 at (Table Figure 7.27. Contribution plots of SP EN for steam valve fault 2. Sampling
5.1). Sampling time of the snapshot after the fault is introduced (a) 7, (b) time of the snapshot after the fault is introduced (a) 25, (b) 26, (c) 27, (d)
8, (c) 9, (d) 10, (e) 11, (f) 15. Reprinted from [143]. Copyright © 2001 28, (e) 29. Reprinted from [143]. Copyright © 2001 with permission from
with permission from Elsevier. Elsevier.

statistical method was developed for processes that need to be described The implementation of the FDD system at each sampling time starts with
by a higher number of PCs or for automation of diagnosis activities by monitoring. The model describing NO is used with new data to decide
utilizing PCA and discriminant analysis techniques [242]. PCA is used to if the current operation is in-control. If there is no significant evidence
develop a model describing normal operation (NO). This PC model is used that the process is out-of-control, further analysis is not necessary and the
to detect outliers from NO, as excessive variation from normal target or procedure is concluded for that measurement time. If score or residual
unusual patterns of variation. Operation under various known upsets is also tests exceed their statistical limits, there is significant evidence that the
modeled using PCA provided that sufficient historical data are available. process is out-of-control. Then, the PC models for all faults are used to
These fault models are then used to isolate source causes of faulty operation carry out the score, residuals, and/or angle tests, and discriminant analysis
based on the proximity of current process operation to one of the data is performed by using PC models for various faults to diagnose the source
clusters indicating a specific fault. Using PCs for several sets of data under cause of abnormal behavior.
different operating conditions (NO and with various upsets), statistics can The method was developed for monitoring continuous processes devi-
be computed to describe distances of the current operating point to regions ating from their steady-state operation and determining the most likely
representing other conditions of operation. Both scores distances and model source causes from a closed set of candidate causes. Stationarity, ergod-
residuals are used to measure such distance-based statistics. In addition, icity and lack of significant autocorrelation should be established before
angle-based criteria can also be used. The FDD system design includes the utilizing this method. The method does not rely on visual inspection of
development of PC models for NO and abnormal operation with specific plots; consequently, it is suitable for processes described by large sets of
faults, and the computation of threshold limits using historical data sets variables. The method was illustrated by monitoring the Tennessee East-
collected during normal plant operation and operation under specific faults. man industrial challenge problem [58].
182 Chapter 7. Process Fault Diagnosis 7.5. Fault Diagnosis with Statistical Methods 183

0.8
(a)

Il
0.6

0.4

u.f
0-
W
0.2f
:
I i
c 0
a 2 3 4 5 6 456
<1)
Q)
:0 0.8
'"
il
'§ (e)
0.6 (d)
>
<1)
<1)
Q) 0.4
e'-'
0- 0.2 I i
0
<1) '=
c 2 3 4 5 6 23456
.9
'5
."C
.0

a
U

2 3 4 5 6 2 3 4 5 6

Process Variables

Figure 7.28. Contribution plots of SP EN for steam valve fault 3. Sampling Figure 7.29. Contribution plots of SP EN for steam valve fault 4. Sampling
time of the snapshot after the fault is introduced (a) 11, (b) 14, (c) 15, (d) time of the snapshot after the fault is introduced (a) 1, (b) 6, (c) 7, (d)
16, (e) 17, (f) 18. Reprinted from [143]. Copyright © 2001 with permission 8, (e) 9, (f) 10. Reprinted from [143]. Copyright © 2001 with permission
from Elsevier. from Elsevier.

PC models for specific faults can be developed using historical data sets Residual Discriminant For situations where the data collected are not de-
collected when the process was experiencing that fault. When current mea- scribed well by PC models of other faults but will be within the residual
surements indicate out-of-control behavior, a likely cause for this behavior threshold of their own class, it is most likely that x is from the fault model
is assigned by pattern matching by using scores, residuals, angles or their i with minimum
combination.
where ri = t[ (1 - PpT)t i (7.6)
Score Discriminant Assuming that PC models retain sufficient variation
to discriminate between possible causes in scores that have independent 'ri is the residual computed using the PCA model for fault i and ri.CI is the
Normal distributions, the maximum likelihood that data x collected at a residual threshold at level 1000: based on the PCA model for fault i.
specific sampling time are from fault model i is indicated by the minimum
distance. This minimum can be determined for example by the maximum Combined Distance Discriminant Combining the information available in
scores and residuals usually improves the diagnosis accuracy [206]. Compar-
of di expressed by quadratic discrimination (Eq. 3.41)
ing the combined information to the confidence limits of each fault model.
di(t) lnpi - ~ In l:Eil - ~(t - Ii)T:E i -l(t - Ii) (7.5)
x is most likely to be from the fault model i with minimum '
2 2
where t = xP i is the location of original observation x in PC space for Ci Ti) + (1 - )( ti )
( r"cx
-.- Ci -.-' (7.7)
t"CI
fault model i, t i and :E i are the mean and the covariance along PCs for
fault model i, and Pi is the adjustment for overall occurrence likelihood of where t; and ri are the score distance and residual for fault i based on the
fault i [126]. PC model, respectively, ti,cx and ri.cx are the score distance and residual
184 Chapter 7. Process Fault Diagnosis 7.5. Fault Diagnosis with Statistical Methods 185

thresholds using the PC model, respectively, for fault i, and Ci is a weight


between 0 and 1. Ci is set equal to the fraction of total variance explained
by scores in order to weigh scores and residuals according to the amount of
~ 10
n

t - - -·- :- - -: : :
A
- - : _=
I
variation in data explained by each. The combined discriminant value thus ~ 10" [
calculated gives an indication of the degree of certainty for the diagnosis.
A value less than 1 indicates a good fit to the chosen model. If no model ~ ~-
10-27 ,- --- - ----------------- •• _._--,.- ••• _----_._-- •• -----,---------- .----- •.-,_.•••• -

I
Q 50 100 150 200 250
results in a statistic less than 1, none of the models provide an adequate
Sample Number
match to the observation. When a group of observations fail to fit within

~
any of the known groups, they could be considered as a new group and
added to the discrimination scheme. j :[ _.._ : m....................
The statistical distance discrimination schemes described are simple to

~--
implement. Relating increasing distance with lower likelihoods, they have
an intuitive appeal. They can use a large number of correlated variables
to choose between many possible source populations. Disjointedness and
overlap of sets can be accommodated. Unlike other diagnosis methods,
J Q 50 100
Sample Number
150
-. 200 250

additional source populations can easily be incorporated into the discrimi-


·t-..-.._--.._-.._-.._-.._-...-..-.._.. ._-_..-.._-_..-.._-..~.._-...-.._-.._-=-~_-.._-.._-..-.._-_..-.._-.._-_..-.._-..-__-.._-.._-.._-_--~
5
nation scheme without retraining the whole diagnosis system. 10

Example The test statistics using different types of discriminants can be


plotted versus. sample number in semilog plots. Figure 7.30 shows (a) resid-
ual, (b) score (plotted as the negative of the discriminant) and (c) combined
score-residual discriminants at each sampling time during a run with distur-
bance A (random variations in feed temperature) of the Tennessee Eastman
industrial challenge problem. The minimum and maximum (dashed line),
and average (solid line) statistics comparing a sample to all possible groups
Figure 7.30. Test statistics based on distance discriminants when the pro-
are shown along with the discriminant for the actual disturbance (stars).
c~ss is subjected to disturbance A: (a) Residuals, (b) Scores, and (c) Com-
Correct diagnosis is made when the statistic for the true group coincides
with the minimum value. The correct diagnosis is never made for this case bmed statistics. Minimum and maximum values (dashed lines), average
values. (solid line), statistics of disturbance A (stars). Reprinted from [243].
with residuals (Figure 7.30a). The combined discriminant (Figure 7.30c)
Copynght © 1997 with permission from Elsevier.
diagnoses the disturbance erratically and score discriminant (plotted as log
of its absolute value) (Figure 7.30c) diagnoses the disturbance correctly
most of the time. regions corresponding to operation with different faults can be used for fault
Figure 7.31 illustrates the fault isolation process when disturbance 3 diagnosis, to complement distance-based methods [243]. The method uses
(step change in feed temperature) is introduced. Score discriminants are angles between different coordinate systems and a similarity index defined
calculated using PC models for the various known faults (Figure 7.31c); this by using the angle information [154].
semilog plot shows the negative of the discriminant. The most likely fault
is chosen over time by selecting the fault corresponding to the maximum Euclidean and Mahalanobis A ngles The Euclidean angle eE between two
discriminant (curve with the lowest magnitude). Figure 7.31d reports the points a and b (with coordinates a and b and the vertex at the origin) is
fault selected at each sampling time. Fault 3, which is the correct fault, defined using vector products,
has been reported consistently after the first 10 sampling times.
where jlall = VaTa (7.8)
Angle-Based Discriminants for Diagnosis
The angles between principal coordinate directions of current data and Adjusting the angle definition for a weighted distance, the Mahalanobis
186 Chapter 7. Process Fault Diagnosis 7.5. Fault Diagnosis with Statistical Methods 187

Angular Discriminant For distance-based discriminants, the diagnosis can


be posed as a minimization of distance penalties. For angular information,
a suitable discriminant can be stated as;

(7.10)

where e.; is the angle between the test point and the mean of the ith group,
with the vertex positioned at the mean of NO. Looking at the absolute value
has the effect of ignoring on which side of the target mean a point may lie,
relative to the line joining the mean and the origin. Decision boundaries
~ 1(12 U1~::::::':-:-~---------------------------------------------------------------- . -----.----------------------.
"c
for angular discriminants describe open-ended conical regions in space.
~L.
Choice of Angular Mahalanobis Weighing A major task, as in distance-
based discriminants, is finding a suitable dispersion matrix and choice of
coordinates or dimensions to retain. In general, distance-based discrimi-
nants use a covariance matrix. Estimation of the covariance is done by the
method derived for a multivariate Normal distribution which provides the
most likely estimate_ However, there are some difficulties in using the usual
estimate for highly correlated variables. Mahalanobis-style weighing uses
the inverse of the covariance matrix, which can be mathematically unsta-
ble or physically unsuitable as it amplifies the importance of measurements
that have the smallest change. Use of PCA can work around the inver-
sion problem, but are generally also derived from the multivariate Normal
distribution.
Residual Mahalanobis Angle The residual Mahalanobis angle l' is defined
by replacing S-l with 1- ppl as the weighing matrix:

Figure 7.31. Detection and diagnosis of process upsets, (a) Detection of dr(a, b) = v(a - b)T(I - pPT)(a - b) (7_11)
outliers based on residuals, (b) Detection based on T 2 test of scores, (c)
Diagnosis statistics considering each possible disturbance, (d) Index of ~ho­ cos(1') = (aT(I - PpT)b)/(dr(a, O)de(b, 0)) (7.12)
sen disturbance for each observation. Reprinted from [243]. Copynght
© 1997 with permission from Elsevier. Example Use of angle-based diagnosis is illustrated by introducing a-
gain disturbance A (random noise in feed temperature) of the Tennessee
Eastman industrial challenge problem. Figure 7.32a shows the minimum
angle (eM) between points a and b with the vertex at the origin is and maximum angles (dashed lines), and the average angle (solid line) for
all 21 possible disturbances along with the angle to the correct disturbance
(7.9) (indicated by x). The diagnosis at each sampling time is made by selecting
the disturbance with the minimum angle to the observation, as plotted in
where S is the covariance matrix and d(a, b) = J(a - b)TS-1(a - b) is Figure 7.32b. Most samples are correctly diagnosed as coming from distur-
the Mahalanobis distance for points a and b. A constant Mahalanobis angle bance A (class 10), with a few misclassifications at the beginning and end
around the line joining point a with the origin is a hyperconical surface, of the run. A geometric explanation for this behavior could be that the
with distortion given by the matrix S. trajectory of data over time is curved, so the samples near the middle of
7.5. Fault Diagnosis with Statistical Methods 189
188 Chapter 7. Process Fault Diagnosis

Table 7.4. Percentage success in diagnosis of various disturbances with


new data. Reprinted from [243]. Copyright © 1997 with permission from
Elsevier.
Disturbance Type Score Residual Combination Angle
1 Step 94 0 69 100
2 Step 41 0 0 93
Sample Number 3 Step 0 100 92 29
B 4 Step 0 0 0 74

:f! ~~
5 Step 37 0 48 98
6 Step 70 0 73 93
~
D 7 Step 93 0 51 62
8 Random 1 0 0 99
50 100 150 200 9 Random 14 0 0 12
0
Sample Number A Random 57 0 25 88
B Random 0 0 0 31
C Random 7 0 9 0
Figure 7.32. (a) Test statistics based on angle discriminants: mmlmum C and F Random 33 0 41 0.9
and maximum values (dashed lines), average values (solid line), statistics D Ramp 0.7 0 0 87
of disturbance A (x), (b) Disturbance/fault diagnosed when the process is E Random 0 100 62 0
subjected to disturbance A. Reprinted from [243]. Copyright © 1997 with F Random 0 14 0 44
permission from Elsevier. G Random 33 0 13 47
H Ramp 76 0 66 97
I Ramp 0 0 0 98
the run are within angular bounds while those at the beginning and end of J Random 0 14 0 44
the run are at larger angles. K Random 73 0 0.9 32
Table 7.4 lists the average percentage of observations correctly diag-
nosed using angles, score, residual and combination of scores and residuals
discriminants with new data not used in model development. In gener- tion. The similarity index has a range from 0 to 1, increasing as models
al, half of the observations were correctly diagnosed, with step and ramp become more similar. It provides a quantitative measure of difference in
disturbances 10 to 40% better classified than random disturbances. As ex- covariance directions between models and a description of overall geometric
pected, diagnosis with new data was slightly less successful than diagnosis similarity in spread.
using observations from training.
Discrimination and Diagnosis of Multiple Disturbances
A related topic is the comparison of PC models and statistical tests
Detection and diagnosis of multiple simultaneous faults is an important
for overlap between disturbance regions. Krzanowski [154] describes the
concern. Most FDD techniques rely on the assumption of a single fault. In
derivation of angles between coordinate axes from different models, and
~ real process, combinations of faults may occur. An intervention policy to
proposes the minimum angle between models as a benchmark for simple
Improve process operation may need to take into account each of the con-
analysis. Use of angles to evaluate overlap between regions is discussed in
tributing faults. Diagnosis should be able to identi ajor contributors
[243J. The similarity index can be used to evaluate discrimination models
and co~rectly.indicatewhich, if any, secondary fault ccurring [241]. In
by selecting a threshold value to indicate where mistakes in classification
fault dlagnosls, where process behavior due to diffe faults is described
of data from the two models involved may occur. It can also be used to
by different models, it is useful to have a quantitative measure of similarity
compare models built from different operating runs of the same process for
or overlap between models, and to predict the likelihood of successful diag-
monitoring systematic changes in process variation during normal opera-
190 Chapter 7. Process Fault Diagnosis 7.7. Fault Diagnosis with Robust Techniques 191

nosis. Similarity measures serve as indicators of the success in diagnosing w~ll not be perfectly diagnosed. Idealizing the two fault regions as concen-
combinations of faults. They can identify combinations of faults that may tnc spheres, the inner model region is enveloped by the outer model. As
be masked or falsely diagnosed, and provide information about the success a result, only the outer fault will be diagnosed and the inner fault will
rates of different diagnosis schemes incorporating single and combinations be masked. Overlap of regions is likely to exist for most processes under
of faults. Using these guidelines, multiple faults occurring in a process can closed-loop control, the multiple fault scenario is further complicated for
be analyzed a priori with respect to their components, and accommodated such processes.
within the diagnosis framework.
Faults. causing random variation about a mean value (such as excessive
In comparing multivariate models, much work has been reported for
sensor n~ls~) ronove a process less drastically off-target than step or ramp
testing significant differences between means when covariance is constant.
faults. Slmllanty measures should indicate that the random variation faults
Testing for differences in covariance is more difficult yet crucial; diagnosis
have more overlap with other models, particularly with each other. Ramp
can be successfully done, whether or not means are different, as long as
or step faults tend to be the outer models and mask secondary random
there is a difference in covariance [79]. Testing for eigenvalue models of
variation faults.
covariance adds new complications, since the statistical characteristics are
not well known, even for common distributions. Simplifying assumptions
for special cases can be made, with significant loss of generality [194].
Overlap of Means An important statistical test in comparing multivariate
models is for differences in means. This corresponds to comparison of ori-
gin of coordinates rather than the coordinate directions. Many statistical
7.6 Fault Diagnosis Using SVM
tests have been developed for testing means, but most of them can become
numerically unstable when significant correlation exists between variables. S~ppor~ vector machines (SVM) have been used for many classification and
In order to work around the instability, overlap between eigenvalue-based ~lagnosls pro?lems in applications such as medical diagnosis, image recogni-
models can be evaluated. Target factor analysis can assign a likelihood on t~on hand-wnt~en ch~racter recognition, bioinformatics and text categoriza-
whether a candidate vector is a contributor to the model of a multivariate ~lOn [42]. TheIr use m chemical process fault diagnosis has been reported
data set. A statistic is defined to test if a specific vector is significantly m recent years. In one application, the performances of Fisher discrimi-
inside the confidence region containing the modeled data [181]. For overlap nant anal~sis, SVM, and proximal SVM for fault diagnosis are investigated
of means, the test can determine whether the mean from one model, f..Ll, [37]. Proxlmal SVM determines 'proximal' planes that separate the differ-
significantly overlaps the region of data from another (second) model [242]. ent classes to reduce the computational burden. The fault classification
Mean overlap analysis can be used to test if an existing PC model fits a performance was evaluated by using the Tennessee Eastman process sim-
new set of observations or if two PC models are analogous. ulator. The authors report the data sets had irrelevant information when
If there is no overlap between regions spanned by two different faults, all varia?les were used and the classification with SVM and PSVM were
two alternative schemes might handle multiple faults modeled by PCA. In poor. W~en ~elevant variables were selected by using genetic algorithms
one method, the combination fault is idealized as being located between and contnbutlOn plots, and used for fault classification the percentage of
the regions of the underlying component faults; allocations of membership misclassifications drop,Ped a~d SyM and PSVM outPe~formed FDA [37].
to the different independent faults contributing to the combination may The authors report mlsclasslficatlon for the testing data set to drop from
provide diagnosis of underlying faults. The second method is based on a 38% to 1.8% f?r FDA, ~nd from 44-45% to 6% for SVM and PSVM. By in-
more general extension of the discrimination scheme by introducing new corporatmg tIme lags mto SVM and PSVM for auto-correlated data. thev
models for each multiple-fault combination of interest. The measures of reduced the overall misclassification with SVM and PSVM to 3%. . u

similarity in model center and direction of spread can be useful to determine


the independence of the models used in diagnosis. A. study that integrates SVM with genetic-quasi-Newton optimization
algonthms reported. the application of the methodology to rayon yarn data
Masking of Multiple Faults 'When the region spanned by the model for one (two classes) and wme data (three classes) with very low misclassification
(outer) fault contains the model for another (inner) fault, their combination rates (0.1%) [156].
Chapter 7. Process Fault Diagnosis 7.7. Fault Diagnosis with Robust Techniques 193
192

7.7 Fault Diagnosis with Robust Techniques


A number of practical issues arise when a process monitoring strategy is
implemented in a real-time environment. Specifically, the data collected
from a Distributed Control System (DCS) are high dimensional, noisy, have
strongly correlated variables and, in most cases, the correlation structure SPEPlot
(boundaries found by KDE)
may be nonlinear. Furthermore, such process data often contain outliers
(gross errors) as a result of process characteristics, faulty sensors, equip-
ment failures, transient effects, or during the transference of values acquired
901;1/0- vel
by analog/digital converters. \Vhen developing an operator support system violated in
the
(aSS), such issues need to be tackled directly to ensure intelligent moni-
toring capabilities.
The ass has to provide the means for suppressing noise and outliers, lsolate sensors that cause

detecting in-control and out-of-control operations, and render sensor recon-


struction when some sensors become unavailable. Thus, the elements of a
robust monitoring strategy would be (i) robust filtering, (ii) dimensionality
reduction, (iii) fault detection and isolation and (iv) sensor reconstruction. Reconstruct fa'lled sensorls

This strategy is depicted in Figure 7.33. Each step in this strategy will Re-
plot SPE anoscores.
be reviewed in the next section, followed by an application to a pilot-scale
distillation column.
Figure 7.33. The schematic of robust monitoring strategy. Reprinted from
7.7.1 Robust Monitoring Strategy [60]. Copyright © 2001 with permission from Elsevier.
The elements of the robust monitoring strategy builds on the methods
discussed previously (e.g., PCA in 3.1 and signal filtering in 6.2.3). Here, structure of nonlinearity present in the data. This is indeed the NLPCA
only the key variations will be introduced. built on autoassociative NNs discussed in Section 3.6.1. This architecture
Robust Filtering The robust filtering step uses the tandem filtering ap- constructs lower dimensional features which are nonlinear combinations of
proach discussed in Section 6.2.3 where the moving median filter is used the original process variables, but it does not encourage explicitly the de-
along with wavelet coefficient denoising to remove outliers and noise arti- velopment of principal components which measure distinct dimensions in
facts from the measured signal. Then, the 'clean' process signal is presented the data, as. in the case for PCA. To provide a strategy in line with the
to the subsequent steps for monitoring. ~eatures of lmear PCA, an orthogonalization needs to be performed. This
IS accomplished using the Gram-Schmidt orthogonalization method that
Nonlinear peA To address the nonlinearity in the identity mapping of
multivariate data, a nonlinear counterpart of the PCA can be used (see can be found in many textbooks on linear algebra [92]. The Gram-Schmidt
Section 3.6.1). As the versions of NLPCA make use of the neural network procedure is performed as follows: Given a non-orthogonal set of vectors,
(NN) concept to address the nonlinearity, they suffer from the known over- {U1 , U2 ," . ,Up},
parameterization problem in the case of noise corrupted data. Data with 1. Let T 1 = U1
small SNR will also give rise to extensive computations during the training
of the network. Shao et al. [266] used wavelet filtering to pre-process the 2. Compute vectors T 1 , T 2 ,'" ,Tp successively using the formula,
data followed by IT-net to detect the non-conforming trends in an industrial
spray drier.
The approach presented here is based on Kramer's work [150] where his
method uncovers both linear and nonlinear correlations independent of the
7.7. Fault Diagnosis with Robust Techniques 195
194 Chapter 7. Process Fault Diagnosis

where '.' denotes the dot product.

The set of vectors {TI , T 2 ,' .. ,Tp } constitutes an orthogonal set. The pro-
cedure requires designation of a vector from which all other vectors are
constructed so as to be orthogonal to the initially chosen vector. In this
case there is no restriction on which vector has to be chosen first.
I~ summary, the orthogonal nonlinear principal component analysis (0-
NLPCA) algorithm develops orthogonal components directly from an auto
associative neural network using the Gram-Schmidt process. The mecha-
nism by which it incorporates the orthogonalization procedure resembles
Layer
'cascade control' where a faster inner loop rejects a disturbance before it 2 3 4 5

affects the outer loop. Figure 7.34 depicts the schematic of the O-NLPCA
proposed by Chessari [33]. The procedure is implemented in such a way Figure 7.34. The O-NLPCA structure. Reprinted from [60]. Copyright
that the mapping layer of network is designated as the inner loop, and the © 2001 with permission from Elsevier.
whole network is regarded as the outer loop. In order for the network to
generate orthogonal outputs at the bottleneck layer, one of the outputs is
chosen to be an 'anchor' vector so that the remaining outputs are orthog- impossible to incorporate it within the nonlinear PCA framework since ev-
onalized with respect to this anchored vector. Choice of the anchor vector ery time the SFEl'irnit is exceeded, a new neural network would need to
is random so as to eliminate biasing towards one vector. Once the orthog- be trained to check the ratio of S P E / SFElirnit. On the other hand, the
onalized outputs are obtained, the inner loop can map the inputs onto this RSVS approach is a three step procedure: (i) identification of the redun-
set of bottleneck outputs. The training should not be carried out until dancies among process sensors and determination of minimum redundancv
the error is minimized below a certain threshold because these vectors may bandwidth, (ii) ordering of sensors with redundant sensors in close pro;-
not satisfy the overall identity mapping objective. Training with a small imity and (iii) application of probabilistic and/or empirical rules using the
number of iterations encourages the construction of orthogonal nonlinear disturbance pattern identified for a new process measurement. Prior to
principal components (secondary objective) without intervening with the applying the RSVS, disturbances need to be identified (via BESI, for in-
identity mapping (the main objective). The drawback of having such a stance) correctly and then RSVS can distinguish whether the disturbance
structure is that the overall training of the network takes longer than the is a sensor malfunction or a process upset. Stork and Kowalski point out
original form of the NLPCA. Also if too many passes are allowed in the that [283] false alarms might lead the RSVS to misdiagnose the source of
inner loop, not only does the overall convergence slows down, but also the the disturbance. In what follows is a new fault detection and identification
secondary objective will not be met. technique that reduces computational load, suits both linear and nonlin-
ear PCA, provides reconstructed values for sensors identified as faulty, and
Fault Detection and Isolation. Stork et al. [284], and Stork and Kowalski [283] proposed two algorithms to identify multiple sensor disturbances using backward elimination sensor identification (BESI), and to distinguish between process upsets and sensor malfunctions via a redundant sensor voting system (RSVS), respectively. In the BESI approach, once the SPE is violated at a given time, every sensor is sequentially removed from the model matrix, followed by calculation of the upper control limit. If the ratio SPE/SPE_{limit} is less than one, the algorithm terminates and points to the sensor(s) that are left out for the out-of-control signal. Otherwise, the procedure is continued until the SPE/SPE_{limit} ratio drops below one. This approach is computationally expensive, since multiple PCA calculations are carried out each time the SPE is violated. Moreover, it is almost impossible with this approach alone to distinguish between process upsets and sensor malfunctions; the technique described next addresses these issues and potentially eliminates false negative situations.

The technique is referred to as Backward Substitution for Sensor Identification and Reconstruction (BSSIR), and it is based on the principle that process upsets and sensor failures can be identified in the presence of redundancy among sensor arrays. Due to process characteristics, these measurements may have strong correlations among each other, particularly the ones in close proximity and measuring the same variable. Therefore, when a disturbance affects the process, it would be sensed by a group of sensors rather than just by one. However, if a sensor malfunctions (e.g., due to complete failure, bias, precision degradation, or a drift), then this will only affect the individual sensor performance, at least initially. If the malfunctioning sensor is associated with a manipulated (or a controlled) variable of a feedback control system, the information conveyed to the controller will be inaccurate. As a result, such a sensor malfunction may eventually manifest itself in more than one sensor.

Once a calibration model for the process space is built using the linear/nonlinear PCA, over the course of operation the SPE can be used to monitor the process against any unanticipated disturbances and/or sensor failures. At times when the SPE_{limit} is violated, instead of evaluating the variable contribution to the SPE, one can go one step back in each sensor array and calculate the SPE again. Subsequently, the SPE values are ordered from minimum to maximum. In other words, the following vectors are defined first,

x_{1-} = [x_1(k-1) \; x_2(k) \; \cdots \; x_m(k)]^T \;\rightarrow\; \hat{x}_{1-} = x_{1-}\,a\,a^T
x_{2-} = [x_1(k) \; x_2(k-1) \; \cdots \; x_m(k)]^T \;\rightarrow\; \hat{x}_{2-} = x_{2-}\,a\,a^T
\vdots                                                                        (7.14)
x_{m-} = [x_1(k) \; x_2(k) \; \cdots \; x_m(k-1)]^T \;\rightarrow\; \hat{x}_{m-} = x_{m-}\,a\,a^T

where x_{j-} denotes a row vector containing measurements for all sensors at time k, but the k-1 sample for sensor j; \hat{x}_{j-} is the model estimate. The r_j represents the corresponding SPE:

r_1 = \sum (x_{1-} - \hat{x}_{1-})^2
r_2 = \sum (x_{2-} - \hat{x}_{2-})^2
\vdots
r_m = \sum (x_{m-} - \hat{x}_{m-})^2

Then, the sensor index of the ordered sum of squared residuals can be expressed as

r_{s,index} = index\{sort(r_1, r_2, \ldots, r_m)\}                            (7.15)

The sensor with r_{s,index}(1) is first reconstructed using the calibration model and the constrained optimization algorithm described below under sensor reconstruction. After the first iteration, if the SPE remains above its limit, then r_{s,index}(1, 2) are reconstructed together. This procedure continues until either the SPE falls below its limit or the number of reconstructed sensors equals the number of principal components retained for the calibration model. Meanwhile, the reconstructed values are saved for use in the subsequent instant the SPE goes beyond its limit.

Now that the affected sensors are isolated, the root cause for the alarm can be explored. In other words, is the alarm due to a sensor malfunction or a disturbance? The criterion is the correlation coefficient (CC) defined as

CC = \frac{\mathrm{Cov}(x_i, x_j)}{\sigma_{x_i}\,\sigma_{x_j}}                (7.16)

where \mathrm{Cov}(x_i, x_j) denotes the covariance between x_i and x_j, and \sigma_x is the standard deviation of each vector. For the jth sensor, the most correlated sensor pairs can be found by ordering the CCs between the jth sensor and the other sensors. To calculate the correlation coefficient at the time the SPE_{limit} is triggered in the test set, a Q x 1-size moving window, which contains the current sample and the Q-1 past samples for each sensor array, is formed. The CCs calculated during plant operation are then compared with the ones obtained from the training data. Starting from the first test sample, if a degradation of 10% or more is observed in the CCs between the jth sensor and the two most correlated ones, then sensor j is most likely malfunctioning, because a failure in one sensor should not interfere with other sensor readings unless that sensor is conveying information to one of the controllers in the system. Otherwise, the cause that triggered the SPE to exceed its limit will be a disturbance in the process, since a disturbance would typically propagate through the process and affect multiple correlated sensors.

Sensor Reconstruction. Following the fault detection procedure discussed above, the maintenance of unavailable sensors is needed as soon as they are detected. However, if the sensor that conveys information to one of the controllers were to be faulty, it is essential that its value be reconstructed from the remaining sensors on-line. Sensor reconstruction can be performed using the calibration model based on the PCA/NLPCA.

Here, after detecting and identifying the failed sensor(s), the unavailable sensor values are reconstructed using the calibration model and a constrained optimization algorithm from the remaining sensors. Each unavailable sensor value can be estimated by solving the following problem:

\min_{\tilde{x}_i} \|x_i - \hat{x}_i\|^2, \quad i = 1, 2, \ldots, m           (7.17)

such that

LB \le \tilde{x}_i \le UB, \qquad SPE \le r^2_{1-\alpha}

where \hat{x} denotes the estimation obtained from the calibration model, x_i represents the failed sensor, and \tilde{x}_i is the reconstructed value of the ith failed sensor. LB and UB are the lower and upper bounds for the missing sensor, and r^2_{1-\alpha} is the 100(1-\alpha)% upper control limit of the SPE. Equation 7.17 can be solved easily since only forward evaluations of the trained O-NLPCA network are required. Hence, a one-dimensional search over the
missing values that satisfy the constraints will provide a solution to the
problem effortlessly. Meanwhile, the values of the remaining sensors are
kept constant while the optimization is carried out. Multiple sensor failures
can also be accommodated in an analogous way provided that the dimension
of the bottleneck is equal to or less than the number of the available sensors.
In this case, however, the problem becomes a multidimensional search for
values of the missing sensors that satisfy Eq. 7.17.
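To make the detection and reconstruction steps concrete, the following minimal sketch implements the logic of Eqs. 7.14, 7.15 and 7.17 for a linear PCA calibration model; for the O-NLPCA case, the projection in spe() would be replaced by a forward pass of the trained network. It is an illustrative outline rather than the implementation used in the case studies: the loading matrix P, the previous sample x_prev, and the bounds lb/ub are assumed inputs, and the search minimizes the SPE directly within the sensor bounds, a simplification of Eq. 7.17.

import numpy as np
from scipy.optimize import minimize_scalar

def spe(x, P):
    # Sum of squared residuals after projection onto the PCA subspace,
    # using the reconstruction x_hat = x a a^T of Eq. 7.14.
    x_hat = x @ P @ P.T
    return float(np.sum((x - x_hat) ** 2))

def backward_substitution_order(x_now, x_prev, P):
    # Eqs. 7.14-7.15: substitute the k-1 sample for one sensor at a time,
    # recompute the SPE, and order the sensors by increasing residual.
    r = np.empty(len(x_now))
    for j in range(len(x_now)):
        x_j = x_now.copy()
        x_j[j] = x_prev[j]            # back-substitute sensor j only
        r[j] = spe(x_j, P)
    return np.argsort(r)              # r_s,index of Eq. 7.15

def reconstruct(x, failed, P, lb, ub):
    # Eq. 7.17 (simplified): one-dimensional bounded search over each failed
    # sensor value; the remaining sensor values are kept constant.
    x = x.copy()
    for j in failed:
        def cost(v, j=j):
            x_try = x.copy()
            x_try[j] = v
            return spe(x_try, P)
        x[j] = minimize_scalar(cost, bounds=(lb[j], ub[j]),
                               method="bounded").x
    return x

In use, sensors would be taken from the ordered list one at a time and reconstructed together until the SPE falls below its limit or the number of reconstructed sensors reaches the number of retained principal components, as described above.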

7.7.2 Pilot-Scale Distillation Column


A pilot-scale distillation column located at the University of Sydney, Aus-
tralia is used as the case study [60]. The 12-tray distillation column sepa-
rates a 36% mixture of ethanol and water. The following process variables
are monitored: temperatures at trays 12, 10, 8, 6, 4, and the reflux stream,
bottom and top levels (condenser), and the flow rates of bottoms, feed,
steam, distillate and reflux streams. The column is operated at atmospher-
ic pressure using feedback control. Three variables are controlled during
the operation: top product temperature, condenser level, and bottom lev-
el. Temperature at tray 8 is considered as the inferential variable for top
product composition. To maintain a desired product composition, PI controllers cascaded on flow were used to manipulate the reflux, top product and bottom product streams.

The column was operated four times at various operating conditions. The first three data sets, corresponding to a total of 12.8 hr of operation, were used to train the O-NLPCA network, and the fourth one was used for model validation. However, prior to building a calibration model, both the training and the testing data were processed through the robust tandem filter to remove noise and suppress possible outliers.

The O-NLPCA network has 8-6-10-12 neurons in each layer, yielding a prototype model with 6 principal components (PCs). For comparison, the linear PCA was also applied to the same data. As a performance criterion, the root mean square of error (RMSE) was evaluated to compare the prediction ability of the developed PCA and O-NLPCA models on the training and validation data. While the linear PCA gave 0.3021 and 0.3227 RMSE on the training and validation data sets, respectively, the O-NLPCA provided 0.2526 and 0.2244 RMSE. This suggests that to capture the same amount of information, the linear PCA entails utilization of more principal components than its nonlinear counterpart. As a result, the information embedded in the nonlinear principal components addresses the underlying events more efficiently than the linear ones.

To define the NO region (NOR) of the plant, kernel density estimation (KDE) is used. The joint probability densities of the first and second, and second and third PCs were estimated. The NOR is defined to be the 95% contour underneath the surface of the joint probability density. In addition, the SPE plot with a 90% warning limit, and two-sided 99% individual confidence limits for the filtered process variables, were also constructed using KDE to facilitate fault detection and isolation. Any violation of the 90% limit in the SPE will initiate the BSSIR algorithm to isolate the sensors that cause an out-of-control signal, and to distinguish between malfunctioning sensors and process upsets. The next step is to reconstruct the malfunctioning sensor(s) using the constrained optimization algorithm and the trained O-NLPCA network. While searching over the values of the failed sensor(s), two criteria are to be met: the value of the SPE should stay below its 90% confidence limit, and the sensor values should remain within their previously defined intervals. Following the reconstruction, the scores and the SPE are recalculated and plotted.

Figure 7.35. The screen shot of the on-line monitoring strategy, indicating normal operation of the column. Reprinted from [60]. Copyright © 2001 with permission from Elsevier.

To filter incoming process data, a window length of 500 samples was used. The window size is maintained constant by forgetting the first entry of the data vector and appending the new measurement vector to the end.
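The 95% NOR contour described above can be constructed numerically from a kernel density estimate of a score pair. The sketch below is one possible realization under the assumption that the training scores t (an n x 2 array from the PCA/O-NLPCA step) are available; it finds the density level exceeded by 95% of the training samples and uses it to test new score pairs. Function names are illustrative.

import numpy as np
from scipy.stats import gaussian_kde

def nor_contour_level(t, coverage=0.95):
    # Joint density of the score pair, estimated from training data.
    kde = gaussian_kde(t.T)
    dens = kde(t.T)                       # density at each training sample
    # The NOR is bounded by the density level enclosing 95% of the samples.
    return kde, np.quantile(dens, 1.0 - coverage)

def inside_nor(kde, level, t_new):
    # True if a new score pair lies within the 95% NOR contour.
    return kde(np.atleast_2d(t_new).T)[0] >= level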
To test the strategy for tracking sensor failures, a complete failure case is simulated. Sensor 11 (temperature at tray 4) was forced to fail completely between the 621st and 877th sampling intervals. The sensor value, which was approximately 80°C, was first decreased to 30°C (in the time segment 621-700) and then to zero (in the time segment 701-877). To test the disturbance monitoring, a flooding condition was generated by reducing the steam supply from approximately 1.15 to 0.74 kg/min for 1.5 min (between the 1103-1120 sampling instants); the supply was then increased to approximately 1.7 kg/min for the rest of the operation.
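The discrimination step between a sensor fault and a process upset, exercised by these two test cases, also lends itself to a compact sketch. The function below applies the moving-window correlation test of Eq. 7.16 with the 10% degradation threshold quoted earlier; X_train (training data, N x m) and window (the Q most recent samples, Q x m) are assumed inputs, and the rule of flagging a sensor only when both of its two most correlated partners show degraded CCs follows the text.

import numpy as np

def cc_matrix(X):
    # Pairwise correlation coefficients of Eq. 7.16 (columns = sensors).
    return np.corrcoef(X.T)

def sensor_fault_or_upset(X_train, window, j, threshold=0.10):
    cc_train = cc_matrix(X_train)[j]
    order = np.argsort(-np.abs(cc_train))
    partners = order[order != j][:2]      # two most correlated sensors
    cc_now = cc_matrix(window)[j]
    degraded = [abs(cc_now[p]) < (1.0 - threshold) * abs(cc_train[p])
                for p in partners]
    return "sensor fault" if all(degraded) else "process upset"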

In Figure 7.35, the normal operating condition is captured. The subplots in the figure are the scores plot (upper left) between the 1st and 2nd PCs, the scores plot (lower left) between the 2nd and 3rd PCs, the SPE plot (upper right), and the sensor diagnostic plot (lower right) that shows faulty sensors. The trends show that the process is operating normally; hence, no violations are indicated in the SPE and the scores plots.

Figure 7.36 shows how the monitoring strategy responds when sensor 11 fails. This event is well captured in the scores plot between the 2nd and 3rd PCs, and in the SPE plot. When the magnitude of the SPE is greater than 85, its value is plotted at 85 so that the sum of squared residuals corresponding to normal operation, and the 90% and 95% upper control limits, remain visible. In addition, the lower right subplot depicts the sensor number and its failing instant. Sensors that are suspected as failed or influenced by a process upset are plotted using a bar chart that shows their normalized squared residuals at the instant of failure. The figure also shows the SPE plot when the faulty sensor was reconstructed using Eq. 7.17.

Figure 7.36. The screen shot of the on-line monitoring strategy, indicating sensor failure. Reprinted from [60]. Copyright © 2001 with permission from Elsevier.

Next, flooding is introduced. As the flooding progresses through the column stages (Figure 7.37), sensors 5 and 12 were highlighted as faulty, whereas sensors 2, 6, 10, and 11 were isolated as signifying a process upset. Since there was no actual sensor failure, this shows that the BSSIR algorithm could not distinguish correctly between sensor failure and process upset, leading to false positive identification. The reasons for this could be that the correlation among sensors is not sufficiently strong, resulting in a large deviation in the CC between the sensor labelled as faulty and its most correlated partner; or that the CC criterion measures the degree of linearity among variables, hence if some variables are nonlinearly correlated, the CC will again be small. Nevertheless, this ambiguity in the fault detection strategy is noted.

Figure 7.37. The screen shot of the on-line monitoring strategy, indicating the flooding condition. Reprinted from [60]. Copyright © 2001 with permission from Elsevier.

7.8 Summary
Several fault diagnosis methodologies have been demonstrated to stress the variety and availability of techniques that can be deployed in practical applications. First, hidden Markov models (HMMs) have been developed either to solve the state estimation problem for detecting faults or, in conjunction with wavelets and triangular episodes, to solve the maximum likelihood problem for assigning fault classes. Next, multivariate statistical techniques have been used to develop fault diagnosis strategies that are based on PCA and contribution plots, which are also extended to robust strategies to deal with measurement noise/outliers and nonlinear correlations among process variables.

8

Sensor Failure Detection and Diagnosis

Sensor auditing is an important component of statistical process monitoring


(SPM). The sensors generate a wealth of information. This information is
used for monitoring and controlling the process. Misleading information can
be generated if there is a bias change, drift or high levels of noise in some
of the sensors. Erroneous information often causes control actions that are
unnecessary, resulting in the deterioration of product quality, safety and
profitability [224]. Identifying failures such as a broken thermocouple is
relatively easy since the signal received from the sensor has a fixed and
unique value. Incipient sensor failures that cause drift, bias change or
additional noise are more difficult to identify and may remain unnoticed
for extended periods of time. Consequently, early detection and diagnosis
of such faults followed by timely reporting of the analysis can assist plant
operators in improving product quality, process safety and profitability.
The fundamental idea is to utilize additional relevant process informa-
tion for assessing the correctness of information generated by a sensor. This
approach is known as functional redundancy, and it is more attractive than physical redundancy, which duplicates sensors and uses a voting logic to select the correct information. Several techniques based on statistics
and system theory have been developed for validation of sensor information
by functional redundancy. In most of these techniques, it is assumed that
detailed process information is available a priori. Often, this knowledge
is in the form of an accurate state-space model [39, 230]. In many cases,
this type of accurate representation of a chemical process based on first
principles is not available.
This chapter introduces two sensor audit strategies that can detect and
diagnose sensor faults. The first strategy (Section 8.1) focuses on sensor
auditing by using calibration and test data sets that are processed either
by developing PLS models (for data with low autocorrelation) or canonical variate state space (CVSS) models.

upset in these sensors, the scores plot after reconstructing the failed sensors and the SPE/T^2 plot (Figure 8.15) consistently indicate the presence of such a process upset. Moreover, similar, but less pronounced, behavior was also observed at t = 10, 28, 120-121, 151-153, 157, 173, 207, 228-230, 232-235, 238-240, 242, 247, 250, 301-303 and 320-321. While these instances also point to the presence of possible process upsets, it should also be recognized that the ones revealed by one or two uncorrelated sensors may be due to small changes in signal characteristics, such as noise.

An interesting realization is the fact that the points that fall in region II of the SPE/T^2 plot (Figure 8.15) were not always caused by the same group of sensors that were affected by the disturbance. Table 8.4 gives a few instances when the system was undergoing the disturbances. As one can see, the latent and residual spaces are rarely characterized by the same group of sensors. Furthermore, the SPE (in region I) and the T^2 (in region III) are capturing different types of disturbances. These findings suggest that fault identification and isolation methods which utilize the information from the residual or latent space only will not be able to reveal all the disturbances.

Reconstructing the faulty measurements plays a crucial role in identifying the process upsets inherent in the system. Without reconstruction, these events might go undetected, eventually leading to false negative situations. Thereby, to remedy the masking effect of the faulty measurements that inflate the T^2 and the SPE, reconstruction is vital. As a particular aspect of this example, it was found that the underlying reasons that led to a drift from the NOR were not due to the faulty sensors. This realization is a strong indication of process upsets masked by the failed sensors. It is worth mentioning that, due to the masking problem, these disturbances were not correctly identified before in the literature. Therefore, it can be pointed out that the reported list of process upsets may be incomplete.

Table 8.4. Source of identification for the sensors affected by the process upset. Results are obtained after reconstruction of faulty sensors, and for those points that belong to region II in Figure 8.15b. Reprinted from [62]. Copyright © 2001 with permission from Elsevier.

Sample No.   Latent Space              Residual Space
33           9, 18, 19                 3, 4, 5, 7, 8, 9, 11, 12, 13, 14, 15, 16, 18
64           18                        17, 18, 19
92           9, 12                     6, 7, 9, 10, 11, 17, 18, 19
93           9, 12                     6, 7, 9, 10, 11, 17, 18, 19
151          9, 11, 12                 9, 15, 16, 17, 18, 19
152          1, 2, 3, 5, 6, 7, 8, 20   17, 18, 19
155          -                         2, 3, 17, 18, 19
156          8, 9, 11, 12, 13          7, 9, 10, 12, 13, 14, 15, 16, 17, 18, 19

Figure 8.15. Process status for the validation data set (t1-t2 scores plot and SPE plot with 95% limits). Reprinted from [62]. Copyright © 2001 with permission from Elsevier.
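The complementary roles of the two statistics can be summarized in a small decision rule. The sketch below encodes the region convention read from the discussion of Figure 8.15 (region I: only the SPE limit exceeded; region III: only the T^2 limit exceeded; region II: both); the statistics and their control limits are assumed to be computed beforehand, and the numerical values in the example are purely illustrative.

def spe_t2_region(spe, t2, spe_limit, t2_limit):
    # Region II: both residual and latent spaces signal an event.
    if spe > spe_limit and t2 > t2_limit:
        return "II"
    if spe > spe_limit:
        return "I"       # residual space only
    if t2 > t2_limit:
        return "III"     # latent space only
    return "normal"

# With illustrative limits: a point with an inflated SPE alone falls in
# region I, one exceeding both limits falls in region II.
print(spe_t2_region(40.0, 5.0, 25.0, 12.0),
      spe_t2_region(40.0, 20.0, 25.0, 12.0))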

8.3 Summary
Process sensors are a key element of monitoring strategies as they provide a wealth of information about process status. However, they are also subject to various modes of failure, which can complicate the detection and diagnosis of faults and catastrophic events. In this chapter, two sensor auditing strategies were presented that can aid in the isolation of failed sensors. Based on the concepts of PLS, CVSS and PCA, these sensor audit strategies play a substantial role in discriminating between actual process disturbances and sensor malfunctions, thus helping operators locate the true root cause of process faults. The second method has also shown that malfunctioning sensors can be reconstructed using measurement information from other sensors.

9

Controller Performance Monitoring

The objective of controller performance monitoring (CPM) is to develop


and implement technology that provides information to plant personnel for
determining if appropriate performance targets and response characteris-
tics are being met by the controlled process variables. Typical operating
targets include limits on deviation from the set-point, limits on manipu-
lated variable moves, variances of controlled and manipulated variables,
frequency of soft constraint violations and frequency of reaching hard con-
straints. These targets can be used as criteria for assessing controller per-
formance. Additional criteria are developed by considering the dynamic
response characteristics such as decay ratio, overshoot, response time and
response characteristics of the output error and the manipulated variable.
Several additional criteria are defined for multivariable systems including
the extent of dynamic interactions and loop shaping. Many of these crite-
ria may not be automated easily and various techniques that can compute
indexes indicating controller performance have been proposed.
The initial design of control systems includes many uncertainties caused
by inaccuracies in process models, estimations of disturbance dynamics and
magnitudes, and assumptions concerning the operating conditions [253].
The control algorithm and the tuning parameter values are chosen by using
this uncertain information, leading to process performance that can differ
significantly from the design specifications. Even if controllers perform well
initially, many factors can cause their abrupt or gradual performance deteri-
oration. Sensor or actuator failure, equipment fouling, feedstock variations,
product changes and seasonal variations may affect controller performance.
It is reported that as many as 60% of all industrial controllers have some
kind of performance problem [105]. It is often difficult to effectively mon-
itor the performance and diagnose problems from trends in raw process
data [148]. These data show complicated response patterns caused by dis-

turbances, noise, time-varying systems and nonlinearities. In addition, the scarcity of engineers with control expertise to evaluate routinely the large number of control loops in chemical processes makes the analysis of raw data virtually unmanageable. These facts stress the necessity of efficient on-line techniques for controller performance monitoring and diagnosis. On-line tools that can be automated and provide results that are easy for plant personnel to interpret are desirable.

CPM ensures proper performance of the control systems to enable the process to operate as expected and manufacture products that meet their specifications. CPM and control system diagnosis activities are a subset of the plantwide process monitoring and diagnosis activities. CPM and diagnosis rely on the interpretation of data collected from the process. When an abnormality is detected in process data, it is necessary to determine if it is caused by a control system related cause as opposed to process equipment failure. The sequence of events and interactions can be more complex if, for example, an equipment failure triggers process variations that are further amplified by the feedback of the control system. This chapter focuses on CPM, and diagnosis will be limited to determining if source causes are associated with the controller. Controlled variables should meet their operating targets such as specifications on output variability, effectiveness in constraint enforcement, or closeness to optimal control. A comprehensive approach for assessing the effectiveness of control systems includes: (i) determination of the capability of the control system; (ii) development of statistics for monitoring controller performance; (iii) development of methods for diagnosing the underlying causes of changes in the performance of the control system [105].

Performance criteria must be defined to determine the capability of a control system. A benchmark is established for assessment by using data collected during some period of process operation with acceptable performance. Once these are achieved, controller performance can be monitored over time to detect significant changes. Since control system inputs are random variables, the outputs of the performance measure will be stochastic as well. Therefore, statistical analysis tools should be used to detect statistically significant changes in controller performance. When performance degradation is detected, the underlying root causes have to be identified. Methods for isolating problems associated with the controller from those arising from the process would be very useful. This chapter focuses on CPM of single-loop, multivariable and model predictive control (MPC) systems. Diagnosis is illustrated for MPC and is limited to distinguishing between root cause problems associated with the controller and problems that are not caused by the controller [264].

Integration of CPM with diagnosis was reported for single-loop cases [281]. A recent review [293] summarizes various advances in plantwide CPM for single-loop controllers and integrates CPM with the detection of periodic and nonperiodic oscillations in plant operation, valve stiction and root causes of plant disturbances. Diagnostic tools for performance degradation in multivariable model-based control systems have been proposed [141]. Very few uses of KBSs for CPM and diagnosis have been reported [125, 139, 264]. Review papers summarize various approaches for CPM of single-loop, multi-input-multi-output (MIMO), and MPC controllers [236, 238], and the detection of valve stiction problems [123, 255]. CPM of MIMO processes by using projections to subspaces [195, 196], and valve stiction detection by qualitative shape analysis, illustrate the diversity of techniques proposed for CPM and diagnosis.

An overview of single-loop CPM is presented in Section 9.1. Section 9.2 surveys CPM tools for multivariable controllers. Monitoring of MPC performance and a case study based on MPC of an evaporator model and a supervisory knowledge-based system (KBS) are presented in Section 9.3 to illustrate the methodology. The extension of CPM to web and sheet processes is discussed in Section 10.3.

9.1 Single-Loop Controller Performance Monitoring

An elegant CPM method based on minimum variance control (MVC) and the variance of the controlled variable computed from routine process data, proposed by Harris [102], has initiated the recent interest in CPM. The variance of a controlled variable is an important performance measure, since many process and quality criteria are based on it. The theoretically achievable absolute lower bound on the variability of the output can be an appropriate benchmark to measure the performance of a regulatory control system. This benchmark is achieved by a system under MVC. Using MVC as the performance benchmark, one can assess the performance of a control loop and make statements on the potential improvements resulting from re-tuning of controller parameters or implementing more sophisticated linear feedback controllers [53]. A good performance relative to MVC indicates that further tuning or re-design of the control algorithm is neither necessary nor helpful. In this case, further reduction of process variability can only be obtained by implementation of feedforward control or re-engineering of the process. A poor performance might result from constraints such as unstable or poorly damped zeros of the process transfer functions or control action limits, and indicates the necessity of further analysis such as process identification and controller re-design [115].

Various performance indices have been suggested [54, 53, 149, 20, 148] and several approaches have been proposed for estimating the performance index for SISO systems, including the normalized performance index approach [53], the three estimator approach [175], and the filtering and correlation analysis (FCOR) approach [115]. A model-free approach for linear quadratic CPM from closed-loop experiments that uses spectrum analysis of the input and output data has been suggested [136]. Implementation of SISO loop based CPM tools for refinery-wide control loop performance assessment has been reported [294].

The most popular tool for monitoring single-loop feedback and feedforward/feedback controllers is based on relative performance with respect to minimum variance control (MVC) [53, 102]. The idea is not to implement MVC but to use the variance of the controlled output variable that would be obtained if MVC were used as the reference point. The inflation of the controlled output variance indicates whether the process is operating as expected or not. Furthermore, if the variance with MVC is larger than what could be tolerated, this indicates the need for modification of the operating conditions or the process.

Following the MVC framework [102, 148], consider a process described by a linear discrete-time transfer function model:

y(k) = P(q^{-1})\,u(k) + \sum_i D_i(q^{-1})\,d_i(k) + v(k)                    (9.1)

where y(k) is the output, u(k) is the input, d_i(k) is the ith measured disturbance, and v(k) represents the additive effect of noise and unmeasured disturbances at the output. The argument (k) represents discrete time instants. P(q^{-1}) and D_i(q^{-1}) are stable polynomials corresponding to the transfer functions between the output and the manipulated input or measured disturbance i, respectively. The manipulated input is computed by the controller

u(k) = C(q^{-1})\,e(k) + \sum_i C_{f,i}(q^{-1})\,d_i(k)                       (9.2)

where C(q^{-1}) and C_{f,i}(q^{-1}) are the feedback and feedforward controller transfer functions. The output deviation (error) from the set-point r(k) is

e(k) = r(k) - y(k)                                                            (9.3)

By using Eqs. 9.1 and 9.2, the error e(k) can be written as

e(k) = \frac{r(k) - \sum_i (D_i(q^{-1}) + P(q^{-1})C_{f,i})\,d_i(k) - v(k)}{1 + P(q^{-1})C(q^{-1})}    (9.4)

The dynamic response of e(k) can be expressed as an autoregressive moving average (ARMA) model or a moving average (MA) time series model:

e(k) = (1 + \psi_1 q^{-1} + \psi_2 q^{-2} + \cdots)\,a(k)                     (9.5)

where a(k) is a random noise sequence with variance \sigma_a^2 and the \psi_i are the coefficients of the MA model or the impulse weights. Harris and his co-workers [53, 102] have noted that the variance of the closed-loop output is given by

\sigma_e^2 = (1 + \psi_1^2 + \psi_2^2 + \cdots + \psi_f^2 + \cdots)\,\sigma_a^2    (9.6)

The output error variance for MVC becomes

\sigma_{mv}^2 = (1 + \psi_1^2 + \psi_2^2 + \cdots + \psi_f^2)\,\sigma_a^2     (9.7)

where f denotes the number of time intervals equivalent to the process time delay. Harris [53] defines a performance index

\eta(f) = 1 - \frac{\sigma_{mv}^2}{\sigma_e^2}                                (9.8)

The index \eta(f) gives the ratio of the variance in excess of that which could be achieved under MVC to the actual variance. If \eta(f) is close to 0, the controller performs closely to the performance of MVC, and \eta(f) values closer to 1 indicate poor controller performance.

Kozub and Garcia [149] point out that in many practical cases rating of output error characteristics relative to MVC is not practical or achievable. They propose autocorrelation patterns for a first-order exponential output error decay trend:

e(k) = \frac{1}{1 - \lambda q^{-1}}\,a(k), \qquad \lambda = \exp(-T/\tau)     (9.9)

where T is the sampling interval and \tau is the first-order response time constant. The autocorrelation pattern is given by

\rho_k = \lambda^k                                                            (9.10)

which can be compared to the autocorrelation pattern of the error e(k). They define a closed-loop potential (CLP) factor

CLP = \frac{\sigma_{mv}^2}{\sigma_e^2}                                        (9.11)
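A rough estimate of \eta(f) can be obtained from routine output error data alone, which is what makes this index attractive in practice. The sketch below fits an AR model by least squares, converts it to the MA form of Eq. 9.5 to obtain the impulse weights \psi_i, and forms the variance ratio of Eq. 9.8; the AR order n is an assumed tuning choice, f is the delay in samples, and the convention of summing the first f impulse weights for \sigma_{mv}^2 follows the standard minimum variance argument.

import numpy as np

def harris_index(e, f, n=20):
    # Fit AR(n) to the output error: e(k) = a1 e(k-1) + ... + an e(k-n) + a(k)
    e = np.asarray(e, dtype=float) - np.mean(e)
    X = np.column_stack([e[n - i - 1:len(e) - i - 1] for i in range(n)])
    a, *_ = np.linalg.lstsq(X, e[n:], rcond=None)
    sigma_a2 = np.var(e[n:] - X @ a)           # driving noise variance
    # Impulse weights psi_i of the equivalent MA model (Eq. 9.5).
    psi = np.zeros(f)
    psi[0] = 1.0
    for i in range(1, f):
        psi[i] = sum(a[j] * psi[i - j - 1] for j in range(min(i, n)))
    sigma_mv2 = sigma_a2 * np.sum(psi ** 2)    # Eq. 9.7
    return 1.0 - sigma_mv2 / np.var(e)         # Eq. 9.8

Values of the index near 0 then indicate near-minimum-variance performance and values near 1 poor performance, as stated above.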

For the closed-loop performance bound given in Eq. 9.9, the variance of the output error is

\sigma_e^2 = \frac{1}{1 - \lambda^2}\,\sigma_a^2                              (9.12)

which yields a bound limit for the CLP by noting that \sigma_{mv}^2 = \sigma_a^2 if f = 0:

CLP = 1 - \lambda^2                                                           (9.13)

These indexes can be extended to consider the variance ratio of the k-step-ahead forecast error to the variance of e(k). A performance index similar to the CLP, CLP_k, is defined as [148]:

CLP_k = \frac{\sigma_{e,k}^2}{\sigma_e^2}                                     (9.14)

where \sigma_{e,k}^2 is the variance of the k-step-ahead forecast error. Other enhancements of indexes that originate from the same concepts have been proposed [20, 110, 248], and applications to refinery control loops have been reported [294]. Lynch and Dumont [175] have presented a methodology based on Laguerre networks to model the closed-loop system for computing the minimum achievable variance, an on-line delay estimator, and a static input-output estimator for assessing process nonlinearity. Likelihood ratio tests have been proposed to determine if the output error response characteristics are acceptable based on specified dynamic performance bounds [300]. But Kozub [148] warns that this approach is conceptually and computationally too demanding compared to other methods and that reliance on only a settling-time specification to construct the likelihood ratio tests [300] may be misleading.

Time series models of the output error such as Eq. 9.5 can be used to identify the dynamic response characteristics of e(k) [148]. Dynamic response characteristics such as overshoot, settling time and cycling can be extracted from the pulse response of the fitted time series model. The pulse response of the estimated e(k) can be compared to the pulse response of the desired response specification to determine if the output error characteristics are acceptable [148].

Cross correlation analysis is proposed for assessing the dynamic significance of measured disturbances and set-point changes with respect to the closed-loop error response, and for testing the existence of plant-model mismatch for models used in controller design [281].

9.2 Multivariable Controller Performance Monitoring

CPM of multivariable control systems has attracted significant attention because of its industrial importance. Several methods have been proposed for performance assessment of multivariable control systems. One approach is based on the extension of minimum variance control performance bounds to multivariable control systems by computing the interactor matrix to estimate the time delay [103, 116]. The interactor matrix [103, 116] can be obtained theoretically from the transfer function via the Markov parameters or estimated from process data [114]. Once the interactor matrix is known, the multivariate extension of the performance bounds can be established. For example, Harris and co-workers [103] propose

\eta = 1 - \frac{E[Y_{MV}^T W Y_{MV}]}{E[Y_t^T W Y_t]}                        (9.15)

where W is a positive-definite weighting matrix, Y is the vector of outputs and E[\cdot] denotes expectation. As an extension of this approach, a filtered optimal H_2 control law with desired closed-loop dynamics has been proposed [114]. Alternatively, multivariate MVC performance might be estimated via multivariate time series analysis [105]. A pass/fail likelihood ratio test was proposed to determine if performance specifications like settling time, decay ratio, minimum variance, or frequency-domain bounds are met [300]. Huang and Shah [115] proposed as benchmark user-specified closed-loop dynamics, like settling time or overshoot. Covariance-based performance indexes and a user-defined benchmark have been presented by Qin and co-workers [195, 196, 238].

Another group of approaches focuses on model-based control systems. Criteria based on the ratio of the desired and achieved controller objective functions, settling time, and constraint violations have been proposed for a Dynamic Matrix Control (DMC) type model predictive controller [223]. Diagnosis tools for source causes of poor controller performance have also been suggested. A different group of tools for detecting and diagnosing controller performance problems uses multivariate statistical tests on the prediction error for detection and casts the diagnosis problem as a state estimation problem [141].

The third class of techniques includes a frequency-domain method, based on the identification of the sensitivity function (S(s)) and the complementary sensitivity function (T(s)) from plant data, for CPM of multivariable systems [140]. Robust control system design methods seek to maximize closed-loop performance subject to specifications for bandwidth and peak
magnitude of S(s) and T(s). Estimates of these transfer functions can be obtained by exciting the reference input with a zero-mean, pseudo-random binary sequence, observing the process output and error response, and developing a closed-loop model. Performance assessment is based on the comparison between the observed frequency response characteristics and the design specifications. Selection of appropriate model structures, experimental design and model validation that will ensure reasonable estimates of S(s) and T(s) are discussed in [140]. The method has been automated and embedded in a real-time knowledge-based system for supervisory multivariable control [139]. Since the technique is intrusive, it should be used after one of the nonintrusive techniques discussed earlier indicates a controller performance problem. Because the procedure checks controller performance against design criteria, controller design and tuning via loop shaping techniques provide an automated controller modification opportunity for maximizing performance.

9.3 CPM for MPC

CPM for model predictive control (MPC) systems has been studied in recent years. The availability of a model for MPC offers new alternatives for CPM of MPCs, in contrast to multivariable control CPM that is usually data-driven, relying only on routinely collected process data. This section starts with a summary of some CPM techniques proposed in the literature. These techniques are extended and integrated into a comprehensive MPC performance assessment and monitoring methodology and a diagnosis of the types of causes of poor process performance [264]. Use of real-time KBSs for integrating CPM and diagnosis is also presented. Integration of CPM and diagnosis is illustrated by using an evaporator control case study. MPC calculations in this work are performed using a slightly modified version of the Matlab® MPC Toolbox [204] to allow for nonlinear plant models and the stepwise calculations necessary for on-line monitoring.

Model predictive control is based on real-time optimization of a cost function. Consequently, CPM methods that focus on the values of this cost function can be developed. The MPC cost function \Phi(k) is

\Phi(k) = \sum_{j=N_1}^{P} [\hat{y}(k+j) - r(k+j)]^T Q [\hat{y}(k+j) - r(k+j)] + \sum_{j=1}^{M} [\Delta u(k+j-1)]^T R [\Delta u(k+j-1)]    (9.16)

where r(k), \hat{y}(k), and \Delta u(k) are vectors of reference trajectories, predicted outputs, and changes in manipulated variables at time k, respectively. Q and R are weighting matrices representing the relative importance of each controlled and manipulated variable. Control moves at each sampling time are obtained by calculating a control sequence that minimizes \Phi(k). Therefore, it is reasonable to measure MPC performance by calculating values of \Phi(k) using plant data. A performance measure based on \Phi(k) can be defined as

J_{actual}(k) = e^T(k)\,Q\,e(k) + \Delta u^T(k)\,R\,\Delta u(k)               (9.17)

where e(k) = y(k) - r(k) is the vector of controlled variable errors and \Delta u(k) is the vector of control moves at time k. \Phi(k) is a random variable because of measurement noise and disturbances. Consequently, the expected value of the cost function is more suitable for measuring the controller performance achieved:

J = E[e^T(k)\,Q\,e(k) + \Delta u^T(k)\,R\,\Delta u(k)]                        (9.18)

Here E[\cdot] is the expectation operator, and e(k) and \Delta u(k) are computed from the data set under examination. The LQG benchmark [115], the historical performance benchmark [222], and the model-based performance benchmark [222, 347] are some of the methods that have been proposed in the literature for CPM of MPC.

LQG Benchmark. The achievable performance of a linear system characterized by quadratic costs and Gaussian noise can be estimated by solving the linear quadratic Gaussian (LQG) problem. The solution can be plotted as a trade-off curve that displays the minimal achievable variance of the controlled variable versus the variance of the manipulated variable [115], which is used as a CPM benchmark. Operation close to optimal performance is indicated by an operating point near this trade-off curve. For multivariable control systems, H_2 norms are plotted. The LQG objective function and the corresponding H_2 norms are [115]

\Phi_{LQG}(\lambda) = E[e(k)^T Q e(k)] + \lambda\,E[\Delta u(k)^T R \Delta u(k)]    (9.19)

\|G_y\|_2^2 = E[e(k)^T Q e(k)], \qquad \|G_u\|_2^2 = E[\Delta u(k)^T R \Delta u(k)]    (9.20)

The trade-off curve is obtained by calculating the H_2 norms for different values of \lambda and plotting \|G_y\|_2^2 versus \|G_u\|_2^2. Once the trade-off curve is calculated, the H_2 norms under the existing control system are computed and compared to the optimal control represented by the trade-off curve.

The LQG benchmark is limited to a special group of MPCs characterized by the equality of the control (M) and prediction (P) horizons and the lack of feedforward components and constraints. It may be considered as a limit of achievable performance in terms of input and output variance to evaluate
various types of controllers. Since M and P are two independent and important tuning parameters, and incorporation of constraints and feedforward control are important advantages of MPC over conventional controllers, alternatives to the LQG benchmark have been developed for monitoring the performance of these more interesting MPC implementations.

Historical Benchmark. A priori knowledge that the performance was good during a certain time period is necessary to use this approach [222]. For the block of input and output data of this period, the historical benchmark J_{hist} is given by an equation of the same form as Eq. 9.18, where e(k) and \Delta u(k) are taken from the historical data set. The objective function for the performance achieved (J_{ach}) is calculated by using again Eq. 9.18, where e(k) and \Delta u(k) are taken from data collected during the period of interest. The performance measure is defined as the ratio

\gamma_{hist} = \frac{J_{hist}}{J_{ach}}                                      (9.21)

Model-based Performance Measure. Two alternatives that rely on a process model, the design case and the expected performance, have been proposed.

Design Case Approach. Patwardhan et al. [222] have suggested the comparison of the achieved performance with the performance in the design case that is characterized by inputs and outputs given by the model. The design cost function J_{des} has the same form as Eq. 9.18, where e^*(k) and \Delta u^*(k) are substituted for e(k) and \Delta u(k) to indicate the predicted deviations of the model outputs from the set-points (an estimate of the disturbance is included) and the optimal control moves, respectively. J_{ach} is the same as that in the historical benchmark, Eq. 9.18, and is calculated using plant data. Performance variation between the real plant (J_{ach}) and the model (J_{des}) is expressed by

\gamma_{des} = \frac{J_{des}}{J_{ach}}                                        (9.22)

Expected Performance Approach. Zhang and Henson [347] have proposed an on-line comparison between expected and actual process performance. The expected performance is obtained by implementing the controller actions on the process model. The expected performance incorporates estimates of state noise, but no output disturbances. The actual and expected performance are compared on-line over a moving horizon P_c of past data using the ratio [347]:

\gamma_{MPC}(k) = \frac{J_{exp}(k)}{J_{act}(k)}                               (9.23)

The actual performance is defined as

J_{act}(k) = \sum_{j=1}^{P_c} e^T(k+j-P_c)\,Q\,e(k+j-P_c)                     (9.24)

The expected performance uses Eq. 9.24 as well, after replacing e with e^*. The ratios \gamma_{des} and \gamma_{MPC} are very similar. In general, they are smaller than 1 due to imperfect models, sensor noise, or other uncertainties.

\gamma_{MPC} is a stochastic variable, and statistically significant changes in the controller performance can be detected by statistical analysis. \gamma_{MPC} is assumed to be generated by an ARMA model

\gamma_{MPC}(k) = \frac{C(q^{-1})}{A(q^{-1})}\,z(k)                           (9.25)

where C(q^{-1}) and A(q^{-1}) are monic polynomials and z(k) is a zero-mean, uncorrelated, Gaussian noise signal [347]. The polynomials A and C and the variance of z can be estimated from a sequence of \gamma_{MPC} values computed by using data collected in a time interval in which the controller performs as expected. \gamma_{MPC} is highly serially correlated and the AR part is first-order [347]:

\gamma_{MPC}(k) = \frac{C(q^{-1})}{1 - a_1 q^{-1}}\,z(k)                      (9.26)

Defining

\Delta\gamma_{MPC}(k) \equiv \frac{\hat{A}(q^{-1})}{\hat{C}(q^{-1})}\,\gamma_{MPC}(k)    (9.27)

where \hat{C}(q^{-1}) and \hat{A}(q^{-1}) are the estimated polynomials, the estimated noise variance is used to compute 95% confidence intervals on \Delta\gamma_{MPC}(k) [347]. Violation of these control limits indicates a statistically significant change in controller performance. According to Eqs. 9.26 and 9.27, \Delta\gamma_{MPC}(k) is a prediction residual and should have a Normal distribution. Prediction residuals are used to monitor variations in autocorrelated random variables using well-established SPM charts.

A Comprehensive Technique for MPC Performance Monitoring. The essential step in the LQG benchmark is the calculation of various control laws for different values of \lambda and of the prediction (P) and control (M) horizons (P = M). This is a case study for a special type of MPC (unconstrained, no feedforward) and a special parameter set (M = P) to find the optimal value of the cost function and an optimal controller parameter set. Using the same information (plant and disturbance model, covariance matrices of noise and disturbances), studies can be conducted for any type of MPC and the influence of any parameter can be examined. These studies
can be automated and the corresponding value of the cost function can be reported as a function of the underlying parameter set [264].

Table 9.1. Categorization of techniques to be used (ff - feedforward).

Controller Specification   Assessment           Monitoring         Diagnosis
unconstrained, no ff       LQG                  \gamma_{hist}(k)   \gamma_{des}(k)
unconstrained, ff          comparative study    \gamma_{hist}(k)   \gamma_{des}(k)
constrained, no ff         comparative study    \gamma_{hist}(k)   \gamma_{des}(k)
constrained, ff            comparative study    \gamma_{hist}(k)   \gamma_{des}(k)

A value of the cost function suitable to be the historical benchmark and a design case that performs acceptably are selected. Two performance measures for on-line monitoring are defined after a benchmark is obtained. \gamma_{hist}(k) is extended for computation at each sampling time to determine controller performance. \gamma_{des}(k) is extended for computation at each sampling time to assist in the diagnosis of the types of causes of poor performance. CPM is implemented by using the LQG benchmark, or a benchmark obtained from case studies, and \gamma_{hist}(k). When the controller performance is declared poor, \gamma_{des}(k) is used to make diagnostic decisions.

Tools for controller performance assessment (CPA), CPM, and diagnosis are available for four types of MPCs by obtaining benchmarks for constrained cases and for controllers including feedforward components, and by applying statistical analysis to the historical and model-based performance measures \gamma_{hist}(k) and \gamma_{des}(k) (Table 9.1).

The tuning parameters of MPC include P, M, and \alpha, which determines the desired speed of approach to the set-point by using a relationship between the set-points and the reference trajectory, r(k+l) = \alpha\,sp(k+l-1) + (1-\alpha)\,sp(k+l). In addition, weight matrices and input constraints can be used to adjust the aggressiveness of the controller. The minimum achievable value of the cost function J can be found by varying M, P, and \alpha if the weight matrices and constraints are fixed to specific values. For P = M (LQG benchmark), the largest value of P (= M) minimizes the cost function. However, M = 2 and P = 20 seems to be the optimum combination for the parameter ranges under examination for the evaporator control case study. The minimal value of J can be used as a benchmark.

A quantitative measure of the performance is given by \gamma_{hist}. Systematic comparative studies may be computationally too intensive, especially if limits on control moves and weight matrices are considered. Therefore, one might want to select M and P first and then continue to seek the benchmark value by varying other parameters. The absolute optimum may be missed because of the interdependencies of the parameters, but the trade-off is a significant reduction in the computational burden.

For on-line monitoring, \gamma_{hist} is computed at each sampling time. In analogy to the calculation of J_{act} [347], the achieved cost function (J_{ach}) is calculated over a moving horizon P_c of past data:

J_{ach}(k) = \frac{1}{P_c} \sum_{j=1}^{P_c} [e^T(k+j-P_c)\,Q\,e(k+j-P_c) + \Delta u^T(k+j-P_c)\,R\,\Delta u(k+j-P_c)]    (9.28)

where e(k) is the vector of control errors at time k. The performance measure \gamma_{hist}(k) at sampling time k is

\gamma_{hist}(k) = \frac{J_{hist}}{J_{ach}(k)}                                (9.29)

Since \gamma_{hist} is a random variable, SPM tools can be used to detect statistically significant changes. \gamma_{hist}(k) is highly autocorrelated, and the use of traditional SPM charts for autocorrelated variables may yield erroneous results. An alternative SPM method for autocorrelated data is based on the development of a time series model, generation of the residuals between the values predicted by the model and the measured values, and monitoring of the residuals [1]. The residuals should be approximately normally and independently distributed with zero mean and constant variance if the time series model provides an accurate description of the process behavior. Therefore, popular univariate SPM charts (such as the x-chart, CUSUM, and EWMA charts) are applicable to the residuals. Residuals-based SPM is used to monitor \gamma_{hist}(k). An AR model is used for representing \gamma_{hist}(k):

A(q^{-1})\,\gamma_{hist}(k) = \epsilon(k)                                     (9.30)

where A(q^{-1}) is a monic polynomial with coefficients a_i, i = 1, \ldots, n_a, and \epsilon(k) is a zero-mean, uncorrelated, Gaussian noise signal. Equation 9.30 is used to estimate the value of \gamma_{hist}(k) at time k, \hat{\gamma}_{hist}(k). The residuals are

e_\gamma(k) = \gamma_{hist}(k) - \hat{\gamma}_{hist}(k)                       (9.31)

The AR model and the variance of e_\gamma(k) can be estimated from an 'in-control' data set using software such as the Matlab® System Identification Toolbox [191]. A standard x-chart is designed using control limits at ±3 standard deviations (3\sigma limits) to monitor the residuals e_\gamma(k) and consequently \gamma_{hist}(k).
The model-based performance measure \gamma_{des} is used in the proposed method after modifying the cost functions for on-line monitoring. J_{des}(k) and J_{ach}(k) are computed using Eq. 9.28 with e^* and e, respectively:

\gamma_{des}(k) = \frac{J_{des}(k)}{J_{ach}(k)}                               (9.32)

Statistical monitoring similar to that for \gamma_{hist}(k) is developed to detect significant changes over time.

Diagnosis. \gamma_{des} is monitored for diagnosing the causes of performance degradation. Some root causes affect the design case controller while others do not. For instance, increases in unmeasured disturbances, actuator faults, or an increase in the model mismatch do not influence the design case performance. Accordingly, J_{des} remains constant while J_{ach} increases, reducing the model-based performance measure. Root cause problems such as input saturation or an increase in a measured disturbance, on the other hand, affect the design case performance as well. This leads to an approximately constant value of the model-based performance measure, if the effect is quantitatively equal (which happens for a good process model). The three techniques introduced can be classified according to the type of controller and the indexes used for CPA/CPM and diagnosis activities (Table 9.1).

When degradation in performance is indicated, diagnosis can be performed by inspecting \gamma_{des}(k). Assuming that only one source cause occurs, if \gamma_{des}(k) has not changed significantly, the reason for the overall degradation affects both the design and the achieved performance cost functions to the same extent. Thus, the cause belongs to Group I (Table 9.2). If the model-based performance measure shows a degradation as well, the cause belongs to Group II. If multiple causes can occur simultaneously, then the diagnosis logic becomes more complex.

Table 9.2. Groups of root cause problems.

Group I                                    Group II
(a) change in controller specifications    change in process dynamics
(b) change in measured disturbances        change in unmeasured disturbance
(b) input saturation                       change in noise covariance

Subgroups are defined to further distinguish between the root cause problems in Group I. All changes in the controller (e.g., tuning parameters, estimator, constraints) are assumed to be performed manually, since the action taken is known and the root cause of the effect does not need to be identified by diagnosis tools (Subgroup Ia). Changes in measured disturbances and input saturation make up Subgroup Ib. Additional information is needed to distinguish between them. Input saturation can be determined by looking at the manipulated variable trajectories. A saturation effect in a manipulated variable indicates input saturation as the underlying root cause and rules out an increase in measured disturbances.

Discrimination between performance degradation due to increases in unmeasured disturbances and changes in process parameters is a question of model validation. Consider an idealized case where the disturbances can be regarded as white noise. If the model is perfect, the innovation sequence is white noise as well [2]. Imperfect models change the color of the innovation sequence, which can be detected using various methods.

Figure 9.1. Diagnosis logistics.

If it is assumed that changes in controller specifications are done manually and do not need to be identified by the diagnosis tools, the sequence of detection and diagnosis follows the path in Figure 9.1. Performance is monitored over time using the performance measure based on \gamma_{hist}. Once a degradation is detected, \gamma_{des} is used to distinguish between root cause problems of Group I and Group II. Information about the trend of the manipulated variables is used to distinguish between problems resulting from constraints and increases in measured disturbances.
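Because the decision path of Figure 9.1 is a simple cascade of checks, it translates directly into a rule such as the sketch below. The three boolean inputs are assumed to be supplied by the monitoring layer: significant degradation flags from the SPM charts on \gamma_{hist} and \gamma_{des}, and a saturation check on the manipulated variable trajectories.

def diagnose(gamma_hist_degraded, gamma_des_degraded, inputs_saturated):
    if not gamma_hist_degraded:
        return "performance acceptable"
    if gamma_des_degraded:
        # Both measures drop: Group II (process dynamics, unmeasured
        # disturbances, noise covariance, model mismatch).
        return "Group II: process change or unmeasured disturbance"
    # gamma_des unchanged: Group I; manipulated variables separate Subgroup Ib.
    if inputs_saturated:
        return "Group Ib: input saturation"
    return "Group Ib: increase in measured disturbance"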
Example. A case study illustrates the application of CPM and diagnosis to MPC of a forced circulation evaporator using a detailed model [264]. First, a historical benchmark is found. Then, performance monitoring and diagnosis are performed simultaneously for two different cases, differing by the use of linear and nonlinear plant models. The fundamental assumption of a known plant and disturbance model while assessing the initial performance is perfectly valid for the first case study and questionable for the second. The impact of the linearity assumption and other effects resulting from nonlinearity are shown and discussed [264]. A forced circulation evaporator model is used. It is a linear state-space model in deviation variables obtained from linearization around normal operating conditions [215]. The system has three controlled variables (separator level (L_2), product composition (X_2), and operating pressure (P_2)); three manipulated variables (product flow rate (F_2), steam pressure (P_100), and cooling water flow rate (F_200)); and five disturbances (circulation flow rate (F_3), feed flow rate (F_1), feed composition (X_1), feed temperature (T_1), and cooling water inlet temperature (T_200)). Two cases are summarized to display the performance of the integrated CPM and diagnosis method presented. Details are provided elsewhere [264].

Decrease of the Saturation Limit. The saturation limit of P_100 is set to zero at k = 300 min. \gamma_{hist} indicates a performance degradation (Figure 9.2). A linear plant simulation model and a linear MPC model are used. Because \gamma_{des} does not decrease, the source cause of the degradation belongs to Group I. To distinguish between an increase in measured disturbances, an increase in the measurement noise, and input saturation as the source cause, the trend of the manipulated variables is observed (Figure 9.3). The effect of input saturation can be seen clearly between k = 300 min and k = 350 min. After k = 350 min the MPC, being aware of this limit, tries to stay at the operating point by rearranging the use of the manipulated variables. However, the input saturation is correctly identified to be the root cause problem.

Figure 9.2. Effect of input saturation on \gamma_{hist}.

Figure 9.3. Effect of input saturation on the manipulated variables.

Real-time Diagnosis with G2® - Increase in Measured Disturbance. G2® is a commercial knowledge-based system (KBS) development tool for building real-time KBSs [88]. It can be used for developing supervisory KBSs for building process models, monitoring and control systems, fault diagnosis algorithms, on-line operator interaction, and integration of these functions. Expert knowledge and reasoning can be represented by rules that can make inferences based on process data. Procedures containing a certain sequence of actions can be programmed, for instance, for automating checks on different variables. Communication between G2® and external systems is handled by the G2® Standard Interface (GSI), which provides the necessary
network protocol information for communication between G2® and external functions written in C. In this work, software modules developed in Matlab® are converted to C with the Matlab® C compiler and linked to G2® [289]. The increase in measured disturbance is implemented on F_3 at k = 300 min. The disturbance data sequence of this variable is increased by a factor of 4. The increase in the measured disturbance causes performance degradation, as indicated by \gamma_{hist}. Since \gamma_{des} does not decrease, the actual cause of degradation belongs to Group I. Trends in the manipulated variables are observed to distinguish between the possible subgroups. A performance degradation due to constraints is ruled out since the manipulated variables are not saturated. The diagnosis logistics (Figure 9.1) are implemented as a rule base in G2® to support the operator. Figure 9.4 shows the G2® screens with the message box, the manipulated variable trajectories, and the CPM measures. The result of inferencing by G2® and the diagnostic results are displayed in the message box.

Figure 9.4. Snapshot of G2® screen: increase in the measured disturbance.

9.4 Summary

Controller performance monitoring (CPM) ensures proper performance of the control systems for safe and profitable operation of a process and manufacture of products within specifications. CPM and control system diagnosis activities are a subset of the plantwide monitoring and diagnosis activities, and they rely on the interpretation of process data. A comprehensive approach for assessing the effectiveness of control systems includes the determination of control system capability, the development of statistics for monitoring its performance, and the development of methods for diagnosing the underlying causes of changes in performance. Many new techniques and tools have been developed in recent years to enhance CPM. This chapter focused on CPM of single-loop, multivariable and model predictive control (MPC) systems. Diagnosis was limited to distinguishing between root cause problems associated with an MPC system and problems that are not caused by the controller. Monitoring of MPC performance and a case study based on MPC of an evaporator model and a supervisory knowledge-based system (KBS) were presented to illustrate the methodology. The extension of CPM to web and sheet processes is discussed in Section 10.3.
10

Web and Sheet Processes

Sheet forming processes measure performance data mostly through scan-


ning sensors that traverse in the cross-direction (CD) as the sheet is formed
in the machine direction (MD), thus creating a zigzag pattern of discrete
data path. However, there are some rare applications that have the ca-
pability for full sheet measurement at each sampling time. Figure 10.1
shows the difference between the two and the resulting spatio-temporal
form of process data. Representing the scanner generated data as a two-
dimensional full matrix Y (n, k) is a practical approximation that greatly
simplifies the necessary calculations for process performance tracking and
evaluation. Nature of the process data Y(n, k) may be the thickness of
the sheet, its moisture, basis weight (mass/area), brightness or any other
pertinent measure of process performance or product value. The process
itself may involve manufacturing of metal sheets, glass, plastic film, fabrics
or pulp and paper sheets.

A unique characteristic of the process data for sheet forming processes is


the presence of two independent variables, space n and time k. In most cases
the target of the process is to maintain the uniformity of Y for all nand
k. For some applications a predefined constant CD profile y~~r\n) may
be the desired target. Therefore, the objective is to analyze the deviation
of Y from its target and extract meaningful information from the results
to be used for process control or performance evaluation and diagnostics.
While the space variable n is well defined between the front and back ends
of the cross-direction the time variable k is flexible in terms of its origin
and end. Most sheet forming processes are continuous and thus k may be
treated as an indefinite discrete variable. At the same time, for practical
reasons, all sheets are cut to finite lengths for packaging and transportation
or for post-processing. Therefore, the time index k may also be treated as
a finite length temporal variable.

251

Table 8.4. Source of identification for the sensors affected by the process
upset. Results are obtained after reconstruction of faulty sensors, and for
those points that belong to region II in Figure 8.15b. Reprinted from [62].
Copyright © 2001 with permission from Elsevier.

Sample No.   Latent Space               Residual Space
33           9, 18, 19                  3, 4, 5, 7, 8, 9, 11, 12, 13, 14, 15, 16, 18
64           18                         17, 18, 19
92           9, 12                      6, 7, 9, 10, 11, 17, 18, 19
93           9, 12                      6, 7, 9, 10, 11, 17, 18, 19
151          9, 11, 12                  9, 15, 16, 17, 18, 19
152          1, 2, 3, 5, 6, 7, 8, 20    17, 18, 19
155                                     2, 3, 17, 18, 19
156          8, 9, 11, 12, 13           7, 9, 10, 12, 13, 14, 15, 16, 17, 18, 19
upset in these sensors, the scores plot after reconstructing the failed sensors
and the SPE/T² plot (Figure 8.15) consistently indicate the presence
of such a process upset. Moreover, similar, but less pronounced, behavior
was also observed at t = 10, 28, 120-121, 151-153, 157, 173, 207, 228-230,
232-235, 238-240, 242, 247, 250, 301-303 and 320-321. While these
instances also point to the presence of possible process upsets, it should also
be recognized that the ones revealed by one or two uncorrelated sensors
may be due to small changes in signal characteristics, such as noise.

An interesting realization is the fact that points that fall in region II of
the SPE/T² plot (Figure 8.15) were not always caused by the same group of
sensors that were affected by the disturbance. Table 8.4 gives a few instances
when the system was undergoing the disturbances. As one can see, the
latent and residual spaces are rarely characterized by the same group of
sensors. Furthermore, both the SPE (in region I) and the T² (in region III)
are capturing different types of disturbances. These findings suggest that
fault identification and isolation methods, which utilize the information
from the residual or latent space only, will not be able to reveal all the
disturbances.

Figure 8.15. Process status for the validation data set. Reprinted from [62].
Copyright © 2001 with permission from Elsevier.
Reconstructing the faulty measurements plays a crucial role in identifying
the process upsets inherent in the system. Without reconstruction, these
events might go undetected, eventually leading to false negatives. Thereby,
to remedy the masking effect of the faulty measurements that inflate the
T² and the SPE, reconstruction is vital. As a particular aspect of this
example, it was found that the underlying reasons that led to a drift from
the NOR were not due to the faulty sensors. This realization is a strong
indication of process upsets masked by the failed sensors. It is worth
mentioning that, due to the masking problem, these disturbances were not
correctly identified before in the literature. Therefore, it can be pointed
out that the reported list of process upsets may be incomplete.

8.3 Summary
Process sensors are a key element of monitoring strategies as they provide
a wealth of information about process status. However, they are also subject
to various modes of failure, which can complicate the detection and
diagnosis of faults and catastrophic events. In this chapter, two sensor
auditing strategies were presented that can aid in the isolation of failed
sensors. Based on the concepts of PLS, CVSS and PCA, these sensor audit
strategies play a substantial role in discriminating between actual process
disturbances and sensor malfunctions, thus helping operators locate the
true root cause of process faults. The second method has also shown that
the malfunctioning sensors can be reconstructed using measurement information
from other sensors.

9

Controller Performance Monitoring

The objective of controller performance monitoring (CPM) is to develop


and implement technology that provides information to plant personnel for
determining if appropriate performance targets and response characteris-
tics are being met by the controlled process variables. Typical operating
targets include limits on deviation from the set-point, limits on manipu-
lated variable moves, variances of controlled and manipulated variables,
frequency of soft constraint violations and frequency of reaching hard con-
straints. These targets can be used as criteria for assessing controller per-
formance. Additional criteria are developed by considering dynamic
response characteristics, such as the decay ratio, overshoot, and response
time, of the output error and the manipulated variable. Several additional
criteria are defined for multivariable systems, including the extent of
dynamic interactions and loop shaping. Many of these criteria may not be
automated easily, and various techniques that can compute indexes
indicating controller performance have been proposed.
The initial design of control systems includes many uncertainties caused
by inaccuracies in process models, estimations of disturbance dynamics and
magnitudes, and assumptions concerning the operating conditions [253].
The control algorithm and the tuning parameter values are chosen by using
this uncertain information, leading to process performance that can differ
significantly from the design specifications. Even if controllers perform well
initially, many factors can cause their abrupt or gradual performance deteri-
oration. Sensor or actuator failure, equipment fouling, feedstock variations,
product changes and seasonal variations may affect controller performance.
It is reported that as many as 60% of all industrial controllers have some
kind of performance problem [105]. It is often difficult to effectively monitor
the performance and diagnose problems from trends in raw process
data [148]. These data show complicated response patterns caused by dis-

turbances, noise, time-varying systems and nonlinearities. In addition, the
scarcity of engineers with control expertise to routinely evaluate the large
number of control loops in chemical processes makes the analysis of raw
data virtually unmanageable. These facts stress the necessity of efficient
on-line techniques for controller performance monitoring and diagnosis.
On-line tools that can be automated and that provide results easy for plant
personnel to interpret are desirable.

CPM ensures proper performance of the control systems to enable the
process to operate as expected and manufacture products that meet their
specifications. CPM and control system diagnosis activities are a subset of
the plantwide process monitoring and diagnosis activities. CPM and diagnosis
rely on the interpretation of data collected from the process. When
an abnormality is detected in process data, it is necessary to determine if
it is caused by a control system related cause as opposed to process equipment
failure. The sequence of events and interactions can be more complex
if, for example, an equipment failure triggers process variations that are
further amplified by the feedback of the control system. This chapter focuses
on CPM, and diagnosis will be limited to determining if source causes
are associated with the controller. Controlled variables should meet their
operating targets such as specifications on output variability, effectiveness
in constraint enforcement, or closeness to optimal control. A comprehensive
approach for assessing the effectiveness of control systems includes: (i)
determination of the capability of the control system; (ii) development of
statistics for monitoring controller performance; (iii) development of methods
for diagnosing the underlying causes of changes in the performance of
the control system [105].

Performance criteria must be defined to determine the capability of a
control system. A benchmark is established for assessment by using data
collected during some period of process operation with acceptable performance.
Once these are achieved, controller performance can be monitored
over time to detect significant changes. Since control system inputs are random
variables, the outputs of the performance measure will be stochastic
as well. Therefore, statistical analysis tools should be used to detect statistically
significant changes in controller performance. When performance
degradation is detected, the underlying root causes have to be identified.
Methods for isolating problems associated with the controller from those
arising from the process would be very useful. This chapter focuses on CPM
of single-loop, multivariable and model predictive control (MPC) systems.
Diagnosis is illustrated for MPC and is limited to distinguishing between
root cause problems associated with the controller and problems that are
not caused by the controller [264].

Integration of CPM with diagnosis was reported for single-loop cases
[281]. A recent review [293] summarizes various advances in plantwide CPM
for single-loop controllers and integrates CPM with detection of periodic
and nonperiodic oscillations in plant operation, valve stiction, and root causes
of plant disturbances. Diagnostic tools for performance degradation in
multivariable model-based control systems have been proposed [141]. Very
few uses of KBSs for CPM and diagnosis have been reported [125, 139,
264]. Review papers summarize various approaches for CPM of single-loop,
multi-input-multi-output (MIMO), and MPC controllers [236, 238], and
detection of valve stiction problems [123, 255]. CPM of MIMO processes by
using projections to subspaces [195, 196], and valve stiction detection by
qualitative shape analysis, illustrate the diversity of techniques proposed
for CPM and diagnosis.

An overview of single-loop CPM is presented in Section 9.1. Section
9.2 surveys CPM tools for multivariable controllers. Monitoring of MPC
performance and a case study based on MPC of an evaporator model and
a supervisory knowledge-based system (KBS) are presented in Section 9.3
to illustrate the methodology. The extension of CPM to web and sheet
processes is discussed in Section 10.3.

9.1 Single-Loop Controller Performance Monitoring

An elegant CPM method based on minimum variance control (MVC) and
the variance of the controlled variable computed from routine process data,
proposed by Harris [102], has initiated the recent interest in CPM. The
variance of a controlled variable is an important performance measure, since
many process and quality criteria are based on it. The theoretically
achievable absolute lower bound on the variability of the output can be an
appropriate benchmark to measure the performance of a regulatory control
system. This benchmark is achieved by a system under MVC. Using MVC
as the performance benchmark, one can assess the performance of a control loop
and make statements on the potential improvements resulting from retuning
of controller parameters or implementing more sophisticated linear
feedback controllers [53]. A good performance relative to MVC indicates
that further tuning or re-design of the control algorithm is neither necessary
nor helpful. In this case, further reduction of process variability can only
be obtained by implementation of feedforward control or re-engineering of
the process. A poor performance might result from constraints such as unstable
or poorly damped zeros of the process transfer functions or control
action limits, and indicates the necessity of further analysis such as process
identification and controller re-design [115].

Various performance indices have been suggested [54, 53, 149, 20, 148]
and several approaches have been proposed for estimating the performance
index for SISO systems, including the normalized performance index
approach [53], the three estimator approach [175], and the filtering and
correlation analysis (FCOR) approach [115]. A model-free approach for linear
quadratic CPM from closed-loop experiments that uses spectrum analysis
of the input and output data has been suggested [136]. Implementation
of SISO loop based CPM tools for refinery-wide control loop performance
assessment has been reported [294].

The most popular tool for monitoring single-loop feedback and feedforward/feedback
controllers is based on relative performance with respect to
minimum variance control (MVC) [53, 102]. The idea is not to implement
MVC but to use the variance of the controlled output variable that would
be obtained if MVC were used as the reference point. The inflation of the
controlled output variance indicates if the process is operating as expected
or not. Furthermore, if the variance under MVC is larger than what could
be tolerated, this indicates the need for modification of the operating
conditions or the process.

Following the MVC framework [102, 148], consider a process described
by a linear discrete-time transfer function model:

    y(k) = P(q⁻¹)u(k) + Σ_i D_i(q⁻¹)d_i(k) + v(k)    (9.1)

where y(k) is the output, u(k) is the input, d_i(k) is the ith measured disturbance,
and v(k) represents the additive effect of noise and unmeasured
disturbances at the output. The argument (k) represents discrete time instants.
P(q⁻¹) and D_i(q⁻¹) are stable polynomials corresponding to the
transfer functions between the output and the manipulated input or measured
disturbance i, respectively. The manipulated input is computed by
the controller

    u(k) = C(q⁻¹)e(k) + Σ_i C_f,i(q⁻¹)d_i(k)    (9.2)

where C(q⁻¹) and C_f,i(q⁻¹) are the feedback and feedforward controller
transfer functions. The output deviation (error) from the set-point r(k) is

    e(k) = r(k) - y(k)    (9.3)

By using Eqs. 9.1 and 9.2, the error e(k) can be written as

    e(k) = [r(k) - Σ_i (D_i(q⁻¹) + P(q⁻¹)C_f,i(q⁻¹))d_i(k) - v(k)] / [1 + P(q⁻¹)C(q⁻¹)]    (9.4)

The dynamic response of e(k) can be expressed as an autoregressive moving
average (ARMA) model or a moving average (MA) time series model:

    e(k) = (1 + ψ_1 q⁻¹ + ψ_2 q⁻² + ...) a(k)    (9.5)

where a(k) is a random noise sequence with variance σ_a² and the ψ_i are the
coefficients of the MA model or the impulse weights. Harris and his co-workers
[53, 102] have noted that the variance of the closed-loop output is
given by

    σ_e² = [1 + ψ_1² + ψ_2² + ... + ψ_f² + ...] σ_a²    (9.6)

The output error variance for MVC becomes

    σ_mv² = (1 + ψ_1² + ψ_2² + ... + ψ_f²) σ_a²    (9.7)

where f denotes the number of time intervals equivalent to the process time
delay. Harris [53] defines a performance index

    η(f) = 1 - σ_mv²/σ_e²    (9.8)

The index η(f) gives the ratio of the variance in excess of what could be
achieved under MVC to the actual variance. If η(f) is close to 0 the controller
performs closely to the performance of MVC, and η(f) values closer
to 1 indicate poor controller performance.

Kozub and Garcia [149] point out that in many practical cases rating of
output error characteristics relative to MVC is not practical or achievable.
They propose autocorrelation patterns for a first-order exponential output
error decay trend:

    e(k) = a(k)/(1 - λq⁻¹)  with  λ = exp(-T/τ)    (9.9)

where T is the sampling interval and τ is the first-order response time
constant. The autocorrelation pattern is given by

    ρ(k) = λ^k,  k = 0, 1, 2, ...    (9.10)

which can be compared to the autocorrelation pattern of the error e(k).
They define a closed-loop potential (CLP) factor defined as

    CLP = σ_mv²/σ_e²    (9.11)
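In practice the index of Eq. 9.8 is estimated from routine operating data by fitting a time series model to e(k) and converting it to the impulse weights of Eq. 9.5. The following sketch (Python; the plain least-squares AR fit and all names are illustrative assumptions, not the FCOR or three-estimator algorithms cited above) follows Eqs. 9.5-9.8:

    import numpy as np

    def harris_index(e, f, na=20):
        # Fit an AR(na) model to the output error by least squares
        e = e - e.mean()
        X = np.column_stack([e[na - i - 1:len(e) - i - 1] for i in range(na)])
        y = e[na:]
        a = np.linalg.lstsq(X, y, rcond=None)[0]       # AR coefficients
        sig_a2 = (y - X @ a).var()                     # innovation variance
        # Impulse weights psi_i of the closed loop from the AR polynomial
        psi = np.zeros(f + 1); psi[0] = 1.0
        for i in range(1, f + 1):
            psi[i] = sum(a[j] * psi[i - j - 1] for j in range(min(i, na)))
        sig_mv2 = sig_a2 * np.sum(psi**2)              # Eq. 9.7
        sig_e2 = e.var()                               # actual output variance
        return 1.0 - sig_mv2 / sig_e2                  # Eq. 9.8

With the same quantities, the CLP of Eq. 9.11 is simply sig_mv2 / sig_e2.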

For the closed-loop performance bound given in Eq. 9.9, the variance of
the output error is

    σ_e² = σ_a²/(1 - λ²)    (9.12)

which yields a bound limit for the CLP by noting that σ_mv² = σ_a² if f = 0:

    CLP = 1 - λ²    (9.13)

These indexes can be extended to consider the variance ratios of the k-step-ahead
forecast error to the variance of e(k). A performance index similar
to CLP, CLP_k, is defined as [148]:

    CLP_k = σ_e,k²/σ_e²    (9.14)

where σ_e,k² denotes the variance of the k-step-ahead forecast error.

Other enhancements for indexes that originate from the same concepts have
been proposed [20, 110, 248] and applications to refinery control loops have
been reported [294]. Lynch and Dumont [175] have presented a methodology
based on Laguerre networks to model the closed-loop system for computing
the minimum achievable variance, an on-line delay estimator, and a static
input-output estimator for assessing process nonlinearity. Likelihood ratio
tests have been proposed to determine if the output error response characteristics
are acceptable based on specified dynamic performance bounds
[300]. But Kozub [148] warns that this approach is conceptually and
computationally too demanding compared to other methods and that reliance
on only a settling-time specification to construct the likelihood ratio tests
[300] may be misleading.

Time series models of the output error such as Eq. 9.5 can be used to
identify the dynamic response characteristics of e(k) [148]. Dynamic response
characteristics such as overshoot, settling time and cycling can be
extracted from the pulse response of the fitted time series model. The pulse
response of the estimated e(k) can be compared to the pulse response of
the desired response specification to determine if the output error characteristics
are acceptable [148].

Cross correlation analysis is proposed for assessing the dynamic significance
of measured disturbances and set-point changes with respect to
closed-loop error response, and testing the existence of plant-model mismatch
for models used in controller design [281].

9.2 Multivariable Controller Performance Monitoring

CPM of multivariable control systems has attracted significant attention
because of its industrial importance. Several methods have been proposed
for performance assessment of multivariable control systems. One approach
is based on the extension of minimum variance control performance bounds
to multivariable control systems by computing the interactor matrix to
estimate the time delay [103, 116]. The interactor matrix [103, 116] can be
obtained theoretically from the transfer function via the Markov parameters
or estimated from process data [114]. Once the interactor matrix is known,
the multivariate extension of the performance bounds can be established.
For example, Harris and co-workers [103] propose

    η = 1 - E[Y_MV^T W Y_MV] / E[Y_t^T W Y_t]    (9.15)

where W is a positive-definite weighting matrix, Y is the vector of outputs
and E[.] denotes expectation. As an extension of this approach, a
filtered optimal H₂ control law with desired closed-loop dynamics has been
proposed [114]. Alternatively, multivariate MVC performance might be estimated
via multivariate time series analysis [105]. A pass/fail likelihood
ratio test was proposed to determine if performance specifications like settling
time, decay ratio, minimum variance, or frequency-domain bounds
are met [300]. Huang and Shah [115] proposed as benchmark user-specified
closed-loop dynamics, like settling time or overshoot. Covariance-based
performance indexes and a user-defined benchmark have been presented by
Qin and co-workers [195, 196, 238].

Another group of approaches focuses on model-based control systems.
The ratio of the desired and achieved controller objective functions, settling
time, and constraint-violation based criteria have been proposed for
a Dynamic Matrix Control (DMC) type model predictive controller [223].
Diagnosis tools for source causes of poor controller performance have also
been suggested. A different group of tools for detecting and diagnosing
controller performance problems has been suggested by using multivariate
statistical tests on the prediction error for detection and casting the
diagnosis problem as a state estimation problem [141].

The third class of techniques includes a frequency-domain method based
on the identification of the sensitivity function (S(s)) and the complementary
sensitivity function (T(s)) from plant data for CPM of multivariable
systems [140].
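Returning to Eq. 9.15, its sample estimate is a ratio of mean quadratic forms once output vectors under the minimum variance benchmark are available (obtaining those requires the interactor matrix and is outside this sketch). A minimal illustration in Python, with hypothetical names:

    import numpy as np

    def mimo_performance_index(Y_mv, Y, W):
        # Mean quadratic form y^T W y, one output vector per row (Eq. 9.15)
        num = np.einsum('ij,jk,ik->i', Y_mv, W, Y_mv).mean()
        den = np.einsum('ij,jk,ik->i', Y, W, Y).mean()
        return 1.0 - num / den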

Robust control system design methods seek to maximize closed-loop
performance subject to specifications for bandwidth and peak
magnitude of S(s) and T(s). Estimates of these transfer functions can be
obtained by exciting the reference input with a zero-mean, pseudo-random
binary sequence, observing the process output and error response, and developing
a closed-loop model. Performance assessment is based on the
comparison between the observed frequency response characteristics and
the design specifications. Selection of appropriate model structures, experimental
design and model validation which will ensure reasonable estimates
of S(s) and T(s) are discussed in [140]. The method has been automated
and embedded in a real-time knowledge-based system for supervisory
multivariable control [139]. Since the technique is intrusive, it should be
used after one of the nonintrusive techniques discussed earlier indicates a
controller performance problem. Because the procedure checks controller
performance against design criteria, controller design and tuning via loop
shaping techniques provide an automated controller modification opportunity
for maximizing performance.

9.3 CPM for MPC

CPM for model predictive control (MPC) systems has been studied in recent
years. The availability of a model for MPC offers new alternatives for
CPM of MPCs in contrast to multivariable control CPM that is usually
data-driven, relying only on routinely collected process data. This section
starts with a summary of some CPM techniques proposed in the literature.
These techniques are extended and integrated into a comprehensive
MPC performance assessment and monitoring methodology and diagnosis
of types of causes for poor process performance [264]. Use of real-time KBSs
for integrating CPM and diagnosis is also presented. Integration of CPM
and diagnosis is illustrated by using an evaporator control case study. MPC
calculations in this work are performed using a slightly modified version of
the Matlab® MPC Toolbox [204] to allow for nonlinear plant models and
the stepwise calculations necessary for on-line monitoring.

Model predictive control is based on real-time optimization of a cost
function. Consequently, CPM methods that focus on the values of this cost
function can be developed. The MPC cost function Φ(k) is

    Φ(k) = Σ_{j=N₁}^{P} [y(k+j) - r(k+j)]^T Q [y(k+j) - r(k+j)]
           + Σ_{j=1}^{M} [Δu(k+j-1)]^T R [Δu(k+j-1)]    (9.16)

where r(k), y(k), and Δu(k) are vectors of reference trajectories, predicted
outputs, and change in manipulated variables at time k, respectively. Q
and R are weighting matrices representing the relative importance of each
controlled and manipulated variable. Control moves at each sampling time
are obtained by calculating a control sequence that minimizes Φ(k). Therefore,
it is reasonable to measure MPC performance by calculating values
of Φ(k) using plant data. A performance measure based on Φ(k) can be
defined as

    J_actual(k) = e^T(k) Q e(k) + Δu^T(k) R Δu(k)    (9.17)

where e(k) = y(k) - r(k) is the vector of controlled variable errors and
Δu(k) is the vector of control moves at time k. Φ(k) is a random variable
because of measurement noise and disturbances. Consequently, the expected
value of the cost function is more suitable for measuring the controller
performance achieved:

    J = E[e^T(k) Q e(k) + Δu^T(k) R Δu(k)]    (9.18)

Here E[.] is the expectation operator and e(k) and Δu(k) are computed
from the data set under examination. The LQG benchmark [115], the
historical performance benchmark [222], and the model-based performance
benchmark [222, 347] are some of the methods that have been proposed in
the literature for CPM of MPC.

LQG Benchmark. The achievable performance of a linear system characterized
by quadratic costs and Gaussian noise can be estimated by solving
the linear quadratic Gaussian (LQG) problem. The solution can be plotted
as a trade-off curve that displays the minimal achievable variance of the
controlled variable versus the variance of the manipulated variable [115],
which is used as a CPM benchmark. Operation close to optimal performance
is indicated by an operating point near this trade-off curve. For
multivariable control systems, H₂ norms are plotted. The LQG objective
function and the corresponding H₂ norms are [115]

    Φ_LQG(λ) = E[e(k)^T Q e(k)] + λ E[Δu(k)^T R Δu(k)]    (9.19)

    ||G_y||₂² = E[e(k)^T Q e(k)],  ||G_u||₂² = E[Δu(k)^T R Δu(k)]    (9.20)

The trade-off curve is obtained by calculating the H₂ norms for different
values of λ and plotting ||G_y||₂² versus ||G_u||₂². Once the trade-off curve is
calculated, the H₂ norms under the existing control system are computed
and compared to the optimal control represented by the trade-off curve.

The LQG benchmark is limited to a special group of MPCs characterized
by the equality of control (M) and prediction (P) horizons and lack of
feedforward components and constraints. It may be considered as a limit of
achievable performance in terms of input and output variance to evaluate
various types of controllers. Since M and P are two independent and important
tuning parameters and incorporation of constraints and feedforward
control are important advantages of MPC over conventional controllers, alternatives
to the LQG benchmark have been developed for monitoring the
performance of these more interesting MPC implementations.

Historical Benchmark. A priori knowledge that the performance was
good during a certain time period is necessary to use this approach [222].
For the block of input and output data of this period, the historical benchmark
J_hist is given by an equation of the same form as Eq. 9.18 where e(k)
and Δu(k) are taken from the historical data set. The objective function
for the performance achieved (J_ach) is calculated by using again Eq. 9.18
where e(k) and Δu(k) are taken from data collected during the period of
interest. The performance measure is defined as the ratio

    γ_hist = J_hist / J_ach    (9.21)

Model-based Performance Measure. Two alternatives that rely on a
process model, the design case and the expected performance, have been
proposed.

Design Case Approach. Patwardhan et al. [222] have suggested the comparison
of the achieved performance with the performance in the design
case that is characterized by inputs and outputs given by the model. The
design cost function J_des has the same form as Eq. 9.18 where e*(k) and
Δu*(k) are substituted for e(k) and Δu(k) to indicate the predicted deviations
of model outputs from the set-points (an estimate of the disturbance
is included) and the optimal control moves, respectively. J_ach is the same
as that in the historical benchmark Eq. 9.18 and is calculated using plant
data. Performance variation between the real plant (J_ach) and model (J_des)
is expressed by

    γ_des = J_des / J_ach    (9.22)

Expected Performance Approach. Zhang and Henson [347] have proposed an
on-line comparison between expected and actual process performance. The
expected performance is obtained by implementing controller actions on the
process model. The expected performance incorporates estimates of state
noise, but no output disturbances. The actual and expected performance
are compared on-line over a moving horizon P_c of past data using the ratio
[347]:

    γ_MPC(k) = J_exp(k) / J_act(k)    (9.23)

The actual performance is defined as

    J_act(k) = Σ_{j=1}^{P_c} e^T(k+j-P_c) Q e(k+j-P_c)    (9.24)

The expected performance uses Eq. 9.24 as well, after replacing e with e*.
The ratios γ_des and γ_MPC are very similar. In general, they are smaller
than 1 due to imperfect models, sensor noise, or other uncertainties.

γ_MPC is a stochastic variable and statistically significant changes in the
controller performance can be detected by statistical analysis. γ_MPC is
assumed to be generated by an ARMA model

    γ_MPC(k) = [C(q⁻¹)/A(q⁻¹)] z(k)    (9.25)

where C(q⁻¹) and A(q⁻¹) are monic polynomials and z(k) is a zero-mean,
uncorrelated, Gaussian noise signal [347]. Polynomials A and C and the
variance of z can be estimated from a sequence of γ_MPC values computed
by using data collected in a time interval in which the controller performs as
expected. γ_MPC is highly serially correlated and the AR part is first-order
[347]:

    A(q⁻¹) = 1 + a₁q⁻¹    (9.26)

Defining

    Δγ_MPC(k) ≡ [Â(q⁻¹)/Ĉ(q⁻¹)] γ_MPC(k)    (9.27)

where Ĉ(q⁻¹) and Â(q⁻¹) are estimated polynomials, the estimated noise
variance is used to compute 95% confidence intervals on Δγ_MPC(k) [347].
Violation of these control limits indicates a statistically significant change
in controller performance. According to Eqs. 9.26 and 9.27, Δγ_MPC(k) is
a prediction residual and should have a Normal distribution. Prediction
residuals are used to monitor variations in autocorrelated random variables
using well-established SPM charts.
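The moving-horizon ratio of Eqs. 9.23-9.24 can be sketched as follows (Python; names and array layout are illustrative assumptions); the expected error sequence e* would come from running the controller actions against the process model as described above:

    import numpy as np

    def gamma_mpc(e_act, e_exp, Q, Pc):
        # Moving-horizon cost of Eq. 9.24 ending at sample k
        def J(e, k):
            w = e[k - Pc + 1:k + 1]
            return np.einsum('ij,jk,ik->', w, Q, w)
        # Eq. 9.23 evaluated for every admissible k
        return np.array([J(e_exp, k) / J(e_act, k)
                         for k in range(Pc - 1, len(e_act))])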

A Comprehensive Technique for MPC Performance Monitoring.
The essential step in the LQG benchmark is the calculation of various
control laws for different values of λ and prediction (P) and control (M)
horizons (P = M). This is a case study for a special type of MPC (unconstrained,
no feedforward) and a special parameter set (M = P) to find
the optimal value of the cost function and an optimal controller parameter
set. Using the same information (plant and disturbance model, covariance
matrices of noise and disturbances), studies can be conducted for any type
of MPC and the influence of any parameter can be examined. These studies
can be automated and the corresponding value of the cost function can be
reported as a function of the underlying parameter set [264].

Table 9.1. Categorization of techniques to be used (ff - feedforward).

Controller Specification    Assessment           Monitoring    Diagnosis
unconstrained, no ff        LQG                  γ_hist(k)     γ_des(k)
unconstrained, ff           comparative study    γ_hist(k)     γ_des(k)
constrained, no ff          comparative study    γ_hist(k)     γ_des(k)
constrained, ff             comparative study    γ_hist(k)     γ_des(k)

A value of the cost function suitable to be the historical benchmark
and a design case that performs acceptably are selected. Two performance
measures for on-line monitoring are defined after a benchmark is obtained.
γ_hist(k) is extended for computation at each sampling time to determine
controller performance. γ_des(k) is extended for computation at each sampling
time to assist in diagnosis of the types of causes of poor performance.
CPM is implemented by using the LQG benchmark or a benchmark obtained
from case studies and γ_hist(k). When the controller performance is
declared poor, γ_des(k) is used to make diagnostic decisions.

Tools for controller performance assessment (CPA), CPM, and diagnosis
are available for four types of MPCs by obtaining benchmarks for
constrained cases and controllers including feedforward components, and
establishing statistical analysis of the historical and model-based performance
measures γ_hist(k) and γ_des(k) (Table 9.1).

The tuning parameters of MPC include P, M, and α that determines
the desired speed of approach to the set-point by using a relationship between
the set-points and the reference trajectory r(k+l) = α sp(k+l-1) + (1-α) sp(k+l).
In addition, weight matrices and input constraints
can be used to adjust the aggressiveness of the controller. The minimum
achievable value of the cost function J can be found by varying M, P, and
α if the weight matrices and constraints are fixed to specific values. For
P = M (LQG benchmark), the largest value of P (= M) minimizes the
cost function. However, M = 2 and P = 20 seems to be the optimum combination
for the parameter ranges under examination for the evaporator
control case study. The minimal value of J can be used as a benchmark.
A quantitative measure of the performance is given by γ_hist. Systematic
comparative studies may be computationally too intensive, especially if
limits on control moves and weight matrices are considered. Therefore, one
might want to select M and P first and then continue to seek the benchmark
value by varying other parameters. The absolute optimum may be
missed because of the interdependencies of parameters, but the trade-off is
a significant reduction in the computational burden.

For on-line monitoring, γ_hist is computed at each sampling time. In
analogy to the calculation of J_act [347], the achieved cost function (J_ach) is
calculated over a moving horizon P_c of past data

    J_ach(k) = (1/P_c) Σ_{j=1}^{P_c} [e^T(k+j-P_c) Q e(k+j-P_c)
               + Δu^T(k+j-P_c) R Δu(k+j-P_c)]    (9.28)

where e(k) is the vector of control errors at time k. The performance
measure γ_hist(k) at sampling time k is

    γ_hist(k) = J_hist / J_ach(k)    (9.29)

Since γ_hist is a random variable, SPM tools can be used to detect statistically
significant changes. γ_hist(k) is highly autocorrelated. Use of traditional
SPM charts for autocorrelated variables may yield erroneous results. An
alternative SPM method for autocorrelated data is based on the development
of a time series model, generation of the residuals between the values
predicted by the model and the measured values, and monitoring of the
residuals [1]. The residuals should be approximately normally and independently
distributed with zero mean and constant variance if the time series
model provides an accurate description of process behavior. Therefore, popular
univariate SPM charts (such as the x-chart, CUSUM, and EWMA charts)
are applicable to the residuals. Residuals-based SPM is used to monitor
γ_hist(k). An AR model is used for representing γ_hist(k):

    A(q⁻¹) γ_hist(k) = ε(k)    (9.30)

where A(q⁻¹) is a monic polynomial with coefficients a_i, i = 1, ..., n_a, and
ε(k) is a zero-mean, uncorrelated, Gaussian noise signal. Equation 9.30 is
used to estimate the value of γ̂_hist(k) at time k. The residuals are

    e_i(k) = γ_hist(k) - γ̂_hist(k)    (9.31)

The AR model and the variance of e_i(k) can be estimated from an 'in-control'
data set using software such as the Matlab® System Identification
Toolbox [191]. A standard x-chart is designed using control limits at ±3
standard deviations (3σ limits) to monitor the residuals e_i(k) and consequently
γ_hist(k).
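The residuals-based monitoring of γ_hist(k) can be sketched as follows (Python instead of the cited Matlab® toolbox; the plain least-squares AR fit and the names are illustrative assumptions):

    import numpy as np

    def gamma_hist_chart(g, na=2):
        # Fit an AR(na) model (Eq. 9.30) to an 'in-control' record of gamma_hist
        g = g - g.mean()
        X = np.column_stack([g[na - i - 1:len(g) - i - 1] for i in range(na)])
        a = np.linalg.lstsq(X, g[na:], rcond=None)[0]
        resid = g[na:] - X @ a                 # residuals of Eq. 9.31
        limit = 3.0 * resid.std(ddof=1)        # 3-sigma x-chart limits
        alarms = np.flatnonzero(np.abs(resid) > limit) + na
        return resid, limit, alarms

Points flagged in alarms correspond to statistically significant changes in γ_hist, which trigger the diagnosis step described next.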

Table 9.2. Groups of root cause problems.

Group I                                     Group II
(a) change in controller specifications     change in process dynamics
(b) change in measured disturbances         change in unmeasured disturbances
(b) input saturation                        change in noise covariance

The model-based performance measure γ_des (Eq. 9.22) is used in the
proposed method after modifying the cost functions for on-line monitoring.
J_des(k) and J_ach(k) are computed using Eq. 9.28 with e* and e, respectively:

    γ_des(k) = J_des(k) / J_ach(k)    (9.32)

Statistical monitoring similar to that for γ_hist(k) is developed to detect
significant changes over time.

Diagnosis

γ_des is monitored for diagnosing the causes of performance degradation.
Some root causes affect the design case controller while others do not. For
instance, increases in unmeasured disturbances, actuator faults, or an increase
in the model mismatch do not influence the design case performance. Accordingly,
J_des remains constant while J_ach increases, reducing the model-based
performance measure. Root cause problems such as input saturation
or an increase in measured disturbances, on the other hand, affect the design
case performance as well. This leads to an approximately constant value of
the model-based performance measure, if the effect is quantitatively equal
(which happens for a good process model). The three techniques introduced
can be classified according to the type of controller and the indexes
used for CPA/CPM and diagnosis activities (Table 9.1).

When degradation in performance is indicated, diagnosis can be performed
by inspecting γ_des(k). Assuming that only one source cause occurs,
if γ_des(k) has not changed significantly, the reason for the overall degradation
affects both the design and achieved performance cost functions to
the same extent. Thus, the cause belongs to Group I (Table 9.2). If the
model-based performance measure shows a degradation as well, the cause
belongs to Group II. If multiple causes can occur simultaneously, then the
diagnosis logic becomes more complex.

Subgroups are defined to further distinguish between the root cause
problems in Group I. All changes in the controller (e.g., tuning parameters,
estimator, constraints) are assumed to be performed manually, since
the action taken is known and the root cause of the effect does not need
to be identified by diagnosis tools (Subgroup Ia). Changes in measured
disturbances and input saturation make up Subgroup Ib. Additional information
is needed to distinguish between them. Input saturation can be
determined by looking at manipulated variable trajectories. A saturation
effect in a manipulated variable indicates input saturation as the underlying
root cause and rules out an increase in measured disturbances.

Discrimination between performance degradation due to increases in
unmeasured disturbances and changes in process parameters is a question
of model validation. Consider an idealized case where disturbances can be
regarded as white noise. If the model is perfect, the innovation sequence is
white noise as well [2]. Imperfect models change the color of the innovation
sequence, which can be detected using various methods.

Figure 9.1. Diagnosis logistics.

If it is assumed that changes in controller specifications are done manually
and do not need to be identified by the diagnosis tools, the sequence
of detection and diagnosis follows the path in Figure 9.1. Performance is
monitored over time using the performance measure based on γ_hist. Once
a degradation is detected, γ_des is used to distinguish between root cause
problems of Group I and Group II. Information about the trend of manipulated
variables is used to distinguish between problems resulting from
constraints and increases in measured disturbances.
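The decision path of Figure 9.1 lends itself to a simple rule base. In the sketch below (Python; names are illustrative, and the boolean inputs would come from the SPM charts on γ_hist and γ_des and from inspection of the manipulated variable trajectories), a single root cause is assumed:

    def diagnose(gamma_hist_degraded, gamma_des_degraded, input_saturated):
        # Sequence of detection and diagnosis following Figure 9.1
        if not gamma_hist_degraded:
            return "performance acceptable"
        if not gamma_des_degraded:
            # Group I: design and achieved costs degrade to the same extent
            if input_saturated:
                return "Group I: input saturation"
            return "Group I: increase in measured disturbance"
        # Group II: achieved cost degrades relative to the design case
        return "Group II: process dynamics, unmeasured disturbance, or noise covariance change"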

Example. A case study illustrates the application of CPM and diagnosis
to MPC of a forced circulation evaporator using a detailed model [264].
First, a historical benchmark is found. Then, performance monitoring and
diagnosis are performed simultaneously for two different cases differing by
the use of linear and nonlinear plant models. The fundamental assumption
of a known plant and disturbance model while assessing the initial
performance is perfectly valid for the first case study and questionable for
the second. The impact of the linearity assumption and other effects resulting
from nonlinearity are shown and discussed [264]. A forced circulation evaporator
model is used. It is a linear state-space model in deviation variables
obtained from linearization around normal operating conditions [215]. The
system has three controlled variables (separator level (L₂), product composition
(X₂), and operating pressure (P₂)); three manipulated variables
(product flow rate (F₂), steam pressure (P₁₀₀), and cooling water flow rate
(F₂₀₀)); and five disturbances (circulation flow rate (F₃), feed flow rate (F₁),
feed composition (X₁), feed temperature (T₁), and cooling water inlet temperature
(T₂₀₀)). Two cases are summarized to display the performance of
the integrated CPM and diagnosis method presented. Details are provided
elsewhere [264].

Figure 9.2. Effect of input saturation on γ_hist (lower panel: residuals with
2σ limits).

Decrease of the Saturation Limit. The saturation limit of P₁₀₀ is set to
zero at k = 300 min. γ_hist indicates a performance degradation (Figure
9.2). A linear plant simulation model and a linear MPC model are used.
Because γ_des does not decrease, the source cause of the degradation belongs
to Group I. To distinguish between an increase in measured disturbances,
an increase in the measurement noise and an input saturation as the source
cause, the trend of the manipulated variables is observed (Figure 9.3). The
effect of input saturation can be seen clearly between k = 300 min and
k = 350 min. After k = 350 min the MPC, being aware of this limit, tries
to stay at the operating point by rearranging the use of the manipulated
variables. However, the input saturation is correctly identified to be the
root cause problem.

Figure 9.3. Effect of input saturation on the manipulated variables.

Real-time Diagnosis with G2® - Increase in Measured Disturbance. G2® is
a commercial knowledge-based system (KBS) development tool for building
real-time KBSs [88]. It can be used for developing supervisory KBSs for
building process models, monitoring and control systems, fault diagnosis
algorithms, on-line operator interaction, and integration of these functions.
Expert knowledge and reasoning can be represented by rules that can make
inferences based on process data. Procedures containing a certain sequence
of actions can be programmed, for instance, for automating checks on
different variables. Communication between G2® and external systems is
handled by the G2® Standard Interface (GSI) that provides the necessary
network protocol information for communication between G2® and external
functions written in C. In this work, software modules developed in
Matlab® are converted to C with the Matlab® C compiler and linked to
G2® [289]. The increase in measured disturbance is implemented on F₃ at
k = 300 min. The disturbance data sequence of this variable is increased
by a factor of 4. The increase in the measured disturbance causes performance
degradation as indicated by γ_hist. Since γ_des does not decrease, the actual
cause of degradation belongs to Group I. Trends in manipulated variables
are observed to distinguish between the possible subgroups. A performance
degradation due to constraints is ruled out since the manipulated variables
are not saturated. The diagnosis logistics (Figure 9.1) are implemented as
a rule base in G2® to support the operator. Figure 9.4 shows the G2®
screens with the message box, the manipulated variable trajectories, and
the CPM measures. The result of inferencing by G2® and the diagnostic
results are displayed in the message box.

Figure 9.4. Snapshot of G2® screen: increase in the measured disturbance.

9.4 Summary

Controller performance monitoring (CPM) ensures proper performance of
the control systems for safe and profitable operation of a process and manufacture
of products within specifications. CPM and control system diagnosis
activities are a subset of the plantwide monitoring and diagnosis
activities and they rely on the interpretation of process data. A comprehensive
approach for assessing the effectiveness of control systems includes
the determination of control system capability, development of statistics
for monitoring its performance, and development of methods for diagnosing
the underlying causes of changes in performance. Many new techniques and
tools have been developed in recent years to enhance CPM. This chapter
focused on CPM of single-loop, multivariable and model predictive control
(MPC) systems. Diagnosis was limited to distinguishing between root
cause problems associated with an MPC system and problems that are not
caused by the controller. Monitoring of MPC performance and a case study
based on MPC of an evaporator model and a supervisory knowledge-based
system (KBS) were presented to illustrate the methodology. The extension of
CPM to web and sheet processes is discussed in Section 10.3.

10

Web and Sheet Processes

Sheet forming processes measure performance data mostly through scanning
sensors that traverse in the cross-direction (CD) as the sheet is formed
in the machine direction (MD), thus creating a zigzag path of discrete
data. However, there are some rare applications that have the capability
for full sheet measurement at each sampling time. Figure 10.1
shows the difference between the two and the resulting spatio-temporal
form of process data. Representing the scanner-generated data as a two-dimensional
full matrix Y(n, k) is a practical approximation that greatly
simplifies the necessary calculations for process performance tracking and
evaluation. The process data Y(n, k) may be the thickness of
the sheet, its moisture, basis weight (mass/area), brightness or any other
pertinent measure of process performance or product value. The process
itself may involve manufacturing of metal sheets, glass, plastic film, fabrics
or pulp and paper sheets.

A unique characteristic of the process data for sheet forming processes is
the presence of two independent variables, space n and time k. In most cases
the target of the process is to maintain the uniformity of Y for all n and
k. For some applications a predefined constant CD profile y^tar(n) may
be the desired target. Therefore, the objective is to analyze the deviation
of Y from its target and extract meaningful information from the results
to be used for process control or performance evaluation and diagnostics.
While the space variable n is well defined between the front and back ends
of the cross-direction, the time variable k is flexible in terms of its origin
and end. Most sheet forming processes are continuous and thus k may be
treated as an indefinite discrete variable. At the same time, for practical
reasons, all sheets are cut to finite lengths for packaging and transportation
or for post-processing. Therefore, the time index k may also be treated as
a finite-length temporal variable.
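As a small illustration of the full-matrix approximation, the sketch below (Python; the alternating-direction handling and all names are illustrative assumptions) folds a stream of scanner samples into Y(n, k), reversing every other scan because the sensor traverses the sheet in alternating CD directions:

    import numpy as np

    def scans_to_matrix(samples, N):
        # Fold K complete scans of N CD lanes into the matrix Y(n, k)
        K = len(samples) // N
        Y = np.asarray(samples[:N * K], dtype=float).reshape(K, N)
        Y[1::2] = Y[1::2, ::-1]   # odd-numbered scans run back-to-front
        return Y.T                # rows: n = 1..N (CD), columns: k = 1..K (MD)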


Figure 10.1. Sheet process data measurement.

Figure 10.2. General approach to univariate function analysis.


10.1 Traditional Data Analysis

In terms of orientation, to appreciate the traditional basics of sheet forming
process data it is useful to briefly review the conventional steps of data
analysis for a univariate system. Consider the set of hypothetical data
shown in Figure 10.2. There are 40 data points with X as the independent
variable and Y as the dependent variable. The means are X_m and Y_m respectively.
Clearly there is a trend in Y as a function of X. Let the function
(Y - Y_m) = f(X - X_m) represent the trend in an optimal manner and be the
correlation model for the segment of observed data. The residual of the data
when compared to the model is a measure of scatter or variability around
the dominant trend. Close observation of the residual indicates that the
variability in the range X < X_m is smaller than the variability in the range
X > X_m. This simple yet useful approach has three basic components: (1)
establishing and removing the data means, (2) establishing the correlated
trend in the dependent variable, (3) analyzing the scatter around the correlation
model as a function of the independent variable. Complete evaluation
of data and quantifying its pertinence for process improvement may require
further effort which may be extensive depending on the complexity of the
problem. However, in general these are the three basic steps for initial
analysis of process data that directly apply to sheet forming processes as
well.

As an example consider the basis weight measurement in paper manufacturing
where the uniformity of sheet density, tracked as g/m², must be
closely maintained at a fixed target. Figure 10.3 is a three-dimensional representation
of a typical data set for a roll of sheet showing measurements
at 54 cross-direction (CD) locations for 110 machine-direction (MD) scans,
N = 54 and K = 110. Typically the total CD distance may be about 5 m and
the MD length may correspond to approximately 30 min of production. CD
data length increments usually match the size of CD adjustment actuators
while each data point is the averaged scanner information for the corresponding
distance, which is also known as the data lane. For a typical application
additional sheet properties like thickness and moisture are measured simultaneously.
For the basis weight measurements in this example there are a
total of 54 x 110 = 5940 data points represented in matrix form Y(n, k). For
practical reasons, the original numbers of the data are turned into normalized
deviation variables by subtracting the overall mean and then dividing
by the standard deviation. For a typical paper roll the overall mean is
very close to the target basis weight, so there is not much information loss
by the use of deviation variables. However, the product value for the roll
is easily degraded by the two-dimensional variability of Y(n, k) regardless
of how closely the target is satisfied by the overall average. Figure 10.4 is
the histogram of the total data set showing typical similarity to a normal
distribution. It is important to recognize that the histogram captures only
the collective variability of the individual data points without any regard
to trends and correlations that might exist with respect to specific (n, k)
locations on the sheet.

Figure 10.3. Normalized basis weight data for a roll of paper sheet.

Figure 10.4. Histogram of normalized basis weight data with comparison
to normal distribution (mean = 0, std = 1, var = 1).

10.1.1 MD/CD Decomposition

MD/CD decomposition separates the two-dimensional means one at a
time from the data matrix Y(n, k) that has N rows and K columns. First,
the averages of all spatial locations for each scan are computed to get the
MD trend as Y_MD(k). Then, the CD profile is computed by subtracting
Y_MD(k) from each element of the corresponding column of Y(n, k)
followed with row-by-row averaging to get Y_CD(n). The MD trend is a
K-dimensional row vector while the CD profile is an N-dimensional column
vector. It is useful to construct a corresponding set of matrices Y_MD(n, k)
and Y_CD(n, k) where the vectors Y_MD(k) and Y_CD(n) are repeated to fill
in the N x K dimensions. The data residual is then defined as

    Y_R(n, k) = Y(n, k) - Y_MD(n, k) - Y_CD(n, k)    (10.1)

or simply Y_R = Y - Y_MD - Y_CD.

Figure 10.5. Vector and matrix forms of MD trend (mean ≈ 0, std = 0.198,
var = 0.0392).

Figure 10.6. Vector and matrix forms of CD profile.

Both vector and matrix forms of the MD trend and CD profile are shown in
Figures 10.5 and 10.6. MD/CD decomposition removes the most dominant
trends in data through simple averaging along each dimension. MD trending
is uniquely defined as it applies exclusively to each time increment. On
the other hand, the CD profile calculation is specifically dependent on the time
'window' or the number of scans used for averaging, in this case K = 110.
Figure 10.7 shows that the remainder Y_R is more random than the original
data Y, as should be expected.
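Equation 10.1 translates directly into two averaging operations. A minimal sketch (Python; names are illustrative) with Y stored as an N x K array:

    import numpy as np

    def mdcd_decompose(Y):
        # Y: N rows (CD positions n) by K columns (scans k)
        y_md = Y.mean(axis=0)                    # MD trend, one value per scan
        y_cd = (Y - y_md).mean(axis=1)           # CD profile over the window
        Y_R = Y - y_md[None, :] - y_cd[:, None]  # residual of Eq. 10.1
        return y_md, y_cd, Y_R

The variances of y_md, y_cd and Y_R are then the components tracked for process monitoring.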

Figure 10.7. Residual of data after removing MD trend and CD profile.

MD/CD decomposition is a sequence of two practically independent averaging
or zero-order filtering operations with resulting variances that are
essentially additive, σ_Y² = σ_YMD² + σ_YCD² + σ_YR². For process monitoring
all three variance components σ_YMD², σ_YCD² and σ_YR², or equivalently the
corresponding standard deviations, are tracked. Reduction in variability may
come from improvements in: (a) process equipment, (b) operating practices
including the elimination of disturbances, and (c) automatic control. According
to the normalized variances as displayed in Figures 10.5-10.7, the
major contributors to process variability for this example are Y_CD and
Y_R, implying the presence of effective basis weight automatic control for
Y_MD(k) but no active feedback correction for the CD profile.

10.1.2 Time Dependent Structure of Profile Data

Analysis of full sheet data is useful for process performance evaluations
and product value calculations. For feedback control or any other on-line
application, it is necessary to continuously convert scanner data into a
useful form. Consider the data vector Y(:, k) for scan number k. It is
separated into its MD and CD components as Y(:, k) = Y_MD(k) + Y_CD(:, k)
where Y_MD(k) is the mean of Y(:, k) as a scalar and Y_CD(:, k) is the
instantaneous CD profile vector. MD and CD controllers correspondingly
use these calculated measurements as feedback data for discrete time k.
Univariate MD controllers are traditional in nature with only measurement
delay as a potential design concern. On the other hand, CD controllers are
multivariate in form and must address the challenges of controller design
for large dimensional correlated systems.

Control systems ignore short term variabilities through appropriately
designed filters. The effective length of the filter window determines how quickly
significant variations are actually detected. Defining a CD profile vector
Y_CD(n) for a complete roll is perhaps the simplest form of a large window
or long time-span filter. Although Y_CD(n) effectively and very efficiently
captures the gross nature of CD variability, it does not imply that the same
profile vector is a quasi steady-state property of the sheet. Depending on
process conditions and raw material variations the CD profile usually changes
with time. For the example discussed in Section 10.1.1, a comparison of
its CD profiles with a roll manufactured approximately 24 hours earlier is
shown in Figure 10.8. In contrast, an example of typical changes in the CD
profile within the span of the same roll can be tracked through four consecutive
shorter-window averages sequentially displayed in Figure 10.9. The
110 scans, shown as a single roll average in Figure 10.6, are divided into
approximately four equal segments to capture and demonstrate the short-term
time dependence of the CD profile.

Figure 10.8. Comparison of full roll CD profiles. Light line is a repeat of
Fig. 10.6 and heavy line is for another roll produced about 24 hours earlier.

10.2 Orthogonal Decomposition of Profile Data

In Section 10.1, it was stated that the basic steps of data evaluation are (a)
removing the mean, (b) correlating the dominant trend, and (c) analyzing
the residual scatter around the correlation. For the two-dimensional sheet
process data MD/CD decomposition is essentially the implementation of
step (a) in both the spatial and temporal modes.
In Section 10.1, it was stated that the basic steps of data evaluation are (a) removing the mean, (b) correlating the dominant trend, and (c) analyzing the residual scatter around the correlation. For the two-dimensional sheet process data, MD/CD decomposition is essentially the implementation of step (a) in both the spatial and temporal modes. The resulting data components Y_MD, Y_CD and Y_R maintain significant information about process performance and improvement potentials, which can be evaluated through identification and analysis of dominant trends as suggested in step (b). Process data can be correlated or de-trended using a variety of functions. A computationally reliable approach is the use of orthogonal functions. Least squares fit of data with a simple reduced-order function can provide valuable information about process performance in terms of dominant contributions to variability.

Let y = [y_1; y_2; ...; y_N] = [y_1 ... y_N]^T be an N-dimensional mean-centered observation vector where ȳ = 0. Let z_1 ... z_M be M orthogonal basis functions where M < N, each basis vector z_m = [z_{m,1} ... z_{m,N}]^T is N-dimensional, and z_i^T z_j = 0 for i ≠ j. Define the N x M dimensional bases matrix Φ = [z_1 z_2 ... z_M] through which y can be approximated as y ≈ Φc, where c = [c_1 ... c_M]^T is the score vector measuring the projection magnitudes of y onto the lower dimensional bases z_1 ... z_M. The least squares approximation of c is obtained by c = Ψy, where Ψ = (Φ^T Φ)^{-1} Φ^T is called the transition matrix. Now y can be expressed as y = y_M + y_R, where y_M = Φc is the M-dimensional low-order approximation of y and y_R is the residual. Due to the orthogonal nature of the decomposition y_M and y_R are independent and therefore σ²_y = σ²_{y_M} + σ²_{y_R}.

Separating measured data vectors or matrices into independent lower order approximations and residual terms is useful both in process performance evaluation, as variance contributions can be clearly separated, and in feedback process control, as the number of decision variables can be significantly reduced while the adverse effects of autocorrelation are eliminated. In the following two sections orthogonal decomposition approaches using Gram polynomials and principal components analysis (PCA) will be introduced.
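Before specializing the basis, the generic projection step can be sketched as follows; the mean-centered basis built here by QR factorization of random directions is a hypothetical stand-in for the Gram polynomial or PCA bases of the next sections.

```python
import numpy as np

# Orthogonal decomposition y = y_M + y_R of a mean-centered vector.
rng = np.random.default_rng(1)
N, M = 54, 4
y = rng.normal(size=N)
y -= y.mean()                               # mean-centered observation vector

A = rng.normal(size=(N, M))
A -= A.mean(axis=0)                         # mean-centered basis directions
Phi, _ = np.linalg.qr(A)                    # N x M bases matrix, orthogonal columns

Psi = np.linalg.inv(Phi.T @ Phi) @ Phi.T    # transition matrix
c = Psi @ y                                 # scores: projection magnitudes
y_M = Phi @ c                               # low-order approximation
y_R = y - y_M                               # residual

print(np.var(y), np.var(y_M) + np.var(y_R)) # variance contributions are additive
```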
10.2.1 Gram Polynomials
Gram polynomials are orthogonal and defined uniquely for discrete data at equidistant positions, much like the spatial data collected in sheet forming processes. For N data positions, the discrete-point scalar components of the mth-order polynomial vector p_m = [p_{m,1} ... p_{m,n} ... p_{m,N}]^T are defined as

p_{m,n} = \sum_{j=0}^{m} (-1)^j \frac{(m+j)^{(2j)}}{(j!)^2} \frac{(n-1)^{(j)}}{(N-1)^{(j)}}    (10.2)

where x^{(j)} = x(x-1)...(x-j+1) denotes the falling factorial.
p_{0,n} = 1

p_{1,n} = 1 - \frac{2(n-1)}{N-1}

p_{m,n} = \frac{(N-1)(2m-1)}{m(N-m)} \left[ 1 - \frac{2(n-1)}{N-1} \right] p_{m-1,n} - \frac{(m-1)(N-1+m)}{m(N-m)} p_{m-2,n}    (10.3)

Recursive formulations with respect to both polynomial order and data position are given in Eq. 10.3. The zero-order polynomial is only used to account for the data mean if needed. For a mean-centered data vector, the effective polynomials are p_1 through p_{N-1}. The first five of these are plotted in Figure 10.10 for N = 50. Note that the polynomials are explicitly defined through the data length.

Figure 10.10. Gram polynomials as basis functions, first-order through fifth-order, plotted against discrete equidistant data positions, n.
Example. Consider the CD profile examined in Figure 10.6. The measurement vector y_CD has N = 54 data positions with the corresponding Gram polynomials p_1 through p_53 that form Φ = [p_1 p_2 ... p_53] and Ψ = (Φ^T Φ)^{-1} Φ^T as defined earlier. Computationally, Ψ can be easily constructed row-by-row owing to the simplification arising from orthogonality, where the mth row is ψ_m = p_m^T/(p_m^T p_m), which is the normalized form of the corresponding polynomial. Projection magnitudes of y_CD on the Gram polynomial basis vectors are c = Ψ y_CD. Although all 53 components of c are needed to duplicate y_CD, only a few of the low-order coefficients may be sufficient to capture the dominant trends in the mean profile. Consider the full representation as y_CD = Φc and its partitioned form y_CD = Φ[c_M; c_{N-1-M}], leading to y_CD = Φ_M c_M + Φ_R c_R, where the subscript M is used to designate the selection of the first M polynomial orders and R to designate the residual. Dimensions of Φ_M and c_M are N x M and M x 1, respectively.

Orthogonal decomposition of the CD profile y_CD = y_CD(M) + y_CD(R) using M = 4 and the corresponding variance contributions σ²_{y_CD} = σ²_{y_CD(M)} + σ²_{y_CD(R)} reveal that approximately 50% of the variability can be attributed to a low-frequency wave captured by fourth-order Gram polynomials. Figure 10.11 shows that the low-order approximation is in fact very effective and that the residual profile is more balanced compared to the original CD profile.

Figure 10.11. Fourth-order Gram polynomial approximation of the mean CD profile and comparison with the residual measurement signal, y_CD(R) = y_CD - y_CD(M).

Cumulative contributions of individual polynomial orders towards total variance are plotted in Figure 10.12. For this particular case even the first-order polynomial, which is simply a straight line, accounts for more than 20% of the variance. As is visible in Figure 10.11, the CD profile has a distinct slant increasing from left (front of machine) to right (back), which is captured by p_1. Also plotted in Figure 10.12 are the cumulative power spectra of y_CD and y_CD(R) as functions of frequency based on incremental measurement length. It is confirmed again that the fourth-order Gram polynomial approximation has essentially removed all low-frequency variations from the profile.

Figure 10.12. Accounting for total CD profile variability through Gram polynomial approximations (against Gram polynomial order, m) and the cumulative power spectra for y_CD and y_CD(R) (against frequency in [CD data width]^{-1}).

A reasonable observation after these results may be that an effective CD controller should be able to reduce CD variability by at least a factor of 2. Another way of stating the same observation from a performance monitoring point of view would be that as long as a CD controller is functioning properly, both plots in Figure 10.12 should indicate insignificant contributions from Gram polynomials up to order 4 or 5, which would also mean essentially similar power spectra for y_CD and y_CD(R).

10.2.2 Principal Components Analysis

Principal components analysis (PCA) (see Section 3.1) provides a technique to define orthogonal basis functions that are directly constructed from process data, unlike Gram polynomials which are dependent on the data length only. PCA is also uniquely suitable for extracting the dominant features of two-dimensional data like the residual profile obtained after MD/CD decomposition, Y_R.

Let Y be N x K dimensional data (N < K) with mean-centered rows and columns as it is for Y_R. Although the only requirement for PCA application is mean-centering of columns, having the rows mean-centered as well due to CD profile removal provides better scaling to the remaining data. Define a covariance or scatter matrix Z = YY^T and let U = [u_1 u_2 ... u_N] with u_i = [u_{i,1} ... u_{i,N}]^T be the orthonormal eigenvectors of Z such that ZU = UΛ. As Z is symmetric and U^T U = U U^T = I_N, it follows that Z = UΛU^T. Both U and Λ are easily computed through singular value decomposition (SVD). Λ is the diagonal eigenvalue matrix containing elements λ_1 ... λ_N that are sequenced in descending order, λ_1 ≥ λ_2 ≥ ... ≥ λ_N.

The basis matrix U is optimal in the sense that the largest contribution to data variance is captured through the first eigenvector, of the residual the largest contribution to variance is then captured by the second eigenvector, and so on. Projection of the original data Y onto the new basis vectors is calculated by A = U^T Y and equivalently the data are represented through the eigenvector bases as Y = UA. Corresponding to their directional roles, the N x N matrix U and the N x K matrix A are referred to as spatial modes and temporal scores, respectively. Both A and the eigenvalue matrix Λ provide a measure of data variability along each eigenvector, AA^T = U^T Y Y^T U = U^T Z U = Λ. For eigenvector u_i the corresponding variance contribution of the data is σ_i² = (a_i a_i^T)/(K - 1), where (a_i a_i^T) = λ_i and a_i is the ith row-vector of A. Similar to λ_i, the variance contributions are also in descending order, σ_1² ≥ σ_2² ≥ ... ≥ σ_N².

Dominant correlations of data are usually captured by a small number of initial eigenvectors. A simple orthogonal decomposition is accomplished by partitioning U = [U_M U_R] and A = [A_M; A_R], where M designates the number of initial dominant modes to be used for approximation while R stands for the remaining (N - M) modes or the residual. The data matrix becomes Y = U_M A_M + U_R A_R = Y_M + Y_R. For a successful approximation, Y_M captures significant variability trends and Y_R simply represents residual random noise. Transformation in the form Y ≈ Y_M uses M(N + K) data entries and provides [1 - M(N + K)/(NK)]·100% data compression.

For the paper machine data considered earlier, the PCA analysis of the residual matrix Y_R generates eigenvalues that are plotted in Figure 10.13. For practical purposes, the eigenvalues are normalized with respect to λ_1. The last eigenvalue is zero as the rank of the scatter matrix Z is N - 1 due to mean centering of the Y_R rows with CD profile removal. Variance contributions of each mode associated with the eigenvalues are also plotted.

Figure 10.13. Normalized eigenvalues of Y_R and the cumulative contributions of modes towards total variance.

There are various methods of choosing the number of modes to be used for PCA approximation. One common method is to select the point of transition where eigenvalues start decreasing gradually after the initial faster drop. For this case it happens at approximately M = 4. The choice of M is not absolute as M = 6 would have provided similar results; however, smaller M is always preferred for parsimony unless there is a practical reason to choose a larger M. Figure 10.14 shows a few examples of the eigenvectors and the temporal scores for Y_R. The first two eigenvectors are capturing dominant slant and parabolic wave behaviors while one of the latter eigenvectors (the 45th) is practically high frequency noise. Eigenvectors are constructed to have normalized magnitude while their contributions for Y_R reconstruction at each temporal position are captured through the associated score vectors. Correspondingly, magnitudes of the first two scores are much higher than the 45th. For process control and monitoring, the goal is to have all eigenvectors and scores resemble the high-frequency nature of the 45th mode whether the PCA analysis is done on Y or Y_R.

Figure 10.14. First two and the 45th eigenvectors and scores of Y_R.

PCA approximation of Y_R with M = 4 gives Y_R = Y_RM + Y_RR, where the last matrix is the residual profile recreated through the N - M remaining modes. Combining Y_RM with the averaging results of MD/CD decomposition generates an overall filtered approximation for the full sheet profile as Y = Y_MD + Y_CD + Y_RM + Y_RR = Y_M + Y_RR. The last two profiles are shown in Figure 10.15.

Figure 10.15. Fourth-order PCA approximation of Y and the residual, Y = Y_M + Y_RR.
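The eigenanalysis and the M-mode partition described above can be sketched as follows; the SVD route avoids forming the scatter matrix explicitly, and Y_R here is a synthetic stand-in for the residual matrix.

```python
import numpy as np

rng = np.random.default_rng(3)
N, K, M = 54, 110, 4
Y_r = rng.normal(size=(N, K))
Y_r -= Y_r.mean(axis=1, keepdims=True)    # mean-centered rows
Y_r -= Y_r.mean(axis=0, keepdims=True)    # mean-centered columns

U, s, _ = np.linalg.svd(Y_r, full_matrices=False)  # Z = Y_r Y_r^T = U diag(s^2) U^T
lam = s**2                                # eigenvalues, already in descending order
A = U.T @ Y_r                             # temporal scores

Y_rm = U[:, :M] @ A[:M, :]                # dominant-mode approximation
Y_rr = Y_r - Y_rm                         # residual of the PCA approximation

print((lam / lam[0])[:6].round(3))        # normalized eigenvalues, cf. Figure 10.13
print(lam[:M].sum() / lam.sum())          # variance captured by the M modes
print(1 - M * (N + K) / (N * K))          # data compression achieved by Y ~ Y_M
```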
Y_RR retains only the high frequency random contributions in Y with S²_RR = 0.24 while the filtered profile retains the rest, S²_M = 0.76. The process control and monitoring target would be to minimize overall process variability with the majority of the variance captured in Y_RR.

An alternative orthogonal decomposition of two-dimensional profile data is denoising through wavelet transforms as discussed in Section 6.2.2. To demonstrate, hard-thresholding with the wavelet function db8, as defined in Eq. 6.22, is used on the profile Y_R to approximate the full profile as Y_W ≈ Y_MD + Y_CD + Y_RW. Figure 10.16 shows the results of denoising carried out using one and two levels of decomposition. Corresponding variances are S²_W1 = 0.78 and S²_W2 = 0.71 compared to the PCA result at S²_M = 0.76.

Figure 10.16. Filtered approximation of Y through wavelet denoising using hard-thresholding with one level of decomposition (left) and two levels of decomposition (right).
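The wavelet route can be sketched with the PyWavelets package. The db8 wavelet and hard thresholding follow the text, but the threshold value of Eq. 6.22 is not reproduced here; the universal threshold is used as a stand-in, and the matrix dimensions are illustrative.

```python
import numpy as np
import pywt  # PyWavelets

rng = np.random.default_rng(4)
N, K = 64, 128
Y_r = rng.normal(size=(N, K))                        # stand-in residual profile

coeffs = pywt.wavedec2(Y_r, 'db8', level=2)          # two-level 2D decomposition
thr = np.std(Y_r) * np.sqrt(2 * np.log(Y_r.size))    # stand-in threshold value
den = [coeffs[0]] + [
    tuple(pywt.threshold(d, thr, mode='hard') for d in detail)
    for detail in coeffs[1:]
]
Y_rw = pywt.waverec2(den, 'db8')[:N, :K]             # denoised residual Y_RW

# Y_W ~ Y_MD + Y_CD + Y_RW then assembles the filtered full profile.
print(Y_rw.var() / Y_r.var())                        # variance retained after denoising
```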
10.2.3 Flatness of Scanner Data

The performance objective for sheet forming processes is to maintain uniformity or flatness in (Y - Y_CD^target). The target matrix represents Y_CD^target for those applications where the CD set-point is not uniform, like in metal and plastic sheets that may have a 'frown' thickness target, almost parabolic, gradually increasing from the front and back ends towards the middle with a relatively uniform mid section. Given the full matrix form of sheet data Y, in deviation form if the CD target is not uniform, the MD/CD decomposition Y = Y_MD + Y_CD + Y_R clearly indicates the importance of maintaining constant average behavior, Y_MD = 0 and Y_CD = 0, in keeping Y on target. Y_R contains the scan-by-scan residual of data that measures point-wise deviations with respect to the Y_MD trend and Y_CD profile. Each column vector of Y_R has deviation data that originate from local variabilities that are either random or structured in the MD and/or CD directions. Due to the traversing nature of the scanner it is not possible to differentiate the origin of locally structured variations between MD and CD. Regardless of this limitation it is still informative to explore and quantify the local data trends in Y_R in order to expose the potential margin of improvement.

First- and second-order Gram polynomials provide simple but powerful orthogonal bases to test the 'flatness' of a profile measurement. For a profile to be flat it should at least have approximately 0 as the magnitudes for its second-order Gram polynomial approximation. Consider the modified version of the basis functions as shown in Figure 10.17, where p_1 has a positive slope for convenience (opposite of the formal definition of Gram polynomials, see Eq. 10.3) and the basis scores are [1 1] at full scale. Approximation of the sheet data y_CD indicates scores of [0.639 -0.236], which are significant. Clearly, requiring only these scores to be 0 is not sufficient for y_CD uniformity, but it is a necessary first step.

Figure 10.17. [Slope Parabola] as bases and the approximation of y_CD.

A similar transformation of all scan data residuals, i.e., columns of Y_R, is accomplished by Y_R ≈ ΦB and B = ΨY_R, where Φ = [p_1 p_2] and Ψ = (Φ^T Φ)^{-1} Φ^T. B is the 2 x K scores matrix measuring the magnitudes of the slope and parabola bases for each residual scan contained in Y_R. A phase-plane scatter plot of B provides a concise view of flatness in terms of deviations from the target [0 0]. Further characteristics of the scores are measured through a PCA decomposition by establishing equivalent dominant eigenvectors U. Through SVD of Z = BB^T, the two basis vectors U = [u_1 u_2] are identified with a measure of transformed scores A = U^T B. Figure 10.18 shows the scatter plots for Y_R and Y_RM (M = 4) with the corresponding PCA alignments and the standard deviation contours as reference. Both Y_R and Y_RM are very similar, indicating the effectiveness of the fourth-order PCA approximation. Significant scatter of the scores shows that the residual scans contain structured variabilities measurable as first- and second-order polynomials indicating non-flat behavior. Further evidence of Y_R PCA approximation accuracy is displayed in Figure 10.19, where the phase-plane plots for Y_RR = Y_R - Y_RM are shown for fourth- and sixth-order approximations. Increasing the number of PCA modes from 4 to 6 adds marginal improvement for [Slope Parabola] projections. As a process performance objective, the Figure 10.18 plots for Y_R and Y_RM should look similar to that of Y_RR in Figure 10.19. During continuous process improvement a logical follow-up to the minimization of [p_1 p_2] projections is to do the same with [p_3 p_4] projections, and so on until all dominant trends are eliminated and the Y_R columns contain only high-frequency random signals.
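The scan-by-scan flatness test then amounts to a two-column projection followed by a 2 x 2 eigenanalysis. In the sketch below the slope and parabola bases are built directly as normalized first- and second-order polynomials with the positive-slope convention of Figure 10.17, and Y_R is synthetic; because the bases are orthonormal, the transition matrix reduces to Φ^T.

```python
import numpy as np

rng = np.random.default_rng(5)
N, K = 54, 110
Y_r = rng.normal(size=(N, K))                 # stand-in residual scans

x = np.linspace(-1.0, 1.0, N)                 # scaled CD coordinate
p1 = x / np.linalg.norm(x)                    # slope basis, positive slope
p2 = x**2 - np.mean(x**2)
p2 = p2 / np.linalg.norm(p2)                  # parabola basis, orthogonal to p1
Phi = np.column_stack([p1, p2])               # N x 2 bases

B = Phi.T @ Y_r                               # 2 x K [slope; parabola] scores
U, _, _ = np.linalg.svd(B @ B.T)              # PCA alignment of the score cloud
A = U.T @ B                                   # transformed scores

print(A.std(axis=1))                          # spread along the PCA coordinates
```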

Figure 10.18. PCA coordinates [u_1 u_2] of the [Slope Parabola] projections of Y_R (left plot) and Y_RM scans (columns) with M = 4.

Figure 10.19. PCA coordinates of the [Slope Parabola] projections of Y_RR = Y_R - Y_RM with M = 4 (left plot, StD1 = 0.07432, StD2 = 0.03622) and M = 6 (StD1 = 0.06076, StD2 = 0.03428).
The quadrants of the phase-plane plots divide projection scores into different characteristic patterns. For example, quadrant 1 is uphill slope and smiling parabola, quadrant 3 is downhill slope and frowning parabola, etc. Figure 10.20 shows the larger scores of each quadrant with a distinct shade of gray while allowing a central elliptic region to be the lower limit for shape classification. Slope and parabola limits of the plot are arbitrarily assigned for demonstration purposes. Creating five distinct classifications for scores that can be used as a simple filter is a form of masking for additional data mining. Data points of scores representing K = 110 scans provide an overall accounting of different shape trends while differentiating the almost flat from the significantly shaped scans. Another view of the same information is also included in Figure 10.20 as a top view of the sheet with scan colors reflecting the designated score shade. The lower magnitude scores within the ellipsoid limit are neutral (white) in the sheet view, though they were gray circles in the score plot for visibility. The sheet view of the scores contains temporal information indicating the frequency of changes between shape patterns. For example, the first and second halves of the sheet do not have similar patterns. The switch from lighter to darker shades implies a change in process disturbance behavior. Obviously, the performance objective is to have all scores within the ellipsoid limit and a sheet view without any stripes.

Figure 10.20. Masking of Y_RM [Slope Parabola] projections to highlight significant deviations and characteristic patterns (here, % in range = 29.1 and % in quadrants = 16.4, 15.5, 20, 19.1).

10.3 Controller Performance

Sheet forming processes have univariate MD and multivariate CD controllers. Process dynamics for both are dominated by gain and time delay. Most of the appreciable dynamics arise from the design of signal filters and feedback loop interactions. Performance evaluation of MD control basically follows the same principles as any other univariate control system. On the other hand, CD control performance evaluation is more involved, reflecting the difficulties that arise in CD controller design due to the high dimensionality and strong correlations of the decision variables. In the following sections, MD and CD control performance evaluations will be discussed. The methodologies presented for CD controller design and performance evaluation are both model-based and reflect recent developments in the technology.

10.3.1 MD Control Performance

MD control performance is evaluated directly from the closed-loop time series data Y_MD. Let the deviation variable y_t represent the measurement signal at time t. One way of representing y_t in terms of previous measurements is through a moving average (MA) correlation model

y(k) = \left( 1 + \sum_{j=1}^{\infty} \psi_j q^{-j} \right) a(k)    (10.4)

where a(k) is the noise signal, q^{-1} is the backward shift operator and ψ_j is a model constant. For a process with an effective time delay of h sampling intervals, Eq. 10.4 can be restated in two parts as

y(k) = (1 + \psi_1 q^{-1} + \cdots + \psi_{h-1} q^{-h+1}) a(k) + (\psi_h + \psi_{h+1} q^{-1} + \cdots) a(k-h)    (10.5)

The second term is an h-step-ahead prediction of y(k) while the first term is the prediction error. The best a controller can do is to eliminate the deviation represented by the second term. Thus, the theoretical minimum variance is S²_min = (1 + Σ_{j=1}^{h-1} ψ_j²) S_a², which notably requires the calculation of only h - 1 constants ψ_j and the variance of the noise, traditionally done through the Yule-Walker equations using the autocorrelation coefficients of y. For process control purposes, the total variance of the measurement signal is S² = S_y² + ȳ², where the contribution of the offset from target is also accounted for. The normalized performance index (NPI) is η(h) = 1 - S²_min/S², which measures the margin of improvement opportunity for the controller.
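A minimal estimate of this index from routine closed-loop data is sketched below. An AR model is fitted by least squares (rather than through the Yule-Walker equations), inverted to the MA form of Eq. 10.4 to obtain ψ_1, ..., ψ_{h-1}, and the index is reported; the series is synthetic near-white noise, as expected for well-performing MD control, and the offset term of S² is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(6)
y = rng.normal(size=2000)                    # stand-in for Y_MD deviation data
p, h = 5, 3                                  # AR order and measurement delay

X = np.column_stack([y[p - i - 1:len(y) - i - 1] for i in range(p)])
phi, *_ = np.linalg.lstsq(X, y[p:], rcond=None)  # AR coefficients
a = y[p:] - X @ phi                              # innovation (noise) estimates

psi = np.zeros(h)
psi[0] = 1.0                                     # psi_0 = 1 by definition
for j in range(1, h):                            # invert AR polynomial to MA weights
    psi[j] = sum(phi[i] * psi[j - 1 - i] for i in range(min(j, p)))

s2_min = (psi**2).sum() * a.var()                # (1 + sum psi_j^2) * S_a^2
eta = 1.0 - s2_min / y.var()                     # margin of improvement
print(f"NPI(h={h}) = {eta:.3f}")                 # near zero for a well-controlled loop
```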
NPI for the sheet data Y_MD is calculated for a range of measurement delays h = 1, ..., 20 and plotted in Figure 10.21. In most sheet processes where data are collected with a traversing scanner the measurement delay is 2 or 3. The plot shows that there is essentially no significant margin for additional MD control improvement.

Figure 10.21. Normalized performance index NPI of Y_MD to measure MD control improvement potential.

Confirmation of MD controller performance can be seen in Figure 10.22, where the fifth-order autoregressive (AR) model of Y_MD is compared to the residual noisy signal. The latter captures more than 75% of the total variance.

Figure 10.22. Fifth-order AR approximation of Y_MD and the residual random signal (var = 0.00872 and var = 0.032, respectively).

Autocorrelation coefficients of Y_MD as a function of measurement lag are plotted in Figure 10.23 with the corresponding 95% confidence limits. Within the process time delay of lag 3, the magnitude of the coefficients is reduced below the desired limit, again confirming the generally satisfactory behavior of the controller. However, in the same plot, it is also clear that there is a cycling trend of the autocorrelation coefficients with increasing measurement lag that suggests the possibility of a tightly tuned controller. A similar plot for the AR(5) model shows the cycling nature of the Y_MD signal correlations more clearly. For a well-tuned and efficiently performing controller the autocorrelation bars should be randomly varying within the confidence limits starting shortly after the process measurement time lag.

Figure 10.23. Autocorrelation coefficients of Y_MD and its AR(5) model approximation.

10.3.2 Model-Based CD Control Performance

Realistic CD control performance calculation is not possible from direct computations using measured process output data only. Although traditionally it has been suggested that the minimum variance of the CD profile should be linked to the Nyquist frequency of the measurement data length, such an approach results in a rather aggressive and unrealistic estimation of what can actually be achieved through a well designed CD controller. The reason is the high dimensionality and correlated interactions of the control elements, or the slice actuators, that render 'perfect' CD control impossible. Instead, for a given process, an 'optimal' CD control performance can be estimated through simulation, which can be compared to actual process output data to calculate the improvement potential in terms of a normalized performance index. The model-based simulation approach for control performance evaluation is a general method that can be used for other applications which may require more accuracy beyond methods that use direct process output data only.

NPI for the model-based approach is defined as η = 1 - S²_Y(opt)/S²_Y, where S²_Y(opt) is calculated through simulation using process related data as shown in Figure 10.24. There are two parts to the calculations, disturbance estimation and achievable performance estimation. The process disturbance is estimated from the difference between the actual process output data Y and the model prediction from the control decisions U. In turn, for the optimal performance estimation, D(est) is used as the input of a closed-loop simulated control system.
The process model (Model A) in the simulated control system is the same model used in disturbance estimation, while the optimal controller reference model (Model B) can be different and simpler. It is important to emphasize that the disturbance estimation calculations use only the output of the actual implemented control system and not the algorithmic details. Thus, it is possible to carry out these calculations without requiring proprietary information from control vendors. For the CD control application, the design of the simulated optimal controller does not imply aggressive behavior to achieve minimum variance. The control tuning should be realistic, resulting in mild actions to avoid picketing while satisfying all necessary constraints imposed by slice physical characteristics.

Figure 10.24. Block diagram of the general approach to calculate best achievable (optimal) controller performance through simulation: disturbance estimation followed by achievable performance estimation.
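A skeleton of the two-part calculation in Figure 10.24 is sketched below for the open-loop case (U = 0, so the disturbance estimate equals the measured data). The unit-gain process model and the single-parameter integral-type controller are hypothetical placeholders rather than the book's Models A and B; an actual application would substitute the identified CD response model and a realistically tuned, constrained controller.

```python
import numpy as np

rng = np.random.default_rng(7)
N, K = 54, 110
# Disturbance estimate D_est = Y - (model prediction from U); here U = 0.
D_est = rng.normal(scale=0.3, size=(N, K)) + np.linspace(-1.0, 1.0, N)[:, None]

Y_opt = np.empty_like(D_est)
u = np.zeros(N)                          # simulated actuator (slice) profile
for k in range(K):                       # scan-by-scan closed-loop simulation
    Y_opt[:, k] = D_est[:, k] + u        # unit-gain process plus disturbance
    u -= 0.5 * Y_opt[:, k]               # mild correction to avoid picketing

eta = 1.0 - Y_opt.var() / D_est.var()    # eta = 1 - S2_Y(opt) / S2_Y
print(f"improvement potential eta = {eta:.2f}")
```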
The paper machine example first introduced in Section 10.1.1 reflects open-loop CD control data. Application of the model-based CD control performance calculations is simplified for this case as U = 0, while the results establish an upper limit of improvement expectations from a potential CD control implementation. Following the procedure summarized in the block diagram of Figure 10.24, simulation calculations provide S²_Y(opt) = 0.0644 or η = 1 - 0.0644/0.625 ≈ 0.9. This is of course a theoretical target and the practical reality may be in the range 0.5 to 0.7, which would still be a significant improvement.

A unique advantage of the model-based CD control performance calculations is the two-dimensional information detail for improvement potential. Figure 10.25 shows the process variability reduction for each measurement location as a difference in deviation magnitudes (|Y_OptCont| - |Y_Data|), which is a negative number at any position where the original process deviation is reduced through simulated CD control implementation. The histogram of the collective data has a negative mean (-0.286), confirming overall improvement potential. Performance analysis is based on the computed results for the last 96 scans as the first 14 were part of the controller filter initiation, which is a natural artifact of working with a batch data file. However, the procedure is equally valid for on-line applications where η and the corresponding visualization diagrams can be tracked in real time in terms of moving windows of preselected scan lengths.

Figure 10.25. Improvement in profile data as a result of simulated CD control implementation. Differences in absolute values of local data show improvements as reductions in deviation magnitudes (mean = -0.286, std = 0.735).

Another informative visualization of the results is through a 2D plot showing CD controller effects at each location as one of three categories: (a) significant reduction in deviation magnitude from target (dark), (b) mild reduction or amplification (gray), or (c) significant amplification (light). For demonstration purposes the standard deviation limits [±(S = 0.735)] of the (|Y_OptCont| - |Y_Data|) data are used as color masking boundaries and the results are shown in Figure 10.26. Corresponding color masking of the fourth-order PCA approximation clearly shows that the significant improvement areas are correlated and match the strong deviation locations of the original profile shown in Figures 10.3 and 10.15. This is a confirmation that the CD controller improvement potential calculations are based on appropriately targeted variability reduction. A simple implementation of the masked PCA approximation as an on-line monitoring metric is to expect the complete 2D plot to be in gray color within quality limits that are reasonable for the specific product. Calculated dark colors, showing significant improvement potential locations, that cover more than 2-5% of the plot would flag a reduction in CD control performance.

Figure 10.26. CD control implementation improvements showing significant differences of (|Y_OptCont| - |Y_Data|) on a 2D plot by masking the results into three categories: [< -0.735] = dark, [-0.735 to 0.735] = gray and [> 0.735] = light. Similar masking of the fourth-order PCA approximation (right) emphasizes correlated improvement locations.

10.4 Summary

Process data for web and sheet forming processes are in two-dimensional form describing spatio-temporal properties that are practically described as cross direction (CD) and machine direction (MD) for space and time, respectively. Mean data values at discrete CD increments describe the average property profile, which needs to be kept on target for optimum product value. Both process and controller performance analyses focus on the degree of data variability in MD and CD averages as well as the residual of data after the removal of MD/CD trends. Extraction and quantitative analysis of structured correlations in the two-dimensional residual profile can be done using orthogonal decomposition methods like Gram polynomials, principal components analysis (PCA), and wavelet denoising. These methods can identify any significant process variability information hidden within the otherwise seemingly random nature of residual data. Rigorous CD control of sheet forming processes has unique complications arising from the strong correlations of its large scale and constrained decision variables. Accordingly, CD control performance evaluation requires a special model-based approach to compute improvement potential based on a realistically achievable target for that particular process.
Bibliography

[1] LC Alwan and HV Roberts. Time series modeling for statistical pro-
cess control. J. Business and Economic Statistics, 6:87-95, 1988.

[2] BDO Anderson and JB Moore. Optimal Filtering. Prentice-Hall,


Englewood Cliffs, NJ, 1979.

[3] TW Anderson. Introduction to Multivariate Statistical Analysis. John


Wiley & Sons, New York, NY, 2nd edition, 1984.

[4] D Antis, JL Slutsky, and CM Creveling. Design for Six Sigma in Tech-
nology and Product Development. Prentice-Hall PTR, Upper Saddle
River, NJ, 2003.

[5] M Aoki. State Space Modeling of Time Series. Springer-Verlag, New


York, NY, 2nd edition, 1990.

[6] M Bagshaw and RA Johnson. The effect of serial correlation on the


performance of CUSUM tests. Technometrics, 17:73-80, 1975.

[7] A Bakhtazad, A Palazoglu, and JA Romagnoli. Process data denois-


ing using wavelet transform. Intelligent Data Analysis, 4:267-285,
1999.

[8] BR Bakshi. Multiscale PCA with application to multivariate statis-


tical monitoring. AIChE J., 44(7):1596-1610, 1998.

[9] BR Bakshi and G Stephanopoulos. Representation of process trends,


part iii. Multiscale extraction of trends from process data. Comput.
e Chem. Engg., 18:267-302, 1994.
[10] BR Bakshi and G Stephanopoulos. Compression of chemical process
data by functional approximation and feature extraction. AIChE J.,
42:477-492, 1996.

277
278 BIBLIOGRAPHY BIBLIOGRAPHY 279

[11] A Banerjee, Y Arkun, B Ogunnaike, and R Pearson. Estimation of [24] L Breiman and JH Friedman. Estimating optimal transformations for
non-linear systems using linear multiple models. AIChE J., 43:1204- multiple regression and correlation. J. Amer. Statist. Assoc., 80:580-
1226, 1997. 598, 1985.
[12] DB Bates and DG Watts. Nonlinear Regression Analysis and Its
[25] R Bro and AK Smilde. Centering and scaling in component analysis.
Applications. John Wiley & Sons, New York, NY, 1988. J. Chemometrics, 17:16, 2003.
[13] LE Baum. An inequality and associated maximization technique in
statistical estimation for probabilistic functions of a Markov process. [26] AG Bruce, DL Donoho, HY Gao, and RD Martin. Denoising and
Inequalities, 3:1-8, 1972. robust nonlinear wavelet analysis. In SPIE PTOceedings - Wavelet
Applications, page 2242, Orlando, FL, 1994.
[14] S Beaver and A Palazoglu. A cluster aggregation scheme for ozone
episode selection in the San Francisco, CA Bay Area. Atmospheric [27] M Carmichael, R Vidu, A Maksumov, A Palazoglu, and P Stroeve.
EnviTOnment, 40:713-725,2006. Using wavelets to analyze AFM images of thin films: Surface micelles
and supported lipid bilayers. Langmuir, 20:11557-11568, 2004.
[15] S Beaver and A Palazoglu. Cluster analysis for autocorrelated and
cyclic chemical process data. Ind. &J Engg. Chem. Research, 2006. [28] B Chen and A Westerberg. PTOceedings of PTOcess Systems Engineer-
Submitted. ing (PSE). Elsevier, New York, NY, 2003.
[16] S Beaver and A Palazoglu. Cluster analysis of hourly wind mea-
[29] R Chen and RS Tsay. Nonlinear additive ARX models. J. Amer.
surements to reveal synoptic regimes affecting air quality. J. Applied Statist. Assoc., 88(423):955-967, 1993.
MeteoTOlogy and Climatology, 2006. In press.

[17] S Becker. Unsupervised learning procedures for neural networks. Int. [30] S Chen and SA Billings. Modeling and analysis of nonlinear time
J. Neural Systems, 2:17-33, 1991. series. Int. J. ContTOl, 49:2151-2171, 1989.

[18] KR Beebe and BR Kowalski. An introduction to multivariate cali- [31] S Chen and SA Billings. Representations of nonlinear systems: The
bration and analysis. Anal. Chem., 59:1007A-I015A, 1987. NARMAX model. Int. 1. ContTOl, 49:1013-1032, 1989.

[19] DP Bertsekas. Dynamic PTOgramming and Optimal ContTOl. Athena [32] Y Cheng, W Karjala, and DM Himmelblau. Resolving problems in
Scientific, Belmont, MA, 2nd edition, 2000. closed loop nonlinear process identification using IRN. Comput. &J
Chem. Engg., 20(10):1159-1176,1996.
[20] S Bezergianni and C Georgakis. Controller performance assessment
based on minimum and open-loop output variance. Chem. Engg. [33] CJ Chessari. Studies in Modeling and Advanced ContTOl. PhD thesis,
Practice, 8:791-797,2000. The University of Sydney, Australia, 1995.
[21] AW Bowman. Altrenative method of cross-validation for the smooth-
ing of density estimate. Biometrika, 76:353-360, 1984. [34] JTY Cheung and G. Stephanopoulos. Representation of process
trends, Part 1. A formal representation framework. Camp. &J Chem.
[22] GEP Box. Some theorems on quadratic forms applied in the study of Engg., 14:495-510, 1990.
analysis of variance problems: Effect of inequality of variance in one-
way classification. The Annals of Mathematical Statistics, 25:290-302, [35] JTY Cheung and G Stephanopoulos. Representation of process trend-
1954. s, Part II. The problem of scale and qualitative scaling. Compo &J
Chem. Engg., 14:511-539, 1990.
[23] GEP Box, GM Jenkins, and GC Reinsel. Time Series Analysis -
Forecasting and ContTOl. Prentice-Hall, Inc., Englewood Cliffs, NJ, [36] LH Chiang and RD Braatz. Fault Detection and Diagnosis in Indus-
3rd edition, 1994. trial Systems. Springer-Verlag, London, UK, 2001.
280 BIBLIOGRAPHY
BIBLIOGRAPHY
281
[37] LH Chiang, ME Kotanchek, and AK Kordon. Fault diagnosis based
on Fisher's discriminant analysis and support vector machines. Com- [51] N Delfosse and P .Loubaton. Adaptive blind separation of independent
put. &J Chem. Engg., 28(8):1389-1401,2004. sources: A deflatIOn approach. Signal Processing, 45:59-83, 1995.

[38] LH Chiang, EL Russell, and RD Braatz. Fault Detection and Diag- [52] WE Deming. Out of the Crisis. MIT Press, Cambridge, MA, 1982.
nosis in Industrial Systems. Springer-Verlag, London, UK, 2001.
[53] L ~es~orough and T J Harris. Performance assessment measures for
[39] K Choe and H Baruh. Sensor failure detection in flexible structures umvanate feedback control. Canadian J. of Chem. Engg. 70'1186-
using modal observers. 1. Dynamic Systems Measurement and Con- 1197, 1992. ' .
trol, 115:411-418,1993. [54] WR D. e Vrie~ and SM Wu. Evaluation of process control effectiveness
[40] Y-H Chu, SJ Qin, and C Han. Fault detection and operation mode an~ dIagnosl.s of variation in paper basis weight via multivariate time
identification based on pattern classification with variable selection. senes analySIS. IEEE Trans. on Automatic Control, 23:702-708,1978.
Ind. &J Engg. Chem. Research, 43:1701-1710, 2004.
[55] D Dong and TJ McAvoy. Nonlinear principal components analysis
[41] A Cinar, S Parulekar, C Undey, and G Birol. Batch Fermentation: based on principal curves and neural networks. Comput. &J Chem.
Modeling, Monitoring, and Control. Marcel Dekker, New York, NY, Engg., 20(1):65-78, 1996.
2003.
[56] DL. Donoho and 1M Johnstone. Ideal spatial adaptation via wavelet
[42] N Cristianini and J Shawe-Taylor. Support Vector Machines. Cam- shrmkage. Bwmetrika, 81:425-455, 1994.
bridge University Press, Cambridge, UK, 2000. [57] DL D
. on? h 0 and TPY Yu. Nonlinear pyramid transforms based on
[43] MS Crouse, RD Nowak, and RG Baraniuk. Wavelet-based statistical medIan mterpolation. SIAM J. Math. Analysis, 31:1030-1061,2000.
signal pocessing using hidden Markov models. IEEE Trans. on Signal
[58] JJ Downs and EF Vogel. A plant-wide industrial control problem. In
Processing, 46:886-902, 1998.
AIChE Annual Meeting, Chicago, IL, 1990.
[44] I Daubechies. Ten Lectures on Wavelets. Society for Industrial and
[59] F Doymaz, A Bakhtazad, JA Romagnoli, and A Palazoglu. Wavelet-
Applied Mathematics, Pennsylvania, 1992.
based robust filtering of process data. Comput. &J Chem. En .
[45] I Daubechies. Where do wavelets come from? A personal point of 25:1549-1559,2001. gg,
view. Proc. of IEEE, 84(4):510-513, 1996.
[60] F Doymaz, J Chen, JA Romagnoli, and A Palazoglu. A robust strat-
[46] I Daubechies. Orthonormal bases of compactly supported wavelets. egy for real-time process monitoring. J. Process Control 11'343-359
2001. ,. ,
Comm. on Pure and Applied Math. X, 51:909-996, 1998.
[47] ER Davies. The relative effects of median and mean filters on noisy [61] F D~ymaz, A Palazoglu, and JA Romagnoli. Orthogonal nonlinear
signals. J. Modern Optics, 39:103-113, 1992. partIal least-squares. Ind. &J Eng. Chem. Research 42'5836-5849
2003. ,. ,
[48] BS Dayal and JF MacGregor. Improved PLS algorithms. J. Chemo-
metrics, 11:73-85, 1997. [62] F D~ymaz., JA Romagnoli, and A Palazoglu. A strategy for detection
and IsolatlOn of sensor faults and process upsets. Chemometrics &J
[49] M deBoor. A Practical Guide to Splines. Springer-Verlag, New York, Intell. Lab. Sys., 55:109-123,2001.
NY, 1978.
[63J R? Duda, PE Hart, and DG Stork. Pattern Classification. John
[50] J DeCicco and A Cinar. Empirical modeling of systems with output WIley & Sons, New York, NY, 2nd edition, 2001.
multiplicities by multivariate additive NARX models. Ind. &J Engg.
Chem. Research, 39(6):1747-1755,2000. [64] E~ Dudewicz and SN Mishra. Modern Mathematical Statistics. John
WIley & Sons, New York, NY, 1988.
BIBLIOGRAPHY BIBLIOGRAPHY 283
282
[79] K Fukunaga. Statistical Pattern Recognition. Academic Press, San
[65] R Durbin, S Eddy, A Krogh, and G Mitchinson. Biological Sequence Diego, CA, 1990.
Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cam-
bridge University Press, Cambridge, UK, 1998. [80] C Gackenheimer, L Cayon, and R Relfenberger. Analysis of scanning
probe microscope images using wavelets. Ultramicroscopy, 106:389-
[66] JR English, M Krishnamurthi, and T Sastri. Quality monitoring 397,2006.
of continuous flow processes. Comput. &J Chem. Engg., 20:251-260,
1991. [81] 0 Galan, A Palazoglu, and JA Romagnoli. Robust H oo control of
nonlinear plants based on multi-linear models - An application to a
[67] L Eriksson, E Johansson, N Kettaneh-Wold, and S Wold. Multi- and bench scale pH neutralization reactor. Chem. Engg. Sci., 55:4435-
Megavariate Data Analysis. Umetrics Academy, Umea, Sweden, 2001.
4450,2000.
[68] M Evans, N Hastings, and B Peacock. Statistical Distributions. John [82] 0 Galan, JA Romagnoli, and A Palazoglu. Real-time implementation
Wiley & Sons, New York, NY, 1993. of multi-linear model-based control strategies. An application to a
[69] BS Everitt. Cluster Analysis. Heinemann Education, London, UK, bench-scale pH neutralization reactor. J. Process Control. 14:571-
579,2004. '
3rd edition, 1993.
[83] 0 Galan, JA Romagnoli, A Palazoglu, and Y Arkun. The gap met-
[70] FW Faltin and WH Woodall. Some statistical process control method-
s for autocorrelated data - discussion. J. Quality Technology, 23:194- ric concept and implications for multi-linear model-based controller
design. Ind. &J Eng. Chem. Research, 42:2189-2197, 2003.
197, 1991.
[71] G Fan and XG Xia. Improved hidden Markov models in the wavelet- [84] F Gao. Proceedings of Advanced Control of Chemical Processes (AD-
domain. IEEE Trans. on Signal Processing, 49:115-120,2001. CHEM). Elsevier, New York, NY, 2004.
[85] P Geladi. Wold, Herman, the father of PLS. Chemometrics &J Intel/.
[72] W Favoreel, B De Moor, and P Van Overschee. Subspace identifi- Lab. Sys., 15:R7-R8, 1992.
cation of bilinear systems subject to white inputs. Technical Report
ESAT-SISTA/TR 1996-531, Dept. Elektrotechniek Katholieke Uni- [86] P Geladi and BR Kowalski. An example of 2-block predictive partial
versiteit Leuven, 1999. least-squares regression with simulated data. Analytica Chimica Acta,
[73] AP Featherstone and RD Braatz. Model-based cross-directional con- 185:19-32, 1986.
trol. Tappi J., 83(3):203-207, 1999. [87] P Geladi and BR Kowalski. Partial least-squares regression: A tuto-
[74] RA Fisher. The statistical utilization of multiple measurements. An- rial. Analytica Chimica Acta, 185:1-17, 1986.
nals of Eugenics, 8:376-386, 1938. [88] Gensym. Gf!W Reference Manual. Gensym Corporation, Cambridge,
[75] I Frank. A nonlinear PLS model. Chemometrics &J Intel/. Lab. Sys., MA,1997.
8:109-119,1990. [89] www.gensym.com. [Accessed 11 July 2006].
[76] JH Friedman. Multivariate adaptive regression splines. Ann. Statist., [90] S Ghael, AM Sayeed, and RG Baraniuk. Improved wavelet denoisinO'
19:1-144,1991. via empirical Wiener filtering. In AF Laine, MA Unser, and A Al~
droubi, editors, SPIE Technical Conference on Wavelet Applications
[77] JH Friedman and W Stuetzel. Projection pursuit regression. J. Amer.
in Signal Processing VI, volume 3458, San Diego, CA, 1997.
Statist. Assoc., 76:817-823, 1981.
[91] M Girolami. Self-Organizing Neural Networks: Independent Compo-
[78] T Fujiwara, M Koyama, and H Nishitani. Extraction of operating sig- nent Analysis and Blind Source Separation. Springer-Verlag, London,
natures by episodic representation. In Advanced Control of Chemical
UK, 1991.
Processes: IFAC Symposium, pages 333---338, Kyoto, Japan, 1994.
284 BIBLIOGRAPHY BIBLIOGRAPHY 285

[92] GH Golub and CF van Loan. Matrix Computations. Johns Hopkins [106] T Hastie and W Stuetzle. Principal curves. J Amer. Statist. Assoc.,
University Press, Baltimore, MD, 2nd edition, 1989. 84(406):502-516,1989.

[93] C Goutis. A fast method to compute orthogonal loadings partial least [107] S Haykin. Neural Networks. Prentice-Hall, Upper Saddle River, NJ,
squares. J. Chemometrics, 1l:33-38, 1997. 2nd edition, 1999.

[94] SP Gurden, JA Westerhuis, R Bro, and AK Smilde. A comparison [108] RR H~cking and RN Leslie. Selection of the best subset in regression
of multiway regression and scaling methods. Chemometrics fj Intel/. analysIs. Technometrics, 9(4):531-540,1967.
Lab. Sys., 59(1-2):121-136,2001.
[109] AE Hoerl and RW Kennard. Ridge regression: Biased estimation for
[95] A Haar. Zur theorie der orthogonalen funktionen-systeme. Math. nonorthogonal problems. Technometrics, 12(1):55-67, 1970.
Annals, 69:331-371, 1910.
[1l0] A Horch and AJ Isaksson. A modified index for control performance
[96] H Haario and V-M Taavitsainen. Nonlinear data analysis. II. Exam-
assessment. In Proc. of ACC98. IEEE, 1998.
ples on new link functions and optimization aspects. Chemometrics
fj Intell. Lab. Sys., 23:51-64, 1994.
[Ill] A Hoskuldsson. PLS regression methods. J. Chemometrics, 2:2ll-
228, 1988.
[97] R Haber and H Unbehauen. Structure identification of nonlinear dy-
namic systems - A survey on input/output approaches. Automatica,
[1l2] H Hotelling. The generaliztion of Student's ratio. Ann. Math. Statist.
26:651-677,1990. 2:360-378, 1931. '
[98] V Haggan and T Ozaki. Modeling nonlinear random vibrations using
an amplitude dependent autoregressive time series model. Biometri- [1l3] H Hotelling. Analysis of a complex of statistical variables into prin-
ka, 68:186-196, 1981. cipal components. J. Educ. Psychol., 24:417, 1933.

[99] AC Hahn. Forecasting, Structural Time Series Models and the [1l4] B Huang and SL Shah. Practical issues in multivariable feedback
Kalman Filter. Cambridge University Press, New York, NY, 1989. control performance assessment. J. Process Control 8(5-6):421-430
1998. "
[100] GJ Hahn and WQ Meeker. Statistical Intervals. A Guide to Practi-
tioners. John Wiley & Sons, New York, NY, 1991. [1l5] B Huang and SL Shah. Performance Assessment of Control Loops.
Springer- Verlag, London, UK, 1999.
[101] FR Hampel, EM Ronchetti, PJ Rousseeuw, and WA Stahel. Robust
Statistics: The Approach Based on Influence Functions. John Wiley [1l6] B Huang, SL Shah, and EK Kwok. Good, bad, or optimal? Perfor-
& Sons, New York, NY, 1986. mance assessment of multivariable processes. Automatica 33:ll75-
ll83, 1997. '
[102] TJ Harris. Assessment of control loop performance. Can. J. Chem.
Engg., 67:856-861, 1989. [1l7] XD Huang, Y A~iki, and MA Jack. Hidden Markov Models for Speech
Recogmtwn. Edmburgh University Press, Edinburgh, UK, 1990.
[103] TJ Harris, F Boudreau, and JF MacGregor. Performance assessment
of multivariate feed back controllers. A utomatica, 32: 1505-1518, 1996. [1l8] R Hudlet and R Johnson. Linear discrimination and some further
[104] T J Harris and WH Ross. Statistical process control procedures for results on best lower dimensional representations. In V Rzyin, editor,
correlated observations. Canadian J. Chern. Engg., 69:48-57, 1991. Classificatwn and Clustering, pages 371-394. Academic Press, Inc.,
New York, NY, 1977.
[105] T J Harris, CT Seppala, and LD Desborough. A review of performance
monitoring and assessment techniques for univariate and multivariate [1l9] A Hyvarinen, J Karhunen, and E Oja. Independent Component Anal-
control systems. J. Process Control, 9:1-17, 1999. ysis. John Wiley & Sons, New York, NY, 2001.
286 BIBLIOGRAPHY BIBLIOGRAPHY 287

[120] JE Jackson. Principal components and factor analysis : Part I -


[133] BC Juricek, DE Seborg, and WE Larimore. Fault detection using
principal components. J. Quality Technology, 12(4):201-213, 1980.
canonical variate analysis. Ind. fj Engg. Chem. Research, 43:458-
[121] JE Jackson. A Users Guide to Principal Components. John Wiley & 474, 2004.
Sons, New York, NY, 1991.
[134] C Jutten and J Herault. Blind separation of sources: 1. An adaptive
[122] JE Jackson and GS Mudholkar. Control procedures for residuals as- algorithm based on neuromimetric architecture. Signal Process., 24:1,
sociated with principal components analysis. Technometrics, 21:341- 1991.
349,1979.
[135] RE Kalman. A new approach to linear filtering and prediction prob-
[123] M Jelali. An overview of controller performance assessment technol- lems. Trans. ASME - J. Basic Engineering, 82:34-45, 1960.
ogy and industrial applications. Control Engg. Practice, 14:441-466,
2006. [136] LC Kammer, RR Bitmead, and PL Bartlett. Optimal controller prop-
erties from closed-loop experiments. Automatica, 34:83-91, 1998.
[124] XJ Jiao, MS Davies, and GA Dumont. Wavelet packet analysis of
paper machine data for control assessment and trim loss optimization. [137] M Kano, S Hasebe, I Hashimoto, and H Ohno. Evolution of multi-
Pulp fj Paper Canada, 105(9):T208-211, 2004. variate statistical process control: Independent component analysis
and external analysis. Comput. fj Chem. Engg., 28(6-7):1157-1166,
[125] P Jofriet, C Seppala, M Harvey, B Surgenor, and TJ Harris. An 2004.
expert system for control loop performance. Pulp fj Paper Canada,
97:207-211,1996. [138] M Kano, S Tanaka, S Hasebe, I Hashimoto, and H Ohno. Monitoring
independent components for fault detection. AICHE J., 49:969-976,
[126] RA Johnson and DW Wichern. Applied Multivariate Statistical Anal- 2003.
ysis. Prentice-Hall, Englewood Cliffs, NJ, 4th edition, 1998.
[139] SJ Kendra, MR Basila, and A Cinar. Intelligent process control with
[127] LPM Johnston and MA Kramer. Probability density estimation using supervisory knowledge-based systems. IEEE Control Systems, 14:37-
elliptical basis functions. AIChE J., 40:1639-1649, 1994. 47, 1994.

[128] IT Jolliffe. Principal Component Analysis. Springer-Verlag, New [140] SJ Kendra and A Cinar. Controller performance assessment by fre-
York, NY, 1986. quency domain techniques. J. Process Control, 7(3):181-194, 1997.

[129] IT Jolliffe. Principal Component Analysis. Springer-Verlag, New [141] P Kesavan and JH Lee. Diagnostic tools for multivariable model-
York, NY, 2nd edition, 2002. based control systems. Ind. fj Engg. Chem. Research, 36:2725-2738,
1997.
[130] EM Jordaan and GF Smits. Estimation of the regularization parame-
ter for support vector regression. In Proc. World Conf. Computational [142] KB Konstantinov and T Yoshida. Real-time qualitative analysis of
Intelligence, pages 2785-2791, Honolulu, Hawaii, 2002. the temporal shapes of (bio)process variables. AIChE J., 38(11):1703-
1715, 1992.
[131] M Jordan. Attractor dynamics and parallelism in a connectionist
sequential machine. In Proc. Eighth Annual Conf. of the Cognitive [143] F Kosebalaban and A Cinar. Integration of multivariate SPM and
Science Society, Amherst, MA, 1986. FDD by parity space technique for a food pasteurization process.
Comput. fj Chem. Engg., 25:473-391, 2001.
[132] BC Juricek, DE Seborg, and WE Larimore. Predictive monitoring
for abnormal situation management. J. Process Control, 11:111-128,
[144] T Koski. Hidden Markov Models for Bioinformatics. Prentice-Hall,
2001. Boston, MA, 1999.
288 BIBLIOGRAPHY BIBLIOGRAPHY 289

[145] T Kourti and JF MacGregor. Process analysis, monitoring and diag- [158] WE Larimore. System identification, reduced-order filtering and mod-
nosis using multivariate projection methods. Chemometrics fj Intell. eling via canonical variate analysis. In Proc. of Automatic Control
Lab. Sys., 28:3-21, 1995. Conf., page 445, 1983.
[146] T Kourti and JF MacGregor. Multivariate SPC methods for process [159] WE Larimore. Canonical variate analysis in identification, filtering,
and product monitoring. J. Quality Technology, 28(4):409-428,1996. and adaptive control. In Proc. of IEEE Conf. on Decision and Con-
trol, page 596, 1990.
[147] BR Kowalski. Chemical Process Control - V Conference Proceed-
ings, chapter Process Analytical Chemical Engineering, pages 97-101. [160] WE Larimore. Identification and filtering of nonlinear systems using
AIChE Symposium Series 316. CACHE-AIChE, 1997. canonical variate analysis. In Nonlinear Modeling and Forecasting:
Proc of the Workshop on Nonlinear Modeling and Forecasting, Santa
[148] DJ Kozub. Controller performance monitoring and diagnosis: Ex- Fe, NM, Vol 12. Addison-Wesley, 1990.
periences and challenges. In CPC V Proceedings, pages 83-96, Lake
Tahoe, NV, 1997. [161] M LeBlanc and R Tibshirani. Adaptive principal surfaces. J. Amer.
Statist. Assoc., 89(425):53-64, 1994.
[149] DJ Kozub and CE Garcia. Monitoring and diagnosis of automat-
ed controllers in the chemical process industry. In AICHE Annual [162] J-M Lee, SJ Qin, and I-B Lee. Fault detection and diagnosis of
Meeting, St. Louis, MO, 1993. multivariate processes based on modified independent components
analysis. AIChE J., 2006. Submitted.
[150] MA Kramer. Nonlinear principal component analysis using autoas-
sociative neural networks. AIChE J., 37:233-243, 1991. [163] J-M Lee, CK Yoo, and I-B Lee. New monitoring technique with
ica algorithm in wastewater treatment process. Water Science and
[151] MA Kramer. Autoassociative neural networks. Comput. fj Chem. Technology, 47:49-56,2003.
Engg., 16(4):313-328,1992.
[164] J-M Lee, CK Yoo, and I-B Lee. Statistical process monitoring with
[152] MA Kramer and JA Leonard. Diagnosis using backpropagation neural independent components analysis. J. Process Control, 14:467-485,
networks Analysis and criticism. Comput. fj Chern. Engg., 14:1323- 2004.
1338, 1990. [165] J Leonard and MA Kramer. Improvement of the backpropagation
[153] JV Kresta, JF MacGregor, and TE Marlin. Multivariate statistical algorithm for training neural networks. Comput. fj Chem. Engg.,
monitoring of process operating performance. Canadian J. Chem. 14(3):337-341, 1990.
Engg., 69:35-47, 1991. [166] J Leonard, MA Kramer, and LH Ungar. A neural network archi-
tecture that computes its own reliability. Comput. fj Chem. Engg.,
[154] WJ Krzanowski. Between-groups comparison of principal compo-
16(9):819-835,1992.
nents. J. Amer. Statist. Assoc., 74:703-707, 1979.
[167] IJ Leontaritis and SA Billings. Input-output parametric models for
[155] WJ Krzanowski. Cross-validation choice in principal component anal-
nonlinear systems. Int. J. Control., 41:303-344, 1985.
ysis. Biometrics, 43:575-584, 1987.
[168] D Lieftucht, U Kruger, L Xie, T Littler, Q Chen, and S-Q Wang.
[156] A Kulkarni, VK Jayaraman, and BD Kulkarni. Support vector classi- Statistical monitoring of dynamic multivariate processes Part 2.
fication with parameter tuning assisted by agent-based systems. Com- Identifying fault magnitude and signature. Ind. fj Engg. Chem. Re-
put. fj Chem. Engg., 28:311-318,2004. search, 45:1677-1688, 2006.
[157] S Lakshminarayanan, SL Shah, and K Nandakumar. Identification [169] F Lindgren, P Geladi, S Riinnar, and S Wold. Interactive variable
of Hammerstein models using multivariate statistical tools. Chem. selection (IVS) for PLS. Part 1. Theory and algorithms. J. Chemo-
Engg. Science, 50(22):3599-3613, 1995. metrics, 8:349-363, 1994.
290 BIBLIOGRAPHY BIBLIOGRAPHY 291

[170] L Ljung. System Identification: Theory for the User. Prentice-Hall, Englewood Cliffs, NJ, 2nd edition, 1999.

[171] L Ljung and T Glad. Modeling of Dynamic Systems. Prentice-Hall, Englewood Cliffs, NJ, 1994.

[172] A Lorber, L Wangen, and B Kowalski. A theoretical foundation for the PLS algorithm. J. Chemometrics, 1:19-31, 1987.

[173] C-W Lu and MR Reynolds, Jr. Control charts based on residuals for monitoring autocorrelated processes. Technical Report 94-8, Department of Statistics, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, 1994.

[174] H Lütkepohl. Introduction to Multiple Time Series Analysis. Springer-Verlag, Berlin, Germany, 1991.

[175] CB Lynch and GA Dumont. Control loop performance monitoring. IEEE Trans. on Control Systems Technology, 4:185-192, 1996.

[176] JF MacGregor. Some statistical process control methods for autocorrelated data - discussion. J. Quality Technology, 23:198-199, 1991.

[177] JF MacGregor, C Jaeckle, C Kiparissides, and M Koutoudi. Process monitoring and diagnosis by multiblock PLS methods. AIChE J., 40(5):826-838, 1994.

[178] JB MacQueen. Some methods for classification and analysis of multivariate observations. In Proc. 5th Berkeley Symp. on Mathematical Statistics and Probability, volume 1, pages 281-297, Berkeley, CA, 1967. University of California Press.

[179] PC Mahalanobis. On tests and measures of group divergence. J. Proc. Asiatic Soc. Bengal, 26:541-588, 1930.

[180] A Maksumov, R Vidu, A Palazoglu, and P Stroeve. Enhanced feature analysis using wavelets for scanning probe microscopy images of surfaces. J. Colloid and Interface Science, 272:365-377, 2004.

[181] ER Malinowski. Statistical F-tests for abstract factor analysis and target testing. J. Chemometrics, 3:49-60, 1988.

[182] SG Mallat. A theory for multiresolution signal decomposition: The wavelet representation. IEEE Trans. on Pattern Analysis and Machine Intelligence, 11:674-693, 1989.

[183] EC Malthouse. Limitations of nonlinear PCA as performed with generic neural networks. IEEE Trans. on Neural Networks, 9(1):165-173, 1998.

[184] EC Malthouse, AC Tamhane, and RSH Mah. Nonlinear partial least squares. Comput. & Chem. Engg., 21(8):875-890, 1997.

[185] B Maner, FJ Doyle III, B Ogunnaike, and R Pearson. Nonlinear model predictive control of a multivariable polymerization reactor using second-order Volterra series. Automatica, 32:1285-1302, 1996.

[186] HD Maragah and WH Woodall. The effect of autocorrelation on the retrospective X-chart. J. Statist. Comput. Simul., 40:29-42, 1992.

[187] PZ Marmarelis and VZ Marmarelis. Analysis of Physiological Systems. Plenum Press, New York, NY, 1978.

[188] H Martens and T Næs. Multivariate Calibration. John Wiley & Sons, New York, NY, 1989.

[189] EB Martin and AJ Morris. Monitoring performance in flexible process monitoring. In Preprints IFAC ADCHEM 7, pages 47-54, Hong Kong, 2004.

[190] RL Mason and JC Young. Multivariate Statistical Process Control with Industrial Applications. ASA-SIAM, Philadelphia, 2002.

[191] MathWorks. Matlab® System Identification Toolbox. The MathWorks, Inc., Natick, MA, 2001.

[192] WS McCulloch and W Pitts. A logical calculus of the ideas immanent in nervous activity. Bull. Mathematical Biophysics, 5:115-133, 1943.

[193] RC McFarlane, RC Reineman, JF Bartee, and C Georgakis. Dynamic simulator for a model IV fluid catalytic cracking unit. Comput. & Chem. Engg., 17(3):275-300, 1993.

[194] GJ McLachlan. Discriminant Analysis and Statistical Pattern Recognition. John Wiley & Sons, New York, NY, 1992.

[195] CA McNabb and SJ Qin. Projection based MIMO control performance monitoring - I. Covariance monitoring in state space. J. Process Control, 13:739-759, 2003.

[196] CA McNabb and SJ Qin. Projection based MIMO control performance monitoring - II. Measured disturbances. J. Process Control, 15:89-102, 2005.

[197] P Miller and RE Swanson. Contribution plots: The missing link in multivariate quality control. In 37th Annual Fall Technical Conf., ASQC, Rochester, NY, 1993.

[198] P Miller, RE Swanson, and CF Heckler. Contribution plots: The missing link in multivariate quality control. Int. J. App. Math. & Comp. Science, 8(4):775-792, 1998.

[199] M Misiti, G Oppenheim, J-M Poggi, and Y Misiti. Wavelet Toolbox User's Guide (For Use with Matlab®). MathWorks, Natick, MA, 1996.

[200] M Misra, S Kumar, SJ Qin, and D Seemann. Error based criterion for on-line wavelet data compression. J. Process Control, 11(6):717-731, 2001.

[201] RR Mohler. Bilinear Control Processes. Academic Press, New York, NY, 1973.

[202] DC Montgomery and CM Mastrangelo. Some statistical process control methods for autocorrelated data. J. Quality Technology, 23:179-193, 1991.

[203] DC Montgomery and GC Runger. Applied Statistics and Probability for Engineers. John Wiley & Sons, New York, NY, 1st edition, 1994.

[204] M Morari and L Ricker. Model Predictive Control Toolbox for Use with Matlab®. The MathWorks, Inc., Natick, MA, 1998.

[205] RL Motard and B Joseph. Wavelet Applications in Chemical Engineering. Kluwer Academic Publishers, Boston, MA, 1994.

[206] T Næs and T Isaksson. Splitting of calibration data by cluster analysis. J. Chemometrics, 5:49-65, 1991.

[207] A Negiz. Statistical dynamic modeling and monitoring methods for multivariable continuous processes. PhD thesis, Illinois Institute of Technology, Department of Chemical and Environmental Engineering, Chicago, IL, 1995.

[208] A Negiz and A Cinar. On the detection of multiple sensor abnormalities in multivariable processes. In Proc. American Control Conference, pages 2364-2369, 1992.

[209] A Negiz and A Cinar. A parametric approach to statistical monitoring of processes with autocorrelated observations. In AIChE Annual Meeting, Miami, FL, 1995.

[210] A Negiz and A Cinar. PLS, balanced and canonical variate realization techniques for identifying VARMA models in state space. Chemometrics & Intell. Lab. Sys., 38:209-221, 1997.

[211] A Negiz and A Cinar. Statistical monitoring of multivariable dynamic processes with state-space models. AIChE J., 43(8):2002-2020, 1997.

[212] A Negiz and A Cinar. Monitoring of multivariable dynamic processes and sensor auditing. J. Process Control, 8(5-6):375-380, 1998.

[213] A Negiz, ES Lagergren, and A Cinar. Mathematical models for cocurrent spray drying. Ind. & Engg. Chem. Research, 34:3289-3302, 1995.

[214] PRC Nelson, PA Taylor, and JF MacGregor. Missing data methods in PCA and PLS: Score calculations with incomplete observations. Chemometrics & Intell. Lab. Sys., 35:45-65, 1996.

[215] RB Newell and PL Lee. Applied Process Control: A Case Study. Prentice-Hall, Englewood Cliffs, NJ, 1988.

[216] I Nimmo. Adequately addressing abnormal operations. Chem. Engg. Progress, 91:36-45, 1995.

[217] P Nomikos. Detection and diagnosis of abnormal batch operations based on multiway principal components analysis. ISA Trans., 35:259-266, 1996.

[218] P Nomikos and JF MacGregor. Multivariate SPC charts for monitoring batch processes. Technometrics, 37:41-59, 1995.

[219] A Norvilas, A Negiz, J DeCicco, and A Cinar. Intelligent process monitoring by interfacing knowledge-based systems and multivariate statistical monitoring. J. Process Control, 10(4):341-350, 2000.

[220] AV Oppenheim and RW Schafer. Discrete-Time Signal Processing. Prentice-Hall, Englewood Cliffs, NJ, 1989.

[221] A Papoulis. Signal Analysis. McGraw-Hill, New York, NY, 1977.

[222] RS Patwardhan and SL Shah. Issues in performance diagnostics of model-based controllers. J. Process Control, 12(3):413-427, 2002.

[223] RS Patwardhan, SL Shah, G Emoto, and H Fujii. Performance analysis of model-based predictive controllers: An industrial case study. In Proc. of AIChE Annual Meeting, Miami Beach, FL, 1998.

[224] R Payne. Predictive sensor diagnostics reduce downtime and costs. I&CS, 14:59-63, 1993.

[225] K Pearson. Mathematical contributions to the theory of evolution XIII. On the theory of contingency and its relation to association and normal correlation. Drapers Co. Res. Mem. Biometric Series I, Cambridge University Press, London, UK, 1901.

[226] K Pearson. On lines and planes of closest fit to systems of points in space. Philos. Mag., 2:559-572, 1901.

[227] RK Pearson and BA Ogunnaike. Nonlinear Process Control, chapter Nonlinear Process Identification. Prentice-Hall PTR, Upper Saddle River, NJ, 1997.

[228] DB Percival and AT Walden. Wavelet Methods for Time Series Analysis. Cambridge University Press, Cambridge, UK, 2000.

[229] AA Petrosian and FG Meyer. Wavelets in Signal and Image Analysis: From Theory to Practice. Kluwer Academic, Boston, MA, 2001.

[230] NP Piercy. Sensor failure estimators for detection filters. IEEE Trans. on Automatic Control, 37:1553-1558, 1992.

[231] M Pottmann and R Pearson. Block-oriented NARMAX models with output multiplicities. AIChE J., 44(1):131-140, 1998.

[232] M Pottmann and DE Seborg. Identification of nonlinear processes using reciprocal multiquadric functions. J. Process Control, 2:189-203, 1992.

[233] MB Priestley. Nonlinear and Nonstationary Time Series Analysis. Academic Press, London, UK, 1988.

[234] DC Psichogios and LH Ungar. SVD-NET: An algorithm that automatically selects network structure. IEEE Trans. on Neural Networks, 5(3):513-515, 1994.

[235] S Qian. Introduction to Time-Frequency and Wavelet Transforms. Prentice-Hall, Upper Saddle River, NJ, 2002.

[236] SJ Qin. Controller performance monitoring: A review and assessment. Comput. & Chem. Engg., 23:178-186, 1998.

[237] SJ Qin, S Valle, and MJ Piovoso. On unifying multiblock analysis with application to decentralized process monitoring. J. Chemometrics, 15:715-742, 2001.

[238] SJ Qin and J Yu. Multivariable controller performance monitoring. In Prep. IFAC ADCHEM 2006, pages 593-600, Gramado, Brazil, 2006.

[239] L Rabiner and BH Juang. Fundamentals of Speech Recognition. Prentice-Hall, Englewood Cliffs, NJ, 1993.

[240] LR Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77:257-286, 1989.

[241] A Raich and A Cinar. Multivariate statistical methods for monitoring continuous processes: Assessment of discrimination power of disturbance models and diagnosis of multiple disturbances. Chemometrics & Intell. Lab. Sys., 30:37-48, 1995.

[242] A Raich and A Cinar. Statistical process monitoring and disturbance diagnosis in multivariable continuous processes. AIChE J., 42(4):995-1009, 1996.

[243] A Raich and A Cinar. Diagnosis of process disturbances by statistical distance and angle measures. Comput. & Chem. Engg., 21(6):661-673, 1997.

[244] JB Rawlings and I Chien. Gage control of sheet and film forming processes. AIChE J., 42(3):753-766, 1996.

[245] R Redner and H Walker. Mixture densities, maximum likelihood and the EM algorithm. SIAM Rev., 26:195-239, 1984.

[246] GC Reinsel. Elements of Multivariate Time Series Analysis. Springer-Verlag, New York, NY, 2nd edition, 1997.

[247] R Rengaswamy and V Venkatasubramanian. A syntactic pattern-recognition approach for process monitoring and fault diagnosis. Engg. App. of Artificial Intelligence, 8:35-51, 1995.

[248] RR Rhinehart. A watchdog for controller performance monitoring. In Proceedings of American Control Conference, Seattle, WA, 1995.

[249] A Rigopoulos, Y Arkun, and F Kayihan. Full CD profile control of sheet forming processes using adaptive PCA and reduced order MPC design. In Proceedings of ADCHEM '97, page 396, 1997.

[250] A Rigopoulos, Y Arkun, and F Kayihan. Identification of full profile disturbance models for sheet forming processes. AIChE J., 43(3):727-739, 1997.

[251] A Rigopoulos, Y Arkun, and F Kayihan. A novel approach to full CD profile control of sheet forming processes using adaptive PCA and reduced order IMC design. Comput. & Chem. Engg., 22(7-8):945-962, 1998.

[252] BD Ripley. Pattern Recognition and Neural Networks. Cambridge University Press, New York, NY, 1996.

[253] JA Romagnoli and A Palazoglu. Introduction to Process Control. CRC Press / Taylor & Francis, Boca Raton, FL, 2005.

[254] JK Romberg, H Choi, and RG Baraniuk. Bayesian tree-structured image modeling using wavelet-domain hidden Markov models. IEEE Trans. on Image Processing, 10:1056-1068, 2001.

[255] M Rossi and C Scali. A comparison of techniques for automatic detection of stiction: Simulation and application to industrial data. J. Process Control, 15:505-514, 2005.

[256] M Rudemo. Empirical choice of histograms and kernel density estimators. Scand. J. Statistics, 9:65-78, 1982.

[257] DE Rumelhart and JL McClelland, editors. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, volume 1. MIT Press, Cambridge, MA, 1986.

[258] GC Runger and FB Alt. Choosing principal components for multivariate statistical process control. Commun. Statist. - Theory & Methods, 25(5):909-922, 1996.

[259] GC Runger, TR Willemain, and S Prabhu. Average run lengths for CUSUM control charts applied to residuals. Commun. Statist. - Theory & Methods, 24(1):273-282, 1995.

[260] EL Russell, LH Chiang, and RD Braatz. Data-driven Methods for Fault Detection and Diagnosis in Chemical Processes. Springer-Verlag, London, UK, 2000.

[261] TP Ryan. Some statistical process control methods for autocorrelated data - discussion. J. Quality Technology, 23:200-202, 1991.

[262] AA Safavi, J Chen, and JA Romagnoli. Wavelets-based density estimation and application to process monitoring. AIChE J., 43:1227-1241, 1997.

[263] T Sastri. A recursive estimation algorithm for adaptive estimation and parameter change detection of time series models. J. Op. Res. Soc., 37:987-999, 1986.

[264] J Schaefer and A Cinar. Multivariable MPC system performance assessment, monitoring, and diagnosis. J. Process Control, 14(2):113-129, 2004.

[265] DW Scott. Multivariate Density Estimation: Theory, Practice and Visualization. John Wiley & Sons, New York, NY, 1992.

[266] R Shao, F Jia, EB Martin, and AJ Morris. Wavelets and nonlinear principal components analysis for process monitoring. Control Engg. Practice, 7:865-879, 1999.

[267] WA Shewhart. Economic Control of Quality of Manufactured Product. Van Nostrand, New York, NY, 1931.

[268] BW Silverman. Density Estimation for Statistics and Data Analysis. Chapman & Hall, London, UK, 1986.

[269] SIMCA-P (Version 11.0), 2006. UMETRICS AB, Umeå, Sweden (www.umetrics.com).

[270] A Singhal and DE Seborg. Pattern matching in historical batch data using PCA. IEEE Control Systems Magazine, 22:53-63, 2002.

[271] A Singhal and DE Seborg. Pattern matching in multivariate time series databases using a moving window approach. Ind. & Engg. Chem. Research, 41:3822-3838, 2002.

[272] A Singhal and DE Seborg. Effect of data compression on pattern matching in historical data. Ind. & Engg. Chem. Research, 44:3203-3212, 2005.

[273] A Singhal and DE Seborg. Evaluation of a pattern matching method for the Tennessee Eastman challenge problem. J. Process Control, 16:601-613, 2006.

[274] J Sjöberg, Q Zhang, L Ljung, A Benveniste, B Delyon, P Glorennec, H Hjalmarsson, and A Juditsky. Nonlinear black-box modeling in system identification: A unified overview. Automatica, 31(12):1691-1721, 1995.

[275] JC Skelton, PE Wellstead, and SR Duncan. Distortion of web profiles by scanned measurements. Pulp & Paper Canada, 104(12):T316-319, 2003.

[276] AK Smilde, R Bro, and P Geladi. Multiway Analysis: Applications in the Chemical Sciences. John Wiley & Sons, New York, NY, 2004.

[277] P Smyth. Hidden Markov models for fault detection in dynamic systems. Pattern Recognition, 27:149-164, 1994.

[278] T Söderström and P Stoica. System Identification. Prentice-Hall, Englewood Cliffs, NJ, 1989.

[279] HW Sorenson and DL Alspach. Recursive Bayesian estimation using Gaussian sums. Automatica, 7:465-479, 1971.

[280] R Srinivasan, C Wang, WK Ho, and KW Lim. Dynamic principal component analysis based methodology for clustering process states in agile chemical plants. Ind. & Engg. Chem. Research, 43:2123-2139, 2004.

[281] N Stanfelj, TE Marlin, and JF MacGregor. Monitoring and diagnosis of process control performance: The single-loop case. Ind. & Engg. Chem. Research, 32:301-314, 1993.

[282] CM Stein. Estimation of the mean of a multivariate normal distribution. Ann. Statistics, 9:1135-1151, 1981.

[283] CL Stork and BR Kowalski. Distinguishing between process upsets and sensor malfunctions using sensor redundancy. Chemometrics & Intell. Lab. Sys., 46:117-131, 1999.

[284] CL Stork, DJ Veltcamp, and BR Kowalski. Identification of multiple sensor disturbances during process monitoring. Analytical Chemistry, 69:5031-5036, 1997.

[285] G Strang and T Nguyen. Wavelets and Filter Banks. Wellesley-Cambridge Press, Wellesley, MA, 1996.

[286] W Sun, A Palazoglu, and JA Romagnoli. Detecting abnormal process trends by wavelet-domain hidden Markov models. AIChE J., 49:140-150, 2003.

[287] JA Suykens, TV Gestel, J de Brabanter, B De Moor, and J Vandewalle. Least Squares Support Vector Machines. World Scientific Publishing Co., Singapore, 2002.

[288] V-M Taavitsainen and P Korhonen. Nonlinear data analysis with latent variables. Chemometrics & Intell. Lab. Sys., 14:185-194, 1992.

[289] E Tatara. An Integrated Knowledge-Based System for Automated System Identification, Monitoring, and Sensor Audit for Multivariate Processes. Master's thesis, Illinois Institute of Technology, Chicago, IL, 1999.

[290] E Tatara and A Cinar. An intelligent system for multivariate statistical process monitoring and diagnosis. ISA Trans., 41:255-270, 2002.

[291] F Teymour. The Dynamic Behavior of Free-Radical Solution Polymerization in Continuous Stirred Tank Reactors. PhD thesis, University of Wisconsin, Madison, 1989.

[292] DJ Thomson. Spectrum estimation and harmonic analysis. Proceedings of the IEEE, 70:1055-1096, 1982.

[293] NF Thornhill and A Horch. Advances and new directions in plant-wide controller performance assessment. In Prep. IFAC ADCHEM 2006, pages 29-36, Gramado, Brazil, 2006.

[294] NF Thornhill, M Oettinger, and P Fedenczuk. Refinery-wide control loop performance assessment. J. Process Control, 9:109-124, 1999.

[295] S Thorvaldsen. A tutorial on Markov models based on Mendel's classic experiments. J. Bioinformatics and Comp. Biology, 3:1441-1460, 2005.

[296] F Tokatli-Kosebalaban and A Cinar. Fault detection and diagnosis in a food pasteurization process with hidden Markov models. Canadian J. Chem. Engg., 82:1-11, 2004.

[297] H Tong. Threshold Models in Nonlinear Time Series Analysis. Springer-Verlag, New York, NY, 1983.

[298] ND Tracy, JC Young, and RL Mason. Multivariate control charts for individual observations. J. Quality Technology, 24(2):88-95, 1992.

[299] JW Tukey. Exploratory Data Analysis. Addison-Wesley, Reading, MA, 1970.

[300] ML Tyler and M Morari. Performance monitoring of control systems using likelihood methods. Automatica, 32:1145-1162, 1996.

[301] C Undey, S Ertunc, and A Cinar. Online batch/fed-batch process performance monitoring, quality prediction, and variable contribution analysis for diagnosis. Ind. & Engg. Chem. Research, 42:4645-4658, 2003.

[302] C Undey, E Tatara, and A Cinar. Real-time batch process supervision by integrated knowledge-based systems and multivariate statistical methods. Engg. App. of Artificial Intelligence, 16:555-566, 2003.

[303] C Undey, E Tatara, and A Cinar. Intelligent real-time performance monitoring and quality prediction for batch/fed-batch cultivations. J. Biotechnology, 108(1):61-77, 2004.

[304] C Undey, E Tatara, BA Williams, G Birol, and A Cinar. A hybrid supervisory knowledge-based system for monitoring penicillin fermentation. In Proc. American Control Conf., volume 6, pages 3944-3948, Chicago, IL, 2000.

[305] E van der Burg and J de Leeuw. Nonlinear canonical correlation. British J. Math. Statist. Psychol., 36:54-80, 1983.

[306] T Van Gestel, J Suykens, G Lanckriet, A Lambrechts, B De Moor, and J Vandewalle. Bayesian framework for least squares support vector machine classifiers, Gaussian processes, and kernel Fisher discriminant analysis. Neural Computation, 15:1115-1148, 2002.

[307] P van Overschee and B De Moor. N4SID: Subspace algorithms for the identification of combined deterministic-stochastic systems. Automatica, 30:75-93, 1994.

[308] V Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, New York, NY, 1995.

[309] SV Vaseghi. Advanced Signal Processing and Digital Noise Reduction. John Wiley & Sons, New York, NY, 1996.

[310] A Vasilache, B Dahhou, G Roux, and G Goma. Classification of fermentation process models using recurrent neural networks. Int. J. Systems Science, 32(9):1139-1154, 2001.

[311] V Venkatasubramanian and K Chan. A neural network methodology for process fault diagnosis. AIChE J., 35(12):1993-2002, 1989.

[312] V Venkatasubramanian, R Rengaswamy, SN Kavuri, and K Yin. A review of process fault detection and diagnosis. Part III: Process history based methods. Comput. & Chem. Engg., 27:327-346, 2003.

[313] M Verhaegen and P Dewilde. Subspace model identification. Part I: The output-error state space model identification class of algorithms. Int. J. Control, 56:1187-1210, 1992.

[314] M Verhaegen and D Westwick. Identifying MIMO Wiener systems using subspace model identification methods. In Proc. of 34th Conf. on Decision and Control, number FP14, 1995.

[315] V Volterra. Theory of Functionals and Integro-Differential Equations. Dover, New York, NY, 1959.

[316] X Wang, U Kruger, and GW Irwin. Process monitoring approach using fast moving window PCA. Ind. & Engg. Chem. Research, 44:5691-5702, 2005.

[317] Z Wang, C Di Massimo, MT Tham, and AJ Morris. A procedure for determining the topology of multilayer feedforward neural networks. Neural Networks, 7(2):291-300, 1994.

[318] Z Wang, MT Tham, and AJ Morris. Multilayer feedforward neural networks: A canonical form approximation of nonlinearity. Int. J. Control, 56(3):655-672, 1992.

[319] LE Wangen and BR Kowalski. A multiblock partial least squares algorithm for investigating complex chemical systems. J. Chemometrics, 3(1):3-20, 1989.

[320] M Weighell, EB Martin, M Bachmann, AJ Morris, and J Friend. Multivariate statistical process control applied to an industrial production facility. In Proc. of ADCHEM '97, pages 359-364, 1997.

[321] PJ Werbos. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. PhD thesis, Harvard University, Applied Mathematics, 1974.

[322] JA Westerhuis, T Kourti, and JF MacGregor. Analysis of multiblock and hierarchical PCA and PLS models. J. Chemometrics, 12:301-321, 1998.

[323] Western Electric Company. Statistical Quality Control Handbook. AT&T Technologies, Indianapolis, 1984.

[324] J Weston and C Watkins. Multi-class support vector machines. Technical Report CSD-TR-98-04, University of London, London, UK, 1998.

[325] JR Whiteley and JF Davis. Qualitative interpretation of sensor patterns. IEEE Expert, 8:54-63, 1993.

[326] A Willsky. A survey of design methods for failure detection in dynamic systems. Automatica, 12:601-611, 1976.

[327] BM Wise and NB Gallagher. The process chemometrics approach to process monitoring and fault detection. J. Process Control, 6(6):329-348, 1996.

[328] BM Wise, NB Gallagher, R Bro, JM Shaver, W Windig, and RS Koch. PLS Toolbox 3.5 for Use with Matlab®. Eigenvector Research, Inc., Manson, WA, 2004.

[329] BM Wise, NL Ricker, and DJ Veltkamp. Upset and sensor fault detection in multivariable processes. In AIChE Annual Meeting, Paper 164b, San Francisco, CA, 1989.

[330] BM Wise, DJ Veltkamp, NL Ricker, BR Kowalski, SM Barnes, and V Arakali. Application of multivariate statistical process control (MSPC) to the West Valley slurry-fed ceramic melter process. In Proceedings of Waste Management '91, pages 169-176, Tucson, AZ, 1991.

[331] H Wold. Multivariate Analysis, chapter Estimation of principal components and related models by iterative least squares, pages 391-420. Academic Press, New York, NY, 1966.

[332] S Wold. Cross-validatory estimation of the number of components in factor and principal components analysis. Technometrics, 20(4):397-405, 1978.

[333] S Wold. Nonlinear partial least squares modelling: II. Spline inner relation. Chemometrics & Intell. Lab. Sys., 14:71-84, 1992.

[334] S Wold, P Geladi, K Esbensen, and J Ohman. Multi-way principal component and PLS analysis. J. Chemometrics, 1:41-56, 1987.

[335] S Wold, S Hellberg, T Lundstedt, M Sjostrom, and H Wold. PLS modeling with latent variables in two or more dimensions. In Proc. Symp. on PLS Model Building: Theory and Application, Frankfurt, Germany, Sept. 1987.

[336] S Wold, N Kettaneh-Wold, and B Skagerberg. Nonlinear PLS modeling. J. Chemometrics, 7:53-65, 1989.

[337] S Wold, N Kettaneh-Wold, and K Tjessem. Hierarchical multiblock PLS and PC models, for easier model interpretation, and as an alternative to variable selection. J. Chemometrics, 10:463-482, 1996.

[338] S Wold, A Ruhe, H Wold, and WJ Dunn. The collinearity problem in linear regression. The partial least squares (PLS) approach to generalized inverses. SIAM J. Sci. Stat. Comput., 5(3):735-743, 1984.

[339] S Wold, M Sjostrom, and L Eriksson. PLS-regression: A basic tool of chemometrics. Chemometrics & Intell. Lab. Sys., 58:109-130, 2001.

[340] JC Wong, KA McDonald, and A Palazoglu. Classification of process trends based on fuzzified symbolic representation and hidden Markov models. J. Process Control, 8:395-408, 1998.

[341] JC Wong, KA McDonald, A Palazoglu, and T Wada. Application of a fuzzy triangular representation and hidden Markov models classification in the detection of abnormal situations in refining processes. In Proceedings of CONTROL 97, pages 566-571, Sydney, Australia, 1997.

[342] L Xie, U Kruger, D Lieftucht, T Littler, Q Chen, and S-Q Wang. Statistical monitoring of dynamic multivariate processes - Part 1. Modeling autocorrelation and cross-correlation. Ind. & Engg. Chem. Research, 45:1659-1676, 2006.

[343] E Yashchin. Performance of CUSUM control schemes for serially correlated observations. Technometrics, 35:37-52, 1993.

[344] S Yoon and JF MacGregor. Principal-component analysis of multiscale data for process monitoring and fault diagnosis. AIChE J., 50(11):2891-2903, 2004.

[345] Y You and M Nikolaou. Dynamic process modeling with recurrent neural networks. AIChE J., 39(10):1654-1667, 1993.

[346] L Zadeh. Fuzzy sets. Inf. Control, 8:338-353, 1965.

[347] Y Zhang and MA Henson. A performance measure for constrained model predictive controllers. In European Control Conference, Karlsruhe, Germany, 1999.

[348] SJ Zhao, J Zhang, and YM Xu. Monitoring of processes with multiple operating modes through multiple principal components analysis models. Ind. & Engg. Chem. Research, 43:7025-7035, 2004.

Index

α error, 10
β error, 10
Akaike's information criterion, 88, 95
Artificial neural networks, 58
  activation function, 59
  autoassociative networks, 63, 79, 193
  back-propagation, 58
  connections, 59
  learning paradigms, 62
    error back-propagation, 62
    reinforcement, 62
    supervised, 62
    unsupervised, 63
  limitations, 59
  multi-layer feedforward networks, 61
  neurons, 59
  recurrent networks, 62
  sigmoid function, 61
  topologies, 61
Autocorrelated data, 22
  parameter change detection, 27
  residuals charts, 26
Autocorrelation coefficient, 24
Autocovariance, 95
Average run length, 17, 19

Basis functions, 116, 119, 262
Beta distribution, 102
Biplots, 100, 179
Box's equation, 104

Canonical variates, 43
  multipass CVSS for sensor auditing, 212
  state-space (CVSS) models, 96
Canonical variates analysis, 43, 89, 100
  Hankel matrix, 95
CD control performance, 271
Classification, 50
  with Fisher's discriminant analysis, 56
  with HMMs, 144, 157
Cluster analysis, 48
Colinearity, 76
Confidence limits, 100
Contribution plots, 46, 100, 174
Control
  linear quadratic Gaussian (LQG), 239
  model predictive, 238
Control charts, see Monitoring charts
Control limit
  lower, 12
  of R chart, 15
  of S chart, 16
  of x̄ chart, 15
  on SPE, 108
  selection of, 13
  upper, 12
  warning, 13
Controller performance monitoring, 231
  closed-loop potential, 235
  CPM using minimum variance control, 233
  diagnosis of MPC performance, 242, 244
  for model predictive controllers, 238
    comprehensive technique, 241
    expected performance approach, 240
    historical benchmark, 240
    LQG benchmark, 239
    model-based performance measure, 240
  frequency-domain method, 237
  interactor matrix, 237
  minimum variance control, 237
  multivariable control systems, 237
  single-loop, 233
  valve stiction, 233
Correlation function, 23
Correlogram, 24, 79
Cost
  function for MPC, 238
  of misclassification, 50
Cross-direction (CD), 251
Cross-validation, 40
CSTR, 152, 164
Cumulative sum (CUSUM) charts, 11, 18
  one-sided, 18
  two-sided, 19

Decomposition
  orthogonal, 38, 262
  singular value, 39, 262
  spectral, 39
Denoising, 127, 150, 193, 264
Discriminant
  angular, 186
  combined, 183
  Euclidian angle, 185
  Fisher's, 53
  linear, 53
  Mahalanobis angle, 185, 187
  quadratic, 52
  residual, 182
  score, 182
Distance
  Euclidian, 48
  Mahalanobis, 49
  statistical, 49
Distribution
  Beta, 102
  chi-squared (χ²), 103, 108
  F, 101
  Lambda, 214
  Normal, 8, 15, 34, 96, 102
Disturbances, 91, 219
  discrimination from sensor faults, 195, 220
  multiple simultaneous, 189
  overlap of means, 190
  sensors, 195

Eigenvalues, 39, 262
Eigenvectors, 39, 262
Episode, 136
Estimated
  covariance matrix, 108
    of residuals, 104
    of scores, 101
  variance, 102
Exponentially weighted moving average (EWMA) charts, 11, 22

Fault diagnosis, 100
  angle-based discriminants, 184
  combined distance discriminant, 183
  knowledge-based systems, 178
  parity relations, 178
  residual discriminant, 182
  robust, 191
  score discriminant, 182
  sensor auditing, 203
  sensor faults, 204
  using contribution plots, 174
  using discriminant analysis, 179
  using PLS, 204
  using statistical methods, 179
  using SVM, 191
Faults
  actuator, 111, 177
  incipient, 203
  masking of multiple faults, 190
  multiple simultaneous faults, 189
  sensor, 111, 170, 195, 223
Feature space, 66
Filter
  low-pass, 128
  median, 128, 130, 133, 193
  robust, 133
Filtering, 127, 136
Final prediction error, 88
Fisher's discriminant analysis, 53
  kernel-based, 65
Flatness, 266
Forced circulation evaporator, 246
Fourier transform
  definition, 116
  discrete, 117
  fast, 117
  short-time, 117
Functional redundancy, 203
Fuzzification, 137
Fuzzy logic, 137

Gram polynomials, 259

Hidden Markov model (HMM), 138, 141, 149, 166
  state variables, 168
  states, 169
  training, 143
Hidden Markov tree, 147, 157, 162
Hotelling's statistic, see Monitoring charts
HTST pasteurization, 109, 167, 177, 207
Hypothesis testing, 9, 12
  Type I error, 10, 13, 14
  Type II error, 10

Independent component analysis, 43
  mixing matrix, 44
  process monitoring, 112
  separating matrix, 44
  sphering matrix, 44
Inner product, 64
Input-output models, 83

k-means clustering, 49
Kernel, 64
  Mercer's theorem, 64
Kernel density estimation, 64, 198
Knowledge-based systems (KBS), 178, 204, 214, 238, 246
Kurtosis, 44

Linearization of nonlinear systems, 92
  Jacobian matrices, 93

Machine direction (MD), 251
Markov process, 139
Masking, 273
MD control performance, 269
MD/CD decomposition, 253
Mean, 8
Minimum variance control, 233
Mode, 263
Model predictive control, 238
  control horizon, 239
  prediction horizon, 239
  tuning parameters, 242
Model-based control performance, 271
Models
  ARMA, 241
  Box-Jenkins, 86
  first principles, 73
  input-output, 73
  linear, 73
  linear discrete-time transfer function, 234
  nonlinear, 74, 96
  nonlinear ARMAX, 88
  nonlinear ARX, 89
  nonlinear PCA, 79
  nonlinear PLS, 82
  NIPALS, 42
  output error, 87
  regression, 75
  state-space, 89
  subspace state-space, 93
  time series, 83
Monitoring charts
  cumulative sum (CUSUM), 18
  exponentially weighted moving average (EWMA), 22
  for CPM, 243
  moving average (MA), 19
  multivariate, 108
  Q-statistic, 103
  Hotelling's T², 101
  score biplots, 100
  Shewhart, 11
    assumptions of, 13
    mean (x̄), 14, 15, 17
    range (R), 12, 14
    standard deviation, 12, 15
Moving average (MA) charts, 11, 19
  estimation of S, 19
  process level monitoring, 20
  spread monitoring, 21
Multivariate statistical process monitoring (MSPM), 99
  angle-based, 113
  charts, 99
  D, 103
  Q, 103
  SPE, 99, 103, 108, 109
  T², 99, 102
  PC scores biplots, 100
  PC scores charts, 101
  PLS scores biplots, 108
  with state variables, 109

Normal operation (NO), 37, 100, 180
Normalized performance index, 269

O-NLPCA, 194, 198
Orthogonal decomposition, 260
Orthogonality, 102
Outliers, 128

Parameter change detection, 27
Partial least squares, 42, 79
  convergence, 81
  inner relations, 81
  multi-block, 113
  multipass PLS for sensor auditing, 204
  nonlinear iterative algorithm (NIPALS), 80
  nonlinear PLS, 82
  outer relations, 80
  residuals matrices, 80, 82
  weight vectors, 80
PCA, 262
pH process, 161
Phase-plane, 267
Population, 8
Prediction error, 83
Prediction error sum of squares (PRESS), 40
Principal components analysis, 37
  consensus, 113
  dynamic, 113
  hierarchical, 113
  loadings, 39
  moving-window, 113
  multi-block, 113
  multiscale, 112
  scores, 263
    matrix, 39
    vector, 258, 263
Projection to latent structures, see Partial least squares
Pseudo-random binary sequence, 111

Quadratic discrimination score, 56

Range, 8, 14
Reference set, 101
Regression
  coefficients, 76
  multivariable linear, 76
  nonlinear, 78
  nonlinear PCA, 79
  partial least squares, 79
  principal components, 78
  ridge, 78
  stepwise, 77
  with lagged variables, 79
Residuals charts, 27
  CUSUM charts, 31
  for CPM, 243
Ridge parameter, 78
Run rules, 14

Sample, 8
Scatter
  between-class, 53
  matrix
    between-class, 55
    total, 55
    within-class, 55
  within-class, 53
Sensor
  auditing, 203
  reconstruction, 197, 200, 223
Sheet, 251
Singular value decomposition, 39, 262
Singular values, 95
Slurry-fed ceramic melter (SFCM), 224
Small process shifts, 18
Spectral decomposition, 39
Splines, 82
Spray drier, 30
Squared prediction error (SPE), 103, 107
Standard deviation, 8
State vector, 90
State-space models
  discrete-time, 90
  disturbance, 91
  linear, 90
  linearization of nonlinear models, 92
  nonlinear, 96
  state variables, 89
  subspace, 93
Statistical discrimination, 50
Statistical process control (SPC), 2, 7
Subspace state-space models, 93, 100
  canonical variate realization, 94, 108
  future data window J, 94
  Hankel matrix, 94
  N4SID, 94, 108
  past data window K, 94
Sum of squares
  cumulative prediction (CUMPRESS), 107
  prediction (PRESS), 107
Support vector machines (SVM), 66, 191
  decision function, 68
  dual solution, 67
  k-class pattern recognition, 68
  proximal, 191

Temporal, 251, 257
Tennessee Eastman industrial challenge problem, 181, 184, 187
Time series models, 83
  autoregressive (AR), 83
  autoregressive integrated moving average (ARIMA), 83
  autoregressive moving average with exogenous inputs (ARMAX), 87
  autoregressive with exogenous inputs (ARX), 87
  exogenous variables, 83
  moving average (MA), 83
  NARMAX, 88
  nonlinear ARX, 89
Transition matrix, 258
Triangular episodes, 135, 150
Type I error, 10
Type II error, 10

Variables
  deviation, 92
  predictor, 76
  state, 89, 90
Variance inflation factor, 77
Variation
  between samples, 12
  within samples, 12
Vinyl acetate polymerization, 104, 214

Wavelet filter, 131
  coefficient denoising, 133
  hard-thresholding, 132, 264
  soft-thresholding, 132
Wavelet transform, 264
  continuous, 121
  discrete, 124
  Multiresolution Signal Decomposition, 124
Wavelets, 145, 157
  Coiflet, 121
  Daubechies, 121
  Haar, 120, 162
  Morlet, 120
  Symlet, 121
Web, 251

OTHER RELATED TITLES OF INTEREST

Batch Fermentation: Modeling, Monitoring, and Control
Ali Cinar, Satish J. Parulekar, Cenk Undey, and Gulnur Birol
ISBN: 0824740343

Engineering Economics and Economic Design for Process Engineers
Thane Brown
ISBN: 0849382122

Instrument Engineers' Handbook, Fourth Edition, Volume One: Process Measurement and Analysis
Bela Liptak
ISBN: 0849310830

Introduction to Process Control
Jose Romagnoli and Ahmet Palazoglu
ISBN: 0849334969

Materials Processing Handbook
Joanna R. Groza, James F. Shackelford, Enrique J. Lavernia, and Michael T. Powers
ISBN: 0849332168