Gray Box 1994


Continuous-time nonlinear signal processing:
A neural network based approach for gray box identification*
R. Rico-Martinez, J. S. Anderson and I. G. Kevrekidis
Department of Chemical Engineering, Princeton University,
Princeton, NJ 08544

Abstract
Artificial neural networks (ANNs) are often used for short-term discrete time series predictions. Continuous-time models are, however, required for qualitatively correct approximations to the long-term dynamics (attractors) of nonlinear dynamical systems and their transitions (bifurcations) as system parameters are varied. In previous work we developed a black-box methodology, based on a neural network platform, for the characterization of experimental time series as continuous-time models (sets of ordinary differential equations). This methodology naturally lends itself to the identification of partially known first-principles dynamic models, and here we present its extension to gray-box identification.

1 Introduction
Artificial Neural Networks (ANNs) have proven to be a valuable tool in nonlinear signal processing applications. Exploiting ideas common to nonlinear dynamics (attractor reconstruction) and system identification (ARMA models), methodologies for the extraction of nonlinear models from experimental time series have been developed (e.g. [1, 2]) and applied to experimental data. In previous work, we have discussed some inherent limitations of these techniques (which are based on discrete-time schemes) in characterizing the instabilities and bifurcations of nonlinear systems that depend on operating parameters.
An alternative approach, resulting in continuous-time models (sets of Ordinary Differential Equations (ODEs)) and also based on a neural network platform, was devised and implemented [3, 4, 5]. The approximations constructed in that work can be described as black-box: no insight from first-principles modeling of the system was incorporated in them.

*This work was partially supported by ARPA/ONR, the Exxon Education Foundation and an NSF PYI award. RRM acknowledges the support of CONACyT through a fellowship.

0-7803-2026-3/94 $4.00 © 1994 IEEE
In this work we extend the approach to cases where portions of the algebraic forms of the set of ODEs describing the dynamical evolution of the system are known. We attempt to capture the behavior of the overall system by "hardwiring" the known parts and approximating the unknown parts with a neural network (gray-box identification).
In what follows we first briefly outline our black-box approach for the identification of continuous-time systems. This discussion leads naturally to the extension to gray-box identification. Finally, we illustrate its use through an application to the modeling of a reacting system with complicated nonlinear kinetics.

2 Black-box approach
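Consider the autonomous ODE

$$\dot{\mathbf{x}} = \mathbf{F}(\mathbf{x};\mathbf{p}), \qquad \mathbf{x}\in\mathbb{R}^n,\ \mathbf{p}\in\mathbb{R}^p,\ \mathbf{F}:\mathbb{R}^n\times\mathbb{R}^p\to\mathbb{R}^n \qquad (1)$$

where $\mathbf{x}$ is the vector of state variables, $\mathbf{p}$ is the vector of operating parameters and $\dot{\mathbf{x}}$ is the vector of derivatives of the state variables with respect to time.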
In previous work we showed a way of constructing such a set of ODEs from discrete-time experimental measurements of the state variables only ([3, 4, 5]; see also [6]). We embedded the training of a neural network that approximates the function $\mathbf{F}(\mathbf{x};\mathbf{p})$ in a numerical integrator scheme. Both explicit and implicit integrators can be (and have been) used. In addition, we illustrated how the approach can be used when time series of only a single state variable are available.
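To make "embedding the training in a numerical integrator" concrete, the sketch below uses an explicit integrator as the template: the candidate right-hand side is advanced by one integration step from each measured sample, and the mismatch with the next measured sample is the training error. This is a minimal sketch under stated assumptions, not the authors' code. The helper names `rk4_step` and `one_step_loss` are hypothetical, the choice of classical fourth-order Runge-Kutta is ours (the text does not say which explicit integrator [3] used), and a known linear decay stands in for the neural network so the result can be checked.

```python
import numpy as np

# Hypothetical helpers; RK4 is chosen only for illustration.
def rk4_step(f, x, p, h):
    """Advance dx/dt = f(x, p) by one classical Runge-Kutta (RK4) step of size h."""
    k1 = f(x, p)
    k2 = f(x + 0.5 * h * k1, p)
    k3 = f(x + 0.5 * h * k2, p)
    k4 = f(x + h * k3, p)
    return x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def one_step_loss(f, x_series, p, h):
    """Mean squared one-step prediction error over a sampled time series.

    Each sample x_n is integrated forward by h with the candidate right-hand
    side f and compared with the measured sample x_{n+1}. In the black-box
    setting, f would be the neural network being trained."""
    x_series = np.asarray(x_series, dtype=float)
    preds = np.array([rk4_step(f, x, p, h) for x in x_series[:-1]])
    return float(np.mean((preds - x_series[1:]) ** 2))

# Usage: exact samples of x(t) = exp(-t), i.e. dx/dt = -p x with p = 1.
t = np.arange(0.0, 2.0, 0.1)
samples = np.exp(-t)
decay = lambda x, p: -p * x
print(one_step_loss(decay, samples, 1.0, 0.1))  # very small: correct model
print(one_step_loss(decay, samples, 2.0, 0.1))  # much larger: wrong parameter
```

Minimizing such a loss over the network weights is what training amounts to; the integrator itself contributes no trainable parameters.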
Consider the simple implicit integrator (trapezoidal rule) formula for Eq. (1):

$$\mathbf{x}_{n+1} = \mathbf{x}_n + \frac{h}{2}\left[\mathbf{F}(\mathbf{x}_n;\mathbf{p}) + \mathbf{F}(\mathbf{x}_{n+1};\mathbf{p})\right] \qquad (2)$$

where $h$ is the time step of the integration, $\mathbf{x}_n$ is the value of the vector of states at time $t$ and $\mathbf{x}_{n+1}$ is the (approximate) result of integrating the set of ODEs to time $t + h$. Figure 1(a) schematically depicts a neural network constructed using this numerical integrator as a template. The boxes labeled "neural network" represent the same neural network evaluated with two different sets of inputs for each training vector. Given the implicit nature of the integrator, the "prediction" of the integration depends on itself. Training was therefore done using standard recurrent network training ideas [7, 8]. Alternatively, a nonlinear algebraic equation solver can be coupled with the training to solve exactly for the predicted value at every iteration and for every training vector. Further details can be found in references [4, 5]. The use of explicit integrators is discussed in [3]. This identification procedure has been tested for experimental systems exhibiting complicated dynamics (see e.g. [3, 4]).

Figure 1: (a) Schematic of the evaluation of a neural network embedded in the implicit integrator (trapezoidal rule) of Eq. (2). The implicit dependence of the prediction of the state on itself results in a backward (recurrent) connection. (b) Schematic of the evaluation of a neural network for the gray-box approach. The known part of the model ($\mathbf{G}(\mathbf{x};\mathbf{p})$) is evaluated along with the unknown part ($\mathbf{F}(\mathbf{x};\mathbf{p})$), which is approximated by a neural network. In order to calculate errors (for training), the contributions of the known and unknown parts are combined using the integrator to give the state of the system at the next sampling time.
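The alternative mentioned above, solving the trapezoidal rule of Eq. (2) for $\mathbf{x}_{n+1}$ with a nonlinear algebraic equation solver, can be sketched with Newton's method. In this hypothetical helper (`trapezoidal_step` is our name, not the authors'), the Jacobian of the right-hand side is approximated by forward differences, which is an assumption made to keep the sketch self-contained. A trained network would supply `f`; here any callable `f(x, p)` returning an array shaped like `x` works.

```python
import numpy as np

def trapezoidal_step(f, x_n, p, h, tol=1e-12, max_iter=50, eps=1e-7):
    """Solve x1 = x_n + h/2 [f(x_n, p) + f(x1, p)] for x1 by Newton's method.

    Hypothetical helper; the Jacobian of f is taken by forward differences."""
    x_n = np.atleast_1d(np.asarray(x_n, dtype=float))
    f_n = f(x_n, p)
    x1 = x_n + h * f_n                        # explicit Euler step as initial guess
    n = x1.size
    for _ in range(max_iter):
        f1 = f(x1, p)
        r = x1 - x_n - 0.5 * h * (f_n + f1)   # residual of the trapezoidal rule
        if np.linalg.norm(r) < tol:
            break
        # Forward-difference Jacobian of f at x1, one column per state variable.
        J = np.empty((n, n))
        for j in range(n):
            dx = np.zeros(n)
            dx[j] = eps
            J[:, j] = (f(x1 + dx, p) - f1) / eps
        # Newton update: solve (I - h/2 J) delta = r, then x1 <- x1 - delta.
        x1 = x1 - np.linalg.solve(np.eye(n) - 0.5 * h * J, r)
    return x1

# Usage on a hypothetical linear system, where the trapezoidal rule has a closed form.
A = np.array([[0.0, 1.0], [-1.0, -0.1]])
lin = lambda x, p: A @ x
x0 = np.array([1.0, 0.0])
h = 0.1
closed_form = np.linalg.solve(np.eye(2) - 0.5 * h * A, (np.eye(2) + 0.5 * h * A) @ x0)
print(trapezoidal_step(lin, x0, None, h), closed_form)
```

For a linear right-hand side Newton's method converges in essentially one iteration, which gives an easy check against the closed-form update; for a nonlinear network the same loop simply runs a few more iterations per training vector.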

3 Gray-box approach
The approach discussed above can be combined with first-principles modeling for cases where the full state vector is known while the understanding of the modeling of the system is only partial. Such an example is encountered in modeling reacting systems, when the kinetics of the reaction are not known a priori while inflow and outflow or heat transfer are well understood and easily modeled.
As in the case of black-box approximations, we embed the training of the neural network in a numerical integrator scheme. For gray boxes, the known part of the right-hand side of the ODEs is explicitly calculated ("hardwired") and the neural network is trained to approximate only the unknown parts.
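Let us assume, for the purposes of the illustration presented here, that the first-principles model of a given system takes the simple form:

$$\dot{\mathbf{x}} = \mathbf{G}(\mathbf{x};\mathbf{p}) + \mathbf{F}(\mathbf{x};\mathbf{p}) \qquad (3)$$

where $\mathbf{G}(\mathbf{x};\mathbf{p})$ represents the known part of the model and $\mathbf{F}(\mathbf{x};\mathbf{p})$ is the unknown part. Note that the methodology is not restricted to models of the additive form of Eq. (3).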
Figure 1(b) schematically depicts the training procedure for an implicit integrator. A "global" network is used to predict the state at the next time step. Some of the weights and nodes in this network are fixed because of the "known" part of the model; others are fixed because they pertain to the integration scheme and its constants. The scheme also contains a neural "sub"-network which, upon training, will approximate the unknown parts of the right-hand side of the system ODEs. We again use the implicit integrator of Eq. (2) as the basis for training this network, which, due to the implicit nature of the integrator, has recurrent connections and therefore requires multiple evaluations.
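The structure of Eq. (3) and Fig. 1(b) can be mirrored in a few lines: the known part G is an ordinary fixed function, the unknown part F is a small trainable network, and the integrator only ever sees their sum. The sketch below is an illustration under assumptions: G is a made-up first-order relaxation term (not part of this paper's example), the layer sizes are arbitrary, and the helper names `init_mlp`, `mlp` and `gray_box_rhs` are hypothetical.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Random weights for a feedforward net with layer widths `sizes` (hypothetical helper)."""
    return [(0.5 * rng.standard_normal((m, n)), np.zeros(m))
            for n, m in zip(sizes[:-1], sizes[1:])]

def mlp(params, z):
    """tanh hidden layers followed by a linear output layer."""
    a = z
    for W, b in params[:-1]:
        a = np.tanh(W @ a + b)
    W, b = params[-1]
    return W @ a + b

def gray_box_rhs(x, p, known_rhs, params):
    """Right-hand side of Eq. (3): hardwired G(x; p) plus a network estimate of F(x; p).

    Only `params` would be adjusted during training; `known_rhs` stays fixed."""
    z = np.concatenate([x, p])      # the network sees states and operating parameters
    return known_rhs(x, p) + mlp(params, z)

# Hypothetical known part: first-order relaxation towards a feed state.
def G(x, p):
    x_feed = np.array([1.0, 0.0])
    residence_time = 2.0
    return (x_feed - x) / residence_time

rng = np.random.default_rng(0)
params = init_mlp([3, 6, 6, 2], rng)    # 2 states + 1 parameter in, 2 outputs
x = np.array([0.3, 0.5])
p = np.array([3.0])
print(gray_box_rhs(x, p, G, params))
```

Because `gray_box_rhs` has the same call signature as a black-box right-hand side, it can be handed unchanged to a one-step integrator such as the trapezoidal rule of Eq. (2). Setting the network's output layer to zero recovers the purely known model, which is a convenient sanity check.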

4 An illustrative example
In order to illustrate the capabilities of the gray-box approach we will use simulated data from a model reacting system [9]. It consists of a well-stirred reactor in which a single irreversible reaction A → B occurs on a catalytic surface. The mass balances for species A on the catalytic surface and in the gas phase take the general (dimensionless) form:

$$\frac{d\theta}{d\tau} = \cdots, \qquad \frac{d\Pi}{d\tau} = \cdots \qquad (4)$$

where θ is the fractional coverage of the catalytic surface, Π is the partial pressure of the reactant in the gas phase, γ is the dimensionless temperature, τ is the dimensionless time and K_a, K_R, K_d, α*, β and Π* are constants. This has been suggested as one of the simplest models that can give rise to oscillations in isothermal catalytic reactions; its main characteristic is the coverage-dependent desorption activation energy (the exponential term involving α* in Eq. (4)), caused by adsorbate-adsorbate interactions.
To illustrate the dependence of the dynamics on an operating parameter, we obtained time series from this system for several values of the dimensionless temperature γ (keeping the remaining parameters K_a = 35, α* = 30, K_d = 350, Π* = 0.36, K_R = 8.5 and β = 0.2 constant). Depending on the value of γ, the system may evolve towards a steady state, towards oscillatory behavior, or to either of the two depending on the initial conditions. This variety in long-term dynamics makes the example a good test of the approximating capabilities of the neural network.
Figure 2 shows the bifurcation diagram for this system: a branch of steady states undergoes a subcritical Hopf bifurcation to oscillatory behavior at γ = 2.941. There is a small range of values of γ where a stable large-amplitude oscillation coexists with a stable steady state (starting at about γ ≈ 2.9076). As γ is increased, the system exhibits, as its sole long-term attractor, a large-amplitude limit cycle that disappears at γ ≈ 3.841 via another (now supercritical) Hopf bifurcation.

Figure 2: Bifurcation diagram for the single species surface reaction system with respect to the dimensionless temperature γ. Solid lines denote stable steady states, dashed lines unstable steady states, open circles unstable limit cycles and filled circles stable limit cycles. The maximum θ of the periodic trajectory at each value of γ is marked.
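Diagrams such as Fig. 2 are normally computed with continuation software; the core test for a Hopf point, however, is simple: along a branch of steady states, a complex-conjugate pair of Jacobian eigenvalues crosses the imaginary axis. The sketch below brackets such crossings on a parameter grid. It is an illustration under assumptions: the helper names are hypothetical, the steady-state branch is assumed to be known as a function of the parameter, and the test model is the textbook Hopf normal form (placed at g = 3), not Eq. (4). The sign change alone does not distinguish subcritical from supercritical bifurcations; that requires the direction of the emerging periodic branch, as shown in Fig. 2.

```python
import numpy as np

def jacobian(f, x, p, eps=1e-6):
    """Central-difference Jacobian of f(x, p) with respect to x."""
    n = x.size
    J = np.empty((n, n))
    for j in range(n):
        dx = np.zeros(n)
        dx[j] = eps
        J[:, j] = (f(x + dx, p) - f(x - dx, p)) / (2.0 * eps)
    return J

def hopf_brackets(f, steady_state, params):
    """Parameter intervals in which a complex-conjugate eigenvalue pair crosses
    the imaginary axis along a known branch of steady states (hypothetical helper)."""
    brackets, prev = [], None
    for p in params:
        ev = np.linalg.eigvals(jacobian(f, steady_state(p), p))
        pair = ev[np.abs(ev.imag) > 1e-8]          # keep complex eigenvalues only
        if pair.size == 0:
            prev = None                             # no oscillatory mode at this p
            continue
        re = pair.real.max()
        if prev is not None and np.sign(re) != np.sign(prev[1]):
            brackets.append((prev[0], p))
        prev = (p, re)
    return brackets

# Hypothetical test model: Hopf normal form with its bifurcation at g = 3.
def normal_form(x, g):
    mu, r2 = g - 3.0, x[0] ** 2 + x[1] ** 2
    return np.array([mu * x[0] - x[1] - r2 * x[0],
                     x[0] + mu * x[1] - r2 * x[1]])

origin = lambda g: np.zeros(2)   # the steady state of the normal form
print(hopf_brackets(normal_form, origin, np.linspace(2.55, 3.45, 10)))
```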
Figure 3 shows phase portraits of the system for several values of γ in the range of the bifurcation diagram of Fig. 2.

5 Network construction and results


Using data representative of the periodic phenomena described above, we tested the neural network ODE gray-box algorithm for identification. The training set included several time series (θ and Π vs τ) for values of γ before the subcritical Hopf bifurcation (including the region of bistability), after the subcritical Hopf bifurcation (limit cycle behavior), and after the supercritical Hopf bifurcation at high values of γ.
For our illustration we assume that all terms in Eq. (4) are known except for the term representing the rate of desorption of the reactant from the catalytic surface. That is, we replace the desorption term of Eq. (4) with an unknown function f(θ, γ) to be approximated by a neural network. The gray model we seek to construct is of the form:

[Equation (5): Eq. (4) with the desorption term replaced by f(θ, γ).]
[Figure 3 panels: phase portraits in the (θ, Π) plane and time series at several values of γ]
Figure 3: Long-term attractors for γ values in the range of the bifurcation diagram of Fig. 2. Top row: phase portrait of the stable limit cycle at γ = 3.0, along with segments of the two corresponding time series. For the stable steady states (γ = 2.9 and γ = 3.85), phase portraits of transients approaching the steady state are shown. In the region of bistability (γ in the range (2.925, 2.94)), the unstable (and thus experimentally unobservable) limit cycle in the interior of the large-amplitude stable limit cycle is also drawn.

Figure 4: Predicted desorption rate as a function of surface coverage (θ) and dimensionless temperature (left), and relative prediction error (right). The plot on the right shows the predicted minus the actual desorption rate, normalized by the actual rate.

The (feedforward) neural subnetwork, embedded in the numerical integrator of Fig. 1(b), has two inputs (θ and γ), one output (f(θ, γ)) and two hidden layers of six neurons each, with sigmoidal (tanh-type) activation functions. The derivatives of the error measure (energy function) with respect to the network parameters, needed by the training algorithm, are obtained using the chain rule and (because of the recurrence) the implicit function theorem.
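For the trapezoidal rule of Eq. (2), the implicit function theorem gives the sensitivity of the prediction to a network weight $w$ directly. Differentiating $\mathbf{x}_{n+1} - \mathbf{x}_n - \frac{h}{2}[\mathbf{F}(\mathbf{x}_n;w) + \mathbf{F}(\mathbf{x}_{n+1};w)] = 0$ yields $\left(I - \frac{h}{2}\,\partial_{\mathbf{x}}\mathbf{F}(\mathbf{x}_{n+1})\right)\frac{d\mathbf{x}_{n+1}}{dw} = \frac{h}{2}\left[\partial_w\mathbf{F}(\mathbf{x}_n) + \partial_w\mathbf{F}(\mathbf{x}_{n+1})\right]$. The sketch below applies this to a scalar state. As an assumption for brevity, a hypothetical two-parameter model F(x; a, b) = a tanh(bx) stands in for the 2-6-6-1 subnetwork so that all derivatives can be written by hand; a finite-difference check confirms the gradient.

```python
import numpy as np

# Hypothetical two-parameter stand-in for the subnetwork: F(x; w) = a tanh(b x).
def F(x, w):
    a, b = w
    return a * np.tanh(b * x)

def dF_dx(x, w):
    a, b = w
    return a * b / np.cosh(b * x) ** 2

def dF_dw(x, w):
    a, b = w
    sech2 = 1.0 / np.cosh(b * x) ** 2
    return np.array([np.tanh(b * x), a * x * sech2])   # [dF/da, dF/db]

def implicit_step(x0, w, h, iters=30):
    """Newton solve of the scalar trapezoidal rule x1 = x0 + h/2 (F(x0) + F(x1))."""
    x1 = x0 + h * F(x0, w)
    for _ in range(iters):
        r = x1 - x0 - 0.5 * h * (F(x0, w) + F(x1, w))
        x1 = x1 - r / (1.0 - 0.5 * h * dF_dx(x1, w))
    return x1

def loss_and_grad(x0, x_meas, w, h):
    """Squared one-step error and its gradient in w via the implicit function theorem."""
    x1 = implicit_step(x0, w, h)
    dx1_dw = (0.5 * h * (dF_dw(x0, w) + dF_dw(x1, w))
              / (1.0 - 0.5 * h * dF_dx(x1, w)))
    err = x1 - x_meas
    return err ** 2, 2.0 * err * dx1_dw

w, h = np.array([-1.5, 0.8]), 0.1
loss, grad = loss_and_grad(0.7, 0.6, w, h)
# Central finite differences as an independent check of the analytic gradient.
fd = np.array([(loss_and_grad(0.7, 0.6, w + 1e-6 * e, h)[0]
                - loss_and_grad(0.7, 0.6, w - 1e-6 * e, h)[0]) / 2e-6
               for e in np.eye(2)])
print(grad, fd)
```

With a vector state the scalar division becomes a linear solve with the matrix $I - \frac{h}{2}\partial_{\mathbf{x}}\mathbf{F}$, and $\partial_w\mathbf{F}$ would come from ordinary backpropagation through the subnetwork.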
The training set consisted of a total of 2950 points, allocated as follows: 250 points for γ = 2.9, 450 for γ = 2.91, 500 for γ = 2.925, 500 for γ = 2.94, 250 for γ = 3.0, 250 for γ = 3.2, 250 for γ = 3.4, 250 for γ = 3.8 and 250 for γ = 3.85. The time step of the integrator was 0.06 dimensionless units for all the time series used (roughly one twentieth of the period of the oscillation observed at γ = 3.0). More points are included in the region of multistability in an effort to capture the hysteresis phenomena accurately. Training was performed using a conjugate gradient algorithm with frequent restarts (see [4, 5] for a discussion). Convergence was achieved after approximately 300 network parameter updates.
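Time series of this kind (several trajectories, each at one value of γ, all sampled at the integrator step) must be turned into training vectors for the integrator-embedded network. One plausible bookkeeping, sketched here as an assumption rather than the authors' procedure, pairs each sample with its successor and attaches the value of γ as an extra input. The helper name `make_training_pairs` is hypothetical, and the text does not state whether the point counts above refer to samples or to such pairs; with this scheme a series of N samples yields N − 1 training vectors.

```python
import numpy as np

def make_training_pairs(series_by_gamma):
    """Stack sampled trajectories into (x_n, gamma, x_{n+1}) training vectors.

    `series_by_gamma` maps a parameter value to an array of shape
    (N, n_states) sampled at the integrator step h. Consecutive samples form
    one training vector; gamma is carried along as an extra network input."""
    xs, gammas, targets = [], [], []
    for gamma, traj in series_by_gamma.items():
        traj = np.asarray(traj, dtype=float)
        xs.append(traj[:-1])                           # x_n
        targets.append(traj[1:])                       # x_{n+1}
        gammas.append(np.full(len(traj) - 1, gamma))   # parameter for each pair
    return np.vstack(xs), np.concatenate(gammas), np.vstack(targets)

# Placeholder trajectories of (theta, Pi) for two parameter values; not real data.
series = {2.9: np.zeros((251, 2)), 3.0: np.ones((11, 2))}
X, g, Y = make_training_pairs(series)
print(X.shape, g.shape, Y.shape)
```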
The subnetwork succeeds in capturing the basic form of the behavior of the rate of desorption with respect to θ and γ (surface coverage and temperature). Figure 4 compares the actual desorption rate (as a function of (θ, γ)) with the network predictions. More importantly, the dynamic behavior (including the infinite-time attractors) of the system of Eq. (5) also compares favorably with that of the original system, Eq. (4). Figure 5 shows the predicted bifurcation diagram using the form of the desorption rate given by the network. The network correctly predicts a subcritical Hopf bifurcation at low γ, as well as a supercritical Hopf bifurcation at higher values of γ (at a slightly lower value of γ than for the original system, Fig. 2).

Figure 5: Predicted bifurcation diagram for the single species surface reaction system with the gray-box neural network approximation of the desorption rate.
The neural network gray-box approximation can be used to extract important mechanistic information pertaining to the fitted step, and thus possibly to discriminate among rival candidate first-principles models. For example, Fig. 6 shows that the network predicts a linear dependence of the logarithm of the desorption rate on 1/γ at constant θ, in agreement with desorption being an activated process. Fig. 6 also shows that the predicted slopes of these plots (and thus the activation energies) vary linearly with θ, consistent with an assumption of attractive adsorbate-adsorbate interactions (as was indeed the case).
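The analysis behind Fig. 6 amounts to two straight-line fits: the logarithm of the rate against inverse temperature at each fixed coverage, and then the resulting slopes against coverage. A minimal sketch follows, with hypothetical helper names. The function `toy_rate` is a made-up activated rate used only so the fits can be checked; it is not the desorption term of Eq. (4), and the sign of its coverage dependence is arbitrary. In practice `rate` would be the trained subnetwork evaluated on a grid of (θ, γ).

```python
import numpy as np

def arrhenius_slopes(rate, thetas, gammas):
    """For each coverage theta, least-squares fit ln(rate) against 1/gamma.

    Returns slopes (minus the apparent activation energy, in the dimensionless
    units of gamma) and intercepts, one per theta."""
    inv_gamma = 1.0 / np.asarray(gammas, dtype=float)
    slopes, intercepts = [], []
    for theta in thetas:
        ln_rate = np.log([rate(theta, g) for g in gammas])
        slope, intercept = np.polyfit(inv_gamma, ln_rate, 1)
        slopes.append(slope)
        intercepts.append(intercept)
    return np.array(slopes), np.array(intercepts)

# Hypothetical stand-in for a fitted rate f(theta, gamma): an activated rate whose
# activation energy E(theta) = 5 - 3 theta depends linearly on coverage.
def toy_rate(theta, gamma):
    return 2.0 * theta * np.exp(-(5.0 - 3.0 * theta) / gamma)

thetas = np.linspace(0.1, 0.9, 5)
gammas = np.linspace(2.9, 3.9, 6)
slopes, _ = arrhenius_slopes(toy_rate, thetas, gammas)
# A straight line in slope vs theta is the signature discussed for Fig. 6.
print(np.polyfit(thetas, slopes, 1))   # approximately [3, -5] for this toy rate
```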

6 Summary
We have extended a previously developed black-box neural network methodology for the characterization of experimental systems as continuous-time models so as to allow the identification of unknown parts of first-principles models. Such modeling efforts incorporate the insight obtained from first-principles modeling (the algebraic forms of the ODEs describing the dynamical evolution of the system) in a neural network framework capable of approximating (after training) the unknown parts of the model.

Figure 6: The linear dependence of the natural logarithm of the desorption rate on 1/γ at constant θ is correctly captured by the neural network gray-box approximation (left); furthermore, the predicted slope of the lines varies linearly with θ (right), in agreement with the assumption of adsorbate interactions used to generate the training data.


The capabilities of this gray-box approach were illustrated using a single species surface reaction system. In this illustration we assumed that the expression for the rate of desorption of the reactant is not known and approximated it with a neural network. Both the short- and long-term dynamic behavior of the system are well approximated by the hybrid model resulting from training. Furthermore, a study of the properties of the fitted desorption rate may yield insight into the physical mechanisms underlying it, and thus possibly assist in discriminating among rival first-principles models.
Discrete-time models (based on neural networks) are trained to predict the result of integrating the model equations over some time period. It is difficult to "unravel" the contribution of the known parts of the model to this result from the contribution of the unknown terms. When, on the other hand, the equations themselves are approximated (as opposed to the result of integrating them), the procedure naturally lends itself to incorporating into the gray-box model those processes whose modeling is well established.
The type of overall network presented here (with some parts of its architecture available for training, and other parts fixed by either the known parts of the model or the integrator scheme) may prove to be a valuable tool for understanding the dynamics of experimental systems. The particular choice of recurrent nets templated on implicit integrators presented here is motivated by the anticipated stiffness of chemical kinetic equations. Feedforward implementations based on explicit integrators are also possible. We are currently working on variants of training algorithms for recurrent nets and their implementation on parallel computers.

References
[1] A. S. Lapedes and R. M. Farber. Nonlinear signal processing using neural networks: Prediction and system modeling. Los Alamos Report LA-UR-87-3663 (1987).
[2] A. S. Weigend and N. A. Gershenfeld. Time Series Prediction: Forecasting the Future and Understanding the Past. Addison-Wesley (1993).
[3] R. Rico-Martínez, K. Krischer, I. G. Kevrekidis, M. C. Kube and J. L. Hudson. Discrete- vs. continuous-time nonlinear signal processing of Cu electrodissolution data. Chem. Eng. Comm., vol. 118, pp. 25-48 (1992).
[4] R. Rico-Martínez. Neural networks for the characterization of nonlinear deterministic systems. Ph.D. Thesis, Department of Chemical Engineering, Princeton University (1994).
[5] R. Rico-Martínez and I. G. Kevrekidis. Continuous-time modeling of nonlinear systems: A neural network approach. Proc. 1993 IEEE Int. Conf. Neural Networks, IEEE Publications, vol. III, pp. 1522-1525 (1993).
[6] S. R. Chu and R. Shoureshi. A neural network approach for identification of continuous-time nonlinear dynamic systems. Proc. of the 1991 ACC, vol. 1, pp. 1-5 (1991).
[7] F. J. Pineda. Generalization of back-propagation to recurrent neural networks. Phys. Rev. Letters, vol. 59, pp. 2229-2232 (1987).
[8] L. B. Almeida. A learning rule for asynchronous perceptrons with feedback in a combinatorial environment. Proc. IEEE 1st Ann. Int. Conf. Neural Networks, San Diego, CA, pp. 609-618 (1987).
[9] I. Kevrekidis, L. D. Schmidt and R. Aris. Rate multiplicity and oscillations in single species surface reactions. Surf. Sci., vol. 137, pp. 151-166 (1984).
