Wavelet Networks For Nonlinear System Modeling


Neural Comput & Applic (2007) 16:433–441

DOI 10.1007/s00521-006-0069-3

ORIGINAL ARTICLE

Wavelet networks for nonlinear system modeling


Seda Postalcioglu · Yasar Becerikli

Received: 5 November 2005 / Accepted: 31 August 2006 / Published online: 27 September 2006
© Springer-Verlag London Limited 2006

Abstract This study presents nonlinear system and function learning using wavelet networks. Wavelet networks resemble neural networks in training and structural approach, but their training algorithms require fewer iterations than those of neural networks. A Gaussian-based mother wavelet function is used as the activation function. Wavelet networks have three main parameters: dilation, translation, and connection parameters (weights). Initial values of these parameters are randomly selected and then optimized during the training (learning) phase. However, because wavelet functions vanish rapidly, random selection of all initial values may not be suitable for process modeling. For this reason a heuristic procedure has been used. In this study the serial-parallel identification model has been applied to system modeling. This structure does not use feedback. Real system outputs are used to predict future system outputs, so that stability and approximation of the network are guaranteed. Gradient methods with a momentum term have been applied for parameter updating. A quadratic cost function is used for error minimization. Three example problems have been examined in simulation: static nonlinear functions and a discrete dynamic nonlinear system.

Keywords Wavelet networks · Training · Dynamic system modeling · Wavelet

S. Postalcioglu
Department of Electronic and Computer Education, Technical Education Faculty, Kocaeli University, Kocaeli, Turkey
e-mail: [email protected]

Y. Becerikli (corresponding author)
Department of Computer Engineering, Engineering Faculty, Kocaeli University, Kocaeli, Turkey
e-mail: [email protected]; [email protected]

Y. Becerikli
Department of Computer Engineering, and Electronics and Telecommunication Engineering, Halic University, Istanbul, Turkey

1 Introduction

This study shows nonlinear static function and dynamic system modeling using a wavelet network. Recently, wavelet networks have been used as an alternative to artificial neural networks because a model built with neural networks is hard to interpret [20]. In addition, training algorithms for wavelet networks require fewer iterations than those for neural networks [3]. The wavelet network is an approach to system identification in which nonlinear functions are approximated as the superposition of dilated and translated versions of a single function [3, 19, 20]. Besides the neural network there is another approximation method, wavelet decomposition. In wavelet decomposition only the weights are identified, while the dilations and translations follow a regular grid structure. In contrast, in the wavelet network the weights, dilations and translations are jointly fitted from data [20]. A wavelet network uses a wavelet as its activation function. The structure of a wavelet network is shown in Fig. 1 [18].

Fig. 1 The structure of a wavelet network (inputs x1, x2, ..., xn; hidden layer of wavelet nodes; output f(x))

The architecture of a wavelet network is exactly specified by the number of wavelets required for a given classification or regression application. The
optimal wavelet network structure achieves the best approximation and prediction capability [17]. Wavelets show local characteristics in both space and spatial frequency, which is their main property [14]. Therefore, over the whole input range of the network, a hidden node with a wavelet function influences the network output only in some local range. This can prevent interaction between the nodes and assists the training process and generalization performance. The radial basis function is also local, but it does not have the spatial–spectral (time–frequency) zooming property of the wavelet function, and therefore cannot represent the local spatial–spectral characteristics of the function. So, for approximation and forecasting the wavelet network should perform better than the traditional neural network [5, 15]. Families of wavelet functions, especially wavelet frames, are universal approximators in the identification of nonlinear systems. Wavelet networks have been used for both static [12, 20] and dynamic modeling [10, 15]. System modeling is realized in three steps: first, the input variables are determined; second, the network structure and initial weight values are decided; finally, the training procedure is carried out. The parameters of wavelet networks are the dilations (d), translations (m), bias $a_0$, and weights (a, c). The parameters are optimized during the learning phase using a gradient method. The continuous dynamic version of wavelet networks, called "dynamic wavelet networks" (DWN), is another approach to modeling and control of systems using wavelets [1, 2].

2 Structure of wavelet network

The wavelet network based on the discrete wavelet transform was first proposed by Zhang. A discrete wavelet is formed by fixing two positive constants ($a_0$ and $b_0$) and defining it as shown in (1) [7].

$$\psi_{m,n}(t) = a_0^{-m/2}\,\psi\left(a_0^{-m} t - n b_0\right) \qquad (1)$$

where $m, n \in Z$. Then, for $f \in L^2(R)$, (2) is obtained [7].

$$f(t) = \sum_{m,n=-\infty}^{\infty} \langle f(t), \psi_{m,n}(t)\rangle\, \psi_{m,n}(t) \qquad (2)$$

For computational efficiency, $a_0 = 2$ and $b_0 = 1$ are commonly used, which leads to a binary dilation of $2^{-m}$ and a dyadic translation of $n2^m$. Therefore, a practical sampling lattice is $a = 2^m$ and $b = n2^m$, so that (3) is obtained.

$$\psi_{m,n}(t) = 2^{-m/2}\,\psi\left(2^{-m} t - n\right) \qquad (3)$$

Another scheme for decomposing $f(t) \in L^2(R)$ uses the father wavelet, or scaling function, as given in (4).

$$f(t) = \sum_{n} \langle f(t), \Phi_{M,n}(t)\rangle\, \Phi_{M,n}(t) + \sum_{m>M,\,n} \langle f(t), \Phi_{m,n}(t)\rangle\, \Phi_{m,n}(t) \qquad (4)$$

where $\Phi(t)$ is the scaling function, which is related to the mother wavelet by (5) and (6).

$$\Phi(t) = \sqrt{2} \sum_{k} h_k\, \psi(2t - k) \qquad (5)$$

$$\psi(t) = \sqrt{2} \sum_{k} g_k\, \Phi(2t - k) \qquad (6)$$

The second relation is of particular importance because it demonstrates that f(t) can be approximated arbitrarily closely by selecting a sufficiently large dilation parameter M: for any $\varepsilon > 0$,

$$\left\| f(t) - \sum_{k} \langle f(t), \Phi_{M,k}\rangle\, \Phi_{M,k} \right\| < \varepsilon \qquad (7)$$

for M large enough. The approximation by the truncated wavelet decomposition can be expressed by (8) [7].

$$f(t) \approx \sum_{k} \langle f, \Phi_{M,k}\rangle\, \Phi_{M,k} = \sum_{k} c_n\, \Phi_{M,k} \qquad (8)$$
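As a small numerical illustration of the dyadic lattice in (3): the factor $2^{-m/2}$ keeps the $L^2$ norm of every $\psi_{m,n}$ equal to that of the mother wavelet. The sketch below checks this numerically. It is only a sketch under stated assumptions: the mother wavelet is taken to be the first derivative of a Gaussian (the one this paper adopts in Sect. 2.1), and the integration interval, step count and function names are illustrative choices, not part of the cited construction.

```python
import math

def mother_wavelet(t):
    """Assumed mother wavelet: first derivative of a Gaussian, -t * exp(-t^2 / 2)."""
    return -t * math.exp(-0.5 * t * t)

def dyadic_wavelet(t, m, n):
    """Eq. (3): psi_{m,n}(t) = 2^(-m/2) * psi(2^(-m) * t - n)."""
    return 2.0 ** (-m / 2.0) * mother_wavelet(2.0 ** (-m) * t - n)

def l2_norm(func, lo, hi, steps=200000):
    """Approximate the L2 norm of func on [lo, hi] with a midpoint Riemann sum."""
    dt = (hi - lo) / steps
    total = sum(func(lo + (i + 0.5) * dt) ** 2 for i in range(steps))
    return math.sqrt(total * dt)

# Norm of the mother wavelet and of one dilated, translated copy (m = 3, n = 2).
base_norm = l2_norm(mother_wavelet, -200.0, 200.0)
scaled_norm = l2_norm(lambda t: dyadic_wavelet(t, m=3, n=2), -200.0, 200.0)
print(base_norm, scaled_norm)
```

The two printed values should agree closely, which is the reason the normalizing factor appears in (1) and (3).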


The approximation in (8) can be interpreted as follows: the fine components of the function, which belong to the wavelet space $W_m$, are neglected, and the coarse components, which belong to the scaling space $V_m$, are preserved to approximate the original function in M scales. Figure 2 shows the wavelet network structure. In the structure of a typical multilayer feed-forward network, if $\Phi$ is used as the nonlinear approximation function of the hidden units together with the connection weights ($c_n$), then the approximation in (8) can be implemented.

Wavelet functions fall into two categories: orthogonal wavelets and wavelet frames. Wavelet frames are used for function approximation and process modeling because orthogonal wavelets cannot be expressed in closed form [9]. Wavelet frames are constructed from a mother wavelet. A wavelet $\Phi_j(x)$ is derived from its mother wavelet $\phi(z)$:

$$\Phi_j(x) = \prod_{k=1}^{N_i} \phi(z_{jk}) \qquad (9)$$

$$z_{jk} = \frac{x_k - m_{jk}}{d_{jk}} \qquad (10)$$

$N_i$ is the number of inputs and $N_w$ is the number of wavelets in the wavelet layer. The network output y is computed as:

$$y = \Psi(x) = \sum_{j=1}^{N_w} c_j\, \Phi_j(x) + a_0 + \sum_{k=1}^{N_i} a_k x_k \qquad (11)$$

$a_0$, a and c are adjustable parameters of the wavelet network.

Fig. 2 Wavelet network structure (input layer $x_1, \ldots, x_k$ with bias input 1 and direct connections $a_0, a_1, \ldots, a_k$; wavelet layer $\psi_{jk}$ with translations $m_{jk}$ and dilations $d_{jk}$; products $\Phi_j$; output layer summing with weights $c_j$ to give y)

2.1 Mother wavelet function

The wavelets are the family of signals produced by the translations and dilations of a mother wavelet satisfying the admissibility condition. This condition is given in (12) [11].

$$c_\psi = \int_0^{\infty} \left|\hat{\psi}(\omega)\right|^2 \frac{d\omega}{|\omega|} \qquad (12)$$

The term mother wavelet reflects two important properties of wavelet analysis. The term wavelet means a small wave. The term mother implies that the functions with different regions of support used in the transformation process are all derived from the mother wavelet. In other words, the mother wavelet is a prototype for generating the other windowing functions [13]. The main idea of wavelet theory consists of representing an arbitrary signal f(x) by means of a family of functions that are scaled and translated versions of a single main function known as the mother wavelet [4]. The relationship between these functions is represented by (13).

$$\psi_{m,d}(x) = \frac{1}{\sqrt{|d|}}\, \psi\left(\frac{x - m}{d}\right), \quad m, d \in R. \qquad (13)$$

The mother wavelet function gives an efficient and useful description of the signal of interest [16]. This function has some general properties [10]. It is more convenient in practice to use a redundant wavelet family than an orthonormal wavelet basis for constructing the wavelet network, because admitting redundancy allows us to construct wavelet functions with a simple analytical form and good spatial–spectral localization properties [5]. In this paper, a nonorthogonal wavelet, the first derivative of a Gaussian function, has been used as the mother wavelet, as shown in (14).

$$\phi(x) = -x\, e^{-\frac{1}{2}x^2}. \qquad (14)$$

3 Wavelet network structures for nonlinear system modeling

The system identification problem can be described by three groups of models [8]. The first is the parallel identification model. This model does not guarantee convergence of the parameters, because of the delayed output feedback of the dynamic model. This structure is shown in Fig. 3. The network is optimized using the corresponding optimization algorithm.
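Whichever identification structure is used, the wavelet network itself is evaluated with (9)–(11) and the mother wavelet (14). A minimal sketch of that forward computation is given below. The function names, the nested-list layout of the parameters and the numerical values in the usage lines are assumptions made purely for illustration.

```python
import math

def mother_wavelet(z):
    """Eq. (14): first derivative of a Gaussian, -z * exp(-z^2 / 2)."""
    return -z * math.exp(-0.5 * z * z)

def multidim_wavelet(x, m_j, d_j):
    """Eqs. (9)-(10): product over the inputs of phi((x_k - m_jk) / d_jk)."""
    value = 1.0
    for x_k, m_jk, d_jk in zip(x, m_j, d_j):
        value *= mother_wavelet((x_k - m_jk) / d_jk)
    return value

def network_output(x, m, d, c, a0, a):
    """Eq. (11): y = sum_j c_j * Phi_j(x) + a0 + sum_k a_k * x_k."""
    wavelet_part = sum(
        c_j * multidim_wavelet(x, m_j, d_j) for c_j, m_j, d_j in zip(c, m, d)
    )
    linear_part = a0 + sum(a_k * x_k for a_k, x_k in zip(a, x))
    return wavelet_part + linear_part

# Illustrative call: two inputs (Ni = 2) and a single wavelet (Nw = 1).
y = network_output(
    [1.0, 2.0], m=[[0.0, 1.0]], d=[[1.0, 1.0]], c=[2.0], a0=0.5, a=[0.1, 0.2]
)
print(y)
```

In this call both normalized inputs equal 1, so the wavelet term is $2\,\phi(1)^2 = 2e^{-1}$ and the bias plus direct connections contribute 1.0.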


Fig. 3 Parallel identification model

The second is the inverse system identification model, shown in Fig. 4. This structure can be used for direct inverse control.

Fig. 4 Inverse system identification model

The last one is the serial-parallel identification model, shown in Fig. 5. This structure does not use dynamic feedback; it uses process outputs to estimate the following outputs. So, compared with the other structures, this structure guarantees stability and convergence.

Fig. 5 Serial-parallel identification model

The learning is based on the stochastic gradient algorithm, which relies on the gradient in (15).

$$\frac{\partial J}{\partial \theta} = -\sum_{n=1}^{N} e^n\, \frac{\partial y^n}{\partial \theta}. \qquad (15)$$

Off-line gradient or Newton methods may be used, but computing the gradient and the Hessian matrix is computationally very expensive [20]. So we preferred to minimize the criterion recursively using input–output data. This algorithm updates the parameter vector $\theta$ after each measurement.

4 Static system modeling using wavelet network

The main purpose is to update the parameters during the training phase. $\theta$ is the set of adjustable parameters:

$$\theta = \{m_{jk}, d_{jk}, c_j, a_k\}, \quad j = 1, \ldots, N_w;\; k = 0, \ldots, N_i.$$

$\theta$ is to be estimated by training so that (11) approximates the unknown function $f(\cdot)$ on the domain defined by the training set. The static behavior of the process can be defined as $y_p = f(x)$, where $f(\cdot)$ is an unknown nonlinear function. The training procedure is based on the minimization of the quadratic cost function shown in (16).

$$J(\theta) = \frac{1}{2} \sum_{n=1}^{N} \left(y_p^n - y^n\right)^2. \qquad (16)$$

Figure 6 shows the static wavelet network architecture for $N_w = 1$, where y is the network output. The minimization is performed by iterative gradient-based methods. The partial derivatives needed for the gradient of the cost function w.r.t. $\theta$ are as follows.

Fig. 6 Static wavelet network architecture for $N_w = 1$

w.r.t. parameter $a_0$:

$$\frac{\partial y}{\partial a_0} = 1$$

w.r.t. direct connection parameters:

$$\frac{\partial y}{\partial a_k} = x_k, \quad k = 1 \text{ to } N_i$$

w.r.t. weights:

$$\frac{\partial y}{\partial c_j} = \Phi_j(x), \quad j = 1 \text{ to } N_w$$

w.r.t. translations:

$$\frac{\partial y}{\partial m_{jk}} = \frac{\partial y}{\partial \Phi_j}\, \frac{\partial \Phi_j}{\partial \psi_{jk}}\, \frac{\partial \psi_{jk}}{\partial z_{jk}}\, \frac{\partial z_{jk}}{\partial m_{jk}}$$

$$\frac{\partial y}{\partial \Phi_j} = c_j$$

$$\frac{\partial \Phi_j}{\partial \psi_{jk}} = \psi(z_{j1})\,\psi(z_{j2}) \cdots \psi'(z_{jk}) \cdots \psi(z_{jN_i}), \quad k = 1, \ldots, N_i;\; j = 1, \ldots, N_w$$

$$\frac{\partial \psi_{jk}}{\partial z_{jk}} = -e^{-z_{jk}^2/2} + z_{jk}^2\, e^{-z_{jk}^2/2} = z_{jk}\, e^{-z_{jk}^2/2} \left(z_{jk} - \frac{1}{z_{jk}}\right)$$

$$\frac{\partial z_{jk}}{\partial m_{jk}} = -\frac{1}{d_{jk}}$$

w.r.t. dilations:

$$\frac{\partial y}{\partial d_{jk}} = \frac{\partial y}{\partial \Phi_j}\, \frac{\partial \Phi_j}{\partial \psi_{jk}}\, \frac{\partial \psi_{jk}}{\partial z_{jk}}\, \frac{\partial z_{jk}}{\partial d_{jk}}$$

$$\frac{\partial z_{jk}}{\partial d_{jk}} = -\frac{x_k - m_{jk}}{d_{jk}^2}$$

Equations (17) and (18) have been used for parameter updating.

$$\theta_{t+1} = \theta_t + \mu \left(-\frac{\partial J}{\partial \theta}\right) \qquad (17)$$

$\theta_t$ denotes the current parameters, $\theta_{t+1}$ the updated parameters, and $\mu$ the learning rate. In this study a momentum term, which increases the learning speed, has been used:

$$\theta_{t+1} = \theta_t + \mu \left(-\frac{\partial J}{\partial \theta}\right) + \alpha\left[\theta_t - \theta_{t-1}\right] \qquad (18)$$

where $\alpha$ is the momentum coefficient, selected between 0 and 1.

4.1 Initialization of the network parameters

Selecting the initial values of the wavelet network parameters is important because the initial values affect the speed of training and whether the network approaches the global or a local minimum. Weights are updated according to the derivative of the activation function at the selected initial value, so the functions and their derivatives must not be zero there. Initial values are significant for wavelet networks because wavelets have localization properties [9], and wavelets may leave the domain of interest because of their initial values. On the other hand, initial values may make some wavelets too local and make the components of the gradient of the cost function very small in areas of interest. So selecting the initial values of dilation ($d_{jk}$) and translation ($m_{jk}$) randomly may not be suitable for process modeling. In this study, the translation vector m of wavelet j is initialized at the center of the parallelepiped defined by the $N_i$ intervals $\{[a_k, b_k]\}$, and the dilation parameters are initialized to the values shown in (19) [10].

$$d_{jk} = 0.2\,(b_k - a_k) \quad \text{and} \quad m_{jk} = \frac{1}{2}(a_k + b_k). \qquad (19)$$
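The derivatives above, the momentum update (18) and the initialization (19) can be combined into a short training sketch. This is a minimal illustration under stated assumptions rather than the authors' implementation: the flat parameter ordering, the function names, and the default learning rate, momentum and epoch count are all hypothetical. It differentiates $\Phi_j$ with respect to $z_{jk}$ directly, that is, $\psi'(z_{jk})$ times the remaining factors of the product, and applies one update per sample using the per-sample error $e = y_p - y$.

```python
import math
import random

def psi(z):
    """Mother wavelet (14): first derivative of a Gaussian."""
    return -z * math.exp(-0.5 * z * z)

def dpsi(z):
    """Derivative of psi: (z^2 - 1) * exp(-z^2 / 2)."""
    return (z * z - 1.0) * math.exp(-0.5 * z * z)

def init_theta(bounds, n_w, rng):
    """Flat parameter list [a0, a_1..a_Ni, c_1..c_Nw, m_11.., d_11..]."""
    n_in = len(bounds)
    weights = [rng.random() for _ in range(1 + n_in + n_w)]  # a0, a, c in [0, 1)
    m = [0.5 * (lo + hi) for _ in range(n_w) for lo, hi in bounds]  # eq. (19)
    d = [0.2 * (hi - lo) for _ in range(n_w) for lo, hi in bounds]  # eq. (19)
    return weights + m + d

def output_and_gradient(theta, x, n_w):
    """Return the network output y of (11) and dy/dtheta in the order of theta."""
    n_in = len(x)
    a0, a = theta[0], theta[1:1 + n_in]
    c = theta[1 + n_in:1 + n_in + n_w]
    m_off = 1 + n_in + n_w
    d_off = m_off + n_w * n_in
    y = a0 + sum(a_k * x_k for a_k, x_k in zip(a, x))
    grad_c, grad_m, grad_d = [], [], []
    for j in range(n_w):
        m_j = theta[m_off + j * n_in:m_off + (j + 1) * n_in]
        d_j = theta[d_off + j * n_in:d_off + (j + 1) * n_in]
        z = [(x[k] - m_j[k]) / d_j[k] for k in range(n_in)]
        p = [psi(z_k) for z_k in z]
        phi_j = math.prod(p)
        y += c[j] * phi_j
        grad_c.append(phi_j)
        for k in range(n_in):
            # dPhi_j/dz_jk: psi'(z_jk) times the remaining factors of the product.
            dphi_dz = dpsi(z[k]) * math.prod(p[:k] + p[k + 1:])
            grad_m.append(c[j] * dphi_dz * (-1.0 / d_j[k]))   # dz/dm = -1/d
            grad_d.append(c[j] * dphi_dz * (-z[k] / d_j[k]))  # dz/dd = -(x-m)/d^2
    return y, [1.0] + list(x) + grad_c + grad_m + grad_d

def train(samples, theta, n_w, mu=0.01, alpha=0.5, epochs=100):
    """Stochastic gradient with momentum; theta is updated in place."""
    step = [0.0] * len(theta)
    for _ in range(epochs):
        for x, y_p in samples:
            y, dy = output_and_gradient(theta, x, n_w)
            e = y_p - y  # per-sample error, so dJ/dtheta = -e * dy/dtheta
            for i in range(len(theta)):
                step[i] = mu * e * dy[i] + alpha * step[i]  # eq. (18)
                theta[i] += step[i]
    return theta

# Illustrative call: two inputs on [-1, 1] x [-1, 1], two wavelets.
theta = init_theta([(-1.0, 1.0), (-1.0, 1.0)], n_w=2, rng=random.Random(0))
y, dy = output_and_gradient(theta, [0.3, -0.4], n_w=2)
print(y, len(dy))
```

Storing the previous step is equivalent to the term $\alpha[\theta_t - \theta_{t-1}]$ in (18), since that difference is exactly the last update applied. A stopping rule is deliberately left out here; a fixed epoch count stands in for it.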


Equation (19) guarantees that the wavelets initially extend over the whole input domain. The choice of the weights (a, c) is less critical for the wavelet network. They are initialized in the interval between 0 and 1.

4.2 Stopping conditions for training

The parameters of the wavelet network are trained during the learning phase to approximate the target function. Gradient methods have been applied to the adjustable parameters. Training is stopped when the variation of the gradient and of the parameters reaches a lower bound or the number of iterations reaches a fixed maximum.

5 Dynamic system modeling using wavelet network

Dynamic system identification problems consist of three groups: parallel, serial-parallel and inverse system identification models. In this study, the serial-parallel identification model has been used, as shown in Fig. 7. This structure does not use feedback. Real system outputs are used to predict future system outputs, so that stability and approximation of the network are guaranteed.

Fig. 7 Dynamic system modeling with wavelet network

6 Simulations

6.1 Example 1

For the static system modeling, (20) has been used [20].

$$f(x) = \begin{cases} -2.186x - 12.864, & x \in [-10, -2), \\ 4.246x, & x \in [-2, 0), \\ 10\, e^{(-0.005x - 0.5)} \sin\big(x(0.03x + 0.7)\big), & x \in [0, 10] \end{cases} \qquad (20)$$

The system shows different characteristics for different input domains. The domain of the x data is transformed into [−1, 1]. The learning procedure is applied on this domain. The training sequence consists of 200 samples uniformly distributed in the interval of interest. Figure 8 shows the static process and model output.

These results were obtained using a wavelet network with ten neurons. The number of learning iterations is 3,000. The momentum coefficient was selected as 0.9. The training mean square error (TMSE) is 5.6699 × 10^-4. TMSE is computed using (21).

$$\mathrm{TMSE} = \frac{1}{N} \sum_{n=1}^{N} \left(y_p(n) - y_n\right)^2 \qquad (21)$$

where N is the number of inputs.

6.2 Example 2

This example shows the approximation of a function of two variables [6]. Equation (22) shows the function.

$$f(x_1, x_2) = (x_1^2 - x_2^2)\sin(0.5x_1), \quad -10 \le x_1, x_2 \le 10. \qquad (22)$$
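The two static benchmarks (20) and (22) and the error measure (21) are simple to evaluate directly. The sketch below does so; the function names, the linear map used to rescale data into [−1, 1] and the construction of the 200-point grid are assumptions for illustration, since the text only states that the data are transformed into [−1, 1] and uniformly distributed.

```python
import math

def example1(x):
    """Piecewise benchmark of eq. (20), defined for x in [-10, 10]."""
    if x < -2.0:
        return -2.186 * x - 12.864
    if x < 0.0:
        return 4.246 * x
    return 10.0 * math.exp(-0.005 * x - 0.5) * math.sin((0.03 * x + 0.7) * x)

def example2(x1, x2):
    """Two-variable benchmark of eq. (22), defined for -10 <= x1, x2 <= 10."""
    return (x1 * x1 - x2 * x2) * math.sin(0.5 * x1)

def to_unit_interval(v, lo, hi):
    """Assumed linear rescaling of v from [lo, hi] to [-1, 1]."""
    return 2.0 * (v - lo) / (hi - lo) - 1.0

def tmse(targets, outputs):
    """Eq. (21): mean of the squared differences between process and model outputs."""
    return sum((y_p - y) ** 2 for y_p, y in zip(targets, outputs)) / len(targets)

# 200 uniformly spaced training inputs for Example 1, rescaled to [-1, 1].
xs = [-10.0 + 20.0 * i / 199 for i in range(200)]
inputs = [to_unit_interval(x, -10.0, 10.0) for x in xs]
targets = [example1(x) for x in xs]
print(len(inputs), min(inputs), max(inputs))
```

The first two branches of (20) meet at x = −2 with the common value −8.492, so the benchmark is continuous there even though its slope changes abruptly.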


Fig. 8 Nonlinear static model and process outputs

Fig. 9 Nonlinear process output

The function output is mapped onto the domain [−1, 1] by normalization. The input sequence for training consists of random amplitudes in the range [−10, 10] × [−10, 10]. Figure 9 shows the process for training. The process was learned with 200 training samples. The wavelet network used in the test phase shown in Fig. 12 has seven wavelets. The number of learning iterations is 1,000. The momentum is selected as 0.07, and TMSE is 0.0011. Model and process outputs are shown in Fig. 10.

Fig. 10 Nonlinear model and process output

6.3 Example 3

For dynamic system modeling, a second-order nonlinear function is used. The input sequence is {u(n)} and the measured process output is {$y_p(n)$}. As in the static case, the aim is to fit $f(\cdot)$ by a wavelet network.

$$y_p(k) = f\big(y_p(k-1), y_p(k-2), \ldots, y_p(k-N_s), u(k-1), \ldots, u(k-N_e)\big)$$

$$y(k) = f\big(y_p(k-1), y_p(k-2), \ldots, y_p(k-N_s), u(k-1), \ldots, u(k-N_e)\big). \qquad (23)$$

The inputs in (23) are past outputs of the process ($y_p$) and the external inputs u. $f(\cdot)$ is an unknown nonlinear function, which is to be approximated by a wavelet network ($\Psi$). For the dynamic system modeling we use (24) [10].

$$y_p(k) = f\big(y_p(k-1), y_p(k-2), u(k-1)\big) = \frac{24 + y_p(k-1)}{30}\, y_p(k-1) - 0.8\, \frac{u(k-1)^2}{1 + u(k-1)^2}\, y_p(k-2) + 0.5\, u(k-1) \qquad (24)$$

$$y(k) = \Psi\big(y(k-1), y(k-2), u(k-1), \theta\big).$$

Fig. 11 Input sequence for system used in Sect. 6.3

The input and output sequence for training consists of pulses with random amplitude in the range [−5, 5] and
with random duration between 1 and 20 sampling periods (Figs. 11, 12, respectively).

Fig. 12 Output sequence for system used in Sect. 6.3

The simulation result is shown in Fig. 13. The wavelet network used in the test phase shown in Fig. 13 has five wavelets. The number of learning iterations is 4,000. The momentum was selected as 0.3. TMSE is 3.02 × 10^-2.

Fig. 13 Nonlinear dynamic model and process output

7 Conclusions

Wavelet networks are an alternative to neural networks for nonlinear function learning. Wavelets show local characteristics, so the initial values of the translations and dilations require more care than the initial values of the weights. If the initial values of the translations and dilations are selected properly, the training time is shorter than for neural networks. The architecture of a wavelet network is exactly specified by the number of wavelets required for a given classification or regression application. The optimal wavelet network structure achieves the best approximation and prediction capability. This study has presented how wavelet networks can be used for system modeling and how to train this kind of network with appropriate initialization of the translation and dilation parameters. The serial-parallel identification model has been used to learn nonlinear functions. This structure does not use feedback. Real system outputs have been used to predict future system outputs, so that stability and approximation of the network are guaranteed. A quadratic cost function has been used for error minimization. A Gaussian-based mother wavelet function has been used because it is generally preferred in studies of this kind and has universal approximation properties. As examples, nonlinear static and dynamic functions have been examined using wavelet networks for function learning.

References

1. Becerikli Y (2004) On three intelligent systems: dynamic neural, fuzzy and wavelet networks for training trajectory. Neural Comput Appl 13(4):339–351
2. Becerikli Y, Oysal Y, Konar AF (2003) On a dynamic wavelet network and its modeling application. Lect Notes Comput Sci (LNCS) 2714:710–718
3. Galvao KH, Becerra VM (2002) Linear-wavelet models for system identification. IFAC 15th Triennial World Congress, Barcelona, Spain
4. Gutés A, Céspedes F, Cartas R, Alegret S, del Valle M, Gutierrez JM, Muñoz R (2006) Multivariate calibration model from overlapping voltammetric signals employing wavelet neural networks. Chemometrics Intell Lab Syst 83(2):169–179
5. He Y, Chu F, Zhong B (2002) A hierarchical evolutionary algorithm for constructing and training wavelet networks. Neural Comput Appl 10:357–366
6. Ho DWC, Zhang P-A, Xu J (2001) Fuzzy wavelet networks for function learning. IEEE Transact Fuzzy Syst 9(1):200–211
7. Lin Y, Wang F-Y (2005) Modular structure of fuzzy system modeling using wavelet networks. In: IEEE, networking, sensing and control proceedings, Tucson, Arizona, USA, 19–22 March 2005, pp 671–676
8. Narendra KS, Parthasaraty K (1990) Identification and control of dynamical system using neural networks. IEEE Transact Neural Netw 1(1):4–27
9. Oussar Y, Dreyfus G (2000) Initialization by selection for wavelet network training. Neurocomputing 34:131–143
10. Oussar Y, Rivals I, Personnaz L, Dreyfus G (1998) Training wavelet networks for nonlinear dynamic input output modeling. Neurocomputing 20:173–188
11. Özkurt N, Savacı FA (2006) The implementation of nonlinear dynamical systems with wavelet network. Int J Electron Commun (AEÜ) 60:338–344
12. Pati YC, Krishnaprasad PS (1993) Analysis and synthesis of feedforward neural networks using discrete affine wavelet transformations. IEEE Trans Neural Netw 4(1):73–85
13. Polikar R (January 12, 2001) The wavelet tutorial. http://engineering.rowan.edu/~polikar/WAVELETS
14. Polycarpou M, Mears M, Weaver S (1997) Adaptive wavelet control of nonlinear systems. In: Proceedings of the 1997 IEEE conference on decision and control, pp 3890–3895


15. Postalcıoğlu S, Erkan K, Bolat DE (2005) Comparison of wavenet and neuralnet for system modeling. Lect Notes Artif Intell 3682:100–107
16. Reza AM (October 19, 1999) Wavelet characteristics. White Paper, Spire Lab., UWM
17. Shi D, Chen F, Ng GS, Gao J (2006) The construction of wavelet network for speech signal processing. Neural Comput Appl 11(34):217–222
18. Thuillard M (2000) Review of wavelet networks, wavenets, fuzzy wavenets and their applications. ESIT 2000, Aachen, Germany, 14–15 September 2000
19. Zhang Q (1997) Using wavelet network in nonparametric estimation. IEEE Trans Neural Netw 8(2):227–236
20. Zhang Q, Benveniste A (1992) Wavelet networks. IEEE Trans Neural Netw 3(6):889–898
