
Neural Networks 18 (2005) 985–997

www.elsevier.com/locate/neunet

Wavelet neural network classification of EEG signals by using AR model with MLE preprocessing
Abdulhamit Subasi (a,*), Ahmet Alkan (a), Etem Koklukaya (b), M. Kemal Kiymik (a)
(a) Department of Electrical and Electronics Engineering, Kahramanmaras Sutcu Imam University, Karacasu Kampusu, 46601 Kahramanmaras, Turkey
(b) Department of Electrical and Electronics Engineering, Sakarya University, 54187 Sakarya, Turkey
Received 28 January 2003; accepted 10 January 2005

Abstract
Since the EEG is one of the most important sources of information in the therapy of epilepsy, several researchers have tried to address the issue of decision support for such data. In this paper, we introduce two fundamentally different approaches for designing classification models (classifiers): the traditional statistical method based on logistic regression, and the emerging, computationally powerful techniques based on artificial neural networks (ANNs). Logistic regression, feedforward error backpropagation artificial neural network (FEBANN), and wavelet neural network (WNN) based classifiers were developed and compared with respect to their accuracy in the classification of EEG signals. In these methods we used the FFT and the autoregressive (AR) model with maximum likelihood estimation (MLE) of EEG signals as input to a classification system with two discrete outputs: epileptic seizure or non-epileptic seizure. By identifying features in the signal, we aim to provide an automatic system that will support a physician in the diagnostic process. By applying AR with MLE in connection with the WNN, we obtained a novel and reliable classifier architecture. The network is constructed as an error backpropagation neural network using the Morlet mother wavelet basis function as the node activation function. The comparisons between the developed classifiers were primarily based on analysis of the receiver operating characteristic (ROC) curves, as well as a number of scalar performance measures pertaining to the classification. The WNN-based classifier outperformed both the FEBANN-based and the logistic regression-based classifiers.
© 2005 Elsevier Ltd. All rights reserved.
Keywords: EEG; Epileptic seizure; Fast Fourier transform (FFT); Autoregressive method (AR); Maximum likelihood estimation (MLE); Logistic regression (LR); Feedforward error backpropagation artificial neural network (FEBANN); Wavelet neural network (WNN)

1. Introduction
The human brain is obviously a complex system, and
exhibits rich spatiotemporal dynamics. Among the noninvasive techniques for probing human brain dynamics,
electroencephalography (EEG) provides a direct measure of
cortical activity with millisecond temporal resolution. Early
on, EEG analysis was restricted to visual inspection of EEG
records. Since there is no definite criterion evaluated by the
experts, visual analysis of EEG signals is insufficient. For
example, in the case of dominant alpha activity, delta and
theta activities are not noticed. Routine clinical diagnosis

* Corresponding author. Tel.: +90 535 739 51 91; fax: +90 344 219 10 52.
E-mail address: [email protected] (A. Subasi).

0893-6080/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.neunet.2005.01.006

requires analysis of EEG signals. Therefore, some automation and computer techniques have been used for this aim (Guler, Kiymik, Akin, & Alkan, 2001). Since the early days of automatic EEG processing, representations based on the Fourier transform have been most commonly applied. This approach is based on earlier observations that the EEG spectrum contains some characteristic waveforms that fall primarily within four frequency bands: delta (1–4 Hz), theta (4–8 Hz), alpha (8–13 Hz), and beta (13–30 Hz). Such methods have proved beneficial for various EEG characterizations, but the fast Fourier transform (FFT) suffers from large noise sensitivity. Parametric power spectrum estimation methods such as AR reduce the spectral loss problems and give better frequency resolution. The AR method also has an advantage over the FFT in that it needs shorter-duration data records (Zoubir & Boashash, 1998).
Neural network detection systems have been proposed by
a number of researchers including Pradhan, Sadasivan, and
Arunodaya (1996) and Weng and Khorasani (1996).

Pradhan presented preliminary results for a classification of seizure activities by applying an artificial neural network (ANN) based on learning vector quantization. Pradhan
uses the raw EEG as input to a neural network while Weng
uses the features proposed by Gotman (1982) with an
adaptive structure neural network, but his results show a
poor false detection rate.
Osario, Frei, and Wilkinson (1998) have applied a wavelet
transform to ECoG recordings. The wavelet scale used
corresponds approximately to a 5–40 Hz band-pass filter.
The output is squared, median filtered and finally compared
with a background measure. Qu and Gotman (1997) propose
the use of a nearest neighbour classifier on EEG features
extracted in both the time and frequency domains to detect
the onset of epileptic seizures. Five features are used to
characterize each 2.56 s epoch of EEG. In their paper Gabor,
Leach, and Dowla (1996) state their aim to detect 85% of
seizures with a false positive rate of 1 per hour or less using a
generic system. Webber's paper (Webber, Lesser,
Richardson, & Wilson, 1996) is the only one of the three
reviewed to use a standard multi-layer perceptron (MLP)
structure neural network to classify the input EEG feature
vector. Their network is a 31-30-8 structure with the 31 input
features being various statistical measures of each 2 s EEG
epoch. The input feature vectors were classified into eight
groups including small seizure, large seizure, and normal.
Tseng, Chen, Chong, and Kuo (1995) evaluated different
parametric models on a fairly large database of EEG
segments. Using inverse filtering, white noise tests, and 1-s
EEG segments, they found that autoregressive (AR) models
of orders between 2 and 32 yielded the best EEG estimation.
A method that avoids signal segmentation and provides on-line AR parameter estimation suited to nonstationary signals such as the EEG was proposed by Goto, Nakamura, and Uosaki (1995). In a problem of classifying EEGs of normal subjects
from those with psychiatric disorders, Tsoi, So, and Sergejew
(1994) used AR representations in a preprocessing stage and
artificial neural networks in the classification stage. Finding a
suitable representation of EEG signals is the key to learning a
reliable discrimination and to understanding the extracted
relationships (Anderson, Devulapalli, & Stolz, 1995).
Kalayci and Ozdamar (1995) showed that an ANN performs better if the input and output data are processed to capture the characteristic features of the signal. They used
a wavelet representation for automated detection of the EEG
spikes. More recently, ANNs that apply Bayesian methods have been shown to be more robust compared with other techniques because they incorporate measures of confidence in their output, via the Levenberg-Marquardt (LM) procedure
(Vuckovic, Radivojevic, Chen, & Popovic, 2002). In
addition, standard MLP was improved by using finite
impulse response filters (FIR) instead of static weights for a
temporal processing of data (Haselsteiner & Pfurtscheller,
2000). Petrosian, Prokhorov, Homan, Dashei, and Wunsch (2000) demonstrated the ability of specifically designed and trained recurrent neural networks (RNN), combined with wavelet pre-processing, to predict the onset of epileptic seizures in both scalp and intracranial recordings.
The theory of wavelets can be exploited in understanding
the universal approximation properties of WNNs, and in
providing initialization heuristics for fast training. WNNs
offer a good compromise between robust implementations
resulting from the redundancy characteristic of nonorthogonal wavelets and neural systems, and efficient functional
representations that build on the time-frequency localization
property of wavelets.
This paper aims to compare the traditional method of logistic regression to the more advanced and relatively recent neural network techniques as mathematical tools for developing classifiers for the detection of epileptic seizures. In the
neural network techniques, both the feedforward error backpropagation ANN (FEBANN), and the wavelet neural
network (WNN) will be used. The choice of these two
networks was based on the fact that the former is the most
popular type of ANNs and the latter is one of the most
powerful networks commonly used in solving classification/
discrimination problems. The accuracy of the various
classifiers will be assessed and cross-compared, and advantages and limitations of each technique will be discussed.

2. Materials and method


2.1. EEG data acquisition and representation
Data for evaluating the seizure classification models (classifiers) were acquired by analyzing EEG records belonging to several healthy persons and epileptic patients. The signals were collected by a data acquisition system consisting of a data acquisition card (PCI MIO-16-E type), signal processors, and a personal computer. Data can be transferred quickly into computer memory by using this card, which is connected to the PCI data bus of the computer. The MATLAB software package was used for this system, which provides real-time data processing. The power spectral density P(f) of the signal is found by applying conventional and modern spectral analysis methods such as the FFT and AR.
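The conventional (FFT-based) estimate can be sketched in a few lines. The paper's analysis programs were written in MATLAB and are not reproduced here; the following Python/NumPy fragment is an illustrative periodogram-style PSD estimate (function names and normalization are our own, not the paper's), shown for a test sine at the paper's 200 Hz sampling rate.

```python
import numpy as np

def periodogram_psd(x, fs):
    """Periodogram PSD estimate of a signal x sampled at fs Hz.

    A minimal sketch of the conventional FFT-based estimator; names and
    normalization here are illustrative, not taken from the paper.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    X = np.fft.rfft(x)
    psd = (np.abs(X) ** 2) / (n * fs)          # |X(f)|^2 scaled by N*fs
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)     # frequency axis in Hz
    return freqs, psd

# Example: a 3 Hz sine sampled at 200 Hz (the paper's sampling rate)
fs = 200
t = np.arange(0, 2, 1.0 / fs)
x = np.sin(2 * np.pi * 3 * t)
freqs, psd = periodogram_psd(x, fs)
peak = freqs[np.argmax(psd)]   # the spectral peak should sit near 3 Hz
```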
Scalp EEG signals are synchronous discharges from
cerebral neurons detected by electrodes attached to the
scalp. An epileptic seizure is an abnormality in EEG recordings, characterized by brief, episodic neuronal synchronous discharges with dramatically increased amplitude. This anomalous synchrony may occur locally in the brain (partial seizures), seen in only a few channels of the EEG signal, or involve the whole brain (generalized seizures), seen in every channel. Four channels of EEG (F7-C3, F8-C4, T5-O1 and T6-O2) recorded from a patient with an absence-seizure epileptic discharge are shown in Fig. 1, and a normal EEG signal is shown in Fig. 2.
EEG signals for both healthy and unhealthy cases were
recorded from subjects under relaxation, with their eyes
closed. The recording conditions followed Guideline seven


Fig. 1. Epileptic EEG signal (channels F8-C4, F7-C3, T6-O2, T5-O1; amplitude vs. number of samples).

of the American EEG Society, and electrodes were placed according to the International 10–20 system. The signals were digitized and transferred to the PC using a 12-bit A/D converter, with a storage sampling rate of 200 Hz and band-pass filtering between 0 and 70 Hz.
2.2. Autoregressive parameter estimation and MLE

In the AR model, the Levinson-Durbin algorithm, which makes use of the solution of the Yule-Walker equations, is used to find the model parameters. Autocorrelation estimates are used in the solution of these equations; from these autocorrelations, the AR model parameters are estimated. For this, the biased form of the autocorrelation estimate is used, given as

$$\hat{r}_{xx}(m) = \frac{1}{N}\sum_{n=0}^{N-m-1} x(n)\,x(n+m), \quad m \ge 0 \qquad (1)$$
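As a small illustration, Eq. (1) can be evaluated directly; this Python sketch (names are our own, not the paper's) computes the biased estimate for a toy sequence.

```python
import numpy as np

def biased_autocorr(x, max_lag):
    """Biased autocorrelation estimate of Eq. (1):
    r_xx(m) = (1/N) * sum_{n=0}^{N-m-1} x(n) x(n+m).
    Function and variable names are illustrative.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.dot(x[:n - m], x[m:]) / n for m in range(max_lag + 1)])

r = biased_autocorr([1.0, 2.0, 3.0, 4.0], max_lag=2)
# r[0] = (1+4+9+16)/4 = 7.5, r[1] = (2+6+12)/4 = 5.0, r[2] = (3+8)/4 = 2.75
```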

The aim now is to estimate the AR model parameters by using MLE in the solution of the Yule-Walker equations from a record of EEG data. If the maximum likelihood estimate of a parameter exists under regularity conditions, it is consistent, asymptotically unbiased, efficient, and normally distributed. Unfortunately, the maximum likelihood (ML) estimator is often too cumbersome to obtain. As this is the case for the EEG model, it is proposed to estimate the model parameters by maximizing an approximation of the log-likelihood function known as Whittle's approximation. The derived estimator is expected to retain the properties associated with the ML estimator in an asymptotic sense, but with much less complexity. In fact, Whittle's estimate asymptotically retains the properties of the ML estimate for Gaussian random processes, but this is not generally true for the non-Gaussian case (Zoubir & Boashash, 1998).
In many cases it is difficult to evaluate the MLE of the parameters of a Gaussian process described by its power spectral density function (PSDF), owing to the need to invert a large-dimension covariance matrix. For example, if $x \sim N(0, C(\theta))$, the MLE of $\theta$ is obtained by maximizing

$$p(x;\theta) = \frac{1}{(2\pi)^{N/2}\,\det^{1/2} C(\theta)}\,\exp\!\left(-\tfrac{1}{2}\,x^{T} C^{-1}(\theta)\,x\right) \qquad (2)$$

If the covariance matrix cannot be inverted in closed form, then a search technique will require inversion of the $N \times N$ matrix for each value of $\theta$ to be searched. An alternative approximate method can be applied when $x$ is data from a zero-mean random process, so that the covariance matrix is Toeplitz. In such a case, the asymptotic log-likelihood function is given by

$$\ln p(x;\theta) = -\frac{N}{2}\ln 2\pi - \frac{N}{2}\int_{-1/2}^{1/2}\left[\ln P_{xx}(f) + \frac{I(f)}{P_{xx}(f)}\right] df \qquad (3)$$


Fig. 2. Normal EEG signal (channels F8-C4, F7-C3, T6-O2, T5-O1; amplitude vs. number of samples).

where

$$I(f) = \frac{1}{N}\left|\sum_{n=0}^{N-1} x(n)\,e^{-j2\pi f n}\right|^{2}$$

is the periodogram of the data and $P_{xx}(f)$ is the power spectral density (PSD). The dependence of the log-likelihood function on $\theta$ is through the PSD. Differentiation of (3) produces the necessary conditions for the MLE:

$$\frac{\partial \ln p(x;\theta)}{\partial \theta_i} = -\frac{N}{2}\int_{-1/2}^{1/2}\left[\frac{1}{P_{xx}(f)} - \frac{I(f)}{P_{xx}^{2}(f)}\right]\frac{\partial P_{xx}(f)}{\partial \theta_i}\,df \qquad (4)$$

or

$$\int_{-1/2}^{1/2}\left[\frac{1}{P_{xx}(f)} - \frac{I(f)}{P_{xx}^{2}(f)}\right]\frac{\partial P_{xx}(f)}{\partial \theta_i}\,df = 0 \qquad (5)$$

The second derivative allows the Newton-Raphson or scoring method to be implemented using the asymptotic likelihood function. This leads to simpler iterative procedures and is commonly used in practice.

In this study, the asymptotic form of the log-likelihood given by (3) is used to find the MLE. Since the PSD is

$$P_{xx}(f) = \frac{\sigma_u^{2}}{|A(f)|^{2}} \qquad (6)$$

after some calculations and derivations the estimated autocorrelation function is

$$\hat{R}_{xx}(k) = \begin{cases} \dfrac{1}{N}\displaystyle\sum_{n=0}^{N-1-|k|} x(n)\,x(n+|k|), & |k| \le N-1 \\[4pt] 0, & |k| \ge N \end{cases} \qquad (7)$$

and the set of equations to be solved for the approximate MLE of the AR filter parameters becomes

$$\sum_{l=1}^{p} \hat{a}(l)\,\hat{R}_{xx}(k-l) = -\hat{R}_{xx}(k), \quad k = 1, 2, \ldots, p \qquad (8)$$

or in matrix form

$$\begin{bmatrix} \hat{R}_{xx}(0) & \hat{R}_{xx}(1) & \cdots & \hat{R}_{xx}(p-1) \\ \hat{R}_{xx}(1) & \hat{R}_{xx}(0) & \cdots & \hat{R}_{xx}(p-2) \\ \vdots & \vdots & \ddots & \vdots \\ \hat{R}_{xx}(p-1) & \hat{R}_{xx}(p-2) & \cdots & \hat{R}_{xx}(0) \end{bmatrix} \begin{bmatrix} \hat{a}(1) \\ \hat{a}(2) \\ \vdots \\ \hat{a}(p) \end{bmatrix} = -\begin{bmatrix} \hat{R}_{xx}(1) \\ \hat{R}_{xx}(2) \\ \vdots \\ \hat{R}_{xx}(p) \end{bmatrix} \qquad (9)$$

These are the so-called estimated Yule-Walker equations, and this is the autocorrelation method of linear prediction.


Note the special form of the matrix and of the right-hand vector, which allows a recursive solution known as the Levinson recursion (Guler et al., 2001). To complete the discussion, an explicit form for the MLE of $\sigma_u^2$ must be determined. From (6),

$$\hat{\sigma}_u^{2} = \sum_{k=0}^{p} \hat{a}(k)\,\hat{R}_{xx}(-k) = \sum_{k=0}^{p} \hat{a}(k)\,\hat{R}_{xx}(k) \qquad (10)$$

These parameters are used to compute the AR power spectral density as

$$\hat{P}_{xx}^{\,YW}(f) = \frac{\hat{\sigma}_{wp}^{2}}{\left|1 + \displaystyle\sum_{k=1}^{p}\hat{a}(k)\,e^{-j2\pi f k}\right|^{2}} \qquad (11)$$

The order of the model, that is, of the filter, depends on the number of AR coefficients. In the AR method the model order is identified according to different criteria; in this study the Akaike information criterion (AIC) is taken as the basis. A model order of p = 8 was used because the determined model order was low. In the AR model, MLE is used in the solution of the Yule-Walker equations to obtain the AR model parameters (Guler et al., 2001).
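To make the procedure concrete, the following hedged Python sketch assembles the estimated autocorrelations of Eq. (7), solves the Yule-Walker system of Eq. (9), recovers the noise variance via Eq. (10), and evaluates the AR spectrum of Eq. (11). It uses a direct linear solve rather than the Levinson recursion mentioned above, and all names and the test process are illustrative, not the paper's code.

```python
import numpy as np

def ar_yule_walker(x, order):
    """Estimate AR(p) parameters from the estimated Yule-Walker
    equations (Eqs. (7)-(10)); a minimal sketch, not the paper's code.
    Returns (a, sigma2): coefficients a(1..p) and noise variance.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Biased autocorrelation estimates R(0)..R(p), as in Eq. (7)
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(order + 1)])
    # Toeplitz system of Eq. (9), solved directly
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:order + 1])
    # Noise variance of Eq. (10), with a(0) = 1
    sigma2 = r[0] + np.dot(a, r[1:order + 1])
    return a, sigma2

def ar_psd(a, sigma2, freqs):
    """AR power spectral density of Eq. (11); freqs in cycles/sample."""
    k = np.arange(1, len(a) + 1)
    denom = np.abs(1.0 + np.array(
        [np.dot(a, np.exp(-2j * np.pi * f * k)) for f in freqs])) ** 2
    return sigma2 / denom

# Example: AR(1) process x(n) = 0.9 x(n-1) + w(n); the estimated
# coefficient in this sign convention should be close to -0.9
rng = np.random.default_rng(0)
w = rng.standard_normal(5000)
x = np.zeros(5000)
for i in range(1, 5000):
    x[i] = 0.9 * x[i - 1] + w[i]
a, s2 = ar_yule_walker(x, order=1)
```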


2.3. Spectral analysis of EEG signals

In this study, an analysis of the EEG signals is presented. Programs were written in MATLAB to estimate the FFT and AR models of the given data. These signals are analyzed using the FFT and the AR method with MLE optimization.

In Fig. 3(a) the FFT of an epileptic EEG signal is given. If the frequency spectrum of the FFT is examined, it is seen that there are peaks at 1 and 4 Hz. The AR spectrum of the same signal is presented in Fig. 3(b); there are peaks at 1 and 3 Hz with higher amplitude. When we compare these two spectra, it is seen that the AR spectrum has sharper peaks and fewer misleading peaks than the FFT. Owing to this better frequency resolution, explanation and determination of the activities in the signal is easier with the AR method. Since the signal is taken from an epileptic patient, the results fit the typical characteristics of epilepsy, that is, delta activity (low frequency range) (Guler et al., 2001).

Fig. 4(a) shows the FFT spectrum of an EEG signal of a healthy person; the spectrum of the AR method is given in Fig. 4(b). If these two spectra are examined, the FFT spectrum has wide and misleading peaks, whereas the AR spectrum has sharp and clear peaks. In these spectra the delta, alpha, and beta activities can be seen easily. These results are expected because it is a normal EEG signal. The higher variance and misleading peaks in the FFT spectrum obscure the dominant alpha and delta activities.

2.4. Logistic regression

Logistic regression (Hajmeer & Basheer, 2003; Hosmer & Lemeshow, 1989; Vach, Robner, & Schumacher, 1996) is

Fig. 3. Power spectral density of epileptic EEG signal: (a) FFT, (b) AR with MLE (magnitude in dB vs. frequency in Hz).


Fig. 4. Power spectrum of normal EEG signal: (a) FFT, (b) AR with MLE (magnitude in dB vs. frequency in Hz).

a widely used statistical modeling technique in which the probability, $P_1$, of a dichotomous outcome event is related to a set of explanatory variables in the form

$$\mathrm{logit}(P_1) = \ln\!\left(\frac{P_1}{1-P_1}\right) = b_0 + \sum_{i=1}^{n} b_i x_i = b_0 + b_1 x_1 + \cdots + b_n x_n \qquad (12)$$

In Eq. (12), $b_0$ is the intercept and $b_1, b_2, \ldots, b_n$ are the coefficients associated with the explanatory variables $x_1, x_2, \ldots, x_n$. A dichotomous variable is restricted to two values such as yes/no, on/off, survive/die, or 1/0, usually representing the occurrence or nonoccurrence of some event (for example, epileptic seizure or not). The explanatory (independent) variables may be continuous, dichotomous, discrete, or a combination. The use of ordinary linear regression (OLR) based on the least squares method with a dichotomous outcome would lead to meaningless results. As in Eq. (12), the response (dependent) variable is the natural logarithm of the odds ratio, representing the ratio of the probability that an event will occur to the probability that it will not (e.g. the probability of being epileptic or not). In general, logistic regression imposes less stringent requirements than OLR in that it does not assume linearity of the relationship between the explanatory variables and the response variable and does not require Gaussian-distributed independent variables. Logistic regression calculates the changes in the logarithm of the odds of the response variable, rather than the changes in the response variable itself, as OLR does. Because the logarithm of the odds is linearly related to the explanatory variables, the regressed relationship between the response and explanatory variables is not linear. The probability of occurrence of an event as a function of the explanatory variables is nonlinear, as derived from Eq. (12):

$$P_1(x) = \frac{1}{1 + e^{-\mathrm{logit}(P_1(x))}} = \frac{1}{1 + \exp\!\left(-\left(b_0 + \sum_{i=1}^{n} b_i x_i\right)\right)} \qquad (13)$$

Unlike OLR, logistic regression forces the probability values ($P_1$) to lie between 0 and 1 ($P_1 \to 0$ as the right-hand side of Eq. (13) approaches $-\infty$, and $P_1 \to 1$ as it approaches $+\infty$). Commonly, the maximum likelihood estimation (MLE) method is used to estimate the coefficients $b_0, b_1, \ldots, b_n$ in the logistic regression equation (Hajmeer & Basheer, 2003; Hosmer & Lemeshow, 1989; Schumacher, Robner, & Vach, 1996). This method is different from that based on ordinary least squares (OLS) for estimating the coefficients in linear regression. The OLS method seeks to minimize the sum of squared distances of all the data points from the regression line. The MLE method, on the other hand, seeks to maximize the log-likelihood, which reflects how likely it is (the odds) that the observed values of the dependent variable may be predicted from the observed values of the independent variables. Unlike the OLS method, the MLE method is an iterative algorithm that starts with an initial arbitrary estimate of the regression coefficients and proceeds to determine the direction and magnitude of the change in the coefficients that will increase the likelihood function. After this initial function is determined, residuals are tested and a new estimate is computed with an improved function. This process is repeated until some convergence criterion (e.g. Wald test, log likelihood-ratio test, classification tables, etc.) is reached. In the current study, the coefficients were obtained by maximizing (using Newton's method) the log-likelihood function, defined as the sum of the logarithms of the predicted probabilities of occurrence for those cases where the event occurred and the logarithms of the predicted probabilities of nonoccurrence for those cases where the event did not occur (Dreiseitl & Ohno-Machado, 2002; Hajmeer & Basheer, 2003).
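The Newton iteration described above can be sketched as follows. This is a generic illustration of logistic-regression MLE with invented toy data, not the authors' implementation; function and variable names are assumptions.

```python
import numpy as np

def fit_logistic(X, y, iters=25):
    """Logistic-regression coefficients by Newton's method on the
    log-likelihood, as outlined in the text. A minimal sketch.
    X: (n_samples, n_features); y: 0/1 labels. Returns [b0, b1, ...].
    """
    X = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ b))      # predicted probs, Eq. (13)
        W = p * (1.0 - p)                     # IRLS weights
        grad = X.T @ (y - p)                  # gradient of log-likelihood
        H = X.T @ (X * W[:, None])            # (negative) Hessian
        b = b + np.linalg.solve(H, grad)      # Newton update
    return b

# Toy example: one feature roughly separating low from high values,
# with some overlap so the MLE is finite
x = np.array([[0.0], [0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5]])
y = np.array([0, 0, 0, 1, 0, 1, 1, 1])
b = fit_logistic(x, y)
p_low = 1.0 / (1.0 + np.exp(-(b[0] + b[1] * 0.0)))
p_high = 1.0 / (1.0 + np.exp(-(b[0] + b[1] * 3.5)))
```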
2.5. Artificial neural networks (ANNs)
ANNs are computing systems made up of a large number of simple, highly interconnected processing elements (called nodes or artificial neurons) that abstractly emulate the structure and operation of the biological nervous system. Learning in ANNs is accomplished through special training algorithms developed from learning rules presumed to mimic the learning mechanisms of biological systems. There are many different types and architectures of neural networks, varying fundamentally in the way they learn, the details of which are well documented in the literature (Dreiseitl & Ohno-Machado, 2002). In this paper, two neural networks relevant to the application being considered (i.e. classification of epileptic/normal EEG data) will be employed for designing classifiers, namely the FEBANN and the WNN.
The architecture of a FEBANN may contain two or more layers. A simple two-layer ANN consists only of an input layer containing the input variables of the problem and an output layer containing the solution of the problem. This type of network is a satisfactory approximator for linear problems. However, for approximating nonlinear systems, additional intermediate (hidden) processing layers are employed to handle the problem's nonlinearity and complexity. Although it depends on the complexity of the function or process being modeled, one hidden layer may be sufficient to map an arbitrary function to any degree of accuracy. Hence, a three-layer architecture was adopted for the present study. Fig. 5 shows the typical structure of a fully connected three-layer network with n input nodes (neurons), m hidden nodes, and r output nodes. As seen in Fig. 5, the neurons of any one layer are connected to all the neurons in the succeeding and preceding layers, but no links are allowed within the same layer. The multilayer network is designed such that when it receives an input signal vector $X = [x_1, \ldots, x_i, \ldots, x_n]^T$, it is

Fig. 5. A wavelet neural network: an input layer of frequency inputs (f1, f2, ..., fNi), a layer of wavelets, and an output layer.

required to produce an estimated output vector $\hat{Y}$ as close as possible to the (unknown) system's actual output $Y = [y_1, y_2, \ldots, y_k, \ldots, y_r]^T$. Since no interaction is allowed between the nodes of the same layer, each output (e.g. the kth output) is analyzed independently. Assuming that the network has already been trained (using some supervised training procedure), it is activated by (i) feeding the input vector X forward through the links connecting the input to the hidden layer, represented by $W = [w_1, \ldots, w_z]^T$, where $z = m(n+1)$, (ii) processing the input vector at the hidden nodes, (iii) passing it further through the links connecting the hidden nodes to the kth output node, represented by the vector $V = [v_1, \ldots, v_z]^T$, where $z = m+1$, and finally (iv) processing it at the kth output node to produce a solution. The network approximation $\hat{y}_k$ of the kth true output ($y_k$) is computed from

$$\hat{y}_k = \lambda\!\left(\sum_{j=0}^{m} v_{kj}\,\lambda\!\left(\sum_{i=0}^{n} w_{ji}\,x_i\right)\right) \qquad (14)$$
In Eq. (14), $\lambda$ is an activation function such as the simple sigmoidal logistic function $\lambda(z) = 1/(1 + e^{-z})$, bounded between 0.0 and 1.0, which will be used in this study; $w_{j0} = \theta_j$ and $v_{k0} = \theta_k$ are the thresholds (biases) for the hidden and output nodes, respectively, which determine the firing limits of the neurons, and $x_0 = 1.0$. Note that $y_k = \hat{y}_k + \varepsilon$, where $\varepsilon$ is an error term, usually assumed to be normally distributed with a mean of zero and a standard deviation of one. It is important to mention that the input parameters ($x_1, \ldots, x_i, \ldots, x_n$) are all scaled to the same range as that of the activation function. Because an activation function is used in the output layer, the ANN output $\hat{y}$ will be referred to as the ANN activation. For an ANN with one output node, the subscript index on $\hat{y}_k$ is dropped and the ANN solution is referred to as $\hat{y}$ (e.g. epileptic, 1, or normal, 0).
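The forward computation of Eq. (14) can be sketched directly. The array shapes and names below are illustrative assumptions, not the paper's code; biases are carried as the 0th weight with a prepended unit input, as the text describes.

```python
import numpy as np

def mlp_forward(x, W, V):
    """Forward pass of Eq. (14): y_k = l( sum_j v_kj * l( sum_i w_ji x_i ) ),
    with the logistic activation l(z) = 1/(1+exp(-z)) and biases carried
    as the 0th weight (x_0 = 1). A hedged sketch, not the paper's code.
    W: (m, n+1) input-to-hidden weights; V: (r, m+1) hidden-to-output.
    """
    sigm = lambda z: 1.0 / (1.0 + np.exp(-z))
    x1 = np.concatenate([[1.0], np.asarray(x, dtype=float)])  # prepend x_0 = 1
    h = sigm(W @ x1)                                          # hidden activations
    h1 = np.concatenate([[1.0], h])                           # prepend bias unit
    return sigm(V @ h1)                                       # network output(s)

# 2 inputs, 3 hidden nodes, 1 output; with all-zero weights every
# activation is sigm(0) = 0.5
W = np.zeros((3, 3))
V = np.zeros((1, 4))
y = mlp_forward([0.2, -0.4], W, V)
```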
The strength of the ANN lies in selecting an optimal set of connection weights ($v_{kj}$ and $w_{ji}$) in Eq. (14). Initially, these weights are assigned randomly; however, during training they change dynamically so as to produce the best possible approximation to the problem solution.
First, an example (input-output vector) is presented to the untrained network. The input vector part of the example (e.g. FFT or AR parameters) is propagated forward, and the network solution (output or activation) is obtained, as given in Eq. (14). The deviation of the network solution from the target (true) value is computed and, using the backpropagation learning law, the error is propagated backward, beginning at the output layer, to adjust the connection weights. Once again at the input layer, the signal is re-fed forward into the network using the new weights, and the procedure is repeated to reduce the error. This so-called training procedure is applied using the many training examples until an optimal connection weight matrix is obtained which represents the best approximation of the process being modeled. The backpropagation learning algorithm combines the mechanisms of feeding the input signal forward, backpropagation of the error, and weight adjustment, which is at the heart of network development. Step-by-step simplified mathematics for the backpropagation algorithm is given in Basheer and Hajmeer (2000).
For networks trained by error backpropagation, a number of issues have to be addressed to ensure successful network development (Basheer & Hajmeer, 2000). Most important among these are the network size (architecture) and the number of training cycles. If training is insufficient, the network will not learn the examples presented to it. In contrast, extremely excessive training of the network will force it to memorize the training examples. This will result in a network that is unable to generalize to cases from outside the training database. Additionally, an oversized ANN comprising a large number of units in the hidden layers tends to learn the noise and overfit the data rather than uncover the overall underlying trend (similar to overparametrized polynomials). One practical approach to avoiding these problems is cross-validation, in which test examples (different from the training examples), selected randomly from the parent database, are continuously used to examine the generalization of the network after each training cycle. The quality of the network predictions for these test examples, quantified using some error measure, can serve as a criterion for stopping training or determining the optimum network size. Unlike the error on the training data, which continues to decline with network size and number of training cycles, the test-set error reaches a minimum at the optimum ANN size and/or number of training cycles. The optimum network is considered to contain sufficient knowledge about the phenomenon being modeled.
2.6. Evaluation of performance
The coherence between the diagnoses of the expert neurologists and the diagnosis information at the output of the neural network was calculated. The prediction success of the neural network may be evaluated by examining this table, which is called the confusion matrix. In order to analyze the output data obtained from the application, the sensitivity (true positive ratio) and specificity (true negative ratio) are calculated from the confusion matrix. The sensitivity value (true positive: the same positive result as the diagnosis of the expert neurologists) is calculated by dividing the number of positive cases detected by the classifier by the total number of positive cases stated by the expert neurologists: Sensitivity = TPR = TP/(TP + FN). Similarly, the specificity value (true negative: the same negative diagnosis as the expert neurologists) is calculated by dividing the number of negative cases detected by the classifier by the total number of negative cases stated by the expert neurologists: Specificity = TNR = TN/(TN + FP).
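These two formulas can be computed from paired expert/classifier labels as follows; an illustrative sketch, assuming the label coding 1 = epileptic (positive) and 0 = normal (negative).

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity (TPR) and specificity (TNR) from expert labels
    (y_true) and classifier labels (y_pred), per the formulas in the
    text. A minimal illustrative sketch; names are our own.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

sens, spec = sensitivity_specificity([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])
# TP=2, FN=1, TN=2, FP=1 -> sensitivity = 2/3, specificity = 2/3
```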
Neural network and logistic regression analyses were compared with each other by receiver operating characteristic (ROC) analysis. ROC analysis is an appropriate means to display sensitivity and specificity relationships when a predictive output for two possibilities is continuous. In its tabular form, the ROC analysis displays the true and false positive and negative totals, and the sensitivity and specificity, for each listed cutoff value between 0 and 1 (Nguyen, Malley, Inkelis, & Kuppermann, 2002).

In order to measure the performance of the output classification graphically, the ROC curve was calculated by analyzing the output data obtained from the test. Furthermore, the performance of the model may be measured by calculating the area under the ROC curve. The ROC curve is a plot of the true positive rate (sensitivity) against the false positive rate (1 − specificity) for each possible cutoff. A cutoff value is selected that may classify the degree of epileptic seizure detection correctly by determining the input parameters optimally according to the model used.

3. Wavelet neural networks


Wavelet neural networks (WNNs) are a new class of networks based on the wavelet transform (Pati & Krishnaparasad, 1993), in which a discrete wavelet function is used as the node activation function. Because the wavelet space is used as the feature space for pattern recognition, feature extraction from the signal is realized by a weighted sum of inner products of the wavelet basis and the signal vector. Furthermore, because it combines the time-frequency localization of the wavelet transform with the self-learning of a neural network, the network possesses strong approximation capability and robustness. In this paper, a WNN was designed as a single-hidden-layer feedforward neural network with its node activation function based on the dyadic discrete Morlet wavelet basis function.
Wavelet transforms have emerged as a means of
representing a function in a manner which readily reveals
properties of the function in localized regions of the joint
time-frequency space. The applications of WNN are usually


limited to problems of small input dimension. The main
reason is that they are composed of regularly dilated and
translated wavelets. The number of wavelets in the WNNs
drastically increases with the dimension (Zhang, 1997).
Some work has been done on reducing the size of the WNN
by removing the redundant candidates (Wong & Leung,
1998; Xu & Ho, 2002).
Much research has been done on applications of WNNs,
which combine the capability of artificial neural networks in
learning from processes and the capability of wavelet
decomposition (Pati & Krishnaprasad, 1993; Zhang &
Benveniste, 1992), for identification and control of dynamic
systems (Sureshbabu & Farrell, 1999; Wong & Leung,
1998; Zhang, Walter, & Lee, 1995). In Zhang and
Benveniste (1992), the new notion of a wavelet network is
proposed as an alternative to feedforward neural networks
for approximating arbitrary nonlinear functions based on the
wavelet transform theory, and a backpropagation algorithm
is adopted for wavelet network training; Zhang et al. (1995)
described a wavelet-based neural network for function
learning and estimation, and the structure of this network is
similar to that of the radial basis function network except
that the radial functions are replaced by orthonormal scaling
functions; Zhang (1997) presented wavelet network construction algorithms for the purpose of nonparametric
regression estimation.
3.1. Wavelet frames and wavelet networks
Two categories of wavelet functions, namely, orthogonal
wavelets and wavelet frames, were developed separately by
different groups. The fact that orthogonal wavelets cannot
be expressed in closed form is a serious drawback for their
application to function approximation and process modelling. Conversely, wavelet frames are constructed by simple
operations of translation and dilation of a single fixed
function called the mother wavelet, which must satisfy
conditions that are less stringent than orthogonality
conditions.
A wavelet ψ_j(x) is derived from its mother wavelet ψ(z)
by the relation

ψ_j(x) = ψ((x - m_j)/d_j) = ψ(z_j)    (15)

where the translation factor m_j and the dilation factor d_j are
real numbers in R and R+, respectively. The family of
functions generated by ψ can be defined as

Ω_c = { (1/√d_j) ψ((x - m_j)/d_j) ; m_j ∈ R and d_j ∈ R+ }    (16)
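A minimal sketch of such a translated and dilated wavelet node follows. It uses a commonly adopted real-valued Morlet form, ψ(z) = cos(1.75z) exp(−z²/2); the constant 1.75 is a conventional choice, not stated in the paper, so treat it as an assumption:

```python
import math

def morlet(z):
    """A commonly used real-valued Morlet mother wavelet:
    psi(z) = cos(1.75 z) * exp(-z^2 / 2)."""
    return math.cos(1.75 * z) * math.exp(-z * z / 2.0)

def wavelet_node(x, m, d):
    """One member of the family in Eq. (16):
    (1/sqrt(d)) * psi((x - m)/d), with translation m and dilation d > 0."""
    return morlet((x - m) / d) / math.sqrt(d)
```

Varying m slides the wavelet along the input axis; varying d stretches it, which is what gives the family its joint time-frequency localization.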
A family Ω_c is said to be a frame of L²(R) if there exist
two constants c > 0 and C < +∞ such that for any square-integrable
function f, the following inequalities hold:

c‖f‖² ≤ Σ_{ψ_j ∈ Ω_c} ⟨ψ_j, f⟩² ≤ C‖f‖²    (17)

where ‖f‖ denotes the norm of function f and ⟨f, g⟩ the inner
product of functions f and g. Families of wavelet frames of
L²(R) are universal approximators.
For the modelling of multivariable processes, multidimensional wavelets must be defined. In the present work,
we use multidimensional wavelets constructed as the
product of Ni scalar wavelets (Ni being the number of
variables).
Ψ_j(x) = Π_{k=1..N_i} ψ(z_jk)  with  z_jk = (x_k - m_jk)/d_jk    (18)

where m_j and d_j are the translation and dilation vectors,
respectively. Families of multidimensional wavelets generated
according to this scheme have been shown to be frames
of L²(R^{N_i}) (Oussar & Dreyfus, 2000). Wavelet networks
were presented in the framework of a static modelling
architecture, where the network output y is computed as

y = y(x) = Σ_{j=1..N_w} c_j Ψ_j(x) + Σ_{k=0..N_i} a_k x_k    (19)

The network can be viewed as having an input vector of N_i
components, a layer of N_w weighted multidimensional
wavelets, and a linear output neuron. The coefficients of the
linear part of the network will be called direct connections.
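The output computation of Eq. (19), together with the least-squares cost of Eq. (20) used for training, can be sketched as follows. This is an illustrative implementation, not the authors' code, and the Morlet form (with its constant 1.75) is an assumed conventional choice:

```python
import math

def morlet(z):
    # real-valued Morlet mother wavelet (assumed conventional form)
    return math.cos(1.75 * z) * math.exp(-z * z / 2.0)

def wnn_output(x, m, d, c, a):
    """Eq. (19): y(x) = sum_j c_j * Psi_j(x) + sum_k a_k * x_k,
    with Psi_j(x) = prod_k psi((x_k - m_jk)/d_jk) as in Eq. (18).
    a[0] is the bias term paired with x_0 = 1 (direct connections)."""
    y = a[0] + sum(ak * xk for ak, xk in zip(a[1:], x))
    for mj, dj, cj in zip(m, d, c):
        psi = 1.0
        for xk, mjk, djk in zip(x, mj, dj):
            psi *= morlet((xk - mjk) / djk)
        y += cj * psi
    return y

def cost(predictions, targets):
    """Eq. (20): J = (1/2) * sum_n (y_p^n - y^n)^2."""
    return 0.5 * sum((yp - y) ** 2 for yp, y in zip(predictions, targets))
```

With the wavelet weights c set to zero, the network reduces to the linear (direct-connection) part, which is exactly the linear trend the authors want accounted for separately.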
Wavelet network training consists in minimizing the
usual least-squares cost function

J(θ) = (1/2) Σ_{n=1..N} (y_p^n - y^n)²    (20)

where the vector θ includes all network parameters to be
estimated: translations, dilations, weights of the connections
between wavelets and output, and weights of the direct
connections; N is the number of elements of the training set,
y_p^n is the output of the process for example n, and y^n is the
corresponding network output.
In the framework of the discrete wavelet transform, a
family of wavelets can be defined as

Ω_d = { a^{m/2} ψ(a^m x - nb) ; (m, n) ∈ Z² }    (21)

where a and b are constants that fully determine, together
with the mother wavelet ψ, the family Ω_d. Actually, relation
(21) can be considered as a special case of relation (16),
where

m_j = n a^{-m} b,  d_j = a^{-m}    (22)

These relations show that, unlike in the continuous
approach, the wavelet parameters cannot be varied


continuously; therefore, gradient-based techniques cannot
be used to adjust them. Generally, training wavelet networks
stemming from the discrete transform (Zhang, 1997; Zhang
& Benveniste, 1992) is performed using the Gram-Schmidt
selection method. This approach usually generates large
networks, which are less parsimonious than those trained by
gradient-based techniques. This may be a drawback for
many applications (Oussar & Dreyfus, 2000).
3.2. Initializing wavelet neural networks
Because wavelets are rapidly vanishing functions, (i) a
wavelet may be too local if its dilation parameter is too
small, and (ii) it may sit outside the domain of interest if its
translation parameter is not chosen appropriately.
Therefore, it is very inadvisable to initialize the dilations
and translations randomly, as is usually done for the
weights of a standard neural network with a sigmoid
activation function. We used an initialization procedure
that is based on the selection of discrete wavelets.
We propose to make use of wavelet frames stemming
from the discrete transform (relation (21)) to initialize the
translation and dilation parameters of wavelet networks
trained using gradient-based techniques. The procedure
comprises three steps:
(i) generate a library of wavelets, using a family of
wavelets described by relation (21),
(ii) rank all wavelets in the order of decreasing relevance,
(iii) use the translations and dilations of the most relevant
wavelets as initial values and use a gradient method to
train the network thus initialized. For a detailed
presentation of this method, see Oussar and Dreyfus
(2000).
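The three steps might be sketched as follows. This is a schematic reading of the procedure: the candidate grid comes from Eq. (22) with a = 2, b = 1, and the relevance score here is a simple inner-product proxy; the exact selection criterion is the one given in Oussar and Dreyfus (2000):

```python
import math

def morlet(z):
    # real-valued Morlet mother wavelet (assumed conventional form)
    return math.cos(1.75 * z) * math.exp(-z * z / 2.0)

def dyadic_library(levels, translations, a=2.0, b=1.0):
    """Step (i): candidate (translation, dilation) pairs from Eq. (22):
    m_j = n * a**(-m) * b, d_j = a**(-m)."""
    lib = []
    for m in levels:
        for n in translations:
            lib.append((n * a ** (-m) * b, a ** (-m)))
    return lib

def rank_by_relevance(lib, xs, ys):
    """Step (ii): rank candidates by the magnitude of the inner product
    between the wavelet outputs and the target sequence (a simple
    relevance proxy, standing in for the exact criterion)."""
    def score(md):
        m_j, d_j = md
        return abs(sum(morlet((x - m_j) / d_j) * y for x, y in zip(xs, ys)))
    return sorted(lib, key=score, reverse=True)

# Step (iii): the translations/dilations of the top-ranked candidates
# become the initial parameters for gradient-based training.
```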
3.3. Wavelet neural network classifier
Wavelets offer many attractive features for the analysis of
physical signals, including universal approximation properties, robustness against coefficient errors (Daubechies,
1992), and joint input-space/frequency localization. Since
EEG signals possess a combination of slow variations over
long periods, with sharp, transient variations over short
periods, WNNs seem to be a more natural choice than other
mainstream neural networks for EEG analysis. A multidimensional
wavelet ψ(z_j) can be induced from a scalar
wavelet ψ(z) via an affine vector-matrix transformation of
the input x (Zhang & Benveniste, 1992). We have introduced
a variation to this theme for obtaining multidimensional
wavelets that are radially symmetric with respect to
n-dimensional translation vectors k, by dilating
the Euclidean distance between the input and translation vectors:
ψ(z_j) = ψ(‖x - k‖/d_j),  d_j > 0    (23)
A multiple-input, multiple-output WNN can be obtained
by cascading a layer of neurons implementing the multidimensional wavelets in Eq. (19), with an output layer of
linear combiners.
We are interested in the nonlinearities introduced by the
WNN only to the extent that they can help improve a
function approximation. The WNN is therefore equipped
with a linear discriminant portion that can quickly account
for a linear trend in the input/output data. The wavelet nodes
are specifically trained so as to approximate only the
wave-like components in the function.

4. Results and discussion


In this study, we used EEG signals of normal and
epileptic patients in order to perform comparison between
the neural networks and logistic regression-based model.
4.1. Visual inspection and validation
Two neurologists with experience in the clinical analysis
of EEG signals separately inspected every recording
included in this study to score epileptic and normal signals.
Each event was filed on the computer memory and linked to
the tracing with its start and duration. These were then
revised by the two experts jointly to solve disagreements
and set up the training set for the program, consenting to the
choice of threshold for the epileptic seizure detection. The
agreement between the two experts was evaluatedfor the
testing setas the rate between the numbers of epileptic
seizures detected by both experts. A further step was then
performed with the aim of checking the disagreements and
setting up a gold standard reference set. When revising this
unified event set, the human experts, by mutual consent,
marked each state as epileptic or normal. They also
reviewed each recording entirely for epileptic seizures that
had been overlooked by all during the first pass and marked
them as definite or possible. This validated set provided the
reference evaluation to estimate the sensitivity and selectivity of computer scorings. Nevertheless, a preliminary
analysis was carried out solely on events in the training set,
as each stage in these sets had a definite start and duration.
4.2. Development of logistic regression model and ANNs
The objective of the modelling phase in this application
was to develop classifiers that are able to identify any input
combination as belonging to either one of the two classes:
normal or epileptic. For developing the logistic regression
and neural network classifiers, 300 examples were randomly
taken from the 600 examples and used for deriving
the logistic regression models or for training the neural
networks. The remaining 300 examples were kept aside and
used for testing the validity of the developed models. The
class distribution of the samples in the training and
validation data sets is summarized in Table 1. Note that
although logistic regression does not involve training, we
will use "training examples" to refer to that portion of the
database used to derive the regression equations.

Table 1
Class distribution of the samples in the training and the validation data sets

Class       Training set    Validation set    Total
Epileptic   102             158               260
Normal      198             142               340
Total       300             300               600
The FEBANN was designed with the PSD of the EEG signal in
the input layer, and the output layer consisted of one node
representing whether an epileptic seizure was detected or not. A
value of 0 was used when the experimental investigation
indicated a normal state, and 1 for an epileptic seizure. The
preliminary architecture of the network was examined
using one and two hidden layers with a variable number of
hidden nodes in each. It was found that one hidden layer is
adequate for the problem at hand. Thus the sought network
will contain three layers of nodes. The training procedure
started with one hidden node in the hidden layer, followed
by training on the training data (300 data sets), and then by
testing on the validation data (300 data sets) to examine the
network's prediction performance on cases never used in its
development. Then, the same procedure was run repeatedly
each time the network was expanded by adding one more
node to the hidden layer, until the best architecture and set
of connection weights were obtained. Using the modified
error-backpropagation algorithm for training, a training rate
of 0.01 and momentum coefficient of 0.95 were found
optimum for training the network with various topologies.
The selection of the optimal network was based on
monitoring the variation of error and some accuracy
parameters as the network was expanded in the hidden
layer size and for each training cycle. The sum of squares of
error representing the sum of square of deviations of ANN
solution (output) from the true (target) values for both the
training and test sets was used for selecting the optimal
network. Additionally, because the problem involves
classification into two classes, accuracy, sensitivity and
specificity were used as a performance measure. These
parameters were obtained separately for both the training
and validation sets each time a new network topology was
examined. A computer program that we have written for the
training algorithm based on backpropagation of error was
used to develop the FEBANNs.
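The error-backpropagation weight update with the reported training rate (0.01) and momentum coefficient (0.95) takes the following generic form. This is a sketch of the standard momentum rule; the authors' own program is not reproduced here:

```python
def momentum_update(weights, grads, velocity, lr=0.01, momentum=0.95):
    """One weight update of error backpropagation with momentum:
    v <- momentum * v - lr * grad;  w <- w + v.
    The rate (0.01) and momentum (0.95) are the values the paper
    reports as optimal for the FEBANN."""
    new_v = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_w = [w + v for w, v in zip(weights, new_v)]
    return new_w, new_v
```

The velocity term carries over a fraction of the previous step, which smooths the descent and was what allowed the relatively high momentum coefficient of 0.95 to be used with the small learning rate.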
The classifier implemented for this work is a WNN with
one hidden layer and one output layer as shown in Fig. 5. An
input vector is applied to the input layer where all of the
inputs are distributed to each unit in the first hidden layer.
All of the units have weight vectors which are multiplied by
these input vectors. Each unit sums these inputs and
produces a value that is transformed by a wavelet activation
function. The output of the final layer is then computed by
multiplying the output vector from the hidden layer by the
weights into the final layer. More summations and
activations at these units then give the actual output of the
network. We used a network with a variable number of
hidden units and one output unit as in FEBANN. One output
unit is all that is needed because this is a two-class
classification problem.
According to the theory, the number of nodes in the
hidden layer of the network is equal to the number of wavelet
bases. If the number is too small, the WNN may not capture the
complex functional relationship between the input data and the
output value. On the contrary, too large a number may create
such a complex network that it might lead to a very large
output error caused by over-fitting of the training sample.
The optimum number of nodes in the hidden layer was found to be 21.
As a result, the network in this paper is constructed as an
error-backpropagation neural network using the Morlet mother
wavelet basis function as the node activation function. The
amount by which the weights are adjusted on each step is
parameterized by learning rate constants. We used one
learning rate for the hidden layer and a different rate for the
output layer. After trying a large number of different values,
we found that a learning rate of 0.001 for the hidden layer
and the output layer produced the best performance.
4.3. Experimental results
First, we used the FFT and AR spectra of EEG signals for
logistic regression, FEBANN, and WNN classification. The
procedure was repeated on EEG recordings of all subjects
(healthy and epileptic patients). The correct classification
results for FFT and AR are shown in Table 2.
As seen in Table 2, the FFT spectrum gave the poorer
results; hence, it was omitted from the following experiments.
The poor classification was somewhat expected due
to the nature of the EEG signals and the fact that the PSD
obtained from the FFT is not accurate.
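To illustrate the AR-based spectral estimate used as the classifier input, the sketch below fits AR coefficients with the Yule-Walker/Levinson-Durbin recursion (a simpler, illustrative stand-in for the maximum likelihood estimation of Guler et al. (2001)) and evaluates the AR power spectral density:

```python
import cmath, math

def yule_walker_ar(x, order):
    """Fit AR coefficients via the Yule-Walker equations
    (Levinson-Durbin recursion) for the model
    x[n] = sum_k a_k x[n-k] + e[n]."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    r = [sum(xc[i] * xc[i + k] for i in range(n - k)) / n
         for k in range(order + 1)]
    a = []
    e = r[0]
    for k in range(order):
        lam = (r[k + 1] - sum(a[j] * r[k - j] for j in range(k))) / e
        a = [a[j] - lam * a[k - 1 - j] for j in range(k)] + [lam]
        e *= (1.0 - lam * lam)
    return a, e   # coefficients and innovation variance

def ar_psd(a, sigma2, freqs, fs=1.0):
    """AR spectral density:
    P(f) = sigma^2 / |1 - sum_k a_k exp(-j 2 pi f k / fs)|^2."""
    out = []
    for f in freqs:
        denom = 1.0 - sum(ak * cmath.exp(-2j * math.pi * f * (k + 1) / fs)
                          for k, ak in enumerate(a))
        out.append(sigma2 / abs(denom) ** 2)
    return out
```

Unlike the raw FFT periodogram, the AR spectrum is smooth by construction, which is one reason the AR-based features classified better in Table 2.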
Table 3 shows a summary of the performance measures
by using AR spectral density of EEG signals. It is obvious
from Table 3 that the WNN-based classifier is ranked first in
terms of its correct classification percentage of the EEG
signals (epileptic/normal data, 93%), while the FEBANN-based
classifier came second (90.6%).

Table 2
Comparison of correct classification performance for the AR and FFT methods

Preprocessing method   Logistic regression (%)   FEBANN (%)   WNN (%)
AR with MLE            89.3                      90.6         93
FFT                    88.6                      88.3         91

Table 3
Comparison of logistic regression and neural network models for EEG signals

Classifier type       Correctly classified (%)   Specificity (%)   Sensitivity (%)   Area under ROC curve
Logistic regression   89.3                       89.2              89.4              0.887
FEBANN                90.6                       91.5              89.8              0.894
WNN                   93                         92.4              93.6              0.918

The logistic
regression-based classifier had the lowest correct classification
percentage (89.3%) compared to the neural network-based
counterparts. While the logistic regression analysis showed
89.4% success in classification of the patients (sensitivity),
the FEBANN classified the same data with a success rate of
89.8% and WNN classified the same data with a success rate
of 93.6%.
Also, the areas under the ROC curves (AUC) for the three
classifiers (logistic regression, FEBANN and WNN) are given in
Table 3. To quantify the performance characteristics of each
classifier, the AUC was computed for the validation data ROC
curve. As can be seen from Table 3, the WNN-based classifier is
undoubtedly the best classifier, with AUC = 0.918 for the
validation data ROC curve. The FEBANN-based classifier
is next in performance, followed by the logistic
regression-based classifier, which exhibited a slightly lower
performance.
The testing performance of the neural network diagnostic
system is found to be satisfactory and we think that this
system can be used in clinical studies in the future after it is
developed. This application brings objectivity to the
evaluation of EEG signals and its automated nature makes
it easy to be used in clinical practice. Besides the feasibility
of a real-time implementation of the expert diagnosis
system, diagnosis may be made more accurately by
increasing the variety and the number of parameters. A
black box device that may be developed as a result of this
study may provide feedback to the neurologists for
classification of the EEG signals quickly and accurately
by examining the EEG signals with real-time
implementation.

5. Summary and conclusions

Diagnosing epilepsy is a difficult task requiring
observation of the patient, an EEG, and the gathering of
additional clinical information. An artificial neural network
that classifies subjects as having or not having an
epileptic seizure provides a valuable diagnostic decision
support tool for physicians treating potential epilepsy,
since differing etiologies of seizures result in different
treatments.

In this paper, two approaches to developing classifiers for
identifying epileptic seizures were discussed. One approach
is based on the traditional method of statistical logistic
regression analysis, where logistic regression (LR) equations
were developed. The other approach is based on neural
network technology, mainly using a feedforward neural
network trained by the error-backpropagation algorithm
(FEBANN) and the wavelet neural network (WNN). Using
FFT and AR spectra of EEG signals, three classifiers,
namely LR, FEBANN, and WNN, were constructed and
cross-compared in terms of their accuracy relative to the
observed epileptic/normal patterns. The comparisons were
based on analysis of the receiver operating characteristic
(ROC) curves of the three classifiers and two scalar
performance measures derived from the confusion matrices,
namely specificity and sensitivity. The WNN-based classifier
classified the epileptic and normal cases most accurately,
with specificity 92.4% and sensitivity 93.6%, followed by
the FEBANN-based classifier with specificity 91.5% and
sensitivity 89.8%, and then the LR-based classifier with
specificity 89.2% and sensitivity 89.4%. Out of the 300
epileptic/normal cases, the LR-based classifier misclassified
a total of 32 cases, the FEBANN-based classifier misclassified
28 cases, while the WNN-based classifier misclassified 21
cases.

Essentially, WNNs and FEBANNs require deciding on
the number of hidden layers, the number of nodes in each
hidden layer, the number of training iteration cycles, the choice
of activation function, and the selection of the optimal learning
rate and momentum coefficient, as well as other parameters and
problems pertaining to convergence of the solution.
Advantages of WNNs over FEBANNs and logistic
regression include their robustness to noisy data (with
outliers), which can severely hamper many types of ANNs as
well as most traditional statistical methods. Finally, the fact
that a WNN-based classifier can be developed quickly
makes such classifiers efficient tools that can be easily
retrained, as additional data become available, when
implemented in the hardware of EEG signal processing
systems.

With specificity and sensitivity values both above 92%,
the wavelet neural network classification may be used as an
important diagnostic decision support mechanism to assist
physicians in the treatment of epileptic patients.

References
Anderson, C. W., Devulapalli, S. V., & Stolz, E. A. (1995). Determining mental state from EEG signals using neural networks. Scientific Programming, 4, 171-183.
Basheer, I., & Hajmeer, M. (2000). Artificial neural networks: Fundamentals, computing, design, and application. Journal of Microbiological Methods, 43, 3-31.
Daubechies, I. (1992). Ten lectures on wavelets. CBMS-NSF regional series in applied mathematics. Philadelphia, PA: SIAM.
Dreiseitl, S., & Ohno-Machado, L. (2002). Logistic regression and artificial neural network classification models: A methodology review. Journal of Biomedical Informatics, 35, 352-359.
Gabor, A. J., Leach, R. R., & Dowla, F. U. (1996). Automated seizure detection using a self-organizing neural network. Electroencephalography and Clinical Neurophysiology, 99, 257-266.
Gotman, J. (1982). Automatic recognition of epileptic seizures in the EEG. Electroencephalography and Clinical Neurophysiology, 54, 530-540.
Goto, S., Nakamura, M., & Uosaki, K. (1995). On-line spectral estimation of nonstationary time series based on AR model parameter estimation and order selection with a forgetting factor. IEEE Transactions on Signal Processing, 43, 1519-1522.
Guler, I., Kiymik, M. K., Akin, M., & Alkan, A. (2001). AR spectral analysis of EEG signals by using maximum likelihood estimation. Computers in Biology and Medicine, 31, 441-450.
Hajmeer, M., & Basheer, I. (2003). Comparison of logistic regression and neural network-based classifiers for bacterial growth. Food Microbiology, 20, 43-55.
Haselsteiner, E., & Pfurtscheller, G. (2000). Using time-dependent neural networks for EEG classification. IEEE Transactions on Rehabilitation Engineering, 8, 457-463.
Hosmer, D. W., & Lemeshow, S. (1989). Applied logistic regression. New York: Wiley.
Kalayci, T., & Ozdamar, O. (1995). Wavelet preprocessing for automated neural network detection of EEG spikes. IEEE Engineering in Medicine and Biology Magazine, 14, 160-166.
Nguyen, T., Malley, R., Inkelis, S. H., & Kuppermann, N. (2002). Comparison of prediction models for adverse outcome in pediatric meningococcal disease using artificial neural network and logistic regression analyses. Journal of Clinical Epidemiology, 55, 687-695.
Osario, I., Frei, M. G., & Wilkinson, S. B. (1998). Real-time automated detection and quantitative analysis of seizures and short-term prediction of clinical onset. Epilepsia, 39, 615-627.
Oussar, Y., & Dreyfus, G. (2000). Initialization by selection for wavelet network training. Neurocomputing, 34, 131-143.
Pati, Y. C., & Krishnaprasad, P. S. (1993). Analysis and synthesis of feedforward neural networks using discrete affine wavelet transformations. IEEE Transactions on Neural Networks, 4(1), 73-85.
Petrosian, A., Prokhorov, D., Homan, R., Dasheiff, R., & Wunsch, D. (2000). Recurrent neural network based prediction of epileptic seizures in intra- and extracranial EEG. Neurocomputing, 30, 201-218.
Pradhan, N., Sadasivan, P. K., & Arunodaya, G. R. (1996). Detection of seizure activity in EEG by an artificial neural network: A preliminary study. Computers and Biomedical Research, 29, 303-313.
Qu, H., & Gotman, J. (1997). A patient-specific algorithm for the detection of seizure onset in long-term EEG monitoring: Possible use as a warning device. IEEE Transactions on Biomedical Engineering, 44, 115-122.
Schumacher, M., Rossner, R., & Vach, W. (1996). Neural networks and logistic regression: Part I. Computational Statistics & Data Analysis, 21, 661-682.
Sureshbabu, A., & Farrell, J. A. (1999). Wavelet-based system identification for nonlinear control. IEEE Transactions on Automatic Control, 44(2), 412-417.
Tseng, S.-Y., Chen, R.-C., Chong, F.-C., & Kuo, T.-S. (1995). Evaluation of parametric methods in EEG signal analysis. Medical Engineering & Physics, 17, 71-78.
Tsoi, A. C., So, D. S. C., & Sergejew, A. (1994). Classification of electroencephalogram using artificial neural networks. In J. D. Cowan, G. Tesauro, & J. Alspector (Eds.), Advances in neural information processing systems (Vol. 6, pp. 1151-1158). Los Altos, CA: Morgan Kaufmann.
Vach, W., Rossner, R., & Schumacher, M. (1996). Neural networks and logistic regression: Part II. Computational Statistics & Data Analysis, 21, 683-701.
Vuckovic, A., Radivojevic, V. A., Chen, C. N., & Popovic, D. (2002). Automatic recognition of alertness and drowsiness from EEG by an artificial neural network. Medical Engineering & Physics, 24, 349-360.
Webber, W. R. S., Lesser, R. P., Richardson, R. T., & Wilson, K. (1996). An approach to seizure detection using an artificial neural network (ANN). Electroencephalography and Clinical Neurophysiology, 98, 250-272.
Weng, W., & Khorasani, K. (1996). An adaptive structure neural network with application to EEG automatic seizure detection. Neural Networks, 9, 1223-1240.
Wong, K., & Leung, A. (1998). On-line successive synthesis of wavelet networks. Neural Processing Letters, 7, 91-100.
Xu, J., & Ho, D. W. C. (2002). A basis selection algorithm for wavelet neural networks. Neurocomputing, 48, 681-689.
Zhang, J., Walter, G. G., & Lee, W. (1995). Wavelet neural networks for function learning. IEEE Transactions on Signal Processing, 43(6), 1485-1497.
Zhang, Q. (1997). Using wavelet network in nonparametric estimation. IEEE Transactions on Neural Networks, 8(2), 227-236.
Zhang, Q., & Benveniste, A. (1992). Wavelet networks. IEEE Transactions on Neural Networks, 3(6), 889-898.
Zoubir, M., & Boashash, B. (1998). Seizure detection of newborn EEG using a model approach. IEEE Transactions on Biomedical Engineering, 45, 673-685.
