Efficient Epileptic Seizure Prediction Based On Deep Learning

This article has been accepted for publication in a future issue of this journal, but has not been
fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TBCAS.2019.2929053, IEEE
Transactions on Biomedical Circuits and Systems
Hisham Daoud, Member, IEEE, Magdy Bayoumi, Fellow, IEEE
Manuscript submitted March 1, 2019.
Hisham Daoud is with the Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, LA 70503 USA (e-mail: [email protected]).
Magdy Bayoumi is with the Department of Electrical and Computer Engineering, University of Louisiana at Lafayette, Lafayette, LA 70503 USA (e-mail: [email protected]).

Abstract: Epilepsy is one of the world's most common neurological diseases. Early prediction of incoming seizures has a great influence on epileptic patients' lives. In this paper, a novel patient-specific seizure prediction technique based on deep learning and applied to long-term scalp EEG recordings is proposed. The goal is to accurately detect the preictal brain state and differentiate it from the prevailing interictal state as early as possible, and to make the technique suitable for real-time use. The feature extraction and classification processes are combined into a single automated system. The raw EEG signal, without any preprocessing, is taken as the input to the system, which further reduces the computations. Four deep learning models are proposed to extract the most discriminative features, which enhance the classification accuracy and prediction time. The proposed approach takes advantage of the convolutional neural network in extracting the significant spatial features from different scalp positions and of the recurrent neural network in anticipating the incidence of seizures earlier than the current methods. A semi-supervised approach based on the transfer learning technique is introduced to improve the optimization problem. A channel selection algorithm is proposed to select the most relevant EEG channels, which makes the proposed system a good candidate for real-time usage. An effective test method is utilized to ensure robustness. The achieved highest accuracy of 99.6% and lowest false alarm rate of 0.004 h⁻¹, along with a very early seizure prediction time of one hour, make the proposed method the most efficient among the state of the art.

Index Terms: classification, deep learning, epilepsy, EEG, interictal, preictal, seizure prediction

I. INTRODUCTION
Epilepsy is defined, according to the International League Against Epilepsy (ILAE) report [1], as a neurological brain disorder identified by the frequent occurrence of symptoms called epileptic seizures due to abnormal brain activities. Seizure characteristics include loss of awareness or consciousness and disturbances of movement, sensation or other cognitive functions. The overall incidence of epilepsy is 23-100 per 100,000. People at extremes of age are the most affected age group, while the disease crests among young individuals between 10 and 20 years old [2]. Epilepsy has a high disease burden: 50 million people worldwide have epilepsy and about two million new patients are recorded every year. Up to 70% of epileptic patients could be controlled by Anti-Epileptic Drugs (AED), while the other 30% are uncontrollable [2].
Electroencephalogram (EEG) is the electrical recording of the brain activities and is considered the most powerful diagnostic and analytical tool of epilepsy. Physicians classify the brain activity of epileptic patients, according to the EEG recordings, into four states: the preictal state, which is defined by the time period just before the seizure; the ictal state, which is during the seizure occurrence; the postictal state, which is assigned to the period after the seizure took place; and finally the interictal state, which refers to the period between seizures other than the previously mentioned states [3]. These four states are illustrated in Fig. 1.
Due to unexpected seizure times, epilepsy has a strong psychological and social effect, and it could additionally be considered a life-threatening disease. Consequently, the prediction of epileptic seizures would greatly contribute to improving the quality of life of epileptic patients in many aspects, like raising an alarm before the occurrence of the seizure to provide enough time for taking proper action, developing new treatment methods and setting new strategies to better understand the nature of the disease. According to the above categorization of the epileptic patient's brain activities, the seizure prediction problem could be viewed as a classification task between the preictal and interictal brain states. An alarm is raised when the preictal state is detected among the predominant interictal states, indicating that a potential seizure is coming, as shown in Fig. 1. The prediction time is the time before the seizure onset when the preictal state is detected.
In the literature, various methods have been proposed to address the seizure prediction problem, trying to reach high classification accuracy with early prediction. Since EEG signals differ across patients due to the variations in seizure type and location [4], most seizure prediction methods are patient-specific. In these methods, supervised learning techniques are used through two main stages, which are feature extraction and classification between preictal and interictal states. In [5], the authors categorize the feature extraction schemes in terms of localization into univariate and bivariate, and in terms of linearity into linear and nonlinear. Multiple features are sometimes combined to capture the brain
1932-4545 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
dynamics, which ends up increasing the dimensionality. The extracted features are used to train the classifier, which could then be used for the analysis of new EEG recordings to predict the occurrence of the seizure by detecting the preictal state.

Fig. 1. Brain states in a typical epileptic EEG recording

In the previous studies, the extracted features are categorized into three main groups: time domain, frequency domain and nonlinear features. The authors in [6] used some statistical measures like variance, skewness and kurtosis as time domain features. In [7], the authors calculated the spectral power of the EEG signals for frequency domain analysis. Some nonlinear features that are derived from dynamic systems theory were investigated, such as the Lyapunov exponent [8] and the dynamic similarity index [9]. Based on the selected features, a prediction scheme that detects the preictal brain state is implemented. Most of the previous work proposed machine learning based prediction schemes like the Support Vector Machine (SVM). The SVM classifier is used in numerous studies, like [7], [10], [11], to predict epileptic seizures. SVMs achieved outstanding results over other types of classifiers in terms of specificity and sensitivity [5].
Deep learning algorithms achieved great success in multiple classification problems for various applications like computer vision and speech recognition. Some previous work utilized deep learning in the classification stage of the seizure prediction problem. In [12], the authors applied a multi-layer perceptron to the extracted features. In [13] and [14], the authors used a convolutional neural network as a classifier that is applied to the features extracted from EEG data to predict seizures.
The main challenge of the previously proposed methods is to determine the most discriminative features that best represent each class. The computation time needed to extract these features depends on the process complexity and is considered another challenge, especially in real-time applications. Motivated by these challenges and by the significance of early and accurate seizure prediction, we developed deep learning based seizure prediction algorithms that combine the feature extraction and classification stages into a single automated framework.
In this paper, we aim at automatic extraction of the most important features by developing deep learning based algorithms without any preprocessing. A Multi-Layer Perceptron is applied to the raw EEG recordings as a simple architecture of multiple trainable hidden layers, then a Deep Convolutional Neural Network (DCNN) is used to learn the discriminative spatial features between interictal and preictal states. In another proposed model, a Bidirectional Long Short-Term Memory (Bi-LSTM) Recurrent Neural Network is concatenated to the DCNN to do the classification task. An Autoencoder (AE) based semi-supervised model is proposed and pre-trained using the transfer learning technique to enhance the model optimization and converge faster. For the system to be suitable for real-time usage, computation complexity should be considered; therefore we introduce a channel selection algorithm to select the best representing channels from the multi-channel EEG recording. The testing method used proves the robustness of the proposed algorithms over different seizures.

II. METHODOLOGY
In this paper, we propose four deep learning based models for the purpose of early and accurate seizure prediction, taking into account real-time operation. The seizure prediction problem is formulated as a classification task between interictal and preictal brain states, in which a true alarm is considered when the preictal state is detected within the predetermined preictal period, as shown in Fig. 1. In spite of the abundant research work done in seizure prediction, there is no standard duration for the preictal state. In our experiments, the preictal duration was chosen to be one hour before the seizure onset and the interictal duration was chosen to be at least four hours before or after any seizure, as in [15]. Raw EEG data, without any preprocessing and without handcrafted feature extraction, is used as the input to all the models. The discriminative features are learned automatically using the deep learning algorithms in order to reduce the overhead and speed up the classification task. Due to the limited number of seizures for each patient, there is an imbalance between preictal and interictal samples. The number of interictal samples is much larger than the number of preictal samples, and classifiers tend to be more accurate toward the class with the larger number of training samples [16]. In our experiments, we selected the number of interictal samples to be equal to the number of preictal samples to make the data balanced. The EEG signals are divided into non-overlapping five-second segments, and each segment is considered as a training batch.
In our first model, a Multi-layer Perceptron (MLP), a simple deep neural network, is trained on the selected patients to learn the network parameters that are able to do the classification task. The block diagram of the model is shown in Fig. 2. To enhance the classification accuracy, we propose the second model, which relies on a Deep Convolutional Neural Network (DCNN) that extracts the spatial features from different electrode locations and uses an MLP for the classification task, as illustrated in Fig. 3. In order to use the DCNN, EEG data is represented by a matrix in which one dimension is the number of channels and the other dimension is the time steps. In our third model, proposed in [17], the DCNN is utilized and concatenated with a Bidirectional Long Short-Term Memory (Bi-LSTM) network as the model back-end to do the classification, as shown in Fig. 4. LSTM networks are known for their excellence in learning temporal features while maintaining long-time sequence dependencies, which helps in early prediction. Prediction problems are handled better using Bi-LSTM as it uses information from both previous and next time instances. For the sake of training time reduction, we developed the fourth model, which implements a Deep Convolutional Autoencoder (DCAE) architecture. In DCAE, we pre-trained the model front-end, DCNN, in an
unsupervised manner. Then, the training process is launched with some initial values that help the network to converge faster and enhance the network optimization, which in turn reduces the training time and increases the accuracy. The transfer learning approach is used to train the DCAE to improve the generalization across different seizures for the same patient. After training the AE, the trained encoder is connected to the Bi-LSTM network for classification. Fig. 5 illustrates the two parts of the DCAE model. We propose a channel selection algorithm to reduce the number of EEG channels, which in turn reduces the computation complexity and allocated memory, making the system suitable for real-time application.

A. Dataset
In this paper, we trained the proposed models and evaluated their performance on the CHB-MIT EEG dataset recorded at Children's Hospital Boston [18], [19], which is publicly available [20]. The dataset is composed of long-term scalp EEG data for 22 pediatric subjects with intractable seizures and one recording with missing data. The recordings were taken during several days after anti-seizure medication withdrawal to characterize the subjects' seizures and evaluate their candidacy for surgical intervention. Most cases have EEG recordings from surface electrodes of 23 channels in accordance with the International 10-20 system. The sampling rate of the acquired EEG signals is 256 samples per second with 16-bit resolution.
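As a rough illustration of the segmentation and class balancing described in Section II, the following sketch splits a recording into non-overlapping five-second segments at 256 samples per second, so that each segment holds 1280 time steps per channel, and then randomly keeps as many interictal segments as there are preictal ones. The function names `segment_eeg` and `balance_classes`, the array layout (time steps by channels), the random seed and the short synthetic recordings are assumptions made for this example; they are not taken from the authors' implementation.

```python
import numpy as np

FS = 256          # sampling rate reported for the recordings (samples/s)
SEGMENT_SEC = 5   # segment length stated in Section II
N_CHANNELS = 23   # number of channels in most recordings


def segment_eeg(recording, fs=FS, seconds=SEGMENT_SEC):
    """Split a (n_samples, n_channels) array into non-overlapping segments.

    Trailing samples that do not fill a whole segment are dropped.
    Returns an array of shape (n_segments, fs * seconds, n_channels).
    """
    seg_len = fs * seconds
    n_segments = recording.shape[0] // seg_len
    trimmed = recording[: n_segments * seg_len]
    return trimmed.reshape(n_segments, seg_len, recording.shape[1])


def balance_classes(preictal, interictal, seed=0):
    """Randomly keep as many interictal segments as there are preictal ones."""
    rng = np.random.default_rng(seed)
    keep = rng.choice(len(interictal), size=len(preictal), replace=False)
    return preictal, interictal[keep]


# Short synthetic stand-ins (60 s preictal, 240 s interictal) to keep memory low.
preictal = segment_eeg(np.zeros((60 * FS, N_CHANNELS), dtype=np.float32))
interictal = segment_eeg(np.zeros((240 * FS, N_CHANNELS), dtype=np.float32))
preictal, interictal = balance_classes(preictal, interictal)
print(preictal.shape, interictal.shape)
```

At this segment length, a one-hour preictal period corresponds to 3600 / 5 = 720 segments per seizure.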
There are some variations in many factors between all subjects, such as interictal period, preictal period, number of channels, and recording continuity. Therefore, we chose eight subjects in this study such that the pre-determined interictal and preictal periods are satisfied, the recordings are not interrupted and the full channels' recordings are available. Table I summarizes the details about the EEG recordings used in our experiments.

Fig. 2. Block Diagram of MLP based Seizure predictor (multi-channel EEG → MLP, the deep learning based classifier → preictal / interictal)
TABLE I
INFORMATION OF EEG DATA OF THE SELECTED PATIENTS

Case   ID-Gender-Age   Number of Seizures   Total Seizures Time (s)
1      1-F-11          7                    442
2      3-F-14          7                    402
3      7-F-14.5        3                    325
4      9-F-10          4                    276
5      10-M-3          7                    447
6      20-F-6          8                    294
7      21-F-13         4                    199
8      22-F-9          3                    204

Fig. 3. Block Diagram of DCNN + MLP based Seizure predictor (multi-channel EEG → EEG matrix → DCNN → MLP → preictal / interictal)
Fig. 4. Block Diagram of DCNN + Bi-LSTM based Seizure predictor (multi-channel EEG → EEG matrix → DCNN → Bi-LSTM RNN → preictal / interictal)

B. Multi-Layer Perceptron
The Multilayer Perceptron (MLP) is considered one of the most widely used artificial neural networks (ANNs). An MLP usually consists of three successive types of layers: input layer, hidden layers, and output layer [21]. Deep ANNs are composed of multiple hidden layers that enable the network to learn the features better using non-linear activation functions. The ANN idea is motivated by the structure of the human brain's neural system. A typical ANN is a buildup of connected units called neurons. These artificial neurons incorporate the received data and transmit it to the other associated neurons, much like the biological neurons in the brain. The output of a neuron in any ANN is computed by applying a linear or non-linear activation function to the weighted sum of the neurons' outputs in the preceding layer. When the ANN is used as a classifier, the final output at the output layer indicates the predicted class of the corresponding input data.
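The neuron computation described above (an activation function applied to the weighted sum of the preceding layer's outputs) can be sketched for one fully connected layer as follows. The helper name `dense_forward`, the toy weights and the choice of `tanh` as the non-linear activation are illustrative assumptions only.

```python
import numpy as np


def dense_forward(prev_outputs, weights, bias, activation):
    """One fully connected layer: activation(weighted sum + bias)."""
    return activation(prev_outputs @ weights + bias)


def identity(z):
    """Linear activation: returns the weighted sum unchanged."""
    return z


# Three neurons in the preceding layer feed two neurons in this layer.
prev_outputs = np.array([1.0, 2.0, 3.0])
weights = np.array([[0.5, -1.0],
                    [0.0,  1.0],
                    [1.0,  0.0]])   # shape: (inputs, neurons)
bias = np.array([0.5, -2.0])

linear_out = dense_forward(prev_outputs, weights, bias, identity)
nonlinear_out = dense_forward(prev_outputs, weights, bias, np.tanh)
print(linear_out, nonlinear_out)   # weighted sums are 4.0 and -1.0 before tanh
```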
Fig. 5. Block Diagram of the semi-supervised DCAE + Bi-LSTM model, (a) pre-training phase of DCAE to generate the reconstructed EEG signals from the latent space representation through unsupervised learning and (b) pre-trained classifier that predicts seizures through supervised learning

In our first proposed seizure prediction model, Fig. 2, we apply the raw EEG after segmentation to an MLP with four hidden layers, as depicted in Fig. 6. The number of units in each layer is 300, 100, 50, 20, starting from the first hidden layer to the fourth one. The total number of trainable parameters is 8,870,291, which is considered high due to the fully connected architecture. The model is trained with backpropagation and
optimized using the RMSprop algorithm. The loss function used is the binary cross entropy defined by (1).

$$ l(y, \hat{y}) = -\left[ y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \right] \tag{1} $$

where $y$ and $\hat{y}$ are the desired output and the calculated output, respectively, and $l(y, \hat{y})$ is the loss function.
The Rectified Linear Unit (ReLU) activation function [22], as defined by (2), is used across the hidden layers to add nonlinearity and to ensure robustness against noise in the input data.

$$ f(x) = \begin{cases} x & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases} \tag{2} $$

where $x$ is the sum of the weighted input signals and $f(x)$ is the ReLU activation function.
The sigmoid activation function (3) is selected for the output layer to predict the input data class.

$$ p_i = \frac{1}{1 + e^{-x_i}} \tag{3} $$

where $x_i$ is the sum of the weighted input signals and $p_i$ is the probability of the input example being preictal.

Fig. 6. The architecture of the proposed MLP based classifier

C. Deep Convolutional Neural Network
Convolutional Neural Networks (CNNs) have shown great success in different pattern recognition and computer vision applications [23]. This is due to the ability of the CNN to automatically extract significant spatial features that best represent the data from its raw form without any preprocessing and without any human decision in selecting these features [24]. The sparse connectivity and parameter sharing of the CNN give it an advantage regarding the memory footprint, as it requires much less memory to store the sparse weights. The equivariant representation property of the CNN increases the detection accuracy of a pattern when it exists in a different location across the image [25]. A typical CNN is formed of three types of layers: convolution layer, pooling layer and fully connected layer. The convolution layer is used to generate the feature map by applying filters with trainable weights to the input data. This feature map is then down-sampled by applying the pooling layer to reduce the features' dimension and therefore the computational complexity. Finally, the fully connected layer is applied to all of the preceding layer's output to generate the one-dimensional feature vector. The CNN is used as a feature extractor to replace the complex feature engineering used in previous work.
The proposed DCNN architecture model is shown in Fig. 7, in which the EEG segment is converted into a 2D matrix to be suitable for the DCNN. The architecture consists of four convolutional layers interleaved with three maximum pooling layers. We chose the number of kernels in each convolution layer to be 32 with a kernel size of 3x2 to cover the non-square matrix of EEG data. The maximum pooling layers have a pool size of 2x2. The ReLU activation function is used across all the convolutional layers. The Batch Normalization technique [26] is used to improve the training speed and reduce overfitting through adding some noise to each layer's activation.
The Batch Normalization Transform is defined as:

$$ BN_{\gamma,\beta}(x_i) = \gamma \, \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} + \beta \tag{4} $$

where $x_i$ is the vector to be normalized in a mini-batch $B = \{x_1, x_2, \ldots, x_m\}$. $\mu_B$ and $\sigma_B^2$ are the mean and variance of the current mini-batch of $x_i$, respectively. $\epsilon$ is a constant added to the mini-batch variance for numerical stability. $\gamma$ and $\beta$ are learned parameters used to scale and shift the normalized value, respectively [26].
The proposed DCNN architecture is used as the front-end feature extractor in our three proposed models in Fig. 3, 4, 5(b), which helps in spatial feature extraction from the different electrode positions on the scalp. The number of trainable parameters is drastically decreased when employing the DCNN due to the weight sharing property. The number of trainable parameters in the second model, DCNN + MLP, is almost 520K, while in the third and fourth models, DCNN + Bi-LSTM and DCAE + Bi-LSTM, the number of trainable parameters is almost 28K.

D. Bidirectional-LSTM Recurrent Neural Network
A recurrent neural network (RNN) is a type of neural network that can maintain state along the sequential inputs. It can process a temporal sequence of data depending on the processing done on the previous sequences. This property makes the RNN suitable for applications like prediction of time series data. The typical RNN architecture is trained using backpropagation through time (BPTT), which has some drawbacks like exploding and vanishing gradients and information morphing.
Long Short Term Memory Networks (LSTMs) [27] are a type of RNN, implemented to overcome the problems of the basic RNN. LSTMs are able to solve the problem of vanishing gradients by maintaining the gradient values during the training process and backpropagating them through layers and time; thus the LSTM has the capability of learning long-term dependencies. The LSTM cell, as shown in Fig. 8, consists of three controlling
Fig. 7. The architecture of the proposed DCNN front-end in DCNN based models. Feature-map sizes shown in the figure: multi-channel EEG → EEG matrix 1280x23 → 1st Conv 32@1279x22 → 1st Pooling 32@639x11 → 2nd Conv 32@638x10 → 2nd Pooling 32@319x5 → 3rd Conv 32@318x4 → 3rd Pooling 32@159x2 → 4th Conv 32@158x1
gates that could store or forget the previous state and use or discard the current state. Any LSTM cell computes two states at each time step: a cell state (c) that could be maintained for long time steps and a hidden state (h) that is the new output of the cell at each time step. The mathematical expressions governing the cell gates' operation are defined as follows:

$$ f_t = \sigma(W_{fh} h_{t-1} + W_{fx} x_t + b_f) \tag{5} $$
$$ i_t = \sigma(W_{ih} h_{t-1} + W_{ix} x_t + b_i) \tag{6} $$
$$ o_t = \sigma(W_{oh} h_{t-1} + W_{ox} x_t + b_o) \tag{7} $$
$$ \tilde{c}_t = \tanh(W_{ch} h_{t-1} + W_{cx} x_t + b_c) \tag{8} $$
$$ c_t = f_t \circ c_{t-1} + i_t \circ \tilde{c}_t \tag{9} $$
$$ h_t = o_t \circ \tanh(c_t) \tag{10} $$

where $x_t$ is the input at time $t$, and $c_t$ and $h_t$ are the cell state and the hidden state at time $t$, respectively. $W$ and $b$ denote weight and bias parameters, respectively. $\sigma$ is the sigmoid function and $\circ$ is the Hadamard product operator. $\tilde{c}_t$ is a candidate for updating $c_t$ through the input gate.
The input gate $i_t$ decides whether to update the cell with a new cell state $\tilde{c}_t$, while the forget gate $f_t$ decides what to keep or forget from the previous cell state, and finally the output gate $o_t$ decides how much information is passed to the next cell.

Fig. 8. Basic LSTM cell

Instead of using LSTM as the classifier, we used a Bidirectional-LSTM (Bi-LSTM) network [28] in which each LSTM block is replaced by two blocks that process the temporal sequence simultaneously in two opposite directions, as depicted in Fig. 9. In the forward pass block, the feature vector generated from the DCNN is processed starting from its first time instance to the end, while the backward pass block processes the same segment in the reverse order. The network output at each time step is the combined outputs of the two blocks at this time step. In addition to the previous context processing in standard LSTM, Bi-LSTM processes the future context, which enhances the prediction results. Using Bi-LSTM as a classifier enhances the prediction accuracy through extracting the important temporal features in addition to the spatial features extracted by the DCNN.

Fig. 9. The unrolled Bidirectional LSTM Network

Bi-LSTM is used in two proposed models, in Fig. 4 and 5(b), as the back-end classifier that works on the feature vector generated by the DCNN. The proposed network consists of a single bidirectional layer that predicts the class label at the last time instance after processing all the EEG segments, as shown in Fig. 9. We chose the number of units, the dimensionality of the output space, to be 20. The dropout regularization technique is utilized to avoid overfitting. The dropout is applied to the input and the recurrent state with factors of 10% and 50%, respectively. The sigmoid activation function is used for prediction of the EEG segment's class and RMSprop is selected for optimization.

E. Deep Convolutional Autoencoder
Autoencoders (AEs) are unsupervised neural networks whose target is to find a lower dimensional representation of the input data. This technique has many applications like data compression [29], dimensionality reduction [30], visualizing high dimensional data [31] and removing noise from the input data. The AE network has two main parts, namely the encoder and the decoder. The encoder compresses the high dimensional input data into a lower dimensional representation called the latent space representation or bottleneck, and the decoder retrieves the data back to its original dimension. The simple AE uses fully connected layers for the encoder and decoder. The aim is to learn the parameters that minimize the cost function, which expresses the difference between the original data and the
Fig. 10. The architecture of the proposed DCAE. C stands for convolution, P for pooling, D for deconvolution and U for upsampling layer. Encoder sizes shown in the figure: input EEG signal → 1st C 1278x22 → 1st P 639x11 → 2nd C 637x10 → 2nd P 318x5 → 3rd C 316x4 → 3rd P 158x2 → 4th C 156x1. The decoder (1st D, 1st U, 2nd D, 2nd U, 3rd D, 3rd U) expands through the sizes 158x2, 316x4, 318x5, 636x10, 639x11 and 1278x22 to the reconstructed EEG signal.
retrieved one. The Deep Convolutional Autoencoder (DCAE) replaces the fully connected layers in the simple AE with convolution layers.
Due to the limited EEG dataset for each patient, we decided to extend our work to develop an unsupervised training algorithm using DCAE, as shown in Fig. 5(a). The proposed architecture of the DCAE model is depicted in Fig. 10. We used the same proposed DCNN model as an encoder and added the decoder network to build the DCAE. Unsupervised learning is deployed using the transfer learning technique by training the DCAE on all the selected patients' data (not patient-specific). Transfer learning helps to obtain better generalization and to enhance the optimization of our prediction model, thereby reducing the training time.
In the DCAE, Fig. 10, the encoder part consists of alternating convolution and pooling layers, while in the decoder part, the deconvolution and upsampling layers are used to reconstruct the original EEG segment. The encoder output is the latent space representation, which consists of low dimensional features that best represent the EEG input segment. On the other hand, the decoder output is the reconstructed version of the original input. The learned encoder parameters are saved to be used later for training the prediction model in Fig. 5(b), allowing the training process to have a good starting point instead of a random initialization of the parameters, which reduces the training time drastically.
Training of the DCAE is done using unlabeled EEG segments (balanced data of preictal and interictal segments) of all the selected patients. The ReLU activation function is used across all the convolutional layers. The Batch Normalization technique is used to improve the training speed and to reduce overfitting. The DCAE is optimized using the RMSprop optimizer. The mean square error is utilized as the cost function and is defined as

$$ J(\Theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( \acute{x}^{(i)} - x^{(i)} \right)^2 \tag{11} $$

where $x^{(i)}$ is the input EEG signal and $\acute{x}^{(i)}$ is the reconstructed EEG signal. $m$ is the number of training examples and $\Theta$ is the set of parameters being learned.
After DCAE training, the pre-trained encoder is used as the front-end of the fourth proposed model, DCAE + Bi-LSTM, as shown in Fig. 5(b), while the back-end is the Bi-LSTM network. We used the same network architecture of the DCNN and Bi-LSTM that is used in the third model (DCNN + Bi-LSTM). Training of this model is done in a supervised manner to predict the patient-specific seizure onset. Since we used both unsupervised and supervised learning algorithms, this model is considered a semi-supervised learning model.

F. EEG Channel Selection
We introduce an EEG channel selection algorithm to select the most important and informative EEG channels related to our problem. Decreasing the number of channels helps with reducing the features' dimension, the computation load and the required memory, so that the model is suitable for real-time application. The proposed channel selection algorithm is explained in Table II. We provide the algorithm with the EEG preictal segments for each patient and the prediction accuracy measured by running our fourth model, DCAE + Bi-LSTM, using all channels. The algorithm outputs the reduced set of channels that gives the same accuracy by omitting redundant or irrelevant channels. We start by computing the statistical variance defined by (12) and the entropy defined by (13) for all the available channels (23 channels) of the preictal segments. Then, we select the channels with the highest variance-entropy product that provide the same given prediction accuracy. This is done through an iterative process by training our model on the reduced channels over each iteration. The variance is estimated as

$$ \sigma^2(X_c) = \frac{1}{N} \sum_{i=1}^{N} \left( x_c(i) - \mu_c \right)^2 \tag{12} $$

where $X_c$, $\mu_c$ and $N$ are the EEG data after normalization, the mean and the number of samples of channel $c$, respectively. The entropy of channel $c$ is calculated as

$$ H(X_c) = -\sum_{i=1}^{N} p(x_c(i)) \log_2 p(x_c(i)) \tag{13} $$

where $p(x_c(i))$ is the probability mass function of the channel $c$ having $N$ samples.
In the channel selection algorithm, we chose the channels with the highest variance-entropy product because we want to maximize both. We want to select the channels that have a high variance during the preictal interval and also provide the largest amount of information.
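The ranking and the iterative search described in this subsection can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' code: the probability mass function in (13) is estimated here with a fixed-width histogram (`n_bins` is an assumed parameter, since the text does not say how $p$ is obtained), and `evaluate` is a hypothetical callback standing in for training and testing DCAE + Bi-LSTM on a channel subset. The starting size of eight channels follows Table II.

```python
import numpy as np


def variance_entropy_scores(segments, n_bins=64):
    """Score each channel by variance * entropy, in the spirit of (12) and (13).

    segments: array (n_segments, n_samples, n_channels) of normalized preictal EEG.
    The entropy uses a histogram estimate of the amplitude distribution (assumption).
    """
    n_channels = segments.shape[-1]
    flat = segments.reshape(-1, n_channels)
    scores = np.empty(n_channels)
    for ch in range(n_channels):
        x = flat[:, ch]
        counts, _ = np.histogram(x, bins=n_bins)
        p = counts[counts > 0] / counts.sum()      # empty bins contribute nothing
        entropy = -np.sum(p * np.log2(p))
        scores[ch] = np.var(x) * entropy           # np.var divides by N, as in (12)
    return scores


def select_channels(segments, full_accuracy, evaluate, start=8):
    """Grow the channel set, best-scored first, until accuracy is matched."""
    order = np.argsort(variance_entropy_scores(segments))[::-1]
    for m in range(start, len(order) + 1):
        chosen = order[:m]
        if evaluate(chosen) >= full_accuracy:
            return chosen
    return order  # no smaller subset matched: keep all channels


# Synthetic demo: channel spread grows geometrically, so channel 22 ranks first.
rng = np.random.default_rng(0)
demo = rng.normal(size=(10, 1280, 23)) * (1.2 ** np.arange(23))


def fake_evaluate(channels):
    """Hypothetical stand-in for model training: 10 channels are 'enough'."""
    return 0.99 if len(channels) >= 10 else 0.90


chosen = select_channels(demo, full_accuracy=0.99, evaluate=fake_evaluate)
print(len(chosen), chosen[:3])
```

In practice `evaluate` would dominate the cost, because every iteration retrains the model; the scoring step itself is a single pass over the preictal data.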
TABLE II
THE PROPOSED EEG CHANNEL SELECTION ALGORITHM

Algorithm 1: EEG channel selection algorithm
Input: Eight patients' EEG preictal segments, the seizure prediction accuracy for each patient using all channels Acc[1 : 8]
Output: Chred[1 : 8] (array of reduced channels for each patient that give the same accuracy Acc)
Initialization: m = 8 (initial number of channels), done = 0
for patient ← 1 to 8 do
    m ← 8, done ← 0;
    for ch ← 1 to 23 do
        compute variance[ch];
        compute entropy[ch];
        compute variance[ch] * entropy[ch];
    end
    sort the 23 channels with highest variance-entropy product first in Temp;
    while done ≠ 1 do
        select first m channels from Temp;
        train and test the model with m channels;
        compute the prediction accuracy Accnew;
        if Accnew ≥ Acc[patient] then
            done ← 1;
            Chred[patient] ← Temp[1 : m];
        else
            m ← m + 1;
        end
    end
end

G. Training and Testing Method
In order to overcome the problem of the imbalanced dataset, we selected the number of interictal segments to be equal to the available number of preictal segments during the training process. The interictal segments were selected at random from the overall interictal samples. To ensure robustness and generality of the proposed models, we used the leave-one-out cross validation (LOOCV) technique as the evaluation method for all of our proposed models. In LOOCV, the training is done N separate times, where N is the number of seizures for a specific patient. Each time, all seizures are involved in the training process except one seizure on which the testing is applied. The process is then repeated by changing the seizure under test. By using this method, we ensure that the testing covers all the seizures and that the tested seizures are unseen during the training. The performance for one patient is the average across the N trials and the overall performance is the average across all patients. 80% of the training data is assigned to the training set while 20% is assigned to the validation set over which the hyperparameters are updated and the model is optimized.

We evaluated the performance of our models by calculating some measures such as sensitivity, specificity, and accuracy on the test data. These measures are averaged across all patients. The prediction time of each model is recorded at the time of first preictal segment detection. The evaluation measures are defined as follows:

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{14}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{15}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{16}$$

where TN, TP, FN and FP are the true negative, true positive, false negative and false positive counts, respectively.

III. RESULTS

A. Performance Evaluation and Analysis
We evaluated our proposed patient-specific models on the selected patients by calculating performance measures such as prediction accuracy, prediction time, sensitivity, specificity and false alarms per hour. The training time is also computed to evaluate our proposed channel selection algorithm. Table III shows the obtained values of these measures for the four proposed models, which are MLP, DCNN + MLP, DCNN + Bi-LSTM and DCAE + Bi-LSTM. The fifth model, DCAE + Bi-LSTM + CS, is the same as the fourth one but uses the channel selection algorithm.

As can be seen from Table III, MLP has the worst accuracy, sensitivity, specificity and false alarm rate among the proposed models. This is because the learning process in this model aims at updating the network parameters for the output to be close to the ground truth without extracting any features from the input data. The huge number of parameters in this model (around 9 million) is another drawback. The training time is moderate (7.3 min) due to network simplicity.

Fig. 11. The measured accuracy among three different proposed algorithms. (Plot: accuracy (%) versus patient ID 1, 3, 7, 9, 10, 20, 21, 22 for MLP, DCNN+MLP and DCNN+BiLSTM.)

Fig. 12. The measured sensitivity among three different proposed algorithms. (Plot: sensitivity (%) versus patient ID 1, 3, 7, 9, 10, 20, 21, 22 for MLP, DCNN+MLP and DCNN+BiLSTM.)
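The leave-one-out protocol of Section G and the measures in (14)-(16) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the names `loocv`, `metrics` and `fit_threshold` are hypothetical, the toy threshold classifier stands in for the deep networks, and the assumption that interictal segments are partitioned into one fold per seizure is ours (the paper does not state how interictal test data are chosen).

```python
import random
from statistics import mean


def metrics(tp, tn, fp, fn):
    """Sensitivity, specificity and accuracy as defined in (14)-(16)."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


def loocv(preictal_by_seizure, interictal_by_fold, fit, seed=0):
    """Leave one seizure out per trial and average the measures over trials.

    Assumption: interictal segments are pre-split into one fold per seizure.
    """
    rng = random.Random(seed)
    n = len(preictal_by_seizure)
    per_fold = []
    for k in range(n):  # seizure k is held out for testing
        pre = [x for i in range(n) if i != k for x in preictal_by_seizure[i]]
        pool = [x for i in range(n) if i != k for x in interictal_by_fold[i]]
        # Class balancing: draw as many interictal as preictal segments.
        inter = rng.sample(pool, min(len(pre), len(pool)))
        data = [(x, 1) for x in pre] + [(x, 0) for x in inter]
        rng.shuffle(data)
        cut = int(0.8 * len(data))  # 80% training, 20% validation
        predict = fit(data[:cut], data[cut:])
        test = [(x, 1) for x in preictal_by_seizure[k]]
        test += [(x, 0) for x in interictal_by_fold[k]]
        tp = sum(1 for x, y in test if y == 1 and predict(x) == 1)
        fn = sum(1 for x, y in test if y == 1 and predict(x) == 0)
        tn = sum(1 for x, y in test if y == 0 and predict(x) == 0)
        fp = sum(1 for x, y in test if y == 0 and predict(x) == 1)
        per_fold.append(metrics(tp, tn, fp, fn))
    return {name: mean(m[name] for m in per_fold) for name in per_fold[0]}


def fit_threshold(train, val):
    """Toy stand-in for a network: threshold halfway between the class means.

    The validation split is unused here; a real model would tune on it.
    """
    pos = mean(mean(x) for x, y in train if y == 1)
    neg = mean(mean(x) for x, y in train if y == 0)
    thr = (pos + neg) / 2
    return lambda x: 1 if mean(x) > thr else 0


# Synthetic, clearly separable data: 3 seizures, 4 segments per class and fold.
preictal = [[[1.0, 1.2, 0.9]] * 4 for _ in range(3)]
interictal = [[[0.0, 0.1, -0.1]] * 4 for _ in range(3)]
result = loocv(preictal, interictal, fit_threshold)
print(result)
```

Averaging per-trial measures, rather than pooling the confusion counts, mirrors the statement that the performance for one patient is the average across the N trials.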
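The channel selection procedure of Algorithm 1 (Table II) can be illustrated for a single patient as follows. This is a hedged sketch under several assumptions of ours: entropy is estimated as the Shannon entropy of a fixed-width histogram (the estimator is not specified in this section), `evaluate` is a hypothetical callback that trains and tests a model on the given channels and returns its accuracy, and the search stops once all channels are in use, a guard that Algorithm 1 does not state.

```python
import math
from statistics import pvariance


def shannon_entropy(samples, bins=16):
    """Histogram-based Shannon entropy in bits (an assumed estimator)."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        return 0.0
    counts = [0] * bins
    for v in samples:
        idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in counts if c)


def rank_channels(preictal):
    """Sort channel names by variance * entropy, highest product first."""
    score = {ch: pvariance(x) * shannon_entropy(x) for ch, x in preictal.items()}
    return sorted(score, key=score.get, reverse=True)


def select_channels(preictal, evaluate, target_acc, m_start=8):
    """Grow the channel set until the all-channel accuracy is matched."""
    ranked = rank_channels(preictal)
    m = min(m_start, len(ranked))
    while True:
        chosen = ranked[:m]
        # Added guard: stop when every channel is already selected.
        if evaluate(chosen) >= target_acc or m == len(ranked):
            return chosen
        m += 1


# Toy preictal data for five hypothetical channels.
demo = {
    "A": [0.0, 0.0, 0.0, 0.0],
    "B": [0.0, 1.0, 0.0, 1.0],
    "C": [0.0, 1.0, 2.0, 3.0],
    "D": [0.0, 10.0, 0.0, 10.0],
    "E": [0.0, 2.0, 0.0, 2.0],
}
# Stand-in for "train and test the model with m channels".
acc_by_count = {2: 0.90, 3: 0.95, 4: 0.99, 5: 0.99}
chosen = select_channels(demo, lambda chs: acc_by_count[len(chs)], 0.99, m_start=2)
print(chosen)
```

In the paper the search starts from m = 8 of the 23 channels; the demo uses `m_start=2` only because the toy data has five channels.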
Fig. 13. The measured specificity among three proposed algorithms. (Plot: specificity (%) versus patient ID 1, 3, 7, 9, 10, 20, 21, 22 for MLP, DCNN+MLP and DCNN+BiLSTM.)

Fig. 14. The measured false alarm rate among three proposed algorithms. (Plot: false alarms per hour versus patient ID 1, 3, 7, 9, 10, 20, 21, 22 for MLP, DCNN+MLP and DCNN+BiLSTM.)

Fig. 15. The measured training time on the test set among five proposed algorithms: MLP, DCNN + MLP, DCNN + Bi-LSTM, DCAE + Bi-LSTM and DCAE + Bi-LSTM + CS. (Plot: training time (min) versus patient ID 1, 3, 7, 9, 10, 20, 21, 22.)

By introducing the DCNN as a front-end, we found around 10% enhancement in the accuracy, sensitivity and specificity, and the false alarm rate is improved by 60%. This improvement is due to the ability of the DCNN to extract the spatial features across different scalp positions and use them to discriminate between preictal and interictal brain states. On the other hand, the training time is increased by 5 min due to the added computation complexity of the DCNN. The network parameters are drastically decreased because of the parameter sharing and sparse connectivity properties of the DCNN. In our third model, we used Bi-LSTM as the back-end along with the DCNN, and this model increases the accuracy to 99.6%, the sensitivity to 99.72% and the specificity to 99.6%. The false alarm rate is greatly improved, reaching 0.004 false alarms per hour. This improvement is due to using Bi-LSTM as a classifier instead of MLP. Bi-LSTM extracts temporal features from the input sequence, which helps predict seizures more accurately at the cost of training time, which reached 14.2 min. The number of parameters is decreased by 94% by getting rid of the MLP. In the fourth model, DCAE is used to train the front-end part of our model. This improves the network optimization by starting the training with an initial set of parameters that makes the convergence process faster. As a result, the training time decreased to 4.25 min on average with the same highest performance. Utilizing the transfer learning technique reduces overfitting and generalizes better.

The proposed channel selection algorithm reduces the number of channels to 10 on average among all the selected patients, instead of using all 23 channels. Therefore, the computation complexity is reduced, bringing the training time down to 2.2 min on average with the lowest number of parameters, around 18K, which makes this model suitable for real-time applications. All the obtained results are shown graphically for the different models across the selected patients in Figs. 11–15.

Regarding the prediction time, all the proposed models were able to accurately predict the tested seizures from the start of the preictal segments; thus the prediction time is one hour before the seizure onset, or less in case of a shorter preictal segment.

B. Statistical Analysis
We performed the Kruskal-Wallis test [32] as a nonparametric test to compare the accuracy, sensitivity, specificity and false alarm rate of the three basic models, which are MLP, DCNN + MLP and DCNN + Bi-LSTM. The Kruskal-Wallis test yielded p-value < 0.05 for all the performance measures, indicating a statistically significant difference between the results of the compared models: p-value = 0.01 for the accuracy, p-value = 0.006 for the sensitivity, p-value = 0.04 for the specificity, and p-value = 0.04 for the false alarm rate.

C. Comparison with Other Methods
For further evaluation of our proposed method, we compared our achieved experimental results with previous work that used the same dataset, as shown in Table IV. While the same criterion to select the patients from the dataset is applied in this paper and [10], the other compared work employed different criteria, which led to a different selection of patients. In the presented previous work, some features were extracted, like the Zero-Crossing (ZC) interval in the EEG signals as in [33], ZC of the Wavelet Transform (WT) coefficients of the EEG signals as in [10], WT of the EEG signals as in [13], spectral power as in [34] and a set of features in the time domain, frequency domain and from graph theory as in [11]. These studies used machine learning based classifiers like SVM or Gaussian Mixture Model (GMM). The authors in [13] used CNN as a classifier. The proposed method achieved the highest accuracy, sensitivity and specificity among them. Our prediction time is the earliest and the false alarm rate is the lowest.

TABLE III
PERFORMANCE EVALUATION OF THE PROPOSED MODELS

| Proposed Model | Sensitivity | Specificity | Accuracy | False Alarm (h^-1) | Training Time (min) | No. of Parameters |
|---|---|---|---|---|---|---|
| MLP | 84.67% | 82.60% | 83.63% | 0.174 | 7.3 | 8,870,291 |
| DCNN + MLP | 95.41% | 92.80% | 94.10% | 0.072 | 12.5 | 520,477 |
| DCNN + Bi-LSTM | 99.72% | 99.60% | 99.66% | 0.004 | 14.2 | 27,657 |
| DCAE + Bi-LSTM | 99.72% | 99.60% | 99.66% | 0.004 | 4.25 | 27,657 |
| DCAE + Bi-LSTM + CS | 99.72% | 99.60% | 99.66% | 0.004 | 2.2 | 18,345 |

TABLE IV
COMPARISON WITH OTHER SEIZURE PREDICTION METHODS APPLIED TO CHB-MIT DATASET

| Ref. | Method (Feature Extraction / Classification) | Data Selection | Sensitivity | Specificity | Accuracy | False Alarm (h^-1) | Prediction Time |
|---|---|---|---|---|---|---|---|
| [33] | ZC Interval / GMM | LPOCV | 83.81% | N/A | N/A | 0.165 | 19.8 min |
| [10] | ZC in WT / SVM | LPOCV | 96% | 90% | 94% | N/A | N/A |
| [11] | Time, Freq., Graph / SVM | 10-fold CV | 85.75% | 85.75% | 85.75% | N/A | N/A |
| [13] | WT / CNN | 10-fold CV | 87.8% | N/A | N/A | 0.147 | 5.8 min |
| [34] | Spectral Power / SVM | LOOCV | 98.68% | N/A | N/A | 0.046 | 42.7 min |
| Ours | DCAE + Bi-LSTM | LOOCV | 99.72% | 99.60% | 99.66% | 0.004 | 1 hr |

IV. CONCLUSION
In this paper, a novel deep learning based patient-specific epileptic seizure prediction method using long-term scalp EEG data has been proposed. This method achieves a prediction
accuracy of 99.6%, a sensitivity of 99.72%, a specificity of 99.60%, a false alarm rate of 0.004 per hour and a prediction time of one hour prior to the seizure onset. Important spatial and temporal features are learned from the raw data by the DCNN and Bi-LSTM networks, respectively. A DCAE based semi-supervised learning approach is investigated with the transfer learning technique, which led to reducing the training time. For the system to be suitable for real-time application, a channel selection algorithm is proposed which reduces the computational load and the training time. Using the leave-one-out exhaustive cross-validation technique to test the proposed models demonstrates the robustness and generality of our method against variation across various seizure types.

Our experimental results and the comparison with previous work demonstrate that the proposed method is efficient, reliable and suitable for real-time application of seizure prediction. It achieves accuracy higher than the state of the art with an earlier prediction time, which helps mitigate the potential life-threatening incidents for epileptic patients.

REFERENCES
[1] R. S. Fisher et al., “ILAE Official Report: A practical clinical definition of epilepsy,” Epilepsia, vol. 55, no. 4, pp. 475–482, Apr. 2014.
[2] World Health Organization, Neurological Disorders: Public Health Challenges. World Health Organization, 2006.
[3] Cheng-Yi Chiang, Nai-Fu Chang, Tung-Chien Chen, Hong-Hui Chen, and Liang-Gee Chen, “Seizure prediction based on classification of EEG synchronization patterns with on-line retraining and post-processing scheme,” 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Boston, MA, pp. 7564–7569, 2011.
[4] “Epilepsy prevalence, incidence and other statistics,” Joint Epilepsy Council, Leeds, UK, 2005.
[5] E. Bou Assi, D. K. Nguyen, S. Rihana, and M. Sawan, “Towards accurate prediction of epileptic seizures: A review,” Biomed. Signal Process. and Control, vol. 34, pp. 144–157, Apr. 2017.
[6] A. Aarabi, R. Fazel-Rezai, and Y. Aghakhani, “EEG seizure prediction: Measures and challenges,” Annu. Int. Conf. IEEE Eng. Med. Biol. Soc., pp. 1864–1867, Sep. 2009.
[7] M. Bandarabadi, C. A. Teixeira, J. Rasekhi, and A. Dourado, “Epileptic seizure prediction using relative spectral power features,” Clin. Neurophysiol., vol. 126, no. 2, pp. 237–248, Feb. 2015.
[8] L. D. Iasemidis, J. C. Sackellares, H. P. Zaveri, and W. J. Williams, “Phase space topography and the Lyapunov exponent of electrocorticograms in partial seizures,” Brain Topogr., vol. 2, no. 3, pp. 187–201, 1990.
[9] M. Le Van Quyen, J. Martinerie, M. Baulac, and F. Varela, “Anticipating epileptic seizures in real time by a non-linear analysis of similarity between EEG recordings,” Neuroreport, vol. 10, no. 10, pp. 2149–2155, Jul. 1999.
[10] S. Elgohary, S. Eldawlatly, and M. I. Khalil, “Epileptic seizure prediction using zero-crossings analysis of EEG wavelet detail coefficients,” IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), pp. 1–6, 2016.
[11] K. M. Tsiouris, V. C. Pezoulas, D. D. Koutsouris, M. Zervakis, and D. I. Fotiadis, “Discrimination of Preictal and Interictal Brain States from Long-Term EEG Data,” IEEE 30th International Symposium on Computer-Based Medical Systems (CBMS), pp. 318–323, 2017.
[12] C. Alexandre Teixeira et al., “Epileptic seizure predictors based on computational intelligence techniques: A comparative study with 278 patients,” Comput. Methods Programs Biomed., vol. 114, no. 3, pp. 324–336, May 2014.
[13] H. Khan, L. Marcuse, M. Fields, K. Swann, and B. Yener, “Focal onset seizure prediction using convolutional networks,” IEEE Trans. Biomed. Eng., vol. 65, no. 9, pp. 2109–2118, Sep. 2018.
[14] H. G. Daoud, A. M. Abdelhameed, and M. Bayoumi, “Automatic epileptic seizure detection based on empirical mode decomposition and deep neural network,” IEEE 14th International Colloquium on Signal Processing Its Applications (CSPA), pp. 182–186, 2018.
[15] “American Epilepsy Society Seizure Prediction Challenge,” Accessed on: Jun. 17, 2018. [Online]. Available: https://www.kaggle.com/c/seizure-prediction
[16] N. V. Chawla, N. Japkowicz, and A. Kotcz, “Editorial: special issue on learning from imbalanced data sets,” SIGKDD Explor., vol. 6, pp. 1–6, Jun. 2004.
[17] H. Daoud and M. Bayoumi, “Deep Learning based Reliable Early Epileptic Seizure Predictor,” IEEE Biomedical Circuits and Systems Conference (BioCAS), Cleveland, OH, pp. 1–4, 2018.
[18] A. H. Shoeb, “Application of machine learning to epileptic seizure onset detection and treatment,” Thesis, Massachusetts Institute of Technology, 2009.
[19] A. L. Goldberger et al., “PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals,” Circulation, vol. 101, no. 23, pp. e215–e220, Jun. 2000.
[20] “CHB-MIT Scalp EEG Database,” Accessed on: May 2, 2019. [Online]. Available: https://physionet.org/pn6/chbmit/
[21] N. Siddique and H. Adeli, Computational Intelligence: Synergies of Fuzzy Logic, Neural Networks and Evolutionary Computing. John Wiley & Sons, 2013.
[22] R. H. R. Hahnloser, R. Sarpeshkar, M. A. Mahowald, R. J. Douglas, and H. S. Seung, “Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit,” Nature, vol. 405, no. 6789, pp. 947–951, Jun. 2000.
[23] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” Advances in Neural Information Processing Systems, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, Eds. Curran Associates, Inc., pp. 1097–1105, 2012.
[24] Y. LeCun and Y. Bengio, Convolutional Networks for Images, Speech, and Time-Series, 1995.
[25] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, Massachusetts: The MIT Press, 2016.
[26] S. Ioffe and C. Szegedy, “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift,” ArXiv:1502.03167, Feb. 2015.
[27] S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997.
[28] A. Graves and J. Schmidhuber, “Framewise phoneme classification with bidirectional LSTM networks,” Proc. IEEE International Joint Conference on Neural Networks, vol. 4, pp. 2047–2052, 2005.
[29] P. Baldi, “Autoencoders, Unsupervised Learning, and Deep Architectures,” Proc. of ICML Workshop on Unsupervised and Transfer Learning, pp. 37–49, 2012.
[30] D. P. Kingma and M. Welling, “Auto-Encoding Variational Bayes,” ArXiv:1312.6114, Dec. 2013.
[31] L. van der Maaten and G. Hinton, “Visualizing Data using t-SNE,” J. Mach. Learn. Res., vol. 9, pp. 2579–2605, 2008.
[32] W. H. Kruskal and W. A. Wallis, “Use of Ranks in One-Criterion Variance Analysis,” J. Am. Stat. Assoc., Dec. 1952.
[33] A. Shahidi Zandi, R. Tafreshi, M. Javidan, and G. A. Dumont, “Predicting Epileptic Seizures in Scalp EEG Based on a Variational Bayesian Gaussian Mixture Model of Zero-Crossing Intervals,” IEEE Trans. Biomed. Eng., vol. 60, no. 5, pp. 1401–1413, May 2013.
[34] Z. Zhang and K. K. Parhi, “Low-Complexity Seizure Prediction From iEEG/sEEG Using Spectral Power and Ratios of Spectral Power,” IEEE Trans. Biomed. Circuits Syst., vol. 10, no. 3, pp. 693–706, Jun. 2016.

Hisham Daoud received the B.Sc. and M.Sc. degrees from Cairo University, Egypt, and the Ph.D. degree from Ain Shams University, Egypt, in 2004, 2007, and 2014, respectively, all in electronics and communications engineering.
Since 2004, he has held multiple positions in both industry and academia. He is currently with the University of Louisiana at Lafayette, LA, USA. His research interest includes biomedical signal processing, machine learning, deep learning, and neuromorphic computing.
Dr. Daoud’s awards include IEEE CASS Student Travel Award, Best Paper Award in the 14th IEEE Colloquium on Signal Processing and its Applications conference (CSPA’2018). He has served as a reviewer for several IEEE conferences and journals.

Magdy A. Bayoumi received the B.Sc. and M.Sc. degrees in electrical engineering from Cairo University, Egypt, in 1973 and 1977, respectively, the M.Sc. degree in computer engineering from Washington University, St. Louis, MO, in 1981, and the Ph.D. degree in electrical engineering from the University of Windsor, Canada in 1984.
He is the Department Head of W. H. Hall Department of Electrical & Computer Engineering. He is the Hall Endowed Chair in Computer Engineering. He was the Director of the Center for Advanced Computer Studies (CACS) and the Department Head of Computer Science Department. He was also the Loflin Eminent Scholar Endowed Chair in Computer Science, all at the University of Louisiana at Lafayette where he has been a faculty member since 1985.
Dr. Bayoumi has graduated about 100 Ph.D. and 150 M.Sc. students, authored/co-authored about 600 research papers and more than 10 books. He was the guest/co-guest editor of more than 10 special journal issues, the latest was on Machine to Machine Interface. Dr. Bayoumi is an IEEE Fellow. He has served in many capacities in the IEEE Computer, Signal Processing, and Circuits & Systems (CAS) societies. Currently, he is the vice president of Technical Activities of IEEE RFID council and he is on the IEEE RFID Distinguished Lecture Program (DLP). He is a member of IEEE IoT Activity Board. Dr. Bayoumi has received many awards, among them the IEEE CAS Education award and the IEEE CAS Distinguished Service award. He was on the IEEE DLP programs for CAS and Computer societies. He was on the IEEE Fellow Selection Committee. He has been an ABET evaluator and he was an ABET commissioner and team chair. He has given numerous keynote/invited lectures and talks nationally and internationally. Dr. Bayoumi was the general chair of IEEE ICASSP 2017 in New Orleans. He also chaired many conferences including ISCAS 2007, ICIP 2009, and ICECS 2015. Dr. Bayoumi was the chair of an international delegation to China, sponsored by People-to-People Ambassador, 2000. He received the French Government Fellowship, University of Paris Orsay, 2003-2005 and 2009. He was a Visiting Professor at King Saud University. He was a United Nation visiting scholar. He has been an advisor to many EE/CMPS departments in several countries. Dr. Bayoumi was on the State of Louisiana Comprehensive Energy Policy Committee. He was the vice president of Acadiana Technology Council. He was on the Chamber of Commerce Tourism and Education committees. He was a member of several delegations representing Lafayette to international cities. He was on the Le Centre International Board. He was the general chair of SEASME (an organization of French Speaking cities) conference in Lafayette. He is a member of Lafayette Leadership Institute, and he was a founding member of its executive committee. He was a column editor for Lafayette Newspaper, the Daily Advertiser.
