

International Electronics Symposium on Knowledge Creation and Intelligent Computing (IES-KCIC)

Deep Learning Algorithm for Arrhythmia Detection

Hilmy Assodiky, Iwan Syarif, Tessy Badriyah


Electronics Engineering Polytechnic Institute of Surabaya
Surabaya, Indonesia

Abstract—Most cardiovascular disorders and diseases can be prevented, but deaths continue to rise because misdiagnosis leads to improper treatment. One of these cardiovascular diseases is arrhythmia. Electrocardiogram (ECG) recordings are sometimes difficult to read for arrhythmia detection, so a good learning method is needed to help a computer detect arrhythmia. Deep Learning is a powerful approach in Machine Learning that is becoming widely used for Speech Recognition, Bioinformatics, Computer Vision, and many other tasks.

This research used Deep Learning to classify arrhythmia data. We compared the result with other popular machine learning algorithms: Naïve Bayes, K-Nearest Neighbor, Artificial Neural Network, and Support Vector Machine. Our experiment showed that the Deep Learning algorithm achieved the best accuracy, 76.51%.

Keywords—Arrhythmia; Deep Learning; ECG Dataset;

I. INTRODUCTION

A. Arrhythmia
According to the World Health Organization, cardiovascular disease is the leading cause of death and disability. Approximately 17.5 million people died from cardiovascular disease in 2012, which equals 31% of all deaths in the world that year. [1] Although the majority of cardiovascular diseases and disorders can be prevented, deaths continue to rise because misdiagnosis leads to improper treatment.

One cardiovascular disease is arrhythmia, an abnormality in the rhythm of the heartbeat. [2] It prevents the heart from pumping blood effectively throughout the body. People with arrhythmia usually experience faster or slower heart palpitations. Other symptoms include weakness, dizziness, fainting, and usually pain in the chest. However, many people with arrhythmias do not feel any symptoms.

The types of arrhythmia include tachycardia (rapid), which consists of supraventricular tachycardia, atrial tachycardia (fibrillation and flutter), ventricular tachycardia, and ventricular fibrillation, and bradycardia (slow), which consists of AV heart blocks, bundle branch blocks, and tachy-brady syndrome. Some types have similar symptoms, yet each requires a different treatment. That is why arrhythmias are so dangerous when the doctor does not know exactly which type of arrhythmia the patient suffers from.

B. Electrocardiogram
Physical symptoms can be used to detect arrhythmias, but the electrocardiogram (ECG) is needed as the standard tool to recognize them. An electrocardiogram is a test that checks the electrical activity of the heartbeat. The record is shown on test paper as a waveform signal that represents the rhythm of the heart's electrical activity.

The most accurate tool for recording the heartbeat rhythm is the 12-lead ECG. The leads are the recording channels: lead I, lead II, lead III, aVR, aVL, aVF, V1, V2, V3, V4, V5, and V6. Every channel records from a different angle. The doctor has to find the lead that produces the main signal by reading the other signals with a particular technique. The angle or lead where the main signal is recorded usually differs from patient to patient, because the order of the signals is not the same. It is like recognizing a waveform from many angles. A 12-lead ECG result is shown in Fig. 1.

It is sometimes difficult to analyze arrhythmia by observing an electrocardiogram recording. A good classification method is therefore needed to help a computer detect arrhythmias. [4]

Deep Learning is a powerful approach in Machine Learning. It is an improvement of the Neural Network algorithm. Deep Learning is a representation-learning method that consists of a set of algorithms modeled with many levels of layers. The levels are formed by non-linear transformations of each level of representation, ranging from the raw input to a higher level of representation. [3] Deep Learning distributes the neurons across hidden layers, so it reduces the number of neurons in every layer but increases the number of layers. [5] Deep Learning has been under general development since 2006, and it is widely used for Computer Vision, Speech Recognition, Bioinformatics, and other tasks. [3]

The electrocardiogram produces complex data, so it is important to use a method that can handle complex data. In Neural Network based methods, the more complex the data, the more hidden layers are needed. The Deep Learning algorithm is therefore suitable for classifying arrhythmia from electrocardiogram data.

Fig. 1. The 12-leads Electrocardiogram

II. RELATED WORKS
Albert Haque used a Multilayer Neural Network for multiclass and binary (normal or abnormal) classification. The method consisted of back-propagation and stochastic gradient descent. Accuracy dropped as more neurons were used, so the number of neurons had to be adapted to the data. For the final model, the researcher used two hidden layers with a hundred nodes per layer. GPU acceleration was used to reduce the learning time, and as a result the Neural Network ran 6 times faster. The accuracies were 91.9% on the binary task and 75.7% on the multiclass task. [7]

Meng Huanhuan conducted deep learning experiments with a Deep Belief Network (DBN) to classify arrhythmia ECG data into six classes, including normal. The DBN was used to extract the features, and the classification was done with a Support Vector Machine (SVM) and an Artificial Neural Network (ANN). The highest accuracy, 98.49%, was obtained by the Gaussian kernel SVM. [6]

Ignatov Andrey Dmitrievich used deep learning to detect various diseases in human Heart Rate Variability (HRV) data recorded by ECG. The first approach applied Convolutional Neural Networks to the labeled data. The second approach used Stacked Auto-encoders and Restricted Boltzmann Machines to create a new classification during training. The first approach reached 88.45% accuracy and the second reached 90.64%. [5]

Vasu Gupta compared several popular classification methods on the arrhythmia dataset: Naive Bayes, SVM, Neural Networks, Random Forests, and the combination of SVM and Random Forests. The highest accuracy, 77.4%, was obtained by SVM + RF. [4]

Anish Batra compared Neural Network, Decision Tree, Random Forest, Gradient Boosting, SVM, and SVM combined with other algorithms on the arrhythmia dataset. The best result was SVM combined with Gradient Boosting, with 84.82% accuracy. [8]

III. THE PROPOSED SYSTEM

A. Dataset
The dataset was extracted from ECG data. The attributes represent the waveform signals from the 12 leads: heart rate, P-R interval, P-P interval, R-R interval, Q-T interval, P wave (represented by the duration and amplitude of the P wave), shape of the QRS complex (represented by the duration and amplitude of the QRS complex), F wave (represented by the duration and amplitude of the F wave), T wave (represented by the duration and amplitude of the T wave), existence of a ragged wave preceding the P wave, existence of a ragged wave preceding the QRS complex, existence of a stand-alone ragged wave, vector angles, number of deflections, and so on. Every channel has its own values, so every row of data represents 12 waveform signals. Additional data such as age, sex, height, and weight were also added to the dataset as attributes.

The dataset for this research was taken from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/datasets/Arrhythmia). It contains a data file and an information file. There are 452 rows and 279 attributes, and each record comes from a different patient's medical record. There are 206 linear attributes and 73 nominal attributes. There are 16 classes in the dataset. Class 1 represents an electrocardiogram without arrhythmia, and classes 2 to 15 represent electrocardiograms with different arrhythmias. A total of 245 records belong to the first class, 185 records belong to the 14 kinds of arrhythmia, and the other 22 records are unclassified. The labels in this dataset were obtained from a cardiologist and are treated as the reference standard.


Fig. 2. The Proposed System Diagram
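The dataset layout described in this section can be illustrated with a small parsing sketch. It is only a sketch under stated assumptions: the class label is taken to be the last field of each comma-separated row, missing values are taken to be written as `?`, and the names `load_rows`, `class_counts`, and the three-record `sample` string are hypothetical stand-ins for the real 452-row data file.

```python
import csv
import io
from collections import Counter


def load_rows(text):
    """Parse comma-separated rows into (features, label) pairs.

    Assumption: the class label is the last field and '?' marks a missing value.
    """
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # skip blank lines
        features = [None if f.strip() == "?" else float(f) for f in fields[:-1]]
        rows.append((features, int(fields[-1])))
    return rows


def class_counts(rows):
    """Count how many records fall into each class label."""
    return Counter(label for _, label in rows)


# Hypothetical three-record sample, not taken from the real data file.
sample = "75,0,190,?,1\n56,1,165,64,16\n54,0,172,95,1\n"
rows = load_rows(sample)
counts = class_counts(rows)
```

Run on the real file, the same tally should give the per-class sample counts reported for the dataset, provided the assumptions about the file layout hold.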

TABLE I.  NUMBER OF SAMPLES IN DATASET

Type of Arrhythmia    Number of Samples
1                     245
2                     44
3                     15
4                     15
5                     13
6                     25
7                     3
8                     2
9                     9
10                    50
14                    4
15                    5
16                    22

B. Preprocessing
The dataset was not ready for the classification process because it contained noise, missing values, inconsistent data, and other problems that reduced its quality. We therefore preprocessed it to produce a good-quality dataset. Preprocessing consisted of data cleaning, data reduction, and data transformation.

Several attributes were removed because they held the same value for every patient. Such invariant attributes can be detected from their variance or standard deviation. Attributes with too many missing values could disturb the learning, so they were removed as well. As a result, the dataset has 261 features. The 14th attribute (vector angles in degrees on front plane of: J) was also removed because it contained too many missing values. The remaining missing values were replaced by average values, and the data were then normalized using the z-transformation.

C. Feature Selection
The next step was feature selection, for which we used two approaches. The preprocessed data still had many features, while the classification method that we used had a time-consuming learning process. The number of records was also small compared with the number of features. Feature selection was therefore needed to reduce the time cost, to help avoid overfitting, and to keep the important features that were most strongly correlated with the output class. The first approach was the Genetic Algorithm and the second was Particle Swarm Optimization.

The first approach was the Genetic Algorithm (GA), which was inspired by the principles of genetics and natural selection. In this algorithm, candidate solutions were represented as individuals in a population. Each individual had a chromosome that represented a set of selected features, and each feature was represented by a gene. The Genetic Algorithm started by initializing the population. The number of genes equaled the number of attributes in the dataset, and the population size was given by the user. The value of a gene (allele) was Boolean. We used a large population size (about 20) to avoid local optima of the fitness.

After the population had been created randomly, a number of individuals were selected according to their fitness using the roulette algorithm. The fitness was measured with Correlation-based Feature Selection (CFS), so that we obtained features that were correlated with the class but uncorrelated with each other.

The selected individuals then went through the next steps, crossover and mutation. Crossover swaps gene values between two individuals. We used a high crossover probability (about 0.8) to get optimum exploitation of the feature combinations. Mutation changes the value of a gene; simply put, the change can be made by an arithmetic operation on the value of the selected gene. Because the number of features was large, we used a mutation probability that was not too small (about 0.2): low enough to prevent excessive exploration of feature combinations, yet high enough that exploring the large number of features was still fast. The genes affected by crossover and mutation were selected randomly.

Crossover and mutation produced new individuals in a new population. The fitness was then measured again in the fitness evaluation step, and the cycle repeated until the maximum number of iterations was reached. The cycle of the Genetic Algorithm is shown in Fig. 3.

Fig. 3. The Cycle of Genetic Algorithm

The second approach used the Particle Swarm Optimization (PSO) algorithm, which was inspired by the behavior of birds or fish moving toward a target. In the PSO algorithm, the search for a solution was represented by a population consisting of several particles. The population was generated randomly, and each particle represented one feature combination as a solution. Each particle tracked its own best position (local best) and the best position of the whole population (global best). The search for the best positions ran for a fixed number of iterations. Two factors determined the status of a particle: its position and its speed. The formulas for updating the velocity and position of a particle are:

$$v_i^{t+1} = v_i^{t} + c_1 r_1 \left(p_i - x_i^{t}\right) + c_2 r_2 \left(g - x_i^{t}\right) \tag{1}$$

$$x_i^{t+1} = x_i^{t} + v_i^{t+1} \tag{2}$$

Here $x_i^{t} = (x_{i1}, \dots, x_{iN})$ represented the position of the particle and $v_i^{t}$ its velocity, where $i$ was the index of the particle, $t$ was the iteration, and $N$ was the size of the search area. $p_i$ represented the local best of the $i$-th particle and $g$ represented the global best. $c_1$ and $c_2$ were the learning factors, and $r_1$ and $r_2$ were random numbers between 0 and 1.

D. Neural Network
The Artificial Neural Network is the basic method behind Deep Learning. It adopts the neural network system of the human brain. The power of this method lies in the connections between the neurons. Each neuron has a dendrite (input), a soma (main process), an axon (activation function), and a synapse (output). Every neuron takes inputs and produces an output. The input values are the features extracted from the patient's medical record. Each input has a weight (coefficient) that is combined with its input value. The formula is:

$$z = \sum_{i} w_i a_i + b \tag{3}$$

where:
$a_i$ = inputs
$w_i$ = weights
$b$ = bias

The result from the soma is then transformed by the activation function. Common activation functions in Neural Networks include Sigmoid, TanH, and the Rectified Linear Unit. The result of the activation function is the output of the neuron. The output of one neuron is sometimes used as the input of another neuron. The output values in the output layer represent the scores of every class.

Fig. 4. Illustration of a Neuron in Neural Network
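The neuron computation described under (3), a weighted sum of the inputs plus a bias followed by an activation function, can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation; the function name `neuron_output` and the choice of TanH as the default activation are assumptions made for the example.

```python
import math


def neuron_output(inputs, weights, bias, activation=math.tanh):
    """Compute one neuron's output: activation(sum_i w_i * a_i + b).

    `activation` defaults to TanH here purely for illustration.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Soma: weighted sum of the inputs plus the bias, as in (3).
    z = sum(w * a for w, a in zip(weights, inputs)) + bias
    # Axon: apply the activation function to get the neuron's output.
    return activation(z)


# Example with two inputs; the weighted sum is 0.5 - 0.5 + 0.0 = 0.0.
y = neuron_output([1.0, 2.0], [0.5, -0.25], 0.0)
```

Passing a different callable as `activation` (for example a sigmoid or rectifier) changes only the last step, which mirrors how the activation function is described as a separate stage after the soma.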

A group of neurons is called a layer or perceptron. There are three kinds of layer: the input layer, the hidden layer, and the output layer. A Neural Network with all of these layers is called a Multilayer Perceptron (MLP).

The goal of a Neural Network is to find the learning model with the best weight values. The weight values are obtained with the back-propagation method, which updates the weights when the predicted result does not match the actual one. The formula is:

$$w_i \leftarrow w_i + \mu \cdot \mathrm{error} \cdot a_i \tag{4}$$

where:
$w_i$ = weight value at i-th input
error = actual output - predicted output
$a_i$ = i-th input value
$\mu$ = learning rate (0 to 1)

E. Deep Learning
Deep Learning is a representation-learning method that consists of a set of algorithms modeled with many representation levels. It is built from non-linear transformations of every representation level, starting from the raw input and moving to higher representation levels. [3] "Deep Learning" means a Neural Network that has many hidden layers.

Fig. 5. Neural Network with Many Hidden Layers

There are many kinds of Deep Learning, including the Deep Neural Network, Deep Belief Network, Convolutional Neural Network, and Recurrent Neural Network. These types are used for supervised or unsupervised learning, depending on the type of problem to be solved. The key questions in Deep Learning are how deep the abstractions (hidden layers) need to be and how many nodes (units in each layer) are needed to produce a good learning model.

The Deep Learning implementation that we used was H2O Deep Learning. [9] H2O Deep Learning follows the multilayer feed-forward neural network model described in this paper. For initialization, unlike various other Deep Learning architectures that use a combination of unsupervised pre-training followed by supervised training, it uses a purely supervised training protocol. The initialization scheme is the uniform adaptive option, which is an optimized initialization based on the size of the network. For the non-linear activation function, we used the TanH function. This function is a rescaled and shifted logistic function. Its range of -1 to 1 allows the training process to converge faster.

The loss is minimized with Stochastic Gradient Descent (SGD), and the gradient is computed with back-propagation. The Hogwild! parallelization scheme [11] is used to handle the time cost when SGD is parallelized. The Hogwild! scheme follows a shared memory model in which multiple cores process the gradient updates independently and asynchronously. Each node operates in parallel on its local data until the weights and biases are obtained by averaging. [9] The parallel distributed and multi-threaded training algorithm with SGD in H2O Deep Learning is:

1. Initialize global model parameters (weights $W$, biases $B$)
2. Distribute training data $T$ across nodes (can be disjoint or replicated)
3. Iterate until convergence criterion reached:
   3.1. For nodes $n$ with training subset $T_n$, do in parallel:
        a. Obtain copy of the global model parameters $W_n$, $B_n$
        b. Select active subset $T_{na} \subset T_n$ (user-given number of samples per iteration)
        c. Partition $T_{na}$ into $T_{nac}$ by cores $n_c$
        d. For cores $n_c$ on node $n$, do in parallel:
           i. Get training example $i \in T_{nac}$
           ii. Update all weights $w_{jk} \in W_n$, biases $b_{jk} \in B_n$
   3.2. Set $W, B := \mathrm{Avg}_n W_n, \mathrm{Avg}_n B_n$
   3.3. Score the model on (potentially sampled) train/validation score sets

Regularization is also used to prevent overfitting. L1 (Lasso) and L2 (Ridge) regularization implement the same penalties as they do with other methods. [9] The modified loss function to be minimized is:

$$L'(W, B \mid j) = L(W, B \mid j) + \lambda_1 R_1(W, B \mid j) + \lambda_2 R_2(W, B \mid j) \tag{5}$$

$R_1(W, B \mid j)$, for L1 regularization, represents the sum of all L1 norms for the weights and biases; $R_2(W, B \mid j)$, for L2 regularization, represents the sum of squares of all weights and biases. The constants $\lambda_1$ and $\lambda_2$ are specified as very small. [9]

Increasing the number of layers increases the learning time, so Deep Learning is being improved to learn faster. A high learning rate can speed up the learning process, but it can become trapped in a local minimum of the loss. The solutions are an adaptive learning rate and momentum.


An adaptive learning rate means updating the learning rate at each epoch. If the loss is lower than in the previous iteration, the learning rate is increased by a small proportion, for example 1%-5%. If the loss is higher than in the previous iteration, the previous weights are restored and the learning rate is decreased sharply, for example by 30%-50%. This is called the Bold Driver method.

Another method is annealing. It keeps the learning rate around its fixed value, which can smooth the gradient descent of the loss. [10]

(6)

Learning time can also be reduced by using momentum, whose value is between 0 and 1. Momentum helps keep the weights at the global minimum of the loss. It contributes to the back-propagation process when the weights are updated. The weight update formula becomes:

$$\Delta w_i^{(t)} = \mu \cdot \mathrm{error} \cdot a_i + m \, \Delta w_i^{(t-1)} \tag{7}$$

$$w_i \leftarrow w_i + \Delta w_i^{(t)} \tag{8}$$

where:
$w_i$ = weight value at i-th input
$\mu$ = learning rate (0 to 1)
$m$ = momentum value (0 to 1)

In this research, we used the ADADELTA algorithm [10], which combines the benefits of annealing and momentum training to avoid slow convergence. It has two hyperparameters. The first is $\rho$ (rho), which is similar to momentum and relates to the reuse of prior weight and bias updates. Its value was about 0.9. The second is $\epsilon$ (epsilon), which is similar to the annealing value. Its value was kept below a very small threshold.

IV. EXPERIMENT AND RESULT

A. Feature Selection
As discussed before, there were two approaches to selecting the features in this research. The first was the Genetic Algorithm (GA Search) and the second was Particle Swarm Optimization (PSO Search). The technical details were covered in the proposed system section of this paper. GA Search resulted in 31 features and PSO Search resulted in 23 features.

B. Classification
We validated the models in two ways. The first was 10-fold cross-validation, which takes 10% of the dataset as testing data in each fold. This method divides the data into two sets, a training set and a testing set. The split was stratified, so that the training and testing sets kept the same class proportions.

The second validation used Leave-One-Out Cross Validation, which uses every record in turn as the testing data.

In this research, the result of Deep Learning was compared with four other popular methods: Naive Bayes, K-Nearest Neighbor (KNN), Artificial Neural Network (ANN), and Support Vector Machine (SVM). We used the Weka and RapidMiner tools together for our experiment. Weka was used to select the features with the Genetic Algorithm and Particle Swarm Optimization. RapidMiner was used to measure the performance.

The operators used were Naive Bayes, KNN with K=3, Neural Nets with 50 iterations and as many hidden-layer nodes as output nodes, and LibSVM with a polynomial kernel. The remaining settings used the standard parameters of the Weka and RapidMiner tools.

TABLE II.  STRATIFIED 10-FOLDS CROSS VALIDATION RESULT

                     Accuracy (%)
Feature Selection    Naive Bayes    k-NN     ANN      SVM      DL
Original             17.44          60.00    68.84    62.56    70.93
GA Search            53.02          63.26    70.00    67.21    74.44
PSO Search           63.02          64.42    71.86    70.70    76.51

TABLE II shows the accuracies of Naive Bayes, K-NN, ANN, SVM, and Deep Learning with stratified 10-fold cross-validation. The comparison shows that Deep Learning with PSO Search feature selection is better than the others, at 76.51%.

TABLE III.  LEAVE-ONE-OUT CROSS VALIDATION RESULT

                     Accuracy (%)
Feature Selection    Naive Bayes    k-NN     ANN      SVM      DL
Original             17.67          60.00    69.53    63.02    67.21
GA Search            57.21          60.70    71.40    67.21    71.16
PSO Search           63.02          63.26    71.63    70.93    75.81

TABLE III shows the accuracies of Naive Bayes, K-NN, ANN, SVM, and Deep Learning with the Leave-One-Out Cross Validation method. The comparison shows that Deep Learning with PSO Search feature selection is better than the others, at 75.81%.

V. CONCLUSION AND FURTHER WORK
In this research, the dataset that we used had many features. Particle Swarm Optimization performed better than the Genetic Algorithm at reducing attributes: Particle Swarm Optimization reduced the 261 attributes to 23, while the Genetic Algorithm reduced them to 31.


In terms of classification performance, the Deep Learning algorithm showed the best result: 76.51% with PSO as the feature selection method and 74.44% with GA as the feature selection method. It was better than the other popular methods, which were Naive Bayes, K-Nearest Neighbor (KNN), Artificial Neural Network (ANN), and Support Vector Machine (SVM).

The result is still far from an accuracy that is good enough for medical use. The data we used had many samples for the normal class, whereas the other classes had only a few samples. That was enough to learn the normal label, but not to learn the arrhythmia types. In further work, we will improve the dataset quality to improve the learning performance.

REFERENCES

[1] World Health Organization. (2016, September) World Health Organization. [Online]. http://www.who.int/mediacentre/factsheets/fs317/en/
[2] John A Kastor, "Cardiac Arrhythmias," Encyclopedia of Life Sciences, 2002.
[3] Yann Lecun, "Deep Learning," Nature, vol. 521, May 2015.
[4] Vasu Gupta, "Prediction and Classification of Cardiac Arrhythmia," 2014.
[5] Ignatov Andrey Dmitrievich, "Deep Learning in information analysis of electrocardiogram signals for disease diagnostics," The Ministry of Education and Science of The Russian Federation Moscow Institute of Physics and Technology, 2015.
[6] Meng Huanhuan and Zhang Yue, "Classification of Electrocardiogram Signals with Deep Belief Networks," IEEE 17th International Conference on Computational Science and Engineering, 2014.
[7] Albert Haque, "Cardiac Dysrhythmia Detection with GPU-Accelerated Neural Networks," Computer Science Department, Stanford University, December 2014.
[8] Anish Batra, "Classification of Arrhythmia Using Conjunction of Machine Learning Algorithms and ECG Diagnostic Criteria," International Journal of Biology and Biomedicine, vol. 1, 2016.
[9] Arno Candel and Erin LeDell. (2017, August) Deep Learning with H2O. http://h2o.ai/resources/
[10] Matthew D. Zeiler. (2012, December) [Online]. http://arxiv.org/pdf/1212.5701v1.pdf
[11] Feng Niu, Benjamin Recht, Christopher Re, and Stephen J. Wright. (2011) [Online]. http://i.stanford.edu/hazy/papers/hogwild-nips.pdf


