Deep Learning Algorithm For Arrhythmia Detection: Hilmy Assodiky, Iwan Syarif, Tessy Badriyah
Abstract: Most cardiovascular disorders and diseases can be prevented, but deaths continue to rise because misdiagnosis leads to improper treatment. One such cardiovascular disease is arrhythmia. Arrhythmia can be difficult to detect by observing an electrocardiogram (ECG) recording, so a good learning method is needed on the computer to assist detection. Deep Learning is a powerful approach in Machine Learning that is becoming widely used for Speech Recognition, Bioinformatics, Computer Vision, and many other fields.

This research used Deep Learning to classify arrhythmia data. We compared the results with those of other popular machine learning algorithms: Naïve Bayes, K-Nearest Neighbor, Artificial Neural Network, and Support Vector Machine. Our experiment showed that the Deep Learning algorithm achieved the best accuracy, 76.51%.

Keywords: Arrhythmia; Deep Learning; ECG Dataset

I. INTRODUCTION

A. Arrhythmia
According to the World Health Organization, cardiovascular disease is the leading cause of death and disability. Approximately 17.5 million people died from cardiovascular disease in 2012, which is 31% of all deaths worldwide that year. [1] Although the majority of cardiovascular diseases and disorders can be prevented, deaths continue to rise because misdiagnosis leads to improper treatment.

One cardiovascular disease is arrhythmia, an abnormality in the rhythm of the heartbeat. [2] It prevents the heart from pumping blood effectively throughout the body. People with arrhythmia usually experience a faster or slower heartbeat or palpitations. Other symptoms include weakness, dizziness, fainting, and often chest pain. However, many people with arrhythmias do not feel any symptoms.

The types of arrhythmia include tachycardia (rapid heartbeat), which comprises supraventricular tachycardia, atrial tachycardia (fibrillation and flutter), ventricular tachycardia, and ventricular fibrillation, and bradycardia (slow heartbeat), which comprises AV heart blocks, bundle branch blocks, and tachy-brady syndrome. Some types have similar symptoms, yet each requires a different treatment. That is why arrhythmias are so dangerous when the doctor does not know exactly which type of arrhythmia the patient has.

B. Electrocardiogram
Physical symptoms can be used to detect arrhythmias, but the electrocardiogram (ECG) is the standard tool for recognizing them. An electrocardiogram is a test that checks the electrical activity of the heartbeat. The recording is presented on test paper as a waveform signal that represents the rhythm of the heart's electrical activity.

The most accurate tool for recording heart rhythm is the 12-lead ECG. The leads are the recording channels: lead I, lead II, lead III, aVR, aVL, aVF, V1, V2, V3, V4, V5, and V6. Each channel records from a different angle. The doctor has to find the lead that carries the main signal by reading the other signals with a particular technique. The angle, or lead, at which the main signal is recorded usually differs between patients because the order of the signals is not the same. It is like recognizing a waveform from many angles. A 12-lead ECG result is shown in Figure 1.

Sometimes it is difficult to analyze arrhythmia by observing the electrocardiogram recording. A good classification method is therefore needed on the computer to assist the detection of arrhythmias. [4]

Deep Learning is a powerful approach in Machine Learning and an improvement on the Neural Network algorithm. It is a representation-learning method consisting of a set of algorithms modeled with many layers. Each level is formed by a non-linear transformation of the representation at the level below, ranging from the raw input to higher levels of representation. [3] Deep Learning distributes the neurons across hidden layers, so it reduces the number of neurons in each layer but increases the number of layers. [5] Deep Learning has been under general development since 2006 and is widely used for Computer Vision, Speech Recognition, Bioinformatics, and other fields. [3]

An electrocardiogram produces complex data, so it is important to use a method that can handle complex data. In Neural Network based methods, the more complex the data, the more hidden layers are needed. The Deep Learning algorithm is therefore well suited to classifying arrhythmia from electrocardiogram data.
Fig. 1. The 12-leads Electrocardiogram
Fig. 2. The Proposed System Diagram
The Genetic Algorithm started by initializing the population. The number of genes was the number of attributes in the dataset, and the population size was a user-given number. The value of each gene (allele) was Boolean. We used a large population size (about 20) to avoid a local optimum of the fitness.

After the population had been created randomly, a number of individuals were selected based on their fitness by using the roulette algorithm. The fitness was measured with Correlation Feature Selection (the CFS method), so that we obtained features that were correlated with the class but uncorrelated with each other.

The selected individuals then went through the next steps, cross-over and mutation. Cross-over swaps gene values between two individuals. We used a high cross-over probability (about 0.8) to get optimal exploitation of feature combinations. Mutation changes the value of a gene. Simply put, the change can be made by arithmetic operations on the values of the selected genes. The data had a large number of features, so we used a mutation probability that was not too small (about 0.2): low enough to prevent excessive exploration of feature combinations, yet high enough that the large number of features was still explored quickly. The genes involved in cross-over and mutation were chosen randomly.

Cross-over and mutation produced new individuals in a new population. The fitness of the new population was then measured in the fitness evaluation, and the cycle repeated until the maximum number of iterations was reached. The cycle of the Genetic Algorithm is shown below:

Fig. 3. The Cycle of the Genetic Algorithm

The second approach used the Particle Swarm Optimization (PSO) algorithm. PSO was inspired by the behavior of birds or fish moving toward a target. In the PSO algorithm, the search for a solution is represented by a population of particles. The population was generated randomly. Each particle represented a solution, that is, a feature combination. Each particle tracked its own best position (local best) and the best position of the whole population (global best). The search for the best positions ran for a certain number of iterations. Two factors determined the state of a particle: position and velocity. The formulas for updating the velocity and position of a particle are shown below:

\[ v_i^{t+1} = v_i^{t} + c_1 r_1 \left( p_i - x_i^{t} \right) + c_2 r_2 \left( p_g - x_i^{t} \right) \qquad (1) \]

\[ x_i^{t+1} = x_i^{t} + v_i^{t+1} \qquad (2) \]

$v_i^{t} = (v_{i1}^{t}, \ldots, v_{iN}^{t})$ represents the velocity of the particle. $x_i^{t} = (x_{i1}^{t}, \ldots, x_{iN}^{t})$ represents the position of the particle. $i$ is the index of the particle, $t$ is the iteration, and $N$ is the size of the search area. $p_i = (p_{i1}, \ldots, p_{iN})$ represents the local best of the $i$-th particle. $p_g = (p_{g1}, \ldots, p_{gN})$ represents the global best. $c_1$ and $c_2$ are the learning factors. $r_1$ and $r_2$ are random numbers between 0 and 1.

D. Neural Network
The Artificial Neural Network is the basic method underlying Deep Learning. It is modeled on the neural system of the human brain. The power of this method lies in the connections between neurons. Each neuron has a dendrite (input), a soma (main process), an axon (activation function), and a synapse (output). Every neuron takes inputs and produces an output. The input values are the features extracted from the patient's medical record. Each input has a weight (coefficient) that is multiplied by its input value. The formula is:

\[ z = \sum_{i} a_i w_i + b \qquad (3) \]

where:
z = result of the soma
a_i = inputs
w_i = weights
b = bias

After the soma produces its result, the result is transformed by an activation function. Common activation functions in Neural Networks include Sigmoid, TanH, and the Linear Rectifier. The result of the activation function is the output of the neuron. The output of one neuron is sometimes used as the input of another neuron. The output values in the output layer represent the scores of each class.

Fig. 4. Illustration of a Neuron in a Neural Network
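As a minimal sketch of the neuron described in this section, the following Python snippet computes the weighted sum of the inputs plus the bias (the soma) and passes it through an activation function. The function names and the sample feature values, weights, and bias are hypothetical and chosen only for illustration; they are not taken from the experiment.

```python
import math


def sigmoid(z):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias, activation=math.tanh):
    """Weighted sum of inputs plus bias (the soma), then an activation."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    soma = sum(a * w for a, w in zip(inputs, weights)) + bias
    return activation(soma)


# Hypothetical values, for illustration only.
features = [0.5, -1.0, 2.0]
weights = [0.4, 0.3, -0.2]
bias = 0.1

print(neuron_output(features, weights, bias))           # TanH activation
print(neuron_output(features, weights, bias, sigmoid))  # Sigmoid activation
```

Passing the activation as a parameter keeps the weighted-sum step separate from the choice of Sigmoid or TanH, which mirrors the soma and axon roles in the description above.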
output layer. A Neural Network with complete layers is called a Multilayer Perceptron (MLP).

The goal of a Neural Network is to find the learning model with the best weight values. The weight values are obtained with the back-propagation method, which updates a weight when the predicted result does not match the actual one. The formula is:

\[ w_i \leftarrow w_i + \mu \cdot \mathrm{error} \cdot a_i \qquad (4) \]

where:
w_i = weight value at the i-th input
error = actual output - predicted output
a_i = i-th input value
μ = learning rate (0 to 1)

E. Deep Learning
Deep Learning is a representation-learning method consisting of a set of algorithms modeled with many representation levels. Each level is created by a non-linear transformation of the level below, starting from the raw input and moving to higher representation levels. [3] "Deep Learning" means a Neural Network that has many hidden layers.

Fig. 5. Neural Network with Many Hidden Layers

There are many kinds of Deep Learning, including Deep Neural Networks, Deep Belief Networks, Convolutional Neural Networks, and Recurrent Neural Networks. These types are used for supervised or unsupervised learning, depending on the type of problem to be solved. The key questions in Deep Learning are how deep the abstraction (the number of hidden layers) must be and how many nodes (units in each layer) are needed to produce a good model.

The Deep Learning implementation we used was H2O Deep Learning. [9] H2O Deep Learning follows the multilayer feed-forward neural network model described in this paper. For initialization, unlike various other Deep Learning architectures that combine unsupervised pre-training with subsequent supervised training, it uses a purely supervised training protocol. The initialization scheme is the uniform adaptive option, which is an optimized initialization based on the size of the network. For the non-linear activation function, we used the TanH function, which is a rescaled and shifted logistic function. Its range of -1 to 1 allows the training process to converge faster.

The loss is minimized with Stochastic Gradient Descent (SGD), and the gradient is computed by back-propagation. The Hogwild! parallelization scheme [11] is used to address the time cost when SGD is parallelized. Hogwild! follows a shared memory model in which multiple cores apply gradient updates independently and asynchronously. Each node operates in parallel on its local data, and the weights and biases are then obtained by averaging. [9] The parallel distributed and multi-threaded training algorithm with SGD in H2O Deep Learning is:

1. Initialize global model parameters (weights $W$, biases $B$)
2. Distribute training data $T$ across nodes (can be disjoint or replicated)
3. Iterate until convergence criterion reached:
   3.1. For nodes $n$ with training subset $T_n$, do in parallel:
        a. Obtain copy of the global model parameters $W_n$, $B_n$
        b. Select active subset $T_{na} \subset T_n$ (user-given number of samples per iteration)
        c. Partition $T_{na}$ into $T_{nac}$ by cores $n_c$
        d. For cores $n_c$ on node $n$, do in parallel:
           i. Get training example $i \in T_{nac}$
           ii. Update all weights $w_{jk} \in W_n$, biases $b_{jk} \in B_n$
   3.2. Set $W, B := \mathrm{Avg}_n W_n, \mathrm{Avg}_n B_n$
   3.3. Score the model on (potentially sampled) train/validation score sets

Regularization is also used to prevent overfitting. L1 (Lasso) and L2 (Ridge) regularization apply the same penalties as they do in other methods. [9] The modified loss function to be minimized is:

\[ L'(W, B \mid j) = L(W, B \mid j) + \lambda_1 R_1(W, B \mid j) + \lambda_2 R_2(W, B \mid j) \qquad (5) \]

$R_1(W, B \mid j)$, for L1 regularization, represents the sum of all L1 norms of the weights and biases; $R_2(W, B \mid j)$, for L2 regularization, represents the sum of squares of all weights and biases. The constants $\lambda_1$ and $\lambda_2$ are specified as very small, for example $10^{-5}$. [9]

Increasing the number of layers increases the learning time, and Deep Learning is being improved to learn faster. A high learning rate can speed up the learning process, but training can become trapped in a local minimum of the loss. The solutions are adaptive learning rate and momentum.
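The L1 and L2 penalties described above can be illustrated with a short Python sketch. It adds the sum of absolute values and the sum of squares of all weights and biases, each scaled by a small constant, to an already computed loss. The function name, the parameter names `lambda1` and `lambda2`, and the sample numbers are assumptions made for this illustration; this is not H2O's implementation.

```python
def regularized_loss(loss, weights, biases, lambda1=1e-5, lambda2=1e-5):
    """Add L1 (Lasso) and L2 (Ridge) penalties to an unregularized loss."""
    params = list(weights) + list(biases)
    r1 = sum(abs(p) for p in params)   # sum of absolute values (L1 penalty)
    r2 = sum(p * p for p in params)    # sum of squares (L2 penalty)
    return loss + lambda1 * r1 + lambda2 * r2


# Hypothetical loss, weights, and biases, for illustration only.
base_loss = 0.25
weights = [0.8, -1.2, 0.05]
biases = [0.1, -0.3]

print(regularized_loss(base_loss, weights, biases))
```

With constants this small the penalty barely changes the loss for moderate parameter values, and it grows only when weights or biases become large, which is how it discourages overfitting.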
Adaptive learning rate means updating the learning rate at each epoch. If the loss is lower than in the previous iteration, the learning rate is increased by a small proportion, for example 1%-5%. If the loss is higher than in the previous iteration, the previous weights are restored and the learning rate is decreased sharply, for example by 30%-50%. This is called the Bold Driver method.

Another method is annealing, which keeps the learning rate around its fixed value and can smooth the descent of the loss gradient. [10]

(6)

Learning time can also be reduced by using momentum, whose value is between 0 and 1. Momentum helps keep the weights at the global minimum of the loss. It contributes to the back-propagation process when the weights are updated. The formula of the weight update becomes:

IV. EXPERIMENT AND RESULT

A. Feature Selection
As discussed before, there were two approaches to selecting the features in this research. The first was the Genetic Algorithm (GA Search) method and the second was the Particle Swarm Optimization (PSO Search) method. Their technical details were discussed in the proposed method section of this paper. GA Search selected 31 features and PSO Search selected 23 features.

The second validation used Leave-One-Out Cross Validation, in which every record is used in turn as the testing data.

In this research, the result of Deep Learning was compared with four other popular methods: Naive Bayes, K-Nearest Neighbor (KNN), Artificial Neural Network (ANN), and Support Vector Machine (SVM). We used the Weka and RapidMiner tools together for our experiment. Weka was used to select the features with the Genetic Algorithm and Particle Swarm Optimization. RapidMiner was used to measure the performance.

The operators used were Naive Bayes, KNN with K=3, Neural Net with 50 iterations and as many nodes in the hidden layer as there were output nodes, and LibSVM with a polynomial kernel. The remaining configuration used the standard parameters of the Weka and RapidMiner tools.

TABLE II. STRATIFIED 10-FOLD CROSS VALIDATION RESULT
PSO Search 63.02 63.26 71.63 70.93 75.81

TABLE III shows the accuracies of Naive Bayes, K-NN, ANN, Deep Learning, and SVM with the Leave-One-Out Cross Validation method. The accuracy comparison shows that Deep Learning with PSO Search feature selection is better than the others, at 75.81%.

CONCLUSION AND FURTHER WORK
In terms of classification performance, the Deep Learning algorithm showed the best result: 76.51% with the PSO algorithm as the feature selection method and 74.44% with the GA algorithm as the feature selection method. This was better than the other popular methods, which were Naive Bayes, K-Nearest Neighbor (KNN), Artificial Neural Network (ANN), and Support Vector Machine (SVM).

The result we obtained is still far from a good accuracy for medical use. The data we used had many samples for the normal class, whereas the other classes had only a few samples. That was enough to learn the normal label, but not enough to learn the arrhythmia types. In further work, we will improve the dataset quality to improve the learning performance.

REFERENCES

[1] World Health Organization. (2016, September) World Health Organization. [Online]. http://www.who.int/mediacentre/factsheets/fs317/en/
[2] John A Kastor, "Cardiac Arrhythmias," ENCYCLOPEDIA OF LIFE SCIENCES, 2002.
[3] Yann LeCun, "Deep Learning," NATURE, vol. 521, May 2015.
[4] Vasu Gupta, "Prediction and Classification of Cardiac Arrhythmia," 2014.
[5] Ignatov Andrey Dmitrievich, "Deep Learning in information analysis of electrocardiogram signals for disease diagnostics," The Ministry of Education and Science of The Russian Federation Moscow Institute of Physics and Technology, 2015.
[6] Meng Huanhuan and Zhang Yue, "Classification of Electrocardiogram Signals with Deep Belief Networks," IEEE 17th International Conference on Computational Science and Engineering, 2014.
[7] Albert Haque, "Cardiac Dysrhythmia Detection with GPU-Accelerated Neural Networks," Computer Science Department, Stanford University, December 2014.
[8] Anish Batra, "Classification of Arrhythmia Using Conjunction of Machine Learning Algorithms and ECG Diagnostic Criteria," International Journal of Biology and Biomedicine, vol. 1, 2016.
[9] Arno Candel and Erin LeDell. (2017, August) Deep Learning with H2O. http://h2o.ai/resources/.
[10] Matthew D. Zeiler. (2012, December) [Online]. http://arxiv.org/pdf/1212.5701v1.pdf
[11] Feng Niu, Benjamin Recht, Christopher Re, and Stephen J. Wright. (2011) [Online]. http://i.stanford.edu/hazy/papers/hogwild-nips.pdf