
Original Article

Healthc Inform Res. 2019 July;25(3):201-211.


https://doi.org/10.4258/hir.2019.25.3.201
pISSN 2093-3681 • eISSN 2093-369X

Deep Learning-Based Electrocardiogram Signal Noise Detection and Screening Model

Dukyong Yoon1,2, Hong Seok Lim3, Kyoungwon Jung4, Tae Young Kim1, Sukhoon Lee5

1 Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Korea
2 Department of Biomedical Sciences, Graduate School of Medicine, Ajou University, Suwon, Korea
3 Department of Cardiology, Ajou University School of Medicine, Suwon, Korea
4 Division of Trauma Surgery, Department of Surgery, Ajou University School of Medicine, Suwon, Korea
5 Department of Software Convergence Engineering, College of Convergence Engineering, Kunsan National University, Gunsan, Korea

Objectives: Biosignal data captured by patient monitoring systems could provide key evidence for detecting or predicting critical clinical events; however, noise in these data hinders their use. Because deep learning algorithms can extract features without human annotation, this study hypothesized that they could be used to screen unacceptable electrocardiograms (ECGs) that include noise. To test this hypothesis, a deep learning-based model for unacceptable ECG screening was developed, and its screening results were compared with the interpretations of a medical expert. Methods: To develop and apply the screening model, we used a biosignal database comprising 165,142,920 ECG II (10-second lead II electrocardiogram) data gathered between August 31, 2016 and September 30, 2018 from a trauma intensive-care unit. Then, 2,700 and 300 ECGs (ratio of 9:1) were reviewed by a medical expert and used for 9-fold cross-validation (training and validation) and test datasets. A convolutional neural network-based model for unacceptable ECG screening was developed based on the training and validation datasets. The model exhibiting the lowest cross-validation loss was subsequently selected as the final model. Its performance was evaluated through comparison with a test dataset. Results: When the screening results of the proposed model were compared to the test dataset, the area under the receiver operating characteristic curve and the F1-score of the model were 0.93 and 0.80, respectively (sensitivity = 0.88, specificity = 0.89, positive predictive value = 0.74, and negative predictive value = 0.96). Conclusions: The deep learning-based model developed in this study is capable of detecting and screening unacceptable ECGs efficiently.

Keywords: Electrocardiography, Noise, Deep Learning, Signal Detection Analysis, Physiologic Monitoring

Submitted: May 14, 2019
Revised: 1st, June 14, 2019; 2nd, July 8, 2019
Accepted: July 16, 2019

Corresponding Author
Dukyong Yoon, Department of Biomedical Informatics, Ajou University School of Medicine, 206 World cup-ro, Yeongtong-gu, Suwon 16499, Korea. Tel: +82-31-219-4476, E-mail: [email protected] (https://orcid.org/0000-0003-1635-8376)

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

ⓒ 2019 The Korean Society of Medical Informatics

I. Introduction

Electrocardiograms (ECGs) are commonly performed cardiology tests that record the electrical activity of the heart over a period using electrodes [1]. These electrodes detect small electrical changes caused by depolarization and repolarization in the electrophysiological pattern of the heart muscle during each heartbeat [1]. ECGs are widely used for various purposes, including measuring heart rate consistency, size, and location; identifying damage to the heart; and observing the effects of devices, such as pacemakers, or heart-regulating medications. The information obtained via ECGs can also be used for medical diagnosis. ECGs represent the best method
for measuring and diagnosing abnormal heart rhythms [2], an especially useful trait when applied to the measurement of damaged conduction tissues that transmit electrical signals [3]. ECGs can be used to detect damage to specific portions of the myocardium during myocardial infarctions [4]. Furthermore, digitally gathered and stored ECG data can be used for automatic ECG signal analysis [5].

ECG signals, however, are frequently interrupted by various types of noise and artifacts (Figure 1A–1C). Previous works have classified these into common artifact types [6–8], such as baseline wandering (BA), muscle artifacts (MA), and powerline interference (PLI). Subject movements or respiratory activities cause BA, which manifests as slowly wandering baselines primarily related to random body movements. ECGs with MAs are contaminated with muscular contraction artifacts. PLIs, caused by electrical power leakage or improper equipment grounding, are indicated by varying ECG amplitudes and indistinct isoelectric baselines. Because such noise or artifacts may disturb further automatic ECG signal analysis, their detection and elimination is of great importance, as this could prevent ECG noise-related misclassifications or misdiagnoses.

Figure 1. Acceptable and non-acceptable electrocardiogram (ECG) examples. Non-acceptable ECGs have global (A) or local (B) noise, or signals from ECG equipment rather than patients (C). (D) and (E) are examples of acceptable ECGs.

Previous studies have attempted to de-noise ECG signals using a wide range of approaches, including wavelet transformation [9–11], weighted averages [12,13], adaptive filtering [14], independent component analysis [15], and empirical mode decomposition (EMD) [16–19]. However, existing methods have several noise-removal limitations [20]. For example, EMD-based approaches may filter out P- and T-waves. Adaptive filters, proposed by Rahman et al. [21], can apply filters such as the signed regression algorithm and normalized least-mean square, but they encounter difficulties obtaining noise-signal references from a typical ECG signal acquisition.

Recent ECG-related research has required a substantially different approach to noise because the scale of data collection has become very large. The Electrocardiogram Vigilance with Electronic data Warehouse II (ECG-ViEW II) released 979,273 ECG results from 461,178 patients, with plans to add all 12-lead data in the next version [22,23]. More recent attempts have, additionally, been made to acquire ECG measurements from patient monitoring equipment in intensive care units (ICUs) [24,25]. The MIMIC III (Medical Information Mart for Intensive Care) dataset contains 22,317 waveform records (ECG, arterial blood pressure [ABP] waveforms, fingertip photoplethysmogram [PPG] signals, and respiration). The biosignal repository used in this study collected 23,187,206 waveform records (like MIMIC III, all kinds of waveforms captured in the ICU were included) from over 8,000 patients with an observational period of approximately 517 patient-years as of October 2018.

A different approach to noise was thus believed to be required. First, because there were sufficient data, researchers did not need to use de-noised data, which might still contain incorrect information (some real-world signals captured noise exclusively without any ECG information, as shown in Figure 1C). Second, a deep learning algorithm is needed. Deep learning algorithms possess multiple advantages. They do not require feature extraction processes performed by domain experts; the abovementioned biosignal repositories collected ECG alongside many other biosignal data types (such as respiration, ABP, PPG, and central venous pressure [CVP]), and because each type of waveform possesses unique characteristics, it requires a customized algorithm. Because a feature extraction process is unnecessary, it is
easier to develop a deep learning-based algorithm to screen unacceptable signals than to develop an algorithm for each signal.

In this context, a new unacceptable ECG (ECG with noise) detection and screening deep learning-based model for further automatic ECG signal analysis was developed in this study. In the development process, we minimized the manual review effort of the medical expert by pre-screening ECG data using non-experts.

II. Methods

Informed consent was waived in this study by the Ajou University Hospital Institutional Review Board (No. AJIRB-MED-MDB-16-155). Only de-identified data were used and analyzed, retrospectively.

1. Data Source

The data used for this study were obtained from a biosignal repository constructed by us through our previous research [25] (Figure 2). From September 1, 2016 to September 30, 2018, the biosignal data collected from the trauma ICU of Ajou University Hospital comprised a total of 2,767,845 PPG, measuring peripheral oxygen saturation (SpO2); 2,752,382 ECG lead II; 2,597,692 respiration data; 1,864,864 ABP; and 754,240 CVP (each indicating 10-minute data files). Because
Figure 2. Study process flow. From 15,400 electrocardiograms (ECGs) reviewed by non-experts, 2,400 and 300 ECGs were confirmed by a medical expert and used as the training and validation datasets for 9-fold cross-validation. Three hundred ECGs independently gathered from different periods were reviewed by the expert and used to evaluate the performance of the developed deep learning-based model. (Flowchart summary: the Ajou Biosignal Repository, covering September 1, 2016 to September 30, 2018, held 165,164,920 10-second ECG lead II recordings. About 0.01% were randomly sampled from September 1, 2016 to June 30, 2018, yielding 15,400 ECGs, which non-experts labeled as 13,485 acceptable and 1,915 unacceptable. From these, 2,700 ECGs were randomly sampled at a 1:1 ratio (1,350 acceptable, 1,350 unacceptable) and reviewed by a clinical expert, giving 1,893 acceptable and 807 unacceptable. A further 300 ECGs, randomly sampled from July 1, 2018 to September 30, 2018, formed the test dataset for performance evaluation of the best model.)


the data were collected from the trauma ICU, most patients were admitted because of physical trauma rather than any disease. Approximately 52.1% of patients in the trauma ICU underwent surgery to repair physical damage to their bodies. The mean ± standard deviation of patient age was 50.7 ± 20.5 years. Males constituted 73.0% of the patients.

2. Unacceptable ECG (ECG with Noise) Definition

The ECG waveforms were classified into two types: acceptable or unacceptable. An acceptable waveform was defined as a waveform in the normal category that can be used for further analysis. Unacceptable waveforms included the following subtypes: (1) BA, waveforms with variations in the measured voltage baseline; (2) MA, partial noise caused by patient movements or body instability; (3) PLI, noise generated across the entire waveform owing to close contact between the measurement sensor and voltage interference; (4) unacceptable (other reasons), waveforms not categorized as normal waveforms because of other causes; and (5) unclear, waveforms for which the preceding type-judgements were inappropriate. All waveforms that were not acceptable were classified as unacceptable (ECG with noise).

3. Labeling Tool Development

A web-based tool was developed to label the two types of ECG waveforms defined above. This facilitated rapid evaluation and efficient management in labeling the results of each ECG signal (Figure 3). The tool displayed a 10-second ECG result, allowing the evaluator to select one of the two types: acceptable or unacceptable. Using this tool, randomly selected ECGs were reviewed by two non-experts and a medical expert. A short pre-training session (approximately 10 minutes) was conducted for each non-expert evaluator. Labeling results were manually reviewed and corrected by a medical expert before they were used for model training (see Section II-4 and Figure 2).

4. Training, Validation, and Test Datasets

The datasets for model development (training and validation) were initially reviewed by two non-experts. If one evaluator classified an ECG as unacceptable, the ECG was considered unacceptable even if the other disagreed. Of 15,400 labeled ECGs, 13,485 (87.6%) were classified as acceptable by the non-experts. Among these pre-screened ECGs, 2,700 were randomly sampled with a 50:50 ratio (acceptable:unacceptable) to adjust for the class imbalance present in the real-world dataset. The sampled ECGs were confirmed via manual review by a medical expert. These ECGs were randomly divided into a training dataset (2,400 ECGs) and a validation dataset (300 ECGs) for 9-fold cross-validation (Figure 2). The test dataset (300 ECGs) was randomly sampled from ECGs gathered from various periods, and data from patients whose data were included in the training or validation dataset were excluded (Figure 3). These data were also evaluated as acceptable or unacceptable by a medical expert using the tool described above. Finally, the datasets for 9-fold cross-validation (training and validation) and testing were generated in a ratio of 9:1.

5. Deep Learning Model Development

1) Model development
Waveform data representing 10-second ECGs were used as input because they were labeled for this timespan. For the original ECGs, there were 2,500 data points because they were measured at 250 Hz. Sampling was performed at a 50%

Figure 3. Web-based labeling tool. The tool presented 10-second electrocardiogram waveforms and asked evaluators if they were acceptable or unacceptable. All answers were recorded and managed in the backend database.


ratio to reduce input size, and the input data were ultimately defined as a 1 × 1250 vector. In addition, the frequency domain information of the ECG signals was also taken into consideration. The ECG signal energy in each frequency band was extracted using the fast Fourier transform (FFT), and the structure of the transformed data was the same as that of a 1 × 1250 vector (real part of the double-sided FFT results). The input data were normalized via min-max normalization.

Because convolutional neural network (CNN) models have not previously been used for unacceptable ECG signal detection, there are no references on an optimal architecture. Therefore, we tested four different internal architectures, shown in Figure 4 and Table 1, moving from simple to more complex networks. In model architecture #1, which is the simplest approach, the network consists of a single convolutional layer with 64 feature maps for the time domain and frequency domain data. Furthermore, a single fully connected layer with 64 neurons is used to combine the time domain and frequency domain information. The number of convolutional layers, feature maps, fully connected layers,

Figure 4. Architecture of the four convolutional neural network models: (A) model 1, (B) model 2, (C) model 3, and (D) model 4. After the time domain and frequency domain data were abstracted by independent convolutional layers, they were combined in the ensuing fully connected layers.
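The input preparation described above (a 50% sampling of the 2,500-point waveform to a 1 × 1250 vector, the real part of its double-sided FFT as a second 1 × 1250 vector, and min-max normalization) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name, the use of simple decimation for the 50% sampling, and applying the FFT to the already-downsampled vector are our assumptions.

```python
import numpy as np

def preprocess(ecg_2500):
    # 10 s at 250 Hz -> 2,500 samples; keep every other sample (50% sampling)
    time_dom = ecg_2500[::2]                  # 1 x 1250 time domain vector
    # real part of the double-sided FFT as the frequency domain vector
    freq_dom = np.real(np.fft.fft(time_dom))  # 1 x 1250 frequency domain vector
    # min-max normalization of each input vector to [0, 1]
    scale = lambda v: (v - v.min()) / (v.max() - v.min())
    return scale(time_dom), scale(freq_dom)
```

Both branches then feed the CNN described in Table 1 as parallel 1 × 1250 inputs.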


Table 1. Description of the convolutional neural network architecture

                                              Model 1    Model 2    Model 3    Model 4
Number of convolutional layers                   1          2          3          4
Number of feature maps in each convolutional layer
  Convolutional layer 1                         64         64         64         64
  Convolutional layer 2                          -        128        128        128
  Convolutional layer 3                          -          -        256        256
  Convolutional layer 4                          -          -          -        512
Size of kernel in convolutional layers (a)   16/48/96   16/48/96   16/48/96   16/48/96
Stride of kernel in convolutional layers         1          1          1          1
Size of kernel in max-pooling layers            20         20         20         20
Stride of kernel in max-pooling layers           2          2          2          2
Number of fully connected layers                 1          2          3          4
Number of neurons in each fully connected layer
  Fully connected layer 1                       64        128        256        512
  Fully connected layer 2                        -         64        128        256
  Fully connected layer 3                        -          -         64        128
  Fully connected layer 4                        -          -          -         64
Learning rate                                 1.00E-04   1.00E-04   1.00E-04   1.00E-04
Decay rate for learning rate                    0.8        0.8        0.8        0.8
Minimum learning rate                         1.00E-06   1.00E-06   1.00E-06   1.00E-06
Keep probability for dropout layers             0.8        0.8        0.8        0.8

(a) Three different kernel sizes (16, 48, and 96) were tested in all the models.
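Per Table 1 and Figure 4, every convolution uses stride 1 and every max-pooling layer uses stride 2, so each convolution-pooling block halves the signal length while widening the feature maps. A small dimensioning sketch for model #3 follows; the 'same'-padding behavior (ceiling division at each pooling stage) is our inference from the 1250 → 625 → 313 → 157 lengths shown in Figure 4, not something the paper states explicitly.

```python
import math

def conv_pool_shapes(n_in=1250, feature_maps=(64, 128, 256), pool_stride=2):
    # Stride-1 convolutions with 'same' padding keep the signal length;
    # each size-20 max-pooling layer with stride 2 then roughly halves it.
    shapes, length = [], n_in
    for maps in feature_maps:
        length = math.ceil(length / pool_stride)
        shapes.append((maps, length))
    return shapes

# Model #3, per branch (time or frequency domain):
# conv_pool_shapes() -> [(64, 625), (128, 313), (256, 157)]
```

The same arithmetic gives 79 for the fourth block of model #4, matching Figure 4.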

and neurons gradually increase in model architectures #2, #3, and #4. The ReLU activation function was used in all the fully connected layers, and a final layer that outputs the binary classification result (i.e., acceptable or not) using the softmax activation was also present. A threshold of 0.5 was applied to the probabilities from the softmax layer to classify ECGs as unacceptable (= 1) or acceptable (= 0). To define the optimum kernel size in the convolutional layers, we also tested three different sizes (16, 48, and 96) in all the models.

2) Model optimization
A cross-entropy loss function was selected as the cost function. The adaptive moment estimation (Adam) optimizer (learning rate = 0.0001, decay = 0.8, minimum learning rate = 0.000001) was used to train the model. We repeated the training process up to 100 epochs with a batch size of 200 (i.e., up to 600 iterations). During the iterations, the result that showed the lowest loss on the validation set was selected as the finally trained model.

To evaluate the robustness of the results, we conducted 9-fold cross-validation with variation of the training and validation datasets. The model that exhibited the lowest loss in cross-validation was selected as the final model, and its performance was evaluated.

6. Performance Evaluation

Model performance was evaluated via comparison with the test dataset of 300 ECG signals labeled by a medical expert. The performance evaluation considered unacceptable screening as a positive value and calculated the sensitivity, specificity, positive and negative predictive values (PPV and NPV), F1-score, and the area under the receiver operating characteristic curve (AUROC). Sensitivity (true positive rate) refers to the ability of the model to correctly detect ECGs deemed unacceptable by the gold standard. Specificity (true negative rate) indicates the ability of the model to correctly recognize ECGs deemed acceptable by the gold standard. PPV and NPV represent the proportion of true positives and true negatives among the model's positive and negative classifications, respectively. The F1-score is the harmonic mean of the sensitivity and PPV.

7. Cutoff for Classifying Unacceptable ECGs

To set an appropriate cutoff to screen out as many unacceptable ECGs as possible, even at the expense of some acceptable ECGs, we conducted a sensitivity analysis on the cutoff and determined the optimal cutoff for our study.

8. Software Tools and Computational Environment

MS-SQL 2017 was used for data management, and Python (version 3.6.1) and the TensorFlow library (version 1.2.1) were used to develop the data preprocessing and deep learning models. The machine used for model development had one Intel Xeon CPU E5-1650 v4 (3.6 GHz), 128 GB RAM, and one GeForce GTX 1080 graphics card.

III. Results

Among the model architectures, model architecture #3 exhibited the best average cross-validation accuracy and loss (Figure 5 and Supplementary Table S1). With increasing model architecture complexity, accuracy increased and loss decreased. With model architecture #4, however, neither the accuracy nor the loss improved. During evaluation of the kernel size in the convolutional layers, a smaller size showed better performance in all models. Therefore, model architecture #3 with a kernel size of 16 was selected as the final model architecture.

When the final model, showing the lowest loss in cross-validation among models with model architecture #3, was applied to the gold standard dataset (test dataset), it achieved an AUROC of 0.93 (Figure 6). In the sensitivity analysis to set the cutoff for classifying unacceptable ECGs, a cutoff value of 0.05 achieved high sensitivity with reasonable performance on the other indexes (Table 2). With this cutoff, 88% of the unacceptable ECGs were detected, and 11% of the acceptable data were incorrectly evaluated as unacceptable (Table 3). The signals classified as unacceptable by the algorithm were 74% unacceptable and 26% misclassified acceptable waveforms. In contrast, only 9 of 76 unacceptable ECGs in the test dataset were misclassified as acceptable, and 96% of the signals evaluated as acceptable by the model were acceptable.

The time required to evaluate the 300 gold-standard dataset signals was 0.48 seconds, taking an average of 0.58 ms for each ECG.

Figure 6. Receiver operating characteristic (ROC) curve. When the final model was applied to the test dataset, the area under the ROC curve (AUROC) reached 0.93. The blue point represents the results when 0.05 was used as the threshold for the probabilities output by the convolutional neural network. Depending on the user's purpose (for example, higher sensitivity), the threshold can be adjusted.
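As a complement to Figure 6, the AUROC can be computed without tracing the curve: it equals the probability that a randomly chosen unacceptable ECG receives a higher score than a randomly chosen acceptable one (the Mann-Whitney statistic). A minimal sketch with hypothetical inputs, not the authors' evaluation code:

```python
import numpy as np

def auroc(labels, scores):
    # Rank-based AUROC: fraction of (unacceptable, acceptable) pairs in
    # which the unacceptable ECG scores higher; tied pairs count half.
    pos = scores[labels == 1]          # unacceptable (positive class)
    neg = scores[labels == 0]          # acceptable
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)
```

The pairwise comparison is O(n²) in the number of ECGs, which is negligible for a 300-signal test set; a sort-based formulation would scale better for large-scale screening.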

Figure 5. Classification accuracy (A) and loss (B) of the four convolutional neural network models with various kernel sizes on the validation dataset. The more complex model architectures with smaller kernel sizes exhibited better classification accuracy and lower loss.


Table 2. Result of sensitivity analysis to set the optimal cutoff value

                          Cutoff values
              0.005    0.05    0.5    0.95    0.995
Sensitivity    0.91    0.88   0.76    0.70    0.66
Specificity    0.76    0.89   0.93    0.96    0.97
PPV            0.57    0.74   0.79    0.85    0.89
NPV            0.96    0.96   0.92    0.90    0.89
F1-score       0.70    0.80   0.78    0.77    0.76

The performance when the finally selected cutoff value (0.05) was used is shown in bold.
PPV: positive predictive value, NPV: negative predictive value.
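At the selected cutoff of 0.05, the metrics reported above follow directly from the confusion-matrix counts in Table 3. The following arithmetic check (ours, not the authors' code) reproduces the reported values:

```python
# Counts from Table 3 (gold-standard rows vs. model columns, cutoff 0.05).
tp, fn = 67, 9      # unacceptable ECGs: detected / missed by the model
fp, tn = 24, 200    # acceptable ECGs: falsely flagged / correctly passed

sensitivity = tp / (tp + fn)   # 67/76   -> 0.88
specificity = tn / (tn + fp)   # 200/224 -> 0.89
ppv = tp / (tp + fp)           # 67/91   -> 0.74
npv = tn / (tn + fn)           # 200/209 -> 0.96
f1 = 2 * sensitivity * ppv / (sensitivity + ppv)   # -> 0.80
```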

Table 3. Confusion matrix when a cutoff value of 0.05 was applied to the finally selected convolutional neural network model

                                 Classification of our model
                                 Unacceptable    Acceptable
Test dataset    Unacceptable          67              9
                Acceptable            24            200

IV. Discussion

This study developed a deep learning model for screening unacceptable ECG waveforms. The developed model was able to identify most (88%) of the unacceptable ECGs detected by a clinical expert. Because the time required to analyze a single 10-second waveform is only approximately 0.58 ms, this model can be used in real time and for large-volume ECG analysis.

Previous studies have primarily attempted to de-noise ECG signals [26], for example, using principal component analysis (PCA) to abstract original signal data into a few eigenvectors with low noise levels. Discrete wavelet transformation (DWT) concentrated on true signals, which possessed larger coefficients than noise data; however, threshold definition was critical. Other methods, such as wavelet Wiener filtering and pilot estimation, have also been used.

The background to the problem addressed in this study is different from that of existing attempts. As described earlier, there are now sufficient data to establish the groundwork for a future algorithm to achieve exclusively accurate data, without input noise. This matters because when noisy input is entered into a learning model, its performance declines. This study therefore aimed to retain only clean data as much as possible, even at the expense of some acceptable data. Additionally, a significant portion of noise in real-world ECG data is generated not by alterations from specific effects but by system-generated abnormal signals, as shown in Figure 1C. These data points cannot be de-noised, as they are exclusively noise without any true ECG signal information.

Our approach, which minimizes the intervention of domain experts by conducting non-expert pre-screening, would also be viable for application to other biosignals. Because the enrollment of medical expertise is accompanied by high costs, it was necessary to test whether accurate noise evaluations could be developed while reducing the effort required of medical experts. Conventional de-noising or quality assessment methods have been designed considering the features of specific waveforms; however, they require much effort from domain experts, which could be a barrier to the development of a model that can be applied to a wide variety of biosignals.

Our results in CNN model optimization suggest that deeper networks with smaller convolutional filter (kernel) sizes provide better performance. This finding was also observed in another domain, image recognition. The VGG-16 model, which won first place in the ImageNet Challenge 2014, improved performance by increasing network depth with very small convolutional filters [27].

In the analysis of ECG signals, we chose 1D CNNs rather than 2D CNNs on spectrograms because we assumed that noise or unacceptable ECG signals are independent of time. For the same reason, we did not use recurrent neural network (RNN)-based models (long short-term memory models, gated recurrent units, etc.) and focused only on the morphological characteristics of the signal.

In addition, to evaluate the advantage of using both time domain and frequency domain input rather than either alone, we conducted an ad hoc analysis using time domain data only or frequency domain data only in model architecture #3, which showed the best performance in our study. Based on the results, we confirmed that accuracy and loss were better in 9-fold cross-validation when both data types were used together (mean ± standard deviation: accuracy = 0.95 ± 0.02, loss = 0.12 ± 0.03) than when only time domain data (accuracy = 0.93 ± 0.02, loss = 0.22 ± 0.06) or only frequency domain data (accuracy = 0.94 ± 0.02, loss = 0.19 ± 0.05) were used.

Some specific waveforms due to pathology may have been screened as unacceptable. In some cases, signal modifications caused by cardiovascular diseases might be quite similar to those caused by noise or artifacts (for example, atrial fibrillation). Therefore, we conducted an ad hoc analysis by applying the proposed model to the results of portable ECGs that include an interpretation by the ECG machine [23]. When our model was applied to 10,000 randomly selected ECG lead II data, 3,337 ECGs were classified as unacceptable. Further, we observed a tendency for waveforms of certain types of arrhythmia to be classified as unacceptable (Table 4). However, it is possible that the situations of measuring ECGs for arrhythmic patients, or the statuses of the patients themselves, were less stable and led to more unacceptable ECGs. Therefore, this model would be appropriate for filtering noise data in the preparation of a noise-free training dataset. If this model is used to filter noise signals before the application of certain arrhythmia detection models, sufficient input data must be prepared considering that the filtering rate could be high in pathologic situations.

The proposed model may be used in two ways: embedded in monitoring devices, or to process centrally collected data. As this model runs very quickly in an Intel Xeon CPU E5-1650 v4 (3.6 GHz), 128 GB RAM, and one GeForce GTX 1080 graphics card computing environment, time constraints would remain insignificant in the latter application because central systems would possess sufficient computing power. In the near future, monitoring devices themselves will analyze signals and provide warnings of danger. Because prediction models are highly complex, the proposed unacceptable ECG detection model needed to be as computationally simple as possible. Various approaches attempted previously in this regard [28] require further research.

This study also encountered the following limitations. First, the test dataset used as the gold standard was generated by only one expert. As previously mentioned, this study did not aim to diagnose diseases requiring a high level of domain knowledge; thus, the gap between different experts should not be significant. However, there was no supplemental evaluation to correct for potential mistakes. Second, the specific category of noise was not evaluated. Noise could be classified into five categories: BA, MA, PLI, unacceptable, and unclear. However, data collection in an actual clinical environment resulted in a majority of normal waveform points and a small

Table 4. Distribution of ECG types in ECGs classified as acceptable or unacceptable by our model

Type of ECGs                        Acceptable      Unacceptable    p-value (a)
Total ECGs                          6,663 (66.6)    3,337 (33.4)
Normal sinus rhythm                 4,641 (69.5)    2,033 (30.5)    <0.001
Atrial arrhythmia
  Sinus bradycardia                 1,040 (71.0)      424 (29.0)    <0.001
  Premature atrial complexes           72 (48.6)       76 (51.4)    <0.001
  Supraventricular tachycardia         10 (55.6)        8 (44.4)    0.456
  Atrial flutter                       25 (55.6)       20 (44.4)    0.158
  Atrial fibrillation                 188 (50.0)      188 (50.0)    <0.001
Junctional arrhythmia                  53 (52.0)       49 (48.0)    0.002
Ventricular arrhythmia
  Premature ventricular complexes     140 (45.0)      171 (55.0)    <0.001
Heart blocks
  1st degree                          236 (64.0)      133 (36.0)    0.311
  2nd degree                            3 (33.3)        6 (66.7)    0.068
  Bundle branch block                 251 (51.5)      236 (48.5)    <0.001

Values are presented as number (%).
ECG: electrocardiogram.
(a) Chi-square test or Fisher exact test against the expectation of unacceptable in total ECGs.
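The footnote of Table 4 describes testing each row against the overall unacceptable rate with a chi-square or Fisher exact test. The arithmetic of the chi-square goodness-of-fit form for one row is sketched below; which variant and corrections the authors applied are not stated, so this is an assumption-laden check rather than their analysis code. For the atrial fibrillation row it is consistent with the reported p < 0.001.

```python
import math

# Chi-square goodness-of-fit for one Table 4 row against the overall
# unacceptable rate (3,337/10,000). Row used: atrial fibrillation,
# 188 of 376 ECGs classified as unacceptable.
obs_unacc, n = 188, 376
p0 = 3337 / 10000                       # expected unacceptable proportion
exp_unacc, exp_acc = n * p0, n * (1 - p0)
chi2 = ((obs_unacc - exp_unacc) ** 2 / exp_unacc
        + (n - obs_unacc - exp_acc) ** 2 / exp_acc)
# Survival function of the chi-square distribution with 1 df
p_value = math.erfc(math.sqrt(chi2 / 2))
```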


Dukyong Yoon et al

volume of noise data. Therefore, the data were insufficient for the deep learning model to learn to classify all defined categories. Furthermore, the noise categories were integrated without distinction because distinguishing between noise causes in actual applications was inconsequential (this study aimed to eliminate all unacceptable signals, regardless of their cause). All waveforms evaluated as non-acceptable were thus also integrated as unacceptable ECGs. Finally, the PPV was not particularly high (0.74), meaning that many of the ECGs identified as unacceptable by the proposed algorithm were actually acceptable. However, as mentioned above, this study aimed to increase sensitivity, and some loss of normal waveforms was acceptable because such data were already sufficiently represented. In exchange, the proposed algorithm ensured high sensitivity and successfully screened out 88% of unacceptable ECGs.

In conclusion, this study developed a model capable of efficiently detecting unacceptable ECGs. The developed unacceptable ECG detection model is expected to provide a first step for future automated large-scale ECG analyses.

Conflict of Interest

No potential conflict of interest relevant to this article was reported.

Acknowledgments

This research was supported by grants from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI) funded by the Ministry of Health & Welfare, Republic of Korea (No. HG18C0067, Government-wide R&D Fund project for infectious disease research). This work was also supported by the faculty research fund of Ajou University School of Medicine.

ORCID

Dukyong Yoon (http://orcid.org/0000-0003-1635-8376)
Hong Seok Lim (http://orcid.org/0000-0002-3127-2071)
Kyoungwon Jung (http://orcid.org/0000-0001-7895-0362)
Tae Young Kim (http://orcid.org/0000-0002-2591-0129)
Sukhoon Lee (http://orcid.org/0000-0002-3390-5602)

Supplementary Materials

Supplementary materials can be found via http://doi.org/10.4258/hir.2019.25.3.201. Table S1. Performance of the model in cross-validation.

References

1. Kumar A. ECG-simplified. [place unknown]: LifeHugger; 2010.
2. Kaplan NM. Systemic hypertension therapy. In: Braunwald E, editor. Braunwald's heart disease: a textbook of cardiovascular medicine. Philadelphia (PA): Saunders; 1997.
3. Van Mieghem C, Sabbe M, Knockaert D. The clinical value of the ECG in noncardiac conditions. Chest 2004;125(4):1561-76.
4. American Heart Association. Part 8: Stabilization of the patient with acute coronary syndromes. Circulation 2005;112(24_Suppl):IV.89-IV.110.
5. Li H, Yuan D, Wang Y, Cui D, Cao L. Arrhythmia classification based on multi-domain feature extraction for an ECG recognition system. Sensors (Basel) 2016;16(10):E1744.
6. Rodrigues J, Belo D, Gamboa H. Noise detection on ECG based on agglomerative clustering of morphological features. Comput Biol Med 2017;87:322-34.
7. Sivaraks H, Ratanamahatana CA. Robust and accurate anomaly detection in ECG artifacts using time series motif discovery. Comput Math Methods Med 2015;2015:453214.
8. Satija U, Ramkumar B, Manikandan MS. Automated ECG noise detection and classification system for unsupervised healthcare monitoring. IEEE J Biomed Health Inform 2018;22(3):722-32.
9. Ercelebi E. Electrocardiogram signals de-noising using lifting-based discrete wavelet transform. Comput Biol Med 2004;34(6):479-93.
10. Ho CY, Ling BW, Wong TP, Chan AY, Tam PK. Fuzzy multiwavelet denoising on ECG signal. Electron Lett 2003;39(16):1163-4.
11. Tikkanen PE. Nonlinear wavelet and wavelet packet denoising of electrocardiogram signal. Biol Cybern 1999;80(4):259-67.
12. Iravanian S, Tung L. A novel algorithm for cardiac biosignal filtering based on filtered residue method. IEEE Trans Biomed Eng 2002;49(11):1310-7.
13. Leski JM. Robust weighted averaging. IEEE Trans Biomed Eng 2002;49(8):796-804.
14. Almenar V, Albiol A. A new adaptive scheme for ECG enhancement. Signal Process 1999;75(3):253-63.




15. Barros AK, Mansour A, Ohnishi N. Removing artifacts from electrocardiographic signals using independent components analysis. Neurocomputing 1998;22(1-3):173-86.
16. Blanco-Velasco M, Weng B, Barner KE. ECG signal denoising and baseline wander correction based on the empirical mode decomposition. Comput Biol Med 2008;38(1):1-13.
17. Huang NE, Shen Z, Long SR, Wu MC, Shih HH, Zheng Q, et al. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc Math Phys Eng Sci 1998;454(1971):903-95.
18. Kabir MA, Shahnaz C. Denoising of ECG signals based on noise reduction algorithms in EMD and wavelet domains. Biomed Signal Process Control 2012;7(5):481-9.
19. Karagiannis A, Constantinou P. Noise-assisted data processing with empirical mode decomposition in biomedical signals. IEEE Trans Inf Technol Biomed 2011;15(1):11-8.
20. Xiong P, Wang H, Liu M, Zhou S, Hou Z, Liu X. ECG signal enhancement based on improved denoising auto-encoder. Eng Appl Artif Intell 2016;52:194-202.
21. Rahman MZ, Shaik RA, Reddy DR. Efficient and simplified adaptive noise cancelers for ECG sensor based remote health monitoring. IEEE Sens J 2012;12(3):566-73.
22. Kim YG, Shin D, Park MY, Lee S, Jeon MS, Yoon D, et al. ECG-ViEW II, a freely accessible electrocardiogram database. PLoS One 2017;12(4):e0176222.
23. Chung D, Choi J, Jang JH, Kim TY, Byun J, Park H, et al. Construction of an electrocardiogram database including 12 lead waveforms. Healthc Inform Res 2018;24(3):242-6.
24. Johnson AE, Pollard TJ, Shen L, Lehman LW, Feng M, Ghassemi M, et al. MIMIC-III, a freely accessible critical care database. Sci Data 2016;3:160035.
25. Yoon D, Lee S, Kim TY, Ko J, Chung WY, Park RW. System for collecting biosignal data from multiple patient monitoring systems. Healthc Inform Res 2017;23(4):333-7.
26. Reddy CK, Aggarwal CC. Healthcare data analytics. Boca Raton (FL): CRC Press; 2015.
27. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition [Internet]. Ithaca (NY): arXiv.org; 2014 [cited at 2019 Jul 10]. Available from: https://arxiv.org/pdf/1409.1556.pdf.
28. Huynh LN, Lee Y, Balan RK. DeepMon: mobile GPU-based deep learning framework for continuous vision applications. Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services; 2017 Jun 19-23; Niagara Falls, NY. p. 82-95.

