Objectives: Biosignal data captured by patient monitoring systems could provide key evidence for detecting or predicting critical clinical events; however, noise in these data hinders their use. Because deep learning algorithms can extract features without human annotation, this study hypothesized that they could be used to screen unacceptable electrocardiograms (ECGs) that include noise. To test this hypothesis, a deep learning-based model for unacceptable ECG screening was developed, and its screening results were compared with the interpretations of a medical expert. Methods: To develop and apply the screening model, we used a biosignal database comprising 165,142,920 ECG II (10-second lead II electrocardiogram) data gathered between August 31, 2016 and September 30, 2018 from a trauma intensive care unit. Then, 2,700 and 300 ECGs (a 9:1 ratio) were reviewed by a medical expert and used as the 9-fold cross-validation (training and validation) and test datasets, respectively. A convolutional neural network-based model for unacceptable ECG screening was developed on the training and validation datasets. The model exhibiting the lowest cross-validation loss was selected as the final model, and its performance was evaluated on the test dataset. Results: When the screening results of the proposed model were compared with the test dataset, the area under the receiver operating characteristic curve and the F1-score of the model were 0.93 and 0.80, respectively (sensitivity = 0.88, specificity = 0.89, positive predictive value = 0.74, and negative predictive value = 0.96). Conclusions: The deep learning-based model developed in this study is capable of detecting and screening unacceptable ECGs efficiently.
Keywords: Electrocardiography, Noise, Deep Learning, Signal Detection Analysis, Physiologic Monitoring
for measuring and diagnosing abnormal heart rhythms [2], an especially useful trait when applied to the measurement of damaged conduction tissues that transmit electrical signals [3]. ECGs can be used to detect damage to specific portions of the myocardium during myocardial infarctions [4]. Furthermore, digitally gathered and stored ECG data can be used for automatic ECG signal analysis [5].

ECG signals, however, are frequently interrupted by various types of noise and artifacts (Figure 1A–1C). Previous works have classified these into common artifact types [6–8], such as baseline wandering (BA), muscle artifacts (MA), and powerline interference (PLI). Subject movements or respiratory activities cause BA, which manifests as a slowly wandering baseline primarily related to random body movements. ECGs with MA are contaminated with muscular contraction artifacts. PLI, caused by electrical power leakage or improper equipment grounding, is indicated by varying ECG amplitudes and indistinct isoelectric baselines. Because such noise and artifacts may disturb further automatic ECG signal analysis, their detection and elimination are of great importance, as this could prevent ECG noise-related misclassifications or misdiagnoses.

Previous studies have attempted to de-noise ECG signals using a wide range of approaches, including wavelet transformation [9–11], weighted averages [12,13], adaptive filtering [14], independent component analysis [15], and empirical mode decomposition (EMD) [16–19]. However, existing methods have several noise-removal limitations [20]. For example, EMD-based approaches may filter out P- and T-waves. Adaptive filters, proposed by Rahman et al. [21], can apply techniques such as the signed regression algorithm and normalized least-mean squares, but they encounter difficulties obtaining noise-signal references from a typical ECG signal acquisition.

Recent ECG-related research has required a substantially different approach to noise because the scale of data collection has become very large. The Electrocardiogram Vigilance with Electronic data Warehouse II (ECG-ViEW II) released 979,273 ECG results from 461,178 patients, with plans to add all 12-lead data in the next version [22,23]. More recently, attempts have also been made to acquire ECG measurements from patient monitoring equipment in intensive care units (ICUs) [24,25]. The MIMIC III (Medical Information Mart for Intensive Care) dataset contains 22,317 waveform records (ECG, arterial blood pressure [ABP] waveforms, fingertip photoplethysmogram [PPG] signals, and respiration). The biosignal repository used in this study collected 23,187,206 waveform records (like MIMIC III, all kinds of waveforms captured in the ICU were included) from over 8,000 patients, with an observational period of approximately 517 patient-years as of October 2018.

Figure 1. Acceptable and non-acceptable electrocardiogram (ECG) examples. Non-acceptable ECGs have global (A) or local (B) noise, or signals from ECG equipment rather than patients (C). (D) and (E) are examples of acceptable ECGs.

A different approach to noise was thus believed to be required. First, because there were sufficient data, researchers did not need to use de-noised data, which might still contain incorrect information (some real-world signals capture noise exclusively, without any ECG information, as shown in Figure 1C). Second, a deep learning algorithm is needed. Deep learning algorithms possess multiple advantages. They do not require feature extraction processes performed by domain experts. The abovementioned biosignal repositories collected ECGs alongside many other biosignal data types (such as respiration, ABP, PPG, and central venous pressure [CVP]), and because each type of waveform possesses unique characteristics, each would otherwise require a customized algorithm. Because a feature extraction process is unnecessary, it is easier to develop a deep learning-based algorithm that screens unacceptable signals than to develop a handcrafted algorithm for each signal type.

In this context, a new deep learning-based model for detecting and screening unacceptable ECGs (ECGs with noise) ahead of further automatic ECG signal analysis was developed in this study. In the development process, we minimized the manual review effort of the medical expert by pre-screening the ECG data using non-experts.

II. Methods

Informed consent was waived for this study by the Ajou University Hospital Institutional Review Board (No. AJIRB-MED-MDB-16-155). Only de-identified data were used, and they were analyzed retrospectively.

1. Data Source
The data used for this study were obtained from a biosignal repository constructed through our previous research [25] (Figure 2). From September 1, 2016 to September 30, 2018, the biosignal data collected from the trauma ICU of Ajou University Hospital comprised a total of 2,767,845 PPG (measuring peripheral oxygen saturation [SpO2]), 2,752,382 ECG lead II, 2,597,692 respiration, 1,864,864 ABP, and 754,240 CVP files (each file covering 10 minutes of data).
Figure 2. Study process flow. From 15,400 electrocardiograms (ECGs) reviewed by non-experts (13,485 acceptable, 1,915 unacceptable), 2,700 ECGs were randomly sampled at a 1:1 acceptable-to-unacceptable ratio and manually reviewed by clinical experts (1,893 acceptable, 807 unacceptable); these were then split into 2,400 and 300 ECGs for 9-fold cross-validation.
Because the data were collected from the trauma ICU, most patients had been admitted because of physical trauma rather than any underlying disease. Approximately 52.1% of patients in the trauma ICU underwent surgery to repair physical damage to their bodies. The mean ± standard deviation of patient age was 50.7 ± 20.5 years, and males constituted 73.0% of the patients.

2. Unacceptable ECG (ECG with Noise) Definition
The ECG waveforms were classified into two types: acceptable or unacceptable. An acceptable waveform was defined as a waveform in the normal category that can be used for further analysis. Unacceptable waveforms included the following subtypes: (1) BA, waveforms with variations in the measured voltage baseline; (2) MA, partial noise caused by patient movements or body instability; (3) PLI, noise generated across the entire waveform owing to close contact between the measurement sensor and voltage interference; (4) unacceptable (other reasons), waveforms not categorized as normal because of other causes; and (5) unclear, waveforms for which the preceding type judgements were inappropriate. All waveforms that were not acceptable were classified as unacceptable (ECG with noise).

3. Labeling Tool Development
A web-based tool was developed to label the two types of ECG waveforms defined above. It facilitated rapid evaluation and efficient management of the labeling results for each ECG signal (Figure 3). The tool displayed a 10-second ECG, allowing the evaluator to select one of the two types: acceptable or unacceptable. Using this tool, randomly selected ECGs were reviewed by two non-experts and a medical expert. A short pre-training session (approximately 10 minutes) was conducted for each non-expert evaluator. Labeling results were manually reviewed and corrected by a medical expert before they were used for model training (see Section II-4 and Figure 2).

4. Training, Validation, and Test Datasets
The datasets for model development (training and validation) were initially reviewed by two non-experts. If either evaluator classified an ECG as unacceptable, the ECG was considered unacceptable even if the other disagreed. Of the 15,400 labeled ECGs, 13,485 (87.6%) were classified as acceptable by the non-experts. From these pre-screened ECGs, 2,700 were randomly sampled at a 50:50 (acceptable:unacceptable) ratio to adjust for the class imbalance present in the real-world dataset. The sampled ECGs were confirmed via manual review by a medical expert. These ECGs were randomly divided into a training dataset (2,400 ECGs) and a validation dataset (300 ECGs) for 9-fold cross-validation (Figure 2). The test dataset (300 ECGs) was randomly sampled from ECGs gathered over various periods; data from patients whose data were included in the training or validation datasets were excluded (Figure 3). These data were also evaluated as acceptable or unacceptable by a medical expert using the tool described above. Finally, the dataset for 9-fold cross-validation (training and validation) and the test dataset were generated at a ratio of 9:1.

5. Deep Learning Model Development
1) Model development
Waveform data representing 10-second ECGs were used as input because the labels applied to this timespan. Each original ECG comprised 2,500 data points, as the signals were recorded at 250 Hz. The signals were downsampled at a 50% ratio to reduce the input size, so the time domain input was ultimately defined as a 1 × 1250 vector. In addition, the frequency domain information of the ECG signals was taken into consideration: the signal energy in each frequency band was extracted using the fast Fourier transform (FFT), and the transformed data had the same structure, a 1 × 1250 vector (the real part of the double-sided FFT result). The input data were normalized via min-max normalization.
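To make this input pipeline concrete, the following NumPy sketch reproduces the preprocessing steps described above. The paper does not publish its preprocessing code, so the function name and the details left unstated (per-vector min-max scaling to [0, 1], and applying the FFT to the downsampled signal so that both vectors have 1,250 elements) are assumptions.

```python
import numpy as np

def preprocess_ecg(signal_250hz):
    """Turn one 10-second, 250-Hz lead II ECG (2,500 samples) into the
    paired time- and frequency-domain inputs described in the text.
    Illustrative sketch; names and unstated details are assumptions."""
    x = np.asarray(signal_250hz, dtype=np.float32)  # shape: (2500,)

    # 50% downsampling: keep every other sample -> 1 x 1250 time vector.
    time_vec = x[::2]

    # Real part of the double-sided FFT; applied to the downsampled
    # signal here so that both vectors are 1 x 1250 (the paper does not
    # state which version of the signal was transformed).
    freq_vec = np.real(np.fft.fft(time_vec))

    # Min-max normalization (assumed per vector, to [0, 1]).
    def minmax(v):
        span = v.max() - v.min()
        return (v - v.min()) / span if span > 0 else np.zeros_like(v)

    return minmax(time_vec), minmax(freq_vec)
```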
Because convolutional neural network (CNN) models have not previously been used for unacceptable ECG signal detection, there are no references on an optimal architecture. Therefore, we tested four different internal architectures, shown in Figure 4 and Table 1, moving from simple to more complex networks. In model architecture #1, the simplest approach, the network consists of a single convolutional layer with 64 feature maps for each of the time domain and frequency domain inputs, and a single fully connected layer with 64 neurons is used to combine the time domain and frequency domain information.

Figure 4. Architecture of the four convolutional neural network models: (A) model 1, (B) model 2, (C) model 3, and (D) model 4. After the time domain and frequency domain data were abstracted by independent convolutional and max-pooling layers, they were combined in the ensuing fully connected layers.
The number of convolutional layers, feature maps, fully connected layers, and neurons gradually increases in model architectures #2, #3, and #4. The ReLU activation function was used in all the fully connected layers, and a final layer outputs the binary classification result (i.e., acceptable or not) using softmax activation. A threshold of 0.5 was applied to the softmax probabilities to classify ECGs as unacceptable (= 1) or acceptable (= 0). To define the optimal kernel size for the convolutional layers, we also tested three different sizes (16, 48, and 96) in all the models.
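A minimal sketch of model architecture #1 follows. The study used TensorFlow 1.2.1; the modern tf.keras API is used here for brevity. Details not given in the text or Figure 4, such as padding, the convolutional activation, and the pooling that reduces each branch from 1,250 to 625 samples, are filled in with common choices and should be read as assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_model_1(kernel_size=16):
    """Sketch of model architecture #1: one convolutional branch per
    input domain, merged by a single 64-neuron fully connected layer."""
    time_in = layers.Input(shape=(1250, 1), name="time_domain")
    freq_in = layers.Input(shape=(1250, 1), name="freq_domain")

    branches = []
    for branch_in in (time_in, freq_in):
        # Convolutional layer with 64 feature maps; max pooling halves
        # the length from 1,250 to 625, matching Figure 4.
        x = layers.Conv1D(64, kernel_size, padding="same",
                          activation="relu")(branch_in)
        x = layers.MaxPooling1D(pool_size=2)(x)
        branches.append(layers.Flatten()(x))

    merged = layers.Concatenate()(branches)
    merged = layers.Dense(64, activation="relu")(merged)   # ReLU, as stated
    output = layers.Dense(2, activation="softmax")(merged)  # acceptable or not
    return Model(inputs=[time_in, freq_in], outputs=output)

model = build_model_1(kernel_size=16)
```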
2) Model optimization
A cross-entropy loss function was selected as the cost function. The adaptive moment estimation (Adam) optimizer (learning rate = 0.0001, decay = 0.8, minimum learning rate = 0.000001) was used to train the model. We repeated the training process for up to 100 epochs with a batch size of 200 (i.e., up to 600 iterations). Across the iterations, the model state that showed the lowest loss on the validation set was selected as the finally trained model.

To evaluate the robustness of the results, we conducted 9-fold cross-validation with variation of the training and validation datasets. The model that exhibited the lowest loss in cross-validation was selected as the final model, and its performance was evaluated.
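These optimization settings can be sketched as follows. The exact learning rate decay mechanism is not stated in the paper, so the ReduceLROnPlateau callback (factor 0.8 down to a 10^-6 floor) is one plausible reading; the checkpoint keeps the epoch with the lowest validation loss, as described. The training arrays are assumed to come from the preprocessing sketch above.

```python
# Cross-entropy loss and Adam with an initial learning rate of 1e-4, as
# stated in the text. x_time_*, x_freq_*, and y_* are assumed inputs
# (integer labels: unacceptable = 1, acceptable = 0).
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

callbacks = [
    # Assumed decay schedule: shrink the learning rate by 0.8 on
    # plateaus, never below the stated minimum of 1e-6.
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss",
                                         factor=0.8, min_lr=1e-6),
    # Keep the epoch with the lowest validation loss, as described.
    tf.keras.callbacks.ModelCheckpoint("best_model.h5", monitor="val_loss",
                                       save_best_only=True),
]

model.fit([x_time_train, x_freq_train], y_train,
          validation_data=([x_time_val, x_freq_val], y_val),
          epochs=100, batch_size=200, callbacks=callbacks)
```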
6. Performance Evaluation
Model performance was evaluated via comparison with the test dataset of 300 ECG signals labeled by a medical expert. The evaluation treated unacceptable screening as a positive result and calculated the sensitivity, specificity, positive and negative predictive values (PPV and NPV), F1-score, and area under the receiver operating characteristic (AUROC) curve. Sensitivity (true positive rate) refers to the ability of the model to correctly detect ECGs labeled unacceptable in the gold standard. Specificity (true negative rate) indicates the ability of the model to correctly reject ECGs labeled acceptable. PPV and NPV represent the proportions of the model's positive and negative outputs that are truly positive and negative, respectively. The F1-score is the harmonic mean of the sensitivity and PPV.
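These metrics can be computed from the model's softmax probability for the unacceptable class. The following is an illustrative re-implementation, not the authors' evaluation code; scikit-learn is used only for the AUROC.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def screening_metrics(y_true, p_unacceptable, cutoff=0.5):
    """Metrics used in the paper, with 'unacceptable' (label 1) as the
    positive class. Illustrative sketch."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(p_unacceptable) >= cutoff).astype(int)

    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))

    sens = tp / (tp + fn)               # sensitivity (true positive rate)
    spec = tn / (tn + fp)               # specificity (true negative rate)
    ppv = tp / (tp + fp)                # positive predictive value
    npv = tn / (tn + fn)                # negative predictive value
    f1 = 2 * ppv * sens / (ppv + sens)  # harmonic mean of PPV and sensitivity
    return {"sensitivity": sens, "specificity": spec, "ppv": ppv,
            "npv": npv, "f1": f1,
            "auroc": roc_auc_score(y_true, p_unacceptable)}
```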
7. Cutoff for Classifying Unacceptable ECGs
To set an appropriate cutoff to screen out as many unacceptable ECGs as possible, even at the expense of some acceptable ECGs, we conducted a sensitivity analysis on the cutoff and determined the optimal cutoff for our study.
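In code, this sensitivity analysis amounts to sweeping the decision threshold and recomputing the metrics. The sketch below reuses screening_metrics from the previous block; y_true and p_unacceptable are assumed to hold the test labels and the model's predicted probabilities for the unacceptable class.

```python
# Sweep the candidate cutoffs from Table 2 and recompute the metrics.
for cutoff in (0.005, 0.05, 0.5, 0.95, 0.995):
    m = screening_metrics(y_true, p_unacceptable, cutoff=cutoff)
    print(f"cutoff={cutoff:<6} sensitivity={m['sensitivity']:.2f} "
          f"specificity={m['specificity']:.2f} f1={m['f1']:.2f}")
```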
8. Software Tools and Computational Environment
MS-SQL 2017 was used for data management, and Python (version 3.6.1) and the TensorFlow library (version 1.2.1) were used to develop the data preprocessing and deep learning models. The machine used for model development had one Intel Xeon CPU E5-1650 v4 (3.6 GHz), 128 GB of RAM, and one GeForce GTX 1080 graphics card.

III. Results

Among the model architectures, model architecture #3 exhibited the best average cross-validation accuracy and loss (Figure 5 and Supplementary Table S1). With increasing model architecture complexity, accuracy tended to increase and loss to decrease, and smaller kernel sizes performed better (Figure 5).
Figure 5. Classification accuracy (A) and loss (B) of the four convolutional neural network models with various kernel sizes (16, 48, and 96). The more complex model architectures with smaller kernel sizes exhibited better classification accuracy and lower loss on the validation dataset.
Table 2. Performance of the model on the test dataset according to the cutoff value

              Cutoff value
              0.005   0.05   0.5    0.95   0.995
Sensitivity   0.91    0.88   0.76   0.70   0.66
Specificity   0.76    0.89   0.93   0.96   0.97
PPV           0.57    0.74   0.79   0.85   0.89
NPV           0.96    0.96   0.92   0.90   0.89
F1-score      0.70    0.80   0.78   0.77   0.76

The column for the finally selected cutoff value (0.05) shows the performance of the final model. PPV: positive predictive value, NPV: negative predictive value.
Based on this sensitivity analysis, a cutoff value of 0.05 was selected; at this cutoff, the AUROC and F1-score of the final model on the test dataset were 0.93 and 0.80, respectively (Table 2). Twenty-four of the 224 acceptable ECGs in the test dataset were incorrectly evaluated as unacceptable (Table 3); that is, the signals flagged as unacceptable by the algorithm were 74% truly unacceptable and 26% misclassified acceptable waveforms. In contrast, only 9 of the 76 unacceptable ECGs in the test dataset were misclassified as acceptable, and 96% of the signals evaluated as acceptable by the model were acceptable. The time required to evaluate the 300 gold-standard dataset signals was 0.48 seconds, an average of 0.58 ms for each ECG.

Table 3. Confusion matrix when the cutoff value of 0.05 was applied to the finally selected convolutional neural network model

                              Classification by our model
                              Unacceptable    Acceptable
Test dataset   Unacceptable   67              9
               Acceptable     24              200
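The operating-point metrics reported in the abstract can be verified directly from the Table 3 counts:

```python
# Counts from Table 3 (cutoff = 0.05).
tp, fn = 67, 9     # unacceptable ECGs: caught / missed
fp, tn = 24, 200   # acceptable ECGs: flagged / passed

sensitivity = tp / (tp + fn)   # 67/76   = 0.88
specificity = tn / (tn + fp)   # 200/224 = 0.89
ppv = tp / (tp + fp)           # 67/91   = 0.74
npv = tn / (tn + fn)           # 200/209 = 0.96
f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # = 0.80
```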
IV. Discussion

This study developed a deep learning model for screening unacceptable ECG waveforms. The developed model was able to identify most (88%) of the unacceptable ECGs detected by a clinical expert. Because the time required to analyze a single 10-second waveform is only approximately 0.58 ms, the model can be used in real time and for large-volume ECG analysis.

Previous studies have primarily attempted to de-noise ECG signals [26], for example, using principal component analysis (PCA) to abstract the original signal data into a few eigenvectors with low noise levels. Discrete wavelet transformation (DWT) concentrated on true signals, which possess larger coefficients than noise; however, the threshold definition was critical. Other methods, such as wavelet Wiener filtering and pilot estimation, have also been used.

The background to the problem addressed in this study is different from that of existing attempts. As described earlier, there are now sufficient data to establish the groundwork for a future algorithm by retaining exclusively accurate data, without input noise. This matters because when noisy input is entered into a learning model, its performance declines. This study therefore aimed to retain only clean data as far as possible, even at the expense of some acceptable data. Additionally, a significant portion of the noise in real-world ECG data is generated not by alterations of a true signal by specific effects, but by data rendered unacceptable because they were a type of system-generated abnormal signal, as shown in Figure 1C. These data points cannot be de-noised, as they are exclusively noise without any true ECG signal information.

Our approach, which minimizes the intervention of domain experts by having non-experts conduct pre-screening, would also be viable for application to other biosignals. Because enrolling medical expertise is accompanied by high costs, it was necessary to test whether accurate noise evaluations could be developed while reducing the effort required of medical experts. Conventional de-noising or quality assessment methods have been designed around the features of specific waveforms; however, they require much effort from domain experts, which could be a barrier to developing a model that can be applied to a wide variety of biosignals.

Our results in CNN model optimization suggest that deeper networks with smaller convolutional filter (kernel) sizes provide better performance. This finding has also been observed in another domain, image recognition: the VGG-16 model, which achieved top results in the ImageNet Challenge 2014, improved performance by increasing network depth with very small convolutional filters [27].

In analyzing the ECG signals, we chose 1D CNNs rather than 2D CNNs operating on spectrograms because we assumed that noise in an unacceptable ECG signal is independent of time. For the same reason, we did not use recurrent neural network (RNN)-based models (long short-term memory models, gated recurrent units, etc.) and focused only on the morphological characteristics of the signal.

In addition, to evaluate the advantage of using both time and frequency domain inputs rather than time domain or frequency domain data alone, we conducted an ad hoc analysis using time domain data only or frequency domain data only in model architecture #3, which showed the best performance in our study. Based on the results, we confirmed that accuracy and loss in 9-fold cross-validation were better when both data types were used together (mean ± standard deviation: accuracy = 0.95 ± 0.02, loss = 0.12 ± 0.03) than when only time domain data (accuracy = 0.93 ± 0.02, loss = 0.22 ± 0.06) or only frequency domain data (accuracy = 0.94 ± 0.02, loss = 0.19 ± 0.05) were used.
Some specific waveforms caused by pathologic conditions may have been screened as unacceptable. In some cases, signal modifications caused by cardiovascular diseases might be quite similar to those caused by noise or artifacts (for example, atrial fibrillation). Therefore, we conducted an ad hoc analysis by applying the proposed model to portable ECG results that included an interpretation by the ECG machine [23]. When our model was applied to 10,000 randomly selected ECG lead II recordings, 3,337 ECGs were classified as unacceptable. Further, we observed a tendency for the waveforms of certain types of arrhythmia to be classified as unacceptable (Table 4). However, it is possible that the conditions under which ECGs were measured for arrhythmic patients, or the statuses of the arrhythmic patients themselves, were less stable and led to more unacceptable ECGs. This model would therefore be appropriate for filtering noisy data in the preparation of a noise-free training dataset. If it is used to filter noise signals before applying an arrhythmia detection model, sufficient input data must be prepared, considering that the filtering rate could be high in pathologic situations.

Table 4. Distribution of ECG types in ECGs classified as acceptable or unacceptable by our model

The proposed model may be used in two ways: embedded in monitoring devices, or to process centrally collected data. As this model runs very quickly on an Intel Xeon CPU E5-1650 v4 (3.6 GHz) with 128 GB of RAM and one GeForce GTX 1080 graphics card, time constraints would remain insignificant in the latter application because central systems would possess sufficient computing power. In the near future, monitoring devices themselves will analyze signals and provide warnings of danger. Because prediction models are highly complex, the proposed unacceptable ECG detection model needed to be as computationally simple as possible. Various approaches attempted previously in this regard [28] require further research.
This study also encountered the following limitations. First, the test dataset used as the gold standard was generated by only one expert. As previously mentioned, this study did not aim to diagnose diseases requiring a high level of domain knowledge; thus, the gap between different experts should not be significant. However, there was no supplemental evaluation to correct for potential mistakes. Second, the specific category of noise was not evaluated. Noise could be classified into five categories: BA, MA, PLI, unacceptable, and unclear. However, data collection in an actual clinical environment resulted in a majority of normal waveforms and only a small volume of noise data. Therefore, the data were insufficient for the deep learning model to learn to classify all defined categories. Furthermore, the noise categories were integrated without distinction because distinguishing between noise causes was inconsequential in actual applications (this study aimed to eliminate all unacceptable signals, regardless of their cause). All waveforms evaluated as non-acceptable were thus integrated as unacceptable ECGs. Finally, the PPV was not very high (0.74), meaning that many of the unacceptable ECGs identified by the proposed algorithm were truly acceptable ECGs. However, as mentioned above, this study aimed to increase sensitivity, and some normal waveform loss was acceptable because such data were sufficiently represented. Instead, the proposed algorithm ensured high sensitivity and successfully screened 88% of unacceptable ECGs.

In conclusion, this study developed a model capable of efficiently detecting unacceptable ECGs. The developed unacceptable ECG detection model is expected to provide a first step for future automated large-scale ECG analyses.
Conflict of Interest

No potential conflict of interest relevant to this article was reported.

Acknowledgments

This research was supported by grants from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI) funded by the Ministry of Health & Welfare, Republic of Korea (No. HG18C0067, Government-wide R&D Fund project for infectious disease research). This work was also supported by the faculty research fund of Ajou University School of Medicine.

ORCID

Dukyong Yoon (http://orcid.org/0000-0003-1635-8376)
Hong Seok Lim (http://orcid.org/0000-0002-3127-2071)
Kyoungwon Jung (http://orcid.org/0000-0001-7895-0362)
Tae Young Kim (http://orcid.org/0000-0002-2591-0129)
Sukhoon Lee (http://orcid.org/0000-0002-3390-5602)

Supplementary Materials

Supplementary materials can be found via http://doi.org/10.4258/hir.2019.25.3.201. Table S1. Performance of the model in cross-validation.

References

1. Kumar A. ECG-simplified. [place unknown]: LifeHugger; 2010.
2. Kaplan NM. Systemic hypertension therapy. In: Braunwald E, editor. Braunwald's heart disease: a textbook of cardiovascular medicine. Philadelphia (PA): Saunders; 1997.
3. Van Mieghem C, Sabbe M, Knockaert D. The clinical value of the ECG in noncardiac conditions. Chest 2004;125(4):1561-76.
4. American Heart Association. Part 8: Stabilization of the patient with acute coronary syndromes. Circulation 2005;112(24 Suppl):IV.89-IV.110.
5. Li H, Yuan D, Wang Y, Cui D, Cao L. Arrhythmia classification based on multi-domain feature extraction for an ECG recognition system. Sensors (Basel) 2016;16(10):E1744.
6. Rodrigues J, Belo D, Gamboa H. Noise detection on ECG based on agglomerative clustering of morphological features. Comput Biol Med 2017;87:322-34.
7. Sivaraks H, Ratanamahatana CA. Robust and accurate anomaly detection in ECG artifacts using time series motif discovery. Comput Math Methods Med 2015;2015:453214.
8. Satija U, Ramkumar B, Manikandan MS. Automated ECG noise detection and classification system for unsupervised healthcare monitoring. IEEE J Biomed Health Inform 2018;22(3):722-32.
9. Ercelebi E. Electrocardiogram signals de-noising using lifting-based discrete wavelet transform. Comput Biol Med 2004;34(6):479-93.
10. Ho CY, Ling BW, Wong TP, Chan AY, Tam PK. Fuzzy multiwavelet denoising on ECG signal. Electron Lett 2003;39(16):1163-4.
11. Tikkanen PE. Nonlinear wavelet and wavelet packet denoising of electrocardiogram signal. Biol Cybern 1999;80(4):259-67.
12. Iravanian S, Tung L. A novel algorithm for cardiac biosignal filtering based on filtered residue method. IEEE Trans Biomed Eng 2002;49(11):1310-7.
13. Leski JM. Robust weighted averaging. IEEE Trans Biomed Eng 2002;49(8):796-804.
14. Almenar V, Albiol A. A new adaptive scheme for ECG enhancement. Signal Process 1999;75(3):253-63.