Computers in Biology and Medicine 151 (2022) 106297


Using a machine learning-based risk prediction model to analyze the coronary artery calcification score and predict coronary heart disease and risk assessment
Yue Huang a, 1, YingBo Ren a, 1, Hai Yang a, YiJie Ding b, Yan Liu a, YunChun Yang a,
AnQiong Mao a, Tan Yang d, YingZi Wang c, Feng Xiao c, QiZhou He e, **, Ying Zhang a, *
a Department of Anesthesiology, Hospital (T.C.M) Affiliated to Southwest Medical University, Luzhou, 646000, Sichuan, China
b Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, 324000, Quzhou, Zhejiang, China
c Southwest Medical University, Luzhou, 646099, Sichuan, China
d Department of Cardiac and Vascular Surgery, Hospital (T.C.M) Affiliated to Southwest Medical University, Luzhou, 646000, Sichuan, China
e Department of Radiology, Hospital (T.C.M) Affiliated to Southwest Medical University, Luzhou, 646000, Sichuan, China


Keywords:
Coronary artery calcification (CAC)
Machine learning (ML)
Coronary artery calcification score (CACS)
Coronary atherosclerotic heart disease (CHD)
Coronary artery computed tomography angiography (CCTA)

Objectives: To calculate the coronary artery calcification score (CACS) from coronary artery computed tomography angiography (CCTA) examinations, combine it with the factors influencing coronary artery calcification (CAC), and analyze these data with machine learning (ML) to predict the probability of coronary heart disease (CHD).
Methods: We retrospectively selected all patients with suspected CHD who were admitted to the Affiliated Hospital of Traditional Chinese Medicine of Southwest Medical University from January 2019 to March 2022 and underwent CCTA. The degree of CAC was quantified with the Agatston score. To assess the correlation between the CACS and clinical factors, we collected 31 variables, including hypertension, diabetes, smoking, and hyperlipidemia. Five ML models, namely random forest (RF), radial basis function neural network (RBFNN), support vector machine (SVM), K-nearest neighbor (KNN), and kernel ridge regression (KRR), were used to assess the risk of CHD based on the CACS and clinical factors.
Results: Among the five ML models, RF achieved the best performance in terms of accuracy (ACC, 78.96%), sensitivity (SN, 93.86%), specificity (Spe, 51.13%), and Matthews correlation coefficient (MCC, 0.5192). It also had the largest area under the receiver operating characteristic (ROC) curve (AUC, 0.8375), far superior to the other four ML models.
Conclusion: Computer-based ML analysis confirmed the importance of CACS in predicting the occurrence of CHD, with the RF model performing particularly well, making it another advancement of ML models in the field of medical

E-mail addresses: [email protected] (Q. He), [email protected] (Y. Zhang).
1 These authors contributed equally to this work.
Received 7 August 2022; Received in revised form 12 October 2022; Accepted 6 November 2022
Available online 15 November 2022

1. Introduction

Global statistics show that cardiovascular disease is the leading cause of death. The prevalence of coronary heart disease (CHD) was 7.2% according to demographic disease statistics for 2015–2018, and CHD caused 360,900 deaths in 2019 alone [1]. Risk assessment and early diagnosis of CHD are therefore of great significance for improving patient safety and quality of life.
CAC is a significant marker of coronary atherosclerosis, and CACS is also the most important predictor of CHD and of atherosclerotic cardiovascular disease (ASCVD) in general [2,3]. The presence and amount of CAC detected by CCTA correlate with the severity of coronary atherosclerosis, which is strongly associated with cardiovascular events. CAC is therefore an important predictor of future cardiovascular events [4], and CCTA has become a viable noninvasive alternative for
investigating coronary anatomy [5–7]. However, CAC is affected by a wide range of factors, some of which, such as hypertension, diabetes, and dyslipidemia, can also directly affect the occurrence of CHD [8–10]. These factors may partly explain the high prevalence of CAC and the high incidence of CHD.
In recent years, artificial intelligence (AI) has developed rapidly and penetrated many fields, especially medical applications [11,12]. Machine learning (ML) prediction models have shown performance equal to or better than that of human experts in cardiology diagnosis, decision-making, risk prediction, and other medical tasks [13]. The use of medical AI has shifted from research to daily clinical practice, and AI is widely used in cardiac imaging as the number of CCTA examinations increases [14]. Using imaging reports and clinical parameters, ML algorithms can obtain better predictions than traditional risk scores [15]. ML has also emerged in prediction and risk assessment in medical applications; however, only a few studies have used ML with clinical factors and CACS to evaluate CHD. We hypothesized that ML using the CACS detected by CCTA, combined with other clinical variables that influence CAC, could predict CHD better than the most advanced existing risk prediction methods.

2. Methods

Study population: We retrospectively selected all patients with suspected CHD who were admitted to the Affiliated Hospital of Traditional Chinese Medicine of Southwest Medical University from January 2019 to March 2022 and had undergone CCTA scanning. Patients also had to be older than 30 years and hospitalized. Patients with any of the following characteristics were excluded: (1) receiving out-of-hospital treatment; (2) previous coronary stent implantation, coronary artery bypass grafting, or other cardiac revascularization; (3) an incomplete or biased Agatston score on CCTA; (4) incomplete medical records; (5) hypersensitivity to iodinated contrast media; (6) inability to complete the scanning protocol; and (7) a history of renal impairment, acute myocardial infarction, hemodynamic instability, or decompensated heart failure.
All participants were informed, and the study was approved by the Ethics Review Committee of the Affiliated Hospital of Traditional Chinese Medicine of Southwest Medical University (No. BY2022009).
Clinical demographics: Basic personal information, CCTA examination reports, and laboratory test results were recorded at the first examination after admission. Patient records were obtained from the CISZYYSZ 5.5 2003 clinical database. Hypertension, diabetes, fatty liver, osteoporosis, and hyperlipidemia were defined by physicians based on the patient's report, imaging scans, or previous treatment history. CHD was diagnosed by physicians with >10 years of professional experience, who made a definite diagnosis by combining symptoms, signs, and examination results. Hypertension was classified as follows: healthy, <140/90 mmHg; grade 1, 140–159/90–99 mmHg; grade 2, 160–180/100–110 mmHg; and grade 3, ≥180/110 mmHg. Smoking history was classified as never, quit, or current. The duration of hypertension or diabetes was scored as follows: none, 0 points; <1 year, 1 point; 1–4 years, 2 points; 5–9 years, 3 points; 10–14 years, 4 points; 15–19 years, 5 points; and ≥20 years, 6 points.
CCTA introduction: Calcification of the coronary artery branches and extra-coronary calcium were detected on CT sections using a Siemens dual-source CT scanner (SOMATOM Definition Flash). CAC was quantified with the Agatston score, which multiplies a density score of each calcified lesion by its area. First, each lesion was assigned a density score according to its CT value: 130–199 HU, 1 point; 200–299 HU, 2 points; 300–399 HU, 3 points; and ≥400 HU, 4 points. This density score was then multiplied by the calcified area, and finally the scores of all coronary branches were summed [16]. A CT threshold of 130 HU was adopted in this study, and the equal-quality correction factor was 0.736.
AI models: ML methods mainly include decision trees, random forests (RF) [17], artificial neural networks, k-nearest neighbor (KNN) algorithms, kernel ridge regression (KRR), support vector machines (SVM) [18], and Bayesian learning. These methods are widely applied in data mining. In this study, RF, KNN, SVM, KRR, and radial basis function neural networks (RBFNN) [19] were selected as prediction models.
RF [17] is a classifier that contains multiple decision trees; its output class is the mode of the classes output by the individual trees. First, sampling with replacement is performed on the original dataset to construct a sub-dataset of the same size as the original dataset; elements can be repeated within a sub-dataset and across different sub-datasets. Second, a decision tree is built from each sub-dataset, and each tree outputs a result. Finally, when new data are to be classified by the RF, the output is obtained by voting over the results of the individual trees. The topological structure of RF is shown in Fig. 1.
KNN is a classic AI algorithm. Its essential idea is that if most of the k samples most similar to a given sample (i.e., its k nearest neighbors in the feature space) belong to a certain class, then that sample also belongs to this class.
SVM [18] is a class of generalized linear classifiers that perform binary classification in a supervised learning manner; its decision boundary is the maximum-margin hyperplane solved for the training samples (Fig. 2). SVM uses the hinge loss function to estimate the empirical risk and adds a regularization term to the optimization problem to minimize structural risk, making it a sparse and robust classifier. Through kernel methods, SVM can also perform nonlinear classification, and it is one of the most common kernel learning methods.
The radial basis function (RBF) neural network [19] uses RBFs as the hidden units that form the hidden-layer space. The hidden layer transforms the input vector, mapping it from a low-dimensional space to a high-dimensional space (Fig. 3), where x is the input, c is the high-dimensional representation, w denotes the learnable parameters, and y is the final output. Fig. 3 shows the specific architecture used for classification in this study. In this way, nonlinear problems can be solved. The RBF neural network has a simple learning rule yet strong nonlinear fitting capability, and it can approximate arbitrarily complex nonlinear relationships.
Since the data may be nonlinear, the effects of simple linear

Fig. 1. The topological structure of the random forest.
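The three RF steps described in the AI models paragraph (bootstrap sub-datasets the size of the original data, one tree per sub-dataset, and a majority vote over the tree outputs) can be sketched in dependency-free Python. This is a simplified illustration, not the authors' implementation: to keep it short, each tree is a depth-1 decision stump on one randomly chosen feature rather than a fully grown decision tree, and the two features (CACS and age) and all values are hypothetical.

```python
import random
from collections import Counter


def bootstrap(rows, labels, rng):
    """Step 1: sample with replacement; the sub-dataset is as large as the original."""
    idx = [rng.randrange(len(rows)) for _ in range(len(rows))]
    return [rows[i] for i in idx], [labels[i] for i in idx]


def mode(labels):
    """Most frequent label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, rng):
    """Step 2 (simplified): a depth-1 tree on one randomly chosen feature,
    keeping the threshold with the fewest training errors."""
    f = rng.randrange(len(rows[0]))
    best = None
    for t in sorted({r[f] for r in rows}):
        left = [lab for r, lab in zip(rows, labels) if r[f] <= t]
        right = [lab for r, lab in zip(rows, labels) if r[f] > t]
        l_cls = mode(left)  # never empty: t is an observed value
        r_cls = mode(right) if right else l_cls
        errors = sum(lab != l_cls for lab in left) + sum(lab != r_cls for lab in right)
        if best is None or errors < best[0]:
            best = (errors, t, l_cls, r_cls)
    _, t, l_cls, r_cls = best
    return lambda x: l_cls if x[f] <= t else r_cls


def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sub_rows, sub_labels = bootstrap(rows, labels, rng)
        forest.append(fit_stump(sub_rows, sub_labels, rng))
    return forest


def predict_forest(forest, x):
    """Step 3: every tree votes; the forest returns the mode of the votes."""
    return mode([tree(x) for tree in forest])


# Hypothetical toy data: (CACS, age in years); label 1 = CHD, 0 = no CHD.
X = [(0, 40), (10, 45), (20, 50), (30, 55), (300, 65), (400, 70), (500, 75), (600, 80)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, y)
print(predict_forest(forest, (800, 85)), predict_forest(forest, (0, 35)))
```

On this toy data, any stump fitted on a bootstrap sample that contains both classes separates them perfectly, so nearly every tree votes CHD for a patient with a very high CACS and age. A practical model would more likely rely on a library implementation such as scikit-learn's `RandomForestClassifier`, which grows full trees and samples candidate features at every split.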

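The Agatston computation described in the CCTA introduction (a density score from the lesion's CT value, multiplied by the lesion area and summed over all coronary branches) can be written as a short function. This is a minimal sketch under stated assumptions: each lesion is given as a (peak HU, area in mm²) pair, lesions below the 130 HU threshold contribute nothing, and the study's 0.736 correction factor is not modeled because its exact use is not described. The lesion values are hypothetical.

```python
def density_weight(peak_hu: float) -> int:
    """Agatston density score for a lesion's peak CT value (HU)."""
    if peak_hu < 130:
        return 0  # below the 130 HU calcium threshold: not counted
    if peak_hu < 200:
        return 1
    if peak_hu < 300:
        return 2
    if peak_hu < 400:
        return 3
    return 4


def agatston_score(branches: dict) -> float:
    """Sum of (density score x area) over every lesion in every coronary branch."""
    return sum(
        density_weight(peak_hu) * area_mm2
        for lesions in branches.values()
        for peak_hu, area_mm2 in lesions
    )


# Hypothetical lesions per branch: (peak HU, area in mm^2).
example = {
    "LAD": [(250, 4.0), (450, 2.5)],  # 2 * 4.0 + 4 * 2.5 = 18.0
    "RCA": [(180, 3.0)],              # 1 * 3.0 = 3.0
    "LCX": [(120, 5.0)],              # below threshold: 0
}
print(agatston_score(example))  # 21.0
```

In clinical software the score is accumulated lesion by lesion on each CT slice; the per-branch dictionary here only mirrors the "scores of coronary branches were added together" step.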

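The ordinal encodings defined in the clinical demographics paragraph (hypertension grade and the duration score for hypertension or diabetes) can be expressed as two small functions. This is a sketch under assumptions not stated in the source: "healthy" is encoded as 0, grade 2 is taken as 160–179/100–109 mmHg so that it does not overlap grade 3, the higher of the systolic and diastolic grades is used when they differ, and fractional durations are binned by the upper bound of each year range.

```python
from typing import Optional


def hypertension_grade(sbp: float, dbp: float) -> int:
    """0 = <140/90 mmHg, 1 = 140-159/90-99, 2 = 160-179/100-109, 3 = >=180/110.
    Assumption: when systolic and diastolic grades differ, the higher one is used."""
    sbp_grade = sum(sbp >= cut for cut in (140, 160, 180))
    dbp_grade = sum(dbp >= cut for cut in (90, 100, 110))
    return max(sbp_grade, dbp_grade)


def course_score(years: Optional[float]) -> int:
    """Duration score for hypertension or diabetes (None = no such disease)."""
    if years is None:
        return 0
    if years < 1:
        return 1
    # (score, exclusive upper bound in years): 1-4 -> 2, 5-9 -> 3, 10-14 -> 4, 15-19 -> 5
    for score, upper in ((2, 5), (3, 10), (4, 15), (5, 20)):
        if years < upper:
            return score
    return 6  # 20 years or more


print(hypertension_grade(150, 85), hypertension_grade(135, 105))  # 1 2
print([course_score(v) for v in (None, 0.5, 12, 25)])  # [0, 1, 4, 6]
```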
(NLR), C-reactive protein (CRP), monocyte-lymphocyte ratio (MLR);

Imaging examinations [49] include aortic calcification, aortic valve calcification, and mitral valve calcification. First, we analyzed the correlation of these influencing factors with CAC and ranked them by the strength of correlation. Then, the ML models (RF, KNN, SVM, KRR, and RBFNN) were used to assess the risk of CHD based on the CACS and these clinical factors.
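The risk-assessment step can be illustrated with KNN, the simplest of the five models, together with the metrics reported in the Results (ACC, SN, Spe, MCC, and AUC). This is a dependency-free sketch, not the study's pipeline: Euclidean distance, k = 3, the use of the fraction of CHD neighbors as the score for the AUC, the 0.5 decision threshold, and the toy data are illustrative assumptions. In practice, features such as CACS and age would be standardized first, since unscaled CACS dominates the distance here.

```python
import math


def knn_score(train_X, train_y, x, k=3):
    """Fraction of the k nearest training samples (Euclidean distance) labelled 1 (CHD)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return sum(train_y[i] for i in nearest) / k


def classification_metrics(y_true, y_pred):
    """ACC, SN, Spe, and MCC from the binary confusion matrix (1 = CHD)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": (tp + tn) / len(pairs),
        "SN": tp / (tp + fn),
        "Spe": tn / (tn + fp),
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


def auc(y_true, scores):
    """AUC = probability that a random CHD case outscores a random non-CHD case (ties = 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical (CACS, age) data; 1 = CHD, 0 = no CHD.
train_X = [(0, 40), (10, 45), (20, 50), (300, 65), (400, 70), (500, 75)]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [(5, 42), (350, 68), (15, 60), (250, 55)]
test_y = [0, 1, 1, 1]

scores = [knn_score(train_X, train_y, x) for x in test_X]
preds = [int(s >= 0.5) for s in scores]
print(classification_metrics(test_y, preds), auc(test_y, scores))
```

On this toy split, the third test patient (low CACS but labelled CHD) is missed, giving ACC = 0.75, SN ≈ 0.667, Spe = 1.0, MCC ≈ 0.577, and AUC ≈ 0.833. The same metric functions apply unchanged to the outputs of any of the five models.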

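Among the models listed above, SVM is described in the Methods as minimizing a hinge-loss estimate of the empirical risk plus a regularization term. A linear SVM trained by full-batch subgradient descent makes that objective concrete. This is an illustrative sketch only: the study does not state its solver, kernel, or hyperparameters, so the regularization strength `lam`, the learning rate `lr`, the number of epochs, the absence of a kernel, and the toy data are assumptions. Labels follow the usual SVM convention of -1/+1.

```python
def decision(w, b, x):
    """Signed decision value w . x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimize (lam / 2) * ||w||^2 + mean_i max(0, 1 - y_i * (w . x_i + b))
    by full-batch subgradient descent. Labels must be -1 or +1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 regularization term
        grad_b = 0.0                     # the bias is not regularized
        for xi, yi in zip(X, y):
            if yi * decision(w, b, xi) < 1:  # hinge loss is active for this sample
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_predict(w, b, x):
    return 1 if decision(w, b, x) >= 0 else -1


# Hypothetical standardized features; -1 = no CHD, +1 = CHD.
X = [(-2.0, -1.0), (-1.5, -2.0), (-1.0, -1.5), (1.0, 1.5), (1.5, 2.0), (2.0, 1.0)]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
print([svm_predict(w, b, x) for x in X])
```

Only samples inside the margin (y·f(x) < 1) contribute to the gradient, which is the source of the sparsity mentioned in the Methods; nonlinear classification would replace the dot product with a kernel, which this sketch does not attempt.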
3. Results

We used IBM SPSS Statistics 25 to analyze the relationship between CACS and the clinical variables. Continuous variables are presented as mean ± standard deviation, and categorical variables as frequencies.
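The correlation-and-ranking step (correlating each clinical factor with CACS, then ranking the factors) can be sketched without SPSS. The coefficient used is not stated in this section, so this sketch assumes Spearman's rank correlation with average ranks for ties, a common choice for a skewed score such as CACS in which many values are 0. The patient values are hypothetical, and P values are not computed.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman's rho = Pearson correlation of the average ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical values for six patients.
cacs = [0, 0, 15, 120, 400, 850]
factors = {
    "age": [42, 50, 55, 61, 68, 74],
    "LDL-C": [3.9, 3.5, 3.6, 3.1, 2.8, 2.9],
}
rho = {name: spearman(vals, cacs) for name, vals in factors.items()}
ranking = sorted(rho, key=lambda name: abs(rho[name]), reverse=True)
print(rho, ranking)
```

With these made-up numbers, age correlates positively with CACS (rho ≈ 0.99) and LDL-C negatively (rho ≈ -0.84), so age ranks first by absolute correlation.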
Fig. 2. The SVM schematic.
3.1.1 A comparison of patients with and without CHD shows that patients with CHD have a higher CACS (P < 0.001) (Table 1).
3.1.2 The correlation analysis between CACS and the clinical influencing factors is listed in Table 1. Age, systolic blood pressure, hypertension grade, aortic calcification, diabetes mellitus, osteoporosis, ALP, CRP, aortic valve calcification, calcium-phosphorus product, NLR, MLR, and mitral valve calcification are positively correlated with CACS (P < 0.05). Sex, random blood glucose, hyperlipidemia, diabetes duration, TG, and LP(a) are also positively correlated with CACS, but these correlations are not statistically significant. TC, LDL-C, and APO-A are negatively correlated with CACS (P < 0.05), whereas

