
sustainability

Article
A Comparative Analysis of Machine Learning Models: A Case
Study in Predicting Chronic Kidney Disease
Hasnain Iftikhar 1,2 , Murad Khan 3 , Zardad Khan 3 , Faridoon Khan 4 , Huda M Alshanbari 5, *
and Zubair Ahmad 2

1 Department of Mathematics, City University of Science and Information Technology Peshawar, Peshawar 25000, Pakistan
2 Department of Statistics, Quaid-i-Azam University, Islamabad 44000, Pakistan
3 Department of Statistics, Abdul Wali Khan University Mardan, Mardan 23200, Pakistan
4 Pakistan Institute of Development Economics, Islamabad 44000, Pakistan
5 Department of Mathematical Sciences, College of Science, Princess Nourah bint Abdulrahman University,
P.O. Box 84428, Riyadh 11671, Saudi Arabia
* Correspondence: [email protected]

Abstract: In the modern world, chronic kidney disease is one of the most severe diseases affecting human life, and it is a growing problem in both developed and underdeveloped countries. An accurate and timely diagnosis of chronic kidney disease is vital in preventing and treating kidney failure. Historically, the diagnosis of chronic kidney disease has been considered unreliable in many respects. Non-invasive methods such as machine learning models offer a reliable and efficient way to classify healthy people and people with chronic kidney disease. In our current work, we predict chronic kidney disease using different machine learning models, including logistic, probit, random forest, decision tree, k-nearest neighbor, and support vector machine with four kernel functions (linear, Laplacian, Bessel, and radial basis kernels). The dataset is a record taken as a case–control study containing chronic kidney disease patients from district Buner, Khyber Pakhtunkhwa, Pakistan. To compare the models in terms of classification and accuracy, we calculated different performance measures, including accuracy, Brier score, sensitivity, Youden index, specificity, and F1 score. The Diebold and Mariano test of comparable prediction accuracy was also conducted to determine whether there is a substantial difference in the accuracy measures of different predictive models. As confirmed by the results, the support vector machine with the Laplace kernel function outperforms all other models, while the random forest is competitive.

Keywords: chronic kidney disease; machine learning models; comparative analysis; predictions

Citation: Iftikhar, H.; Khan, M.; Khan, Z.; Khan, F.; Alshanbari, H.M.; Ahmad, Z. A Comparative Analysis of Machine Learning Models: A Case Study in Predicting Chronic Kidney Disease. Sustainability 2023, 15, 2754. https://doi.org/10.3390/su15032754

Academic Editors: Amandeep Kaur and Kamal Gulati

Received: 5 January 2023
Revised: 24 January 2023
Accepted: 29 January 2023
Published: 2 February 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Sustainability 2023, 15, 2754. https://doi.org/10.3390/su15032754 https://www.mdpi.com/journal/sustainability

1. Introduction
Chronic kidney disease (CKD) is a recognized public health priority worldwide [1]. According to the Global Burden of Disease Study 2010, which ranked the causes of mortality worldwide from 1990 to 2010, CKD moved up the list over those two decades from the 27th to the 18th position [2]. Over the years, the CKD epidemic has caused an 82% increase in years of life lost associated with kidney disease, a toll of the same magnitude as that of diabetes [3]. Premature death is only part of the equation, because most survivors of CKD progress towards end-stage renal failure, a disabling illness that reduces quality of life and causes extensive social and financial losses. The annual incidence of patients starting renal replacement therapy (RRT) varies dramatically across nations, from around 150 to 400 per million inhabitants (PMI) in the Western world to 50 PMI or much less in developing countries, where healthcare facilities are limited. Because early detection helps prevent the worsening of CKD and improves survival, screening systems are being upgraded worldwide. Accurate epidemiological knowledge of CKD at the national and international levels is therefore very important for involving key stakeholders, such as patients, general practitioners, nephrologists, and financing organizations, who enable the healthcare system to plan and enforce effective preventive policies [4–6].
Nowadays, CKD is becoming an increasingly serious problem in both developed and developing countries. In developing countries, growing urbanization is leading people to adopt unhealthy lifestyles, which cause diabetes and high blood pressure; these conditions in turn lead to chronic kidney disease, and 15–25% of people with diabetes die of kidney disease. In Pakistan, for example, CKD is spreading quickly due to the ingestion of harmful and low-quality foods, self-medication, excessive use of drugs, polluted water, obesity, hypertension, anemia, diabetes, and kidney stones [7]. In developed countries such as the United States of America, 26 million adults (one in nine) have CKD, and many others are at increased risk. Researchers in the USA have developed an eight-point risk factor list to predict chronic kidney disease, which includes older age, female sex, anemia, diabetes, hypertension, cardiovascular disease, and peripheral vascular disease [8]. According to recent studies, the prevalence of CKD ranges from 5% to 15% worldwide, and around 5–10 million individuals die of CKD each year [9]. Because CKD worsens over time, early detection and effective treatment are simple ways of reducing the death rate.
Machine learning (ML) techniques play a prominent role in the medical sector for the identification of various diseases [10,11]. The authors of [12] used the support vector machine (SVM) approach to diagnose CKD, together with data reduction techniques such as principal component analysis (PCA). Their findings show that the SVM model with a Gaussian radial basis kernel performed better than the rival models in terms of diagnostic precision and accuracy. Some researchers have compared kernel functions based on random forest models, and a few studies have compared computational intelligence and SVM models. Others used artificial neural networks (ANN) and SVM algorithms to classify and predict four different types of kidney disease; according to their results, the ANN produced the most accurate predictions. Additionally, many researchers have compared statistical and computational intelligence models and examined class balancing for two-class problems with non-uniform class distributions using feature ranking [13,14]. The authors of [15] investigated ways of identifying an appropriate diet plan for CKD patients using several classification methods, such as multi-class decision trees, multi-class decision forests, multi-class logistic regression, and multi-class ANN; the multi-class decision forest achieved the highest accuracy (99.17%). Supervised classifiers are easy to use and have been widely applied in the diagnosis of various diseases. The authors of [16] applied k-nearest neighbor (KNN) and SVM classifiers to a CKD dataset and compared them using precision, recall, accuracy, and F1 score; on these measures, KNN outperformed SVM in predicting CKD. The authors of [17] compared ML and classical regression models, including artificial neural networks, decision trees, and logistic models, in terms of accuracy, and found that the ANN produced the highest accuracy (93%). ML models have been widely used in recent medical research. The authors of [18] used different ML methods, namely ANN, backpropagation, and SVM, to classify and predict patients with kidney stone disease, and found backpropagation to be the best of the algorithms compared. Other researchers have compared naive Bayes (NB) models with the SVM algorithm: the authors of [19,20] used NB and SVM models to identify different kinds of kidney disease and assessed their accuracy, concluding that SVM models are the more effective classifiers.
In this study, various ML models are compared for predicting CKD, and the main contributions are as follows:

1. For the first time, we included primary data from CKD patients in district Buner, Khyber Pakhtunkhwa, Pakistan, to motivate developing countries to implement machine learning algorithms to reliably and efficiently classify healthy people and people with chronic kidney disease;
2. To assess the consistency of the considered ML models, three different training and testing scenarios were adopted: (a) 90% training, 10% testing; (b) 75% training, 25% testing; and (c) 50% training, 50% testing. Within each validation scenario, the simulation was run one thousand times to test the models' consistency;
3. Prominent machine learning models were compared for predicting CKD, including logistic, probit, random forest, decision tree, k-nearest neighbor, and support vector machine with four kernel functions (linear, Laplacian, Bessel, and radial basis kernels);
4. The performance of the models is evaluated using six performance measures: accuracy, Brier score, sensitivity, Youden index, specificity, and F1 score. Moreover, to assess the significance of the differences in the prediction performance of the models, the Diebold and Mariano test was performed.
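The validation protocol in contribution 2, repeated random hold-out splits at three training proportions, can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' R code: the helper names `split_sizes`, `repeated_splits`, and `mean_score`, the floor rounding of the training-set size, and the fixed random seed are all hypothetical choices, and `score_fn` stands for any routine that fits a model on the training indices and returns a performance measure on the test indices.

```python
import math
import random

# The three validation scenarios: 90/10, 75/25 and 50/50 training/testing.
SCENARIOS = (0.90, 0.75, 0.50)


def split_sizes(n_obs, fractions=SCENARIOS):
    """Training and testing set sizes per training fraction (floor rounding is an assumption)."""
    return {f: (math.floor(f * n_obs), n_obs - math.floor(f * n_obs)) for f in fractions}


def repeated_splits(n_obs, train_fraction, n_runs=1000, seed=0):
    """Yield (train_idx, test_idx) for n_runs independent random hold-out splits."""
    rng = random.Random(seed)
    indices = list(range(n_obs))
    n_train = math.floor(train_fraction * n_obs)
    for _ in range(n_runs):
        rng.shuffle(indices)
        # Slicing copies the lists, so later shuffles do not alter earlier splits.
        yield indices[:n_train], indices[n_train:]


def mean_score(score_fn, n_obs, train_fraction, n_runs=1000, seed=0):
    """Average a performance measure over all repeated splits."""
    scores = [score_fn(tr, te) for tr, te in repeated_splits(n_obs, train_fraction, n_runs, seed)]
    return sum(scores) / len(scores)
```

With the 382 observations of this study and floor rounding, `split_sizes(382)` gives 343/39, 286/96, and 191/191 training/testing observations for the three scenarios.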
The rest of the article is organized as follows: Section 2 contains materials and methods,
Section 3 contains results and discussion, and Section 4 contains a conclusion.

2. Materials and Methods


This section describes the features of the kidney disease data from district Buner and the predictive models considered in this study.

2.1. Description of Variables


The dataset was collected at the Medical Complex, Buner, Khyber Pakhtunkhwa, Pakistan. Diagnostic test reports were recorded for several hundred patients on their arrival for a check-up with nephrologists at the Medical Complex, covering the twenty-one variables described in Table 1. These patients were observed from November 2020 to March 2021, and the sample size was calculated according to the formula in reference [21]:

$$ n = \frac{z^{2}\, p\, q}{m^{2}} \tag{1} $$
Here, n is the sample size, z is the standard normal critical value for the chosen confidence level, p is the expected prevalence proportion of CKD patients, q = 1 − p, and m is the desired precision (margin of error). To calculate the expected sample size, we assumed that 270 patients had CKD and 230 did not, giving an expected prevalence proportion p = 0.54 and q = 0.46, with z = 1.96 (95% confidence) and m = 0.05; some authors [22] recommend using 5% precision if the expected prevalence proportion lies between 10% and 90%. Substituting these values of p, q, z, and m into Equation (1) gives n ≈ 381.7, which was rounded up to the sample size n = 382 used for the analysis in this research.
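The arithmetic behind Equation (1) can be checked with a few lines of Python. The function name `prevalence_sample_size` is hypothetical; the inputs are the values stated above (p = 270/500, z = 1.96, m = 0.05), and rounding up with `math.ceil` follows the usual convention for sample sizes.

```python
import math


def prevalence_sample_size(p, z=1.96, m=0.05):
    """Equation (1): n = z^2 * p * q / m^2, returned unrounded and rounded up."""
    q = 1.0 - p
    raw = z ** 2 * p * q / m ** 2
    return raw, math.ceil(raw)


p = 270 / (270 + 230)          # expected prevalence proportion, 0.54
raw_n, n = prevalence_sample_size(p)
print(f"raw n = {raw_n:.2f}, rounded n = {n}")  # raw n = 381.70, rounded n = 382
```

Because p(1 − p) peaks at p = 0.5, a prevalence of exactly one half would give the most conservative size, 385 under the same z and m.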

2.2. Specification of Machine Learning Models


In this study, several ML predictive models were used: logistic, probit, random forest, decision tree, k-nearest neighbor, and support vector machine with four kernel functions (linear, Laplacian, Bessel, and radial basis kernels). The statistical software R version 3.5.3 was used for the data analysis.

2.2.1. Logistic Regression (LR)


The LR model is a dominant and well-established supervised classification approach [23]. It extends the regression family to a binary response variable, which commonly signifies the presence or absence of an event. With several predictors it is known as multiple logistic regression (MLR), whose equation is

$$ \Pr(Z) = \frac{\exp(\alpha_0 + \alpha_1 z_1 + \alpha_2 z_2 + \cdots + \alpha_k z_k)}{1 + \exp(\alpha_0 + \alpha_1 z_1 + \alpha_2 z_2 + \cdots + \alpha_k z_k)} \tag{2} $$

where $Z = (z_1, z_2, \ldots, z_k)$ are the k predictors; in our case, k = 21, as described in Section 2.1. The unknown parameters are estimated by maximum likelihood estimation (MLE).
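Equation (2) and its probit counterpart can be written as small Python functions. The study also fits a probit model, for which no equation is given above; the sketch assumes the standard formulation in which the logistic function of the linear predictor is replaced by the standard normal cumulative distribution function. The coefficient values in the usage lines are hypothetical, and fitting the coefficients by MLE (done in R in this study) is not shown.

```python
import math


def linear_predictor(alpha, z):
    """alpha[0] + alpha[1]*z[0] + ... + alpha[k]*z[k-1]."""
    return alpha[0] + sum(a * x for a, x in zip(alpha[1:], z))


def logistic_prob(alpha, z):
    """Equation (2), written in a numerically stable form."""
    eta = linear_predictor(alpha, z)
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)          # avoids overflow of exp(-eta) for large negative eta
    return e / (1.0 + e)


def probit_prob(alpha, z):
    """Probit model: standard normal CDF of the same linear predictor."""
    eta = linear_predictor(alpha, z)
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


# Hypothetical coefficients for two predictors.
alpha = [-1.0, 0.8, 1.5]
print(logistic_prob(alpha, [1.0, 1.0]), probit_prob(alpha, [1.0, 1.0]))
```

Both links map the same linear predictor to a probability; they differ mainly in the tails, where the probit curve approaches 0 and 1 faster.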

Table 1. Description of variables.

| Variable | Scale | Categories (counts) or summary | Coding |
|---|---|---|---|
| Age | numerical | 12 to 99 years | - |
| pH | numerical | mean 5.565, SD 0.561 | - |
| Specific gravity | numerical | mean 1.016, SD 0.0052 | - |
| Gender | nominal | male (185), female (195) | male = 1, female = 0 |
| Urine color | nominal | yellow (243), pale yellow (137) | yellow = 1, pale yellow = 0 |
| Albumin | nominal | trace (227), nil (153) | trace = 1, nil = 0 |
| Glucose | nominal | trace (27), nil (353) | trace = 1, nil = 0 |
| Sugar | nominal | positive (63), nil (317) | positive = 1, nil = 0 |
| Ketone bodies | nominal | trace (64), not trace (316) | trace = 1, not trace = 0 |
| Bile pigment | nominal | present (64), absent (316) | present = 1, absent = 0 |
| Urobilinogen | nominal | abnormal (38), normal (342) | abnormal = 1, normal = 0 |
| Blood | nominal | positive (62), negative (318) | positive = 1, negative = 0 |
| Pus cells/WBCs | nominal | normal (166), abnormal (214) | normal = 0, abnormal = 1 |
| Red cells/RBCs | nominal | normal (217), abnormal (163) | normal = 0, abnormal = 1 |
| Epithelial cells | nominal | nil (153), positive (227) | nil = 0, positive = 1 |
| Mucus thread | nominal | present (181), none (199) | present = 1, none = 0 |
| Calcium oxalate | nominal | positive (112), nil (268) | positive = 1, nil = 0 |
| Granular cast | nominal | seen (94), nil (286) | seen = 1, nil = 0 |
| Bacteria | nominal | seen (123), not seen (257) | seen = 1, not seen = 0 |
| Calcium carbonate | nominal | found (335), not found (45) | found = 1, not found = 0 |
| Disease status | nominal | CKD (240), not CKD (142) | CKD = 1, not CKD = 0 |

Note: WBC (white blood cells), RBC (red blood cells), CKD (chronic kidney disease).

2.2.2. K-Nearest Neighbor (KNN)


The KNN algorithm is one of the simplest and earliest classification algorithms [24]. It predicts the label of a new data point from the labels of its nearest neighbors in the training set, on the premise that a new observation resembles the training samples closest to it. Let $(z_1, z_2)$ denote the training observations, with inputs $z_1$ and labels $z_2$, and let $f: z_1 \rightarrow z_2$ be the learning function, so that given an observation of $z_1$, $f(z_1)$ determines the value of $z_2$.
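The majority-vote rule described above can be sketched in Python as follows. Euclidean distance, the default k = 5, and the function name `knn_predict` are assumptions for illustration; the text does not state the distance or the value of k used in the study.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x_new, k=5):
    """Predict the label of x_new by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to x_new with its label, nearest first.
    ranked = sorted(
        ((math.dist(x, x_new), label) for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    nearest_labels = [label for _, label in ranked[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_X, train_y, (0.5, 0.5), k=3))  # 0
```

An odd k avoids tied votes in a two-class problem such as CKD versus non-CKD.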

2.2.3. Support Vector Machine (SVM)


SVM models are used to classify both linearly and nonlinearly separable data. They first map each data point into a variable space whose dimension is the number of variables k. They then find a hyperplane that divides the data items into two classes while maximizing the margin for both classes and minimizing the classification error [25]. The margin for a class is the distance between the decision hyperplane and the nearest instance belonging to that class. SVM models use functions known as kernels to transform the input data into the form needed for separation, and different kernel functions are available. In this work, four kernels are adopted: linear, Laplacian, Bessel, and radial basis.
Linear kernel: The linear kernel is the simplest kernel function. It returns the inner product between two points in a suitable feature space, shifted by a constant:

$$ K(z_m, z_n) = z_m \cdot z_n + 1 \tag{3} $$



Laplacian kernel: The Laplacian kernel function is equivalent to the exponential kernel, except for being less sensitive to changes in the sigma parameter:

$$ K(z_m, z_n) = \exp\left(-\frac{\lVert z_m - z_n \rVert}{\sigma}\right); \quad \sigma > 0 \tag{4} $$

Bessel kernel: The Bessel function is well known in the theory of kernel function spaces of fractional smoothness, and the Bessel kernel is given as

$$ K(z_m, z_n) = \frac{B_{\nu+1}\left(\sigma \lVert z_m - z_n \rVert\right)}{\lVert z_m - z_n \rVert^{-n(\nu+1)}} \tag{5} $$

where B is the Bessel function of the first kind.

Radial basis kernel: The radial basis function (RBF) kernel is a popular kernel used in SVM models and in various other kernelized learning models:

$$ K(z_m, z_n) = \exp\left(-\alpha \lVert z_m - z_n \rVert^{2}\right); \quad \alpha > 0 \tag{6} $$
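Equations (3), (4), and (6) translate directly into Python. This is a sketch with hypothetical function names and default parameter values (σ = 1, α = 1); the Bessel kernel in Equation (5) is omitted because it needs a Bessel function of the first kind, for example `scipy.special.jv`, which is outside the standard library. The Gram matrix built at the end is the matrix of pairwise kernel values that an SVM solver works with.

```python
import math


def linear_kernel(zm, zn):
    """Equation (3): inner product plus one."""
    return sum(a * b for a, b in zip(zm, zn)) + 1.0


def laplacian_kernel(zm, zn, sigma=1.0):
    """Equation (4): exp(-||zm - zn|| / sigma), sigma > 0."""
    return math.exp(-math.dist(zm, zn) / sigma)


def rbf_kernel(zm, zn, alpha=1.0):
    """Equation (6): exp(-alpha * ||zm - zn||^2), alpha > 0."""
    return math.exp(-alpha * math.dist(zm, zn) ** 2)


def gram_matrix(kernel, points):
    """Matrix of pairwise kernel values K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(a, b) for b in points] for a in points]


pts = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]
K = gram_matrix(laplacian_kernel, pts)
```

A larger σ in Equation (4), or a smaller α in Equation (6), makes the kernel value decay more slowly with distance, so more distant points still count as similar.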

2.2.4. Decision Tree (DT)


A DT is a tree-like structure in which each internal node tests a feature, each branch represents a decision rule, and each leaf represents an outcome, either a category or a continuous value. The main idea behind a DT is to build a tree-like partition of the whole dataset and assign a single outcome to each leaf so that the error in every leaf is reduced; splitting the observations into branches in this way improves prediction accuracy. At each step, the DT algorithm identifies a variable and a cutoff for that variable that split the input observations into subsets, using criteria such as information gain, the Gini index, or the chi-squared test. This splitting process is repeated until the complete tree is constructed. The purpose of the splitting algorithm is to find the threshold for a variable that increases the homogeneity of the outcome within the resulting subsets. A decision tree is non-parametric: it partitions the data by searching each feature for candidate split values [26]. In decision trees, overfitting is tackled by tuning hyper-parameters such as the maximum depth and the maximum number of leaf nodes of the tree. In our case, the maximum depth was tuned over the values 5, 10, 15, 20, 25, and 30 via an iterative process.
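A minimal sketch of the splitting procedure, using the Gini index as the impurity criterion and stopping at a maximum depth, is given below. It is an illustration under assumptions, not the R implementation used in the study: the functions `gini`, `best_split`, `build_tree`, and `predict_tree` are hypothetical, splits are binary of the form `x[feature] <= threshold`, and leaves predict the majority class.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y):
    """Return (feature, threshold, weighted Gini) of the best split x[feature] <= threshold.

    feature is None when no split lowers the impurity of the parent node.
    """
    best = (None, None, gini(y))
    n = len(y)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best


def build_tree(X, y, max_depth, depth=0):
    """Grow the tree recursively until max_depth is reached or no split helps."""
    feature, threshold, _ = best_split(X, y)
    if depth >= max_depth or feature is None:
        return Counter(y).most_common(1)[0][0]  # leaf: majority class
    left = [i for i, row in enumerate(X) if row[feature] <= threshold]
    right = [i for i, row in enumerate(X) if row[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], max_depth, depth + 1),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], max_depth, depth + 1),
    }


def predict_tree(node, x):
    """Follow the decision rules from the root down to a leaf."""
    while isinstance(node, dict):
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node


X = [[1, 7], [2, 3], [3, 8], [10, 2], [11, 9], [12, 4]]
y = [0, 0, 0, 1, 1, 1]
tree = build_tree(X, y, max_depth=5)
```

Tuning the maximum depth as described above would amount to calling `build_tree` with each candidate value (5, 10, 15, 20, 25, and 30) and comparing accuracy on held-out data.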

2.2.5. Random Forest (RF)


RF is a type of supervised ML algorithm. As the name suggests, it builds a forest of decision trees, each grown with an element of randomness. Decision trees grown very deep frequently overfit the training data, so a slight variation in the input data can produce a large change in the classification outcome. The different decision trees of a random forest are therefore trained on different training samples. To classify a new sample, its input vector is passed down each decision tree of the forest; each tree considers a different part of that input vector and yields a classification. The forest then chooses the class with the most ‘votes’ for a discrete outcome, or the average over all trees for a numeric outcome. Because the random forest algorithm combines the outcomes of many different decision trees, it can reduce the variance that results from relying on a single decision tree for the same dataset [27].
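The bagging-and-voting idea can be illustrated with a deliberately simplified forest in Python. To keep the sketch short, each "tree" is a single-split stump fitted on a bootstrap sample by minimizing misclassifications over a randomly chosen subset of features, and the forest predicts by majority vote; a full random forest instead grows deep trees and re-samples candidate features at every split. All function names and the defaults (25 trees, one feature per tree) are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Most frequent label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, features):
    """Fit a one-split tree on the given feature indices by minimizing misclassifications."""
    best = None
    for j in features:
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            left_label, right_label = majority(left), majority(right)
            errors = sum(yi != left_label for yi in left) + sum(yi != right_label for yi in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, left_label, right_label)
    if best is None:
        # No valid split (all sampled values equal): predict the majority class everywhere.
        label = majority(y)
        return (features[0], float("inf"), label, label)
    return best[1:]


def stump_predict(stump, x):
    feature, threshold, left_label, right_label = stump
    return left_label if x[feature] <= threshold else right_label


def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    """Bagging with random feature subsets: each stump sees its own bootstrap sample."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]       # bootstrap sample, with replacement
        features = rng.sample(range(p), max_features)    # random subset of features
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest


def forest_predict(forest, x):
    """Majority vote over all stumps in the forest."""
    return majority(stump_predict(stump, x) for stump in forest)


X = [[1, 1], [2, 2], [3, 1], [2, 3], [10, 10], [11, 12], [12, 11], [10, 11]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, y, n_trees=25, seed=0)
```

Because each stump sees a different bootstrap sample, an individual stump can be wrong, but the vote over many stumps is far more stable, which is the variance reduction described above.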

2.3. Performance Measures


In this work, several performance measures are used for model comparison: accuracy, Brier score, sensitivity, Youden index, specificity, and F1 score.

2.3.1. Accuracy
Accuracy is the proportion of data items that are correctly classified, that is, whose predicted class matches the actual class. With TP, TN, FP, and FN denoting the numbers of true positives, true negatives, false positives, and false negatives, it is given by

$$ \text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{7} $$

2.3.2. Brier Score

The Brier score (BS) is the mean squared difference between the predicted probabilities and the observed binary outcomes. The mathematical formula can be described as

$$ \text{BS} = \frac{1}{R} \sum_{r=1}^{R} \sum_{c=1}^{C} \left(f_{rc} - o_{rc}\right)^{2} \tag{8} $$

Here, $f_{rc}$ is the predicted probability that the rth observation belongs to the cth category, and $o_{rc}$ is the related observed binary outcome (1 if the rth observation is in category c, 0 otherwise). A lower BS indicates a more consistent and accurate method.
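Equation (8) can be computed directly from a matrix of predicted probabilities and a one-hot matrix of observed outcomes, as in the Python sketch below (function names hypothetical). For two categories, Equation (8) equals twice the more common binary form, the mean of (p − y)² over observations, so it matters which variant a software package reports.

```python
def brier_score(prob, observed):
    """Multi-category Brier score, Equation (8).

    prob[r][c]     : predicted probability that observation r is in category c
    observed[r][c] : 1 if observation r is in category c, else 0
    """
    R = len(prob)
    total = sum(
        (f - o) ** 2
        for f_row, o_row in zip(prob, observed)
        for f, o in zip(f_row, o_row)
    )
    return total / R


def binary_brier_score(p_positive, y):
    """Binary form: mean of (p - y)^2, with y = 1 for the positive class (CKD)."""
    return sum((p - yi) ** 2 for p, yi in zip(p_positive, y)) / len(y)


print(brier_score([[0.8, 0.2], [0.3, 0.7]], [[1, 0], [0, 1]]))
```

Either way, a perfect probabilistic classifier scores 0, and lower values are better.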

2.3.3. Sensitivity
Sensitivity is the proportion of actual positive observations that are correctly identified as positive:

$$ \text{Sensitivity} = \frac{TP}{TP + FN} \tag{9} $$

2.3.4. Youden Index
The Youden index is generally defined as

$$ J = \max_{a} \left\{ \text{Sensitivity}(a) + \text{Specificity}(a) - 1 \right\} $$

The cut-point that attains this maximum is described as the optimal cut-point $a^{*}$, since it is the cut-point that maximizes the biomarker's discriminating ability when equal weight is given to specificity and sensitivity.
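At a fixed classification threshold, the index reduces to Sensitivity + Specificity − 1; for instance, the logistic model in Table 2 has 0.9073 + 0.8736 − 1 = 0.7809. Searching for the optimal cut-point $a^{*}$ can be sketched in Python as below. The convention that scores at or above the cut-point are called positive, the candidate cut-points (the observed scores), and the function names are assumptions, and both classes must be present to avoid division by zero.

```python
def sens_spec_at(scores, y, a):
    """Sensitivity and specificity when observations with score >= a are called positive."""
    tp = sum(1 for s, yi in zip(scores, y) if s >= a and yi == 1)
    fn = sum(1 for s, yi in zip(scores, y) if s < a and yi == 1)
    tn = sum(1 for s, yi in zip(scores, y) if s < a and yi == 0)
    fp = sum(1 for s, yi in zip(scores, y) if s >= a and yi == 0)
    return tp / (tp + fn), tn / (tn + fp)


def youden_cutpoint(scores, y):
    """Return (a*, J): the candidate cut-point maximizing Sensitivity + Specificity - 1."""
    best_a, best_j = None, -1.0
    for a in sorted(set(scores)):
        sensitivity, specificity = sens_spec_at(scores, y, a)
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_a, best_j = a, j
    return best_a, best_j


scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]   # hypothetical predicted probabilities
labels = [0, 0, 1, 1, 1, 1]                # 1 = CKD
print(youden_cutpoint(scores, labels))     # (0.7, 0.75)
```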

2.3.5. Specificity
Specificity is the proportion of actual negative observations that are correctly identified as negative:

$$ \text{Specificity} = \frac{TN}{TN + FP} \tag{10} $$

2.3.6. F1 Score
The F1 score is the harmonic mean of precision and recall, which keeps a balance between the two. It is computed as

$$ F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{11} $$

where Precision = TP/(TP + FP) and Recall is the sensitivity in Equation (9). The F1 score thus summarizes the goodness of a classifier in terms of both precision and recall.
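Equations (7) and (9) to (11), together with the error rate 1 − accuracy reported in Tables 2–4, can all be computed from the four confusion-matrix counts. The Python sketch below assumes 0/1 labels with 1 = CKD (the coding in Table 1); the function name is hypothetical, and it does not guard against empty classes, which would cause a division by zero.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, error, sensitivity, specificity, Youden index, and F1 for 0/1 labels (1 = CKD)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / (tp + fp + tn + fn)                      # Equation (7)
    sensitivity = tp / (tp + fn)                                    # Equation (9), also the recall
    specificity = tn / (tn + fp)                                    # Equation (10)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)    # Equation (11)
    return {
        "accuracy": accuracy,
        "error": 1.0 - accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden": sensitivity + specificity - 1.0,
        "f1": f1,
    }


metrics = classification_metrics([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

In the repeated-split protocol, such a function would be applied to each test set and the results averaged over the one thousand runs.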

3. Results and Discussion

This section presents the results from different perspectives. We first investigate the performance of various machine learning methods, namely LR, PB, KNN, DT, RF, SVM-L, SVM-LAP, SVM-RB, and SVM-B, on the CKD dataset. Three training and testing scenarios are considered, with 90%, 75%, and 50% of the data used for training. To evaluate the prediction performance of the selected models, six performance measures were used, namely accuracy, Brier score, sensitivity, Youden index, specificity, and F1 score, each averaged over a total of one thousand runs for each method.
In the first scenario, the training set is 90% and the testing set is 10% of the data. The results are shown in Table 2, from which it is evident that SVM-LAP and RF produce better predictions than the rest of the models. The best predictive model, SVM-LAP, obtained 0.9171, 0.8671, 0.9484, 0.8155, 0.0643, 0.0829, and 0.9319 for mean accuracy, specificity, sensitivity, Youden index, Brier score, error, and F1 score, respectively. RF produced the second best results, while LR and PB performed better than DT and KNN. The superiority of the models can also be checked graphically: Figure 1 shows the mean accuracy, Brier score, sensitivity, Youden index, specificity, and F1 score of all the models. As seen from all panels, SVM-LAP produces superior outcomes compared to the rest of the models; RF is competitive, while KNN shows the worst results.

Figure 1. Performance measure plots (90% training, 10% testing) of the models.

Table 2. First scenario (90% training, 10% testing): performance measures (columns) for the predictive models (rows).

| Models | Accuracy | Specificity | Sensitivity | Youden | Brier Score | Error | F1 Score |
|---|---|---|---|---|---|---|---|
| Logistic | 0.8945 | 0.8736 | 0.9073 | 0.7809 | 0.0699 | 0.1055 | 0.9135 |
| Probit | 0.8942 | 0.8736 | 0.9073 | 0.7809 | 0.0686 | 0.1058 | 0.9139 |
| D-Tree | 0.8839 | 0.8411 | 0.9096 | 0.7507 | 0.0953 | 0.1161 | 0.8985 |
| KNN | 0.6309 | 0.4826 | 0.7209 | 0.2035 | 0.2457 | 0.3691 | 0.7800 |
| SVM-RB | 0.8995 | 0.8794 | 0.9127 | 0.7921 | 0.0671 | 0.1005 | 0.9202 |
| SVM-L | 0.8978 | 0.9219 | 0.8846 | 0.8065 | 0.0644 | 0.1022 | 0.9108 |
| SVM-LAP | 0.9171 | 0.8671 | 0.9484 | 0.8155 | 0.0643 | 0.0829 | 0.9319 |
| SVM-B | 0.8961 | 0.8751 | 0.9096 | 0.7846 | 0.0672 | 0.1039 | 0.9175 |
| RF | 0.9129 | 0.8808 | 0.9330 | 0.8138 | 0.0652 | 0.0871 | 0.9270 |

In the second scenario, the training set is 75% and the testing set is 25% of the data. The results are shown in Table 3, where it is again evident that SVM-LAP and RF give better predictions than the competing models. The best predictive model, SVM-LAP, produced 0.9135, 0.8643, 0.9468, 0.8111, 0.0663, 0.0865, and 0.9328 for mean accuracy, specificity, sensitivity, Youden index, Brier score, error, and F1 score, respectively. RF again produced the second best results. The superiority of the models can also be observed in the figures; for example, Figure 2 shows the mean accuracy, specificity, sensitivity, Youden index, Brier score, and F1 score of all models. As seen from all panels, SVM-LAP produces superior results compared to the rest of the models; RF is competitive, and KNN shows the worst results.

Figure 2. Performance measure plots (50% training, 50% testing) of the models.

Table 3. Second scenario (75% training, 25% testing): performance measures (columns) for the predictive models (rows).

| Models | Accuracy | Specificity | Sensitivity | Youden | Brier Score | Error | F1 Score |
|---|---|---|---|---|---|---|---|
| Logistic | 0.8923 | 0.8736 | 0.9073 | 0.7809 | 0.0760 | 0.1077 | 0.9135 |
| Probit | 0.8927 | 0.8686 | 0.9088 | 0.7774 | 0.0742 | 0.1073 | 0.9137 |
| D-Tree | 0.8722 | 0.8297 | 0.9072 | 0.7369 | 0.1040 | 0.1278 | 0.9024 |
| KNN | 0.6809 | 0.5811 | 0.7423 | 0.3233 | 0.1933 | 0.3191 | 0.7800 |
| SVM-RB | 0.9005 | 0.8761 | 0.9151 | 0.7913 | 0.0686 | 0.0995 | 0.9191 |
| SVM-L | 0.8918 | 0.9143 | 0.8841 | 0.7984 | 0.0673 | 0.1082 | 0.9125 |
| SVM-LAP | 0.9135 | 0.8643 | 0.9468 | 0.8111 | 0.0663 | 0.0865 | 0.9328 |
| SVM-B | 0.8972 | 0.8729 | 0.9115 | 0.7845 | 0.0691 | 0.1028 | 0.9163 |
| RF | 0.9084 | 0.8764 | 0.9318 | 0.8082 | 0.0672 | 0.0916 | 0.9284 |
Finally, in the third scenario, the training and testing sets each contain 50% of the data. The outcomes are listed in Table 4, which again shows that SVM-LAP and RF give better predictions. This time, the best predictive model, SVM-LAP, produced 0.9052, 0.8585, 0.9449, 0.8034, 0.0701, 0.0948, and 0.9304 for mean accuracy, specificity, sensitivity, Youden index, Brier score, error, and F1 score, respectively. RF again produced the second best results. The superiority of the models is also shown in the figures; for example, Figure 3 shows the mean accuracy, specificity, sensitivity, Youden index, Brier score, and F1 score of all models. Consistent with the other figures, SVM-LAP produces the lowest error; RF is competitive, while KNN shows the worst results of all the models.
Once the accuracy measures were computed, the significance of the differences between them was evaluated. Many researchers have used the Diebold and Mariano (DM) test for this purpose [28–30], and in this study we used the DM test [31] to verify the superiority of the predictive model results (accuracy measures) listed in Tables 2–4. The DM test is one of the most widely used statistical tests for comparing predictions obtained from different models. The DM test of equal prediction accuracy was performed for each pair of models, and the resulting p-values are listed in Table 5. Each entry in the table is the p-value for the null hypothesis of no difference in accuracy between the predictor in the column and the predictor in the row, against the alternative hypothesis that the predictor in the column is more accurate than the predictor in the row. In Table 5, we observed that the prediction accuracy of the previously defined model is not statistically different from all other models except for KNN.
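For readers who want to reproduce the pairwise comparisons, a minimal one-sided DM test with squared loss is sketched below in Python. It assumes that each model produces one prediction per test observation (here, predicted probabilities compared with 0/1 outcomes), uses the one-step variance estimate without autocorrelation correction, and relies on the asymptotic standard normal distribution of the statistic; the function name `dm_test` and the example predictions are hypothetical. In R, the `dm.test` function of the `forecast` package provides a packaged implementation.

```python
import math


def dm_test(actual, pred_row, pred_col):
    """One-sided Diebold-Mariano test with squared-error loss and one-step variance.

    H0: both predictors have equal expected loss.
    H1: the 'column' predictor is more accurate (has smaller loss) than the 'row' one.
    Returns (DM statistic, p-value); a zero-variance loss differential raises ZeroDivisionError.
    """
    # Loss differential: squared error of the row model minus that of the column model.
    d = [(a - r) ** 2 - (a - c) ** 2 for a, r, c in zip(actual, pred_row, pred_col)]
    T = len(d)
    d_bar = sum(d) / T
    gamma0 = sum((x - d_bar) ** 2 for x in d) / T      # variance of the loss differential
    dm_stat = d_bar / math.sqrt(gamma0 / T)
    # Upper-tail p-value from the standard normal distribution.
    p_value = 1.0 - 0.5 * (1.0 + math.erf(dm_stat / math.sqrt(2.0)))
    return dm_stat, p_value


actual = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
pred_row = [0.6, 0.4, 0.5, 0.7, 0.5, 0.6, 0.3, 0.45, 0.55, 0.5]   # hypothetical weaker model
pred_col = [0.9, 0.1, 0.8, 0.9, 0.2, 0.85, 0.1, 0.15, 0.9, 0.8]   # hypothetical stronger model
stat, p = dm_test(actual, pred_row, pred_col)
```

Swapping the roles of the two models flips the sign of the statistic, which is why a table of one-sided p-values such as Table 5 is read row against column.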

Figure 3. Performance measure plots (75% training, 25% testing) of the models.
Table 4. Third scenario (50% training, 50% testing): performance measures (columns) for the predictive models (rows).

| Models | Accuracy | Specificity | Sensitivity | Youden | Brier Score | Error | F1 Score |
|---|---|---|---|---|---|---|---|
| Logistic | 0.8858 | 0.8736 | 0.9073 | 0.7809 | 0.0929 | 0.1142 | 0.9135 |
| Probit | 0.8873 | 0.8628 | 0.9090 | 0.7718 | 0.0900 | 0.1127 | 0.9125 |
| D-Tree | 0.8457 | 0.8021 | 0.9065 | 0.7086 | 0.1207 | 0.1543 | 0.8950 |
| KNN | 0.7145 | 0.6492 | 0.7553 | 0.4045 | 0.1554 | 0.2855 | 0.7685 |
| SVM-RB | 0.9001 | 0.8712 | 0.9182 | 0.7893 | 0.0705 | 0.0999 | 0.9196 |
| SVM-L | 0.8878 | 0.9077 | 0.8843 | 0.7920 | 0.0720 | 0.1122 | 0.9110 |
| SVM-LAP | 0.9052 | 0.8585 | 0.9449 | 0.8034 | 0.0701 | 0.0948 | 0.9304 |
| SVM-B | 0.8978 | 0.8692 | 0.9144 | 0.7836 | 0.0714 | 0.1022 | 0.9171 |
| RF | 0.9021 | 0.8694 | 0.9314 | 0.8008 | 0.0712 | 0.0979 | 0.9264 |

Models Accuracy Specificity


Table 5. p-values Sensitivity Youdent
for the Diebold and Brier Score
Marion test identical predictionError F Score
accuracy against the alter-
Logistic 0.8858 native
0.8736 hypothesis that
0.9073the model in the
0.7809column is more accurate
0.0929 than in the
0.1142 row (using
0.9135squared
Probit 0.8873 loss function).
0.8628 0.9090 0.7718 0.0900 0.1127 0.9125
D-Tree
Models. 0.8457
Logistic 0.8021 D-Tree 0.9065
Probit KNN 0.7086
SVM-RB 0.1207
SVM-L 0.1543SVM-B0.8950RF
SVM-LAP
KNN
Logistic - 0.7145 0.01 0.6492 0.99 0.7553
0.99 0.4045
0.01 0.010.1554 0.01 0.28550.01 0.76850.01
SVM-RB
Probit 0.9001
0.99 _- 0.8712 0.99 0.9182
0.99 0.7893
0.01 0.010.0705 0.01 0.09990.01 0.91960.01
SVM-L 0.8878 0.9077 0.8843 0.7920 0.0720 0.1122 0.9110
D-Tree 0.01 0.01 - 0.99 0.01 0.01 0.01 0.01 0.01
SVM-LAP 0.9052 0.8585 0.9449 0.8034 0.0701 0.0948 0.9304
KNN 0.01 0.01 0.01 _- 0.01 0.01 0.01 0.01 0.01
SVM-B 0.8978 0.8692 0.9144 0.7836 0.0714 0.1022 0.9171
SVM-RB
RF 0.99
0.9021 0.99 0.8694 0.99 0.99
0.9314 -
0.8008 0.990.0712 0.01 0.09790.99 0.92640.01
SVM-L 0.99 0.99 0.99 0.99 0.01 _- 0.01 0.01 0.01
SVM-LAP 0.99 0.99 In the
0.99end, we also
0.99 plotted 0.99
an error box plot for each
0.99 - scenario 0.99
with all predictive
0.99
SVM-B 0.99 models.
0.99 This
0.99 error was obtained
0.99 as (1-accuracy),
0.01 0.99 during the
0.01one thousand
_- simulations
0.01 for
model consistency. It can see in Figure 4a that the first scenario is 90%, 10%; in Figure
RF 0.99 0.99 0.99 0.99 0.99 0.99 0.01 0.99 -
4b,the second scenario is 75%, 25%, and in Figure 4c the third scenario is 50%, 50%. In all
three cases, SVM-LAP captures the lowest variation, and the second best model is RF.
Additionally, confirmed
In the end, we alsoinplotted
Figurean4a–c, KNN
error boxproduces
plot for the
eachworst results.
scenario Finally,
with it is con-
all predictive
firmed
models.that
ThisSVM-LAP
error wasisobtained
highly efficient for the prediction
as (1-accuracy), during theof CKD
one in patients,
thousand and RFfor
simulations is
the second
model best model.
consistency. It can see in Figure 4a that the first scenario is 90%, 10%; in Figure 4b,
the second scenario is 75%, 25%, and in Figure 4c the third scenario is 50%, 50%. In all
Logistic - 0.01 0.99 0.99 0.01 0.01 0.01 0.01 0.01
Probit 0.99 _- 0.99 0.99 0.01 0.01 0.01 0.01 0.01
D-Tree 0.01 0.01 - 0.99 0.01 0.01 0.01 0.01 0.01
KNN 0.01 0.01 0.01 _- 0.01 0.01 0.01 0.01 0.01
SVM-RB 2023, 15,0.99
Sustainability 2754 0.99 0.99 0.99 - 0.99 0.01 0.99 0.01
10 of 13
SVM-L 0.99 0.99 0.99 0.99 0.01 _- 0.01 0.01 0.01
SVM-LAP 0.99 0.99 0.99 0.99 0.99 0.99 - 0.99 0.99
SVM-B 0.99 0.99 0.99 0.99 0.01 0.99 0.01 _- 0.01
three cases, SVM-LAP captures the lowest variation, and the second best model is RF.
RF 0.99 0.99 0.99 0.99 0.99 0.99 0.01 0.99 -
Additionally, confirmed in Figure 4a–c, KNN produces the worst results. Finally, it is
confirmed that SVM-LAP is highly efficient for the prediction of CKD in patients, and RF is
the second best model.
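The measures in Table 4 follow standard definitions based on the confusion matrix. The sketch below computes them under a few assumptions, since the section does not state these conventions: CKD is coded as 1 (the positive class), predicted probabilities are thresholded at 0.5, and the Brier score is the mean squared difference between the predicted probability and the observed label. The names `performance_measures` and `Measures` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Measures:
    accuracy: float
    specificity: float
    sensitivity: float
    youden: float
    brier: float
    error: float
    f1: float


def performance_measures(y_true, prob, threshold=0.5):
    """Table 4-style measures from labels (1 = CKD, assumed) and predicted P(CKD)."""
    if not y_true or len(y_true) != len(prob):
        raise ValueError("inputs must be non-empty and of equal length")
    y_pred = [1 if p >= threshold else 0 for p in prob]
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))
    tn = pairs.count((0, 0))
    fp = pairs.count((0, 1))
    fn = pairs.count((1, 0))
    n = len(y_true)
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")  # true-positive rate
    specificity = tn / (tn + fp) if tn + fp else float("nan")  # true-negative rate
    brier = sum((p - y) ** 2 for y, p in zip(y_true, prob)) / n  # uses probabilities
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else float("nan")
    return Measures(
        accuracy=accuracy,
        specificity=specificity,
        sensitivity=sensitivity,
        youden=sensitivity + specificity - 1,  # Youden's J statistic
        brier=brier,
        error=1 - accuracy,
        f1=f1,
    )


m = performance_measures([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.3, 0.2, 0.6, 0.1])
print(m)
```

Two identities provide a quick consistency check on Table 4: Youden = sensitivity + specificity − 1 and Error = 1 − accuracy. For the logistic model, 0.8736 + 0.9073 − 1 = 0.7809 and 1 − 0.8858 = 0.1142.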


Figure 4. (a) Error box-plots (90% training, 10% testing) of the models; (b) Error box-plots (50% training, 50% testing) of the models; (c) Error box-plots (75% training, 25% testing) of the models.
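The box plots in Figure 4 summarize the error (1 − accuracy) collected over one thousand random train/test splits for each scenario. The sketch below outlines such a loop. It assumes simple (unstratified) random splits and a user-supplied `fit_predict` function. The helper names and the majority-class placeholder model are hypothetical and only stand in for the actual classifiers (logistic, SVM-LAP, RF, and so on).

```python
import random
from statistics import fmean, quantiles


def repeated_holdout_errors(X, y, fit_predict, train_frac, n_runs=1000, seed=0):
    """Test errors (1 - accuracy) over n_runs random train/test splits."""
    n_train = round(train_frac * len(y))
    if not 0 < n_train < len(y):
        raise ValueError("train_frac must leave both sets non-empty")
    rng = random.Random(seed)  # fixed seed for reproducibility
    idx = list(range(len(y)))
    errors = []
    for _ in range(n_runs):
        rng.shuffle(idx)  # simple (unstratified) random split
        train, test = idx[:n_train], idx[n_train:]
        preds = fit_predict([X[i] for i in train], [y[i] for i in train], [X[i] for i in test])
        accuracy = sum(p == y[i] for p, i in zip(preds, test)) / len(test)
        errors.append(1 - accuracy)
    return errors


def majority_class(X_train, y_train, X_test):
    """Hypothetical placeholder model: predicts the most frequent training label."""
    label = max(set(y_train), key=y_train.count)
    return [label] * len(X_test)


# Toy data (60 cases, 40 controls, one dummy feature), not the study's dataset
y = [1] * 60 + [0] * 40
X = [[0.0] for _ in y]
for frac in (0.9, 0.75, 0.5):
    errs = repeated_holdout_errors(X, y, majority_class, frac)
    q1, med, q3 = quantiles(errs, n=4)  # quartiles that define the box
    print(f"train {frac:.0%}: mean error {fmean(errs):.3f}, IQR [{q1:.3f}, {q3:.3f}]")
```

The spread of `errs` for each training fraction is what a box plot displays. A narrower box indicates a more consistent model across resamples, and this is the sense in which the text describes SVM-LAP as showing the lowest variation.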

4. Conclusions
In this research, we carried out a comparative analysis of different machine learning methods using CKD data from district Buner, Khyber Pakhtunkhwa, Pakistan. The dataset consists of records collected as part of a case–control study involving patients with CKD from the entire Buner district. For training and testing, we considered three different scenarios, with 50%, 75%, and 90% of the data used for training. To compare the models in terms of classification, we calculated various performance measures, i.e., accuracy, Brier score, sensitivity, Youden index, specificity, and F1 score. The results indicate that the SVM-LAP model outperforms the other models in all three scenarios, while the RF model is competitive. Additionally, the DM test was used to verify the superiority of the predictive models in terms of accuracy measures. The DM test p-values show that all of the methods used perform well in the prediction of chronic kidney disease, except the KNN method.
This study can be extended in future research projects in the medical sciences, for example, by predicting the effectiveness of a particular medicine for specific diseases. Furthermore, the best-performing ML models reported in this work could be used to predict other conditions, such as heart disease, cancer, and tuberculosis. Moreover, a novel hybrid system will be proposed for the same dataset to obtain more accurate and efficient prediction results.

Author Contributions: H.I. conceptualized the study, analyzed the data, and prepared the figures and draft; M.K. collected the data, designed the methodology, performed the computational work, prepared the tables and graphs, and authored the draft; Z.K. supervised, authored, reviewed, and approved the final draft; F.K., H.M.A. and Z.A. authored, reviewed, administered, and approved the final draft. All authors have read and agreed to the published version of the manuscript.
Funding: Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2023R 299), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: The research data will be provided upon request to the first author.
Acknowledgments: Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2023R 299), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Yan, M.T.; Chao, C.T.; Lin, S.H. Chronic kidney disease: Strategies to retard progression. Int. J. Mol. Sci. 2021, 22, 10084.
2. Lozano, R.; Naghavi, M.; Foreman, K.; Lim, S.; Shibuya, K.; Aboyans, V.; Abraham, J.; Adair, T.; Aggarwal, R.; Ahn, S.Y.; et al. Global and regional mortality from 235 causes of death for 20 age groups in 1990 and 2010: A systematic analysis for the Global Burden of Disease Study 2010. Lancet 2012, 380, 2095–2128.
3. Jha, V.; Garcia-Garcia, G.; Iseki, K.; Li, Z.; Naicker, S.; Plattner, B.; Saran, R.; Wang, A.Y.M.; Yang, C.W. Chronic kidney disease: Global dimension and perspectives. Lancet 2013, 382, 260–272.
4. Eckardt, K.U.; Coresh, J.; Devuyst, O.; Johnson, R.J.; Köttgen, A.; Levey, A.S.; Levin, A. Evolving importance of kidney disease: From subspecialty to global health burden. Lancet 2013, 382, 158–169.
5. Rapa, S.F.; Di Iorio, B.R.; Campiglia, P.; Heidland, A.; Marzocco, S. Inflammation and oxidative stress in chronic kidney disease—Potential therapeutic role of minerals, vitamins and plant-derived metabolites. Int. J. Mol. Sci. 2019, 21, 263.
6. Jayasumana, C.; Gunatilake, S.; Senanayake, P. Glyphosate, hard water and nephrotoxic metals: Are they the culprits behind the epidemic of chronic kidney disease of unknown etiology in Sri Lanka? Int. J. Environ. Res. Public Health 2014, 11, 2125–2147.
7. Mubarik, S.; Malik, S.S.; Mubarak, R.; Gilani, M.; Masood, N. Hypertension associated risk factors in Pakistan: A multifactorial case-control study. J. Pak. Med. Assoc. 2019, 69, 1070–1073.
8. Naqvi, A.A.; Hassali, M.A.; Aftab, M.T. Epidemiology of rheumatoid arthritis, clinical aspects and socio-economic determinants in Pakistani patients: A systematic review and meta-analysis. JPMA J. Pak. Med. Assoc. 2019, 69, 389–398.
9. Hsu, R.K.; Powe, N.R. Recent trends in the prevalence of chronic kidney disease: Not the same old song. Curr. Opin. Nephrol. Hypertens. 2017, 26, 187–196.
10. Salazar, L.H.A.; Leithardt, V.R.; Parreira, W.D.; da Rocha Fernandes, A.M.; Barbosa, J.L.V.; Correia, S.D. Application of machine learning techniques to predict a patient’s no-show in the healthcare sector. Future Internet 2022, 14, 3.
11. Elsheikh, A.H.; Saba, A.I.; Panchal, H.; Shanmugan, S.; Alsaleh, N.A.; Ahmadein, M. Artificial intelligence for forecasting the prevalence of COVID-19 pandemic: An overview. Healthcare 2021, 9, 1614.
12. Khamparia, A.; Pandey, B. A novel integrated principal component analysis and support vector machines-based diagnostic system for detection of chronic kidney disease. Int. J. Data Anal. Tech. Strateg. 2020, 12, 99–113.
13. Zhao, Y.; Zhang, Y. Comparison of decision tree methods for finding active objects. Adv. Space Res. 2008, 41, 1955–1959.
14. Vijayarani, S.; Dhayanand, S.; Phil, M. Kidney disease prediction using SVM and ANN algorithms. Int. J. Comput. Bus. Res. (IJCBR) 2015, 6, 1–12.
15. Dritsas, E.; Trigka, M. Machine learning techniques for chronic kidney disease risk prediction. Big Data Cogn. Comput. 2022, 6, 98.
16. Wickramasinghe, M.P.N.M.; Perera, D.M.; Kahandawaarachchi, K.A.D.C.P. Dietary prediction for patients with Chronic Kidney Disease (CKD) by considering blood potassium level using machine learning algorithms. In Proceedings of the 2017 IEEE Life Sciences Conference (LSC), Sydney, Australia, 13–15 December 2017; IEEE: Piscataway, NJ, USA, 2018; pp. 300–303.
17. Gupta, A.; Eysenbach, B.; Finn, C.; Levine, S. Unsupervised meta-learning for reinforcement learning. arXiv 2018, arXiv:1806.04640.
18. Lakshmi, K.; Nagesh, Y.; Krishna, M.V. Performance comparison of three data mining techniques for predicting kidney dialysis survivability. Int. J. Adv. Eng. Technol. 2014, 7, 242.
19. Zhang, H.; Hung, C.L.; Chu, W.C.C.; Chiu, P.F.; Tang, C.Y. Chronic kidney disease survival prediction with artificial neural networks. In Proceedings of the 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Madrid, Spain, 3–6 December 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1351–1356.
20. Kavakiotis, I.; Tsave, O.; Salifoglou, A.; Maglaveras, N.; Vlahavas, I.; Chouvarda, I. Machine learning and data mining methods in diabetes research. Comput. Struct. Biotechnol. J. 2017, 15, 104–116.
21. Singh, V.; Asari, V.K.; Rajasekaran, R. A Deep Neural Network for Early Detection and Prediction of Chronic Kidney Disease. Diagnostics 2022, 12, 116.
22. Pourhoseingholi, M.A.; Vahedi, M.; Rahimzadeh, M. Sample size calculation in medical studies. Gastroenterol. Hepatol. Bed Bench 2013, 6, 14.
23. Naing, L.; Winn, T.; Rusli, B.N. Practical issues in calculating the sample size for prevalence studies. Arch. Orofac. Sci. 2006, 1, 9–14.
24. Nhu, V.H.; Shirzadi, A.; Shahabi, H.; Singh, S.K.; Al-Ansari, N.; Clague, J.J.; Jaafari, A.; Chen, W.; Miraki, S.; Dou, J.; et al. Shallow landslide susceptibility mapping: A comparison between logistic model tree, logistic regression, naïve bayes tree, artificial neural network, and support vector machine algorithms. Int. J. Environ. Res. Public Health 2020, 17, 2749.
25. Joachims, T. Making large-scale SVM learning practical. In Advances in Kernel Methods: Support Vector Learning; MIT Press: Cambridge, MA, USA, 1999.
26. Criminisi, A.; Shotton, J.; Konukoglu, E. Decision forests: A unified framework for classification, regression, density estimation, manifold learning and semi-supervised learning. Found. Trends Comput. Graph. Vis. 2012, 7, 81–227.
27. Tyralis, H.; Papacharalampous, G.; Langousis, A. A brief review of random forests for water scientists and practitioners and their recent history in water resources. Water 2019, 11, 910.
28. Shah, I.; Iftikhar, H.; Ali, S.; Wang, D. Short-term electricity demand forecasting using components estimation technique. Energies 2019, 12, 2532.
29. Shah, I.; Iftikhar, H.; Ali, S. Modeling and forecasting medium-term electricity consumption using component estimation technique. Forecasting 2020, 2, 163–179.
30. Shah, I.; Iftikhar, H.; Ali, S. Modeling and forecasting electricity demand and prices: A comparison of alternative approaches. J. Math. 2022, 2022.
31. Diebold, F.X.; Mariano, R.S. Comparing predictive accuracy. J. Bus. Econ. Stat. 2002, 20, 134–144.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
