Machine Learning-Based Analysis and Prediction of Liver Cirrhosis

Conference Paper · July 2024


DOI: 10.1109/TSP63128.2024.10605929


All content following this page was uploaded by Ersin Elbaşı on 23 July 2024.



Ahmet Ercan Topcu¹, Ersin Elbasi¹, Yehia Ibrahim Alzoubi²
¹College of Engineering and Technology, American University of the Middle East, Kuwait; [email protected]; [email protected]
²College of Business Administration, American University of the Middle East, Kuwait; [email protected]

Abstract—Liver cirrhosis poses a significant threat to public health. It often remains asymptomatic in its initial stages, which complicates early diagnosis and treatment, and as the disease advances, diagnostic and therapeutic interventions become increasingly difficult. This work introduces an Artificial Intelligence (AI) system driven by Machine Learning (ML) algorithms to aid healthcare professionals in the early detection of liver cirrhosis. Seven distinct models were developed to predict the likelihood of liver cirrhosis, using different parameters and the following ML algorithms: Logistic Regression (LR), Linear Discriminant Analysis (LDA), k-Nearest Neighbors (KNN), Random Forest (RF), Multi-layer Perceptron (MLP), AdaBoost, and Bernoulli Naive Bayes (BernoulliNB). Among these, RF performed best, with an accuracy of approximately 98 percent. Evaluated on an open-access liver cirrhosis dataset, this methodology surpasses the predictive accuracy reported in earlier research. Comparisons between the models underscore their robustness, supporting their reliability and establishing a clear pathway for future investigations in this domain.

Keywords—liver cirrhosis; machine learning; prediction; health

I. INTRODUCTION

Liver cirrhosis, frequently a consequence of blood-borne viral hepatitis, affects millions of people worldwide and takes a significant toll on public health. Despite its global prevalence, symptoms often surface only in later stages, contributing to substantial morbidity and mortality [1]. According to the World Health Organization (WHO), millions of people contract hepatitis viruses annually [2]. Because most patients show no symptoms in the early stages, detection and intervention are often delayed, which underscores the urgent need for innovative diagnostic tools. Without early detection, liver damage escalates, leading to higher fatality rates [3]. Because no vaccine exists for some of its viral causes, such as hepatitis C, precise assessment of liver damage is pivotal for effective disease management and for preventing transmission of the underlying infection [4]. In this context, AI has emerged as a promising tool, demonstrating diagnostic efficacy comparable to that of human clinicians and, in some cases, surpassing it, particularly relative to less experienced physicians.

However, early detection remains challenging: 75 to 80 percent of cases progress to irreversible stages before symptoms manifest, rendering therapy ineffective [2]. Timely identification of the disease, as highlighted by [3], offers the best prospects for successful intervention, minimizing liver damage and reducing the risk of associated complications such as liver cancer [5]. Thus, novel diagnostic methods capable of detecting liver cirrhosis in its nascent stages hold immense promise, enabling prompt initiation of therapy and curbing further transmission of the underlying viral infections. AI-based disease diagnostics and prediction algorithms offer significant potential for the early detection of both acute infections and chronic disorders. Many studies have investigated the role of ML and Deep Learning (DL) in the prediction of liver cirrhosis and reported promising findings (e.g., [6], [7]). This underscores the critical role of AI in enabling timely interventions and improving patient outcomes by leveraging structured datasets and advanced ML algorithms.

In this study, seven well-established ML algorithms were evaluated: LR, LDA, KNN, RF, MLP, AdaBoost, and BernoulliNB, which achieved accuracies of 92%, 82%, 67%, 98%, 74%, 66%, and 66%, respectively. The higher accuracy achieved here, notably by RF, compared with prior research underscores the reliability of the approach. Through rigorous model comparisons, these findings demonstrate the robustness of the proposed approach and lay the groundwork for further development and implementation. The remainder of this paper is organized as follows: Section II reviews related work, Section III describes the research method, Section IV presents the experimental findings, and Section V concludes the paper.

II. BACKGROUND AND RELATED LITERATURE

A. Liver Cirrhosis Prediction
Liver cirrhosis is a chronic liver disease characterized by the progressive deterioration of liver function and the formation of scar tissue [8]. It often results from long-term liver damage caused by conditions such as chronic alcohol abuse, viral hepatitis, or non-alcoholic fatty liver disease (NAFLD). As cirrhosis progresses, it can lead to serious complications such as liver failure, portal hypertension, and hepatocellular carcinoma [8]. Early detection and intervention are crucial for managing cirrhosis and preventing further liver damage.

Prediction of liver cirrhosis using ML techniques has gained attention as a promising approach for early diagnosis
and prognosis. ML models can analyze large volumes of patient data, including clinical, laboratory, and imaging data, to identify patterns and predict the likelihood of developing cirrhosis [9]. By leveraging advanced algorithms and computational techniques, these models can provide accurate and personalized predictions, allowing healthcare providers to intervene early and tailor treatment plans to individual patients [10]. Additionally, ML techniques enable the integration of diverse data sources and the extraction of hidden insights from complex datasets, enhancing our understanding of the underlying mechanisms of liver cirrhosis and improving diagnostic accuracy and patient outcomes [11].

B. Literature Review
The reviewed literature encompasses various studies exploring ML approaches for liver disease prediction and diagnosis. Five themes were identified: assessing the efficacy of ML models; diagnosis, prediction, and identification; feature extraction and integration methods; prognostic models; and data preprocessing and ensemble learning. These themes highlight the diversity of the methodologies and approaches employed, as well as the significant contributions of these studies to advancing liver disease detection and prognosis.

Several studies focus on assessing the efficacy of ML models in diagnosing liver diseases and predicting patient outcomes. In [1], the authors evaluated the precision, recall, and transferability of ML models, while the authors of [2] compared three models, namely Support Vector Machine (SVM), decision tree, and RF classifiers, with RF exhibiting the highest accuracy of around 97%. In [10], the authors assessed the performance of various ML algorithms in predicting liver disease occurrence, achieving good accuracy after feature selection. Moreover, the authors in [12] developed ML scores to predict liver disease outcomes, demonstrating their association with clinically significant parameters and suggesting their utility in clinical practice. Furthermore, in [4], the authors conducted a comparative analysis of different ML classifiers for liver disease prediction, reporting the accuracy of various algorithms across different datasets.

Another group of studies focused on diagnosis, prediction, and identification. For example, in [6], the authors identified diagnostic biomarkers of NAFLD using ML algorithms, revealing genes associated with NAFLD and their correlation with immune cell infiltration, which provides insights into NAFLD mechanisms and treatment. The authors in [11] reviewed and compared denoising, deblurring, and segmentation methods for liver disease diagnosis, highlighting the effectiveness of DL-based convolutional neural networks. Moreover, the authors in [13] applied ML algorithms to construct prediction models for acute kidney injury in patients with liver cirrhosis, achieving predictive abilities comparable to those of established clinical scores such as SOFA, SAPS II, MELD, and MELD-Na. Similarly, the authors in [7] aimed to lower the high cost of liver disease diagnosis through prediction using various ML algorithms; their proposed intelligent model achieved an accuracy of 0.884 and a miss rate of 0.116.

Other work [9] focused on feature extraction and integration methods. For instance, the authors proposed an integrated feature extraction approach for liver disease classification, achieving an accuracy of 88.10% and an F1 score of 88.68% using logistic regression, random forest, KNN, SVM, MLP, and ensemble voting classifiers. In [14], the authors combined SVM and modified particle swarm optimization for heart and liver data classification, reporting enhanced accuracy and performance compared with traditional methods.

One research group focused on prognostic models for liver disease. The authors in [15] developed a prognostic model for ICU readmission after liver transplantation, achieving high accuracy, precision, recall, and F1-score using ML methods. The authors in [5] compared the prognosis of hepatocellular carcinoma (HCC) in NAFLD patients against other etiologies, revealing worse outcomes in NAFLD-related HCC and demonstrating the efficacy of extreme gradient boosting for predictive modeling. Additionally, the authors in [3] proposed a deep learning model for detecting liver tumors, demonstrating superior accuracy, Dice similarity coefficient, and specificity compared with existing algorithms.

Another group of studies focused on data preprocessing and ensemble learning. In [16], the authors evaluated various ML models and ensemble methods for liver disease prediction, highlighting the superior performance of the voting classifier with SMOTE. The authors in [17] and [18] utilized ensemble learning algorithms and data preprocessing techniques to enhance liver disease detection and classification. The authors in [17] reported the highest testing accuracy of 91.82% using an extra trees classifier, while in [18] LR with SMOTE provided the highest accuracy, at 93.18%.

This study not only validates the findings of previous research but also extends them through thorough testing and comparison of several ML techniques, including LR, LDA, KNN, RF, MLP, AdaBoost, and BernoulliNB. By employing a diverse array of ML methods, this study offers a comprehensive evaluation of their effectiveness in addressing the research objectives and provides insights into the relative strengths and weaknesses of each technique in the context of liver cirrhosis diagnosis and prediction. Furthermore, our results surpass those of previous research, with RF achieving a 98% accuracy rate.

III. MATERIALS AND METHODS

We conducted our experiments on the Phoenix High-Performance Computing facility at the American University of the Middle East (AUM). This system runs the CentOS 7.4 (Red Hat-based) operating system and is equipped with 640 CPU cores (Intel Xeon E5-2698 v3, 2.3 GHz) and 1.28 TB of memory. With a peak performance capability of 23 TFLOPS, this platform facilitated our research
endeavors efficiently. We used these resources to run the computations in a UNIX environment.
In our study, we used a dataset sourced from [19] to predict the early-stage status of cirrhosis from the dataset features. We applied several ML algorithms (LR, LDA, KNN, RF, MLP, AdaBoost, and BernoulliNB) with the aim of maximizing accuracy. We first partitioned the dataset into training (80%) and testing (20%) sets and employed cross-validation on the complementary subset. The training set was used to train the algorithms, and the testing set to assess their performance. Data preprocessing is essential prior to model development to eliminate noise and outliers and thereby improve model efficacy. For instance, removing irrelevant columns, such as the untitled column, streamlines the modeling process. Additionally, label encoding converts string literals into numeric values; this conversion is vital because ML algorithms typically operate on numerical data. Label encoding is applied to all strings within the dataset, ensuring compatibility with the ML algorithms.

IV. RESULTS AND DISCUSSION

A. Performance Analysis
Figure 1 illustrates the performance of the ML algorithms. RF emerged as the top performer, achieving the highest accuracy of 98%, followed by LR at 92%. LDA also performed well, with an accuracy of 82%. In contrast, KNN, AdaBoost, BernoulliNB, and MLP achieved lower accuracies than the other algorithms. The results also highlight the superior performance of rule-based (tree-based) algorithms such as RF over probabilistic approaches such as BernoulliNB.

Fig. 1. Performance analysis

RF is versatile, performing well in both regression and classification tasks. Its transparency allows users to identify the most influential features easily [20]. Moreover, its default hyperparameters typically yield good results, simplifying the modeling process [21]. RF classifiers are also less prone to overfitting than many other models, particularly when a sufficient number of trees is included in the forest [22].

B. Training Time Analysis
The training and testing durations varied notably across the ML algorithms. MLP exhibited the longest training time among the models assessed, at 0.3104 seconds, owing to its intricate architecture and large parameter space, which necessitate extensive computation. Similarly, RF had a relatively long training time of 0.1549 seconds, attributable to constructing an ensemble of decision trees. Conversely, LDA and BernoulliNB had the shortest training times, 0.0015 and 0.0012 seconds, respectively, owing to their simple and efficient methodologies; notably, LDA computes a closed-form solution for classification, which enhances its efficiency.

Concerning testing duration, KNN and AdaBoost required longer testing times than the other algorithms, with KNN requiring 0.0008 seconds and AdaBoost 0.0985 seconds. KNN's testing time is driven by the computation of distances between test and training samples, while AdaBoost's stems from combining multiple weak learners. In contrast, LR and LDA had the fastest testing times, 0.0002 and 0.0001 seconds, respectively, because their linear models incur minimal computational overhead during inference. The training and testing durations are illustrated in Figure 2.
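The preprocessing in Section III and the comparisons in Sections IV-A and IV-B can be outlined with the following minimal sketch. It assumes scikit-learn implementations of the seven algorithms (the paper names the algorithms but not a library) with default hyperparameters, and it replaces the dataset from [19] with a hypothetical synthetic table built by `make_classification`; the column names `Unnamed: 0`, `sex`, and `status` are illustrative only, so its accuracies and timings will not reproduce the figures reported above. It also computes the error measures of Section IV-C and ranks features by the random forest's impurity-based importances, one common way to inspect RF's most influential features.

```python
import time

import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                             mean_absolute_error, mean_squared_error)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import LabelEncoder


def make_demo_frame(seed=0):
    """Hypothetical stand-in for the dataset in [19]: synthetic numeric features,
    one string feature ("sex"), a string target ("status"), and an index-like
    "Unnamed: 0" column of the kind a CSV export leaves behind."""
    X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                               random_state=seed)
    df = pd.DataFrame(X, columns=[f"f{i}" for i in range(X.shape[1])])
    df.insert(0, "Unnamed: 0", np.arange(len(df)))
    df["sex"] = np.random.default_rng(seed).choice(["F", "M"], size=len(df))
    df["status"] = np.where(y == 1, "D", "C")
    return df


def preprocess(df, target):
    """Drop untitled index columns, then label-encode every non-numeric column."""
    df = df.drop(columns=[c for c in df.columns if str(c).startswith("Unnamed")])
    for col in df.columns:
        if not pd.api.types.is_numeric_dtype(df[col]):
            df[col] = LabelEncoder().fit_transform(df[col])
    features = df.drop(columns=[target])
    return features.to_numpy(dtype=float), df[target].to_numpy(), list(features.columns)


def make_models():
    # Defaults, except iteration caps and fixed seeds for reproducibility.
    return {
        "LR": LogisticRegression(max_iter=1000),
        "LDA": LinearDiscriminantAnalysis(),
        "KNN": KNeighborsClassifier(),
        "RF": RandomForestClassifier(random_state=0),
        "MLP": MLPClassifier(max_iter=1000, random_state=0),
        "AdaBoost": AdaBoostClassifier(random_state=0),
        "BernoulliNB": BernoulliNB(),
    }


def compare(X, y, seed=0):
    """80/20 split; fit each model and record accuracy, Cohen's kappa, MAE,
    RMSE, and wall-clock training and testing time in seconds."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                              random_state=seed, stratify=y)
    rows = []
    for name, model in make_models().items():
        t0 = time.perf_counter()
        model.fit(X_tr, y_tr)
        t1 = time.perf_counter()
        pred = model.predict(X_te)
        t2 = time.perf_counter()
        rows.append({
            "model": name,
            "accuracy": accuracy_score(y_te, pred),
            "kappa": cohen_kappa_score(y_te, pred),
            "MAE": mean_absolute_error(y_te, pred),
            "RMSE": float(np.sqrt(mean_squared_error(y_te, pred))),
            "train_s": t1 - t0,
            "test_s": t2 - t1,
        })
    return pd.DataFrame(rows).set_index("model")


def rf_feature_ranking(X, y, names):
    """Rank features by the impurity-based importances of a fitted random forest."""
    rf = RandomForestClassifier(random_state=0).fit(X, y)
    order = np.argsort(rf.feature_importances_)[::-1]
    return [(names[i], float(rf.feature_importances_[i])) for i in order]


if __name__ == "__main__":
    X, y, names = preprocess(make_demo_frame(), target="status")
    print(compare(X, y).round(4))
    print(rf_feature_ranking(X, y, names)[:3])
```

Timings from `time.perf_counter` depend on hardware and load, so durations such as those in Figure 2 are only comparable when measured on the same machine. The cross-validation mentioned in Section III could be added on the training split with `sklearn.model_selection.cross_val_score`.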
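For reference, the three measures used in the error analysis of Section IV-C have the following standard definitions. Here N is the number of test samples, y_i and ŷ_i are the actual and predicted (label-encoded) classes, p_o is the observed fraction of agreement, and p_e is the agreement expected by chance, where n_k^true and n_k^pred count the samples assigned to class k by the actual labels and by the model, respectively:

```latex
\begin{aligned}
p_o &= \frac{1}{N}\sum_{i=1}^{N} \mathbf{1}\!\left[\hat{y}_i = y_i\right], \qquad
p_e = \sum_{k} \frac{n_k^{\mathrm{true}}\, n_k^{\mathrm{pred}}}{N^{2}},\\
\kappa &= \frac{p_o - p_e}{1 - p_e},\\
\mathrm{MAE} &= \frac{1}{N}\sum_{i=1}^{N} \left\lvert y_i - \hat{y}_i \right\rvert,\\
\mathrm{RMSE} &= \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^{2}}.
\end{aligned}
```

For example, p_o = 0.9 and p_e = 0.5 give κ = (0.9 - 0.5)/(1 - 0.5) = 0.8. For a binary target encoded as 0 and 1, every misclassification contributes exactly 1 to both sums, so MAE equals the error rate (1 - accuracy) and RMSE equals its square root; with more than two classes, both values depend on the integer codes assigned by label encoding.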

Fig. 2. Training time analysis

C. Error Analysis
In our experimental analysis, we used Cohen's Kappa coefficient, Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE) to assess the models, as illustrated in Figure 3. Cohen's Kappa measures the agreement between two raters that classify items into distinct, non-overlapping categories; here, the two raters are the model's predictions and the actual labels. It evaluates classification performance by comparing the observed agreement with the agreement expected by chance, so a value of 1 indicates perfect agreement and a value of 0 indicates chance-level agreement. MAE measures the average absolute difference between predicted and actual values. RMSE, like MAE, measures prediction error, but it magnifies the impact of larger errors because it takes the square root of the mean of the squared deviations between predicted and actual values.

Fig. 3. Error analysis

V. CONCLUSION

Liver cirrhosis, a potentially life-threatening condition, requires prompt treatment to prevent future complications. ML models can assist in detecting liver disease early and mitigating its long-term health impacts. In this study, various ML algorithms were assessed for their accuracy and errors in predicting liver cirrhosis based on physiological factors, with RF classification demonstrating superior performance. The 98% accuracy achieved by RF in this study surpasses that of previous research, indicating enhanced reliability. RF also achieves lower MAE and RMSE values and a higher Cohen's Kappa value, which is consistent with its higher accuracy.

The findings of this study hold promise for clinicians and researchers in the effective monitoring and personalized modeling of liver disease. By incorporating quality-of-life indicators, the models offer flexibility and insight into patient well-being and condition-related challenges. However, the study has limitations. The dataset used may lack clear definitions and diagnostic criteria for liver disease. Moreover, using data from hospital units or institutes could enhance model evaluation by introducing a wider range of features, yet accessing sensitive medical data poses challenges due to privacy concerns. Future use of diverse ML models could further enhance the framework, improving its reliability and efficacy. Ultimately, an ML framework could help identify individuals at risk of liver cirrhosis, enabling early intervention and improving patient outcomes.

REFERENCES

[1] J. Allenki and H. K. Soni, "Analysis of chronic liver disease detection by using machine learning techniques," in 2024 IEEE International Students' Conference on Electrical, Electronics and Computer Science (SCEECS), IEEE. Bhopal, India, 2024, pp. 1-8.
[2] I. Hanif and M. M. Khan, "Liver cirrhosis prediction using machine learning approaches," in 13th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), IEEE. New York, USA, 2022, pp. 0028-0034.
[3] R. Manjunath, A. Ghanshala, and K. Kwadiki, "Deep learning algorithm performance evaluation in detection and classification of liver disease using CT images," Multimedia Tools and Applications, vol. 83, no. 1, pp. 2773-2790, 2024.
[4] K. Pal, S. Panwar, and D. Choudhury, "A pragmatic approach of heart and liver disease prediction using machine learning classifiers," in 2024 International Conference on Emerging Systems and Intelligent Computing (ESIC), IEEE. Bhubaneswar, India, 2024, pp. 728-734.
[5] M. Suárez et al., "Machine learning-based assessment of survival and risk factors in non-alcoholic fatty liver disease-related hepatocellular carcinoma for optimized patient management," Cancers, vol. 16, no. 6, p. 1114, 2024.
[6] N. Han, J. He, L. Shi, M. Zhang, J. Zheng, and Y. Fan, "Identification of biomarkers in nonalcoholic fatty liver disease: A machine learning method and experimental study," Frontiers in Genetics, vol. 13, p. 1020899, 2022.
[7] T. M. Ghazal, A. U. Rehman, M. Saleem, M. Ahmad, S. Ahmad, and F. Mehmood, "Intelligent model to predict early liver disease using machine learning technique," in 2022 International Conference on Business Analytics for Technology and Security (ICBATS), IEEE. Dubai, United Arab Emirates, 2022, pp. 1-5.
[8] N. Slivinski and Z. Sheikh. Cirrhosis of the liver: Symptoms, stages, and treatment. https://www.webmd.com/digestive-disorders/understanding-cirrhosis-basic-information, accessed on 17 April 2024 [Online].
[9] R. Amin, R. Yasmin, S. Ruhi, M. H. Rahman, and M. S. Reza, "Prediction of chronic liver disease patients using integrated projection based statistical feature extraction with machine learning algorithms," Informatics in Medicine Unlocked, vol. 36, p. 101155, 2023.
[10] K. Gupta, N. Jiwani, N. Afreen, and D. Divyarani, "Liver disease prediction using machine learning classification techniques," in 11th International Conference on Communication Systems and Network Technologies (CSNT), IEEE. Indore, India, 2022, pp. 221-226.
[11] R. A. Khan, Y. Luo, and F.-X. Wu, "Machine learning based liver disease diagnosis: A systematic review," Neurocomputing, vol. 468, pp. 492-509, 2022.
[12] M. Noureddin et al., "Machine learning liver histology scores correlate with portal hypertension assessments in nonalcoholic steatohepatitis cirrhosis," Alimentary Pharmacology & Therapeutics, vol. 57, no. 4, pp. 409-417, 2023.
[13] J. Tian, R. Cui, H. Song, Y. Zhao, and T. Zhou, "Prediction of acute kidney injury in patients with liver cirrhosis using machine learning models: Evidence from the MIMIC-III and MIMIC-IV," International Urology and Nephrology, vol. 56, no. 1, pp. 237-247, 2024.
[14] M. P. Behera, A. Sarangi, D. Mishra, and S. K. Sarangi, "A hybrid machine learning algorithm for heart and liver disease prediction using modified particle swarm optimization with support vector machine," Procedia Computer Science, vol. 218, pp. 818-827, 2023.
[15] G. Chongo and J. Soldera, "Use of machine learning models for the prognostication of liver transplantation: A systematic review," World Journal of Transplantation, vol. 14, no. 1, 2024.
[16] E. Dritsas and M. Trigka, "Supervised machine learning models for liver disease risk prediction," Computers, vol. 12, no. 1, p. 19, 2023.
[17] A. Q. Md, S. Kulkarni, C. J. Joshua, T. Vaichole, S. Mohan, and C. Iwendi, "Enhanced preprocessing approach using ensemble machine learning algorithms for detecting liver disease," Biomedicines, vol. 11, no. 2, p. 581, 2023.
[18] R. K. Sachdeva, P. Bathla, P. Rani, V. Solanki, and R. Ahuja, "A systematic method for diagnosis of hepatitis disease using machine learning," Innovations in Systems and Software Engineering, vol. 19, no. 1, pp. 71-80, 2023.
[19] UCI. Cirrhosis patient survival prediction. https://www.aum.edu.kw/english/innovation-amp-research/centers-amp-labs/high-performance-computing, accessed 15 April 2024 [Online].
[20] Y. I. Alzoubi, A. E. Topcu, and A. E. Erkaya, "Machine learning-based text classification comparison: Turkish language context," Applied Sciences, vol. 13, no. 16, p. 9428, 2023.
[21] F. Shaar, A. Yılmaz, A. E. Topcu, and Y. I. Alzoubi, "Remote sensing image segmentation for aircraft recognition using U-Net as deep learning architecture," Applied Sciences, vol. 14, no. 6, p. 2639, 2024.
[22] A. E. Topcu, Y. I. Alzoubi, and H. A. Karacabey, "Text analysis of smart cities: A big data-based model," International Journal of Intelligent Systems and Applications in Engineering, vol. 11, no. 4, pp. 724-733, 2023.
