doi:10.1111/jgh.16430
SYSTEMATIC REVIEW
Artificial intelligence for discrimination of Crohn’s disease and gastrointestinal tuberculosis: A systematic review
Anurag Sachan,* Rinkalben Kakadiya,* Shubhra Mishra,* Praveen Kumar-M,† Anuraag Jena,* Pankaj Gupta,‡ Shaji Sebastian,§ Parakkal Deepak¶ and Vishal Sharma*
Departments of *Gastroenterology, ‡Radiodiagnosis, Postgraduate Institute of Medical Education and Research, Chandigarh, †Nference Labs, Bengaluru,
India; §IBD Unit, Hull University Teaching Hospitals NHS Trust, Hull, UK; ¶Division of Gastroenterology, Washington University School of Medicine in St.
Louis, St. Louis, Missouri, USA
Key words
Intestinal tuberculosis, Gastrointestinal tuberculosis, Inflammatory bowel disease, Crohn’s disease, Artificial intelligence, Machine learning, Convolutional neural network.

Accepted for publication 13 November 2023.

Correspondence
Vishal Sharma, Department of Gastroenterology, Postgraduate Institute of Medical Education and Research, Chandigarh, India.
Email: [email protected]

Author contributions: Anurag Sachan was responsible for the study screening, extraction, RoB, draft, and revisions. Rinkalben Kakadiya contributed to the study screening, extraction, RoB, and revision. AJ helped in the screening and validation. Shubhra Mishra, Praveen Kumar-M, Pankaj Gupta, Shaji Sebastian, and Parakkal Deepak provided critical revisions and approval. Vishal Sharma was responsible for the study conception, search, screening, extraction, drafting, and revisions.

Abstract
Background and Aim: Discrimination of gastrointestinal tuberculosis (GITB) and Crohn’s disease (CD) is difficult. Use of artificial intelligence (AI)-based technologies may help in discriminating these two entities.
Methods: We conducted a systematic review on the use of AI for discrimination of GITB and CD. Electronic databases (PubMed and Embase) were searched on June 6, 2022, to identify relevant studies. We included any study reporting the use of clinical, endoscopic, and radiological information (textual or images) to discriminate GITB and CD using any AI technique. Quality of studies was assessed with the MI-CLAIM checklist.
Results: Out of 27 identified results, a total of 9 studies were included. All studies used retrospective databases. There were five studies of only endoscopy-based AI, one of radiology-based AI, and three of multiparameter-based AI. The AI models performed fairly well, with accuracy ranging from 69.6% to 100%. Text-based convolutional neural network was used in three studies and classification and regression tree analysis was used in two studies. Interestingly, irrespective of the AI method used, the performance in discriminating GITB from CD did not match the performance in discriminating either from other diseases (in studies where a third disease was also considered).
Conclusion: The use of AI in differentiating GITB and CD seems to have acceptable accuracy, but there were no direct comparisons with traditional multiparameter models. The use of multiple parameter-based AI models has the potential for further exploration in the search for an ideal tool that improves on the accuracy of traditional models.
Introduction
Artificial intelligence (AI) is a broad term encompassing several technologies that simulate human intelligence in computer-based processes, typically having the ability to learn and solve specific problems. These tools have already become relevant to many fields of human health and healthcare.1 In gastroenterology, techniques using AI and machine learning have already found use and application in detecting abnormalities during endoscopy, including polyp detection.2 Several other applications have been evaluated, including the detection of small bowel bleeding, identification of underlying diagnoses like inflammatory bowel disease, and identification of hepatobiliary diseases.3 The availability of high-resolution images and targeted histopathology by modern endoscopy devices provides a huge amount of processable data. Newer techniques and investigations are being developed rapidly for enhancing the diagnosis and management of gastrointestinal (GI) diseases. These data can be analyzed in real time and retrospectively using pre-designed algorithms and neural networks. This computer-aided diagnosis can help improve the diagnostic accuracy for both beginners and experienced endoscopists.

One important clinical dilemma is differentiating Crohn’s disease (CD) from gastrointestinal tuberculosis (GITB).4 GITB is a common clinical condition in many regions, including Asia and Africa. These regions are witnessing an increasing number of patients with inflammatory bowel diseases. CD is a very close mimic of GITB: the two granulomatous diseases have very similar clinical, radiological, endoscopic, and histopathological presentation.5 The microbiological positivity in GITB is low; therefore, the diagnosis is often established without microbiological confirmation.6 In many such cases, a therapeutic trial of anti-tubercular therapy (ATT) is initiated, and the final diagnosis is established based on the early mucosal response to therapy. Such a strategy has its risks: delay in the therapy of CD, potentially increasing stricturing complications, and risk of ATT-related complications in patients with CD.7 Therefore, it is of utmost importance to establish a precise diagnosis at the baseline. Many workers have used an approach of combining multiple parameters to have a secure diagnosis.8 Recent reports have also described using AI, including machine learning approaches, to differentiate between
422 Journal of Gastroenterology and Hepatology 39 (2024) 422–430

© 2023 Journal of Gastroenterology and Hepatology Foundation and John Wiley & Sons Australia, Ltd.
these two entities. We also explore the emerging techniques and principles applied to this area.

Therefore, we performed a systematic review to explore the role of AI in differentiating CD and GITB across all platforms of endoscopy, biochemical, histological, and radiological imaging.

Methods

The present systematic review was conducted in accordance with the updated guidance provided by the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) statement 2020.9

Data sources and search strategy. We searched electronic databases (PubMed and Embase) for eligible studies on June 6, 2022. The search strategy combined the terms relevant to Crohn’s disease (Crohn’s disease OR Granulomatous colitis), Gastrointestinal Tuberculosis (Gastrointestinal Tuberculosis OR Intestinal Tuberculosis), and Artificial intelligence (Artificial Intelligence OR Machine learning OR Neural network OR Deep Learning OR Random Forest OR Support Vector machine OR Decision tree OR Natural language processing OR K Nearest neighbour) using the operator ‘AND’. The detailed search strategy is depicted in Table S1. Additionally, we did a backward and forward snowball search to identify additional studies using the eligible studies’ references and citations.

The results were combined, and duplicates were removed. The records were then screened by title and abstract, and eligible studies finally underwent full-text screening to identify relevant reports. Two investigators working independently did each of the steps, and the differences were resolved by consensus (AS, RK, AJ, and VS).

Study selection and eligibility. The present systematic review included all reports on the use of AI (including machine learning and deep learning) to differentiate GITB and CD. The reports were included irrespective of the kind of data used regarding clinical or investigational work-up, images of endoscopy, radiology, or histology including textual information. The studies were included irrespective of the article type (if they provided original information), geographic location, language of publication, or type of publication (abstract or full paper).

We excluded those reports that did not directly address the question (diagnostic differentiation of GITB and CD), contained no original data (systematic review, reviews, and comments), or were not relevant to the study question or used AI-based techniques solely for the analysis of genomics or proteomics-based data. The steps, including screening (AS and VS), study selection (AS, RK, VS, and AJ), data extraction (AS, RK, and RJ), and quality assessment (AS and RK), were each performed by two investigators independently, and differences were resolved by consensus. No automation was used, and each step was performed manually.

Data collection and extraction. The data from relevant studies were extracted on a proforma by two investigators. We extracted information regarding study authors, geographic location, number of patients/items, the gold standard for diagnosis, and numbers used in training and testing subsets. We also extracted information from each study using the MI-CLAIM checklist modified for the present review.10 The items extracted included study questions, characteristics, and numbers in training and testing cohorts; whether the cohorts represented real-life settings or were derived from real-life conditions without significant exclusions; independence of training and testing cohorts; baseline comparison standard or gold standard for diagnosis; data types used and data transformations; details of input data; details of the model or code used to select the best performing model; metrics to assess the performance of the algorithm for diagnostic differentiation and its clinical utility; details regarding the model (technique of examination, relevance, and interpretability of the techniques); and finally details regarding the transparency of the model.

Data synthesis and interpretation. The present systematic review was planned to systematically summarize the available information on AI in the differentiation of GITB and CD. We did not plan any quantitative synthesis. The information regarding the performance of various AI models was summarized including the sensitivity, specificity, positive predictive value, negative predictive value (NPV), and area under receiver operator characteristic curve as available.

Quality assessment of studies. We used the MI-CLAIM at the stage of data extraction, which includes items regarding the appropriateness of model, training and testing cohorts, baseline gold standard, and appropriateness of data and model including the transparency of the model.10 Therefore, a separate risk of bias assessment with the standard tools, which are not relevant to the assessment of AI-based studies, was not performed.

Results

A total of 27 records were identified, and 6 duplicates were removed. Finally, 13 studies, including conference abstracts, were screened for full text. The earliest was published in 2012. We included nine studies while four studies were excluded for various reasons (see Fig. 1, PRISMA flow chart).8,11–22 Table 1 shows the details of the included studies while Table S2 includes details of the excluded studies with the reasons for exclusion.

Endoscopic findings. One of the earliest retrospective studies on this topic was by Park et al., which was based on the textual description of ileo-colonoscopic findings from patients with GITB and CD.11 Two hundred and five patients (106 GITB and 99 CD patients) were enrolled 2:1 into the training and validation cohort. On univariate analysis of the training cohort, scars, transverse shape ulcers, pseudodiverticulum, focal distributions, patulous ileocecal valve, and ascending colon involvement were in favor of tuberculosis. In contrast, longitudinal shape ulcer, skip lesion, aphthous lesion, cobblestone appearance, pseudopolyp, segmental or diffuse distributions, and involvement of terminal ileum, transverse colon, descending colon, sigmoid colon, or anorectum were more common in patients with CD than intestinal tuberculosis. They used classification and regression tree (CART) analysis to construct a model with three variables: anorectal involvement, presence of an aphthous lesion, and patulous ileocecal valve. The CART model could differentiate between CD and GITB in 56/68
Figure 1 PRISMA flow chart showing the process of study selection and inclusion.
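The record counts behind this flow chart, as stated in the Results, can be cross-checked with a short sketch. The two intermediate values (records left after de-duplication and records dropped before full-text review) are derived here by subtraction and are assumptions, since the text does not quote them; the variable names are illustrative only.

```python
# Counts quoted in the Results text.
identified = 27
duplicates_removed = 6
full_text_screened = 13
included = 9
excluded_at_full_text = 4

# Derived values (assumption: not quoted in the text, obtained by subtraction).
after_deduplication = identified - duplicates_removed
dropped_before_full_text = after_deduplication - full_text_screened

# The full-text stage must split exactly into included and excluded studies.
assert included + excluded_at_full_text == full_text_screened

print(after_deduplication, dropped_before_full_text)
```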
(82.4%) of patients. Another model by stepwise multiple logistic regression with four independent predictors (1.4 − [2.92 × aphthous lesion] − [1.89 × pseudopolyp] + [2.10 × ascending colon involvement] − [3.59 × anorectal involvement] < 0) could predict CD with an accuracy of 60/68 (88.2%).

In a large study, Tong et al. retrospectively evaluated, using electronic health records, the utility of two machine learning models, random forest (RF) and convolutional neural network (CNN), in differentiating ulcerative colitis (UC), CD, and GITB with a textual description of colonoscopy images.12 They included 875 CD and 396 GITB patients. The involvement of the ileocecal valve and undamaged ileum were the differentiating features for GITB. In contrast, lumen stenosis, mucosal congestion, loss of vascular texture, and ulcers covered with white exudates indicated CD. CNN performed better than RF in differentiating CD from GITB, with sensitivity/specificity of 90.0%/77.1% and 72.3%/77.2%, respectively. The performance of both RF and CNN was better in differentiating UC from CD and GITB, compared with the differentiation of TB and CD. The shortcoming was the use of text descriptions of images rather than the images themselves. The text was in the Chinese language, but the authors suggest that it could be easily applied to other languages.

Choi et al. tried to determine the accuracy of AI using retrospective images from a data archive as compared with beginners and experienced endoscopists in differentiating GITB and CD after colonoscopy.13 They used 2000 images of CD with 1053 in the active phase and 681 images of GITB with 126 in the active phase for training of the AI model. A test set comprising 100 images each of CD and TB was used to compare the sensitivity and specificity of the AI and endoscopists. A deep neural network/convolutional neural network (DNN/CNN) model was used for training the AI model using images and clinical and laboratory data. The histology findings and response to ATT were used as a reference standard for this training set. They found that the AI system had 100% sensitivity, specificity, and NPV in differentiating CD from GITB, but only 40% of novice endoscopists could differentiate CD from GITB.

Kim et al. trained an AI algorithm using retrospective data of 6617 colonoscopy images, including 2123 images of CD patients and 1642 of GITB patients, to differentiate between intestinal Behçet’s

Table 1 The included studies with details of patients, diagnostic criteria, and artificial intelligence used with performance of various models

1. Park et al., 2012 (ref. 11)
   Study details: Korea; Jan 2003–Jan 2010; retrospective review of ileocolonoscopic findings
   Population: CD: 99; ITB: 106
   Gold standard: Not provided
   Cohort: Training; validation
   Data/description: Ileocolonoscopic features
   AI method used: Classification and regression tree (CART) analysis
   Results: Tree model algorithm of four variables including the anorectal involvement, ulcer shape, aphthous lesion, and patulous IC valve
   Outcome measure: Diagnosis of ITB: sensitivity 80.0%, specificity 90.9%, PPV 90.3%, NPV 81.1%; accuracy of diagnosis = 85.3%

2. Tong Y et al., 2020 (ref. 12)
   Study details: China; January 2008 to November 2018
   Population: CD: 875; ITB: 396
   Gold standard: Clinical, microbiological and response based diagnosis
   Cohort: Training 70%; testing 30%
   Data/description: Textual description of colonoscopy image
   AI method used: Random forest (RF) and convolutional neural network (CNN)
   Results: The performance of these models was much poorer in discriminating ITB-CD as compared with ITB-UC or UC-CD
   Outcome measure: RF: sensitivity = 72.3%, specificity = 77.2%, AUC = 0.816; CNN: sensitivity = 90.0%, specificity = 72.1%, AUC = 0.910

3. Kim JM et al., 2021 (ref. 14)
   Study details: Republic of Korea; January 2000 and June 2019
   Population: CD: 211; ITB: 217; 2123 images of CD patients and 1642 of ITB patients
   Gold standard: Diagnostic clinical guideline
   Cohort: Training 80%; validation 10%; test sets 10%
   Data/description: Typical image; all image
   AI method used: Convolutional neural network, gradient-weighted class activation mapping
   Results: Included Behçet’s disease also. Analysis confirms that ITB-CD differentiation is most difficult
   Outcome measure: Typical image: accuracy = 75.66%, AUC = 0.78; all image: accuracy = 69.59%, AUC = 0.82

4. Zhu C et al., 2021 (ref. 16)
   Study details: China; retrospective; June 2015–May 2021
   Population: CD: 93; ITB: 67
   Gold standard: Clinical, histological and response to treatment
   Cohort: Training: 111; test: 49
   Data/description: Region of interest (ROI) for the lesions in the ileocecal region on computed tomography
   AI method used: Radiomics features filtered by the gradient boosting decision tree (GBDT)
   Results: Clinical radiomics model performed better than clinical and radiomics models
   Outcome measure: Validation cohort: AUROC 0.93, sensitivity 89.3%, specificity 90.5%, accuracy 89.8%

5. Choi Y-I et al., 2019 (ref. 13)
   Study details: South Korea; January 2010 to June 2018
   Population: CD (2000 images, 1053 active CD) and TB (681 images, 126 active)
   Gold standard: Clinical, laboratory, colonoscopy, histology and response to ATT
   Cohort: Training cohort of images from database; testing cohort from patients undergoing colonoscopy
   Data/description: Colonoscopy images; clinical and laboratory data
   AI method used: Deep neural network/convolutional neural network (DNN/CNN)
   Outcome measure: Sensitivity = 100%, specificity = 100%, PPV = 100%, NPV = 100%; compared with 40% sensitivity of novice endoscopists

6. Weng F et al., 2022 (ref. 18)
   Study details: China
   Population: Active CD = 160; active ITB = 40
   Gold standard: Microbiological/histological or clinical response to ATT
   Cohort: 60% training cohort; 40% testing cohort
   Data/description: Text description of demographic data, clinical manifestations, biochemical indicators, and endoscopic performance
   AI method used: Explainable machine learning. Three step process: (1) SMOTE (uses KNN) based processing for the imbalanced data; (2) tree based model (XGBoost) to detect CD from ITB; (3) interpretation and visualization of model using Shapley additive explanations
   Results: Compared with ANN, SVM and Bayes method
   Outcome measure: XGBoost best AI: AUC = 0.891, sensitivity = 81.3%, specificity = 96.9%, precision = 0.867

7. Lu Y et al., 2021 (ref. 17)
   Study details: China; retrospective study; May 1, 2013–April 30, 2019: training cohort; May 1, 2019–May 1, 2020: validation
   Population: CD = 84 (training) + 22 (validation); ITB = 84 (training) + 22 (validation)
   Gold standard: Clinical diagnosis
   Cohort: Training cohort: May 1, 2013–April 30, 2019 (84 each); validation cohort: May 1, 2019 and May 1, 2020 (22 each)
   Data/description: Multivariate: positive TSPOT, 4 or more segments involved, longitudinal ulcer, circular ulcer, and aphthous ulcer
   AI method used: Classification and regression tree (CART)
   Outcome measure: Accuracy = 88.64%, sensitivity = 90.91%, specificity = 86.36%, PPV = 86.96%, NPV = 90.48%

8. Lu K et al., 2022 (ref. 15)
   Study details: China; based on electronic health records; January 2008 to November 2018
   Population: CD (n = 875); ITB (n = 396)
   Gold standard: ITB: evidence of microbiological/histological/extra-abdominal tuberculosis/response to ATT. CD: clinical, endoscopic, and pathological findings with clinical response to CD treatment
   Cohort: 80% training cohort; 20% testing cohort
   Data/description: Chinese textual description of endoscopic data
   AI method used: Natural language processing (NLP) using text convolutional neural network (TextCNN)
   Outcome measure: TextCNN (distill): accuracy = 84%, F1 score = 0.88; TextCNN (robust): accuracy = 87%, F1 score = 0.77

9. Chen Y et al., 2022 (ref. 19)
   Study details: China; January 2012 and January 2021
   Population: 287 cases
   Gold standard: ITB: (I) caseating granuloma detected in the intestinal biopsy, surgical specimen, or mesenteric lymph nodes; and (II) complete clinical recovery accompanied by endoscopic mucosal healing with ATT. CD: (I) pathology of surgical specimen revealing the diagnosis of CD, with non-caseating granuloma detected in intestine or mesenteric lymph nodes; and (II) a favorable response to CD therapy
   Cohort: 70% training cohort; 30% testing cohort
   Data/description: 50 indicators including demographics, clinical features, laboratory, radiological and endoscopic features
   AI method used: Fusion correlation neural network
   Results: Better performance than random forest, SVM, extreme learning machine (ELM), and deep neural network (DNN)
   Outcome measure: Accuracy: 91.32%, sensitivity: 93.5%, specificity: 86%

AUROC, area under receiver operator characteristic curve; GBDT, gradient boosting decision tree; KNN, k-nearest neighbor; NPV, negative predictive value; PPV, positive predictive value; SMOTE, synthetic minority oversampling technique; SVM, support vector machine.
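The outcome measures listed in Table 1 all derive from a 2 × 2 confusion matrix. As a minimal sketch, the function below computes them from raw counts. The counts used in the example (20, 2, 19, 3) are not stated in the source: they are inferred here as a split of the 22 + 22 validation patients in the Lu Y et al. row that reproduces its reported percentages, and which disease is treated as the positive class is likewise an assumption. The function name is illustrative.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return standard diagnostic accuracy metrics from 2 x 2 counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all diseased
        "specificity": tn / (tn + fp),  # true negatives among all non-diseased
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Inferred counts (assumption): 22 patients per disease in the validation cohort.
metrics = diagnostic_metrics(tp=20, fn=2, tn=19, fp=3)
for name, value in metrics.items():
    print(f"{name}: {value * 100:.2f}%")
```

With these inferred counts the sketch gives 90.91% sensitivity, 86.36% specificity, 86.96% PPV, 90.48% NPV, and 88.64% accuracy, matching the values reported for that row.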
disease, CD, and GITB.14 Of the images, 80% were used for training of AI using CNN, while 10% each were used for validation and testing of the algorithm. They identified a subset of ‘typical’ images within the imaging cohort according to pre-determined criteria available in the literature for each disease. Typical images of CD patients were defined as having deep, stellate, longitudinal, linear, or serpiginous ulcers; multiple aphthous ulcers; cobblestone appearance of the mucosa; and skip mucosal lesions. Transverse ulcers, localized involvement, and a patulous ileocecal valve significantly favored colonoscopy images of ITB patients. They used area under ROC curve (AUC) to assess the prediction model. The AUC for differentiating CD and GITB was 0.7846 (95% confidence interval [CI] 0.7379–0.8313) for all images and 0.8211 for typical images. There was no significant difference in the predictive value of the algorithm between all images and typical images, showing the strength of the AI algorithm. Interestingly, the performance of this model was better in differentiating Behçet’s disease from CD and GITB than in differentiating CD from GITB.

Lu et al. created an AI tool using prospective patient data to achieve high accuracy. They developed a model using translated text-based description of electronic colonoscopy records. They analyzed a total of 1271 patients: CD (n = 875) and GITB (n = 396).15 The data were divided into an 80% training set and a 20% validation set. The model was trained using CNN. They described serial steps such as fine-tuning, distilling, interpretation, manual check, debiasing, and then finally deploying for enhancing natural language processing with this tool and used statistical analysis to test the predictive ability. They found that the TextCNN model had an accuracy of 83% in the original data, whereas accuracy was reduced to 70% for the validation set. They concluded that endoscopic data alone had far less applicability than combined clinical, laboratory, and endoscopic findings, and suggested using text-based CNN in further research.

Radiology. The study by Zhu et al. used images from computed tomography enterography (CTE) retrospectively extracted from data records. They used radiomics for data extraction from the CTE images from the region of interest (ROI) selected by an experienced radiologist. A total of 93 patients with CD and 67 patients with GITB were included, of whom 111 were included in the training cohort and the remaining in the test cohort.16 They used clinical, histological, and radiological response to specific treatment for a diagnosis of CD and GITB. Further, they developed a clinical model and a radiomics model differentiating between the two diseases. The clinical model included T-SPOT test results and small intestine segmental lesion (SISL) as independent predictors. They selected an ROI from each imaging study and identified nine features from 1595 features to create a radiomics model, which showed a significant difference in differentiating between CD and GITB in both training and validation cohorts. They also studied a combined clinical radiomics model with three parameters (T-SPOT result, SISL, and radiomics score) and constructed a nomogram using multivariate regression analysis. The AUC of the clinical radiomics score was 0.96 (0.93–0.99) in the training cohort, thus outperforming the clinical and radiomics models, but 0.93 in the validation cohort, with no statistically significant difference from them. Its major limitation was the selection of ROI in the region of
ileo-cecal and ascending colon by a radiologist and exclusion of non-significant lesions.

Multiple parameters for differentiation. Lu et al. created an AI model using retrospective medical records including clinical, laboratory, and colonoscopy findings.17 They included 84 patients each in training and 22 each in the validation cohort of GITB and CD patients. They developed a CART-based model with longitudinal ulcer, positive T-SPOT, aphthous ulcer, circular ulcer and ≥ 4 segments of colon involved after multivariate analysis. This model could differentiate between the two diseases with an overall accuracy of 88.64% (sensitivity of 90.91%, specificity of 86.36%).

Weng et al. prospectively collected clinical, laboratory, and endoscopic data. They finally extracted nine variables to be used for diagnosis using multiple statistical methods, and machine learning methods were compared with an XGBoost algorithm.18 A total of 160 patients with CD and 40 patients with TB were divided into training (60%) and validation (40%) cohorts with stratified random sampling. Here, the statistical methods of linear discriminant analysis and logistic regression, and the machine learning methods of artificial neural network, support vector machine with different kernel functions, Bayesian regression (Bayes), RF, and gradient boosting decision tree, were compared with XGBoost. The data in the two cohorts were balanced using an ML method, the synthetic minority oversampling technique (SMOTE). XGBoost outperformed all the statistical and ML methods it was compared to, with an AUC of 0.85. They used the Shapley additive explanations (SHAP) method to explain this particular ML technique, which can be implemented for individual patients.

Chen et al. in their retrospective dataset used clinical, laboratory, endoscopic, and radiological findings.19 They used 287 patients in a 7:3 ratio, that is, 200 patients as training cohort and 87 patients as validation cohort. They found that patients with perianal fistulas, skin and mucosal lesions, intestinal perforation, comb sign, and a cobblestone appearance were more likely to be diagnosed with CD, whereas ascites was suggestive of intestinal tuberculosis. They have shown how we can further improve the accuracy of deep neural networks in differentiating GITB and CD by using fusion correlation neural networks (FCNN), which fuse the attributes and the correlation information between training and sample set. Comparison between multiple ML methods was performed including linear regression, decision tree, RF, support vector machine (SVM), extreme learning machine (ELM), and DNN. The mean and maximum accuracy of FCNN in this study were 91.32% and 97.70%, respectively, which was significantly higher than other ML techniques.

Assessment of quality of studies. We evaluated each of the included studies using the MI-CLAIM checklist. The detailed assessment is depicted in Table S3. All the studies had a clear research question and the patients selected were real world patients. The training and validation set characteristics have not been given in two studies.12,15 Details of the model were not provided with abstracts, which was expected due to word limits.11,13,17 Most used structured data; however, free text was used in some studies.13,15 Five of the studies did not address the rationality of the primary statistical metric used.11,13,15,17,19 Only three studies have shared the code completely.15,16,19

Discussion

The present systematic review identified varied approaches, including use of textual or image-based data of clinical, radiological, and endoscopic information for differentiating GITB and CD (Fig. 2). Four studies utilized a tree-based machine learning approach where the sensitivity varied from 80% to 91% while the specificity varied from 89.4% to 96.9% (Table 1).11,12,15,18 CART and CNN were the two methods of ML used more frequently in the studies included in the review. The CNN model was further modified to increase the accuracy and adaptability of an AI tool. Two of the studies were from the same data set and author group, but a different form of TextCNN was used.12,15 The performance of various models varied, and some studies also reported comparisons between different models. However, given the heterogeneity of various studies with respect to included information and AI models, no summative analyses were performed. Many studies had an imbalance of the data in terms of GITB and CD as per the prevalence in the respective area. Some studies have tried to use statistical methods like SMOTE to deal with the imbalance between the groups.18 Many studies preferred the textual description of the endoscopy images over the actual images for its simplicity, but it does not decrease the workload on human resources as the operator would still have to feed the description manually for the AI to analyze rather than the AI picking the image up directly during visual examination. In studies that compared multiple models, two studies demonstrated a better performance of CNN as compared with the RF model.12,19 Chen et al. also demonstrated better performance of FCNN as compared with other models like DNN, SVM, and RF.19 Brief descriptions of various models used have been provided in Table S4.

One common thread across the studies that compared the ability to differentiate additional diseases (Behçet’s disease or UC) alongside GITB and CD was that the models had the most difficulty in differentiating GITB and CD.14 This correlates with the clinical knowledge of these two conditions being difficult to differentiate. Differentiation between GITB and CD is a major issue in the developing world, especially because IBD is rising. Unfortunately, most of the published literature on the use of AI in IBD has focused on diagnosis and prognostication but not on differentiation from the mimics.

Previously, multiple reports have used a combination of patient characteristics and investigation modalities for differentiation of GITB and CD. This approach has been used by Limsrivilai et al., utilizing data from their meta-analysis in a Bayesian model that achieved a high sensitivity and specificity of 90% and 92%, respectively.8 Most of the AI models could not achieve higher sensitivity than this Bayesian model. A few models based on CNN of image description and FCNN did achieve a higher sensitivity. A non-AI multiparameter model based on seven parameters selected by multivariate regression analysis, that is, age, diarrhea, ring shaped ulcer, longitudinal ulcer, sigmoid involvement, suspicious radiological pulmonary tuberculosis, and gender, had a high sensitivity approaching 98% with a specificity of 92%, which raises the question about the utility of AI currently in practice.23 Studies comparing ML with traditional statistical

models have found that AI outperforms the statistical methods in answering the complex question in hand.15,18 These two studies have independently claimed that novel ML methods such as XGBoost and FCNN outperform other ML and statistical methods.15,18 A study by Chen et al. suggests that the CNN AI model can be improved by fusion of the correlation and attributes between the training and validation cohorts.19 A head-to-head trial between these two models for superiority would be needed in further studies.
AI methods based entirely on a single modality, such as endoscopic findings, are emerging, and a few models as described above have reached an accuracy of 75–90% in this differentiation of GITB and CD.11,14,15,19 The role of radiomics, with clinical data without endoscopy findings in an AI model, is again a recent development, and a single study showed an AUC of 0.93 in its validation cohort, which was superior to that of other studies based only on endoscopic findings.9 The use of a single AI tool with limited parameters would definitely be cheaper (and perhaps quicker) as compared with multiparameter conventional tools, and this area needs further high-quality research for regions with limited financial resources.
The limitations of the present systematic review include heterogeneity in the included population, in the gold standard for the diagnosis of GITB, in the types of data used, in the AI techniques, and in the reporting methods. The gold standard in most studies is a composite of clinical, radiological, histological, treatment response, and endoscopic findings for training the AI method, which is not fool-proof. We used the MI-CLAIM checklist for study quality rather than the QUADAS tool, as the latter is still undergoing development for AI-related studies.24 On the other hand, we provide a summary of various approaches used and identify lacunae for further research in this enigmatic area. This is the first systematic review on this topic with coverage of AI with single and multiple parameters to differentiate GITB and CD. The results suggest that AI-based tools may differentiate between the two entities reasonably well and may be of value in reducing the load of investigations if a single-parameter-based model could be developed.
Future research should include robust prospective multicenter data inputs for the development of a prediction model for differentiation between GITB and CD. These studies should be planned for better and easy applicability in clinical management. The clinical accuracy and cost-effectiveness of AI-based models should be studied. The role of AI is ever-increasing in all fields, including medicine. The development of AI tools has been unprecedented over the last few years, and AI may have the answer to this challenging clinical scenario faced in developing countries. Understanding the present research and the lacunae (use of single parameter, lack of use of images, lack of combination of radiological and endoscopic images, and lack of use of histologic images) and taking forward the work putting link by link in this chain of
Figure 2 Sunburst plot of the studies included depicting the data and AI methods used in various studies.

research is the way forward. This systematic review aims to provide a link in the same pathway.
To conclude, AI is relevant in the difficult clinical scenario of differentiating GITB from CD. Various models have been tried and have shown acceptable accuracy, but they are currently not superior to the statistical models. ML methods using multiparametric data, that is, clinical data combined with endoscopic/radiological findings and laboratory investigations such as the T-SPOT test, have the best discriminatory ability. There is a lack of substantive data on the use of image-based models (radiological, endoscopic, and histological) for differentiating GITB and CD.

References
1 Davenport T, Kalakota R. The potential for artificial intelligence in healthcare. Future Healthc J 2019; 6: 94.
2 Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med 2019; 25: 44–56.
3 Le Berre C, Sandborn WJ, Aridhi S et al. Application of artificial intelligence to gastroenterology and hepatology. Gastroenterology 2020; 158: 76–94.
4 Jha DK, Pathiyil MM, Sharma V. Evidence-based approach to diagnosis and management of abdominal tuberculosis. Indian J Gastroenterol 2023; 42: 17–31.
5 Goyal P, Shah J, Gupta S, Gupta P, Sharma V. Imaging in discriminating intestinal tuberculosis and Crohn’s disease: past, present and the future. Expert Rev Gastroenterol Hepatol 2019; 13: 995–1007.
6 Sharma V, Soni H, Kumar-M P et al. Diagnostic accuracy of the Xpert MTB/RIF assay for abdominal tuberculosis: a systematic review and meta-analysis. Expert Rev Anti Infect Ther 2021; 19: 253–65.
7 Gupta A, Pratap Mouli V, Mohta S et al. Antitubercular therapy given to differentiate Crohn’s disease from intestinal tuberculosis predisposes to stricture formation. J Crohns Colitis 2020; 14: 1611–8. https://doi.org/10.1093/ecco-jcc/jjaa091
8 Limsrivilai J, Shreiner AB, Pongpaibul A et al. Meta-analytic Bayesian model for differentiating intestinal tuberculosis from Crohn’s disease. Am J Gastroenterol 2017; 112: 415.
9 Page MJ, McKenzie JE, Bossuyt PM et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. Syst Rev 2021; 10: 1–1.
10 Norgeot B, Quer G, Beaulieu-Jones BK et al. Minimum information about clinical artificial intelligence modeling: the MI-CLAIM checklist. Nat Med 2020; 26: 1320–4.
11 Park JJ, Park SJ, Hong SP, Kim TI, Kim WH, Cheon JH. Su1932 differential diagnosis between intestinal tuberculosis and Crohn’s disease by ileocolonoscopic findings. Gastroenterology 2012; 142: S-539.
12 Tong Y, Lu K, Yang Y et al. Can natural language processing help differentiate inflammatory intestinal diseases in China? Models applying random forest and convolutional neural network approaches. BMC Med Inform Decis Mak 2020; 20: 1–9.
13 Choi YI, Kim YJ, Park DK. Su1787–Development of a deep learning algorithm for differential diagnosis between Crohn’s disease and intestinal tuberculosis. Gastroenterology 2019; 156: S-611.
14 Kim JM, Kang JG, Kim S, Cheon JH. Deep-learning system for real-time differentiation between Crohn’s disease, intestinal Behçet’s disease, and intestinal tuberculosis. J Gastroenterol Hepatol 2021; 36: 2141–8.
15 Lu K, Tong Y, Yu S et al. Building a trustworthy AI differential diagnosis application for Crohn’s disease and intestinal tuberculosis. BMC Med Inform Decis Mak; 23: 160.
16 Zhu C, Yu Y, Wang S et al. A novel clinical radiomics nomogram to identify Crohn’s disease from intestinal tuberculosis. J Inflamm Res 2021; 14: 6511.
17 Lu Y, Chen Y, Peng X, Yao J, Zhong W, Li C, Zhi M. Development and validation of a new algorithm model for differential diagnosis between Crohn’s disease and intestinal tuberculosis: a combination of laboratory, imaging and endoscopic characteristics. BMC Gastroenterol 2021; 21: 1–9.
18 Weng F, Meng Y, Lu F et al. Differentiation of intestinal tuberculosis and Crohn’s disease through an explainable machine learning method. Sci Rep 2022; 12: 1–2.
19 Chen Y, Li Y, Wu M, Lu F, Hou M, Yin Y. Differentiating Crohn’s disease from intestinal tuberculosis using a fusion correlation neural network. Knowledge-Based Syst 2022; 244: 108570.
20 Zhang F, Xu C, Ning L et al. Exploration of serum proteomic profiling and diagnostic model that differentiate Crohn’s disease and intestinal tuberculosis. PLoS ONE 2016; 11: e0167109.
21 Rukmangadachar LA, Makharia GK, Mishra A et al. Proteome analysis of the macroscopically affected colonic mucosa of Crohn’s disease and intestinal tuberculosis. Sci Rep 2016; 6: 1.
22 Markandey M, Bajaj A, Vuyyuru SK et al. P709 Distinct pattern of gut microbial dysbiosis in Crohn’s disease and intestinal tuberculosis: a machine learning-based classification model. J Crohns Colitis 2022; 16: i606–7.
23 Jung Y, Hwangbo Y, Yoon SM et al. Predictive factors for differentiating between Crohn’s disease and intestinal tuberculosis in Koreans. Am J Gastroenterol 2016; 111: 1156–64. https://doi.org/10.1038/ajg.2016.212
24 Sounderajah V, Ashrafian H, Rose S et al. A quality assessment tool for artificial intelligence-centered diagnostic test accuracy studies: QUADAS-AI. Nat Med 2021; 27: 1663–5.

Supporting information
Additional supporting information may be found online in the Supporting Information section at the end of the article.

Table S1: Search strategy for the systematic review.
Table S2: The excluded studies with reasons for exclusion.
Table S3: Tables showing the critical appraisal of the included studies using MI-CLAIM checklist.
Table S4: Description of various AI related modalities used for discriminating GITB and CD.


422 Journal of Gastroenterology and Hepatology 39 (2024) 422–430
