Outlier Detection in
Gastrointestinal Tract Images
using Machine Learning Algorithms
By
Rimsha Qaisar
Master of Science in
Statistics
Department of Statistics
Islamabad, Pakistan
(2024)
DEDICATION
To my parents and sisters, whose constant love and encouragement have been my pil-
lars of strength. Their sacrifices and unwavering faith have guided me through every
challenge and made this achievement possible.
ACKNOWLEDGEMENTS
All praise and gratitude are due to Allah Almighty, the most compassionate and
benevolent, who created the universe and endowed me with countless blessings. I
am deeply thankful for the strength and perseverance granted to me, which enabled
me to complete this thesis. My heartfelt appreciation extends to my supervisor, Dr.
Tahir Mehmood, whose unwavering support, insightful guidance, and patience have
been instrumental throughout this journey. May Allah bestow upon him His abun-
dant blessings. This research would not have been possible without his expert advice
and encouragement, which have significantly deepened my understanding and respect
for this field. I am also really greatful to my GEC members, Dr. Firdos Khan and
Dr. Zamir Hussain, for their support and direction in finalizing this thesis. Lastly, I
am profoundly grateful to my family and friends for their continuous support and en-
couragement throughout my academic endeavors. May God bless them all for mak-
ing this challenging journey a success.
Contents
LIST OF TABLES VI
LIST OF ABBREVIATIONS IX
ABSTRACT X
1 Introduction 1
1.1 A Brief Overview of Machine Learning and Its Methodologies . . . . . 1
1.1.1 Methodologies of Machine Learning . . . . . . . . . . . . . . . . 1
1.2 Applications of Machine Learning in Image Recognition . . . . . . . . . 2
1.2.1 Medical Image Analysis . . . . . . . . . . . . . . . . . . . . . . 2
1.2.2 Deep Learning in Medical Image Analysis . . . . . . . . . . . . 2
1.2.2.1 Motivation for the Study . . . . . . . . . . . . . . . . . 3
1.2.3 Study Design and Methodology . . . . . . . . . . . . . . . . . . 4
1.3 Dataset Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3.1 Dataset Details . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3.1.1 Classes Included in the Dataset . . . . . . . . . . . . . 5
1.3.1.1.1 Anatomical Landmarks: . . . . . . . . . . . . 5
1.3.1.1.2 Pathological Findings: . . . . . . . . . . . . . 5
1.3.1.1.3 Polyp Removal: . . . . . . . . . . . . . . . . . 6
1.3.1.2 Dataset Categories . . . . . . . . . . . . . . . . . . . . 6
2 LITERATURE REVIEW 7
2.1 Previous Researches . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Research Gaps: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3 MATERIALS AND METHODS 19
3.1 Data Acquisition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.1.1 Kvasir Dataset Overview . . . . . . . . . . . . . . . . . . . . . . 19
3.1.1.1 Dataset Categories . . . . . . . . . . . . . . . . . . . . 20
3.1.1.1.1 Anatomical Landmarks . . . . . . . . . . . . 20
3.1.1.1.2 Pathological Findings . . . . . . . . . . . . . 21
3.1.1.1.3 Polyp Removal . . . . . . . . . . . . . . . . . 24
3.1.1.2 Endoscopic Procedures . . . . . . . . . . . . . . . . . . 25
3.1.1.2.1 Colonoscopy . . . . . . . . . . . . . . . . . . . 25
3.1.1.2.2 Gastroscopy . . . . . . . . . . . . . . . . . . . 26
3.1.1.3 Clinical Significance . . . . . . . . . . . . . . . . . . . 26
3.1.1.3.1 Impact on Research . . . . . . . . . . . . . . 26
3.2 Research Workflow: From Data Preprocessing to Model Training . . . . 26
3.2.1 Data Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.3 Classification Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.3.1 Convolutional Neural Network (CNN) Implementation . . . . . 28
3.3.1.1 Data Augmentation and Preprocessing . . . . . . . . . 28
3.3.1.2 Binary CNN Architecture . . . . . . . . . . . . . . . . 29
3.3.1.2.1 Model Compilation and Training for Binary
CNN: . . . . . . . . . . . . . . . . . . . . . . 30
3.3.1.3 Multi-Class CNN Architecture . . . . . . . . . . . . . 30
3.3.1.3.1 Model Compilation and Training for Multi-
Class CNN: . . . . . . . . . . . . . . . . . . . 31
3.3.2 DenseNet121 Architecture Overview . . . . . . . . . . . . . . . 31
3.3.2.1 Binary and Multi-Class DenseNet Architecture . . . . 32
3.3.2.2 Model Compilation and Training . . . . . . . . . . . . 33
3.3.3 Outlier Detection and Data Refinement . . . . . . . . . . . . . . 34
3.3.3.1 K-Means Clustering Algorithm . . . . . . . . . . . . . . 35
3.3.3.1.1 How K-Means Clustering Works: . . . . . . . 35
3.3.3.1.2 Objective of K-Means Clustering: . . . . . . . 36
3.3.3.1.3 Evaluating Clustering Quality Using Silhou-
ette Analysis: . . . . . . . . . . . . . . . . . . 36
3.3.3.1.3.1 Silhouette Score Calculation: . . . . . . 36
3.3.3.1.3.2 Average Silhouette Score: . . . . . . . . 37
3.3.3.1.4 Selecting the Optimal K: . . . . . . . . . . . . 37
3.3.3.1.5 Visual Demonstration: Clustering M1 and
M2 Using K-Means . . . . . . . . . . . . . . . 38
3.3.3.2 Outlier Detection Using K-Means Clustering . . . . . . 45
3.3.3.2.1 Distance Calculation: . . . . . . . . . . . . . . 45
3.3.3.2.2 Compute Quartiles: . . . . . . . . . . . . . . . 45
3.3.3.2.3 Calculate the IQR: . . . . . . . . . . . . . . . 45
3.3.3.2.4 Determine the Upper Bound: . . . . . . . . . 46
3.3.3.2.5 Identify Outliers: . . . . . . . . . . . . . . . . 46
3.3.3.3 Post-Outlier Detection Data Refinement and Classifi-
cation . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.3.3.3.1 Multi-Class Classification: . . . . . . . . . . . 46
3.3.3.3.2 Binary Classification: . . . . . . . . . . . . . . 47
List of Tables
4.1 Accuracy per class across different folds, including average accuracy
for each class and the total average accuracy. . . . . . . . . . . . . . . 49
4.2 Accuracy per class across different folds, including average accuracy
for each class and the total average accuracy. . . . . . . . . . . . . . . . 52
4.3 Accuracy per class across different folds, including average accuracy
for each class and the total average accuracy. . . . . . . . . . . . . . . . 71
4.4 Accuracy per class across different folds, including average accuracy
for each class and the total average accuracy. . . . . . . . . . . . . . . . 74
4.5 Accuracy of CNN on refined data across different runs. . . . . . . . . . 76
4.6 Model accuracy across different runs . . . . . . . . . . . . . . . . . . . 76
4.7 Sample Images of Outliers Across Categories . . . . . . . . . . . . . . . 80
List of Figures
3.1 Esophagitis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2 Polyp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.3 Ulcerative colitis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.4 dyed-and-lifted-polyp . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.5 dyed-resection-margin . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.6 Various types of endoscopy examinations . . . . . . . . . . . . . . . . 26
3.7 Convolutional Neural Network (CNN) . . . . . . . . . . . . . . . . . . . 31
3.8 Densely Connected Convolutional Network . . . . . . . . . . . . . . . . 34
3.9 x-y axis scatter plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.10 Randomly Selected Centroids for Initial Clustering with K=2 . . . . . . 39
3.11 Centroid Distance Calculation . . . . . . . . . . . . . . . . . . . . . . . 39
3.12 Cluster Assignment Visualization . . . . . . . . . . . . . . . . . . . . . 40
3.13 Centroid Recalculation Process . . . . . . . . . . . . . . . . . . . . . . 41
3.14 New Cluster Assignment . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.15 Centroid Initialization and Assignment . . . . . . . . . . . . . . . . . . 42
3.16 Updated Cluster Centroids . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.17 Cluster Reassignment Process . . . . . . . . . . . . . . . . . . . . . . . 43
3.18 Converged Clustering Result . . . . . . . . . . . . . . . . . . . . . . . . 44
3.19 Final Cluster Assignment . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.6 Distance Distribution Plot with Outlier Threshold . . . . . . . . . . . . 58
4.7 Distance Distribution Plot with Outlier Threshold . . . . . . . . . . . . 58
4.8 Silhouette Score for Class 2 . . . . . . . . . . . . . . . . . . . . . . . . 59
4.9 Silhouette Score for Class 3 . . . . . . . . . . . . . . . . . . . . . . . . 60
4.10 Silhouette Score for Class 4 . . . . . . . . . . . . . . . . . . . . . . . . 60
4.11 Silhouette Score for Class 5 . . . . . . . . . . . . . . . . . . . . . . . . 61
4.12 Silhouette Score for Class 6 . . . . . . . . . . . . . . . . . . . . . . . . 61
4.13 Silhouette Score for Class 7 . . . . . . . . . . . . . . . . . . . . . . . . 62
4.14 Silhouette Score for Class 8 . . . . . . . . . . . . . . . . . . . . . . . . 62
4.15 Distance Distribution Plot with Outlier Threshold for Class 2 . . . . . 63
4.16 Distance Distribution Plot with Outlier Threshold for Class 2 . . . . . 64
4.17 Distance Distribution Plot with Outlier Threshold for Class 3 . . . . . 64
4.18 Distance Distribution Plot with Outlier Threshold for Class 3 . . . . . 65
4.19 Distance Distribution Plot with Outlier Threshold for Class 4 . . . . . 65
4.20 Distance Distribution Plot with Outlier Threshold for Class 4 . . . . . 66
4.21 Distance Distribution Plot with Outlier Threshold for Class 5 . . . . . 66
4.22 Distance Distribution Plot with Outlier Threshold for Class 5 . . . . . 67
4.23 Distance Distribution Plot with Outlier Threshold for Class 6 . . . . . 67
4.24 Distance Distribution Plot with Outlier Threshold for Class 6 . . . . . 68
4.25 Distance Distribution Plot with Outlier Threshold for Class 7 . . . . . 68
4.26 Distance Distribution Plot with Outlier Threshold for Class 7 . . . . . 69
4.27 Distance Distribution Plot with Outlier Threshold for Class 8 . . . . . 69
4.28 Distance Distribution Plot with Outlier Threshold for Class 8 . . . . . 70
4.29 CNN Test Accuracies Per Class . . . . . . . . . . . . . . . . . . . . . . 73
4.30 DENSENET Test Accuracies Per Class . . . . . . . . . . . . . . . . . . 75
4.31 Training and validation accuracy and loss for Run 3. . . . . . . . . . . 76
4.32 Accuracy and loss during training and validation for Run 4. . . . . . . 77
LIST OF ABBREVIATIONS
AI Artificial Intelligence
ML Machine Learning
DL Deep Learning
GI Gastrointestinal Tract
CNN Convolutional Neural Network
DENSENET Densely Connected Convolutional Network
KMC K-Means Clustering
Abstract
This study explores the effectiveness of Convolutional Neural Networks (CNNs) and
DenseNet models for binary and multi-class classification tasks using the Kvasir dataset,
which consists of gastrointestinal tract images. The research aims to perform a com-
parative analysis of CNN and DenseNet models on both raw and refined datasets.
Initially, binary and multi-class classifications were conducted on the original dataset
to establish baseline performances for both models. Following this, K-Means Cluster-
ing was applied for outlier detection to refine the dataset by removing anomalies. The
refined dataset was then used to re-evaluate the models’ performances on the same
classification tasks.
The results demonstrated that CNN outperformed DenseNet in binary classifica-
tion tasks, achieving an accuracy of 91.1% on raw data and 91.0% on refined data,
while DenseNet’s accuracy dropped from 83.5% to 73.1% after refinement. For multi-
class classification, DenseNet performed better on the raw dataset with an accuracy
of 84%, compared to CNN’s 74%. However, after data refinement, DenseNet’s perfor-
mance decreased slightly to 79%, whereas CNN showed a minor improvement, achieving
78%. The data refinement process, involving outlier removal, did not significantly af-
fect the overall performance of the CNN model but had a notable negative impact on
DenseNet, especially in binary classification tasks. These findings suggest that while
CNN is robust across both binary and multi-class tasks with or without data refine-
ment, DenseNet is more sensitive to changes in the dataset. This study highlights the
importance of dataset refinement and its varying impact on different neural network
architectures in medical image classification.
Chapter 1
Introduction
1.2 Applications of Machine Learning in Image Recognition
In the domain of image recognition, machine learning is crucial for enabling computers
to accurately identify and interpret visual information, much like how humans perceive
images. Machine learning algorithms, at their core, learn from large sets of labeled
images, figuring out patterns and features that distinguish one object or scene from
another. As they are exposed to more examples, these algorithms become better at
recognizing specific visual concepts. Techniques like CNNs analyze images by detecting
features and patterns, improving with more training data. Once trained, CNNs can be
used to analyze new images, extracting important features and making predictions about
what objects are present. They excel at tasks such as object detection, scene
understanding, face recognition, and even diagnosing medical conditions from images.
The accurate diagnosis and classification of gastrointestinal (GI) tract diseases are
crucial for effective treatment and patient care. With the increasing availability of
medical imaging data, particularly from endoscopic procedures, there is a pressing
need for robust models that can efficiently classify these images. The Kvasir dataset,
which contains diverse images of the GI tract, provides an excellent foundation for
developing and testing such models.
In recent years, Convolutional Neural Networks (CNNs) and DenseNet architectures
have emerged as powerful tools for image classification. However, their performance
can vary significantly based on the quality and characteristics of the dataset used.
Traditional datasets often contain outliers that can negatively impact model training
and reduce overall accuracy. This issue is particularly relevant in medical imaging,
where outlier data can represent rare or unusual cases that might skew the model’s
performance.
The motivation for this study stems from the need to understand how these ad-
vanced neural network models perform on both raw and refined datasets. By applying
outlier detection using K-Means Clustering, this research aims to refine the dataset and
enhance the reliability of classification results. Evaluating the impact of data refine-
ment on the performance of CNN and DenseNet models will provide valuable insights
into the best practices for preparing medical imaging data, ultimately contributing to
more accurate diagnostic tools.
Furthermore, comparing the performance of CNN and DenseNet on both binary
and multi-class classification tasks, before and after outlier removal, will offer a com-
prehensive understanding of the strengths and limitations of these models in handling
various levels of data complexity. This study not only aims to improve model accuracy
but also seeks to optimize data preprocessing techniques, potentially setting a new
standard for medical image analysis.
Figure 1.2: Sample visuals from the Kvasir dataset featuring eight classes
This classification groups the dataset into three classes based on the types of gas-
trointestinal tract images: anatomical landmarks, pathological findings, and images
related to polyp removal techniques. Each class contains distinct categories that repre-
sent different aspects of GI conditions and procedures, offering a structured approach
for analysis and research in medical imaging.
Chapter 2
LITERATURE REVIEW
vs. diseased pictures). They achieved noteworthy accuracies of 99.7% and 96.4%, re-
spectively, outperforming existing models. Validation on datasets such as ETIS-Larib
Polyp DB (10,000 images) and KVASIR showed that FLATer outperformed CNNs and
ViTs in terms of accuracy, precision, and recall. Notably, FLATer achieved a remarkable
throughput of 16.4k images per second while maintaining strong performance even in
the absence of pre-training. Their ablation study emphasized the importance of the
spatial attention module and residual block for improving classification accuracy.
Although FLATer represents a major step forward in the classification of GIT diseases,
its usefulness could be improved by larger datasets and additional clinical validation,
according to the authors (1). Dheir et al. (2022) from Al-Azhar University employed
deep learning techniques to enhance the classification of gastrointestinal (GI) tract
anomalies using the Kvasir dataset. This dataset, consisting of 8,000 annotated images
across eight classes, includes anatomical landmarks (pylorus, z-line, cecum), patho-
logical findings (esophagitis, polyps, ulcerative colitis), and procedural images (dyed
lifted polyps, dyed resection margins). The researchers retrained and evaluated five
prominent neural network architectures—VGG16, ResNet, MobileNet, Inception-v3,
and Xception—achieving varying accuracies. VGG16 and Xception outperformed the
others with accuracies of 98.3% and demonstrated robust performance due to their pre-
training on ImageNet, effectively handling the classification challenges posed by medical
images. Their approach included robust image preprocessing, data augmentation, and
model evaluation using the F-score metric, highlighting VGG16 as the most effective
model for GI anomaly classification (2). Pogorelov et al. (2017) introduced the Kvasir
dataset to enhance computer-aided detection of gastrointestinal (GI) diseases through
medical imaging. This dataset, curated with input from medical experts, comprises
4,000 annotated images categorized into eight classes. The study conducted baseline
experiments employing global feature extraction (GF), convolutional neural networks
(CNN), and transfer learning (TFL) with models like Inception v3. Results showed that
combining six global features with the Logistic Model Tree (LMT) classifier achieved
the highest performance, yielding an F1 score of 0.747 and 80 frames per second (FPS).
While the 6-layer CNN outperformed the 3-layer CNN in detection performance, TFL
demonstrated superior accuracy among the deep learning methods tested, highlighting
its efficacy. The research underscores the dataset’s pivotal role in enabling reproducible
studies and innovation in medical multimedia applications, serving as a fundamental
resource for advancing GI tract diagnostics (3). Gao et al. (2020) developed a novel
approach for outlier detection in wireless capsule endoscopy (WCE) images. They intro-
duced the Semi-Supervised Deep Model (SODM) framework, leveraging a combination
of Convolutional Neural Networks (CNNs) and Long Short-Term Memory networks
(LSTMs). The model focused on identifying anomalous patterns in WCE images by
analyzing spatial-scale trends across sequential image regions. They utilized a dataset
comprising approximately 22,000 WCE images, categorizing images into normal and
abnormal classes representative of various small intestinal diseases. They compared
their approach with traditional outlier detection methods such as K-nearest neighbors
(KNN), Local Outlier Factor (LOF), and Support Vector Data Description (SVDD),
showing superior performance in terms of accuracy (93.27%) and sensitivity (86.17%).
Their findings underscored the efficacy of integrating deep learning architectures with
anomaly detection techniques for enhancing diagnostic capabilities in WCE imaging,
paving the way for future advancements in medical image analysis and disease detec-
tion (4). Iakovidis et al. (2018) developed a pioneering method using deep learning to
automatically detect and pinpoint gastrointestinal (GI) anomalies in endoscopic video
frames. They leveraged weakly annotated images for training, which proved cost-
effective compared to detailed pixel-level annotations. Their methodology comprised
three phases: first, a weakly supervised CNN classified video frames as normal or ab-
normal; second, a deep saliency detection algorithm identified key points in abnormal
images; and third, an iterative cluster unification technique localized GI anomalies us-
ing these points. Evaluating their approach on the MICCAI Gastroscopy Challenge
Dataset and the KID Dataset, they achieved impressive results with AUC scores sur-
passing 80%, peaking at 96% for anomaly detection in gastroscopy images and 88% for
wireless capsule endoscopy images. Their use of WCNN for classification, coupled with
DSD and ICU for localization, demonstrated significant efficacy in anomaly detection
and localization tasks, offering a robust framework for analyzing GI endoscopy videos
without necessitating intricate pixel-level annotations (5). Habte et al. (2019) aimed
to identify gastrointestinal (GI) disorders in endoscopic images by
applying deep learning algorithms. To train their models, they used the Kvasir dataset,
an openly accessible database of GI images divided into eight classes. Out
of the 4,000 images in the dataset, 2,000 were used, with 60% allocated to training,
30% to testing, and 10% to validation. The authors used
two convolutional neural network (CNN) architectures that were optimized using pre-
trained ImageNet weights: ResNet50 and DenseNet121. After evaluation
on a distinct set of 600 images, DenseNet121 and ResNet50 demonstrated accuracy
rates of 86.9% and 87.8%, respectively. Overall, the study indicated that the ResNet50
model performed marginally better than the DenseNet121 model, especially in
reliably differentiating between specific classes such as dyed lifted polyps and
dyed resection margins. Additionally, they noted many misclassifications, such as the
confusion of esophagitis with a normal z-line because of visual similarities between
the images (6). Ramzan et al. (2022) introduced the Graft-U-Net, a deep learning
model tailored for the segmentation of gastrointestinal tract polyps from colonoscopy
images. Utilizing datasets Kvasir-SEG and CVC-ClinicDB, comprising 1000 and 612
images respectively, they aimed to improve early detection of colorectal anomalies cru-
cial for cancer prevention. The Graft-U-Net, an enhanced version of UNet, integrates
three stages: preprocessing to enhance image contrast, an encoder for feature analy-
sis, and a decoder for feature synthesis. Evaluations demonstrated superior segmen-
tation performance with mean Dice coefficients of 96.61% (Kvasir-SEG) and 89.95%
(CVC-ClinicDB), surpassing previous models like UNet and ResUNet (7). Ismael et
al. (2020), from various Iraqi institutions, developed an automated system for classi-
fying white blood cells (WBCs) based on shape features extracted from medical im-
ages. They focused on five WBC types: Basophil, Eosinophil, Lymphocyte, Monocyte,
and Neutrophil, aiming to streamline diagnosis and reduce errors in medical settings.
The system comprises image preprocessing, segmentation, feature extraction (includ-
ing shape and texture), and classification using machine learning algorithms like K*
classifier, Additive Regression, Bagging, Input Mapped Classifier, and Decision Table.
Evaluations showed that the K* classifier performed best, achieving high accuracy in
classifying WBCs. The study used an undisclosed dataset of WBC images, emphasizing
robust feature selection and classifier performance to enhance diagnostic capabilities
(8).
The usefulness of Generative Adversarial Networks (GANs) for anomaly detection
(AD) in biomedical imaging was assessed by Esmaeili et al. (2023) using seven differ-
ent medical image datasets. The research emphasizes the difficulties in AD caused
by the absence of annotated data and examines state-of-the-art GAN-based tech-
niques from both a model-centric and data-centric standpoint, such as f-AnoGAN,
GANomaly, and Multi-KD. The datasets included blood cancer images, mammo-
grams, retinal OCT, CT, and MRI images, and varied in sample size, image dimensions,
and anomaly kinds. When performance measures including AUC, F1-Score, Precision,
Recall, and Specificity were applied, the findings were somewhat inconsistent (AUC:
0.475-0.991; Sensitivity: 0.17-0.98; Specificity: 0.14-0.97). The results showed that
none of the techniques worked consistently well, highlighting the need for more reliable
and broadly applicable models. In light of the need for more research to improve AD
models for biomedical imaging, the authors concluded that the current unsupervised
DL-based AD methods are unreliable for clinical applications and suggested taking
anomaly subtlety, spread, and tissue differences into consideration in future AD algo-
rithm designs (10).
The study by Cai et al. (2024) comprehensively evaluates anomaly detection (AD)
methods in medical images across seven datasets. They focused on developing a bench-
mark for fair evaluation, addressing the lack of comprehensive assessments in the field.
The datasets cover a variety of abnormality patterns and comprise images from chest
X-rays, brain MRIs, dermatoscopic images, retinal fundus images, and histology whole-
slide images. Twenty-seven AD techniques addressing both pixel-level segmentation
and image-level classification were tested. Notably, they compared reconstruction-
based and self-supervised learning (SSL) methods, highlighting the effectiveness of
SSL approaches like AnatPaste and NSA for generating realistic anomalies. Results
showed SSL methods with realistic synthetic data generally outperforming others, es-
pecially in two-stage paradigms. Additionally, they found that methods utilizing Ima-
geNet pre-trained weights, such as ResNet18, demonstrated strong performance across
datasets, indicating the potential of pre-trained models in medical AD (11). A group
of researchers proposed a technique for the early detection and classification of COVID-
19 using chest X-ray images in "COVID-19 Anomaly Detection and Classification
Method Based on Supervised Machine Learning of Chest X-ray Images" (2021). Rec-
ognizing the importance of prompt diagnosis in improving recovery prospects and
limiting viral spread, they applied an array of image processing method-
ologies. The procedure entails preprocessing (morphological operations, thresholding,
and noise reduction), segmenting and identifying the Region of Interest (ROI), and
extracting features using the Histogram of Oriented Gradients (HOG), Local Bi-
nary Pattern (LBP), and Haralick texture features. They used Support Vector Ma-
chine (SVM) and K-Nearest Neighbour (KNN) for classification, yielding six distinct
models: LBP-KNN, HOG-KNN, Haralick-KNN, LBP-SVM, HOG-SVM, and Haralick-
SVM. These models underwent 5-fold cross-validation testing on 5,000 images. With
an average accuracy of 98.66%, sensitivity of 97.76%, specificity of 100%, and precision
of 100%, the LBP-KNN model performed the best. This method eliminates the need
for manual feature extraction and selection by demonstrating a reliable and automated
end-to-end solution for the early detection and classification of COVID-19 (13).
Tiwari et al. (2024) discuss techniques for enhancing outlier detection and dimen-
sionality reduction in machine learning, particularly focusing on extreme value analysis.
They emphasize the detrimental impact outliers can have on machine learning mod-
els, leading to inaccurate results and prolonged training times. The paper explores
various methods for detecting different types of outliers in
high-dimensional datasets. Challenges such as computational complexity in feature
reduction for streaming data are highlighted, alongside the classification of outlier de-
tection techniques into predictive and direct methods. The authors advocate for the de-
velopment of efficient techniques capable of handling large volumes of data while main-
taining accuracy. They underscore the importance of these techniques in real-world
applications such as medical diagnosis and fraud detection, concluding with insights
into dimensionality reduction methods like t-SNE for preserving valuable data structure
(16). Sri Krishna et al. (2024) conducted a study on outlier detection in smart home
energy consumption data using various machine learning and statistical techniques on
the "Tracebase" dataset. They compared methods such as ARIMA, autoencoder, DB-
SCAN, isolation forest, k-means, HDBSCAN, SVM, LOF, LSTM, winsorization, IQR,
and Z-score. Their findings revealed that DBSCAN consistently outperformed other
techniques in accurately identifying outliers, especially those indicating nonlinearity
and unexpected load behavior. DBSCAN’s robust performance was highlighted by its
ability to effectively isolate significant deviations from the dataset’s norm. Conversely,
methods like Z-score, IQR, and winsorization struggled with the complexities of non-
linear data patterns (17). Omar Alghushairy et al. (2024) introduce an anomaly-based
network outlier detection system (NODS) designed to enhance cybersecurity by iden-
tifying abnormal network traffic. Utilizing the NSL-KDD and CICIDS2017 datasets,
the study employs various techniques including normalization, feature selection using
PCA and CFS, and hyperparameter tuning with a Genetic Algorithm (GA) to optimize
detection accuracy. Results demonstrate that their SVM-based approach significantly
reduces false alarms and detection times while improving classification accuracy com-
pared to traditional methods (18). Yalla et al. (2022) developed the OALOFS-MLC
model for financial crisis prediction (FCP) within a big data framework. Utilizing the
German Credit dataset (1,000 samples, 24 features) and the Australian Credit dataset
(690 instances, 14 features), they applied an oppositional ant lion optimizer-based fea-
ture selection and the DRVFLN classification model. Their approach outperformed
other methods such as PIOFS, ACOFS, GWOFS, and PSOFS across various metrics.
The OALOFS-MLC model achieved impressive results: on the German Credit dataset,
it attained an accuracy of 98.75%, sensitivity (sensy) of 97.36%, specificity (specy) of
97.06%, F-score of 97.31%, Matthews correlation coefficient (MCC) of 96.13%, and
kappa of 96.19%. Similarly, on the Australian Credit dataset, it reached an accuracy
of 98.50%, sensy of 97.41%, specy of 96.53%, F-score of 97.92%, MCC of 97.53%, and
kappa of 96.22%. These findings underscore the OALOFS-MLC model’s effectiveness
in enhancing FCP accuracy, suggesting its potential utility in economic forecasting
and risk management applications (19). In their 2024 survey, Rahimighazvini et al.
review methods for anomaly detection and diagnosis in power electronics using ma-
chine learning and deep learning techniques. The study highlights the crucial role of
power electronics in applications like renewable energy and electric vehicles, noting the
systems’ susceptibility to cyber and physical anomalies. They categorize anomalies
into point, contextual, and collective types and discuss detection methods including
supervised, unsupervised, and statistical approaches. Supervised methods like Random
Forest, Extreme Gradient Boosting, Logistic Regression, and K-Nearest Neighbors are
used for classifying anomalies, while deep learning methods such as autoencoders and
LSTM networks are highlighted for their pattern recognition capabilities. Unsuper-
vised techniques, including K-means, DBSCAN, and OPTICS, and statistical methods
like the Mahalanobis Distance and Local Outlier Factor are also detailed for their ef-
ficacy in identifying outliers. The survey emphasizes the importance of distinguishing
between cyber-attacks and physical faults to ensure system reliability and security, un-
derscoring the need for advanced detection and diagnosis systems to handle the growing
complexity of power electronics (20).
In the paper "Blood Donation Prediction using Artificial Neural Network" by Eman
Alajrami et al. (2019), the researchers explore the efficacy of the JustNN environment
in predicting blood donation needs. This study addresses the increasing demand for
blood due to surgeries, accidents, and diseases. By developing an Artificial Neural Net-
work model, the researchers aimed to determine if the JustNN tool could significantly
enhance prediction performance. Accurate forecasting of blood donor numbers is cru-
cial for medical professionals to plan effectively and attract enough volunteers to meet
the rising demand. The study concluded that the ANN model using the JustNN tool
achieved a test set performance accuracy of 99.31%, which is superior to other studies’
results. This indicates that JustNN is a highly effective tool for blood donation pre-
diction (21). In the paper "Age and Gender Prediction and Validation Through Single
User Images Using CNN" by Abdullah M. Abu Nada et al. (2020), published in the
International Journal of Academic Engineering Research (IJAER), the authors pro-
pose a novel method to validate user gender and age from photos using Convolutional
Neural Networks (CNN). The study, utilizing a dataset of 430 University of Palestine
students’ photos, achieved a gender prediction accuracy of 82% overall (89% for males
and 74% for females), but struggled with age prediction, which had an accuracy of 57%.
Challenges included less distinct female facial features and hijabs obscuring features,
as well as natural variations in aging. The research highlights the need for improved
models and diverse datasets to enhance demographic predictions from images (22). In
the work by Mukrimah Nawir et al. (2018), researchers proposed an efficient approach
to network anomaly detection using machine learning algorithms. They addressed
the challenge of limited labeled network datasets by focusing on the UNSW-NB15
dataset, modifying it to enhance experimental reliability by excluding certain irrele-
vant features. The study utilized three Bayesian algorithms—Average One Dependence
Estimator (AODE), Bayesian Network (BN), and Naive Bayes (NB)—implemented
through WEKA tools. Through rigorous experimentation, they demonstrated that
AODE performed exceptionally well with an accuracy of 94.37%. Their findings un-
derscored AODE’s efficiency and effectiveness in handling network anomaly detection,
particularly on the UNSW-NB15 dataset, making it a robust choice compared to BN
and NB algorithms (23). Yadav et al. investigated deep learning techniques for pneu-
monia classification from chest X-ray images in their 2019 study. Three methods were
assessed: a capsule network that was trained from scratch, transfer learning on VGG16
and InceptionV3 CNNs, and a linear SVM classifier with orientation-free and local ro-
tation variables. Using a dataset of 5,232 training and 624 testing images,
they found that data augmentation enhanced performance in all cases. The best
results were obtained with transfer learning on VGG16, which achieved an accuracy
of 90.2%. The study stressed how crucial it is to tune specific parameters and bal-
ance network complexity with dataset quantity. The effectiveness of capsule networks
was lower than that of VGG16. The generalisation of the approach was confirmed by
validating these results on an OCT dataset (24).
Faes et al. (2019) investigated the feasibility of automated deep learning tools for
medical image classification, targeted at healthcare professionals without coding ex-
pertise. They utilized five public datasets: MESSIDOR (retinal fundus), Guangzhou
Medical University and Shiley Eye Institute (OCT), HAM10000 (skin lesions), and NIH
(pediatric and adult chest X-rays). Employing Google Cloud AutoML for neural archi-
tecture search, they developed models achieving high diagnostic properties (sensitivity
73.3% to 97.0%; specificity 67% to 100%) and discriminative performance (AUPRC
0.57 to 1.00) in internal validations. External validation on the Edinburgh Dermofit
Library dataset showed lower performance (AUPRC 0.47, sensitivity 49%, positive pre-
dictive value 52%). The study highlighted the potential of automated tools in medical
image analysis while noting limitations in dataset quality and model complexity for
advanced tasks (25). Amiri et al. (2023) conducted a systematic literature review
on Deep Learning (DL) techniques for pattern recognition across cyber-physical-social
systems. They analyzed 60 articles focusing on DL methods like CNNs, RNNs, GANs,
and more, categorizing them by application and performance metrics. Using Python
for implementation, they evaluated models based on accuracy, adaptability, and se-
curity across various datasets including medical imaging and visual recognition tasks.
The review highlighted advancements and limitations in current DL approaches, em-
phasizing the need for improved security measures and adaptive capabilities in future
research to enhance pattern recognition accuracy and applicability (26). In their 2020
study, Abu-Saqer and Al-Shawwa developed a Grapefruit classification system using
deep learning techniques. They utilized a dataset from Kaggle comprising 1,312 images
of Pink and White Grapefruit, with 70% for training and 30% for validation, achieving
100% accuracy on both sets. Implementing Convolutional Neural Networks (CNNs)
with four layers and a dropout of 0.2, their model successfully classified Grapefruit
types based on image features extracted through CNNs. The system aims to automate
classification tasks in various applications, such as restaurants and factories, demon-
strating robust performance in distinguishing between different Grapefruit varieties
(28). Huang et al. (2022) proposed a novel hybrid neural network for medical image
classification using a combination of PCANet and DenseNet architectures. They aimed
to enhance classification accuracy despite limited training data. Utilizing datasets in-
cluding DDSM, osteosarcoma histology images, and MIAS, their approach involved
a modified PCANet for initial feature extraction followed by a simplified DenseNet
for precise classification. Achieving superior results compared to popular models like
VGG, ResNet, and DenseNet, their HybridNet demonstrated an accuracy of 83%, sen-
sitivity of 89.3%, and specificity of 78.7%. The hybrid approach effectively addressed
overfitting issues while outperforming other networks in classifying breast tissue densi-
ties, showing promise for future medical imaging applications (29). A DenseNet-based
model for metastatic cancer image categorization was presented by Zhong et al. in
2020. They used a modified version of the PatchCamelyon (PCam) dataset,
which was designed specifically for binary image classification of metastasis detection.
There were 220,025 samples in the collection, of which 89,117 were positive (malig-
nant) and 130,908 were negative (non-cancerous). They used DenseNet201 and
its enhanced variant, known as DenseNet201 TTA. Their tests showed that in terms
of accuracy and AUC-ROC score, DenseNet201 models performed better than ResNet34
and VGG19. With an accuracy of 0.989 and the highest AUC-ROC score of 0.971,
DenseNet201 (TTA) outperformed the other models by a wide margin. The study
showcased the robust performance of DenseNet designs and their promise for future
improvements in medical diagnostics, emphasizing their usefulness in enhancing the
accuracy of cancer image categorization (30).
In 2020, Poornima et al. proposed an online anomaly detection system for Wireless
Sensor Networks (WSNs) using the OLWPR algorithm. Their study focused on enhanc-
ing WSN security by identifying anomalous sensor data. They utilized a dataset from
IBRL comprising 40,000 sensor readings, including temperature, humidity, and light
measurements. After preprocessing to handle missing and duplicate records, anoma-
lies were injected for testing. OLWPR, aided by PCA for dimensionality reduction,
achieved an 86% detection rate with a low 16% error rate. The advantage of OL-
WPR was shown in comparisons with Gaussian, SMO, and Linear regression in terms
of RMSE, percentage error, accuracy, F1-score, sensitivity, and specificity. Accord-
ing to the study’s findings, OLWPR performed better in real-time anomaly detection
in WSNs than conventional techniques like Logistic Regression, Decision Tree, Ran-
dom Forest, Adaboost, SVM, and ANN (31). Pratta et al. (2016) presented a study at
MIUA 2016 detailing a Convolutional Neural Network (CNN) approach for diagnos-
ing diabetic retinopathy (DR) using fundus images. They utilized a Kaggle dataset
of 80,000 images, training a CNN with data augmentation to classify DR severity
levels. Achieving 75% accuracy and 95% specificity on 5,000 validation images, the
CNN demonstrated robust performance in automated DR diagnosis, particularly in
distinguishing proliferative cases and absence of DR. However, sensitivity for mild and
moderate DR cases was lower, indicating challenges in detecting subtle features. Fu-
ture plans include refining the CNN with improved datasets and comparing it with
other classification methods like SVM (32). Rao et al. (2011) introduced a method
using K-means clustering and ID3 decision trees for anomaly detection in computer
networks. They applied these techniques to classify normal and anomalous activities,
focusing on both supervised and unsupervised learning approaches. Using datasets
like iris.arff and weather.nominal, they achieved clustering with 67% and 33% distribu-
tion among clusters, and utilized ID3 decision trees to classify weather data effectively.
The combined K-means and ID3 approach aimed to enhance classification performance
by refining decision boundaries within clusters (33). In their 2023 study, Vania et al.
investigated the use of deep learning (DL) and machine learning (ML) methods for
identifying lesions in the upper gastrointestinal (GI) tract. They reviewed 65 studies
using datasets like KVASIR, MEDICO 2018, BIOMEDIA 2019, and others, focusing on
ML models like SVM which achieved accuracies, sensitivities, and specificities ranging
from 0.87 to 0.98, 0.85 to 0.98, and 0.93 to 0.98, respectively. DL models, particularly
CNN-based supervised learning models like SSD and Mask RCNN, were also promi-
nent in GI image analysis. RGB imaging proved crucial for detecting features like
bleeding. Challenges included dataset variability, suggesting a need for standardized
databases to train robust AI systems for GI endoscopy (34). Ramzan et al. (2021) devel-
oped a computer-aided diagnostic system (CADx) for classifying gastrointestinal (GI)
tract infections using deep learning techniques. They utilized color image datasets like
KVASIR, NERTHUS, and stomach ULCER, evaluating models such as InceptionNet,
ResNet50, and VGG-16. Preprocessing in LAB color space and feature fusion with
local binary patterns (LBP) enhanced disease prediction accuracy. Feature selection
methods like PCA and mRMR were employed to optimize characteristics for various
classifiers. The subspace discriminant classifier achieved notable results, with 95.02%
accuracy on KVASIR, outperforming other classifiers. On NERTHUS, the best accu-
racy was 99.9% with cubic SVM, and on ULCER, cubic SVM reached 100% accuracy,
indicating robust performance across datasets (35). The article by de Lange et al.
in 2018 focuses on developing machine learning algorithms to enhance gastrointesti-
nal (GI) endoscopy performance. They address the variability in diagnostic accuracy
among endoscopists, which affects detection rates of mucosal lesions, leading to chal-
lenges like the 20% average polyp miss-rate in colonoscopies. The research employs a
range of machine learning methodologies, encompassing both conventional and deep
learning approaches such as generative adversarial networks (GANs) and convolutional
neural networks (CNNs). They emphasize the importance of dataset quality and size,
recommending at least 1000 images per class for robust deep learning applications.
Results show promising accuracies above 90%, with CNNs often outperforming sim-
pler methods. They advocate for standardized metrics and open datasets to facilitate
reproducibility and comparisons in AI-assisted GI endoscopy systems (36). Using the
Kvasir dataset, Cogan et al.'s (2019) study focuses on applying deep learning to ac-
curately detect illnesses and anatomical landmarks in gastrointestinal tract images.
They present the MAPGI framework for image preprocessing in order to handle is-
sues such as sparse annotations and image variability. With accuracies of 98.45%,
98.48%, and 97.35%, respectively, three deep neural network architectures—Inception-
v4, Inception-ResNet-v2, and NASNet—are trained and compared. With excellent
recall (93.9%), specificity (99.1%), F1 score (93.8%), precision (93.8%), and Matthews
correlation coefficient (MCC) of 92.9%, Inception-v4 performs better than other mod-
els. The authors point out that smaller models—such as Inception-v4—perform better
on this dataset than larger models like NASNet because of their computational effi-
ciency and lower risk of overfitting (37). The study by Song et al. (2021) tackles the
challenge of localizing a colonoscope in the GI tract using monocular images. They in-
novate by blending deep learning with traditional geometry-based methods to improve
localization accuracy despite limited labeled data. Using a Siamese architecture, their
DL models classify images into anatomical zones based on expert-segmented GI tract
zones, aiding in initial pose estimation. Validation on synthetic and in-vivo datasets
shows high zone classification accuracies of up to 98.6% for synthetic data and around
97-98% for in-vivo data. Pose accuracy results are impressive, with deviations as small
as 1.41 degrees and 0.05 units. Comparative analyses indicate their hybrid approach
outperforms pure DL or geometry-based methods, especially when trained on synthetic
data and tested on in-vivo data, achieving superior zone classification accuracies, up to
79%. Future plans include incorporating depth information and enhancing the realism
of synthetic datasets through adversarial learning (38). Future objectives include us-
ing adversarial learning to improve the realism of synthetic datasets and adding depth
information (38). Gautam Buddha University’s Pachauri et al. (2015) discuss fault
detection in medical wireless sensor networks (WSNs). In order to improve anomaly
detection capabilities, they apply machine learning methods, concentrating on cat-
egorising and identifying anomalous sensor readings from the MIMIC dataset (121
records), such as heart rate, SpO2, PULSE, body temperature, and respiration rate.
For classification, algorithms such as J48, Random Forests, and k-Nearest Neighbours
are used; Random Forests perform better in ROC analysis and mean absolute error
comparison. Additive Regression with k-NN produces the best correlation coefficient
and lowest error for regression tasks. Overall, their methodology highlights the poten-
tial of machine learning in healthcare applications by showing promise in enhancing
fault detection efficiency in medical WSNs (39). In their 2021 study, Reddy et al. from
various institutions in India explore machine learning techniques for outlier detection
in medical datasets. They propose a novel algorithm combining supervised and un-
supervised learning to identify outliers based on attributes like heart rate and oxygen
saturation from datasets sourced from Kaggle. Using their approach, they compare
various methods and find their machine learning model achieves superior accuracy in
outlier detection, particularly on real-time medical data. Their experiments highlight
the effectiveness of this approach in enhancing anomaly detection efficiency, suggest-
ing its potential for reducing healthcare industry workloads and improving diagnostic
accuracy (40).
In this study, we provide an overview of the dataset used, emphasizing its pivotal role
in our methodology aimed at refining outlier detection in gastrointestinal tract images.
Our approach involves initial classification using Convolutional Neural Networks (CNN)
and DenseNet models on the dataset, followed by outlier detection using k-means
clustering. We then explore how refining the dataset by removing outliers impacts
classification accuracy using these deep learning models.
The Cancer Registry of Norway (CRN) and Vestre Viken medical specialists have
painstakingly annotated every image in the dataset. The CRN, affiliated with the
South-Eastern Norway Regional Health Authority and independently operated under
Oslo University Hospital Trust, conducts cancer research and manages national cancer
screening programs aimed at early detection and prevention of cancer-related deaths.
The Kvasir dataset focuses on images and annotations related to the gastrointestinal
(GI) tract, crucial for understanding and diagnosing diseases such as the three most
common cancers worldwide, which affect this system.
• Z-Line
• The Z-line marks the border where the esophagus meets the stomach. When
viewed through an endoscope, it appears as a clear line where the white esophageal
tissue meets the reddish stomach lining.
• Recognizing the Z-line is important to assess if any disease is present, such as
signs of gastro-esophageal reflux.
• It’s also helpful for describing any problems in the esophagus.
• Pylorus
• The pylorus is the area around the opening from the stomach into the small
intestine (duodenum). This opening has circular muscles that regulate the flow
of food from the stomach.
• Identifying the pylorus is crucial for navigating the endoscope into the duodenum,
which can be challenging during gastroscopy.
• In an endoscopic image from inside the stomach, the pylorus appears as a smooth,
round opening surrounded by uniform pink stomach tissue.
• Cecum
• The cecum is the first part of the large intestine, located at the beginning of the
colon.
• Reaching the cecum confirms a thorough colonoscopy, and its successful exami-
nation is an important quality indicator.
• One distinctive feature of the cecum is the appendiceal orifice, seen as a crescent-
shaped slit.
• Documentation of the cecum, including its appearance and location via photos
or notes in reports, is essential for verifying the completeness of the colonoscopy.
• In the endoscopic view, the green picture-in-picture display shows the scope’s
position to confirm the cecum’s location.
• Esophagitis
• Breaks in the esophageal lining around the Z-line indicate the presence of esophagi-
tis, an inflammation of the esophagus. As an illustration, image 3.1 shows red
streaks on the white esophageal lining. The length of these breaks and
the proportion of the circumference affected indicate the degree of inflammation.
• The most prevalent causes of this illness include hernias, vomiting, and acid
reflux—the backflow of stomach acid into the esophagus.
• Polyps
• Polyps are abnormal growths in the bowel lining that can vary in shape (flat,
raised, or on a stalk). They can be distinguished from normal tissue by their color
and surface texture. While the majority of polyps are benign, some may
eventually become malignant.
• The green boxes in the image 3.2 illustrate how endoscope positions are tracked
during live procedures, aiding in locating and assessing polyps.
• Ulcerative Colitis
• Ulcerative colitis is a chronic inflammatory disease that affects the large intestine,
causing symptoms like bleeding, swelling, and ulceration of the intestinal lining.
• Diagnosis is primarily based on findings from colonoscopy. The severity of the dis-
ease varies, with mild cases showing swollen and reddened mucosa, and moderate
cases displaying prominent ulcerations.
• Image 3.3 depicts ulcerative colitis, where the mucosa is covered in a white layer
(fibrin) over the ulcers.
• Using automated computer systems for assessing disease severity could improve
accuracy in grading and managing this condition.
• Dyed and Lifted Polyps
• A polyp that has been lifted by injection of indigo carmine and saline is shown in
Figure 3.4. The polyp's pale blue borders stand out sharply against the deeper hue
of the surrounding tissue.
• Further useful information for automated reporting might include the success
of lifting and any areas that remain unliftable, which could indicate potential
malignancy.
• Dyed Resection Margins
• It is crucial to assess the margins of the resected tissue to confirm whether the
entire polyp has been completely removed.
• Any remaining polyp tissue could lead to further growth and, in the worst case,
develop into cancer.
• Figure 3.5 illustrates the site after removing a polyp. Automatically recognizing
the location of polyp removals is valuable for automated reporting systems and
for assessing how effectively the polyp has been removed.
• Image Resizing: Within each category, the function loops through all the image
files. For each image:
o The image is read using OpenCV’s cv2.imread function.
o A check is performed to ensure the image was read correctly.
o The image is resized to the target size (64x64 pixels) using cv2.resize.
• Saving Resized Images: The resized image is saved to the corresponding cat-
egory subdirectory in the output path using cv2.imwrite.
By implementing this method, I successfully resized all images in the Kvasir
dataset to 64x64 pixels, creating a new dataset that was uniformly sized and
ready for further preprocessing and model training. This step was essential for
ensuring that the images fed into the convolutional neural networks (CNN and
DenseNet) were of a consistent size, thereby improving the model training process
and overall performance.
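As a minimal sketch of this resizing step, the loop described above might look as follows; the directory names and function name are illustrative assumptions, not the exact script used in the thesis:

```python
import os
import cv2  # OpenCV, as referenced above


def resize_dataset(input_dir, output_dir, size=(64, 64)):
    """Resize every image in each category subdirectory to a uniform size."""
    for category in os.listdir(input_dir):
        in_cat = os.path.join(input_dir, category)
        out_cat = os.path.join(output_dir, category)
        os.makedirs(out_cat, exist_ok=True)
        for fname in os.listdir(in_cat):
            img = cv2.imread(os.path.join(in_cat, fname))
            if img is None:  # skip files that could not be read correctly
                continue
            resized = cv2.resize(img, size)  # resize to the 64x64 target
            cv2.imwrite(os.path.join(out_cat, fname), resized)
```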
To enhance the training process and improve model generalization, I applied data
augmentation techniques using the ImageDataGenerator class from TensorFlow. This
included rescaling pixel values, applying shear transformations, zooming, and horizon-
tal flipping of the images. This step ensures that the model is exposed to a variety of
image transformations, helping it generalize better to unseen data.
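For illustration, this augmentation setup can be sketched as below; the specific parameter values (shear and zoom ranges) are assumptions, since the text does not state them:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Illustrative augmentation pipeline; exact parameter values are assumed
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,     # rescale pixel values to [0, 1]
    shear_range=0.2,       # random shear transformations
    zoom_range=0.2,        # random zoom
    horizontal_flip=True,  # random horizontal flipping
)
```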
3.3.1.2 Binary CNN Architecture
• Input Layer: The input layer is designed to accept images of dimensions 64x64
pixels with 3 color channels (RGB).
• Convolutional Layer: The initial layer applies convolutional filters to the in-
put image. Each filter is of size 3x3, and with 32 filters, the operation can be
represented as:
$$\mathrm{Output}_{i,j} = \sigma\left(\sum_{m=0}^{2}\sum_{n=0}^{2} \mathrm{Input}_{i+m,\,j+n} \times \mathrm{Filter}_{m,n} + \mathrm{Bias}\right)$$
• Flattening Layer: After the pooling layer, the feature maps are flattened into
a single vector. This step prepares the data for the fully connected layers by
converting the 2D feature maps into a 1D feature vector.
• Fully Connected Layer: Two dense layers with ReLU activation. The first
layer with 128 neurons learns higher-level representations from the flattened input
data:
$$\mathrm{Output} = \sigma\left(\sum_{i=1}^{n} \mathrm{Input}_i \times \mathrm{Weight}_i + \mathrm{Bias}\right)$$
• Output Layer: The final output layer uses a sigmoid activation function, pre-
dicting probabilities for binary classification (positive class = 1).
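A minimal Keras sketch of the binary architecture described above is given below; the 2x2 pooling size and the single convolution-pooling block are assumptions where the text does not pin them down:

```python
from tensorflow.keras import layers, models

# Sketch of the binary CNN described above (assumed details hedged in comments)
binary_cnn = models.Sequential([
    layers.Input(shape=(64, 64, 3)),            # 64x64 RGB input images
    layers.Conv2D(32, (3, 3), activation="relu"),  # 32 filters of size 3x3
    layers.MaxPooling2D((2, 2)),                # pooling layer referenced above
    layers.Flatten(),                           # 2D feature maps -> 1D vector
    layers.Dense(128, activation="relu"),       # first dense layer, 128 neurons
    layers.Dense(1, activation="sigmoid"),      # binary output probability
])
```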
3.3.1.2.1 Model Compilation and Training for Binary CNN: The binary
CNN model was compiled using the Adam optimizer with binary cross-entropy loss,
optimized for binary classification tasks:
Binary Cross-Entropy Loss:
$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i)\right]$$
where $y_i$ are the true labels (0 or 1), and $\hat{y}_i$ are the predicted probabilities.
Training involved fitting the model to augmented data in batches of size 16 for 10
epochs. Performance was monitored on a validation set to optimize training accuracy
and generalization. The test set was used for final evaluation to assess the model’s
performance on unseen data.
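A sketch of this compilation and training step, continuing the `binary_cnn` model above, might look as follows; `train_gen` and `val_gen` are assumed names for the augmented data generators:

```python
binary_cnn.compile(optimizer="adam",
                   loss="binary_crossentropy",  # the loss defined above
                   metrics=["accuracy"])

# train_gen / val_gen are assumed generator names; the batch size of 16 is
# set where the generators are created (e.g., in flow_from_directory)
history = binary_cnn.fit(train_gen, validation_data=val_gen, epochs=10)
```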
3.3.1.3 Multi-Class CNN Architecture
• Convolutional Layers: Three layers with increasing filters (32, 64, 128) and
ReLU activation functions extract hierarchical features from the input images:
$$\mathrm{Output}_{i,j,k} = \sigma\left(\sum_{m=0}^{2}\sum_{n=0}^{2}\sum_{p=0}^{2} \mathrm{Input}_{i+m,\,j+n,\,p} \times \mathrm{Filter}_{m,n,p} + \mathrm{Bias}\right)$$
• Pooling Layers: Max-pooling layers after each convolutional layer with a pool
size of 2x2 reduce spatial dimensions while retaining significant features.
• Fully Connected Layers: Two dense layers with ReLU activation. The first
layer with 128 neurons learns high-level representations:
$$\mathrm{Output} = \sigma\left(\sum_{i=1}^{n} \mathrm{Input}_i \times \mathrm{Weight}_i + \mathrm{Bias}\right)$$
• Output layer: The final output layer uses softmax activation, outputting prob-
abilities across 8 classes for multi-class classification.
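A corresponding sketch of this multi-class architecture, with the three convolutional blocks (32, 64, 128 filters) and an 8-way softmax head, is shown below; the pooling sizes are assumptions:

```python
from tensorflow.keras import layers, models

# Sketch of the multi-class CNN described above (assumed details hedged)
multi_cnn = models.Sequential([
    layers.Input(shape=(64, 64, 3)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),                 # 2x2 max-pooling after each block
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),        # high-level representations
    layers.Dense(8, activation="softmax"),       # probabilities over 8 classes
])
```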
3.3.1.3.1 Model Compilation and Training for Multi-Class CNN: The multi-
class CNN model was compiled using the Adam optimizer with categorical cross-
entropy loss, suitable for multi-class classification problems:
$$L = -\sum_{i=1}^{N}\sum_{j=1}^{C} y_{i,j}\log(\hat{y}_{i,j})$$
where $y_{i,j}$ are the true labels (one-hot encoded) and $\hat{y}_{i,j}$ are the predicted probabilities.
Training involved 10 epochs with a batch size of 32, using augmented data for batch
processing. Evaluation used accuracy as the metric, with performance monitored on
a validation set during training to optimize model performance. The final model’s
performance was assessed using a separate test set to provide an unbiased estimate of
its generalization capabilities.
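Combining the layers and training settings above, a sketch of the multi-class model is given below; the 64×64 input size and the 64-unit second dense layer are assumptions, while the filter counts, pooling, softmax output, and loss follow the text.

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

multi_cnn = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(64, 64, 3)),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(128, (3, 3), activation="relu"),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(128, activation="relu"),    # first fully connected layer
    Dense(64, activation="relu"),     # second dense layer (size assumed)
    Dense(8, activation="softmax"),   # probabilities over the 8 classes
])
multi_cnn.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
# Training: 10 epochs with batch size 32 on the augmented generators.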
3.3.2 DenseNet Architecture
Input Layer: The input layer accepts images of dimensions 48 × 48 pixels with 3 color channels (RGB).
Base Model: The architecture utilizes DenseNet121 with pre-trained weights on
ImageNet. The top classification layer is excluded to leverage the learned features for
both binary and multi-class classification tasks.
Dense Blocks: DenseNet121 contains four dense blocks. Each block consists of multiple convolutional layers with batch normalization and ReLU activation. The number of layers in each block is as follows:
• Block 1: 6 layers
• Block 2: 12 layers
• Block 3: 24 layers
• Block 4: 16 layers
Transition Layers: Between dense blocks, transition layers are used to reduce the feature-map size. Each transition layer includes batch normalization, a 1×1 convolution, and 2×2 average pooling.
Additional Layers:
• Flattening Layer: After the dense blocks, the output feature maps are flattened
into a one-dimensional vector.
• Output Layer: For binary classification, the output layer uses a sigmoid acti-
vation function:
$$\text{Probability} = \frac{1}{1 + e^{-(\text{Input}\times\text{Weight}+\text{Bias})}}$$
For multi-class classification, the output layer uses softmax activation.
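A sketch of this transfer-learning setup is shown below; the binary head is included and the multi-class head is indicated in a comment. The input shape, ImageNet weights, excluded top layer, and flattening step follow the text.

from tensorflow.keras import Model
from tensorflow.keras.applications import DenseNet121
from tensorflow.keras.layers import Dense, Flatten

base = DenseNet121(weights="imagenet",      # pre-trained on ImageNet
                   include_top=False,       # top classification layer excluded
                   input_shape=(48, 48, 3))

features = Flatten()(base.output)           # dense-block output -> 1D vector
output = Dense(1, activation="sigmoid")(features)    # binary head
# For multi-class: output = Dense(8, activation="softmax")(features)

densenet_model = Model(inputs=base.input, outputs=output)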
Binary Classification: The binary DenseNet model is compiled using the Adam
optimizer with binary cross-entropy loss:
Binary Cross-Entropy Loss:
$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i)\right]$$
where $y_i$ are the true labels (0 or 1) and $\hat{y}_i$ are the predicted probabilities.
Training involves:
• Data Preparation: Data is shuffled, split into training, validation, and test sets, and augmented using an ImageDataGenerator.
Multi-Class Classification: The multi-class DenseNet model is compiled using the Adam optimizer with categorical cross-entropy loss:
$$L = -\sum_{i=1}^{N}\sum_{j=1}^{C} y_{i,j}\log(\hat{y}_{i,j})$$
where $y_{i,j}$ are the true labels (one-hot encoded) and $\hat{y}_{i,j}$ are the predicted probabilities.
Training involves:
• Data Preparation: Generators for the training, validation, and test datasets are created with augmentation.
3.3.3.1 K-Means Clustering
Clustering is a specialized area within machine learning focused on grouping data into homogeneous clusters based on shared characteristics. The K-means algorithm is a widely recognized unsupervised method used in clustering.
Unsupervised Machine Learning involves training a computer to work with unla-
beled and unclassified data, allowing the algorithm to function independently without
guidance. In this approach, the machine organizes the data based on similarities, pat-
terns, and variations without prior training on the data.
1. Initialization: Randomly select K data points from the dataset to serve as the initial centroids.
2. Assignment: For each data point, compute the distance to each of the K
centroids and assign the data point to the cluster with the nearest centroid. This step
creates K clusters.
3. Update Centroids: After all data points are assigned to clusters, recalculate
the centroids of each cluster by averaging the positions of all data points within the
cluster.
4. Repeat: Iterate through steps 2 and 3 until convergence is achieved, which oc-
curs when the centroids stabilize or a predetermined number of iterations is completed.
5. Final Result: Upon convergence, the algorithm produces the final centroids
and assigns each data point to a cluster.
The goal of this iterative procedure is to minimize the sum of distances between
data points and their assigned cluster centroids.
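As a sketch, this entire iterative procedure can be run with scikit-learn as below; X is assumed to be an (n_samples, n_features) array of image feature vectors, and n_init and random_state are illustrative settings.

from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)        # assignment/update steps run to convergence
centroids = kmeans.cluster_centers_   # final centroids after convergence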
1. Grouping Similar Data Points: K-Means groups data points with similar characteristics together, making it possible to uncover hidden patterns in the data.
1- Compute the Average Intra-Cluster Distance $a_i$: For a data point $i$, $a_i$ is the mean distance to all other points in its own cluster:
$$a_i = \frac{1}{|C_i| - 1}\sum_{\substack{j \in C_i \\ j \neq i}} d(i,j)$$
where $|C_i|$ is the number of points in the cluster $C_i$ containing point $i$, and $d(i,j)$ is the distance between points $i$ and $j$.
2- Compute the Mean Nearest-Cluster Distance $b_i$: For the same point, $b_i$ is the smallest mean distance from $i$ to the points of any other cluster:
$$b_i = \min_{C \neq C_i} \frac{1}{|C|}\sum_{j \in C} d(i,j)$$
where $C$ ranges over clusters other than $C_i$, and $d(i,j)$ is the distance between points $i$ and $j$.
3- Compute the Silhouette Score $s_i$: The silhouette score for a data point $i$ is given by
$$s_i = \frac{b_i - a_i}{\max(a_i, b_i)}$$
where $s_i$ ranges from −1 to +1. A score close to −1 suggests the data point may be improperly clustered, whereas a score near +1 indicates that the data point is well-clustered.
3.3.3.1.4 Selecting the Optimal K: The silhouette scores are compared for various values of K in order to find the optimal number of clusters, K.
1- Compute the Average Silhouette Score: For each candidate value of K, run K-Means and compute the mean silhouette score over all data points.
2- Identify the Best K: The optimal K is selected based on the highest average
silhouette score. This value indicates the number of clusters that provides the best
separation and cohesion, meaning that the clusters are well-defined and distinct from
one another.
By leveraging silhouette analysis, we can effectively evaluate and select the most
suitable number of clusters, ensuring that the K-Means clustering results are both
meaningful and robust.
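A sketch of this selection procedure is given below, assuming X holds the feature vectors of one class and that candidate values of K range from 2 to 6 (the range is an assumption).

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

best_k, best_score = None, -1.0
for k in range(2, 7):                               # candidate values of K
    labels = KMeans(n_clusters=k, n_init=10,
                    random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)             # average silhouette score
    if score > best_score:
        best_k, best_score = k, score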
• The scatter plot's data points will now be assigned to the nearest centroid or K-point; a median line will thus be drawn between the two centroids, as seen in Figure 3.10.
• As can be seen in Figure 3.10, the points on the left side of the line are closest to K1, the blue centroid, whereas the points on the right side are closest to the yellow centroid. To make them easier to see, we colour them blue and yellow, as Figure 3.11 illustrates.
• We then choose new centroids and repeat the process until the clusters stabilize. As illustrated in Figure 3.12, we compute the centres of gravity of the current clusters in order to select new centroids.
Figure 3.13: Centroid Recalculation Process
• Every data point will then be assigned to the new centroid. We go through the same steps again to obtain the median line, which will resemble what Figure 3.13 illustrates.
• Since reassignment has occurred, we again proceed to step 4, locating new centroids or K-points. The procedure is repeated to determine the centroids' centres of gravity, resulting in new centroids resembling those in Figure 3.15.
Figure 3.16: Updated Cluster Centroids
• Once we obtain the new centroids, we reassign the data points and redraw the median line. The result will resemble Figure 3.16.
• Figure 3.16 shows that there are no dissimilar data points on either side of the line, indicating that the clustering has converged. See Figure 3.17.
Figure 3.18: Converged Clustering Result
• Now that the model is complete, we can remove the assumed centroids, leaving the two final clusters depicted in Figure 3.18.
3.3.3.2 Outlier Detection Using the IQR Method
In this section, we employ the Interquartile Range (IQR) method to detect outliers within each cluster formed by the K-Means algorithm. Below is a detailed explanation of each step involved in the outlier detection process.
3.3.3.2.1 Distance Calculation: For each cluster, compute the Euclidean distance of each data point from the cluster centroid; this measures how far each point lies from the centre of its cluster. The distance $d$ of a point $x_i$ from the centroid $c$ is calculated as
$$d(x_i, c) = \sqrt{\sum_{j=1}^{n} (x_{ij} - c_j)^2}$$
where $x_{ij}$ and $c_j$ are the coordinates of the data point and the centroid, respectively.
• In Colab-Based Implementation:
The Euclidean distances are calculated using NumPy’s linear algebra functions
to find the norm of the difference between each data point and the centroid.
3.3.3.2.2 Compute Quartiles: Calculate the first quartile (Q1) and the third
quartile (Q3) of the distances. These quartiles represent the 25th and 75th percentiles
of the distance values, respectively.
• In Colab-Based Implementation:
NumPy’s percentile function is used to compute Q1 and Q3 for the distances of
data points in each cluster.
3.3.3.2.3 Calculate the IQR: Find the Interquartile Range (IQR), which is the difference between the third and first quartiles. The IQR measures the spread of the middle 50% of the data.
IQR = Q3 − Q1
• In Colab-Based Implementation:
The IQR is computed by subtracting Q1 from Q3 using basic arithmetic opera-
tions in NumPy.
3.3.3.2.4 Determine the Upper Bound: Compute the upper bound for detecting outliers. In my implementation, the upper bound is set to Q3 + 1 × IQR, a stricter threshold than the conventional Q3 + 1.5 × IQR. This threshold helps identify data points that lie significantly farther from the centroid than the majority of points.
3.3.3.2.5 Identify Outliers: Compare each distance with the upper bound. If the
distance of a point exceeds this threshold, it is marked as an outlier.
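Putting the five steps above together, a minimal NumPy sketch consistent with the Colab-based implementation notes is given below; the variable names X, labels, and centroids are assumptions carried over from the K-Means step.

import numpy as np

outlier_mask = np.zeros(len(X), dtype=bool)
for k, centroid in enumerate(centroids):
    idx = np.where(labels == k)[0]
    # Euclidean distance of every point in cluster k to its centroid
    distances = np.linalg.norm(X[idx] - centroid, axis=1)
    q1, q3 = np.percentile(distances, [25, 75])   # first and third quartiles
    iqr = q3 - q1
    upper_bound = q3 + 1 * iqr                    # threshold used in this study
    outlier_mask[idx] = distances > upper_bound   # flag points beyond the bound

X_refined = X[~outlier_mask]                      # dataset with outliers removed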
• Training Runs: Each model was trained across 10 different runs, with each run
consisting of up to 30 epochs and utilizing early stopping to prevent overfitting.
This approach ensured robustness and reliability of the results, balancing com-
putational efficiency with thorough exploration of different model configurations.
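A sketch of one such run is given below; model stands for whichever network is being trained, the generators are those described earlier, the patience value is an assumption, and the 30-epoch cap follows the text.

from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor="val_loss", patience=3,
                           restore_best_weights=True)
history = model.fit(train_generator,
                    validation_data=val_generator,
                    epochs=30,                  # upper limit per run
                    callbacks=[early_stop])     # halts when val_loss stops improving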
3.3.3.3.2 Binary Classification:
• Output Layer: 1 unit with a sigmoid activation function for binary classification
• Optimizer: Adam
• Metrics: accuracy
• Batch Size: 16
4 RESULTS
Table 4.1. Accuracy per class across different folds, including average accuracy for
each class and the total average accuracy.
As we see in Table 4.1, for each fold and each class, I obtained different accuracies.
The highest accuracies for each class are as follows:
• Class 2: The highest accuracies were achieved in both Fold 3 and Fold 5, each
with a value of 93%.
• Class 3: The highest accuracies were observed in Folds 1, 3, and 5, all reaching
95%.
These results reflect the model’s performance variability across different folds and
highlight the folds where the model performed best for each class.
To visually represent these accuracies, refer to Figure 4.1, which illustrates the accuracy distributions across different folds for each class.
For DenseNet model training, images were resized to 64×64 pixels and normalized,
with labels one-hot encoded for each class. We used 5-fold cross-validation, further
splitting the data into training, validation, and test sets using a split of 70% for training,
10% for validation, and 20% for testing sets for each fold. Data augmentation with
ImageDataGenerator, including rescaling, was applied to enhance generalization. Each
model was trained for 10 epochs per fold.
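A sketch of this evaluation scheme is given below, assuming X and y_onehot are the prepared image and one-hot label arrays; each of the five folds re-splits the data 70/10/20 as described above.

from sklearn.model_selection import train_test_split

for fold in range(5):
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y_onehot, train_size=0.70, shuffle=True, random_state=fold)
    X_val, X_test, y_val, y_test = train_test_split(
        X_rest, y_rest, train_size=1 / 3, random_state=fold)  # 10% val, 20% test
    # A fresh DenseNet model is built and trained for 10 epochs on each fold.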
After training DenseNet, Table 4.2 shows that I obtained different accuracies for each fold and each class. The highest accuracies for each class are as follows:
• The highest accuracy for Class 1 was 89%, attained at Fold 5.
• At Folds 3, 4, and 5, the maximum accuracy for Class 2 was 88%.
• The maximum accuracy for Class 3 was 87%, attained at Folds 2 and 4.
• For Class 4, Folds 1 and 2 yielded the maximum accuracy of 88%.
• The maximum accuracy for Class 5 was attained at Fold 5, with an accuracy of 87%.
• The best accuracy for Class 6 was 85%, attained at Folds 1, 2, and 4.
• The maximum accuracy for Class 7 was attained at Fold 2, with an accuracy of 88%.
• For Class 8, Fold 3 yielded the highest accuracy of 87%.
To visually represent these accuracies, refer to Figure 4.2, which illustrates the accuracy distributions across different folds for each class:
CLASSES FOLD 1 FOLD 2 FOLD 3 FOLD 4 FOLD 5 Average
0 87% 87% 88% 87% 89% 87.6%
1 86% 87% 88% 88% 88% 87.4%
2 84% 87% 86% 87% 86% 86.0%
3 88% 88% 85% 87% 84% 86.4%
4 82% 86% 82% 78% 87% 81.0%
5 85% 85% 79% 85% 76% 80.0%
6 53% 88% 83% 87% 83% 78.8%
7 84% 84% 87% 78% 86% 83.8%
Total Average 83.5%
Table 4.2. Accuracy per class across different folds, including average accuracy for
each class and the total average accuracy.
We observe that CNN outperforms DenseNet when considering the overall average
accuracy across all classes. Specifically, CNN achieves an average accuracy of 91.1%,
while DenseNet falls short at 83.5%.
Looking closer at the class-wise average accuracies, the results further highlight
CNN’s superiority. For CNN, the average accuracies across classes 0 to 7 are 88.2%,
90.2%, 92.4%, 94.6%, 93.4%, 87.8%, 88.0%, and 92.0%, respectively. In contrast,
DenseNet’s average accuracies for the same classes are 87.6%, 87.4%, 86.0%, 86.4%,
81.0%, 80.0%, 78.8%, and 83.8%. These figures demonstrate that CNN consistently
achieves higher accuracy across most classes, highlighting its overall effectiveness in
binary classification tasks.
For the multi-class classification task on raw data, the following hyperparameters were set when configuring the CNN model:
• Conv2D Layers: The model included three convolutional layers with 32, 64, and
128 filters, respectively, each with a kernel size of (3, 3).
• Pooling: MaxPooling2D was applied after each convolutional layer with a pool
size of (2, 2) to reduce the spatial dimensions.
• Output Layer: A softmax activation function was used for the final layer to
classify the images into the respective classes.
• Optimizer: Adam.
• Metrics: Accuracy.
The CNN achieved a test accuracy of 74%. To visualize the model’s performance,
figures of CNN accuracy and CNN loss over the epochs are provided in figure 4.3.
After working with the CNN model, I proceeded with training a DenseNet model
for the multi-class classification task. The dataset was consistent with the previous
setup.
The following hyperparameters were set when configuring the DenseNet model:
• Global Average Pooling: Applied after the base model to reduce the feature
dimensions.
• Output Layer: A softmax activation function was used for the final layer to
classify the images into the respective classes.
• Optimizer: Adam.
• Loss Function: Categorical cross-entropy.
• Metrics: Accuracy.
The DenseNet achieved a test accuracy of 84%. To visualize the model's performance, figures of DenseNet accuracy and DenseNet loss over the epochs are provided in Figure 4.4.
With an overall accuracy of 84%, the DenseNet model outperformed the CNN model
in the multi-class classification setting. By contrast, the accuracy of the CNN model
was 74%. This outcome demonstrates how well DenseNet manages the complexity of
multi-class classification tasks.
IQR = Q3 - Q1
upper_bound = Q3 + 1 * IQR
outliers = distances > upper_bound
Points with distances exceeding this upper bound were marked as outliers and re-
moved from the dataset.
The refined dataset, free from outliers, was then used for binary and multi-class
classification tasks with CNN and DenseNet models to ensure more accurate and reli-
able classification results.
The plots shown are the "Distance Distribution Plot with Outlier Threshold" for Cluster 0 and Cluster 1 of Class 1, as illustrated in Figures 4.6 and 4.7.
Figure 4.6: Distance Distribution Plot with Outlier Threshold
From these plots, it is evident that distances exceeding the red dotted line are con-
sidered outliers. After applying the outlier detection process, 29 points were identified
as outliers and removed from the dataset. Consequently, 971 points remained for class
1, which is designated as "dyed and lifted polyps."
To find the optimal number of clusters, the silhouette score for each class was computed.
Below are the silhouette score plots for each class:
The K-Means clustering algorithm was used to determine the optimal number of clus-
ters for each class, and the distances between each data point and the centroid of its
cluster were then calculated. Then, to find outliers, the Interquartile Range (IQR)
approach was applied. The following figures display the "Distance Distribution Plot
with Outlier Threshold" for each class:
Figures 4.15–4.28: Distance Distribution Plots with Outlier Threshold for Classes 2 through 8 (two plots per class).
In all figures, the red dotted line represents the threshold beyond which points are
considered outliers. Distances greater than this threshold are marked as outliers and
are removed from the dataset.
For each class, the number of outliers identified and removed is as follows:
• Class 2: 7
• Class 3: 67
• Class 4: 38
• Class 5: 61
• Class 6: 43
• Class 7: 47
• Class 8: 50
In total, 342 outlier points were identified and removed across all 8 classes. After
excluding these outliers, the refined dataset comprises 7,658 data points out of the
original 8,000. This refined dataset was then used for subsequent classification tasks.
The rest of the hyperparameters, including Conv2D filters, pooling layers, Dense
layers, optimizer, loss function, metrics, batch size, and early stopping criteria, are the
same as in the raw training scenario.
Table 4.3. Accuracy per class across different folds, including average accuracy for
each class and the total average accuracy.
Table 4.3 shows that the accuracies varied for each class across different folds. The
highest accuracies for each class are summarized as follows:
• Class 0: The highest accuracy was achieved in Folds 1, 4, and 5, each with a
value of 89%.
• Class 2: The highest accuracies were achieved in both Fold 2 and Fold 5, each
with a value of 94%.
• Class 3: The highest accuracies were observed in Folds 1, 2, 3, and 5, all reaching
95%.
• Class 4: The highest accuracy reached 99%.
These findings show how the model’s performance varies at various folds and em-
phasize the folds where the model achieved the highest accuracy for each class.
For a visual representation of these accuracies, see Figure 4.29, which shows the distribution of accuracy across the different folds for each class.
Figure 4.29: CNN Test Accuracies Per Class
After the Convolutional Neural Network (CNN) was trained on the refined dataset for binary classification, DenseNet was used in the analysis. The DenseNet model was configured with an input shape of (48, 48, 3) at this stage. It used DenseNet121 as its base, pre-trained on ImageNet with the top layer omitted, followed by a Dense layer of 128 units with ReLU activation. For binary classification, dropout was set to 0.5 and the output layer contained one unit with a sigmoid activation function. The model employed the binary cross-entropy loss function and the Adam optimiser, with accuracy as the evaluation metric. Training ran for three epochs with a batch size of 4288. ImageDataGenerator was used to augment the data with rescaling, shearing, zooming, and horizontal flipping. For evaluation, the dataset was divided into training, validation, and test sets, and 5-fold cross-validation was applied. The remaining hyperparameters (optimiser, loss function, metrics, and dropout rate) align with those used in the raw training scenario.
Table 4.4 illustrates the variation in accuracies for each class across different folds.
The summary of the highest accuracies achieved for each class is as follows:
• Class 0: The highest accuracy of 0.8731 was consistently achieved across all
folds (Folds 1, 2, 3, 4, and 5).
Table 4.4. Accuracy per class across different folds, including average accuracy for
each class and the total average accuracy.
• Class 2: The highest accuracy of 0.8787 was achieved in Fold 4 and Fold 5.
• Class 3: The highest accuracy of 0.8750 was consistently observed across Folds
1, 2, 4, and 5.
• Class 5: The highest accuracy of 0.8750 was consistent across all folds (Folds 1,
2, 3, 4, and 5).
• Class 7: Folds 1, 2, and 5 had the highest accuracy, which was 0.8759.
Figure 4.30: DENSENET Test Accuracies Per Class
For binary classification on refined data, CNN outperforms DenseNet when considering
the overall average accuracy across all classes. CNN achieves an average accuracy of
91.0%, while DenseNet falls behind at 73.1%. Examining class-wise average accuracies,
CNN consistently shows strong performance across all classes, with accuracies ranging
from 88.6% to 95.2%. In contrast, DenseNet’s performance is notably inconsistent,
with accuracies varying significantly between 44.6% and 88.0%, indicating a less reliable
classification across different classes.
Table 4.5. Accuracy across 10 training runs for the multi-class CNN on refined data.
Runs     Run 1  Run 2  Run 3  Run 4  Run 5  Run 6  Run 7  Run 8  Run 9  Run 10
Accuracy 75%    71%    78%    75%    75%    75%    73%    72%    71%    75%
Here we can see from Table 4.5 that Run 3 achieved the highest accuracy of 78%.
Figure 4.31: Training and validation accuracy and loss for Run 3.
After training the CNN, the multi-class classification task was carried out using DenseNet on the refined data. Images were resized to (48, 48, 3), matching the size used for the CNN in this scenario. The DenseNet model was trained for up to 30 epochs with early stopping, and the training process was repeated for 10 runs. Accuracy was plotted for each run to assess performance. All other hyperparameters, including the optimizer, loss function, and model architecture, are consistent with the configurations used in the raw data scenario.
Runs     Run 1  Run 2  Run 3  Run 4  Run 5  Run 6  Run 7  Run 8  Run 9  Run 10
Accuracy 67%    69%    61%    79%    73%    69%    55%    66%    75%    67%
For the multi-class classification task using refined data, the accuracy achieved by the Convolutional Neural Network (CNN) was 78% at Run 3, whereas the DenseNet model achieved a slightly higher accuracy of 79% in Run 4. This result indicates competitive performance between the two models, with DenseNet showing a marginal advantage in this scenario. The performance of both models highlights their effectiveness in handling multi-class classification problems when refined data is utilized.
When evaluating the performance of CNN and DenseNet in both binary and multi-class
classification tasks on refined data, distinct differences in overall average accuracy are
observed.
For CNN, the binary classification scenario demonstrates superior performance com-
pared to the multi-class classification task. The CNN achieves an overall average accu-
racy of 91.0% in binary classification, which is significantly higher than its performance
in multi-class classification, where the overall accuracy is 78% in Run 3. This indicates
that CNN performs notably better in binary classification scenarios on refined data,
reflecting its strength in distinguishing between two classes.
In the case of DenseNet, binary classification yields an overall average accuracy of 73.1%, while its multi-class performance is slightly better, with an accuracy of 79% in Run 4. Although DenseNet performs better in multi-class classification than in its binary scenario, the difference is relatively modest compared with the contrast observed for CNN.
• Darkened Images: Some images appear significantly darker than their non-
outlier counterparts. This reduction in brightness can obscure important details,
making it challenging to accurately assess the condition of the gastrointestinal
tract.
• Slight Blurring: A number of outlier images are slightly blurred, which dimin-
ishes the clarity of the visual information. Blurred images may hinder the precise
identification of features and conditions.
• Polyp Removal Class: Specifically for the "Polyp Removal" class, the outlier
images exhibit significant issues that impede the visibility of critical details. In
these images, it is difficult to discern whether the polyps have been properly dyed
or removed, making it challenging to evaluate the effectiveness of the removal
procedure.
To illustrate these issues, two sample outlier images from each category are shown
in Figure 4.7. This figure presents representative examples of the visual anomalies
observed across the different categories, providing a clear view of the types of challenges
encountered with outlier data.
These visual anomalies highlight the limitations and challenges of working with
outlier data, underscoring the importance of ensuring high-quality, well-lit, and sharp
images for accurate medical analysis.
Figure 4.7: Sample outlier images (two per category) for Normal Z-Line, Normal Pylorus, Normal Cecum, Esophagitis, Polyps, and Ulcerative Colitis.
Conclusions
The analysis of model performance on both raw and refined datasets reveals notable
insights into the effectiveness of data refinement through outlier detection.
For binary classification tasks, CNN demonstrated consistent performance, achieving a high accuracy of 91.1% with raw data and a nearly identical 91.0% with refined data. This indicates that the CNN model's ability to classify binary categories is robust and essentially unaffected by the refinement process.
In contrast, DenseNet showed a significant drop in binary classification accuracy,
falling from 83.5% with raw data to 73.1% with refined data. This decline suggests
that the data refinement process, which involves outlier detection, adversely impacted
DenseNet’s performance in binary classification.
For multi-class classification, DenseNet initially outperformed CNN with an accu-
racy of 84% on raw data, compared to CNN’s 74%. However, after data refinement,
DenseNet’s accuracy decreased to 79%, while CNN’s accuracy improved slightly to 78%.
Despite this improvement, CNN still did not surpass DenseNet’s original multi-class
performance.
In summary, the refinement process through outlier detection had no significant impact on CNN's binary classification accuracy and produced only a modest improvement in its multi-class accuracy. Conversely, it led to a notable decrease in DenseNet's performance across both classification tasks. Overall, the refinement process showed limited benefits for CNN and detrimental effects for DenseNet, and did not significantly enhance the overall performance of the models.
Bibliography
[1] Shibin Wu, Ruxin Zhang, Jiayi Yan, Chengquan Li, Qicai Liu, Liyang Wang,
and Haoqian Wang. High-speed and accurate diagnosis of gastrointestinal disease:
Learning on endoscopy images using lightweight transformer with local feature
attention. Bioengineering, 10(12):1416, 2023.
[2] Ibtesam M Dheir and Samy S Abu-Naser. Classification of anomalies in gastroin-
testinal tract using deep learning. 2022.
[3] Konstantin Pogorelov, Kristin Ranheim Randel, Carsten Griwodz, Sigrun Losada
Eskeland, Thomas de Lange, Dag Johansen, Concetto Spampinato, Duc-Tien
Dang-Nguyen, Mathias Lux, Peter Thelin Schmidt, et al. Kvasir: A multi-class
image dataset for computer aided gastrointestinal disease detection. In Proceedings
of the 8th ACM on Multimedia Systems Conference, pages 164–169, 2017.
[4] Yan Gao, Weining Lu, Xiaobei Si, and Yu Lan. Deep model-based semi-supervised
learning way for outlier detection in wireless capsule endoscopy images. IEEE
Access, 8:81621–81632, 2020.
[5] Dimitris K Iakovidis, Spiros V Georgakopoulos, Michael Vasilakakis, Anastasios
Koulaouzidis, and Vassilis P Plagianakos. Detecting and locating gastrointestinal
anomalies using deep learning and iterative cluster unification. IEEE transactions
on medical imaging, 37(10):2196–2210, 2018.
[6] Abel Kahsay Gebreslassie, Misgina Tsighe Hagos, et al. Automated gastrointestinal
disease recognition for endoscopic images. In 2019 International Conference on
Computing, Communication, and Intelligent Systems (ICCCIS), pages 312–316.
IEEE, 2019.
[7] Muhammad Ramzan, Mudassar Raza, Muhammad Imran Sharif, and Seifedine
Kadry. Gastrointestinal tract polyp anomaly segmentation on colonoscopy images
using graft-u-net. Journal of Personalized Medicine, 12(9):1459, 2022.
[8] Sami H Ismael, Shahab W Kareem, and Firas H Almukhtar. Medical image
classification using different machine learning algorithms. AL-Rafidain Journal of
Computer Sciences and Mathematics, 14(1):135–147, 2020.
[9] Marc D Kohli, Ronald M Summers, and J Raymond Geis. Medical image data
and datasets in the era of machine learning—whitepaper from the 2016 c-mimi
meeting dataset session. Journal of digital imaging, 30:392–399, 2017.
[10] Marzieh Esmaeili, Amirhosein Toosi, Arash Roshanpoor, Vahid Changizi, Marjan
Ghazisaeedi, Arman Rahmim, and Mohammad Sabokrou. Generative adversarial
networks for anomaly detection in biomedical imaging: A study on seven medical
image datasets. IEEE Access, 11:17906–17921, 2023.
[11] Yu Cai, Weiwen Zhang, Hao Chen, and Kwang-Ting Cheng. Medianomaly:
A comparative study of anomaly detection in medical images. arXiv preprint
arXiv:2404.04518, 2024.
[12] Mengfang Li, Yuanyuan Jiang, Yanzhou Zhang, and Haisheng Zhu. Medical image
analysis using deep learning algorithms. Frontiers in Public Health, 11:1273253,
2023.
[13] Jamal N Hasoon, Ali Hussein Fadel, Rasha Subhi Hameed, Salama A Mostafa,
Bashar Ahmed Khalaf, Mazin Abed Mohammed, and Jan Nedoma. Covid-19
anomaly detection and classification method based on supervised machine learning
of chest x-ray images. Results in Physics, 31:105045, 2021.
[14] Alexander P Abadir, Mohammed Fahad Ali, William Karnes, and Jason B Sama-
rasena. Artificial intelligence in gastrointestinal endoscopy. Clinical endoscopy,
53(2):132–141, 2020.
[15] Justin Ker, Lipo Wang, Jai Rao, and Tchoyoson Lim. Deep learning applications
in medical image analysis. IEEE Access, 6:9375–9389, 2017.
[16] Ashish Jain, Rohit Singh, and Priyanka Singh. Enhancing outlier detection and
dimensionality reduction in machine learning for extreme value analysis. Int. J.
Advanced Networking and Applications, 15(06):6204–6210, 2024.
[17] N Sri Krishna, YV Pavan Kumar, K Purna Prakash, and G Pradeep Reddy.
Machine learning and statistical techniques for outlier detection in smart home
energy consumption. In 2024 IEEE Open Conference of Electrical, Electronic and
Information Sciences (eStream), pages 1–4. IEEE, 2024.
[22] AM Abu Nada, Eman Alajrami, Ahmed A Al-Saqqa, and Samy S Abu-Naser. Age
and gender prediction and validation through single user images using cnn. Int.
J. Acad. Eng. Res.(IJAER), 4:21–24, 2020.
[23] Mukrimah Nawir, Amiza Amir, Ong Bi Lynn, Naimah Yaakob, and
R Badlishah Ahmad. Performances of machine learning algorithms for binary
classification of network anomaly detection system. In Journal of Physics: Con-
ference Series, volume 1018, page 012015. IOP Publishing, 2018.
[24] Samir S Yadav and Shivajirao M Jadhav. Deep convolutional neural network based
medical image classification for disease diagnosis. Journal of Big data, 6(1):1–18,
2019.
[25] Livia Faes, Siegfried K Wagner, Dun Jack Fu, Xiaoxuan Liu, Edward Korot,
Joseph R Ledsam, Trevor Back, Reena Chopra, Nikolas Pontikos, Christoph Kern,
et al. Automated deep learning design for medical image classification by health-
care professionals with no coding experience: a feasibility study. The Lancet Digital
Health, 1(5):e232–e242, 2019.
[26] Zahra Amiri, Arash Heidari, Nima Jafari Navimipour, Mehmet Unal, and Ali
Mousavi. Adventures in data analysis: A systematic review of deep learning
techniques for pattern recognition in cyber-physical-social systems. Multimedia
Tools and Applications, 83(8):22909–22973, 2024.
[27] Joost N Kok, Jacek Koronacki, Ramon Lopez de Mantaras, Stan Matwin, and
Dunja Mladenic. Knowledge Discovery in Databases: PKDD 2007: 11th European
Conference on Principles and Practice of Knowledge Discovery in Databases, War-
saw, Poland, September 17-21, 2007, Proceedings, volume 4702. Springer Science
& Business Media, 2007.
[29] Zhiwen Huang, Xingxing Zhu, Mingyue Ding, and Xuming Zhang. Medical image
classification using a light-weighted hybrid neural network based on pcanet and
densenet. IEEE Access, 8:24697–24712, 2020.
[30] Ziliang Zhong, Muhang Zheng, Huafeng Mai, Jianan Zhao, and Xinyi Liu. Cancer
image classification based on densenet model. In Journal of physics: conference
series, volume 1651, page 012143. IOP Publishing, 2020.
[31] I Gethzi Ahila Poornima and B Paramasivan. Anomaly detection in wireless sensor
network using machine learning algorithm. Computer communications, 151:331–
337, 2020.
[32] Harry Pratt, Frans Coenen, Deborah M Broadbent, Simon P Harding, and Yalin
Zheng. Convolutional neural networks for diabetic retinopathy. Procedia computer
science, 90:200–205, 2016.
[33] K Hanumantha Rao, G Srinivas, Ankam Damodhar, and M Vikas Krishna. Imple-
mentation of anomaly detection technique using machine learning algorithms. In-
ternational journal of computer science and telecommunications, 2(3):25–31, 2011.
[34] Malinda Vania, Bayu Adhi Tama, Hasan Maulahela, and Sunghoon Lim. Re-
cent advances in applying machine learning and deep learning to detect upper
gastrointestinal tract lesions. IEEE Access, 2023.
[36] Thomas De Lange, Pål Halvorsen, and Michael Riegler. Methodology to develop
machine learning algorithms to improve performance in gastrointestinal endoscopy.
World journal of gastroenterology, 24(45):5057, 2018.
[37] Timothy Cogan, Maribeth Cogan, and Lakshman Tamil. Mapgi: Accurate identi-
fication of anatomical landmarks and diseased tissue in gastrointestinal tract using
deep learning. Computers in biology and medicine, 111:103351, 2019.
[38] Jingwei Song, Mitesh Patel, Andreas Girgensohn, and Chelhwon Kim. Combining
deep learning with geometric features for image-based localization in the gastroin-
testinal tract. Expert Systems with Applications, 185:115631, 2021.
[39] Girik Pachauri and Sandeep Sharma. Anomaly detection in medical wireless sensor
networks using machine learning algorithms. Procedia Computer Science, 70:325–
333, 2015.
[40] R. Vijaya Kumar Reddy et al. Machine learning based outlier detection for med-
ical data. Indonesian Journal of Electrical Engineering and Computer Science,
24(1):564–569, 2021.