Diabetic Retinopathy Prediction
using
Convolutional Neural Networks

A Project Report
submitted by
Rajib Das Bhagat

BACHELOR OF SCIENCE
CERTIFICATE

This is to certify that the thesis titled Diabetic Retinopathy Prediction using Convolutional Neural Networks, submitted by Rajib Das Bhagat (CS17M034), to the Ananda Mohan College, Kolkata-9, for the award of the degree of Bachelor of Science, is a bonafide record of the project work done by him under our supervision. The contents of this project, in full or in parts, have not been submitted to any other Institute or University for the award of any degree or diploma.
Place: Chennai
Date:
ACKNOWLEDGEMENTS
I am very thankful to my project guide, Prof. Balaraman Ravindran, for his prolific encouragement and guidance. He consistently allowed this thesis to be my own work, and steered me in the right direction whenever he thought I needed it.

I would also like to thank my friends, Mr. Karthik Thiagarajan (MS Scholar, Dept. of Computer Science & Engineering, IIT Madras), and Mr. Saurabh Desai (Project Associate, RBC-DSAI laboratory, IIT Madras) for making the two years of my stay here most cherished. Their presence unfolded numerous eventful experiences that I will always remember.
ABSTRACT

Diabetic retinopathy (DR) is any damage caused to the retina of the eye, leading to vision impairment in people suffering from diabetes. As a term, diabetic retinopathy refers to retinal vascular disease: damage to the retina caused by abnormal blood flow and by the presence of exudates and hemorrhages. Identifying the features related to these abnormalities in images taken from a diabetic patient is difficult and truly tedious. The aim of this study is to investigate how the process of accurately predicting and classifying DR can be automated. The classification is performed as both binary and multi-class classification, and its correctness is studied in terms of sensitivity, specificity and precision.

The network was trained on a system equipped with a high-end graphics processing unit (GPU), providing high computational power. This report covers multi-class classification, in continuation of the binary classification work.
TABLE OF CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
1 INTRODUCTION
  1.1 Motivation
  1.2 Problem Statement
  1.3 Diabetic Retinopathy and Stages
      1.3.1 DR Stages
  1.4 Structure of the Thesis
2 BACKGROUND
  2.1 Convolutional Neural Networks (CNNs)
      2.1.1 Local Receptive Fields
      2.1.2 Filters
      2.1.3 Pooling
  2.2 Issues in CNNs
3 RELATED WORK
4 THE DATASET
  4.1 Dataset Source
  4.2 Issues Related to the Dataset
5 METHODOLOGY
  5.1 Model Architecture
  5.2 Hardware and Software
6 EXPERIMENTS
  6.1 The Initial Phase
      6.1.1 Pre-processing
      6.1.2 Augmentation
  6.2 The Final Phase
      6.2.1 Pre-processing
      6.2.2 Augmentation
  6.3 Training
7 RESULTS
  7.1 Results
8 CONCLUSION
CHAPTER 1
INTRODUCTION
1.1 Motivation
Deep learning is an application of artificial intelligence (AI) that gives systems the ability to automatically learn and improve from experience without being explicitly programmed. It centers on the development of computer programs that can access data and use it to learn for themselves.
The process of learning begins with observations or data, such as examples, direct experience, or instruction, in order to look for patterns in the data and make better decisions. The essential aim is to allow computers to learn and adapt automatically, without human assistance, and to adjust their actions accordingly.

Consider the scenario of a person suffering from diabetic retinopathy. According to medical reports, an individual suffering from diabetes has high odds of vision loss due to diabetic retinopathy (11). The condition occurs when high glucose levels cause blood-vessel damage in the retina. This complication can lead to swelling and then leakage of blood vessels, which can cause lasting vision loss. Due to the severity of the disease, a diabetic person may completely lose sight, and the disease may lead to other corollary harm too. Thus, this study explores how the process of correctly classifying eye images into the different stages of DR can be automated, so that precautionary measures can be taken beforehand.
1.2 Problem Statement

In this thesis, eye images are analyzed for binary and multi-class classification; experiments are performed with an already existing method, and the results are discussed. In the first phase of the project, the background of deep learning was studied and the system was set up accordingly. In the second phase, experiments on binary classification were performed. In the final phase of the project, experiments on multi-class classification of the fundus images were performed.
1.3 Diabetic Retinopathy and Stages

At first, diabetic retinopathy may cause no symptoms or only mild vision problems. The longer a person suffers from diabetes, and the less controlled the blood sugar, the more likely that person is to develop this eye complication. In brief, it can cause visual impairment.
In non-proliferative DR [Table 1.1], the walls of the blood vessels in the retina weaken. Micro-aneurysms develop from the walls of the smaller vessels, sometimes leaking fluid and blood into the retina. Larger retinal vessels can begin to dilate and become irregular in shape as well. Non-proliferative DR can progress from mild to severe as more blood vessels become blocked. Nerve fibers in the retina may begin to swell, and the central part of the retina (the macula) may begin to swell too (macular edema).
Figure 1.1: A normal eye [left] and related abnormalities [right] (12).
In the proliferative type [Table 1.1], damaged blood vessels close off, causing the growth of new abnormal blood vessels in the retina; these can leak into the vitreous, the jelly-like substance that fills the center of the eye.

Eventually, the growth of new blood vessels may cause the retina to detach from the back of the eye. Pressure may also build up in the eyeball, damaging the nerve that carries images from the eye to the brain (the optic nerve) and resulting in glaucoma.
Figure 1.2: Fundus images of DR stages [related abnormalities are circled]: (a) mild, (b) moderate, (c) severe, (d) proliferative (13)
1.3.1 DR Stages
i. Normal

A normal fundus image shows no DR-related abnormalities.
ii. Mild

The earliest visible lesions are micro-aneurysms: usually round, tiny, sore-like red dots, which may leak fluid into the retina. Micro-aneurysms may occur in groups or in isolation; this stage is classified as mild non-proliferative [Fig. 1.2a].
iii. Moderate

As mild diabetic retinopathy progresses, the blood vessels that nourish the retina may swell or distort a bit, and lose their ability to transport blood. The lesions present at this stage are known as soft exudates, resembling cotton-wool spots or micro-infarctions. Soft exudates are small white lesions that are not clearly visible; this stage is termed moderate non-proliferative [Fig. 1.2b].
iv. Severe

Hard exudates are lesions formed by lipid spilled from the weakened blood vessels. These lesions are typically yellow, with clear, waxy edges, and tend to accumulate in groups. This stage is severe non-proliferative [Fig. 1.2c].
v. Proliferative
When a micro-aneurysm ruptures, blood spills from the blood vessels; this is known as a hemorrhage. Hemorrhages appear as red dots or flame-like shapes, in clusters or rings. New blood vessels also tend to develop due to the absence of oxygen, a process known as neovascularization; these vessels are typically weak and tear easily, and the resulting bleeding can cause permanent vision loss. The presence of hemorrhages and neovascularization marks the final stage of DR, termed proliferative [Fig. 1.2d].
The different abnormalities related to DR are shown in Fig. 1.1 [right] and in the fundus images [Fig. 1.2].

1.4 Structure of the Thesis

Chapter 1 gives an introduction, with a brief explanation of the structure of the eye and the classification of DR. Chapter 2 gives a brief review of convolutional neural networks and related issues. Chapter 3 presents the literature survey of the related field. The dataset used is described in Chapter 4. Chapter 5 highlights the methodology and strategies used. Chapter 6 covers the experiments performed in the initial and final phases. Chapter 7 presents the results for both binary and multi-class classification, and Chapter 8 concludes the report.
CHAPTER 2
BACKGROUND
2.1 Convolutional Neural Networks (CNNs)

Among neural networks, convolutional neural networks (ConvNets or CNNs) are one of the principal architectures for image recognition and image classification. A CNN takes an input image, processes it, and classifies it depending on the input and the activation function.

A CNN comprises convolution layers, pooling layers and fully connected layers. A convolution layer applies an operation known as convolution to the input and passes the result to the next layer. Each neuron processes data in its local receptive field. The steps involved are feed-forwarding the input values and back-propagating the error, updating the parameters (termed weights and biases) so as to minimize the loss; finally, the accuracy is calculated.
The basic building blocks of a convolutional neural network architecture are described below.
2.1.1 Local Receptive Fields

An input of n x n neurons is fed into the CNN; these n x n neurons are termed the input pixels. The input pixels are connected to the first hidden layer, which comprises many hidden neurons. Here, not every input neuron is connected to every hidden neuron; instead, connections are made only within small, localized regions of the input.

The localized region of input neurons is the local receptive field of a hidden neuron; it acts like a small window over the input neurons. The connections from the input neurons to a hidden neuron carry weights, plus a bias. The local receptive field is slid across the input, and the output is different for each hidden neuron. The sliding of the local receptive field is governed by the stride length: the number of pixels by which the small window is shifted. If the stride length is 1, 2, ..., n, the local receptive field is slid by 1, 2, ..., n neurons respectively.
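To make the sliding of the local receptive field concrete, the following minimal numpy sketch (illustrative only; it is not the thesis code, and handles a single channel without padding) computes one feature map for a given stride:

import numpy as np

def conv2d(image, kernel, stride=1):
    k = kernel.shape[0]                        # size of the square receptive field
    out_size = (image.shape[0] - k) // stride + 1
    out = np.zeros((out_size, out_size))
    bias = 0.0                                 # shared bias for this feature map
    for i in range(out_size):
        for j in range(out_size):
            # local receptive field at this window position
            window = image[i*stride:i*stride + k, j*stride:j*stride + k]
            out[i, j] = np.sum(window * kernel) + bias
    return out

image = np.random.rand(512, 512)               # n x n input pixels
kernel = np.random.randn(3, 3)                 # 3 x 3 shared weights
print(conv2d(image, kernel, stride=2).shape)   # (255, 255)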
2.1.2 Filters
A filter is the mapping from the input layer to the next hidden layer. The weights in the filter are termed shared weights, and the bias is termed the shared bias; together, the shared weights and bias define a kernel or filter. Filters can blur images, sharpen images, and perform edge detection. In a CNN, filters are not pre-defined: the weights in each filter are learned as training proceeds.
2.1.3 Pooling
A pooling layer is used after each convolution layer. Pooling combines the outputs of a cluster of neurons at one layer into a single neuron in the next layer. The main use of pooling is to simplify the output of the convolutional layer: it progressively reduces the number of parameters and the large amount of computation associated with a CNN. This simplification generates a compressed version of the feature map. The pooling techniques available include max-pooling, average-pooling and L2-pooling.
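As a small illustration of how pooling compresses a feature map (again a sketch, not the thesis implementation), max-pooling can be written as:

import numpy as np

def max_pool(feature_map, size=2, stride=2):
    # each output neuron summarizes a size x size cluster of inputs
    out_size = (feature_map.shape[0] - size) // stride + 1
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            window = feature_map[i*stride:i*stride + size,
                                 j*stride:j*stride + size]
            out[i, j] = window.max()   # keep only the strongest activation
    return out

fmap = np.random.rand(8, 8)
print(max_pool(fmap).shape)            # (4, 4): a compressed feature map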
2.2 Issues in CNNs

Neural networks are fundamentally difficult to train because of many known issues. A few of these issues, and some compact solutions, are described here.

While a model is learning, it sometimes happens that the CNN stops learning after a while. This is termed neuron saturation, and it is a significant problem in neural networks. With a proper choice of activation function, such as ReLU or LeakyReLU, the saturation issue can be mitigated. If the neurons saturate anyway, the training process is stopped; this is known as early stopping.
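A hedged tf.keras sketch of both remedies follows; the layer sizes here are assumptions for illustration, not the architecture of Chapter 5:

from tensorflow.keras import layers, models, callbacks

# LeakyReLU keeps a small gradient for negative inputs, so neurons are
# less prone to saturation than with sigmoid or tanh activations.
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), input_shape=(512, 512, 3)))
model.add(layers.LeakyReLU())
model.add(layers.Flatten())
model.add(layers.Dense(2, activation="softmax"))
model.compile(optimizer="sgd", loss="categorical_crossentropy")

# Early stopping: halt training once the validation loss stops improving.
early_stop = callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                     restore_best_weights=True)
# model.fit(x_train, y_train, validation_split=0.1, callbacks=[early_stop])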
Other known issues, such as the vanishing gradient problem (9) and the exploding gradient problem (9), also exist. Neurons in the earlier layers learn much more slowly than neurons in the later layers; this is the vanishing gradient problem. The exploding gradient problem is its converse. Last but not least, different layers of a neural network sometimes learn at very different speeds; this is known as the unstable gradient problem.
CHAPTER 3
RELATED WORK
Correctness in screening for diabetic retinopathy is the first and foremost prerequisite for further treatment. Identification precision is important for both cost and treatment effectiveness.
A considerable amount of research work [Table 3.1] on DR classification, such as binary classification (1), three-class classification (3) and five-class classification (4) (5) (6), has been carried out, and work on the characterization and classification of DR is still advancing.

It can plainly be seen [Table 3.1] that a significant part of this work was carried out using the SVM technique, which requires feature-extraction methods before the features are fed into the SVM classifier. For five-class classification, features such as micro-aneurysms, exudates and hemorrhages were used (5). Only a very small part of the dataset was used for training, testing and validation. For real-time application, a convolutional neural network is more appropriate as far as characterization and classification, as well as prediction, are concerned.
Table 3.1: Summary of related work on DR classification.

1. GG Gardner, D Keating, T H Williamson, A T Elliott (1996). Automatic detection of diabetic retinopathy using an artificial neural network: a screening tool (1). Binary classification, neural network, 88.4% accuracy.
2. Nayak J, Bhat PS, Acharya R, Lim CM, Kagathi M (2008). Automated identification of diabetic retinopathy stages using digital fundus images (3). Three-class classification, neural network, 93.0% accuracy.
3. Acharya UR, Chua CK, Ng EY, Yu W, Chee C (2008). Application of higher order spectra for the identification of diabetes retinopathy stages (4). Five-class classification, SVM, 82.0% accuracy.
4. Acharya UR, Lim CM, Ng EY, Chee C, Tamura T (2009). Computer-based detection of diabetes retinopathy stages using digital fundus images (5). Five-class classification, SVM, 85.9% accuracy.
5. P. Adarsh, D. Jeyakumari (2013). Multiclass SVM-based automated diagnosis of diabetic retinopathy (6). Five-class classification, SVM, 96.0% accuracy.
CHAPTER 4
THE DATASET
4.1 Dataset Source

The dataset is part of a competition on DR classification hosted by Kaggle, and was collected from the Kaggle website (13). It contains a total of 88,702 images for DR classification [Table 4.1], with a total size of around 89 GB. At around 6M pixels per image, the image dimensions were really huge, so the images were re-sized to 512 x 512 pixels to cope with memory limitations. The re-sized testing and training data were barely 5.7 GB and 3.73 GB respectively. The training and testing files were sorted into their respective categories (normal, mild, moderate, severe and proliferative DR) from the original dataset [Tables 4.2, 4.3].
Table 4.2: Class distribution of the training images.

Label   DR Type                       Count
0       Normal                        25810
1       Mild Non-proliferative         2443
2       Moderate Non-proliferative     5292
3       Severe Non-proliferative        873
4       Proliferative                   708
        Total                         35126
For binary classification, categories [3, 4] were grouped as abnormal and category [0] was chosen as normal, for training and testing respectively. The reason for the grouping is that in categories [1, 2] there are no symptoms visible to the naked eye, whereas in categories [3, 4] the symptoms are clearly visible; hence these were grouped and chosen for training and testing. A total of 3600 files were chosen as training data and 400 as testing data for each category, abnormal and normal. Hence, a total of 8000 files was selected for training and testing.
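A minimal sketch of this grouping, assuming a Kaggle-style labels file with image and level columns (the file and column names are assumptions, not the thesis code):

import pandas as pd

# Hypothetical labels file: one row per fundus image, severity level 0-4.
labels = pd.read_csv("trainLabels.csv")          # columns: image, level

# Group levels 3-4 as "abnormal" and level 0 as "normal"; levels 1-2 are
# dropped because their symptoms are not visible to the naked eye.
binary = labels[labels["level"].isin([0, 3, 4])].copy()
binary["cls"] = binary["level"].map({0: "normal", 3: "abnormal", 4: "abnormal"})
print(binary["cls"].value_counts())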
4.2 Issues Related to the Dataset

From Tables 4.2 and 4.3 above, it can be seen that the classes are clearly imbalanced. Class imbalance refers to classification problems where the classes are not represented equally. A known issue with imbalanced classes is overfitting to the class with the largest number of instances. Down-sampling of the larger classes and up-sampling of the smaller classes were performed to address this issue; as the number of experiments increased, down-sampling and up-sampling were performed repeatedly to obtain optimal results, as sketched below.
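One possible way to perform such resampling, sketched with scikit-learn's resample and the label table from the previous sketch (the per-class target of 3600 mirrors the binary set-up; the helper itself is an assumption, not the thesis code):

import pandas as pd
from sklearn.utils import resample

def balance(df, target=3600, seed=42):
    # Down-sample classes above `target` and up-sample (with replacement)
    # classes below it, so every class ends up with `target` rows.
    parts = []
    for _, group in df.groupby("cls"):
        parts.append(resample(group,
                              replace=len(group) < target,
                              n_samples=target,
                              random_state=seed))
    return pd.concat(parts)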
The dataset also contained a lot of noisy data. Some images were corrupted, blurred or not properly focused, making the pixel intensity vary a lot; lighting effects also created unnecessary distortion. A few of the noisy images were removed manually.
CHAPTER 5
METHODOLOGY
5.1 Model Architecture

The convolutional neural network architecture used is shown in Table 5.1; a similar architecture was used by Harry Pratt (7). The architecture comprises 10 convolutional layers and 3 fully connected layers. The input layer was fed with 512 x 512 neurons. Each convolution layer is followed by a leaky rectified linear unit (LeakyReLU) activation function, along with batch normalization. For the pooling layers, max-pooling is used with a 3 x 3 kernel and 2 x 2 strides. (A kernel size is normally an odd number.) Before the fully-connected layers, the output of the final convolutional layer was flattened to one dimension.
5.2 Hardware and Software

For running the CNN, a GPU cluster with high-end GPUs was used; whichever GPU type was available was used, such as K20, K80, TitanX or 1080Ti. The machines provide deep neural network libraries for machine learning and training, with TensorFlow support at the back-end. Along with the GPUs, the Keras package, a high-level neural network API, was used to train and test the neural network model. The Keras documentation is available at https://keras.io.
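A hedged tf.keras sketch of an architecture of this shape follows; the filter counts and dense-layer sizes are assumptions rather than the exact values of Table 5.1:

from tensorflow.keras import layers, models

def conv_block(model, filters):
    # convolution -> LeakyReLU -> batch normalization, as described above
    model.add(layers.Conv2D(filters, (3, 3), padding="same"))
    model.add(layers.LeakyReLU())
    model.add(layers.BatchNormalization())

model = models.Sequential()
model.add(layers.InputLayer(input_shape=(512, 512, 3)))
# 10 convolutional layers with 3x3 max-pooling (stride 2) interleaved;
# the filter counts below are assumptions, not the values of Table 5.1.
for filters in [32, 64, 128, 256, 512]:
    conv_block(model, filters)
    conv_block(model, filters)
    model.add(layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))
model.add(layers.Flatten())    # flatten to one dimension before the FC layers
model.add(layers.Dense(1024, activation="relu"))
model.add(layers.Dense(512, activation="relu"))
model.add(layers.Dense(5, activation="softmax"))   # 5 DR classes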
CHAPTER 6
EXPERIMENTS
6.1 The Initial Phase

In the initial phase of the experiments, the fundus images were classified as abnormal or normal. Due to the presence of noisy and unbalanced training data, a smaller number of images was manually chosen for training. The images were pre-processed before being fed to the convolutional neural network, and augmentation was applied as well.
6.1.1 Pre-processing
As described earlier, the images contained a huge amount of noise. Thus, Gaussian-blur color normalization [Fig. 6.1] was implemented using the OpenCV (https://opencv.org/) and numpy packages. A Gaussian blur is an image-blurring filter that uses a Gaussian function to calculate the transformation applied to each pixel of the image:

G(x, y) = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}}
Along with color normalization, the images were re-sized to 512 x 512. The pre-processing algorithm (10) is the one used by Ben Graham, one of the winners of the Kaggle diabetic retinopathy competition; the algorithm is stated below:
Algorithm 1 Gaussian Pre-processing

import glob
import cv2

def pre_processing(source):
    # iterate over every image file matching the source pattern
    for file in glob.glob(source):
        inputImage = cv2.imread(file, 1)                         # read as a color image
        resizedImage = cv2.resize(inputImage, (512, 512))        # re-size to 512 x 512
        gaussBlur = cv2.GaussianBlur(resizedImage, (0, 0), 10)   # kernel derived from sigma = 10
        # subtract the local average color: 4*image - 4*blur + 128
        processedImage = cv2.addWeighted(resizedImage, 4, gaussBlur, -4, 128)
        yield processedImage                                     # one processed image per file
6.1.2 Augmentation
After pre-processing, the images were fed into the convolutional neural network for training. During every epoch, the training images were randomly augmented; details of the augmentation performed can be found in Table 6.1.
Note that augmentation is performed on the training data only, not on the test data; re-scaling of the input images, however, is performed on both the training and the testing data, as in the sketch below.
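As an illustration only (the exact values belong to Table 6.1 and are not reproduced here), a Keras set-up of this kind might look as follows; all ranges shown are assumptions:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation applies to the training data only; both generators share
# the 1/255 re-scaling.
train_gen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=90,
    horizontal_flip=True,
    vertical_flip=True,
    zoom_range=0.1,
)
test_gen = ImageDataGenerator(rescale=1.0 / 255)   # no augmentation at test time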
Figure 6.1: An original image alongside its pre-processed and augmented versions.
6.2 The Final Phase

In the final phase, the issues with the dataset described before needed to be resolved. A few of the noisy images, such as totally blank, corrupted, unfocused or blurred images, were removed manually. The dataset was also heavily unbalanced; down-sampling and up-sampling were used to balance the training dataset.
6.2.1 Pre-processing
The Gaussian pre-processing technique [6.1.1] used for binary classification in the initial phase was used again.
6.2.2 Augmentation
The training data was augmented with the values provided in Table 6.1, with a few more attributes added relative to the initial phase.
6.3 Training
After the set-up was complete, the CNN was trained with the training dataset split into 90% training and 10% validation. An initial run of 30 epochs was used to train the neural network; this was basically done to check whether the chosen parameters could handle the network. It was found that the validation accuracy fluctuated a lot. The hyper-parameters were therefore tuned, and many experiments were performed to tune and train the convolutional network. For tuning the learning rate: if the loss is too high, the learning rate is decreased by a factor of 10, and if the loss is too low, it is increased by a factor of 10. A stochastic gradient descent optimizer with Nesterov momentum was used for training. By trial and error, an optimal result was found. Some of the basic hyper-parameters for the different experiments performed are described in Table 6.2.
Table 6.2: Basic hyper-parameters across experiments (fragment).

             Exp. 1   Exp. 2   Exp. 3   Exp. 4
batch_size   32       16       32       32
epochs       50       100      94       57
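A minimal tf.keras sketch of this training set-up, assuming the model from Chapter 5 and pre-processed arrays x_train and y_train; the learning rate shown is only a starting assumption:

from tensorflow.keras.optimizers import SGD

# SGD with Nesterov momentum; the learning rate was tuned manually by
# factors of 10, so the value here is illustrative.
opt = SGD(learning_rate=0.01, momentum=0.9, nesterov=True)
model.compile(optimizer=opt,
              loss="categorical_crossentropy",
              metrics=["accuracy"])
history = model.fit(x_train, y_train,
                    validation_split=0.1,   # 90/10 train/validation split
                    batch_size=32,
                    epochs=50)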
Figure 6.2: Accuracy and loss curves for binary classification (Experiment 1).
Figure 6.3: Accuracy and loss curves for binary classification (Experiment 2).
Figure 6.4: Accuracy and loss curves for multi-class classification (Experiment 3).
Figure 6.5: (a) Accuracy and (b) loss curves for multi-class classification (Experiment 4).
CHAPTER 7
RESULTS
7.1 Results
For prediction, a total of 800 testing images for binary classification and 53,312 testing images for multi-class classification, belonging to the different categories, were used. Since the task is image classification, sensitivity, specificity, accuracy, precision, F-score and support need to be defined, both in terms of the confusion matrix and for the classification report.
i. Sensitivity
Sensitivity is the fraction of actual positive classes that are correctly predicted. Sensitivity is also termed recall or the true positive rate, and should be as high as possible.

sensitivity = \frac{TruePositive}{TruePositive + FalseNegative}
ii. Specificity
Specificity, or the true negative rate, is the fraction of actual negative classes that are correctly predicted.

specificity = \frac{TrueNegative}{TrueNegative + FalsePositive}
iii. Accuracy

Accuracy is the fraction of correct predictions with respect to all classes, and should be as high as possible.

accuracy = \frac{TruePositive + TrueNegative}{TruePositive + TrueNegative + FalsePositive + FalseNegative}

iv. Precision

Precision is the fraction of predicted positive classes that are actually positive. Precision should also be high.

precision = \frac{TruePositive}{TruePositive + FalsePositive}
v. F-score
The F-score (F1-score) is the harmonic mean of sensitivity (recall) and precision. When one model has high recall and low precision (or vice versa), models are hard to compare; the F-score is used to make them comparable. The best score is 1, while the worst is 0.

F_1 = \frac{2 \times precision \times sensitivity}{precision + sensitivity}
vi. Support
Support is the count of actual occurrences of the data in a particular class. It indicates whether the class counts are balanced or unbalanced.
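These quantities can be computed directly from a confusion matrix; a small scikit-learn sketch with toy labels (not the thesis data):

import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

# Toy binary labels; in the thesis these would be the 800 test images.
y_true = np.array([0, 0, 1, 1, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)                 # recall / true positive rate
specificity = tn / (tn + fp)                 # true negative rate
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)

# per-class precision, recall, f1-score and support:
print(classification_report(y_true, y_pred))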
[Confusion matrices for binary classification (classes 0-1) and for multi-class classification (classes 0-4).]
Due to the use of drop-out, the validation accuracy curve sometimes tends to lie above the training accuracy curve. Batch normalization, applied after each convolutional layer, normalizes the activation values after every epoch; hence a steep rise in the validation accuracy curve can be seen.

The loss curves look good enough, as the cross-entropy loss stays below 1.0. Testing on the convolutional neural network took approximately 14.8 seconds for binary classification, while multi-class classification took around 180 seconds. The final accuracy achieved for multi-class classification is 84%.
CHAPTER 8
CONCLUSION
This report shows that diabetic retinopathy screening, with respect to both binary and multi-class classification, can be performed with much higher accuracy than the actual result found here. However, in certain cases where the images were out of focus, the algorithm fails to identify some of the DR features. Consequently, in future work, the algorithm will be improved to handle the impact of unfocused images.
REFERENCES
[1] Gardner GG, Keating D, Williamson TH, Elliott AT. Automatic detection of diabetic retinopathy using an artificial neural network: a screening tool. 1996.
[2] Markku Kuivalainen. Retinal Image Analysis Using Machine Vision. Thesis report, Lappeenranta University of Technology, Dept. of IT, 2005.
[3] Nayak J, Bhat PS, Acharya R, Lim CM, Kagathi M. Automated identification of diabetic retinopathy stages using digital fundus images. J Med Syst 2008;32(2):107-115.
[4] Acharya UR, Chua CK, Ng EY, Yu W, Chee C. Application of higher order spectra for the identification of diabetes retinopathy stages. J Med Syst 2008;32(6):481-488.
[5] Acharya UR, Lim CM, Ng EY, Chee C, Tamura T. Computer-based detection of diabetes retinopathy stages using digital fundus images. P I Mech Eng H 2009;223(5):545-553.
[6] P. Adarsh, D. Jeyakumari. Multiclass SVM-based automated diagnosis of diabetic retinopathy. 2013.
[7] Harry Pratt, Frans Coenen, Deborah M Broadbent, Simon P Harding, Yalin Zheng. Convolutional Neural Networks for Diabetic Retinopathy. Procedia Computer Science 90, 2016:200-205.
[8] Linda Roach. Artificial Intelligence: The Next Step in Diagnostics. EyeNet, November 2017.
[9] Pascanu, Razvan et al. Understanding the exploding gradient problem. CoRR, 2012.
[10] Ben Graham (2015). Kaggle Diabetic Retinopathy Detection competition report. URL http://blog.kaggle.com/2015/09/09/diabetic-retinopathy-winners-interview-1st-place-ben-graham/