ECPD
Submitted by:
1. Ephrata Abera ............................................. RCS/059/15
2. Eyuel Lemma .............................................. RCS/046/16
3. Natnael Abebe ............................................ RCS/090/16
4. Nigus Redae .............................................. RCS/092/16
5. Mekuanint Dires .......................................... RCS/095/15
Advised by:
Kedir S. (MSc.)
Date: 1/21/2021 GC
Dilla, Ethiopia
CERTIFICATE
The project titled
in
Computer Science
from
Dilla University
Advisor:___________________________________________________________________
Examiner:__________________________________________________________________
Examiner:__________________________________________________________________
Examiner:__________________________________________________________________
ACKNOWLEDGMENT
First and foremost, praises and thanks to God, the Almighty, for His showers of blessings throughout our project work, enabling us to complete this part of the project successfully.
We would like to express our deep and sincere gratitude to our project advisor, Kedir (M.Sc.), Computer Science Department, School of Computing and Informatics, Dilla University incubation center, for giving us the opportunity to do this project and for providing invaluable guidance throughout. Our advisor's dynamism, vision, sincerity and motivation have deeply inspired us. He taught us the methodology for carrying out the project and for presenting the project work as clearly as possible. It was a great privilege and honor to work and study under his guidance. We are extremely grateful for what he has offered us. We would also like to thank him for his friendship, empathy, and great sense of humor.
We are extremely grateful to our parents for their love, prayers, care and sacrifices in educating and preparing us for our future. We are very thankful to our mothers and fathers for their love, understanding, prayers and continuing support in completing this work. We also express our thanks to our sisters and brothers for their support and valuable prayers.
We extend our thanks to the computer science students of Dilla University for their support during our project work. We also thank all the staff of the project section of Dilla University, Dilla, for their kindness.
TABLE OF CONTENTS
CHAPTER 1 1
1. INTRODUCTION 1
1.1 Background 1
1.2.1 Vision 2
1.2.2 Mission 2
1.2.3 Purpose 2
1.3 Statement of the Problem 2
1.6.1 Scope 18
1.6.2 Limitation 18
1.7 Application of the Project 19
CHAPTER 2 20
2.3 Summary 22
CHAPTER 3 24
3. SYSTEM ANALYSIS 24
3.1 Introduction 24
3.2.1 Introduction 24
3.2.2 Model of the Existing System 24
3.2.3 Business Rules 25
3.2.4 Limitation of the Existing System 25
3.3 Proposed system 26
4. SYSTEM DESIGN 50
4.4 User Interface Design 55
CHAPTER 5 63
5. EXPERIMENT 63
5.1 Introduction 63
CHAPTER 6 74
6.1 Conclusion 74
6.2 Recommendation 74
REFERENCE i
APPENDIX iii
LIST OF FIGURES
FIGURE 30 COMPONENT DIAGRAM FOR SUB DECOMPOSITION OF THE SYSTEM ........................ 54
FIGURE 31 HARDWARE/ SOFTWARE MAPPING ......................................................................... 55
FIGURE 32 USER INTERFACE DIAGRAM ................................................................................... 56
FIGURE 33 HOME PAGE (A) INTERFACE OF THE SYSTEM ......................................................... 57
FIGURE 34 HOME PAGE (B) OF THE USER INTERFACE .............................................................. 58
FIGURE 35 HOME PAGE (C) USER INTERFACE .......................................................................... 59
FIGURE 36 BREAST CANCER PROGNOSIS INTERFACE ............................................................... 60
FIGURE 37 BREAST CANCER DIAGNOSIS INTERFACE .................................................................. 61
FIGURE 38 SKIN CANCER DIAGNOSIS USER INTERFACE ........................................................... 62
FIGURE 39 SAMPLE DATASET OF BREAST CANCER PROGNOSIS MODEL .................................... 66
FIGURE 40 BREAST CANCER FREE HISTOPATHOLOGICAL IMAGE ............................................... 67
FIGURE 42 SAMPLE DATASET FOR CERVICAL CANCER PROGNOSIS .......................................... 68
FIGURE 43 VISUALIZATION OF THE TRAINING DATASET .......................................................... 71
FIGURE 44 MODEL COMPILING ............................................................................................... 72
FIGURE 45 PROTOTYPE MODEL PREDICTING THE OUTCOME .................................................... 72
FIGURE 46 ACCURACY OF BREAST CANCER PROGNOSIS MODEL .............................................. 73
FIGURE 47 ACCURACY OF CERVICAL CANCER PROGNOSIS MODEL .......................................... 73
LIST OF TABLES
TABLE 1 PROJECT DESIGN METHODOLOGY ............................................................................... 5
TABLE 2 A NOTE FOR MISSING DATA ANALYSIS ........................................................................ 7
TABLE 3 UNIVARIATE STATISTICS ............................................................................................ 8
TABLE 4 CERVICAL CANCER DATASET DESCRIPTION ................................................................. 9
TABLE 5 CERVICAL CANCER MISSING DATA NOTES .................................................................. 10
TABLE 6 CERVICAL CANCER DATASET UNIVARIATE STATISTICS ............................................. 11
TABLE 7 ESSENTIAL USE CASE DIAGRAM OF THE EXISTING SYSTEM........................................ 25
TABLE 8 LIST OF OBJECTS AND THEIR ATTRIBUTES ................................................................. 46
TABLE 9 DATASET DESCRIPTION ............................................................................................. 65
LIST OF ACRONYMS
ABSTRACT
In traditional cancer diagnosis, pathologists examine biopsies to make diagnostic assessments
largely based on cell morphology and tissue distribution. However, this is subjective and often
leads to considerable variability. On the other hand, computational diagnostic tools enable
objective judgments by making use of quantitative measures. This project presents a systematic
method of the computational steps in automated cancer diagnosis based on histopathology.
These computational steps are: (1) image preprocessing to determine the focal areas, (2) feature extraction to quantify the properties of these focal areas, and (3) classifying the focal areas as malignant or not, or identifying their malignancy levels. In Step 1, the focal area determination
is usually preceded by noise reduction to improve its success. In the case of cellular-level
diagnosis, this step also comprises nucleus/cell segmentation. Step 2 defines appropriate
representations of the focal areas that provide distinctive objective measures. In Step 3,
automated diagnostic systems that operate on quantitative measures are designed. After the
design, this step also estimates the accuracy of the system. In this project, we detail these computational steps and address their challenges, emphasizing the importance of constituting benchmark data sets. Such benchmark data sets allow comparing different features and system designs and prevent misleading accuracy estimates. This, in turn, allows determining the subsets of distinguishing features, devising new features, and improving the success of automated cancer diagnosis.
CHAPTER 1
1. INTRODUCTION
1.1 Background
Cancer is a group of more than 100 different diseases. It can develop almost anywhere in the
body [1]. Cells are the basic units that make up the human body. Cells grow and divide to make
new cells as the body needs them. Usually, cells die when they get too old or damaged. Then,
new cells take their place [2]. Cancer begins when genetic changes interfere with this orderly
process. Cells start to grow uncontrollably. These cells may form a mass called a tumor. A
tumor can be cancerous or benign. A cancerous tumor is malignant, meaning it can grow and
spread to other parts of the body. A benign tumor means the tumor can grow but will not
spread.
Breast cancer is one of the most prevalent and common forms of cancer [3]. This condition largely affects women, though in very rare cases it also affects certain men. Breast cancer develops in the cells of the breast, wherein certain cells begin to grow rapidly and abnormally. This results in the accumulation of lumps or a mass of tissue.
Skin cancer is the out-of-control growth of abnormal cells in the epidermis, the outermost skin
layer, caused by unrepaired DNA damage that triggers mutations. These mutations lead the
skin cells to multiply rapidly and form malignant tumors. The main types of skin cancer
are basal cell carcinoma (BCC), squamous cell carcinoma (SCC), melanoma and Merkel cell
carcinoma (MCC). All of these types of cancer have been observed in Ethiopia. The cervix is the lower part of the uterus, the place where a baby grows during pregnancy.
Cervical cancer is caused by a virus called HPV. The virus spreads through sexual contact.
Most women's bodies are able to fight HPV infection. But sometimes the virus leads to cancer.
Cervical cancer is the fourth most frequent cancer in women, with an estimated 570,000 new cases in 2018, representing 6.6% of all female cancers. Approximately 90% of deaths from cervical cancer occurred in low- and middle-income countries [4].
produce incomplete or misleading results. This project uses different data mining, machine learning and image processing techniques in order to predict breast cancer and cervical cancer from given parameters, and to detect breast and skin (melanoma) cancer from histopathological and lesion images.
1.2.1 Vision
The vision of this CAD system is for cancer prediction and diagnosis to become easier, cheaper and widely distributed across the country, with good performance and acceptability.
1.2.2 Mission
To give the world a fast, efficient, accurate, secure and effective computer-aided system for the prediction and diagnosis of cancer.
To fill the gap of pathologist insufficiency.
To decrease the time taken by laboratory tests and increase the efficiency of result prediction.
1.2.3 Purpose
Many patients can benefit from an accurate and fast CAD-based cancer detection system. This is an efficient system that an examiner can use either as a support system or as a main system for diagnosing and predicting. It allows pathologists to verify how accurate their predictions are when they use the system as a support mechanism for cancer detection and diagnosis.
are essential to curbing this growing epidemic. Listed below are some of the problems of main concern:
1.5 Methodology
1.5.1 Literature Review
We made a review survey entitled “Histopathological Image Classification for Breast Cancer Detection”, which helped us review different papers. Based on that, we put together some narratives to understand the current state of the research area.
1-on-1 Interviews
For highly personalized data we used 1-on-1 interviews, in order to gather information from cancer patients about their current situations.
Question: Which type of cancer were you diagnosed with?
Answer: Cervical cancer
Question: How old were you when you learned you were a cancer patient?
Answer: 34
Direct observation
To learn about the existing system, we directly observed how the system is arranged at Black Lion Hospital.
Surveys
We prepared a survey on each functional requirement. We read different research papers and wrote a survey titled “A Survey Report on Histopathological Image Classification for Breast Cancer Detection” (this survey is included in Appendix A).
Focus Group
We discussed with different doctors who have deep knowledge of cancer. An example discussion topic: “How do we classify a cell as benign or malignant?”.
phases. These phases are object-oriented analysis and object-oriented design. It increases consistency among analysis, design, implementation and testing. It also allows the reusability of code [5].
For the design process, we selected the agile method so we can keep our data and all related information updated.
This dataset is taken from the UCI Machine Learning Repository (University of Wisconsin) and was made available with the support of the National Science Foundation. Each record represents follow-up data for one breast cancer case. These are consecutive patients seen by Dr. Wolberg since 1984, and include only those cases exhibiting invasive breast cancer [6]. The first 30
features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass.
They describe characteristics of the cell nuclei present in the image.
1) ID number
2) Outcome (R = recur, N = non-recur)
3) Time (recurrence time if field 2 = R, disease-free time if field 2 = N)
4-33) Ten real-valued features are computed for each cell nucleus:
a) radius (mean of distances from center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter^2 / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension ("coastline approximation" - 1)
35) Lymph node status - number of positive axillary lymph nodes observed at time of
surgery
The missing values from the dataset above are shown below with their necessary details:
Table 2 A note for missing data analysis
[SPSS missing-value analysis output: Filter <none>; Weight <none>; /MPATTERN]
Table 3 Univariate Statistics
[SPSS univariate statistics output]
Of the dataset's 198 entries, 151 are non-recur and 47 are recur.
Preprocessing
The first step is to fill in the missing data. The missing lymph node values are filled using a standard statistical method for handling missing data: median imputation, applied with the pandas fillna() function.
The ID column is not useful at this stage, so another preprocessing step is dropping it. The ID column is dedicated to identifying patients uniquely for the doctor, so that histories can be recorded correctly; for learning purposes it is not required.
We then map 0 for benign and 1 for malignant: the column holding each patient's final result (whether they developed breast cancer or not) is encoded as 0 or 1.
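The preprocessing steps above can be sketched with pandas; the toy frame and column names here ("id", "lymph_nodes", "outcome") are illustrative stand-ins, not the dataset's exact headers:

```python
import pandas as pd

# Toy frame standing in for the prognosis data; real data has ~198 rows.
df = pd.DataFrame({
    "id": [10, 11, 12, 13],
    "lymph_nodes": [1.0, None, 3.0, None],
    "outcome": ["N", "R", "N", "R"],
})

# 1) Median imputation for the missing lymph node counts via fillna().
df["lymph_nodes"] = df["lymph_nodes"].fillna(df["lymph_nodes"].median())

# 2) Drop the ID column: it identifies patients but carries no learning signal.
df = df.drop(columns=["id"])

# 3) Encode the outcome as 0/1 (here N = non-recur -> 0, R = recur -> 1).
df["outcome"] = df["outcome"].map({"N": 0, "R": 1})

print(df)
```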
Model
Our model is built on an artificial neural network with logistic regression. The activation functions are ReLU and sigmoid, with an estimated accuracy of 0.95. This structure may be changed if it leads to unexpected or below-expected results.
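A minimal NumPy sketch of such a forward pass (ReLU hidden layer, sigmoid output) under the 30-feature input described earlier; the hidden size and random weights are illustrative, not the trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes: 30 input features (as in the FNA feature set), one hidden layer.
n_in, n_hidden = 30, 16
W1 = rng.normal(scale=0.1, size=(n_in, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, 1))
b2 = np.zeros(1)

def predict_proba(x):
    """Forward pass: ReLU hidden layer, sigmoid output giving a probability."""
    h = relu(x @ W1 + b1)
    return sigmoid(h @ W2 + b2)

x = rng.normal(size=n_in)        # one (standardized) patient feature vector
p = float(predict_proba(x))
print(p)                          # a probability in (0, 1)
```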
Cervical cancer (Risk Factors) Data Set [7]: This dataset focuses on the prediction of indicators/diagnosis of cervical cancer. The features cover demographic information, habits, and historic medical records. It is supported by the National Science Foundation. Below is a detailed description of the dataset.
Data Set Characteristics: Multivariate
Number of Instances: 858
Area: Life
Attribute Characteristics: Integer, Real
Number of Attributes: 36
Date Donated: 2017-03-03
Associated Tasks: Classification
Missing Values? Yes
Number of Web Hits: 102864
The tables below show the missing values from the dataset with some descriptions.
Table 5 Cervical cancer missing data notes
[SPSS missing-value analysis output: Filter <none>; Weight <none>; methods /LISTWISE, /PAIRWISE, /EM(TOLERANCE=0.001 CONVERGENCE=0.0001 ITERATIONS=25)]
Table 6 Cervical cancer dataset univariate statistics
Preprocessing
As there are missing values and also a column to drop, statistical data mining techniques are used for preprocessing, especially for handling the missing values, such as scaling and dropping tuples.
Model
The model is an ANN with an SVM classifier. This structure may be changed if it leads to below-expected performance.
The four boolean target variables (Hinselmann, Schiller, Cytology, Biopsy) are each used as a class label for our SVM (SV classifier). We have four models, as the diagnosis is taken from four different methods.
The first is the Hinselmann model, which predicts whether the person will develop cervical cancer based on the Hinselmann perspective. The Schiller model predicts based on the Schiller perspective. The Cytology model predicts based on cytological analysis. The Biopsy model predicts based on biopsy analysis. The estimated average accuracy is 90%.
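The one-model-per-target scheme can be sketched with scikit-learn's SVC; the features and labels below are synthetic stand-ins, not the real 858-instance risk-factor dataset:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(42)

# Synthetic stand-in for the risk-factor features (the real dataset is 858 x 36).
X = rng.normal(size=(200, 8))
targets = ["Hinselmann", "Schiller", "Cytology", "Biopsy"]

# Fabricated labels: each target is a threshold on one feature, purely so the
# sketch has something separable to fit.
y = {name: (X[:, i] > 0).astype(int) for i, name in enumerate(targets)}

# One SVM classifier per diagnosis method, as described above.
models = {}
for name in targets:
    clf = SVC(kernel="rbf")
    clf.fit(X, y[name])
    models[name] = clf

acc = {name: models[name].score(X, y[name]) for name in targets}
print(acc)
```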
The original dataset consisted of 162 whole mount slide images of breast cancer (BCa) specimens scanned at 40x. From these, 277,524 patches of size 50 x 50 were extracted (198,738 IDC negative and 78,786 IDC positive). Each patch's file name is of the format u_xX_yY_classC.png, for example 10253_idx5_x1351_y1101_class0.png, where u is the patient ID (10253_idx5), X is the x-coordinate of where the patch was cropped, Y is the y-coordinate of where the patch was cropped, and C indicates the class, where 0 is non-IDC and 1 is IDC.
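The naming scheme can be parsed mechanically when building the label list; a small sketch (the regular expression and helper name are ours, not part of the dataset's tooling):

```python
import re

# Parse the patch naming scheme u_xX_yY_classC.png described above.
PATCH_RE = re.compile(r"^(?P<u>.+)_x(?P<x>\d+)_y(?P<y>\d+)_class(?P<c>[01])\.png$")

def parse_patch_name(name):
    m = PATCH_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized patch name: {name}")
    return {
        "patient_id": m.group("u"),
        "x": int(m.group("x")),
        "y": int(m.group("y")),
        "label": int(m.group("c")),   # 0 = non-IDC, 1 = IDC
    }

info = parse_patch_name("10253_idx5_x1351_y1101_class0.png")
print(info)
```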
We will split the dataset into training, validation and testing sets, as described in figure 5 below.
Figure 2 - Data splitting visualization
Preprocessing
Most of the pixels in the image are redundant and do not contribute substantially to the intrinsic information of an image [8]. While dealing with AI networks, it is required to eliminate them to
avoid unnecessary computational overhead. This can be achieved by compression techniques. We
begin the implementation of our deep net by processing the images in the dataset. This is achieved
with the help of the OpenCV library in Python. There are many other modules that can be used in
this step e.g. MATLAB or other image processing libraries or software. This is necessary to
remove redundancy from the input data which only contributes to the computational complexity
of the network without providing any significant improvements in the result. The aspect ratio of
the original slide is preserved since both the dimensions are reduced by a factor of 2, giving an
image which is 1/4th in area, that is, of dimension 350×230 pixels. Then, as the dataset is unbalanced, we apply data augmentation techniques in order to balance it. Finally, the images are resized (50×50) and reshaped, ready to be input to the CNN.
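A minimal sketch of the halving step above; the project uses OpenCV for this, but plain NumPy stride-2 slicing shows the same idea self-contained (the fake slide size is illustrative):

```python
import numpy as np

def downscale_by_two(img):
    """Halve both spatial dimensions (stride-2 subsampling), preserving the
    aspect ratio and yielding an image 1/4th of the original area. The project
    text performs this step with OpenCV; NumPy slicing is used here so the
    sketch is self-contained."""
    return img[::2, ::2]

# A fake 700x460 "slide"; real slides would be loaded with an image library.
slide = np.zeros((460, 700), dtype=np.uint8)
small = downscale_by_two(slide)
print(small.shape)   # (230, 350): same aspect ratio, 1/4 the area
```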
Feature extraction
Feature learning is a crucial step in the classification process for both humans and machine algorithms. A study has shown that the human brain is sensitive to shapes, while computers are more sensitive to patterns and texture [9]. Because of this, feature learning is entirely different
for manual versus machine. In the visual context, malignant tumors tend to have large and irregular
nuclei or multiple nuclear structures. The cytoplasm also undergoes changes, wherein new
structures appear, or normal structures disappear. Malignant cells have a small cytoplasmic
amount, frequently with vacuoles. In this scenario, the ratio of cytoplasm to nucleus decreases [10].
All of these features are examined by experts, or algorithms are developed to quantify these
features to automate detection. This approach is difficult and imprecise as selection and
quantification involve various unknown errors that are hard to address. In the case of supervised learning, we do not need to provide these features explicitly. In this case, images are fed to an architecture such as a CNN, along with their class as a label (benign or malignant). Through the automatic update of filter values during the training process, the CNN is able to extract the computational features.
In our proposed architecture, the convolutional neural network is made up of two types of layers:
1. Convolutional Layers
2. Pooling layers
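The two layer types can be illustrated with a minimal NumPy sketch; the toy image and the 1x2 difference filter are illustrative, not the project's actual kernels:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as in most CNN libraries):
    each output pixel is the dot product of an image patch with the kernel."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keeps the strongest activation per block,
    lowering dimensionality (and helping against overfitting)."""
    h = feature_map.shape[0] // size
    w = feature_map.shape[1] // size
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = feature_map[i * size:(i + 1) * size,
                                    j * size:(j + 1) * size].max()
    return out

img = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 "image"
edge_kernel = np.array([[1.0, -1.0]])            # toy horizontal-difference filter
fmap = conv2d(img, edge_kernel)                   # -> shape (6, 5)
pooled = max_pool(fmap)                           # -> shape (3, 2)
print(fmap.shape, pooled.shape)
```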
Model
CNN is a modified variety of deep neural net which depends upon the correlation of neighboring pixels. It uses randomly defined patches for input at the start, and modifies them in the training process. Once training is done, the network uses these modified patches to predict and validate the result in the testing and validation process [11]. Convolutional neural networks have achieved success in the image classification problem, as the defined nature of CNN matches the data point distribution in the image. As a result, many image processing tasks adapt CNN for automatic feature extraction [12], [13], [14]. CNN is frequently used for image segmentation and medical image processing as well [15].
The CNN architecture has two main types of transformation. The first is convolution, in which
pixels are convolved with a filter or kernel. This step provides the dot product between image
patch and kernel. The width and height of filters can be set according to the network, and the depth
of the filter is the same as the depth of the input. A second important transformation is
subsampling, which can be of many types (max_pooling, min_pooling and average_pooling) and
used as per requirement. The size of the pooling filter can be set by the user and is generally an odd number. The pooling layer is responsible for lowering the dimensionality of the data, and is quite useful for reducing overfitting. After using a combination of convolution and pooling layers,
the output can be fed to a fully connected layer for efficient classification. The visualization of the
entire process is presented in fig 4.
For that, we chose a CNN with a softmax output for the classification.
Classification
The process of classification is done by taking the flattened, weighted feature map obtained from the final pooling layer and using it as input to the fully connected network, which calculates the loss and modifies the weights of the internal hidden nodes accordingly. The estimated performance is 85% in terms of accuracy.
The dataset is taken from the ISIC Archive. The overarching goal of the ISIC Melanoma Project is to support
efforts to reduce melanoma-related deaths and unnecessary biopsies by improving the accuracy
and efficiency of melanoma early detection. To this end the ISIC is developing proposed digital
imaging standards and creating a public archive of clinical and dermoscopic images of skin lesions.
The ISIC Archive contains over 23,906 images of skin lesions, labeled as 'benign' or 'malignant'.
Model
CNN methods used to classify skin lesions are presented. CNNs can be used to classify skin lesions
in two fundamentally different ways. On the one hand, a CNN pretrained on another large dataset,
such as ImageNet [16], can be applied as a feature extractor. In this case, classification is
performed by another classifier, such as k-nearest neighbors, support vector machines, or artificial
neural networks. On the other hand, a CNN can directly learn the relationship between the raw
pixel data and the class labels through end-to-end learning. In contrast with the classical workflow
typically applied in machine learning, feature extraction becomes an integral part of classification
and is no longer considered as a separate, independent processing step.
Because publicly available datasets are limited, a common method of skin lesion classification involves transfer learning. Such works pretrain a CNN on the ImageNet dataset; next, the weighting parameters of the CNN are fine-tuned to the actual classification problem. So we also use transfer learning for the classification of lesion images. The estimated F1 score is 0.82.
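The fine-tuning idea can be sketched without a deep-learning framework: freeze a stand-in "pretrained" feature extractor and train only a new classification head on top. Everything below (data, sizes, weights) is synthetic and illustrative, not the project's actual network:

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Stand-in for a pretrained CNN body: a frozen random projection + ReLU.
# In the real workflow these weights would come from ImageNet pretraining.
W_frozen = rng.normal(size=(20, 64)) / np.sqrt(20)

def extract_features(X):
    return relu(X @ W_frozen)        # frozen: never updated below

# Toy "lesion" data with a learnable rule (label depends on two inputs).
X = rng.normal(size=(300, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

F = extract_features(X)
F = F - F.mean(axis=0)               # center features for stable training

# Fine-tuning step: train only the new head (logistic regression) on top.
w = np.zeros(F.shape[1])
b = 0.0
for _ in range(2000):
    p = sigmoid(F @ w + b)
    w -= 0.5 * F.T @ (p - y) / len(y)   # log-loss gradient w.r.t. head weights
    b -= 0.5 * np.mean(p - y)

acc = np.mean((sigmoid(F @ w + b) > 0.5) == (y == 1.0))
print(acc)
```

The design point: because the extractor stays frozen, only the small head is optimized, which is exactly why transfer learning works with limited data and compute.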
Unit Testing: We perform unit testing on the system, which is the act of testing software at the most basic (object) level. For unit testing we will check the system's functionalities by running the code line by line.
Functional Testing: As functional testing means making sure all the functions or use cases are working properly, we will run functional tests on all functionalities of the system one by one.
Performance Testing: In this kind of test we will check the system for performance issues.
Security testing: A collection of tests focused on probing an application's security, or its
ability to protect user assets.
System Testing: After finishing the above kinds of testing, we will proceed to testing the project as one system, since system testing combines multiple features into an end-to-end scenario.
Acceptance Testing: Also known as acceptance tests, build verification tests, or basic verification tests, these are rudimentary tests which prove whether or not a given build is worth deeper testing. With this, we will make sure the system is as customers need it.
Flash Disk: at minimum 4 GB, used for transferring data from one computer to another.
Paper (A3, A4): used to print the documentation and draw diagrams.
Pencil: To draw diagrams.
Printer: Used to print the documentation.
Software tools
Operating system: Linux (Ubuntu-18.0 LTS).
PSPP: for data analysis.
Browser: chedot, chrome.
Microsoft Azure: For hosting.
Visual studio code: for coding.
Github: For version controlling and team work.
Documentation tool:
LibreOffice Writer: free word-processing software.
LibreOffice Impress: free presentation software.
Predicts if a patient (currently having a benign tumor) will develop breast cancer based on
features computed.
Predicts likelihood of developing cervical cancer based on lifestyle habits.
Accepts a breast histopathological image and predicts/shows the cancer on that image.
Accepts a skin lesion image and predicts it as melanoma positive or negative.
An interactive interface for the user.
1.6.2 Limitation
For cervical cancer, only the prognosis is implemented, not the diagnosis. Currently this type of cancer is very difficult to diagnose or screen for, and we could not get tangible information about it.
1.7 Application of the Project
Breast cancer prediction/prognosis based on FNA of breast mass.
Cervical cancer prediction/prognosis through life style habits.
Breast/IDC cancer classification/diagnosis on histopathology images of the breast.
Skin/melanoma cancer diagnosis based on given lesion images.
Benefit of the Project
Ethiopia has very limited machines to detect and diagnose cancer, so this project will help in the following basic aspects:
Cost minimization
Time minimization
As the project will be simple to use (user friendly), anyone with a little knowledge of health (especially of cancer) can access it and see the results without going to Black Lion Hospital or other health centers, which decreases the number of patients coming to every hospital.
Pathologists can easily and efficiently identify cancers with the help of this project.
The project can even serve in the role of a pathologist.
CHAPTER 2
2. LITERATURE REVIEW AND RELATED WORK
2.1 Literature Review
For many real-world problems, it is necessary to build extremely accurate and understandable
classification models. Especially in the medical domain, there is growing demand for Artificial
Intelligence (AI) approaches, which are not only well performing, but trustworthy, transparent,
interpretable and explainable. This would allow medical professionals to have possibilities to
understand how and why a machine learning algorithm arrives at its decision, which will enhance
trust of medical professionals in AI systems. In recent years, some machine learning models have
significantly improved the ability to predict the future condition of a patient. Although these
models are very accurate, the inability to explain the predictions from accurate, complex models
is a serious limitation. For this reason, machine learning methods employed in clinical applications
avoid using complex, yet more accurate, models and retreat to simpler interpretable models at the
expense of accuracy.
Caffe (Convolutional Architecture for Fast Feature Embedding) convolutional neural network is
a deep learning framework, originally developed at University of California, Berkeley. It is open
source, under a BSD license. It is written in C++, with a Python interface.
When practicing machine learning, training a model can take a long time. Creating a model
architecture from scratch, training the model, and then tweaking the model is a massive amount of
time and effort. A far more efficient way to train a machine learning model is to use an architecture
that has already been defined, potentially with weights that have already been calculated. This is
the main idea behind transfer learning: taking a model that has already been used and repurposing it for a new task [11].
The texture convolutional neural network (TCNN) replaces handcrafted features based on Local Phase Quantization (LPQ) and Haralick descriptors (HD), with the advantage of learning an appropriate textural representation and the decision boundaries in a single optimization process.
cancer images. Spanhol et al. published a data set, named as BreaKHis, for histopathological
classification of breast cancer and suggested a test protocol by which the experiment obtained 80%
to 85% accuracy using SVM, LBP (Local Binary Pattern), and GLCM (Gray Level Co-occurrence
Matrix) [8]. Convolutional Neural Network(CNN) is known to achieve high performance in image
recognition and natural language processing through pattern analysis. CNN is a specific type of
neural network, which is a feed-forward neural network with convolutional layer, pooling layers
and fully connected layers as its hidden layer. Due to its outstanding performance, CNN is used
widely in many fields, especially in computer vision. Specific reviewed research works are presented below.
A method for classifying medical images using transfer learning: A pilot study on
histopathology of breast cancer.
Recent research using transfer learning has obtained prominent results in image analysis. Transfer learning is a method that adapts a pre-trained model, already trained in a specific domain, to another knowledge domain. The transfer learning method is known to be very useful when the data is not enough or when training time and computing resources are restricted. The above research provides a method for classifying medical images using transfer learning. In this paper, they built a deep convolutional neural network (CNN, ConvNet) model to classify breast cancer histopathological images into
malignant and benign classes. In addition to data augmentation, they applied the transfer learning technique to overcome the insufficient data and training time.
BreaKHis images do not have the same shapes found in large-scale image datasets that are commonly used to train CNNs, such as ImageNet or CIFAR. Therefore, instead of using pre-trained CNNs, the Texture CNN work proposes an architecture that is more suitable for capturing the texture-like features present in histopathological images (HIs). For this aim, the research uses an alternative architecture based on the texture CNN proposed by Andrearczyk and Whelan. It consists of only two convolutional layers (Conv2D), an average pooling layer (AvgPool2D) over the entire feature map, also called global average pooling, and fully connected layers (Dense).
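The global-average-pooling step that makes this architecture texture-friendly can be sketched in NumPy; the filter count, map size, and random values below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)

def global_average_pool(feature_maps):
    """Average each feature map over its entire spatial extent, so any input
    size collapses to one value per filter -- the pooling used by the texture
    CNN of Andrearczyk and Whelan."""
    return feature_maps.mean(axis=(1, 2))   # (C, H, W) -> (C,)

# Pretend output of the second Conv2D layer: 32 filters over a 48x48 map.
maps = rng.normal(size=(32, 48, 48))
descriptor = global_average_pool(maps)      # 32-dim texture descriptor
print(descriptor.shape)                      # (32,)

# A dense layer then maps the descriptor to the two classes (benign/malignant).
W_dense = rng.normal(size=(32, 2))
logits = descriptor @ W_dense
print(logits.shape)                          # (2,)
```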
2.3 Summary
Three different models are stated across five different research works. From those, as tabulated in table 6, TCNN-Inception recorded good performance compared to the others in sensitivity, and single-task CNN in terms of specificity. If we have a small dataset, it is better to use transfer learning, since that is its strength; if the question is about magnification, a single-task CNN is the answer.
Figure 6 Comparison of algorithms performance
1×
6× 0.849 ± 0.038 0.932 ± 0.032 0.669 ± 0.137
12× 0.837 ± 0.017 0.891 ± 0.044 0.704 ± 0.065
24× 0.826 ± 0.043 0.874 ± 0.061 0.727 ± 0.161
48× 0.858 ± 0.039 0.920 ± 0.050 0.714 ± 0.095
72× 0.857 ± 0.051 0.919 ± 0.066 0.736 ± 0.109
1× 0.851 ± 0.032 0.907 ± 0.074 0.735 ± 0.178
6× 0.864 ± 0.045 0.918 ± 0.062 0.77 ± 0.098
Single-task CNN
CHAPTER 3
3. SYSTEM ANALYSIS
3.1 Introduction
System analysis is the process of collecting and interpreting facts, identifying problems, and
decomposing a system into its components.
It is conducted for the purpose of studying a system or its parts in order to identify its
objectives. It is a problem-solving technique that improves the system and ensures that all the
components of the system work efficiently to accomplish their purpose.
Pathologist: a trained professional who takes a sample from the patient and determines whether
cancer is present in the sample.
Patient: a person who comes to the health center.
Table 7 Essential use case diagram of the existing system
Next, we will see some criteria and standards used to determine whether a cell is benign or malignant.
The patient must give a blood sample/X-ray/ultrasound/CT scan/MRI for diagnosis.
In order to diagnose, there has to be a tumor on the person's body.
The patient must wait until he is called.
The doctor sends him to the laboratory.
The existing system is neither efficient nor accurate, as prediction and diagnosis are
determined manually by a human pathologist, who may make serious mistakes.
There are only a few pathologists in Ethiopia, not enough to be distributed and to work
everywhere in the country.
It is also expensive.
This helps to augment the physician's ability to spot abnormalities. Figure 8 shows the
proposed system. The manual process has a few drawbacks: it is not automated, and there are
chances of not noticing a suspicious region, especially when it is too tiny to be noticed. In the
proposed computer-aided detection (CAD) system these drawbacks can be overcome, and it is fast, as
human intervention is minimized. In the manual system, the pathologist diagnoses using
acquaintance enriched by experience. If any suspicious patch or mark is observed,
the pathologist needs the image in more detail. Analysis of these diverse types of images
requires sophisticated computerized quantification and visualization tools. Resolution
augmentation is important for visualization and early diagnosis. A super-resolution-based region of
interest (ROI) can play a major role in accurate diagnosis.
In order to make prediction and diagnosis efficient, our team proposed prediction and diagnosis of
some basic cancers that occur frequently and are dangerous: breast (IDC), skin (melanoma)
and cervical cancer.
Figure 8 The proposed system general architecture for breast and skin cancer
For this computer-aided cancer prognosis and diagnosis system, we documented expectations and
specifications designed to ensure that the product, service, process or environment is easy to use.
Performance requirement
As the proposed system performs prediction and diagnosis, its performance is the main concern,
and this project ensures good capability to do what is expected from a CAD system.
Security requirement
The system interface is developed using Flask, which means it runs in a browser after being
hosted on a server. It therefore needs high security so that patients' results cannot be altered.
Availability requirement
The system is available whenever the user needs to diagnose or predict cancer for a patient.
This is achieved by making the CAD system accessible both online and locally.
Reliability requirement
The proposed system should operate without failure for an unlimited number of uses
(transactions) or for a specified period of time, though transaction limits may be determined
by the server owners.
Hardware consideration
Memory capacity (RAM): 8 GB to 32 GB.
Processor type: Intel Core i5 processor.
Processor speed: minimum of 3.0 GHz for good performance.
Hard disk space: minimum of 1 TB for a large amount of data storage.
Keyboard: Normal.
Software consideration
Language: Python
Software: Visual Studio Code, PSPP, LibreOffice Writer, LibreOffice Impress,
draw.io.
Operating system: Linux (Ubuntu 18.04 LTS)
Libraries: OpenCV, Pandas, TensorFlow, scikit-learn, Keras
3.3.4 User Interface Specification and Description
Since the development of the proposed system is at an early stage, it does not yet have a fully
designed user interface. However, based on assumptions about the contents the system will have,
we prepared the following user interface specification and description. Note that this user
interface document is not final and may undergo some modification in the final implementation of
the system.
After the proposed system is developed, it will have the following menu items.
Home: it’s the first and the main part of the system. It shows all the menu items that found
on the system. I assumed the first view of the system as a home menu. Mainly it will have
two sections a prognosis and a diagnosis containing the type of cancer in which the use can
perform.
Breast cancer prediction: At the home there will be a button Breast cancer prediction
which leads to the prognosis of breast cancer of the user currently is in the prognosis section
or a diagnosis if the user is on the opposite section. This part will have its own GUI for
accepting inputs and displaying result for the user with containing an analyze button for
starting analyzing the input record.
Cervical cancer prediction: this one is a button existing on the home interface which
leads to the prediction of a cervical cancer. There will have an interface for accepting
different inputs and displaying a result.
Skin cancer diagnosis: This is the another button which is responsible for shifting the
user interface from the home page to the skin cancer diagnosing page which is contains an
apace for inserting image, displaying result and a submit button.
Breast cancer diagnosis: This one also another button which leads to the interface for
diagnosing a breast cancer which contains a space for inserting an image, display a result
and a submit button.
Help: This is a button contains a detailed technical explanation on how to use this system.
About: It is also a button for transferring from the home page to the about page which
consists the overview of the application.
Contact: This is the last button which holds the contact descriptions of the system
developers.
Below is the specification diagram for the proposed system.
3.4.1 Functional Model
3.4.1.1 Use-case Description
The use-case model consists of the collection of all actors and all use cases. A use case
describes a function provided by the system that yields a visible result for an actor. An actor
is a user playing a role with respect to the system.
Identifier: BCP.
Actor: User.
Post condition: The user knows whether the tumor will be malignant or not.
Flow of events:
3) Then the user chooses breast cancer.
4) Then the user fills in the inputs provided by the system from their medical results.
5) The user commands the system to analyze the given data.
6) The system gives a response that indicates whether they have a benign or malignant
tumor.
7) Use-case ends.
Alternative flow:
2. Use case 2
Identifier: CCP.
Actor: User.
Post condition: The user knows whether the tumor will be malignant or not.
Flow of events:
7) Use-case ends.
Alternative flow:
3. Use case 3
Identifier: BCD
Actor: User
Precondition: The user must have a histopathology image of their breast cells.
Flow of events:
Alternative flow:
4. Use case 4
Identifier: SCD
Actor: User
Post condition: The user knows whether he is affected by melanoma or not.
Flow of events:
Alternative flow:
Figure 10 Early cancer prediction and diagnosis system general use case diagram
Figure 11 Use case diagram for breast cancer prediction
Figure 13 Use case diagram for breast cancer diagnosis
Figure 15 Sequence diagram for breast cancer prediction
Figure 16 Sequence diagram for cervical cancer prediction
Figure 18 Sequence diagram for skin cancer diagnosis
b) Activity Diagram
An activity diagram illustrates the dynamic nature of a system by modeling the flow of control
from activity to activity. An activity represents an operation on some class in the system that
results in a change in the state of the system. We therefore identify the activities in terms of
the functionality of the system.
Figure 19 Activity diagram for breast cancer prediction
Figure 20 Activity diagram for cervical cancer prediction
Figure 21 Activity diagram for skin cancer diagnosis
c) State Chart Diagram
A state machine is any device that stores the status of an object at a given time and can change
that status or cause other actions based on the input it receives. States refer to the different
combinations of information that an object can hold, not how the object behaves. We used state
machine diagrams to describe the life cycle of objects within the proposed system.
Figure 24 State chart diagram for prediction of cancer from given text input
Figure 26 State chart diagram for the detection of lesion image
Figure 27 Class diagram for the proposed system
3.4.5 User Interface Flow Diagram
CHAPTER 4
4. SYSTEM DESIGN
4.1 An Overview of the System Design
This is where the SRS document is converted into a format that can be implemented, and where we
decide how the system will operate. The complex activity of system development is divided into
several smaller sub-activities, which coordinate with each other to achieve the main objective
of system development.
As input we use the statement of work, requirement determination plan, current situation
analysis, and proposed system requirements, including a conceptual data model, modified DFDs,
and metadata. In this phase, we identified the system design goals, categorized as performance,
dependability, maintenance, and end-user criteria.
Our system response should be almost instantaneous for most use cases.
Input/output Performance
Processor allocation
The Control and trained models will have dedicated servers for all of their operations.
The Visualization and User Interface subsystems' processes will be run on the user's
workstation. Since the Visualization subsystem is computation intensive, this will
impose minimum hardware requirements on the user machines. The Simulation and
Facility Management subsystems will operate on whatever machines are available to
them. There is no specific hardware requirement for these subsystems.
The load time for user interface screens should not take more than three seconds.
The model prediction process should not take more than five seconds.
4.3 System Design Model
Systems design is the process of defining elements of a system like modules, architecture,
components and their interfaces and data for a system based on the specified requirements. It is
the process of defining, developing and designing systems which satisfies the specific needs and
requirements of a business or organization.
A three-tier or multi-tier architecture has a client, a server and a model. The client request
is sent to the server, and the server in turn sends the request to the model. The model sends
the information/prediction back to the server, which in turn sends it to the client. So our
system is a three-tier architecture. Figure 28 shows the general structure of the system.
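The request flow described above can be sketched as a minimal Flask endpoint standing in for the server tier; the route name, payload fields and fixed response are illustrative assumptions, not the system's actual API:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/predict/breast", methods=["POST"])
def predict_breast():
    """Server tier: receives the client's request and forwards it to the model tier."""
    features = request.get_json(force=True)
    if not features:
        return jsonify(error="no input provided"), 400
    # Model tier: the saved model would be loaded and queried here;
    # a fixed answer stands in for model.predict() in this sketch.
    return jsonify(result="benign", confidence=0.93)
```

The client tier (the browser) would POST the laboratory measurements as JSON and render the returned prediction, keeping the model itself isolated behind the server.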
Figure 28 General structure of the system
4.3.2 Subsystem Decomposition
Subsystem decomposition is the process of dividing the system into manageable subsystems based
on the analysis model of the proposed system. The goal of system decomposition is to reduce the
complexity of the design model and to distribute the classes of the system into large-scale,
cohesive components. The major subsystem identified is the "user" subsystem.
User subsystem
4.3.3 Hardware/Software mapping
Early cancer detection and diagnosis will run on any operating system. The web server will run
on a cloud server, and the programming language used for developing this system is Python. The
following deployment diagram illustrates the hardware/software mapping for the system.
Figure 32 User interface diagram
Figure 33 Home page (A) interface of the system
Figure 34 Home page (B) of the user interface
Figure 35 Home page (C) user interface
Figure 36 Breast cancer prognosis interface
Figure 37 Breast cancer diagnosis interface
Figure 38 Skin cancer diagnosis user interface
CHAPTER 5
5. EXPERIMENT
5.1 Introduction
A dataset is a structured collection of data, generally associated with a unique body of work.
For developing the cancer detection system we used multiple datasets from Kaggle, a dataset
collection web platform, and other sources. In the next sections the datasets are explained:
datasets for breast cancer prognosis and diagnosis, cervical cancer prognosis and skin cancer
detection.
Various versions of this data have been used in the following publications:
O.L. Mangasarian, W.N. Street and W.H. Wolberg. Breast cancer diagnosis and prognosis
via linear programming. Operations Research, 43(4), pages 570-577, July-August 1995.
W.H. Wolberg, W.N. Street, D.M. Heisey, and O.L. Mangasarian. Computerized breast
cancer diagnosis and prognosis from fine needle aspirates. Archives of Surgery
1995;130:511-516.
Relevant information
Each record represents follow-up data for one breast cancer case. These are consecutive
patients seen by Dr. Wolberg since 1984, and include only those cases exhibiting invasive
breast cancer and no evidence of distant metastases at the time of diagnosis.
The first 30 features are computed from a digitized image of a fine needle aspirate (FNA)
of a breast mass. They describe characteristics of the cell nuclei present in the image.
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
i) symmetry
Values for features 4-33 are recorded with four significant digits.
34) Tumor size - diameter of the excised tumor in centimeters
35) Lymph node status - number of positive axillary lymph nodes
observed at time of surgery
Missing attribute values: Lymph node status is missing in 4 cases.
Class distribution: 151 nonrecur, 47 recur
5.2.1.1 Sample Dataset for Breast Cancer Prognosis
The original dataset consisted of 162 whole mount slide images of breast cancer (BCa) specimens
scanned at 40x. From these, 277,524 patches of size 50 × 50 were extracted (198,738 IDC-negative
and 78,786 IDC-positive). Each patch's file name has the format uxXyYclassC.png, for example
10253idx5x1351y1101class0.png, where u is the patient ID (10253idx5), X is the x-coordinate of
where the patch was cropped, Y is the y-coordinate of where the patch was cropped, and C
indicates the class, with 0 for non-IDC and 1 for IDC.
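A small helper (hypothetical, not part of the dataset's own tooling) that recovers these fields from a patch file name with the pattern described above:

```python
import re

# Matches the uxXyYclassC.png pattern, e.g. "10253idx5x1351y1101class0.png".
PATCH_NAME = re.compile(r"^(?P<patient>.+?)x(?P<x>\d+)y(?P<y>\d+)class(?P<label>[01])\.png$")

def parse_patch_name(filename):
    """Extract patient ID, crop coordinates and class label from a patch file name."""
    m = PATCH_NAME.match(filename)
    if m is None:
        raise ValueError(f"unexpected patch file name: {filename}")
    return {
        "patient": m.group("patient"),   # e.g. 10253idx5
        "x": int(m.group("x")),          # x-coordinate of the crop
        "y": int(m.group("y")),          # y-coordinate of the crop
        "label": int(m.group("label")),  # 0 = non-IDC, 1 = IDC
    }
```

Parsing the labels out of the file names this way is what lets the patches be split into IDC-positive and IDC-negative training sets without any separate annotation file.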
5.2.2.1 Sample Dataset for Breast Cancer Diagnosis
Figure 40 Breast cancer free histopathological image
Figure 41 Breast cancer detected histopathological image
The dataset was collected at 'Hospital Universitario de Caracas' in Caracas, Venezuela. The dataset
comprises demographic information, habits, and historic medical records of 858 patients. Several
patients decided not to answer some of the questions because of privacy concerns (missing values).
Source:
Kelwin Fernandes (kafc _at_ inesctec _dot_ pt) - INESC TEC & FEUP, Porto, Portugal.
Jaime S. Cardoso - INESC TEC & FEUP, Porto, Portugal.
Jessica Fernandes - Universidad Central de Venezuela, Caracas, Venezuela.
Attribute Information:
(int) Age, (int) Number of sexual partners, (int) First sexual intercourse (age), (int) Num of
pregnancies, (bool) Smokes, (bool) Smokes (years), (bool) Smokes (packs/year), (bool) Hormonal
Contraceptives, (int) Hormonal Contraceptives (years), (bool) IUD, (int) IUD (years), (bool)
STDs, (int) STDs (number), (bool) STDs:condylomatosis, (bool) STDs:cervical condylomatosis,
(bool) STDs:vaginal condylomatosis, (bool) STDs:vulvo-perineal condylomatosis, (bool)
STDs:syphilis, (bool) STDs:pelvic inflammatory disease, (bool) STDs:genital herpes, (bool)
STDs:molluscum contagiosum, (bool) STDs:AIDS, (bool) STDs:HIV, (bool) STDs:Hepatitis B,
(bool) STDs:HPV, (int) STDs: Number of diagnosis, (int) STDs: Time since first diagnosis, (int)
STDs: Time since last diagnosis, (bool) Dx:Cancer, (bool) Dx:CIN, (bool) Dx:HPV, (bool) Dx,
(bool) Hinselmann: target variable, (bool) Schiller: target variable, (bool) Cytology: target
variable, (bool) Biopsy: target variable
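Since unanswered questions appear as missing values in this data (encoded as "?" in the published CSV), a pandas sketch of loading it; the two rows and the column subset here are illustrative, not real records:

```python
import io
import pandas as pd

# Two made-up rows in the dataset's CSV layout; "?" marks an unanswered question.
sample = io.StringIO(
    "Age,Number of sexual partners,Smokes,Biopsy\n"
    "34,3,0,0\n"
    "52,?,1,1\n"
)
df = pd.read_csv(sample, na_values="?")  # "?" is read as NaN
missing_per_column = df.isna().sum()     # unanswered questions per attribute
```

Converting "?" to NaN at load time lets the usual imputation or row-dropping steps run before the prediction model is trained.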
5.3 Implementation
This section highlights the issues dealt with in the implementation phase. Implementation is the
phase where the objectives of the physical operations of the system are turned into reality,
i.e. a real working model; coding is what turns objectivity into reality. The code is then
tested until most of the errors have been detected and corrected. The goal of implementation is
to introduce our system to its users in a real sense, showing how they use the new system
developed for their intended objectives.
Development Server: here we test code and check whether the application runs
successfully with that code. Once the application has been tested and the code is working
fine, the application moves to the staging server.
Staging Server: this environment is made to look exactly like the production server
environment. The application is tested on the staging server to check reliability and to
make sure it does not fail on the actual production server. Testing on the staging
server is the final step before the application can be deployed on a production server. The
application needs to be approved in order to deploy it on the production server.
Production Server: once approval is done, our application becomes a part of this
server.
NN-SVG (online): for drawing machine learning concepts like CNN and ANN
architectures.
Hardware tool
Computer: processor: Intel(R) Core(TM) i7-6700 CPU @ 3.40 GHz; RAM: minimum
8 GB; hard disk: minimum 1 TB; GPU: minimum 4 GB.
Server computer: minimum of 32 GB RAM, 32 GB GPU and 1.4 TB HDD.
Flash disk: minimum 4 GB, used for transferring data from one computer to
another.
Paper (A3, A4): used to print the documentation and draw diagrams.
Pencil: to draw diagrams.
Printer: used to print the documentation.
Software tools
Operating system: Linux (Ubuntu 18.04 LTS).
PSPP: for data analysis.
Browser: Chedot, Chrome.
Microsoft Azure: for hosting.
Visual Studio Code: for coding.
GitHub: for version control and teamwork.
Documentation tools:
LibreOffice Writer: free word-processing software.
LibreOffice Impress: free presentation software.
5.3.2 The Prototype of the Project
Prototyping involves the use of basic models or examples of the product being tested. For
example, the model might be incomplete and utilize just a few of the features that will be
available in the final design, or it might be constructed using materials not intended for the
finished article. We show the prototype of this project using a Jupyter notebook. (You can find
the full prototype code and results in Appendix B.)
Figure 45 Model compiling
5.4 The Results
In this project different models were constructed using CNN and ML concepts. Basically there are
four models: a breast and a cervical cancer prognosis/prediction model, and a diagnosis model
for skin and for breast cancer. The results from these models are as follows:
For the breast cancer diagnosis model, val_loss is 0.3326 and val_accuracy is 0.8579.
CHAPTER 6
6. CONCLUSION AND RECOMMENDATION
6.1 Conclusion
Computer-assisted diagnosis of histopathology images can improve accuracy and relieve the
burden on pathologists at the same time. In this project, we present a supervised learning
framework, CNN, for histopathology image segmentation using only image-level labels. The CNN
automatically enriches supervision information from image level to instance level with high
quality and achieves segmentation results comparable with its fully supervised counterparts.
More importantly, the automatic labeling methodology may generalize to other supervised
learning studies for histopathology image analysis. In the CNN, the obtained instance-level
labels are directly assigned to the corresponding pixels and used as masks in the segmentation
task, which may result in an over-labeling issue. The datasets were obtained from data
warehouse websites such as Kaggle, which provide big data for machine learning, computer
vision, data mining and artificial intelligence purposes. The process exhibited high
performance on the binary classification of breast cancer, scoring around 0.93 (93%) at
determining whether a tumor is benign or malignant. Consequently, the statistical measures on
the classification problem were also satisfactory.
The project is aimed at developing portable software for detecting breast, cervical and skin
cancer. Using the models, this system can take a histopathology image and predict whether IDC
is present, and it can use laboratory reports to predict the occurrence of breast cancer.
Requirement analysis was performed to discover the needs of the proposed system; this phase
consists of drawing out the functional and non-functional requirements of the system. In the
literature review we examined different research and surveys and produced a review survey. In
the analysis phase, the proposed and existing systems are represented using UML diagrams such
as use case diagrams. In the design phase, the proposed system's general architecture and
design goals are described in depth. In the experiment phase, the dataset and programming
tools were presented.
6.2 Recommendation
To further substantiate the results of this study, a cross-validation (CV) technique such as
k-fold cross-validation should be employed. Applying such a technique will not only provide a
more accurate measure of model prediction performance, but will also assist in determining the
most optimal hyper-parameters for the ML algorithms.
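As a sketch of the recommended technique, stratified k-fold cross-validation with scikit-learn; the classifier and the Wisconsin breast-cancer features bundled with scikit-learn are stand-ins for the project's own models and data:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Each of the 5 folds serves once as the validation set, giving a less
# optimistic performance estimate than a single train/test split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=cv)
print(f"accuracy: {scores.mean():.3f} ± {scores.std():.3f}")
```

The same splitter can be passed to GridSearchCV so that hyper-parameters are tuned on the cross-validated score rather than on one held-out set.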
According to the scope of the project, the team should develop prognosis and diagnosis for
breast, skin and cervical cancer, but due to time constraints there may be limitations which
should be considered. We recommend that the following functionalities be included:
REFERENCE
[1] What Is Cancer? (n.d.). Retrieved from https://www.cancer.gov/about-cancer/understanding/what-is-cancer
… Darrell, "Caffe: Convolutional architecture for fast feature embedding," arXiv preprint
arXiv:1408.5093, 2014.
[5] Roger S. Pressman, Software Engineering: A Practitioner's Approach, seventh edition.
McGraw-Hill, 1221 Avenue of the Americas, New York, NY 10020.
[6] https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Prognostic).
[7] Kelwin Fernandes, Jaime S. Cardoso, and Jessica Fernandes. "Transfer Learning with Partial
Observability Applied to Cervical Cancer Screening." Iberian Conference on Pattern Recognition
and Image Analysis. Springer International Publishing, 2017.
[8] R.C. González, R.E. Woods, S.L. Eddins, Digital Image Processing Using MATLAB.
Pearson (2004).
[10] A.I. Baba, C. Câtoi, Tumor Cell Morphology. Comparative Oncology, The Publishing House of
the Romanian Academy (2007).
[12] F. Xing, Y. Xie, L. Yang, An automatic learning-based framework for robust nucleus
segmentation. IEEE Trans Med Imaging, 35 (2) (2016), pp. 550-566, 10.1109/TMI.2015.2481436.
[13] A. Prasoon, K. Petersen, C. Igel, F. Lauze, E. Dam, M. Nielsen, Deep feature learning for
knee cartilage segmentation using a triplanar convolutional neural network. Medical Image
Computing and Computer-Assisted Intervention (MICCAI) 2013, 16th International Conference,
Nagoya, Japan, September 22-26, 2013, Proceedings, Part II (2013), pp. 246-253,
10.1007/978-3-642-40763-5_31.
[14] D.C. Ciresan, A. Giusti, L.M. Gambardella, J. Schmidhuber, Deep neural networks segment
neuronal membranes in electron microscopy images. Advances in Neural Information Processing
Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012, Proceedings
of a meeting held December 3-6, 2012, Lake Tahoe, Nevada, United States (2012), pp. 2852-2860.
… Medical Imaging 2014: Digital Pathology, San Diego, California, United States, 15-20 February
2014 (2014), p. 904103, 10.1117/12.2043872.
APPENDIX
A.
Group-4
Students
Computer Science
Dilla University
Abstract- Breast cancer is the second leading cause of cancer death among women. Breast cancer
is not a single disease, but rather is comprised of many different biological entities with
distinct pathological features and clinical implications. Pathologists face a substantial
increase in workload and complexity of digital pathology in cancer diagnosis due to the advent
of personalized medicine, and diagnostic protocols have to focus equally on efficiency and
accuracy. Computerized image processing technology has been shown to improve efficiency,
accuracy and consistency in histopathology evaluations, and can provide decision support to
ensure diagnostic consistency. I compare some different techniques used for binary
classification in breast cancer detection: the Caffe [1] convolutional neural network, a
convolutional neural network using transfer learning, a convolutional neural network using deep
learning, and the texture CNN.

Keywords – feedforward neural nets; medical image processing; Convolutional Neural Network;
image classification model; breast cancer; histopathology images; Google Inception v3 model

INTRODUCTION

According to the American Cancer Society, breast cancer is the second leading cause of cancer
death among women. Because the disease is so deadly, its rapid diagnosis and treatment is a
critical problem having huge societal benefits. Computer-aided diagnosis (CAD) of breast cancer
utilizing histopathology image analysis is an effective means for cancer detection and
diagnosis. Modern digital pathology provides ways to facilitate pathology practice [2]–[4]. It
lends itself to automated histopathological analysis, which has been proven to be valuable in
prognostic determination of various malignancies, including breast cancer [5].

For many real-world problems, it is necessary to build extremely accurate and understandable
classification models. Especially in the medical domain, there is growing demand for Artificial
Intelligence (AI) approaches which are not only well performing, but trustworthy, transparent,
interpretable and explainable. This would allow medical professionals to understand how and why
a machine learning algorithm arrives at its decision, which will enhance the trust of medical
professionals in AI systems [6]. In recent years, some machine learning models have
significantly improved the ability to predict the future condition of a patient [7][8].
Although these models are very accurate, the inability to explain the predictions of accurate,
complex models is a serious limitation. For this reason, machine learning methods employed in
clinical applications avoid using complex, yet more accurate, models and retreat to simpler
interpretable models at the expense of accuracy [9].

Caffe (Convolutional Architecture for Fast Feature Embedding) is a deep learning framework,
originally developed at the University of California, Berkeley. It is open source, under a BSD
license [10]. It is written in C++, with a Python interface.

When practicing machine learning, training a model can take a long time. Creating a model
architecture from scratch, training the model, and then tweaking the model is a massive amount
of time and effort. A far more efficient way to train a machine learning model is to use an
architecture that has already been defined, potentially with weights that have already been
calculated. This is the main idea behind transfer learning: taking a model that has already
been used and repurposing it for a new task [11].

Texture convolutional neural network (TCNN) replaces handcrafted features based on Local Phase
Quantization (LPQ) and Haralick descriptors (HD), with the advantage of learning an appropriate
textural representation and the decision boundaries in a single optimization process [12].

This paper presents a review of them.

2. EXISTING SYSTEM

The existing detection of breast cancer has been determined by specialists' pathologic
diagnosis, which is influenced by the doctor's experience and other external factors. The
pathologist takes a sample from the patient and examines it with a magnifier such as a
microscope in order to make a classification.

Pathology examination requires time-consuming scanning through tissue images under different
magnification levels to find clinical assessment clues to produce correct diagnoses.
Pathologists face a substantial increase in workload and complexity of digital pathology in
cancer diagnosis due to the advent of personalized medicine, and diagnostic protocols have to
focus equally on efficiency and accuracy. The burden on pathologists is large, which can cause
faults in their diagnosis.

… Amazon Mechanical Turk. Regarding sample size, histopathology tasks fit the bill of transfer
learning. However, major deep learning image datasets which are frequently used to generate
initialization weights, such as ImageNet [15] or MIT Places [16], are based on natural scene
photographs. They have very distinct image statistics compared to histopathological stains. The
H&E stain, for example, is characterized by pink tones for the connecting tissue from the Eosin
and blue-violet tones for the nuclei from the Hematoxylin compounds, with a notable absence of
any green or yellow color components present in natural images. So, this paper contributes a
comparison between different methods or techniques for detection of breast cancer.

4. DATA SETS

In this paper, all reviewed researches used the same dataset from the BreaKHis database,
composed of 7,909 microscopic biopsy images of benign and malignant breast tumors acquired from
82 patients [5]. BreaKHis is collected using different magnifying factors (40X, 100X, 200X, and
400X) and contains 2,480 benign and 5,429 malignant images. Table I shows the distribution of
the dataset.

TABLE I. Distribution of the dataset [5]

Magnification   Benign   Malignant   Total
40x                625       1,370   1,995
100x               644       1,437   2,081
200x               623       1,390   2,013
400x               588       1,232   1,820
Total            2,480       5,429   7,909
# Patients          24          58      82
Figure 1. Sample malignant histopathological images from the BreaKHis dataset.

5. RELATED WORK
Figure 3. The architecture of Google's Inception v3 model.

3. Texture CNN for Histopathological Image Classification

The images in BreakHis do not have the same shapes found in the large-scale image datasets commonly used to train CNNs, such as ImageNet or CIFAR. Therefore, instead of using pre-trained CNNs, this work proposes an architecture better suited to capturing the texture-like features present in histopathological images (HIs): an alternative architecture based on the Texture CNN of Andrearczyk and Whelan. It consists of only two convolutional layers (Conv2D), an average pooling layer (AvgPool2D) applied over the entire feature map, also called global average pooling, and fully connected layers (Dense).
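The layer stack just described — two convolutional layers, global average pooling, and a dense classifier — can be sketched in plain NumPy as follows. This is a minimal illustration, not the paper's implementation: the patch size, filter counts, kernel sizes, and random weights are assumptions made here for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, kernels):
    # Valid convolution with ReLU.
    # x: (H, W, C_in); kernels: (kH, kW, C_in, C_out)
    kH, kW, _, c_out = kernels.shape
    H, W, _ = x.shape
    out = np.zeros((H - kH + 1, W - kW + 1, c_out))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + kH, j:j + kW, :]
            out[i, j] = np.tensordot(patch, kernels, axes=([0, 1, 2], [0, 1, 2]))
    return np.maximum(out, 0.0)

def global_avg_pool(x):
    # Average each feature map over all spatial positions -> (C_out,)
    return x.mean(axis=(0, 1))

def dense(x, w, b):
    return x @ w + b

# Hypothetical sizes: a 64x64 RGB patch, 32 then 64 filters of size 3x3.
x  = rng.standard_normal((64, 64, 3))
w1 = rng.standard_normal((3, 3, 3, 32)) * 0.1
w2 = rng.standard_normal((3, 3, 32, 64)) * 0.1
wd = rng.standard_normal((64, 2)) * 0.1
bd = np.zeros(2)

h = conv2d(x, w1)            # (62, 62, 32)
h = conv2d(h, w2)            # (60, 60, 64)
h = global_avg_pool(h)       # (64,) — one value per feature map
logits = dense(h, wd, bd)    # (2,) — benign vs. malignant scores
```

Because global average pooling reduces each feature map to a single value, the size of the dense layers depends only on the number of filters, not on the input image size — which is what makes this stack convenient for texture-like inputs of varying dimensions.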
Table 1. Comparison of algorithm performance.

Model            DA    Accuracy (Mean ± SD)   Sensitivity (Mean ± SD)   Specificity (Mean ± SD)
TCNN             1×    0.851 ± 0.045          0.915 ± 0.043             0.731 ± 0.093
TCNN             6×    0.849 ± 0.038          0.932 ± 0.032             0.669 ± 0.137
TCNN             12×   0.837 ± 0.017          0.891 ± 0.044             0.704 ± 0.065
TCNN             24×   0.826 ± 0.043          0.874 ± 0.061             0.727 ± 0.161
TCNN             48×   0.858 ± 0.039          0.920 ± 0.050             0.714 ± 0.095
TCNN             72×   0.857 ± 0.051          0.919 ± 0.066             0.736 ± 0.109
Single CNN       1×    0.851 ± 0.032          0.907 ± 0.074             0.735 ± 0.178
TCNN Inception   —     0.864 ± 0.045          0.918 ± 0.062             0.770 ± 0.098

Table 2. Advantages and disadvantages of the compared models (Single CNN, TCNN, TCNN Inception).

6. CONCLUSION

In this paper we reviewed three different models presented across five different studies. As we tabularized, TCNN Inception recorded good performance compared to the others in sensitivity, and the single CNN in specificity. If the dataset is small, it is better to use transfer learning, since that is its main advantage; if the question is about magnification, the single-task CNN is the answer.

ACKNOWLEDGMENTS

The authors acknowledge the incubation center at Dilla University for providing a computer, office space, and a GPU.

REFERENCES

[1] Korean Breast Cancer Society, Breast Cancer Facts & Figures 2016. Seoul: Korean Breast Cancer Society, 2016.

[2] M. Kowal, P. Filipczuk, A. Obuchowicz, J. Korbicz and R. Monczak, "Computer-aided diagnosis of breast cancer based on fine needle biopsy microscopic images," Computers in Biology and Medicine, vol. 43, no. 10, pp. 1563-1572, 2013.

[4] P. Wang, X. Hu, Y. Li, Q. Liu and X. Zhu, "Automatic cell nuclei segmentation and classification of breast cancer histopathology images," Signal Processing, vol. 122, pp. 1-13, 2016.

[5] F. Spanhol, L. Oliveira, C. Petitjean and L. Heutte, "A Dataset for Breast Cancer Histopathological Image Classification," IEEE Transactions on Biomedical Engineering, vol. 63, no. 7, pp. 1455-1462, 2016.

[6] F. Spanhol, L. Oliveira, C. Petitjean and L. Heutte, "Breast cancer histopathological image classification using convolutional neural networks," International Joint Conference on Neural Networks (IJCNN), pp. 2560-2567, 2016.

[7] N. Bayramoglu, J. Kannala and J. Heikkilä, "Deep learning for magnification independent breast cancer histopathology image classification," 23rd International Conference on Pattern Recognition, vol. 1, December 2016.

[8] B. Wei, Z. Han, X. He and Y. Yin, "Deep learning model based breast cancer histopathological image classification," 2017 IEEE 2nd International Conference on Cloud Computing and Big Data Analysis (ICCCBDA), pp. 348-353.

[9] H. Chen, Q. Dou, X. Wang, J. Qin and P. A. Heng, "Mitosis detection in breast cancer histology images via deep cascaded networks," Thirtieth AAAI Conference on Artificial Intelligence, pp. 1160-1166, 2016.

[10] A. Esteva, B. Kuprel, R. Novoa, J. Ko, S. Swetter, H. Blau and S. Thrun, "Dermatologist-level classification of skin cancer with deep neural networks," Nature, vol. 542, no. 7639, pp. 115-118, 2017.

[12] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens and Z. Wojna, "Rethinking the inception architecture for computer vision," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818-2826, 2016.

[13] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.

[15] J. de Matos, A. de Souza Britto, L. E. S. de Oliveira and A. L. Koerich, "Texture CNN for Histopathological Image Classification," 2019 IEEE 32nd International Symposium on Computer-Based Medical Systems (CBMS), Cordoba, Spain, 2019, pp. 580-583. doi: 10.1109/CBMS.2019.00120.

[16] P. Sabol, P. Sinčák, K. Ogawa and P. Hartono, "Explainable Classifier Supporting Decision-making for Breast Cancer Diagnosis from Histopathological Images," 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 2019, pp. 1-8. doi: 10.1109/IJCNN.2019.8852070.

[17] S. Angara, M. Robinson and P. Guillén-Rondon, "Convolutional Neural Networks for Breast Cancer Histopathological Image Classification," 2018 4th International Conference on Big Data and Information Analytics (BigDIA), Houston, TX, USA, 2018, pp. 1-6. doi: 10.1109/BigDIA.2018.8632800.

[18] N. Bayramoglu, J. Kannala and J. Heikkilä, "Deep learning for magnification independent breast cancer histopathology image classification," 2016 23rd International Conference on Pattern Recognition (ICPR), Cancun, 2016, pp. 2440-2445. doi: 10.1109/ICPR.2016.7900002.
DECLARATION
This is to declare that the project entitled “Early cancer detection and diagnosis” is original work done by us, the undersigned students of the Department of Computer Science, School of Computing and Informatics, College of Engineering and Technology, Dilla University. The report is based on project work done entirely by us and has not been copied from any other source.
Advisor
(MSc.) Kedir
_____________________
Date
Name ID Signature