
Dilla University

College of Engineering and Technology


School of Computing and Informatics
Department of Computer Science
Early cancer detection and diagnosis

Submitted by:
1. Ephrata Abera…………………………………. ….. RCS/059/15
2. Eyuel Lemma……………………………….………. RCS/046/16
3. Natnael Abebe ………...……………………………. RCS/090/16
4. Nigus Redae …………………………….…………... RCS/092/16
5. Mekuanint Dires…………………..………………… RCS/095/15
Advised by:
(MSc.) Kedir S.

Date: 1/21/2021 GC
Dilla, Ethiopia
CERTIFICATE
The project titled

“Early cancer detection and diagnosis”


Compiled by

1. Ephrata Abera…………………………………. ….. RCS/059/15


2. Eyuel Lemma……………………………….………. RCS/046/16
3. Natnael Abebe ………...……………………………. RCS/090/16
4. Nigus Redae …………………………….…………... RCS/092/16
5. Mekuanint Dires…………………..………………… RCS/095/15
Bachelor of Science

in

Computer Science

from

Dilla University

Signed by the Examining Committee:

Name Signature Date

Advisor:___________________________________________________________________

Examiner:__________________________________________________________________

Examiner:__________________________________________________________________

Examiner:__________________________________________________________________

Signed by the Head of the Department:


Name____________________________Signature_________________Date_____________

ACKNOWLEDGMENT
First and foremost, praise and thanks to God, the Almighty, for His showers of
blessings throughout our project work, allowing us to complete this part of the project
successfully.

We would like to express our deep and sincere gratitude to our project advisor, Kedir S.
(MSc.), Department of Computer Science, School of Computing and Informatics, Dilla
University incubation center, for giving us the opportunity to do this project and for
providing invaluable guidance throughout. Our adviser's dynamism, vision, sincerity and
motivation have deeply inspired us. He taught us the methodology to carry out the
project and to present the project work as clearly as possible. It was a great privilege and
honor to work and study under his guidance, and we are extremely grateful for everything
he has offered us. We would also like to thank him for his friendship, empathy, and great
sense of humor.

We are extremely grateful to our parents for their love, prayers, care and sacrifices in
educating and preparing us for our future. We are very thankful to our mothers and
fathers for their love, understanding, prayers and continuing support in completing this
work. We also express our thanks to our sisters and brothers for their support and
valuable prayers.

We extend our thanks to the computer science students of Dilla University for their
support during our project work. We also thank all the staff of the project section of Dilla
University, Dilla, for their kindness.

TABLE OF CONTENTS
CHAPTER 1 1

1. INTRODUCTION 1

1.1 Background 1

1.2 Introduction About the Project 1

1.2.1 Vision 2
1.2.2 Mission 2
1.2.3 Purpose 2
1.3 Statement of the Problem 2

1.4 Objective of the Project 3

1.4.1 General Objectives 3


1.4.2 Specific Objectives 3
1.5 Methodology 4

1.5.1 Literature Review 4


1.5.2 Data Collection 4
1.5.3 System Analysis and Design Methodology 4
1.5.4 Implementation Methodology 5
1.5.5 Testing Methodology 16
1.5.6 Development Environment and Programming Tools 17
1.6 Scope and Limitation 18

1.6.1 Scope 18
1.6.2 Limitation 18
1.7 Application of the Project 19

CHAPTER 2 20

2. LITERATURE REVIEW AND RELATED WORK 20

2.1 Literature Review 20

2.2 Related Work 20

2.3 Summary 22

CHAPTER 3 24

3. SYSTEM ANALYSIS 24

3.1 Introduction 24

3.2 Existing System 24

3.2.1 Introduction 24
3.2.2 Model of the Existing System 24
3.2.3 Business Rules 25
3.2.4 Limitation of the Existing System 25
3.3 Proposed system 26

3.3.1 Overview of The Proposed System 26


3.3.2 Functional Requirement 27
3.3.3 Non-functional Requirements 27
3.3.4 User Interface Specification and Description 29
3.4 Analysis Model 30

3.4.1 Functional Model 31

3.4.2 Dynamic Model 37


3.4.3 Object Model 46
3.4.4. Class Diagram 47
3.4.5 User Interface Flow Diagram 49
CHAPTER 4 50

4. SYSTEM DESIGN 50

4.1 An Overview of the System Design 50

4.2 Design Goals 50

4.2.1 Performance Criteria 50


4.2.2 Dependability Criteria 51
4.2.3 Maintenance criteria 51
4.2.4 End User Criteria 51
4.3 System Design model 52

4.3.1 Proposed System General Architecture 52


4.3.2 Subsystem Decomposition 54
4.3.3 Hardware/Software mapping 55

4.4 User Interface Design 55

CHAPTER 5 63

5. EXPERIMENT 63

5.1 Introduction 63

5.2 Dataset Preparation 63

5.2.1 Dataset description for Breast Cancer Prognosis 63


5.2.2 Dataset description for Breast Cancer Diagnosis 66
5.2.3 Dataset description for Cervical Cancer Prognosis 67
5.3 Implementation 69

5.3.1 Development Environment and Programming Tool 69


5.3.2 The Prototype of the Project 71
5.4 The Results 73

CHAPTER 6 74

6. CONCLUSION AND RECOMMENDATION 74

6.1 Conclusion 74

6.2 Recommendation 74

REFERENCE i

APPENDIX iii

LIST OF FIGURES

FIGURE 1 REPRESENTATION OF IDC DATASET ........................................................................ 12


FIGURE 2- DATA SPLITTING VISUALIZATION ........................................................................... 13
FIGURE 3-TYPES OF LAYERS WE ARE GOING TO USE ................................................................ 14
FIGURE 4 STRUCTURE OF CNN ............................................................................................... 15
FIGURE 5 CNN STRUCTURE .................................................................................................... 22
FIGURE 6 COMPARISON OF ALGORITHMS PERFORMANCE ........................................................ 23
FIGURE 7 PROS AND CONS ...................................................................................................... 23
FIGURE 8 THE PROPOSED SYSTEM GENERAL ARCHITECTURE FOR BREAST AND SKIN CANCER 27
FIGURE 9 USER INTERFACE SPECIFICATION DIAGRAM FOR THE PROPOSED SYSTEM ................ 30
FIGURE 10 EARLY CANCER PREDICTION AND DIAGNOSIS SYSTEM GENERAL USE CASE ........... 35
FIGURE 11 USE CASE DIAGRAM FOR BREAST CANCER PREDICTION ......................................... 36
FIGURE 12 USE CASE DIAGRAM FOR CERVICAL CANCER PREDICTION ..................................... 36

FIGURE 13 USE CASE DIAGRAM FOR BREAST CANCER DIAGNOSIS ........................................... 37


FIGURE 14 USE CASE DIAGRAM FOR SKIN CANCER DIAGNOSIS ................................................ 37
FIGURE 15 SEQUENCE DIAGRAM FOR BREAST CANCER PREDICTION ........................................ 38
FIGURE 16 SEQUENCE DIAGRAM FOR CERVICAL CANCER PREDICTION .................................... 39
FIGURE 17 SEQUENCE DIAGRAM FOR BREAST CANCER DIAGNOSIS ......................................... 39
FIGURE 18 SEQUENCE DIAGRAM FOR SKIN CANCER DIAGNOSIS .............................................. 40
FIGURE 19 ACTIVITY DIAGRAM FOR BREAST CANCER PREDICTION ......................................... 41
FIGURE 20 ACTIVITY DIAGRAM FOR CERVICAL CANCER PREDICT ........................................... 42
FIGURE 21 ACTIVITY DIAGRAM FOR SKIN CANCER DIAGNOSIS ................................................ 43
FIGURE 22 STATE CHART DIAGRAM FOR VALIDATION OF INPUT TEXT ..................................... 44
FIGURE 23 STATE CHART DIAGRAM FOR VALIDATION OF INPUT IMAGE................................... 44
FIGURE 24 STATE CHART DIAGRAM FOR PREDICTION OF CANCER FROM GIVEN TEXT INPUT .... 45
FIGURE 25 STATE DIAGRAM FOR DETECTION OF HISTOPATHOLOGY IMAGE ............................. 45
FIGURE 26 STATE CHART DIAGRAM FOR THE DETECTION OF LESION IMAGE............................ 46
FIGURE 27 CLASS DIAGRAM FOR THE PROPOSED SYSTEM ....................................................... 48
FIGURE 28 GENERAL STRUCTURE OF THE SYSTEM .................................................................. 53
FIGURE 29 DECOMPOSITION DIAGRAM OF THE SYSTEM .......................................................... 53

FIGURE 30 COMPONENT DIAGRAM FOR SUB DECOMPOSITION OF THE SYSTEM ........................ 54
FIGURE 31 HARDWARE/ SOFTWARE MAPPING ......................................................................... 55
FIGURE 32 USER INTERFACE DIAGRAM ................................................................................... 56
FIGURE 33 HOME PAGE (A) INTERFACE OF THE SYSTEM ......................................................... 57
FIGURE 34 HOME PAGE (B) OF THE USER INTERFACE .............................................................. 58
FIGURE 35 HOME PAGE (C) USER INTERFACE .......................................................................... 59
FIGURE 36 BREAST CANCER PROGNOSIS INTERFACE ............................................................... 60
FIGURE 37 BREAST CANCER DIAGNOSIS INTERFACE .............................................................. 61
FIGURE 38 SKIN CANCER DIAGNOSIS USER INTERFACE ........................................................... 62
FIGURE 39 SAMPLE DATASET OF BREAST CANCER PROGNOSIS MODEL .................................... 66
FIGURE 40 BREAST CANCER FREE HISTOPATHOLOGICAL IMAGE ............................................. 67
FIGURE 42 SAMPLE DATASET FOR CERVICAL CANCER PROGNOSIS .......................................... 68
FIGURE 43 VISUALIZATION OF THE TRAINING DATASET .......................................................... 71
FIGURE 44 MODEL COMPILING ............................................................................................... 72
FIGURE 45 PROTOTYPE MODEL PREDICTING THE OUTCOME .................................................... 72
FIGURE 46 ACCURACY OF BREAST CANCER PROGNOSIS MODEL .............................................. 73
FIGURE 47 ACCURACY OF CERVICAL CANCER PROGNOSIS MODEL .......................................... 73

LIST OF TABLES
TABLE 1 PROJECT DESIGN METHODOLOGY ............................................................................... 5
TABLE 2 A NOTE FOR MISSING DATA ANALYSIS ........................................................................ 7
TABLE 3 UNIVARIATE STATISTICS ............................................................................ 8
TABLE 4 CERVICAL CANCER DATASET DESCRIPTION ............................................................. 9
TABLE 5 CERVICAL CANCER MISSING DATA NOTES ................................................................ 10
TABLE 6 CERVICAL CANCER DATASET UNIVARIATE STATISTICS ............................................. 11
TABLE 7 ESSENTIAL USE CASE DIAGRAM OF THE EXISTING SYSTEM........................................ 25
TABLE 8 LIST OF OBJECTS AND THEIR ATTRIBUTES ................................................................. 46
TABLE 9 DATASET DESCRIPTION ............................................................................................. 65

LIST OF ACRONYMS

 AIDS: Acquired Immune Deficiency Syndrome
 ANN: Artificial Neural Network
 BCC: Basal Cell Carcinoma
 CAD: Computer Aided Diagnosis
 CIN: Cervical Intraepithelial Neoplasia
 CNN: Convolutional Neural Network
 CPU: Central Processing Unit
 DNA: Deoxyribonucleic Acid
 Dr: Doctor
 Dx: Diagnosis
 FNA: Fine Needle Aspirate
 GPU: Graphics Processing Unit
 HIV: Human Immunodeficiency Virus
 HPV: Human Papilloma Virus
 IDC: Invasive Ductal Carcinoma
 ISIC: International Skin Imaging Collaboration
 IUD: Intra-Uterine Device
 MCC: Merkel Cell Carcinoma
 OOSAD: Object Oriented System Analysis and Design
 RAM: Random Access Memory
 SCC: Squamous Cell Carcinoma
 STDs: Sexually Transmitted Diseases
 SVM: Support Vector Machine

ABSTRACT
In traditional cancer diagnosis, pathologists examine biopsies to make diagnostic assessments
largely based on cell morphology and tissue distribution. However, this is subjective and often
leads to considerable variability. On the other hand, computational diagnostic tools enable
objective judgments by making use of quantitative measures. This project presents a systematic
method of the computational steps in automated cancer diagnosis based on histopathology.
These computational steps are: 1.) image preprocessing to determine the focal areas, 2.) feature
extraction to quantify the properties of these focal areas, and 3.) classifying the focal areas as
malignant or not or identifying their malignancy levels. In Step 1, the focal area determination
is usually preceded by noise reduction to improve its success. In the case of cellular-level
diagnosis, this step also comprises nucleus/cell segmentation. Step 2 defines appropriate
representations of the focal areas that provide distinctive objective measures. In Step 3,
automated diagnostic systems that operate on quantitative measures are designed. After the
design, this step also estimates the accuracy of the system. In this project, we detail these
computational steps and address their challenges, emphasizing the importance of constituting
benchmark data sets. Such benchmark data sets allow comparing different features and system
designs and prevent misleading accuracy estimates. This, in turn, allows determining subsets
of distinguishing features, devising new features, and improving the success of automated
cancer diagnosis.

CHAPTER 1
1. INTRODUCTION
1.1 Background
Cancer is a group of more than 100 different diseases. It can develop almost anywhere in the
body [1]. Cells are the basic units that make up the human body. Cells grow and divide to make
new cells as the body needs them. Usually, cells die when they get too old or damaged. Then,
new cells take their place [2]. Cancer begins when genetic changes interfere with this orderly
process. Cells start to grow uncontrollably. These cells may form a mass called a tumor. A
tumor can be cancerous or benign. A cancerous tumor is malignant, meaning it can grow and
spread to other parts of the body. A benign tumor means the tumor can grow but will not
spread.

Breast cancer is one of the most prevalent forms of cancer [3]. The condition largely
affects women, although in rare cases it also affects men. Breast cancer develops in the
cells of the breast, where certain cells begin to grow rapidly and abnormally. This results
in the accumulation of lumps or a mass of tissue.

Skin cancer is the out-of-control growth of abnormal cells in the epidermis, the outermost skin
layer, caused by unrepaired DNA damage that triggers mutations. These mutations lead the
skin cells to multiply rapidly and form malignant tumors. The main types of skin cancer
are basal cell carcinoma (BCC), squamous cell carcinoma (SCC), melanoma and Merkel cell
carcinoma (MCC); all of these types have been observed in Ethiopia. The cervix is the
lower part of the uterus, where a baby grows during pregnancy.

Cervical cancer is caused by a virus called HPV. The virus spreads through sexual contact.
Most women's bodies are able to fight HPV infection. But sometimes the virus leads to cancer.
Cervical cancer is the fourth most frequent cancer in women, with an estimated 570,000 new
cases in 2018, representing 6.6% of all female cancers. Approximately 90% of deaths from
cervical cancer occurred in low- and middle-income countries [4].

1.2 Introduction About the Project


Cancer is a major cause of death in many developed countries. Cancer classification in
medical practice that relies on clinical and histopathological findings alone may produce
incomplete or misleading results. This project uses different data mining, machine
learning and image processing techniques in order to predict breast cancer and cervical cancer
from given parameters, and to detect breast and skin (melanoma) cancer from
histopathological and lesion images.

1.2.1 Vision
The vision of this CAD system is to make cancer prediction and diagnosis easier, cheaper
and widely available across the country, with good performance and acceptability.

1.2.2 Mission
 To give the world a fast, efficient, accurate, secure and effective computer-aided system
for the prediction and diagnosis of cancer.
 To help fill the gap caused by the shortage of pathologists.
 To decrease the time taken by laboratory tests and increase the efficiency of result
prediction.

1.2.3 Purpose
Many patients can benefit from an accurate and fast CAD-based cancer detection system. It
is an efficient system that an examiner can use either as a support tool or as the main system
for diagnosing and predicting. It also allows pathologists to check how accurate their
predictions are when they use the system as a support mechanism for cancer detection and
diagnosis.

1.3 Statement of the Problem


Cancer is an increasing public health burden for Ethiopia and Sub-Saharan Africa at large.
Indeed, by the year 2030, cancer and other non-communicable diseases may overtake some
infectious diseases as leading causes of death in the African Region [5]. Currently cancer
accounts for four per cent of all deaths in Ethiopia. Many of these deaths could be avoided if
the cancer were detected and treated early. Many cancers can also be prevented by avoiding
exposure to common risk factors, such as tobacco smoke. However, many African countries
have limited capacity to detect, treat and care for their cancer patients. Clinical oncology
infrastructure and improved cancer health systems, as well as prevention and control strategies,
are essential to curbing this growing epidemic. The main problems of concern are listed
below:

 Shortage of trained human resources.
 High burden on pathologists.
 No efficient prognosis (prediction) mechanism.
 No efficient diagnosis method.
 Poor distribution of diagnostic services.

1.4 Objective of the Project


In this project the objectives are classified into two groups: general and specific objectives.

1.4.1 General Objectives


The general objective of this project is to develop a computer-aided prognosis and diagnosis
system for skin, cervical and breast cancer.

1.4.2 Specific Objectives


 Study the existing system and identify its problems.
 Categorize the identified problems together with their candidate solutions.
 Collect data sets for the model for each problem.
 Decide the general and specific design/structure of the project.
 List the models and map each one to the problem it solves.
 Train the models.
 Build a pre-diagnosis system for breast cancer.
 Build a pre-diagnosis system for cervical cancer.
 Build a diagnosis system for breast and skin cancer.
 Design the user interface and help system for users.


1.5 Methodology
1.5.1 Literature Review
We conducted a review survey entitled “Histopathological Image Classification for Breast
Cancer Detection”, which helped us review different papers. Based on that, we wrote some
narratives in order to understand the current state of the research area.

1.5.2 Data Collection


Our team collected information using several techniques; these are described below with
samples.

 1-on-1 Interviews
For highly personalized data we used 1-on-1 interviews, in order to gather information
from cancer patients about their current situation.
Question: Which type of cancer were you diagnosed with?
Answer: Cervical cancer
Question: How old were you when you learned you were a cancer patient?
Answer: 34
 Direct observation
To learn about the existing system, we directly observed how the system is arranged at
Black Lion Hospital.

 Surveys
We prepared a survey on each functional requirement. We read different research papers
and wrote a survey titled “A survey report on histopathological image classification for
breast cancer detection” (included in Appendix A).
 Focus Group
We discussed with different doctors who have deep knowledge of cancer. Example
discussion topic: “How do we classify a cell as benign or malignant?”

1.5.3 System Analysis and Design Methodology


System development follows OOSAD (Object Oriented System Analysis and Design)
throughout the whole project life cycle. In our project, we apply the object oriented system
development methodology, which is divided into two phases: object oriented analysis and
object oriented design. This increases consistency among analysis, design, implementation
and testing. It also allows reusability of the code. [5]

For the design process, we selected the agile method so that we can keep our data and all
project information up to date.

Table 1 Project design methodology

1.5.4 Implementation Methodology


For the implementation we selected machine learning, image processing and data mining
techniques with the Python programming language. The detailed implementation is described
below, divided into two parts: prognosis and diagnosis.

 Breast cancer prognosis


 Dataset

Dataset name: “Breast Cancer Wisconsin (Prognostic) Data Set”.

This dataset is taken from the UCI Machine Learning Repository (University of Wisconsin)
and was made available with the support of the National Science Foundation. Each record
represents follow-up data for one breast cancer case. These are consecutive patients seen by
Dr. Wolberg since 1984, and include only those cases exhibiting invasive breast cancer [6].
The first 30 features are computed from a digitized image of a fine needle aspirate (FNA) of
a breast mass. They describe characteristics of the cell nuclei present in the image.

The following is the attribute information of the dataset.

1) ID number
2) Outcome (R = recur, N = non-recur)
3) Time (recurrence time if field 2 = R, disease-free time if field 2 = N)
4-33) Ten real-valued features are computed for each cell nucleus:
a) radius (mean of distances from center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter^2 / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension ("coastline approximation" - 1)

34) Tumor size - diameter of the excised tumor in centimeters

35) Lymph node status - number of positive axillary lymph nodes observed at time of
surgery

The missing values in the dataset above are shown below with the necessary details:
Table 2 A note for missing data analysis

Output Created:                 11-DEC-2019 19:58:29
Comments:
Input Data:                     D:\Final Thing\breast-cancer-wisconsin (1).csv
Active Dataset:                 DataSet2
Filter:                         <none>
Weight:                         <none>
Split File:                     <none>
N of Rows in Working Data File: 699
Syntax:                         MVA VARIABLES=V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
                                /MPATTERN
                                /EM(TOLERANCE=0.001 CONVERGENCE=0.0001 ITERATIONS=25).
Processor Time:                 00:00:00.09
Elapsed Time:                   00:00:00.09

Table 3 Univariate Statistics

Variable   N     Mean         Std. Deviation   Missing Count   Missing %   Extremes (Low)   Extremes (High)
V1         699   1071704.10   617095.730       0               0.0         21               2
V2         699   4.42         2.816            0               0.0         0                0
V3         699   3.13         3.051            0               0.0         0                0
V4         699   3.21         2.972            0               0.0         0                0
V5         699   2.81         2.855            0               0.0         0                60
V6         699   3.22         2.214            0               0.0         0                66
V7         683   3.54         3.644            16              2.3         0                0
V8         699   3.44         2.438            0               0.0         0                20
V9         699   2.87         3.054            0               0.0         0                77
V10        699   1.59         1.715            0               0.0         .                .
V11        699   2.69         0.951            0               0.0         0                0

Regarding the class distribution, of the dataset's 198 entries 151 are non-recur and 47 are recur.

 Preprocessing

The first step is to fill in the missing data. The missing lymph node values are filled using a
statistical method for handling missing data, specifically median imputation via the pandas
fillna() function.

The ID column is not useful at this stage, so another preprocessing step is dropping it. The
ID column identifies patients uniquely for the doctor, so that histories can be recorded
correctly; for learning purposes it is not required.

Finally, the class labels are mapped to 0 for benign and 1 for malignant. A column holds each
patient's final result, indicating whether or not they developed breast cancer, and it is encoded
as 0 or 1 accordingly.
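A minimal pandas sketch of the three steps above (median imputation, dropping the ID column, and 0/1 label mapping); the column names here are illustrative, not the dataset's exact headers:

```python
import pandas as pd

# Toy frame standing in for the Wisconsin prognostic data;
# real column names and values differ.
df = pd.DataFrame({
    "id": [101, 102, 103, 104],
    "lymph_node_status": [2.0, None, 5.0, 3.0],
    "outcome": ["N", "R", "N", "R"],
})

# 1) Median imputation for the missing lymph node values.
df["lymph_node_status"] = df["lymph_node_status"].fillna(
    df["lymph_node_status"].median())

# 2) The ID column identifies patients, not tumours; drop it for learning.
df = df.drop(columns=["id"])

# 3) Map the class labels to 0/1 (here: non-recur -> 0, recur -> 1).
df["outcome"] = df["outcome"].map({"N": 0, "R": 1})
```

After these steps the frame contains only numeric columns and is ready to be split into features and labels for training.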

 Model

Our model is built on an artificial neural network with logistic regression. The activation
functions are ReLU and sigmoid, with an estimated accuracy of 0.95. This structure may be
changed if it leads to unexpected or below-expected results.
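Assuming a single ReLU hidden layer and a sigmoid output unit, the forward pass of such a network can be sketched in plain NumPy; the layer sizes and random weights below are illustrative, not the trained model:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, w1, b1, w2, b2):
    # Hidden layer with ReLU, output layer with sigmoid:
    # the output is a probability of recurrence in [0, 1].
    h = relu(x @ w1 + b1)
    return sigmoid(h @ w2 + b2)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 32))         # 4 samples, 32 features (illustrative)
w1 = rng.normal(size=(32, 16)); b1 = np.zeros(16)
w2 = rng.normal(size=(16, 1));  b2 = np.zeros(1)
p = forward(x, w1, b1, w2, b2)       # probabilities, shape (4, 1)
```

With trained weights in place of the random initialization, thresholding p at 0.5 yields the recur/non-recur prediction.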

 Cervical cancer prognosis


 Dataset

Cervical cancer (Risk Factors) Data Set [7]: This dataset focuses on the prediction of
indicators/diagnosis of cervical cancer. The features cover demographic information, habits,
and historic medical records. It is supported by the National Science Foundation. Below is a
detailed description of the dataset.

Table 4 Cervical cancer dataset description

Data Set Characteristics:    Multivariate
Number of Instances:         858
Area:                        Life
Attribute Characteristics:   Integer, Real
Number of Attributes:        36
Date Donated:                2017-03-03
Associated Tasks:            Classification
Missing Values?              Yes
Number of Web Hits:          102864

The tables below show the missing values from the dataset with some descriptions.

Table 5 Cervical cancer missing data notes

Output Created:                 11-DEC-2019 19:33:58
Comments:
Input Data:                     C:\Users\MELAKU-Hohe-TECH\Downloads\risk_factors_cervical_cancer.csv
Active Dataset:                 DataSet1
Filter:                         <none>
Weight:                         <none>
Split File:                     <none>
N of Rows in Working Data File: 859
Syntax:                         MVA VARIABLES=V1 V2 V3 V5 V6 V7 V26 V29 V30 V31 V32 V33 V34 V35 V36
                                /LISTWISE
                                /PAIRWISE
                                /EM(TOLERANCE=0.001 CONVERGENCE=0.0001 ITERATIONS=25).
Processor Time:                 00:00:00.22
Elapsed Time:                   00:00:00.22

Table 6 Cervical cancer dataset univariate statistics

Univariate Statistics

Variable   N     Mean             Std. Deviation    Missing Count   Missing %   Extremes (Low)   Extremes (High)
V1         858   26.82            8.498             1               0.1         0                9
V2         832   2.528            1.6678            27              3.1         0                68
V3         851   16.995           2.8034            8               0.9         2                39
V5         845   0.146            0.3529            14              1.6         .                .
V6         845   1.21972141259    4.089016937563    14              1.6         .                .
V7         845   0.453143950649   2.2266098026054   14              1.6         .                .
V26        858   0.09             0.303             1               0.1         .                .
V29        858   0.02             0.143             1               0.1         .                .
V30        858   0.01             0.102             1               0.1         .                .
V31        858   0.02             0.143             1               0.1         .                .
V32        858   0.03             0.165             1               0.1         .                .
V33        858   0.04             0.198             1               0.1         .                .
V34        858   0.09             0.281             1               0.1         .                .
V35        858   0.05             0.221             1               0.1         .                .
V36        858   0.06             0.245             1               0.1         .                .

 Preprocessing

As there are missing values and also a column to drop a datamining aspect (Statically) used for
preprocessing, especially for missing values like scaling and dropping tuples.

 Model

The model is planned to be an ANN with an SVM classifier. This structure may be changed if
it leads to below-expected performance.

The four boolean target variables (Hinselmann, Schiller, Cytology and Biopsy) are used as
classes for our SVM (SVC) classifier. We build four models, as the diagnosis is taken from
four different methods: the Hinselmann model predicts whether the person will develop
cervical cancer based on the Hinselmann test; the Schiller model predicts based on the Schiller
test; the Cytology model predicts based on cytological analysis; and the Biopsy model predicts
based on biopsy analysis. The estimated average accuracy is 90%.
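A sketch of the four per-method classifiers using scikit-learn's SVC; the feature matrix and labels below are synthetic stand-ins for the real risk-factor data:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 8))                  # synthetic risk-factor features
targets = {name: rng.integers(0, 2, size=100)  # four boolean target columns
           for name in ("Hinselmann", "Schiller", "Cytology", "Biopsy")}

# One SVM classifier per diagnostic method, as described above.
models = {name: SVC(kernel="rbf").fit(X, y) for name, y in targets.items()}

# Predict all four outcomes for a new (synthetic) patient record.
patient = rng.normal(size=(1, 8))
predictions = {name: int(m.predict(patient)[0]) for name, m in models.items()}
```

In the real system, X would hold the 36 preprocessed risk-factor attributes and each y one of the four target columns of the dataset.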

 Breast cancer diagnosis / IDC


 Dataset

The original dataset consists of 162 whole mount slide images of Breast Cancer (BCa)
specimens scanned at 40x. From these, 277,524 patches of size 50 x 50 were extracted
(198,738 IDC negative and 78,786 IDC positive). Each patch's file name has the format
u_xX_yY_classC.png, for example 10253_idx5_x1351_y1101_class0.png, where u is the
patient ID (10253_idx5), X is the x-coordinate of where the patch was cropped, Y is the
y-coordinate of where the patch was cropped, and C indicates the class, where 0 is non-IDC
and 1 is IDC.
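The metadata encoded in that naming scheme can be recovered with a small parser; the regular expression below assumes every file follows the u_xX_yY_classC.png pattern exactly:

```python
import re

PATCH_RE = re.compile(
    r"^(?P<patient>.+)_x(?P<x>\d+)_y(?P<y>\d+)_class(?P<c>[01])\.png$")

def parse_patch(filename):
    """Split an IDC patch file name into patient ID, coordinates and class."""
    m = PATCH_RE.match(filename)
    if m is None:
        raise ValueError(f"unexpected file name: {filename}")
    return (m.group("patient"), int(m.group("x")),
            int(m.group("y")), int(m.group("c")))

# The example from the dataset description:
info = parse_patch("10253_idx5_x1351_y1101_class0.png")
# -> ("10253_idx5", 1351, 1101, 0)
```

This makes it easy to group patches by patient when building the train/validation/test splits, so no patient appears in more than one split.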

Figure 1 representation of IDC dataset

We split the dataset into training, validation and testing sets as described in Figure 2 below.

Figure 2 Data splitting visualization

 Preprocessing

Most of the pixels in the image are redundant and do not contribute substantially to the intrinsic
[8]
information of an image . While dealing with AI networks, it is required to eliminate them to
avoid unnecessary computational overhead. This can be achieved by compression techniques. We
begin the implementation of our deep net by processing the images in the dataset. This is achieved
with the help of the OpenCV library in Python. There are many other modules that can be used in
this step e.g. MATLAB or other image processing libraries or software. This is necessary to
remove redundancy from the input data which only contributes to the computational complexity
of the network without providing any significant improvements in the result. The aspect ratio of
the original slide is preserved since both the dimensions are reduced by a factor of 2, giving an
image which is 1/4th in area, that is of dimension 350×230 pixels. Then as the dataset is unbalanced
we will apply data augmentation techniques in order to balance it. Finally, the images are resized
(50x50) and reshaped and ready to the input for the CNN.
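The resizing steps can be sketched in NumPy as follows; in the actual pipeline OpenCV's resize with interpolation would be used, so the nearest-neighbour resize here is a simplified stand-in, and the 700x460 slide size is only an example consistent with the halved 350x230 dimensions above:

```python
import numpy as np

def halve(img):
    # Downsample by a factor of 2 along each axis (area becomes 1/4),
    # preserving the aspect ratio, e.g. 700x460 -> 350x230.
    return img[::2, ::2]

def resize_nearest(img, out_h, out_w):
    # Nearest-neighbour resize to the CNN input size (50x50 here);
    # cv2.resize with proper interpolation would normally replace this.
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]

slide = np.random.rand(700, 460, 3)            # stand-in for a scanned slide
patch = resize_nearest(halve(slide), 50, 50)   # 50x50x3, ready for the CNN
```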

 Feature extraction

Feature learning is a crucial step in the classification process for both humans and machine
algorithms. A study has shown that the human brain is sensitive to shapes, while computers
are more sensitive to patterns and texture [9]. Because of this, feature learning is entirely
different for manual versus machine approaches. In the visual context, malignant tumors tend
to have large and irregular nuclei or multiple nuclear structures. The cytoplasm also undergoes
changes, wherein new structures appear or normal structures disappear. Malignant cells have
a small amount of cytoplasm, frequently with vacuoles; in this scenario, the ratio of cytoplasm
to nucleus decreases [10]. All of these features are examined by experts, or algorithms are
developed to quantify them and automate detection. This approach is difficult and imprecise,
as selection and quantification involve various unknown errors that are hard to address. In the
case of supervised learning, we do not need to provide these features explicitly: images are
fed to an architecture such as a CNN along with their class labels (benign or malignant).
Through the automatic update of filter values in the training process, the CNN is able to extract
the computational features.
In our proposed architecture, the convolutional neural network is made up of two types of layers:

1. Convolutional Layers
2. Pooling layers

Figure 3-types of layers we are going to use

 Model

A CNN is a modified variety of deep neural network that depends upon the correlation of
neighboring pixels. It uses randomly initialized patches for input at the start and modifies
them in the training process. Once training is done, the network uses these modified patches
to predict and validate the result in the testing and validation process [11]. Convolutional
neural networks have achieved success in the image classification problem, as the defined
nature of the CNN matches the data point distribution in the image. As a result, many image
processing tasks adopt CNNs for automatic feature extraction [12, 13, 14]. CNNs are
frequently used for image segmentation and medical image processing as well [15].

The CNN architecture has two main types of transformation. The first is convolution, in which
pixels are convolved with a filter or kernel. This step computes the dot product between an
image patch and the kernel. The width and height of the filters can be set according to the
network, while the depth of each filter is the same as the depth of the input. The second
important transformation is subsampling, which can be of many types (max pooling, min
pooling and average pooling), used as required. The size of the pooling filter can be set by the
user and is generally taken as an odd number. The pooling layer is responsible for lowering
the dimensionality of the data and is quite useful for reducing overfitting. After a combination
of convolution and pooling layers, the output can be fed to a fully connected layer for efficient
classification. The visualization of the entire process is presented in Figure 4.

Figure 4 structure of CNN

For this reason, we choose a CNN with a softmax output layer for the classification.

 Classification

Classification is performed by flattening the weighted feature map obtained from the final pooling
layer and using it as input to a fully connected network, which calculates the loss and modifies
the weights of its internal hidden nodes accordingly. The estimated performance is 85% in terms of
accuracy.
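The flatten-and-classify step can be sketched as follows; the weights here are random placeholders rather than trained values:

```python
import numpy as np

def softmax(z):
    """Turn raw scores into class probabilities that sum to 1."""
    e = np.exp(z - z.max())        # subtract the max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)

pooled = np.array([[14.0, 16.0], [26.0, 28.0]])   # a toy 2x2 pooled feature map
x = pooled.flatten()                               # flatten to a vector of length 4

# Fully connected layer with two outputs (benign, malignant).
# W and b are random placeholders, not trained weights.
W = rng.normal(size=(2, 4))
b = np.zeros(2)
probs = softmax(W @ x + b)          # class probabilities
```

During training, the loss computed on these probabilities drives the weight updates described above.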

 Skin cancer/melanoma diagnosis


 Dataset

The dataset is taken from the ISIC Archive. The overarching goal of the ISIC Melanoma Project is to
support efforts to reduce melanoma-related deaths and unnecessary biopsies by improving the accuracy
and efficiency of melanoma early detection. To this end, the ISIC is developing proposed digital
imaging standards and creating a public archive of clinical and dermoscopic images of skin lesions.

The ISIC Archive contains over 23,906 images of skin lesions, labeled as 'benign' or 'malignant'.
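A small sketch of how such a labeled archive could be indexed for training. The 'benign'/'malignant' subfolder layout and the PNG extension are assumptions for illustration, not the ISIC Archive's actual structure:

```python
from pathlib import Path

def collect_labels(root):
    """Map each image filename to 0 (benign) or 1 (malignant) by its folder name."""
    labels = {}
    for cls, y in (("benign", 0), ("malignant", 1)):
        for path in sorted((Path(root) / cls).glob("*.png")):
            labels[path.name] = y
    return labels
```

The resulting mapping can then be used to build the image and label arrays fed to the network.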

 Model

Two fundamentally different ways of using CNNs to classify skin lesions are presented here. On the
one hand, a CNN pretrained on another large dataset, such as ImageNet [16], can be applied as a
feature extractor; in this case, classification is performed by another classifier, such as
k-nearest neighbors, support vector machines, or artificial neural networks. On the other hand, a
CNN can directly learn the relationship between the raw pixel data and the class labels through
end-to-end learning. In contrast with the classical workflow typically applied in machine learning,
feature extraction then becomes an integral part of classification and is no longer a separate,
independent processing step.

Because publicly available datasets are limited, a common method of skin lesion classification is
transfer learning: such works pretrain a CNN on the ImageNet dataset and then fine-tune its
weighting parameters on the actual classification problem. We therefore also use transfer learning
for the classification of lesion images. The estimated F1 score is 0.82.
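The first pathway, a frozen network used as a feature extractor followed by a separate classifier, can be sketched like this. The "pretrained CNN" is simulated by a fixed random projection and the data is synthetic, since loading real ImageNet weights is beyond this sketch:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(42)

# Synthetic stand-in for flattened lesion images: two separable classes.
X_raw = rng.normal(size=(40, 64))
y = np.array([0] * 20 + [1] * 20)
X_raw[y == 1] += 1.5                       # shift the "malignant" class

# "Pretrained" feature extractor: a frozen linear map plus ReLU.
W_frozen = rng.normal(size=(64, 16))
features = np.maximum(X_raw @ W_frozen, 0)

# A separate classifier (here an SVM) does the actual classification.
clf = SVC(kernel="linear").fit(features, y)
acc = clf.score(features, y)
```

The second pathway, end-to-end fine-tuning, instead unfreezes the pretrained weights and continues training them on the lesion data.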

1.5.5 Testing Methodology


Testing methodologies are approaches to testing, from unit testing through system testing and
beyond. There is no formally recognized body of testing methodologies, and very rarely will you
ever find a unified set of definitions. But here are some common methodologies that we can use:

 Unit Testing: the act of testing software at the most basic (object) level.

For unit testing we will check the system functionalities by running the code unit by
unit.

 Functional Testing: As functional testing is making sure all the functions or use cases are
working properly, we will run a functional testing on all functionalities of the system one
by one.
 Performance Testing: In this kind of test we will test the system performance issues.
 Security testing: A collection of tests focused on probing an application's security, or its
ability to protect user assets.
 System Testing: after finishing the above kinds of testing, we will proceed to testing
the project as one system, since system testing combines multiple features into an
end-to-end scenario.
 Acceptance testing: also known as build verification tests or basic verification tests,
these are rudimentary tests which prove whether or not a given build is worth deeper
testing. Here we will make sure the system behaves as customers need.
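As a small illustration of the unit-testing style described above, here is a pytest-style sketch. The validate_image function is a hypothetical helper standing in for the system's real input-validation code:

```python
def validate_image(shape, channels):
    """Hypothetical helper: accept only non-empty RGB images."""
    height, width = shape
    return height > 0 and width > 0 and channels == 3

# Each test exercises one behavior of the unit in isolation.
def test_accepts_rgb_image():
    assert validate_image((50, 50), 3)

def test_rejects_grayscale_image():
    assert not validate_image((50, 50), 1)

def test_rejects_empty_image():
    assert not validate_image((0, 0), 3)
```

Running `pytest` on such a file executes all the tests automatically and reports any failures.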

1.5.6 Development Environment and Programming Tools


 Interface: Flask (a micro web framework written in Python) and Bootstrap.
 Language: python 3.6.x
 Documentation and design modeling
 UML design: Lucidchart – online application for UML diagrams (for drawing different
diagrams such as use case and class diagram).
 NN-SVG: (online) for drawing machine learning concepts like the architecture of CNN
and ANN.
 Hardware tool
 Computer: processor: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz; RAM: minimum
8GB; hard disk: minimum 1TB; GPU: minimum 4GB.
 Server computer: minimum of 32GB RAM, 32GB GPU, and 1.4TB HD.

 Flash Disk: minimum 4GB. Used for transferring data from one computer to
another.
 Paper (A3, A4): used to print the documentation and draw diagrams.
 Pencil: To draw diagrams.
 Printer: Used to print the documentation.
 Software tools
 Operating system: Linux (Ubuntu-18.0 LTS).
 PSPP: for data analysis.
 Browser: chedot, chrome.
 Microsoft Azure: For hosting.
 Visual studio code: for coding.
 Github: For version controlling and team work.
 Documentation tool:
 LibreOfficeWord: a free word processing software.
 LibreOfficePresentation: a free version for power point making.
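A minimal sketch of how the Flask interface listed above could expose a prediction endpoint. The route name and the predict() placeholder are illustrative assumptions, not the project's actual code:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict(features):
    """Placeholder standing in for the trained model's prediction."""
    return "benign" if sum(features) % 2 == 0 else "malignant"

@app.route("/diagnose", methods=["POST"])
def diagnose():
    # Accept a JSON body such as {"features": [1, 2, 3]} and return the result.
    features = request.get_json().get("features", [])
    return jsonify({"result": predict(features)})

# To serve locally: app.run(debug=True)
```

In the deployed system the placeholder would be replaced by a call into the trained model, and the page itself would be rendered with Bootstrap templates.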

1.6 Scope and Limitation


1.6.1 Scope
The scope of this project is developing a machine learning and image processing based system
with a user-friendly interface. In particular, it focuses on the following:

 Predicts if a patient (currently having a benign tumor) will develop breast cancer based on
features computed.
 Predicts likelihood of developing cervical cancer based on lifestyle habits.
 Accepts a breast histopathological image and predicts/shows the cancer on that image.
 Accepts a skin lesion image and predicts it as melanoma positive or negative.
 An interactive interface for the user.

1.6.2 Limitation
 For cervical cancer, only prognosis is implemented, not diagnosis. This type of cancer
is currently very difficult to diagnose or screen, and tangible information about it is
hard to obtain.

1.7 Application of the Project
 Breast cancer prediction/prognosis based on FNA of breast mass.
 Cervical cancer prediction/prognosis through lifestyle habits.
 Breast/IDC cancer classification/diagnosis on histopathology images of the breast.
 Skin/melanoma cancer diagnosis based on given lesion images.

Benefit of the Project

Ethiopia has very limited machines for detecting and diagnosing cancer, so this project will help
in the following basic aspects:

 Cost minimization
 Time minimization
 As the project will be simple to use (user friendly), anyone with a little knowledge
of health (especially of cancer) can access it and see the results without going to Black
Lion Hospital or other health centers, which decreases the number of patients coming to
every hospital.
 Pathologists can easily and efficiently identify cancers with the help of this project.
 The project can serve in place of a pathologist where none is available.

CHAPTER 2
2. LITERATURE REVIEW AND RELATED WORK
2.1 Literature Review
For many real-world problems, it is necessary to build extremely accurate and understandable
classification models. Especially in the medical domain, there is growing demand for Artificial
Intelligence (AI) approaches that are not only well performing, but trustworthy, transparent,
interpretable, and explainable. This would allow medical professionals to understand how and why a
machine learning algorithm arrives at its decision, which will enhance their trust in AI systems.
In recent years, some machine learning models have significantly improved the ability to predict
the future condition of a patient. Although these models are very accurate, the inability to
explain their predictions is a serious limitation. For this reason, machine learning methods
employed in clinical applications often avoid complex yet more accurate models and retreat to
simpler interpretable models at the expense of accuracy.

Caffe (Convolutional Architecture for Fast Feature Embedding) is a deep learning framework,
originally developed at the University of California, Berkeley. It is open source under a BSD
license, and is written in C++ with a Python interface.

When practicing machine learning, training a model can take a long time. Creating a model
architecture from scratch, training the model, and then tweaking the model is a massive amount of
time and effort. A far more efficient way to train a machine learning model is to use an architecture
that has already been defined, potentially with weights that have already been calculated. This is
the main idea behind transfer learning, taking a model that has already been used and repurposing
it for a new task[11].

The texture convolutional neural network (TCNN) replaces handcrafted features based on Local Phase
Quantization (LPQ) and Haralick descriptors (HD), with the advantage of learning an appropriate
textural representation and the decision boundaries in a single optimization process.

2.2 Related Work


Recent studies have leveraged machine learning techniques in medical image analysis. Various
algorithms have achieved high performance in nucleus segmentation and classification with breast
cancer images. Spanhol et al. published a data set named BreaKHis for histopathological
classification of breast cancer and suggested a test protocol by which the experiment obtained 80%
to 85% accuracy using SVM, LBP (Local Binary Pattern), and GLCM (Gray Level Co-occurrence
Matrix) [8]. The Convolutional Neural Network (CNN) is known to achieve high performance in image
recognition and natural language processing through pattern analysis. A CNN is a specific type of
feed-forward neural network with convolutional layers, pooling layers, and fully connected layers
as its hidden layers. Due to its outstanding performance, the CNN is used widely in many fields,
especially in computer vision. The specific works reviewed are discussed below.

 Deep learning for magnification independent breast cancer histopathology image
classification.

In the classification of histopathological images, the magnification of images is another issue in
the use of machine learning. This research proposed a model that can learn and predict the disease
decision regardless of image magnification. The paper used a single-task CNN to perform
classification as benign or malignant.

 A method for classifying medical images using transfer learning: A pilot study on
histopathology of breast cancer.

Recent research using transfer learning has obtained prominent results in image analysis. Transfer
learning is a method that adapts a pre-trained model, already trained in one domain, to another
knowledge domain. It is known to be very useful when the data is not enough or when training time
and computing resources are restricted. The above research applies transfer learning to
classifying medical images. In this paper, the authors built a deep convolutional neural network
(CNN, ConvNet) model to classify breast cancer histopathological images into malignant and benign
classes. In addition to data augmentation, they applied the transfer learning technique to
overcome insufficient data and training time.

Figure 5 CNN Structure

 Texture CNN for Histopathological Image Classification.

BreakHis images do not have the same shapes found in the large-scale image datasets that are
commonly used to train CNNs, such as ImageNet or CIFAR. Therefore, instead of using pre-trained
CNNs, Texture CNN proposes an architecture that is more suitable for capturing the texture-like
features present in histopathological images. To this end, the research uses an alternative
architecture based on the texture CNN proposed by Andrearczyk and Whelan. It consists of only two
convolutional layers (Conv2D), an average pooling layer (AvgPool2D) over the entire feature map,
also called global average pooling, and fully connected layers (Dense).
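The distinctive element of this architecture, global average pooling, reduces each feature map to a single number. A NumPy sketch with toy feature maps (not real network outputs):

```python
import numpy as np

def global_average_pooling(feature_maps):
    """Average each feature map over its whole spatial extent,
    yielding one value per channel (the response of each texture filter)."""
    # feature_maps shape: (height, width, channels)
    return feature_maps.mean(axis=(0, 1))

# Two toy 4x4 feature maps, as if produced by the two Conv2D layers.
fmaps = np.stack(
    [np.full((4, 4), 2.0), np.arange(16, dtype=float).reshape(4, 4)],
    axis=-1,
)
vec = global_average_pooling(fmaps)   # one value per feature map
```

This per-channel vector is what the final Dense layers classify, which is why the approach suits texture-like images: only the overall filter response matters, not where in the image it occurs.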

2.3 Summary
Three different models were reviewed across five different research works. As tabularized in
Figure 6, TCNN Inception recorded good performance compared to the others in sensitivity, and the
single CNN in terms of specificity. With a small dataset it is better to use transfer learning, as
that is its strong side; if the question is about magnification, the single-task CNN is the answer.

Figure 6 Comparison of algorithms performance

Model            DA    Accuracy (Mean ± SD)   Sensitivity (Mean ± SD)   Specificity (Mean ± SD)
TCNN             1×    0.851 ± 0.045          0.915 ± 0.043             0.731 ± 0.093
                 6×    0.828 ± 0.037          0.897 ± 0.035             0.684 ± 0.083
                 12×   0.839 ± 0.026          0.897 ± 0.025             0.720 ± 0.086
                 24×   0.829 ± 0.038          0.890 ± 0.043             0.689 ± 0.073
                 48×   0.834 ± 0.033          0.887 ± 0.042             0.704 ± 0.105
                 72×   0.833 ± 0.047          0.896 ± 0.044             0.700 ± 0.122
TCNN Inception   1×    0.844 ± 0.045          0.913 ± 0.041             0.709 ± 0.083
                 6×    0.849 ± 0.038          0.932 ± 0.032             0.669 ± 0.137
                 12×   0.837 ± 0.017          0.891 ± 0.044             0.704 ± 0.065
                 24×   0.826 ± 0.043          0.874 ± 0.061             0.727 ± 0.161
                 48×   0.858 ± 0.039          0.920 ± 0.050             0.714 ± 0.095
                 72×   0.857 ± 0.051          0.919 ± 0.066             0.736 ± 0.109
Single CNN       1×    0.851 ± 0.032          0.907 ± 0.074             0.735 ± 0.178
                 6×    0.864 ± 0.045          0.918 ± 0.062             0.770 ± 0.098
                 12×   0.871 ± 0.029          0.919 ± 0.040             0.782 ± 0.049
                 24×   0.864 ± 0.026          0.907 ± 0.060             0.789 ± 0.077
                 48×   0.862 ± 0.036          0.918 ± 0.044             0.778 ± 0.088
                 72×   0.874 ± 0.027          0.914 ± 0.043             0.803 ± 0.053

Figure 7 Pros and Cons

Model              Advantage                    Disadvantage
TCNN Inception     Small parameters             Large dataset
TCNN               Small dataset                Performance
Single-Task CNN    Magnification independent    Difficult to handle (implement)
CHAPTER 3
3. SYSTEM ANALYSIS
3.1 Introduction
It is a process of collecting and interpreting facts, identifying the problems, and decomposition of
a system into its components.

System analysis is conducted for the purpose of studying a system or its parts in order to identify
its objectives. It is a problem solving technique that improves the system and ensures that all the
components of the system work efficiently to accomplish their purpose.

3.2 Existing System


3.2.1 Introduction
The existing detection of breast cancer is determined by specialists' pathologic diagnosis, which
is influenced by the doctor's experience and other external factors. The pathologist takes a sample
from the patient and examines it with a magnifier such as a microscope in order to make a
classification. Pathologists distinguish the cancer using three key feature types: microscopic,
gross, and clinical features.

3.2.2 Model of the Existing System


Actor
The existing system involves the following actors in order to perform a complete task:

 Pathologist: a trained specialist who takes a sample from the patient and predicts the
existence of cancer in the sample.
 Patient: a person who comes to the health center.

Essential Use Case


Use case diagrams for the existing system are used to represent its basic functionalities, as use
cases focus on the behavior of the system from an external point of view.

Table 7 Essential use case diagram of the existing system

3.2.3 Business Rules


The existing system has rules that must be fulfilled in order to get the service. Currently in our
country, many private and public hospitals, such as Black Lion Hospital, are responsible for the
identification of cells as benign or malignant.

Next, we will see some criteria and standards for determining a cell as benign or malignant.

 The patient must give a blood sample / X-ray / ultrasound / CT scan / MRI for diagnosis.
 In order to diagnose, there has to be a tumor on the person's body.
 The patient must wait until his or her turn comes.
 The doctor sends the patient to the laboratory.

3.2.4 Limitation of the Existing System


Pathology examination requires time-consuming scanning through tissue images under different
magnification levels to find clinical assessment clues and produce correct diagnoses. Pathologists
face a substantial increase in workload and complexity of digital pathology in cancer diagnosis
due to the advent of personalized medicine, and diagnostic protocols have to focus equally on
efficiency and accuracy. The pathologists' burden is large, which can cause faults in their
diagnoses.

 The existing system is not efficient and accurate, as prediction and diagnosis are
determined manually by a human pathologist, who may make serious mistakes.
 There are few pathologists in Ethiopia, and they are not sufficient to work everywhere
in the country.
 It is also expensive.

3.3 Proposed system


3.3.1 Overview of The Proposed System
A Computer Aided Detection system would involve the following phases:

 Capture a histopathologic image.
 Propagate each image to the Computer Aided Detection system.
 Compare each image with the built-in model base and detect abnormalities, if any.
 If any abnormalities are detected, the model reports the image as malignant.
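The phases above can be strung together as a simple pipeline. The linear scoring model, the sigmoid threshold, and the synthetic image below are placeholders for the trained CNN:

```python
import numpy as np

def detect_abnormality(image, model_weights, threshold=0.5):
    """Compare an image against the built-in model and report the result.
    A linear model with a sigmoid stands in for the trained CNN."""
    features = image.flatten()                       # propagate the image
    score = 1.0 / (1.0 + np.exp(-features @ model_weights))
    return "malignant" if score >= threshold else "benign"

rng = np.random.default_rng(0)
image = rng.random((8, 8))                 # phase 1: a captured (toy) image
weights = -0.1 * np.ones(64)               # placeholder "model base"
result = detect_abnormality(image, weights)
```

In the real system the score would come from the trained network, and the threshold would be chosen to balance sensitivity against specificity.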

This helps augment the physician's ability to spot abnormalities. The figure below shows the
proposed system. The manual process has a few drawbacks: it is not automated, and there is a chance
of not noticing a suspicious region, especially when it is too tiny to be noticed. In the proposed
Computer Aided Detection system these drawbacks can be overcome, and it is fast, as human
intervention is minimized. In the manual system, the pathologist diagnoses using knowledge that
grows with experience. If any suspicious patch or mark is observed, the pathologist needs the image
in more detail. Analysis of these diverse types of images requires sophisticated computerized
quantification and visualization tools. Resolution augmentation is important for visualization and
early diagnosis, and a super-resolution based region of interest (ROI) can play a major role in
accurate diagnosis.

In order to make efficient predictions and diagnoses, our team proposes prediction and diagnosis of
some frequently occurring and dangerous cancers: breast/IDC, skin/melanoma, and cervical cancer.

Figure 8 The proposed system general architecture for breast and skin cancer

3.3.2 Functional Requirement


 Breast cancer prediction/prognosis based on FNA of breast mass.
 Cervical cancer prediction/prognosis through life style habits.
 Breast/IDC cancer classification/diagnosis on histopathology images of the breast.
 Skin/melanoma cancer diagnosis based on given lesion images.
 User guide/help

3.3.3 Non-functional Requirements


 Usability requirement

These are documented expectations and specifications designed to ensure that this computer aided
cancer prognosis and diagnosis system is easy to use.

 Performance requirement

As the proposed system performs prediction and diagnosis, its performance is the main concern, and
this project ensures good capability to do what is expected of the CAD system.

 Security requirement

The system interface is developed using Flask, meaning it runs in a browser after being hosted on a
server. This requires strong security so that patients' results cannot be altered.

 Availability requirement

The system is available whenever the user needs to diagnose or predict cancer for a patient. This
is achieved by making the CAD system accessible both online and locally.

 Reliability requirement

The proposed system should operate without failure for an unlimited number of uses (transactions)
or for a specified period of time, although transaction limits may be determined by the server
owners.

3.2.2.1 System Requirement


The following are hardware and software considerations for developing the system.

 Hardware consideration
 Memory capacity (RAM): 8 GB & 32GB
 Process type: Intel core i5 processor.
 Processor speed: minimum of 3.0 GHZ for good performance.
 Hard disk space: minimum of 1 TB for large amount of data storage.
 Keyboard: Normal.
 Software consideration
 Language: Python
 Software: Visual studio code, PSPP, LibreOfficeWord, LibreOfficePowerPoint,
draw.io.
 Operating system: Linux (Ubuntu 18 LTS)
 Library: OpenCV, Pandas, Tensorflow, Sklearn, Keras

3.3.4 User Interface Specification and Description
Since the proposed system is at an early stage of development, it does not yet have a fully
designed user interface. However, based on assumptions about the contents that will be on the
system, we prepared the following user interface specification and description. Note that this user
interface document is not final and may undergo slight modification in the final implementation of
the system.

After the new proposed system is developed, it will have the following menu items.

 Home: the first and main part of the system. It shows all the menu items found on the
system; we treat the first view of the system as the home menu. It will mainly have two
sections, prognosis and diagnosis, each containing the types of cancer on which the user
can act.
 Breast cancer prediction: on the home page there will be a Breast cancer prediction
button, which leads to breast cancer prognosis if the user is in the prognosis section, or
to diagnosis if the user is in the opposite section. This part will have its own GUI for
accepting inputs and displaying the result, with an analyze button for starting the
analysis of the input record.
 Cervical cancer prediction: a button on the home interface which leads to the prediction
of cervical cancer. It will have an interface for accepting different inputs and
displaying a result.
 Skin cancer diagnosis: another button, responsible for shifting the user from the home
page to the skin cancer diagnosis page, which contains a space for inserting an image, a
place for displaying the result, and a submit button.
 Breast cancer diagnosis: another button, which leads to the interface for diagnosing
breast cancer; it contains a space for inserting an image, a place for displaying the
result, and a submit button.
 Help: a button leading to a detailed technical explanation of how to use the system.
 About: a button for transferring from the home page to the about page, which contains an
overview of the application.
 Contact: the last button, which holds the contact details of the system developers.

Below is the specification diagram for the proposed system.

Figure 9 User interface specification diagram for the proposed system

3.4 Analysis Model


In this section we describe the requirements of the system using UML. We divide the description
into three parts. The first is the use case model, in which we identify actors and their
specifications, identify use cases, and present the use case diagram with use case descriptions
that briefly state the precondition, the scenario or basic course of action, and the postcondition
of each use case. In the second part we describe the interacting objects and the communication
between them in each use case using sequence diagrams; a sequence diagram describes how processes
operate with one another and in what order. In the third part, we describe the interaction of
classes to represent the structure of the system using a class diagram, which shows the system's
classes, their attributes, operations (methods), and the relationships among objects.

3.4.1 Functional Model
3.4.1.1 Use-case Description
The use-case model consists of the collection of all actors and all use cases. A use case describes
a function provided by the system that yields a visible result for an actor. An actor is a user
playing a role with respect to the system.

 Actors of the system


 User
 Actors specification
 User: an actor who uses the designed system for a specific purpose based on
his/her needs.
 Use cases of the system
 Breast cancer prediction: used to perform a prediction of breast cancer for a
patient having a tumor on the breast.
 Cervical cancer prediction: used to predict cervical cancer.
 Skin cancer diagnosis: used to identify a skin tumor as benign or malignant.
 Breast cancer diagnosis: used to identify breast cancer from given
histopathological images.
 Use case description
1. Use case 1

Use case name: Breast cancer prediction.

Identifier: BCP.

Actor: User.

Precondition: The user must know his or her previous medical information.

Post condition: The user learns whether the tumor will be malignant or not.

Flow of events:

1) The user opens the program.

2) The user chooses prognosis.
3) The user chooses breast cancer.
4) The user fills in the inputs provided by the system from their medical results.
5) The user commands the system to analyze the given data.
6) The system gives a response indicating whether the tumor will be benign or
malignant.
7) Use-case ends.

Alternative flow:

4.A The user enters invalid input.

4.A.1 The system shows an error message.

4.A.2 The system stays on step 4.

2. Use case 2

Use case name: Cervical cancer prediction.

Identifier: CCP.

Actor: User.

Precondition: The user must know his or her previous medical information.

Post condition: The user learns whether the tumor will be malignant or not.

Flow of events:

1) The user opens the program.

2) The user chooses prognosis.
3) The user chooses cervical cancer.
4) The user fills in the inputs provided by the system from their lifestyle and
medical results.
5) The user commands the system to analyze the given data.
6) The system gives a response indicating whether the tumor will be benign or
malignant.
7) Use-case ends.

Alternative flow:

4.A The user enters invalid input.

4.A.1 The system shows an error message.

4.A.2 The system stays on step 4.

3. Use case 3

Use case name: Breast cancer diagnosis

Identifier: BCD

Actor: User

Precondition: The user must have a histopathology image of his or her breast cells.

Post condition: The user learns whether he or she has cancer or not.

Flow of events:

1) The user opens the program.

2) The user chooses diagnosis.
3) The user chooses breast cancer.
4) The user enters the histopathology image.
5) The user commands the system to analyze the given image.
6) The system preprocesses the image.
7) The system gives a response indicating whether or not there is cancer in the
given image.
8) Use-case ends.

Alternative flow:

4.A The system detects an invalid image.

4.A.1 The system returns to analyze another image.

4. Use case 4

Use case name: Skin cancer diagnosis

Identifier: SCD

Actor: User

Precondition: The user must have a lesion image of his or her skin.

Post condition: The user learns whether he or she has melanoma or not.

Flow of events:

1) The user opens the program.

2) The user chooses diagnosis.
3) The user chooses skin cancer.
4) The user enters the lesion image.
5) The user commands the system to analyze the given image.
6) The system preprocesses the image.
7) The system gives a response indicating whether or not there is cancer in the
given image.
8) Use-case ends.

Alternative flow:

4.A The system detects an invalid image.

4.A.1 The system returns to analyze another image.

3.4.1.2 Use Case Model


Use case diagram describe the functional behavior of the system as seen by the user. Use case
diagram for the proposed system are used to represent the basic functionalities of the system as
use cases focus on the behavior of the system from an external point of view.

Figure 10 Early cancer prediction and diagnosis system general use case diagram

Figure 11 Use case diagram for breast cancer prediction

Figure 12 Use case diagram for cervical cancer prediction

Figure 13 Use case diagram for breast cancer diagnosis

Figure 14 Use case diagram for skin cancer diagnosis

3.4.2 Dynamic Model


a) Sequence Diagram

Sequence diagrams describe patterns of communication among a set of interacting objects. An
object interacts with another object by sending messages. The reception of a message by an object
triggers the execution of an operation, which in turn may send messages to other objects. Sequence
diagrams and collaboration diagrams are together called interaction diagrams. A sequence diagram
is an interaction diagram that emphasizes the time ordering of messages. Sequence diagrams are used
to formalize the behavior of the system and to visualize the communication among objects. They are
useful for identifying additional objects that participate in the use cases; objects involved in a
use case are called participating objects. A sequence diagram represents the interactions that
take place among these objects.

Figure 15 Sequence diagram for breast cancer prediction

Figure 16 Sequence diagram for cervical cancer prediction

Figure 17 Sequence diagram for Breast cancer diagnosis

Figure 18 Sequence diagram for skin cancer diagnosis

b) Activity Diagram

An activity diagram illustrates the dynamic nature of a system by modeling the flow of control
from activity to activity. An activity represents an operation on some class in the system that
results in a change in the state of the system. Accordingly, we identify the activities in terms of
the functionality of the system.

Figure 19 Activity diagram for breast cancer prediction

Figure 20 Activity diagram for cervical cancer predict

Figure 21 Activity diagram for skin cancer diagnosis

c) State Chart Diagram

A state machine is any device that stores the status of an object at a given time and can change
status or cause other actions based on the input it receives. States refer to the different
combinations of information that an object can hold, not how the object behaves. We use state
machine diagrams to describe the life cycle of objects within the proposed system.

Figure 22 State chart diagram for validation of input text

Figure 23 State chart diagram for validation of input image

Figure 24 State chart diagram for prediction of cancer from given text input

Figure 25 State diagram for detection of histopathology image

Figure 26 State chart diagram for the detection of lesion image

3.4.3 Object Model


The object model, represented in UML with class diagrams, describes the structure of a system in
terms of objects, attributes, associations, and operations. The objects that participate in the
system are as follows:

Table 8 list of objects and their attributes

Object                     Attribute        Type

User                       User name        String
Image                      Image ID         Int
                           Image size       Int
Input analysis             Data ID          Int
                           Data content     String
                           Data type        String
Breast cancer predictor    Data ID          Int
                           Data content     String
                           Data type        String
                           Data name        String
Cervical cancer predictor  Data ID          Int
                           Data content     String
                           Data type        String
                           Data name        String
Preprocessor               Image ID         Int
                           Image size       Int
                           Image channel    Bool
Breast cancer diagnoses    Image type       PNG
Skin cancer diagnoses      Image type       PNG
Status                     Result           String

3.4.4. Class Diagram


Class diagrams describe the structure of the system in terms of classes and objects. Classes are
abstractions that specify the attributes and behavior of a set of objects, whereas objects are
entities that encapsulate state and behavior.

Figure 27 Class diagram for the proposed system

3.4.5 User Interface Flow Diagram

CHAPTER 4
4. SYSTEM DESIGN
4.1 An Over View of the System Design
In this phase, the SRS document is converted into a format that can be implemented, and we decide
how the system will operate. The complex activity of system development is divided into several
smaller sub-activities, which coordinate with each other to achieve the main objective of system
development.

As input we use the statement of work, the requirements determination plan, the current situation
analysis, and the proposed system requirements, including a conceptual data model, modified DFDs, and metadata.

In this phase, we identified the system design goals, categorized as performance, dependability,
maintenance, and end-user criteria.

4.2 Design Goals


4.2.1 Performance Criteria
 General system performance:

Our system response should be almost instantaneous for most use cases.

 Input/output Performance

Communication with sensors and actuators will be facilitated by specialized I/O
hardware. The data generation rate can be supported by the current setup; however,
cancer detection data could put a strain on the network throughput.

 Processor allocation

The Control and trained models will have dedicated servers for all of their operations.
The Visualization and User Interface subsystems' processes will be run on the user's
workstation. Since the Visualization subsystem is computation intensive, this will
impose minimum hardware requirements on the user machines. The Simulation and
Facility Management subsystems will operate on whatever machines are available to
them. There is no specific hardware requirement for these subsystems.

 The load time for user interface screens should not take more than three seconds.
 The model prediction process should not take more than five seconds
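The two latency budgets above (three seconds for screen loads, five seconds for predictions) can be checked with a simple timing guard during testing. A minimal sketch, assuming nothing about the real system; `within_budget` and `fake_predict` are illustrative names only:

```python
import time

# Hypothetical latency budgets taken from the design goals above:
# UI screens should load within 3 s, model predictions within 5 s.
UI_LOAD_BUDGET_S = 3.0
PREDICTION_BUDGET_S = 5.0

def within_budget(func, budget_s, *args, **kwargs):
    """Run func and report (result, elapsed seconds, whether it met the budget)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed <= budget_s

# Example with a stand-in "prediction" function (not the real model):
def fake_predict(x):
    return "benign" if x < 0.5 else "malignant"

result, elapsed, ok = within_budget(fake_predict, PREDICTION_BUDGET_S, 0.2)
```

A guard like this can be wired into automated tests so that a regression in model or UI latency fails the build rather than being noticed by users.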

4.2.2 Dependability Criteria


 Support for a variety of distribution formats, so that our documentation is viewable on a
wide variety of platforms. Further, our system should be adaptable to newer distribution
formats as and when they become popular.
 Availability: the system is deployed on a server, which allows it to be available at all
times and to work as required during operation.
 Reliability: the system should perform its intended function adequately for a specified
period of time and operate in its defined environment without failure.
 Safety: the system is developed in Python, which offers good safety and security properties.

4.2.3 Maintenance criteria


Source code that is difficult to maintain is a big problem in software development today, but in this
system the code and its assets are organized in a proper, easy-to-debug way.

 The system supports a graphical error-handling mechanism by means of message boxes.
 The GUI should be accompanied by help files that describe the usage of each screen.
 Code written in the same programming language should be kept together.
 Static files and code should be grouped in one working folder.
 Templates should be kept in a folder called templates.
 The same task/functionality should follow the same code flow.
 Object-oriented programming concepts should be applied so that code is readable
and reusable.

4.2.4 End User Criteria


The new system is designed to be easy to learn and understand.

4.3 System Design model
Systems design is the process of defining elements of a system like modules, architecture,
components and their interfaces and data for a system based on the specified requirements. It is
the process of defining, developing and designing systems which satisfies the specific needs and
requirements of a business or organization.

4.3.1 Proposed System General Architecture


The system architecture defines how pieces of the application interact with each other, and what
functionality each piece is responsible for performing. There are three main classes of application
architecture. They can be characterized by the number of layers between the user and the data. The
three types of application architecture are single-tier (or monolithic), two-tier, and n-tier, where n
can be three or more.

A three-tier (multi-tier) architecture has a client, a server, and a model layer: the client's request
is sent to the server, and the server in turn forwards the request to the model. The model sends
the required information/prediction back to the server, which in turn returns it to the client. Our
system therefore uses a three-tier architecture. Figure 28 shows the general structure of the system.
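The request flow just described can be sketched in plain Python. All function and field names below are illustrative stand-ins rather than the system's actual API (the real server tier is built with Flask):

```python
# Minimal sketch of the three-tier flow: client -> server -> model -> server -> client.

def model_tier(payload):
    """Model tier: turns validated input data into a prediction."""
    # Stand-in for a trained prognosis/diagnosis model.
    return {"prediction": "recurrent" if payload["risk_score"] > 0.5 else "nonrecurrent"}

def server_tier(request):
    """Server tier: validates the client request and forwards it to the model."""
    if "risk_score" not in request:
        return {"error": "missing risk_score"}
    return model_tier(request)

def client_tier(risk_score):
    """Client tier: sends a request to the server and receives the response."""
    return server_tier({"risk_score": risk_score})

response = client_tier(0.8)  # travels through all three tiers
```

The key design property is that the client never talks to the model directly; every request and response passes through the server tier, which is what makes the layers independently replaceable.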

Figure 28 General structure of the system

Figure 29 Decomposition diagram of the system

4.3.2 Subsystem Decomposition
Subsystem decomposition is the process of dividing the system into manageable subsystems based on
the analysis model of the proposed system. The goal of subsystem decomposition is to reduce the
complexity of the design model and to distribute the classes of the system into large-scale, cohesive
components. The major subsystem identified is the user subsystem.

The decomposition shows the existence of the following subsystem: -

User subsystem

 Predict breast cancer
 Predict cervical cancer
 Diagnose breast cancer
 Diagnose skin cancer
 Use the help and support system

Figure 30 Component diagram for sub decomposition of the system

4.3.3 Hardware/Software mapping
Early cancer detection and diagnosis will run on any operating system. The web server will run
on a cloud server, and the programming language used for developing this system is Python. The
following deployment diagram illustrates the hardware/software mapping for the system.

Figure 31 hardware/ Software mapping

4.4 User Interface Design


The goal of user interface design is to make the user's interaction as simple and efficient as
possible in terms of accomplishing user goals, an approach often called user-centered design.

Figure 32 User interface diagram

Figure 33 Home page (A) interface of the system

Figure 34 Home page (B) of the user interface

Figure 35 Home page (C) user interface

Figure 36 Breast cancer prognosis interface

Figure 37 Breast cancer diagnosis interface

Figure 38 Skin cancer diagnosis user interface

CHAPTER 5
5. EXPERIMENT
5.1 Introduction
A dataset is a structured collection of data generally associated with a unique body of work. For
developing cancer detection we used multiple datasets from dataset collection web platforms such
as Kaggle. In the next sections, the datasets are described in turn: datasets for breast cancer
prognosis and diagnosis, cervical cancer prognosis, and skin cancer detection.

5.2 Dataset Preparation


5.2.1 Dataset description for Breast Cancer Prognosis
 Dataset description
 Title: Wisconsin Prognostic Breast Cancer (WPBC)
 Source Information

Creators: Dr. William H. Wolberg, General Surgery Dept., University of Wisconsin,


Clinical Sciences Center, Madison, WI 53792 [email protected], W.
Nick Street, Computer Sciences Dept., University of Wisconsin, 1210 West Dayton
St., Madison, WI 53706 [email protected] 608-262-6619, Olvi L. Mangasarian,
Computer Sciences Dept., University of Wisconsin, 1210 West Dayton St., Madison,
WI 53706 [email protected].

 Donor: Nick Street


 Date: December 1995
 Past Usage:

Various versions of this data have been used in the following publications:

W. N. Street, O. L. Mangasarian, and W. H. Wolberg. An inductive learning approach to
prognostic prediction. In A. Prieditis and S. Russell, editors, Proceedings of the Twelfth
International Conference on Machine Learning, pages 522-530, San Francisco, 1995.
Morgan Kaufmann.

O.L. Mangasarian, W.N. Street and W.H. Wolberg. Breast cancer diagnosis and prognosis
via linear programming. Operations Research, 43(4), pages 570-577, July-August 1995.

W.H. Wolberg, W.N. Street, D.M. Heisey, and O.L. Mangasarian. Computerized breast
cancer diagnosis and prognosis from fine needle aspirates. Archives of Surgery
1995;130:511-516.

 Results:

Two possible learning problems:

Predicting field 2, outcome: R = recurrent, N = nonrecurrent

 Relevant information

Each record represents follow-up data for one breast cancer case. These are consecutive
patients seen by Dr. Wolberg since 1984, and include only those cases exhibiting invasive
breast cancer and no evidence of distant metastases at the time of diagnosis.

The first 30 features are computed from a digitized image of a fine needle aspirate (FNA)
of a breast mass. They describe characteristics of the cell nuclei present in the image.

 This database is also available through the UW CS ftp server:


 Number of instances: 198
 Number of attributes: 34 (ID, outcome, 32 real-valued input features)
 Attribute information:
 1) ID number
 2) Outcome (R = recur, N = nonrecur)
 3) Time (recurrence time if field 2 = R, disease-free time if field 2 = N)
 4-33) Ten real-valued features are computed for each cell nucleus:

a) radius (mean of distances from center to points on the perimeter)

b) texture (standard deviation of gray-scale values)

c) perimeter

d) area

e) smoothness (local variation in radius lengths)

f) compactness (perimeter^2 / area - 1.0)

g) concavity (severity of concave portions of the contour)

h) concave points (number of concave portions of the contour)

i) symmetry

j) fractal dimension ("coastline approximation" - 1)

 Values for features 4-33 are recoded with four significant digits.
 34) Tumor size - diameter of the excised tumor in centimeters
 35) Lymph node status - number of positive axillary lymph nodes observed at time of surgery
 Missing attribute values: Lymph node status is missing in 4 cases.
 Class distribution: 151 nonrecur, 47 recur
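The compactness feature defined above (perimeter² / area − 1.0) can be sanity-checked directly: for a perfect circle it is constant regardless of radius (4π − 1 ≈ 11.57), and more irregular contours score higher. A minimal sketch, with the function name chosen here for illustration:

```python
import math

def compactness(perimeter, area):
    """Compactness as defined for the WPBC features: perimeter^2 / area - 1.0."""
    return perimeter ** 2 / area - 1.0

# For a circle of radius r: perimeter = 2*pi*r and area = pi*r^2, so
# compactness = (2*pi*r)^2 / (pi*r^2) - 1 = 4*pi - 1, independent of r.
r = 1.0
circle_value = compactness(2 * math.pi * r, math.pi * r ** 2)
```

Because the circle is the most compact possible shape, this value acts as a lower bound; cell nuclei with ragged contours push the feature upward.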

Table 9 Dataset description

Data Set Characteristics:    Multivariate                 Number of Instances:   198    Area:                Life
Attribute Characteristics:   Real                         Number of Attributes:  34     Date Donated:        1995-12-01
Associated Tasks:            Classification, Regression   Missing Values?        Yes    Number of Web Hits:  226692

5.2.1.1 Sample Dataset for Breast Cancer Prognosis

Figure 39 Sample dataset of breast cancer prognosis model

5.2.2 Dataset description for Breast Cancer Diagnosis


Invasive Ductal Carcinoma (IDC) is the most common subtype of all breast cancers. To assign an
aggressiveness grade to a whole mount sample, pathologists typically focus on the regions which
contain the IDC. As a result, one of the common pre-processing steps for automatic aggressiveness
grading is to delineate the exact regions of IDC inside of a whole mount slide.

The original dataset consisted of 162 whole mount slide images of Breast Cancer (BCa) specimens
scanned at 40x. From that, 277,524 patches of size 50 x 50 were extracted (198,738 IDC negative
and 78,786 IDC positive). Each patch's file name follows the format u_xX_yY_classC.png, for
example 10253_idx5_x1351_y1101_class0.png, where u is the patient ID (10253_idx5), X is the
x-coordinate of where the patch was cropped, Y is the y-coordinate of where the patch was
cropped, and C indicates the class, where 0 is non-IDC and 1 is IDC.
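Assuming the standard IDC patch naming convention (u_xX_yY_classC.png, worth verifying against the downloaded files), the patient ID, coordinates and label can be recovered with a regular expression. A minimal sketch; `parse_patch_name` is an illustrative helper, not part of the project's code:

```python
import re

# Filename pattern for IDC patches, e.g. "10253_idx5_x1351_y1101_class0.png".
PATCH_RE = re.compile(r"^(?P<patient>.+)_x(?P<x>\d+)_y(?P<y>\d+)_class(?P<c>[01])\.png$")

def parse_patch_name(filename):
    """Extract patient ID, patch coordinates, and IDC label from a patch filename."""
    m = PATCH_RE.match(filename)
    if m is None:
        raise ValueError(f"unrecognized patch filename: {filename}")
    return {
        "patient_id": m.group("patient"),
        "x": int(m.group("x")),
        "y": int(m.group("y")),
        "idc": m.group("c") == "1",  # class 1 = IDC positive
    }

info = parse_patch_name("10253_idx5_x1351_y1101_class0.png")
```

Parsing the label out of the filename this way is what lets the 277,524 patches be split into positive and negative training sets without any separate annotation file.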

5.2.2.1 Sample Dataset for Breast Cancer Diagnosis

Figure 40 Breast cancer free histopathological image Figure 41Breast cancer detected histopathological image

5.2.3 Dataset description for Cervical Cancer Prognosis


This dataset focuses on the prediction of indicators/diagnosis of cervical cancer. The features cover
demographic information, habits, and historic medical records.

The dataset was collected at 'Hospital Universitario de Caracas' in Caracas, Venezuela. The dataset
comprises demographic information, habits, and historic medical records of 858 patients. Several
patients decided not to answer some of the questions because of privacy concerns (missing values).

 Source:

Kelwin Fernandes (kafc _at_ inesctec _dot_ pt) - INESC TEC & FEUP, Porto, Portugal.
Jaime S. Cardoso - INESC TEC & FEUP, Porto, Portugal.
Jessica Fernandes - Universidad Central de Venezuela, Caracas, Venezuela.

 Attribute Information:

(int) Age, (int) Number of sexual partners, (int) First sexual intercourse (age), (int) Num of
pregnancies, (bool) Smokes, (bool) Smokes (years), (bool) Smokes (packs/year), (bool) Hormonal
Contraceptives, (int) Hormonal Contraceptives (years), (bool) IUD, (int) IUD (years), (bool)
STDs, (int) STDs (number), (bool) STDs:condylomatosis, (bool) STDs:cervical condylomatosis,
(bool) STDs:vaginal condylomatosis, (bool) STDs:vulvo-perineal condylomatosis, (bool)
STDs:syphilis, (bool) STDs:pelvic inflammatory disease, (bool) STDs:genital herpes, (bool)
STDs:molluscum contagiosum, (bool) STDs:AIDS, (bool) STDs:HIV, (bool) STDs:Hepatitis B,
(bool) STDs:HPV, (int) STDs: Number of diagnosis, (int) STDs: Time since first diagnosis, (int)
STDs: Time since last diagnosis, (bool) Dx:Cancer, (bool) Dx:CIN, (bool) Dx:HPV, (bool) Dx,
(bool) Hinselmann: target variable, (bool) Schiller: target variable, (bool) Cytology: target
variable, (bool) Biopsy: target variable
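The missing values mentioned above are encoded as the character '?' in the UCI release of this dataset (an assumption worth verifying against the downloaded CSV). A minimal cleaning sketch; `clean_row` is an illustrative helper name:

```python
def clean_row(values):
    """Convert raw string fields to floats, mapping '?' (missing) to None."""
    return [None if v == "?" else float(v) for v in values]

# Example row fragment: Age, Number of sexual partners, First sexual intercourse.
# The third field is missing because the patient declined to answer.
raw = ["18", "4.0", "?"]
cleaned = clean_row(raw)
```

After this conversion, the `None` entries can be dropped or imputed (e.g. with the column median) before the features are fed to a model; leaving '?' strings in place would make the whole column non-numeric.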

5.2.3.1 Sample Dataset for Cervical Cancer Prognosis

Figure 42 Sample dataset for cervical cancer prognosis

5.3 Implementation
This section highlights the issues dealt with during the implementation phase. Implementation
is the phase where the objectives of the physical operations of the system are turned into reality,
i.e., a real working model. In this phase the coding conventions make this possible, as it is the
phase that turns objectives into reality. The code is then tested until most of the errors have been
detected and corrected. The goal of implementation is to introduce the system to its users in a real
sense, showing how they can use the new system developed for their intended objectives.

5.3.1 Development Environment


A development environment is a collection of procedures and tools for developing, testing and
debugging an application or program. The development environment normally has three server
tiers, called development, staging and production. All three tiers together are usually referred to as
the DSP.

 Development Server: Here is where we test code and check whether the application runs
successfully with that code. Once the application has been tested and the code is working
fine, the application then moves to the staging server.
 Staging Server: This environment is made to look exactly like the production server
environment. The application is tested on the staging server to check for reliability and to
make sure it does not fail on the actual production server. This type of testing on the staging
server is the final step before the application could be deployed on a production server. The
application needs to be approved in order to deploy it on the production server.
 Production Server: Once the approval is done, our application then becomes a part of this
server.

5.3.2 Programming Tool


 Interface: Flask, a micro web framework written in Python, and Bootstrap.
 Language: python 3.6.x
 Documentation and design modeling
 UML design: Lucidchart – online application for UML diagrams (for drawing different
diagrams such as use case and class diagram).

 NN-SVG: (online) for drawing machine learning concepts like the architecture of CNN
and ANN.
 Hardware tool
 Computer: Processor: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz; RAM: minimum
8GB; Hard disk: minimum 1TB; GPU: minimum 4GB.
 Server computer: minimum of 32GB RAM, 32GB GPU and 1.4TB HD.
 Flash disk: minimum 4GB, used for transferring data from one computer to another.
 Paper (A3, A4): used to print the documentation and draw diagrams.
 Pencil: To draw diagrams.
 Printer: Used to print the documentation.
 Software tools
 Operating system: Linux (Ubuntu-18.0 LTS).
 PSPP: for data analysis.
 Browser: chedot, chrome.
 Microsoft Azure: For hosting.
 Visual studio code: for coding.
 Github: For version controlling and team work.
 Documentation tool:
 LibreOffice Writer: free word-processing software.
 LibreOffice Impress: free presentation software.

Figure 43 Development environment process

5.3.2 The Prototype of the Project
Prototyping involves the use of basic models or examples of the product being tested. For example,
the model might be incomplete and utilize just a few of the features that will be available in the
final design, or it might be constructed using materials not intended for the finished article. We
show the prototype of this project using a Jupyter notebook. (The full prototype code and results
can be found in Appendix B.)

Figure 44 Visualization of the training dataset

Figure 45 Model compiling

Figure 46 Prototype model predicting the outcome

5.4 The Results
In this project, different models were constructed using CNN and ML concepts. There are basically
four models: breast and cervical cancer prognosis/prediction models, and diagnosis models for skin
and breast cancer. The results from these models are as follows:

 Results from breast cancer prognosis

With 500 epochs, the loss is 0.92 and the accuracy is 0.93 (93%).
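For context, the WPBC class distribution reported earlier (151 nonrecurrent vs. 47 recurrent cases) implies a majority-class baseline that any useful model must beat; a quick check:

```python
# Majority-class baseline for the WPBC prognosis task: always predicting
# "nonrecurrent" is correct for 151 of the 198 cases.
nonrecur, recur = 151, 47
baseline_accuracy = nonrecur / (nonrecur + recur)  # about 0.763

model_accuracy = 0.93  # reported accuracy of the prognosis model
improvement = model_accuracy - baseline_accuracy
```

The reported 93% therefore sits well above the roughly 76% that a trivial constant predictor would achieve on this imbalanced dataset.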

Figure 47 Accuracy of breast cancer prognosis model

 Results from cervical cancer prognosis

Here we used four different models and obtained different accuracies.

Figure 48 Accuracy of cervical cancer prognosis model

 Results from breast cancer diagnosis

For the breast cancer diagnosis model, val_loss is 0.3326 and val_accuracy is 0.8579.

CHAPTER 6
6. CONCLUSION AND RECOMMENDATION
6.1 Conclusion
Computer-assisted diagnosis for histopathology image can improve the accuracy and relieve the
burden for pathologists at the same time. In this project, we present a supervised learning
framework, CNN, for histopathology image segmentation using only image-level labels. CNN
automatically enriches supervision information from image-level to instance-level with high
quality and achieves comparable segmentation results with its fully supervised counterparts. More
importantly, the automatic labeling methodology may generalize to other supervised learning
studies for histopathology image analysis. In CNN, the obtained instance-level labels are directly
assigned to the corresponding pixels and used as masks in the segmentation task, which may result
in the over-labeling issue. The datasets were obtained from data warehouse websites such as
Kaggle, which provide large datasets for machine learning, computer vision, data mining and
artificial intelligence purposes. The process exhibited high performance on the binary classification
of breast cancer, scoring around 0.93 (93%), i.e., determining whether a tumor is benign or
malignant. Consequently, the statistical measures on the classification problem were also satisfactory.

The project is aimed at developing portable software for detecting breast, cervical and skin
cancer. Using the models, this system can take a histopathology image and predict whether IDC
is present, and can use laboratory reports to predict the occurrence of breast cancer.
Requirements analysis was performed to discover the needs of the new solution for the proposed
system. This phase consists of drawing out the functional and non-functional requirements of the
system. In the literature review, we reviewed different research works and surveys and produced a
review survey. In the analysis phase, the proposed and existing systems are represented using UML
diagrams such as use case diagrams. In the system design phase, the proposed system's general
architecture and design goals are described in depth. In the experiment phase, the datasets and
programming tools were presented.

6.2 Recommendation
To further substantiate the results of this study, a cross-validation technique such as k-fold
cross-validation should be employed. The application of such a technique will not only provide a
more accurate measure of model prediction performance, but will also assist in determining the
optimal hyper-parameters for the ML algorithms.

According to the scope of the project, the team should develop prognosis and diagnosis for breast,
skin and cervical cancer. However, due to time constraints, the system has limitations which should
be considered. We recommend that the following functionalities be included:

 Prognosis system for skin cancer


 Diagnosis system for cervical cancer
 Direct connection with microscope lenses

REFERENCE
[1] What Is Cancer? (n.d.). Retrieved from https://www.cancer.gov/about-cancer/understanding/what-is-cancer

[2] Cell (biology). (2019, December 25). Retrieved from https://en.wikipedia.org/wiki/Cell_(biology).

[3] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T.
Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint
arXiv:1408.5093, 2014.

[4] American Cancer Society. (2017) Cancer Facts and Figures.

[5] Roger S. Pressman, Ph.D. Software Engineering: A Practitioner's Approach, 7th edition.
McGraw-Hill, 1221 Avenue of the Americas, New York, NY 10020.

[6] https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Prognostic).

[7] Kelwin Fernandes, Jaime S. Cardoso, and Jessica Fernandes. 'Transfer Learning with Partial
Observability Applied to Cervical Cancer Screening.' Iberian Conference on Pattern Recognition
and Image Analysis. Springer International Publishing, 2017.

[8] R. C. González, R. E. Woods, and S. L. Eddins. Digital Image Processing Using MATLAB.
Pearson, 2004.

[9] R. Geirhos, P. Rubisch, C. Michaelis, M. Bethge, F. A. Wichmann, and W. Brendel.
ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and
robustness. In International Conference on Learning Representations, 2019.

[10] A. I. Baba and C. Câtoi. Tumor cell morphology. In Comparative Oncology. The Publishing
House of the Romanian Academy, 2007.

[11] A. Lowe, M. Grunkin, et al. Mitos-Atypia grand challenge 2014 (2014).
https://mitos-atypia-14.grand-challenge.org/Dataset/.

[12] F. Xing, Y. Xie, and L. Yang. An automatic learning-based framework for robust nucleus
segmentation. IEEE Trans. Med. Imaging, 35(2):550-566, 2016. 10.1109/TMI.2015.2481436.

[13] A. Prasoon, K. Petersen, C. Igel, F. Lauze, E. Dam, and M. Nielsen. Deep feature learning for
knee cartilage segmentation using a triplanar convolutional neural network. In Medical Image
Computing and Computer-Assisted Intervention (MICCAI) 2013, 16th International Conference,
Nagoya, Japan, September 22-26, 2013, Proceedings, Part II, pages 246-253, 2013.
10.1007/978-3-642-40763-5_31.

[14] D. C. Ciresan, A. Giusti, L. M. Gambardella, and J. Schmidhuber. Deep neural networks
segment neuronal membranes in electron microscopy images. In Advances in Neural Information
Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012,
Lake Tahoe, Nevada, United States, pages 2852-2860, 2012.

[15] A. Cruz-Roa, A. Basavanhally, F. A. González, H. Gilmore, M. Feldman, S. Ganesan, N. Shih,
J. Tomaszewski, and A. Madabhushi. Automatic detection of invasive ductal carcinoma in whole
slide images with convolutional neural networks. In Medical Imaging 2014: Digital Pathology,
San Diego, California, United States, 15-20 February 2014, p. 904103, 2014. 10.1117/12.2043872.

[16] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep
convolutional neural networks. Commun. ACM, 60(6):84-90, 2017. 10.1145/3065386.

APPENDIX
A.

A Review Survey on Histopathological Image Classification for Breast Cancer Detection

Group-4
Students
Computer Science
Dilla University

Abstract- Breast cancer is the second leading cause of cancer death among women. Breast cancer
is not a single disease, but rather is comprised of many different biological entities with distinct
pathological features and clinical implications. Pathologists face a substantial increase in workload
and complexity of digital pathology in cancer diagnosis due to the advent of personalized medicine,
and diagnostic protocols have to focus equally on efficiency and accuracy. Computerized image
processing technology has been shown to improve efficiency, accuracy and consistency in
histopathology evaluations, and can provide decision support to ensure diagnostic consistency. We
compare different techniques used for binary classification in breast cancer detection: the Caffe [1]
convolutional neural network, a convolutional neural network using transfer learning, a
convolutional neural network using deep learning, and a texture CNN.

Keywords – feedforward neural nets; medical image processing; Convolutional Neural Network;
image classification model; breast cancer; histopathology images; Google Inception v3 model

INTRODUCTION

According to the American Cancer Society, breast cancer is the second leading cause of cancer
death among women. Because the disease is so deadly, its rapid diagnosis and treatment is a critical
problem having huge societal benefits. Computer-aided diagnosis (CAD) of breast cancer utilizing
histopathology image analysis is an effective means for cancer detection and diagnosis. Modern
digital pathology provides ways to facilitate pathology practice [2]–[4]. It lends itself to automated
histopathological analysis, which has been proven to be valuable in prognostic determination of
various malignancies, including breast cancer [5].

For many real-world problems, it is necessary to build extremely accurate and understandable
classification models. Especially in the medical domain, there is growing demand for Artificial
Intelligence (AI) approaches which are not only well performing, but trustworthy, transparent,
interpretable and explainable. This would allow medical professionals to understand how and why
a machine learning algorithm arrives at its decision, which will enhance trust of medical
professionals in AI systems [6]. In recent years, some machine learning models have significantly
improved the ability to predict the future condition of a patient [7], [8]. Although these models are
very accurate, the inability to explain the predictions of accurate, complex models is a serious
limitation. For this reason, machine learning methods employed in clinical applications avoid
using complex, yet more accurate, models and retreat to simpler interpretable models at the
expense of accuracy [9].

Caffe (Convolutional Architecture for Fast Feature Embedding) convolutional neural
network is a deep learning framework, originally developed at the University of California,
Berkeley. It is open source, under a BSD license [10]. It is written in C++, with a Python interface.

When practicing machine learning, training a model can take a long time. Creating a model
architecture from scratch, training the model, and then tweaking the model is a massive amount of
time and effort. A far more efficient way to train a machine learning model is to use an architecture
that has already been defined, potentially with weights that have already been calculated. This is
the main idea behind transfer learning: taking a model that has already been used and repurposing
it for a new task [11].

Texture convolutional neural network (TCNN) replaces handcrafted features based on Local Phase
Quantization (LPQ) and Haralick descriptors (HD), with the advantage of learning an appropriate
textural representation and the decision boundaries in a single optimization process [12].

This paper presents a review of these approaches.

2. EXISTING SYSTEM

The existing detection of breast cancer has been determined by specialists' pathologic diagnosis,
which is influenced by the doctor's experience and other external factors. The pathologist takes a
sample from the patient and examines it with a magnifier such as a microscope in order to make a
classification.

Pathology examination requires time-consuming scanning through tissue images under different
magnification levels to find clinical assessment clues to produce correct diagnoses. Pathologists
face a substantial increase in workload and complexity of digital pathology in cancer diagnosis
due to the advent of personalized medicine, and diagnostic protocols have to focus equally on
efficiency and accuracy. The burden on pathologists is large, which can cause faults in their
diagnoses.

3. CONTRIBUTION OF THIS WORK

Remarkably, many research efforts in deep learning for histopathology rely on neural networks
trained from scratch. This is in contrast to the fact that transfer learning [13], [14] is known to
improve prediction performance drastically in cases of small training sample sizes. Histopathology
tasks are typical domains of small-sample-size datasets owing to the fact that annotations are left
to experts rather than crowd-source annotation services such as Amazon Mechanical Turk.
Regarding sample size, histopathology tasks fit the bill of transfer learning. However, major deep
learning image datasets which are frequently used to generate initialization weights, such as
ImageNet [15] or MIT Places [16], are based on natural scene photographs. They have very distinct
image statistics compared to histopathological stains. The H&E stain, for example, is characterized
by pink tones for the connecting tissue from the Eosin and blue-violet tones for the nuclei from the
Hematoxylin compounds, with a notable absence of any green or yellow color components present
in natural images. So, this paper contributes a comparison between different methods or techniques
for detection of breast cancer.

4. DATA SETS

In this paper, all reviewed researches used the same dataset from the BreaKHis database, composed
of 7,909 microscopic biopsy images of benign and malignant breast tumors acquired from 82
patients [5]. BreaKHis is collected using different magnifying factors (40X, 100X, 200X, and
400X) and contains 2,480 benign and 5,429 malignant images. Table 1 shows the distribution of
the dataset.

TABLE I. Distribution of the dataset [5]

Magnification   Benign   Malignant   Total
40x             625      1,370       1,995
100x            644      1,437       2,081
200x            623      1,390       2,013
400x            588      1,232       1,820
Total           2,480    5,429       7,909
# Patients      24       58          82
Figure 1. Sample malignant histopathological images from
BreakHIs dataset.
5. RELATED WROK

Recent studies have leveraged machine learning techniques in medical image analysis, and various algorithms have achieved high performance in nucleus segmentation and classification on breast cancer images [17]. Spanhol et al. published a dataset, named BreaKHis, for histopathological classification of breast cancer and suggested a test protocol under which the experiments obtained 80% to 85% accuracy using SVM, LBP (Local Binary Pattern), and GLCM (Gray Level Co-occurrence Matrix) [18]. The Convolutional Neural Network (CNN) is known to achieve high performance in image recognition and natural language processing through pattern analysis. A CNN is a specific type of feed-forward neural network whose hidden layers are convolutional layers, pooling layers, and fully connected layers. Owing to this outstanding performance, CNNs are widely used in many fields, especially computer vision. The specific researches reviewed are presented below.

1. Deep learning for magnification-independent breast cancer histopathology image classification

In the classification of histopathological images, the magnification of the images is another issue in the use of machine learning. This research proposed a model that can learn and predict the disease decision regardless of image magnification. The paper used a single-task CNN to perform classification as benign or malignant.

Figure 2. Schematic representation for classification (Single CNN).

2. A method for classifying medical images using transfer learning: a pilot study on histopathology of breast cancer

Recent research using transfer learning has obtained prominent results in image analysis. Transfer learning is a method that adapts a pre-trained model, already learned in a specific domain, to another knowledge domain. The method is known to be very useful when the data are insufficient or when training time and computing resources are restricted. This research applies transfer learning to the classification of medical images: the authors built a deep convolutional neural network (CNN, ConvNet) model to classify breast cancer histopathological images into malignant and benign classes. In addition to data augmentation, they applied transfer learning to overcome insufficient data and training time.
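The transfer-learning recipe described above (keep a pre-trained feature extractor frozen and train only a small classifier head on the new, data-poor domain) can be sketched in miniature. Everything below is our own illustration, not code from the reviewed paper: the `frozen_features` function is a toy stand-in for a pre-trained backbone, and the dataset and hyperparameters are synthetic.

```python
import math
import random

random.seed(0)

# Stand-in for a frozen, pre-trained feature extractor: its parameters are
# fixed, so the training loop below never modifies it (the "transfer" part).
def frozen_features(x):
    u, v = x
    return [u + v, u - v, u * v]

# Tiny synthetic "target-domain" dataset: label 1 when the inputs sum to > 1.
data = []
for _ in range(200):
    x = (random.random(), random.random())
    data.append((x, 1 if x[0] + x[1] > 1 else 0))

# Trainable classifier head: logistic regression fitted with plain SGD.
w, b, lr = [0.0, 0.0, 0.0], 0.0, 0.5
for _ in range(300):
    for x, y in data:
        f = frozen_features(x)                 # frozen: no update flows here
        z = sum(wi * fi for wi, fi in zip(w, f)) + b
        p = 1.0 / (1.0 + math.exp(-z))         # sigmoid
        g = p - y                              # gradient of the log loss w.r.t. z
        w = [wi - lr * g * fi for wi, fi in zip(w, f)]
        b -= lr * g

def predict(x):
    z = sum(wi * fi for wi, fi in zip(w, frozen_features(x))) + b
    return 1 if z > 0 else 0

accuracy = sum(predict(x) == y for x, y in data) / len(data)
print(f"head-only training accuracy: {accuracy:.2f}")
```

In the real setting the frozen extractor would be a network such as Inception v3 pre-trained on ImageNet, and only the final classification layers would be updated on the histopathological images.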

Figure 3. The architecture of Google's Inception v3 model.

3. Texture CNN for Histopathological Image Classification

BreaKHis images do not have the same shapes found in the large-scale image datasets commonly used to train CNNs, such as ImageNet or CIFAR. Therefore, instead of using pre-trained CNNs, Texture CNN proposes an architecture that is more suitable for capturing the texture-like features present in histopathological images (HIs). To this end, the research uses an alternative architecture based on the texture CNN proposed by Andrearczyk and Whelan. It consists of only two convolutional layers (Conv2D), an average pooling layer (AvgPool2D) over the entire feature map (also called global average pooling), and fully connected layers (Dense).

Table 1. Comparison of algorithm performances

Model            DA    Accuracy         Sensitivity      Specificity
                       (Mean ± SD)      (Mean ± SD)      (Mean ± SD)
TCNN             1×    0.851 ± 0.045    0.915 ± 0.043    0.731 ± 0.093
                 6×    0.828 ± 0.037    0.897 ± 0.035    0.684 ± 0.083
                 12×   0.839 ± 0.026    0.897 ± 0.025    0.720 ± 0.086
                 24×   0.829 ± 0.038    0.890 ± 0.043    0.689 ± 0.073
                 48×   0.834 ± 0.033    0.887 ± 0.042    0.704 ± 0.105
                 72×   0.833 ± 0.047    0.896 ± 0.044    0.700 ± 0.122
TCNN Inception   1×    0.844 ± 0.045    0.913 ± 0.041    0.709 ± 0.083
                 6×    0.849 ± 0.038    0.932 ± 0.032    0.669 ± 0.137
                 12×   0.837 ± 0.017    0.891 ± 0.044    0.704 ± 0.065
                 24×   0.826 ± 0.043    0.874 ± 0.061    0.727 ± 0.161
                 48×   0.858 ± 0.039    0.920 ± 0.050    0.714 ± 0.095
                 72×   0.857 ± 0.051    0.919 ± 0.066    0.736 ± 0.109
Single CNN       1×    0.851 ± 0.032    0.907 ± 0.074    0.735 ± 0.178
                 6×    0.864 ± 0.045    0.918 ± 0.062    0.770 ± 0.098
                 12×   0.871 ± 0.029    0.919 ± 0.040    0.782 ± 0.049
                 24×   0.864 ± 0.026    0.907 ± 0.060    0.789 ± 0.077
                 48×   0.862 ± 0.036    0.918 ± 0.044    0.778 ± 0.088
                 72×   0.874 ± 0.027    0.914 ± 0.043    0.803 ± 0.053

Table 2. Advantages and disadvantages

Model            Advantage                   Disadvantage
TCNN             Small parameters            Needs a large dataset
TCNN Inception   Small-dataset performance
Single-Task CNN  Magnification independent   Difficult to handle (implement)

6. CONCLUSION

In this paper we reviewed three different models, presented under five different researches. As tabularized above, TCNN Inception recorded a good performance compared to the others in sensitivity, and the single CNN in terms of specificity. If only a small dataset is available, it is better to use transfer learning, since small-dataset performance is its strong side; if the question is about magnification, the single-task CNN is the answer.

ACKNOWLEDGMENTS

The authors acknowledge the incubation center at Dilla University for providing a computer, office space to work, and a GPU.

REFERENCES

[1] Korean Breast Cancer Society, Breast Cancer Facts & Figures 2016. Seoul: Korean Breast Cancer Society, 2016.

[2] M. Kowal, P. Filipczuk, A. Obuchowicz, J. Korbicz and R. Monczak, "Computer-aided diagnosis of breast cancer based on fine needle biopsy microscopic images", Computers in Biology and Medicine, vol. 43, no. 10, pp. 1563-1572, 2013.

[3] Y. Zhang, B. Zhang, F. Coenen, J. Xiao and W. Lu, "One-class kernel subspace ensemble for medical image classification", EURASIP Journal on Advances in Signal Processing, vol. 2014, no. 1, 2014.

[4] P. Wang, X. Hu, Y. Li, Q. Liu and X. Zhu, "Automatic cell nuclei segmentation and classification of breast cancer histopathology images", Signal Processing, vol. 122, pp. 1-13, 2016.

[5] F. Spanhol, L. Oliveira, C. Petitjean and L. Heutte, "A Dataset for Breast Cancer Histopathological Image Classification", IEEE Transactions on Biomedical Engineering, vol. 63, no. 7, pp. 1455-1462, 2016.

[6] F. Spanhol, L. Oliveira, C. Petitjean and L. Heutte, "Breast cancer histopathological image classification using convolutional neural networks", International Joint Conference on Neural Networks (IJCNN), pp. 2560-2567, 2016.

[7] N. Bayramoglu, J. Kannala and J. Heikkilä, "Deep learning for magnification independent breast cancer histopathology image classification", 23rd International Conference on Pattern Recognition, vol. 1, December 2016.

[8] B. Wei, Z. Han, X. He and Y. Yin, "Deep learning model based breast cancer histopathological image classification", 2017 IEEE 2nd International Conference on Cloud Computing and Big Data Analysis (ICCCBDA), pp. 348-353, 2017.

[9] H. Chen, Q. Dou, X. Wang, J. Qin and P. A. Heng, "Mitosis detection in breast cancer histology images via deep cascaded networks", Thirtieth AAAI Conference on Artificial Intelligence, pp. 1160-1166, 2016.

[10] A. Esteva, B. Kuprel, R. Novoa, J. Ko, S. Swetter, H. Blau and S. Thrun, "Dermatologist-level classification of skin cancer with deep neural networks", Nature, vol. 542, no. 7639, pp. 115-118, 2017.

[11] Y. Lecun, L. Bottou, Y. Bengio and P. Haffner, "Gradient-based learning applied to document recognition", Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.

[12] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens and Z. Wojna, "Rethinking the Inception architecture for computer vision", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818-2826, 2016.

[13] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition", arXiv preprint arXiv:1409.1556, 2014.

[15] J. de Matos, A. de Souza Britto, L. E. S. de Oliveira and A. L. Koerich, "Texture CNN for Histopathological Image Classification", 2019 IEEE 32nd International Symposium on Computer-Based Medical Systems (CBMS), Cordoba, Spain, 2019, pp. 580-583. doi: 10.1109/CBMS.2019.00120.

[16] P. Sabol, P. Sinčák, K. Ogawa and P. Hartono, "Explainable Classifier Supporting Decision-making for Breast Cancer Diagnosis from Histopathological Images", 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 2019, pp. 1-8. doi: 10.1109/IJCNN.2019.8852070.

[17] S. Angara, M. Robinson and P. Guillén-Rondon, "Convolutional Neural Networks for Breast Cancer Histopathological Image Classification", 2018 4th International Conference on Big Data and Information Analytics (BigDIA), Houston, TX, USA, 2018, pp. 1-6. doi: 10.1109/BigDIA.2018.8632800.

[18] N. Bayramoglu, J. Kannala and J. Heikkilä, "Deep learning for magnification independent breast cancer histopathology image classification", 2016 23rd International Conference on Pattern Recognition (ICPR), Cancun, 2016, pp. 2440-2445. doi: 10.1109/ICPR.2016.7900002.
DECLARATION

This is to declare that the project entitled "Early cancer detection and diagnosis" is an original work done by us, the undersigned students in the Department of Computer Science, School of Computing and Informatics, College of Engineering and Technology, Dilla University. This report is based on project work done entirely by us and has not been copied from any other source.

Advisor

(MSc.) Kedir

_____________________

Department of Computer Science

Date

Name ID Signature

1. Natnael Abebe …………………. RCS/090/16 ________________


2. Mekuanint Dires…………...…… RCS/095/15 ________________
3. Eyuel Lemma………….….……. RCS/046/16 ________________
4. Ephrata Abera………………. ….. RCS/059/15 ________________
5. Nigus Redae ……….…………... RCS/092/16 _________________
