Arc Model
Abstract—Early diagnosis of breast cancer makes the disease considerably easier to treat, so it is necessary to develop techniques that help physicians reach an accurate diagnosis. This study proposes a hybrid classification algorithm based on the Genetic Algorithm (GA) and the k Nearest Neighbor algorithm (kNN). The GA serves its primary purpose as an optimization technique: it selects the best features and optimizes the value of k for kNN, while kNN performs the classification. The proposed algorithm is tested on two datasets from the UCI Repository of Machine Learning Databases: the Wisconsin Breast Cancer Database (WBCD) and the Wisconsin Diagnosis Breast Cancer (WDBC) dataset, which differ in the number of attributes and the number of instances. The proposed algorithm was compared with different classifier algorithms on the same datasets. In this evaluation, the proposed algorithm achieved 99% accuracy.

Keywords— Breast Cancer Diagnosis; Classification algorithm; Genetic algorithm; k Nearest Neighbor algorithm.

I. INTRODUCTION

Breast cancer (BC) is one of the major health concerns today and one of the leading causes of death among women, as it is a highly prevalent cancer type, second only to lung cancer [1]. According to the report of the International Agency for Research on Cancer (IARC), the global burden of cancer rose to 14.1 million new cases and 8.2 million deaths in 2012 [2]. Early detection of breast cancer helps to reduce mortality and improve patients' quality of life. Early detection therefore requires an accurate and reliable diagnosis that can differentiate benign from malignant tumours. Fortunately, rapidly developing diagnostic techniques that make diagnosis more accurate and treatment more effective have contributed to a significant reduction in the burden of the disease. Data mining is a tool for extracting information from large amounts of data, and it is now widely used in the health care industry [3]. The data mining techniques that have been developed include the artificial neural network (ANN), the most common, along with support vector machine (SVM), adaptive neuro-fuzzy inference system (ANFIS), k nearest neighbor (kNN), decision tree (DT), case-based reasoning (CBR) and rough set theory (RST).

The most common method for breast cancer detection is mammography. Because radiologists vary in their interpretation of mammography images, other methods have come into use. Fine Needle Aspiration Cytology (FNAC) is another method used for this purpose.

The aim of this study is to help physicians diagnose BC early. The disease is mainly diagnosed after most of its symptoms have appeared, even though the majority of women with early-stage BC are asymptomatic.

II. RELATED WORK

A lot of research has been done on breast cancer diagnosis using the WBCD and WDBC datasets, and many techniques and methods are constantly being developed to achieve accurate and efficient diagnosis results. In [4], a new system was proposed for breast cancer classification that uses a hybrid of K-means and Support Vector Machine (SVM). BC diagnosis based on a kNN algorithm with different distances (Euclidean distance and Manhattan distance) was proposed in [5]. Breast cancer detection with a reduced feature set was discussed in [6] using various data mining techniques, namely artificial neural network (ANN), kNN, radial basis function neural network (RBFNN) and SVM; these are used for diagnosis with feature reduction by Independent Component Analysis (ICA), which reduces the feature set to a one-dimensional feature vector by computing an independent component (IC). In [7], a new system was proposed using kNN and Naïve Bayes with imputation techniques, which are used instead of removing the missing values from the Mammographic Mass data. The system is evaluated using different performance criteria such as accuracy, sensitivity, specificity and ROC analysis.
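
To make the distance measures and performance criteria mentioned above concrete, the following is a minimal sketch of kNN classification with Euclidean and Manhattan distance, scored by accuracy, sensitivity and specificity. It is an illustration only and not the implementation used in [5], [7] or in this study. The function names, the two-feature toy samples and the choice of "M" (malignant) as the positive class are all assumptions made for the example; no WBCD or WDBC data is used.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_X, train_y, query, k, distance):
    """Return the majority label among the k training samples nearest to query."""
    order = sorted(range(len(train_X)), key=lambda i: distance(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def evaluate(y_true, y_pred, positive="M"):
    """Accuracy, sensitivity and specificity, treating `positive` as the disease class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Hypothetical toy samples: "B" = benign, "M" = malignant (not real patient data).
train_X = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
train_y = ["B", "B", "B", "M", "M", "M"]
test_X = [(1.5, 1.5), (8.5, 8.5), (2.0, 2.0), (9.0, 9.0)]
test_y = ["B", "M", "B", "M"]

for name, dist in (("euclidean", euclidean), ("manhattan", manhattan)):
    predictions = [knn_predict(train_X, train_y, q, 3, dist) for q in test_X]
    print(name, evaluate(test_y, predictions))
```

In a hybrid scheme such as the one outlined in the abstract, the value of k and the subset of features passed to a routine like `knn_predict` would be chosen by the GA rather than fixed by hand as they are here.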
