Arc Model
Abstract—Early diagnosis of breast cancer is important because the disease is easier to treat at an early stage. It is therefore necessary to develop techniques that help physicians reach an accurate diagnosis. This study proposes a hybrid classification algorithm based on a Genetic Algorithm (GA) and the k-Nearest Neighbor algorithm (kNN). The GA serves as an optimization technique for kNN, selecting the best features and optimizing the value of k, while kNN performs the classification. The proposed algorithm is tested on two breast cancer datasets from the UCI Repository of Machine Learning Databases, which differ in their number of attributes and number of instances: the Wisconsin Breast Cancer Database (WBCD) and the Wisconsin Diagnostic Breast Cancer (WDBC) dataset. The proposed algorithm was compared against different classifier algorithms on the same datasets, and the evaluation results show that it achieved 99% accuracy.

Keywords— Breast Cancer Diagnosis; Classification Algorithm; Genetic Algorithm; k-Nearest Neighbor Algorithm.

I. INTRODUCTION

Breast cancer (BC) is one of the major health concerns today and one of the leading causes of death among women, as it is the most prevalent cancer type after lung cancer [1]. According to the report of the International Agency for Research on Cancer (IARC), the global burden of cancer rose to 14.1 million new cases and 8.2 million deaths in 2012 [2]. Early detection of breast cancer helps reduce mortality and improve patients' quality of life, but it requires an accurate and reliable diagnosis that can differentiate benign from malignant tumours. Fortunately, rapidly developing diagnostic techniques, which make diagnosis more accurate and treatment more effective, have contributed to a significant reduction in the burden of the disease. Data mining, a tool for extracting information from huge amounts of data, is now widely used in the health care industry [3]. The data mining techniques developed for this purpose include the artificial neural network (ANN), the most common, along with the support vector machine (SVM), adaptive neuro-fuzzy inference system (ANFIS), k-nearest neighbor (kNN), decision tree (DT), case-based reasoning (CBR), and rough set theory (RST).

The most common method for breast cancer detection is mammography. However, radiologists' interpretations of mammographic images vary, which has led to the use of other methods. Fine Needle Aspiration Cytology (FNAC) is one such method.

The aim of this study is to help physicians diagnose BC early. This matters because BC is mainly diagnosed only after most of its symptoms have appeared, even though the majority of women with early-stage BC are asymptomatic.

II. RELATED WORK

A great deal of research has been done on breast cancer diagnosis using the WBCD and WDBC datasets, and many techniques and methods are constantly being developed to achieve accurate and efficient diagnosis. In [4], a new system for breast cancer classification was proposed that uses a hybrid of K-means and SVM. BC diagnosis based on a kNN algorithm with different distance measures (Euclidean distance and Manhattan distance) was proposed in [5]. In [6], breast cancer detection with a reduced feature set was studied using various data mining techniques, namely ANN, kNN, radial basis function neural network (RBFNN), and SVM. These techniques were combined with Independent Component Analysis (ICA) for feature reduction, which reduces the feature vector to a one-dimensional feature vector computed as an independent component (IC). In [7], a new system was proposed that combines kNN and Naïve Bayes with imputation techniques. These techniques fill in the missing values in the Mammographic Mass data instead of removing them. The system was evaluated using performance criteria such as accuracy, sensitivity, specificity, and ROC analysis.
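The hybrid summarized in the abstract uses a GA to choose a feature subset and the value of k, and uses kNN as the classifier. The sketch below is a minimal illustration of that idea under stated assumptions, not the authors' implementation. The following are all hypothetical choices: the chromosome encoding (one bit per feature plus an odd k), leave-one-out accuracy as the fitness function, tournament selection, uniform crossover, the mutation rate, and every function name (`genetic_knn`, `loo_accuracy`, `make_toy_data`, and so on). The data are synthetic rather than WBCD or WDBC. The `dist` parameter lets the Euclidean distance be swapped for the Manhattan distance, the two measures compared in [5].

```python
import math
import random
from collections import Counter


def euclidean(a, b, mask):
    """Euclidean distance computed only over the features selected by mask."""
    return math.sqrt(sum((x - y) ** 2 for x, y, m in zip(a, b, mask) if m))


def manhattan(a, b, mask):
    """Manhattan (city-block) distance over the selected features."""
    return sum(abs(x - y) for x, y, m in zip(a, b, mask) if m)


def knn_predict(train_X, train_y, query, k, mask, dist=euclidean):
    """Majority vote among the k training points nearest to query."""
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query, mask))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, k, mask, dist=euclidean):
    """Leave-one-out accuracy of kNN using only the features in mask (the GA fitness)."""
    if not any(mask):
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        rest_X, rest_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        correct += knn_predict(rest_X, rest_y, X[i], k, mask, dist) == y[i]
    return correct / len(X)


def genetic_knn(X, y, k_max=15, pop_size=20, generations=30,
                p_mut=0.1, dist=euclidean, seed=0):
    """Search jointly for a feature mask and an odd k that maximize LOO accuracy.

    Chromosome = (list of per-feature bits, k). Hypothetical GA settings:
    tournament selection of size 3, uniform crossover, bit-flip mutation,
    and elitism that copies the two best individuals unchanged.
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    k_values = list(range(1, k_max + 1, 2))  # odd k avoids ties in two-class votes
    cache = {}

    def score(ind):
        key = (tuple(ind[0]), ind[1])
        if key not in cache:
            cache[key] = loo_accuracy(X, y, ind[1], ind[0], dist)
        return cache[key]

    def tournament(population):
        return max(rng.sample(population, 3), key=score)

    population = [([rng.random() < 0.5 for _ in range(n_features)], rng.choice(k_values))
                  for _ in range(pop_size)]
    for _ in range(generations):
        next_pop = sorted(population, key=score, reverse=True)[:2]  # elitism
        while len(next_pop) < pop_size:
            (mask_a, k_a), (mask_b, k_b) = tournament(population), tournament(population)
            child = [a if rng.random() < 0.5 else b for a, b in zip(mask_a, mask_b)]
            child = [not bit if rng.random() < p_mut else bit for bit in child]
            child_k = rng.choice(k_values) if rng.random() < p_mut else rng.choice([k_a, k_b])
            next_pop.append((child, child_k))
        population = next_pop
    best = max(population, key=score)
    return best[0], best[1], score(best)


def make_toy_data(n=60, seed=1):
    """Synthetic two-class data: features 0-1 are informative, 2-3 are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        informative = [rng.gauss(3.0 * label, 0.5), rng.gauss(-3.0 * label, 0.5)]
        noise = [rng.gauss(0.0, 5.0), rng.gauss(0.0, 5.0)]
        X.append(informative + noise)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    mask, k, acc = genetic_knn(X, y)
    print("selected features:", [i for i, bit in enumerate(mask) if bit],
          "k =", k, "LOO accuracy =", acc)
```

Leave-one-out accuracy is used as the fitness here only because it needs no separate validation split on a small sample. A k-fold estimate, or a fitness that also penalizes the number of selected features, would be equally reasonable assumptions. On the synthetic data, the GA is expected to drop the two noise features, because they dominate the distance and degrade the kNN vote.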
This research is supported by research grant RG312-14AFR from University
of Malaya, Malaysia.
[2] D. M. Parkin, P. Pisani, and J. Ferlay, "Global Cancer Statistics," CA: A Cancer Journal for Clinicians, vol. 49, no. 1, pp. 33-64, 1999.
[7] C. Güzel, M. Kaya, O. Yıldız, and H. Ş. Bilge, "Breast Cancer Diagnosis Based on Naïve Bayes Machine Learning Classifier with KNN Missing Data Imputation."
[8] L. Xiaoyong and H. Fu, "PSO-Based Support Vector Machine with Cuckoo Search Technique for Clinical Disease Diagnoses," The Scientific World Journal, Hindawi Publishing Corporation, vol. 2014, Article ID 548483, pp. 1-7.
[9] S. Soliman and E. ElHamd, "Classification of Breast Cancer using Differential Evolution and Least Squares Support Vector Machine," International Journal of Emerging Trends & Technology in Computer Science (IJETTCS), vol. 3, no. 2, March-April 2014.
[11] M. Akay, "Support Vector Machines Combined with Feature Selection for Breast Cancer Diagnosis," Expert Systems with Applications, vol. 36, pp. 3240-3247, 2009.
[19] K. Shaker and S. Abdullah, "Controlling Multi Algorithms Using Round Robin for University Course Timetabling Problem," in Database Theory and Application, Bio-Science and Bio-Technology (DTA-2010), Lecture Notes in Computer Sciences, Springer, 2010, pp. 47-55.
[20] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2000.
[21] P. Cabena, P. Hadjinian, R. Stadler, J. Verhees, and A. Zanasi, Discovering Data Mining: From Concept to Implementation, Upper Saddle River, N., 1998.