Mining Big Data: Breast Cancer Prediction Using DT - SVM Hybrid Model
ISSN: 2395-3470
www.ijseas.com
International Journal of Scientific Engineering and Applied Science (IJSEAS) - Volume-1, Issue-5, August 2015
Vikas Chaurasia et al. used RepTree, RBF Network and Simple Logistic to predict the survivability of breast cancer patients.

The design is the structure of any scientific work. It gives direction to the research and systematizes it. Most scientists are interested in getting reliable observations that can help the
[Figure: block diagram with stages labelled Transformation, Support & Confidence, Decision and Result]
SVM is also used as a classifier for health care diagnosis. Interactive Dichotomiser 3 (ID3) and C4.5 are two very popular DT algorithms proposed by Quinlan [20]. ID3 uses entropy and information gain to construct a decision tree.

Information Gain: The information gain is based on the decrease in entropy after a dataset is split on an attribute. Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e., the most homogeneous branches).
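The ID3 attribute-selection rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this study: it assumes categorical attribute values and uses the standard definitions H(S) = -Σ p_i log2 p_i and Gain(S, A) = H(S) - Σ_v (|S_v|/|S|) H(S_v). The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H(S) = -sum_i p_i * log2(p_i), over the class proportions in S."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Gain(S, A): entropy of S minus the weighted entropy after splitting on attr."""
    n = len(labels)
    partitions = {}  # attribute value -> class labels of the rows with that value
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    remainder = sum(len(part) / n * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder

def best_attribute(rows, labels):
    """ID3's greedy choice: the attribute index with the highest information gain."""
    return max(range(len(rows[0])), key=lambda a: information_gain(rows, labels, a))

# Hypothetical toy data: each row holds two categorical attribute values.
rows = [("high", "yes"), ("high", "no"), ("low", "yes"), ("low", "no")]
labels = ["malignant", "malignant", "benign", "benign"]
print(best_attribute(rows, labels))  # prints 0: attribute 0 splits the classes perfectly
```

Because attribute 0 produces pure branches, its gain equals the full 1 bit of entropy in the labels, while attribute 1 leaves each branch as mixed as before and gains nothing. A full ID3 tree would repeat this choice recursively on each branch.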
can predict the presence or absence of breast cancer. The path from the root to a leaf node shows if the patient has these combinations of cell values is high, and then

[Figure: classifier accuracy chart (y-axis: Accuracy in %)]
Method         Accuracy   Error rate (%)   Correctly classified instances   Incorrectly classified instances
DT+SVM         91%        2.58             459                              240
IBL            85.23%     12.63            184                              515
SMO            72.56%     5.96             325                              374
Naïve Bayes    89.48%     9.89             291                              408
The performance of the chosen classifiers (IBL, SMO and Naïve Bayes) was evaluated with the Weka tool and validated on the basis of error rate and accuracy. The classification accuracy is predicted in terms of sensitivity and specificity.

[Figure 4.12: Error Rate of Classification Methods (y-axis: Error Rate in %)]
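Sensitivity and specificity, like accuracy and error rate, are derived from a classifier's confusion matrix. The sketch below is a minimal illustration assuming a binary problem with malignant as the positive class; the function name and the counts in the example are hypothetical and are not results from this study. Error rate is taken here as the fraction of misclassified instances, which is an assumption about how that column is defined.

```python
def classification_metrics(tp, fn, fp, tn):
    """Confusion-matrix metrics for a binary classifier.

    tp/fn: positive (e.g. malignant) cases predicted positive / negative.
    fp/tn: negative (e.g. benign) cases predicted positive / negative.
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,      # fraction classified correctly
        "error_rate": (fp + fn) / total,    # fraction misclassified (assumed definition)
        "sensitivity": tp / (tp + fn),      # true-positive rate
        "specificity": tn / (tn + fp),      # true-negative rate
    }

# Hypothetical counts for illustration only.
print(classification_metrics(tp=40, fn=10, fp=5, tn=45))
# {'accuracy': 0.85, 'error_rate': 0.15, 'sensitivity': 0.8, 'specificity': 0.9}
```

Sensitivity measures how many malignant cases are caught, and specificity how many benign cases are correctly cleared; a classifier can have high accuracy while doing poorly on one of them if the classes are imbalanced.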
From the above figures and table, we find that the DT-SVM classification model has the highest accuracy (91%), the lowest error rate (2.58%), the most correctly classified instances (459) and the fewest incorrectly classified instances (240) on the breast cancer data, as shown in Figures 4.11, 4.12, 4.13 and 4.14.

[Figure 4.13: Correctly Classified Instance of Classification Methods (y-axis: Number of Instance)]
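The comparison above amounts to picking the best classifier on each metric, preferring higher values for accuracy and correctly classified instances and lower values for error rate and incorrectly classified instances. A small sketch using the values reported in the results table (the helper name `best_by` is hypothetical):

```python
# Values transcribed from the results table: accuracy (%), error rate (%),
# and counts of correctly / incorrectly classified instances.
results = {
    "DT+SVM": {"accuracy": 91.00, "error_rate": 2.58, "correct": 459, "incorrect": 240},
    "IBL": {"accuracy": 85.23, "error_rate": 12.63, "correct": 184, "incorrect": 515},
    "SMO": {"accuracy": 72.56, "error_rate": 5.96, "correct": 325, "incorrect": 374},
    "Naive Bayes": {"accuracy": 89.48, "error_rate": 9.89, "correct": 291, "incorrect": 408},
}

def best_by(metric, higher_is_better=True):
    """Return the name of the classifier that scores best on one metric."""
    choose = max if higher_is_better else min
    return choose(results, key=lambda name: results[name][metric])

# Higher is better for accuracy and correct counts; lower is better otherwise.
for metric, higher in [("accuracy", True), ("error_rate", False),
                       ("correct", True), ("incorrect", False)]:
    print(metric, best_by(metric, higher))  # prints DT+SVM for every metric
```

DT+SVM is selected on all four metrics, which is the basis for preferring it in the conclusion.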
[Figure 4.14: Incorrectly Classified Instance of Classification Methods (y-axis: Number of Instance)]

5 CONCLUSION

The experimental results given in Figure 4.10 and Table 4.1 show the effectiveness of the proposed algorithm. Overall, the classification accuracy of DT-SVM is better than that of the other classifier algorithms. Moreover, given its relatively low error rate, the results suggest that DT-SVM is the best of the compared models for prognosis in clinical practice.

The optimum breast cancer predictive model obtained in this study adopts the DT-SVM classification algorithm. This research may provide a reference for future research on selecting optimal predictive models to lower the incidence of breast cancer.
[6] Breast Cancer Wisconsin Data [online]. Available: http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data

[7] H. Brenner, Long-term survival rates of cancer patients achieved by the end of the 20th century: a period analysis. Lancet, 360:1131-1135, 2002.

[8] D. Delen, G. Walker and A. Kadam, Predicting breast cancer survivability: a comparison of three data mining methods. Artificial Intelligence in Medicine, 34:113-127, 2005.

[9] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed. San Francisco: Morgan Kaufmann, 2005.

[10] J. Han and M. Kamber, Data Mining: Concepts and Techniques (The Morgan Kaufmann Series in Data Management Systems), 2nd ed. San Mateo, CA: Morgan Kaufmann, 2006.

[11] J. R. Quinlan, C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann, 1993.

[12] T. M. Mitchell, Machine Learning. McGraw-Hill Science/Engineering/Math, 1997.

[13] P.-N. Tan, M. Steinbach and V. Kumar, Introduction to Data Mining. Reading, MA: Addison-Wesley, 2005.

[14] A. R. Razavi, H. Gill, H. Ahlfeldt and N. Shahsavar, Predicting metastasis in breast cancer: comparing a decision tree with domain experts. J. Med. Syst., 31:263-273, 2007.

[15] S. B. Kotsiantis and P. E. Pintelas, Combining Bagging and Boosting. International Journal of Information and Mathematical Sciences, 1:4, 2005.

[16] V. N. Vapnik, The Nature of Statistical Learning Theory. Berlin: Springer, 1995.

[17] Weka: Data Mining Software in Java. http://www.cs.waikato.ac.nz/ml/weka/

[18] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed. Morgan Kaufmann, 2005.

[19] Y. Rejani, Early detection of breast cancer using SVM. arXiv, 2009.

[20] I. Maglogiannis and E. Zafiropoulos, An intelligent system for automated breast cancer diagnosis and prognosis using SVM based classifiers. Applied Intelligence, Springer, 2009.

[21] Q. Zhang, S. Wang and Q. Guo, A Novel SVM and Its Application to Breast Cancer Diagnosis. http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4272649

[22] J. R. Quinlan, C4.5: Programs for Machine Learning, 1st ed. San Francisco: Morgan Kaufmann, 1993.

[23] P. C. Pendharkar, J. A. Rodger, G. J. Yaverbaum, N. Herman and M. Benner, Association, statistical, mathematical and neural approaches for mining breast cancer patterns. Expert Systems with Applications, 17:223-232, 1999.

[24] Z. H. Zhou and Y. Jiang, Medical diagnosis with C4.5 rule preceded by artificial neural network ensemble. IEEE Trans. Inf. Technol. Biomed., 7:37-42, 2003.

[25] M. Lundin, J. Lundin, H. B. Burke, S. Toikkanen, L. Pylkkanen et al., Artificial neural networks applied to survival prediction in breast cancer. Oncology, 57:281-286, 1999.

[26] S. Kharya, Using data mining techniques for diagnosis and prognosis of cancer disease. International Journal of Computer Science, Engineering and Information