Ijet V7i2 8 10557
8) (2018) 684-687
Research Paper
Abstract
Heart-related diseases, or cardiovascular diseases (CVDs), have been the main cause of a large number of deaths worldwide over the last few
decades and have emerged as the most life-threatening diseases, not only in India but across the whole world. There is therefore a need for a
reliable, accurate and feasible system to diagnose such diseases in time for proper treatment. Machine learning algorithms and techniques have
been applied to various medical datasets to automate the analysis of large and complex data. In recent times, many researchers have
used machine learning techniques to help the health care industry and its professionals in the diagnosis of heart-related
diseases. This paper presents a survey of various models based on such algorithms and techniques and analyzes their performance. Models
based on supervised learning algorithms such as Support Vector Machines (SVM), K-Nearest Neighbour (KNN), Naïve Bayes, Decision
Trees (DT), Random Forest (RF) and ensemble models are found to be very popular among researchers.
Keywords: Cardiovascular Diseases; Support Vector Machines; K-Nearest Neighbour; Naïve Bayes; Decision Tree; Random Forest; Ensemble Models.
Copyright © 2018 Authors. This is an open access article distributed under the Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
International Journal of Engineering & Technology 685
In [7], Naïve Bayes achieved an accuracy of 84.1584% with the 10 most significant features, selected using SVM-RFE (Recursive Feature Elimination) and gain ratio algorithms, whereas in [8] Naïve Bayes achieved an accuracy of 83.49% when all 13 attributes of the Cleveland dataset [25] were used.

D. Decision Tree
Decision tree is a type of supervised learning algorithm. This technique is mostly used in classification problems. It works with both continuous and categorical attributes. The algorithm divides the population into two or more similar sets based on the most significant predictors. The decision tree algorithm first calculates the entropy of each attribute. The dataset is then split on the variable or predictor with maximum information gain, that is, the one that leaves minimum entropy after the split. These two steps are performed recursively with the remaining attributes.
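
The entropy and information-gain step described for decision trees can be illustrated with a minimal sketch. It is not taken from any of the surveyed papers: the function names, the attribute names (`chest_pain`, `smoker`) and the four toy records are hypothetical, and only categorical attributes are handled (a continuous attribute would first need a threshold).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on one categorical attribute."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Attribute with maximum information gain (minimum remaining entropy)."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy records, for illustration only.
rows = [
    {"chest_pain": "yes", "smoker": "yes"},
    {"chest_pain": "yes", "smoker": "no"},
    {"chest_pain": "no", "smoker": "yes"},
    {"chest_pain": "no", "smoker": "no"},
]
labels = ["disease", "disease", "healthy", "healthy"]

best = best_attribute(rows, labels, ["chest_pain", "smoker"])
```

In this toy data `chest_pain` separates the two classes perfectly, so it has the larger gain and is chosen for the first split. A full tree would repeat the same selection on each resulting subset with the remaining attributes.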
E. Random Forest
Random Forest is also a popular supervised machine learning algorithm. This technique can be used for both regression and classification tasks but generally performs better in classification tasks. As the name suggests, the Random Forest technique considers multiple decision trees before giving an output, so it is essentially an ensemble of decision trees. The technique is based on the belief that a larger number of trees would converge to the right decision. For classification, it uses a voting system to decide the class, whereas in regression it takes the mean of the outputs of all the decision trees. It works well with large datasets with high dimensionality.

Fig. 3: Random Forest

In [5], random forest performs exceptionally well. On the Cleveland dataset, random forest has a significantly higher accuracy of 91.6% than all the other methods. On the People's Hospital dataset, it achieves an accuracy of 97%. In [20], random forest achieved an F-measure of 0.86. In [21], random forest is used to predict coronary heart disease and obtains an accuracy of 97.7%.

F. Ensemble Model
In ensemble modeling, two or more related but different analytical models are used and their results are combined into a single score.
Tahira Mahboob et al. [22] used an ensemble of SVM, KNN and ANN to achieve an accuracy of 94.12%. The majority vote-based model demonstrated by Saba Bashir et al. [23], which comprises Naïve Bayes, Decision Tree and Support Vector Machine classifiers, gave an accuracy of 82%, sensitivity of 74% and specificity of 93% on the UCI heart disease dataset. In [24], an ensemble model consisting of Gini Index, SVM and Naïve Bayes classifiers was proposed, which gave an accuracy of 98% in predicting syncope disease.

4. Conclusion
Based on the above review, it can be concluded that there is huge scope for machine learning algorithms in predicting cardiovascular or heart-related diseases. Each of the above-mentioned algorithms has performed extremely well in some cases but poorly in others. Alternating decision trees, when used with PCA, have performed extremely well, but decision trees have performed very poorly in some other cases, which could be due to overfitting. Random Forest and ensemble models have performed very well because they mitigate the problem of overfitting by employing multiple algorithms (multiple decision trees in the case of Random Forest). Models based on the Naïve Bayes classifier were computationally very fast and have also performed well. SVM performed extremely well in most cases. Systems based on machine learning algorithms and techniques have been very accurate in predicting heart-related diseases, but there is still a lot of scope for research on how to handle high-dimensional data and overfitting. A lot of research can also be done on the correct ensemble of algorithms to use for a particular type of data.

5. Acknowledgment
We sincerely thank the staff of SRM Institute of Science and Technology, who have provided their immense support and guidance throughout the project.

References
[1] Ramadoss and Shah B et al. “A. Responding to the threat of chronic diseases in India”. Lancet. 2005; 366:1744–1749. doi: 10.1016/S0140-6736(05)67343-6.
[2] Global Atlas on Cardiovascular Disease Prevention and Control. Geneva, Switzerland: World Health Organization, 2011.
[3] Dhomse Kanchan B and Mahale Kishor M. et al. “Study of Machine Learning Algorithms for Special Disease Prediction using Principal of Component Analysis”, 2016 International Conference on Global Trends in Signal Processing, Information Computing and Communication.
[4] R. Kavitha and E. Kannan et al. “An Efficient Framework for Heart Disease Classification using Feature Extraction and Feature Selection Technique in Data Mining”, 2016.
[5] Shan Xu, Tiangang Zhu, Zhen Zang, Daoxian Wang, Junfeng Hu and Xiaohui Duan et al. “Cardiovascular Risk Prediction Method Based on CFS Subset Evaluation and Random Forest Classification Framework”, 2017 IEEE 2nd International Conference on Big Data Analysis.
[6] Manpreet Singh, Levi Monteiro Martins, Patrick Joanis and Vijay K. Mago et al. “Building a Cardiovascular Disease Predictive Model using Structural Equation Model & Fuzzy Cognitive Map”, 978-1-5090-0626-7/16/$31.00 © 2016 IEEE.
[7] Kanika Pahwa and Ravinder Kumar et al. “Prediction of Heart Disease Using Hybrid Technique For Selecting Features”, 2017 4th IEEE Uttar Pradesh Section International Conference on Electrical, Computer and Electronics (UPCON).
[8] Seyedamin Pouriyeh, Sara Vahid, Giovanna Sannino, Giuseppe De Pietro, Hamid Arabnia, Juan Gutierrez et al. “A Comprehensive Investigation and Comparison of Machine Learning Techniques in the Domain of Heart Disease”, 22nd IEEE Symposium on Computers and Communication (ISCC 2017): Workshops - ICTS4eHealth 2017.
[9] Hanen Bouali and Jalel Akaichi et al. “Comparative study of Different classification techniques, heart Diseases use Case.”, 2014 13th International Conference on Machine Learning and Applications.
[10] Seyedamin Pouriyeh, Sara Vahid, Giovanna Sannino, Giuseppe De Pietro, Hamid Arabnia, Juan Gutierrez et al. “A Comprehensive Investigation and Comparison of Machine Learning Techniques in the Domain of Heart Disease”, 22nd IEEE Symposium on Computers and Communication (ISCC 2017): Workshops - ICTS4eHealth 2017.
[11] Houda Mezrigui, Foued Theljani and Kaouther Laabidi et al. “Decision Support System for Medical Diagnosis Using a Kernel-Based Approach”, ICCAD’17, Hammamet - Tunisia, January 19-21, 2017.
[12] Dr.(Mrs).D.Pugazhenthi, Quaid-E-Millath and Meenakshi et al. “Detection Of Ischemic Heart Diseases From Medical Images”, 2016 International Conference on Micro-Electronics and Telecommunication Engineering.
[13] J. Hodges et al. “Discriminatory analysis, nonparametric discrimination: Consistency properties,” 1981.
[14] S. Rajathi and Dr. G. Radhamani et al. “Prediction and Analysis of Rheumatic Heart Disease using kNN Classification with ACO”, 2016.
[15] Puneet Bansal and Ridhi Saini et al. “Classification of heart diseases from ECG signals using wavelet transform and kNN classifier”, International Conference on Computing, Communication and Automation (ICCCA2015).
[16] Simge EKIZ and Pakize Erdogmus et al. “Comparitive Study of heart Disease Classification”, 978-1-5386-0440-3/17/$31.00 © 2017 IEEE.
[17] Renu Chauhan, Pinki Bajaj, Kavita Choudhary and Yogita Gigras et al. “Framework to Predict Health Diseases Using Attribute