August 2016 1474373778 26
Engineering
An Efficient Cardiovascular Disease prediction through Pheromone Based ACO with Hybrid Fuzzy logic
KEYWORDS: Data mining, ACO, mining algorithm, cardiovascular diseases (CVDs), Fuzzy logic
Dr. M. Nithya, Head, Dept of CSE, VMKVEC, Vinayaka Mission University, Tamilnadu, India
ABSTRACT World Heart Day is observed every year on 29th September with the intent of raising awareness about cardiovascular diseases. As per a World Health Organization (WHO) report, cardiovascular diseases (CVDs) would be the largest cause of death and disability in India by 2020. A recent study has revealed that changing lifestyle patterns, coupled with a lack of physical exercise, have put 74 per cent of urban Indians at risk of suffering from CVDs. Heart diseases have emerged as the number one killer of Indian women; they account for 15% of the global burden of heart disease, which kills about 15 million people every year. Currently, the key challenges faced by cardiac care in India are the public's lack of awareness of non-communicable diseases, inadequate facilities, accessibility, etc. This research work deals with improving the accuracy of heart disease prediction using Pheromone Based ACO with Hybrid Fuzzy logic. The overall objective of this research work is to design and implement an ACO based data mining algorithm for heart disease prediction. The proposed technique is designed and implemented in MATLAB. A comparison with some well-known techniques has also been drawn by considering various performance metrics. We have evaluated our new classification approach using the well-known Cleveland, Abalone, Iris and Cardiotocography datasets as input. Results indicate that the proposed method can predict cardiovascular disease with an acceptable accuracy. In addition, the extracted fuzzy rules also have significant interpretability.
I. INTRODUCTION
Data mining is the process of extracting useful information from large amounts of data. Data mining software is one of a number of analytical tools for analyzing data. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. Data mining is the search for the relationships and global patterns that exist in large databases but are hidden among vast amounts of data, such as the relationship between patient data and their medical diagnosis. This relationship represents valuable knowledge about the database and the objects in the database, if the database is a faithful mirror of the real world registered by the database.

ACO based Mining
The Knowledge Discovery in Databases (KDD) field of data mining is concerned with the development of methods, techniques and algorithms which can make sense of the available data. Knowledge Discovery in Databases is useful in finding trends, patterns and anomalies in databases, which is helpful for making accurate decisions for the future.

Recently, the Ant Colony Optimization (ACO) algorithm has been applied to the data mining field to extract rule based classifiers [8] [2]. Ant Colony Optimization (ACO) is a metaheuristic for solving hard combinatorial optimization problems [6]. The inspiring source of ACO is the pheromone trail laying and following behavior of real ants, which use pheromones as a communication medium.

In analogy to the biological example, ACO is based on indirect communication within a colony of simple agents, called (artificial) ants, mediated by (artificial) pheromone trails [7] [9]. The pheromone trails in ACO serve as distributed, numerical information, which the ants use to probabilistically construct solutions to the problem being solved and which the ants adapt during the algorithm's execution to reflect their search experience [4] [11]. In essence, the goal of data mining is to extract knowledge from data. Data mining is an inter-disciplinary field, whose core is at the intersection of machine learning, statistics and databases [7] [1]. We emphasize that in data mining, unlike for example in classical statistics, the goal is to discover knowledge that is not only accurate but also comprehensible for the user [1] [8]. Comprehensibility is important whenever discovered knowledge will be used for supporting a decision made by a human user [3] [1]. After all, if discovered knowledge is not comprehensible for the user, he/she will not be able to interpret and validate it. In this case, the user will probably not trust the discovered knowledge enough to use it for decision making. This can lead to wrong decisions [3].

Cardiovascular Disease:
Facts:
It's no longer true that only those in their 50s and 60s can have cardiovascular problems. Even those in their 30s are presenting with such problems these days.

Causes:
• lifestyle habits like junk food
• alcohol consumption
• smoking
• mental stress
• lack of physical work

Implication on Indian Health System:
(i) to provide information and an enabling environment for increasing awareness and adoption of healthy living habits by the community;
(ii) early detection of persons with risk factors and cost-effective interventions for reducing risk; and
(iii) early detection of persons with clinical disease and cost-effective secondary prevention measures to prevent complications.

II. LITERATURE REVIEW
Yumin Chen et al. [1] proposed a new rough set approach to feature selection based on Ant Colony Optimization (ACO), which adopts mutual information based feature significance as heuristic information. The paper also proposed a feature selection algorithm. This approach starts from the feature core, which reduces the complete graph to a smaller one. To verify the efficiency of this algorithm, experiments were carried out on some standard UCI datasets. The results demonstrated that this algorithm could provide an efficient solution for finding a minimal subset of the features.

Pablo Loyola et al. [2] proposed an ant colony optimization-based algorithm to predict web usage patterns. This methodology incorporated multiple data sources, such as web content and structure, as well as web usage. The proposed model is based on a continuous learning strategy based on previous usage in which artificial ants try to fit their sessions with real usage through the
IJSR - INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH 1
Volume : 5 | Issue : 8 | Special Issue August-2016 • ISSN No 2277 - 8179 Research Paper
modification of a text preference vector. Subsequently, trained ants are released onto a new web graph and the new artificial sessions are compared with real sessions previously captured via web log processing. The main results of this work are related to an effective prediction of the aggregated patterns of real usage, reaching approximately 80%. This approach also made it possible to obtain a quantitative representation of the keywords that influence the navigational sessions.

Md. Monirul Kabir et al. [3] proposed a new hybrid ant colony optimization (ACO) algorithm for feature selection (FS), called ACOFS, using a neural network. A key aspect of this algorithm is the selection of a subset of the salient features of reduced size. ACOFS uses a hybrid search technique that combines the advantages of wrapper and filter approaches. In order to facilitate such a hybrid search, the paper designed a new set of rules for pheromone update and heuristic information measurement. In addition, the ants are guided in correct directions while constructing graph (subset) paths using a bounded scheme in each and every step of the algorithm. These combinations ultimately not only provide an effective balance between exploration and exploitation of ants in the search, but also intensify the global search capability of ACO for a high quality solution in FS. Extensive experiments were conducted to ascertain how ACOFS works on FS tasks. The comparison details show that ACOFS has a remarkable ability to generate reduced-size subsets of the salient features while yielding significant classification accuracy.

Kashef et al. [4] have proposed a novel feature selection algorithm based on Ant Colony Optimization (ACO), called Advanced Binary ACO (ABACO). The performance of the proposed algorithm is compared to the performance of the Binary Genetic Algorithm (BGA), Binary Particle Swarm Optimization (BPSO), Catfish BPSO, the Improved Binary Gravitational Search Algorithm (IBGSA), and some prominent ACO-based algorithms on the task of feature selection on 12 well-known UCI datasets. Simulation results verify that the algorithm provides a suitable feature subset with good classification accuracy using a smaller feature set than competing feature selection methods. They present a new feature selection technique based on Ant Colony Optimization (ACO) by combining two models of ACO. The proposed algorithm has a strong search capability in the problem space and can effectively find the minimal feature subset.

Khalid M. Salama et al. [5] have explored the use of various classification quality measures for evaluating the BAN classifiers constructed by the ants. The aim of this investigation is to discover how the use of different evaluation measures affects the quality of the output classifier in terms of predictive accuracy. In their experiments, they use 6 different classification measures on 25 benchmark datasets. They found that the hypothesis that different measures produce different results is acceptable according to Friedman's statistical test. They explored the effect of using 6 different classification quality measures for evaluating the candidate BN classifiers constructed by the ants and updating pheromone during the training phase of ABC-Miner.

Wen Xiong et al. [6] have proposed a novel hybrid clustering approach, which uses adaptive ant colony optimization (ACO) to optimize the partition of the data set, and utilizes enhanced particle swarm optimization (PSO) to refine the result of the adaptive ACO. Experiments showed that the approach obtains smaller clustering evaluations on three data sets of University of California Irvine (UCI) and competitive results on two data sets of UCI, which verifies its effectiveness.

Mr. K. Ravikumar et al. [7] have described a model that develops an ant colony algorithm for the discovery of spatial trend patterns found in a GIS traffic risk analysis database. The proposed ant colony based spatial data mining algorithm applies the emergent intelligent behavior of ant colonies to handle the huge search space encountered in the discovery of this knowledge. A genetic algorithm is deployed to evaluate the spatial risk pattern rule sets for optimization during the search phase in quick succession. The experimental results on a geographical traffic (trend layer) spatial database show that their method has higher efficiency in the performance of the discovery process and in the quality of the trend patterns discovered, compared to other existing approaches using non-intelligent decision tree heuristics. The proposed results provide spatial decision trees for traffic risk patterns with a route structure optimized by the ant agents.

Mohammad Saniee et al. [8] have presented an ACO to extract a set of rules for diagnosis of diabetes disease. The presented algorithm, which they call FADD, uses ACO to extract fuzzy If-Then rules for diagnosis of diabetes disease. They have evaluated their new classification system on the Pima Indian Diabetes data set. Results show that FADD can detect diabetes disease with an acceptable accuracy that is competitive with or even better than the results achieved by previous works.

Yu-Min Chiang et al. [9] utilize some machine learning techniques and a meta-heuristic approach to classify cancers using microarray data. In the study, the ant colony optimization algorithm is introduced to select genes relevant to cancers, then the MLP and SVM classifiers are used for cancer classification. Experimental results show that selecting genes by using ACO can improve the accuracy of MLP and SVM classifiers. Besides, the optimal number of genes selected for cancer classification should be set according to the microarray dataset and gene selection methods. The proposed ACO gene selection algorithm improves the classification accuracy of MLP and SVM classifiers.

Shyi-Ching Lian et al. [10] have proposed an extension of Ant-Miner that incorporates the concepts of parallel processing and grouping. Due to its algorithm design, Ant-Miner makes a slight modification in this part which removes the parallel searching capability. Based on Ant-Miner, they propose an extension that modifies the algorithm design to incorporate parallel processing. For solving the classification rule problem, they design an algorithm with a multi-level rule choosing mechanism in order to induce more accurate rules. Furthermore, they provide a possible direction for research on the classification rule problem.

Surendra Kumar and C. S. P. Rao [11] proposed a novel use of data mining algorithms for the extraction of knowledge from a large set of flow shop schedules. The purpose of this work is to apply data mining methodologies to explore the patterns in data generated by an ant colony algorithm performing a scheduling operation and to develop a rule set scheduler which approximates the ant colony algorithm's scheduler. Ant colony optimization (ACO) is a paradigm for designing metaheuristic algorithms for combinatorial optimization problems. The ant algorithm is simple to implement and the results of the case studies show its ability to provide speedy and accurate solutions. Further, this research has employed genetic algorithm operators such as crossover and mutation to generate new regions of the solution.

III. PROPOSED APPROACH
ACO can be used for clustering and classification in the field of data mining. In our proposed method, ACO is used for pattern classification in which classification is based on fuzzy rules. The fuzzy rules are used to control the influence of pheromone values in ACO. We describe a technique for updating the pheromone that improves the quality of each rule: for each rule, the amount by which the pheromone increases in each iteration depends on the quality of the modifications made by the ants. With this new pheromone update function, the ants make better decisions in subsequent iterations in order to improve the quality of the rule. The implementation is done in MATLAB with various mining datasets. The overall objective of this research work is to design and implement an ACO based data mining algorithm for cardiovascular disease prediction.
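The quality-dependent pheromone update described in Section III can be illustrated with a short Python sketch. It is not the MATLAB implementation used in this work. It assumes the multiplicative reinforcement followed by normalisation that is common in Ant-Miner style algorithms, and it does not model the fuzzy control of the pheromone influence. The function name `update_pheromone`, the attribute names and the quality value 0.8 are hypothetical.

```python
def update_pheromone(pheromone, rule_terms, quality):
    """Return a new pheromone table after reinforcing the terms of one rule.

    pheromone  : dict mapping (attribute, value) -> pheromone level
    rule_terms : iterable of (attribute, value) pairs used by the rule
    quality    : rule quality in [0, 1]; a better rule deposits more pheromone
    """
    if not 0.0 <= quality <= 1.0:
        raise ValueError("quality must lie in [0, 1]")
    used = set(rule_terms)
    # Assumed update: terms used by the rule grow in proportion to their
    # current level and to the rule quality; unused terms are left as they are.
    reinforced = {
        term: level * (1.0 + quality) if term in used else level
        for term, level in pheromone.items()
    }
    # Normalising lowers the share of unused terms, which plays the role of
    # pheromone evaporation.
    total = sum(reinforced.values())
    return {term: level / total for term, level in reinforced.items()}


# Hypothetical example: two attributes with two values each, uniform start.
terms = [("chest_pain", "typical"), ("chest_pain", "atypical"),
         ("exercise_angina", "yes"), ("exercise_angina", "no")]
pheromone = {term: 1.0 / len(terms) for term in terms}
rule = [("chest_pain", "typical"), ("exercise_angina", "yes")]
pheromone = update_pheromone(pheromone, rule, quality=0.8)
```

With these inputs the two terms used by the rule end up with a larger share of the pheromone than the two unused terms, so ants in the next iteration are more likely to select them.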
IV. RESEARCH METHODOLOGY
Initially, the input medical datasets (Cleveland, Abalone, Iris and Cardiotocography) are given to our proposed work. Based on these datasets we generate the fuzzy rules. Plain fuzzy rule generation repeatedly produces the same rules. To overcome this problem, we introduce the Ant Colony Optimization (ACO) algorithm, which we use to generate the optimized rules. These rules are given to the fuzzy system, which performs the fuzzification process; the rules are then generated and finally the defuzzification process is performed. Based on this process we take the decision about the disease, which gives an accurate decision for the prediction model.

Block Diagram for Heart Disease Prediction

A number of studies combine fuzzy techniques with other methods to predict heart disease through effective classification. The main aim of this study is to predict heart disease, because it kills about 15 million people every year worldwide. While numerous studies combine the fuzzy technique with some other method, our proposed fuzzy classification method performs an effective classification process.

4.1. Rules Generation
In the rules generation process, fuzzy rule generation is used to generate the rules. Initially the medical datasets are given as input; the datasets are Cleveland, Abalone, Iris and Cardiotocography.

An ant adds term ij to the rule under construction with the probability given by equation (1).

\[ P_{ij}(t) = \frac{\eta_{ij}\,\tau_{ij}(t)}{\sum_{i \in I} \sum_{j} \eta_{ij}\,\tau_{ij}(t)} \qquad (1) \]

where \eta_{ij} is a problem-dependent heuristic value for term ij, \tau_{ij}(t) is the amount of pheromone currently available (at time t) on the connection between attribute i and value j, I is the set of attributes not yet used by the ant, and the inner sum runs over the values j in the domain of attribute i.

The rule pruning procedure iteratively eliminates the term whose elimination will cause a maximum increase in the quality of the rule. The quality of a rule is calculated using equation (2).

\[ Q = \frac{TruePos}{TruePos + FalseNeg} \times \frac{TrueNeg}{FalsePos + TrueNeg} \qquad (2) \]

where TruePos is the number of cases covered by the rule that have the same class as that predicted by the rule, FalsePos is the number of cases covered by the rule that have a different class from that predicted by the rule, FalseNeg is the number of cases that are not covered by the rule but have the class predicted by the rule, and TrueNeg is the number of cases that are not covered by the rule and have a different class from the class predicted by the rule.

1) Fuzzification
2) Fuzzy Inference Engine
3) Defuzzification

Fuzzy inference engine: With the help of If-Then type fuzzy rules, converts the fuzzy input into the fuzzy output.

Defuzzification: Converts the fuzzy output of the inference engine to a crisp value using membership functions equivalent to those used by the fuzzifier. In our work, crisp inputs are fuzzified for the inference system through the triangular membership function. Fuzzification is required because a degree of membership is specified for each member of a set. The fuzzy system predicts the results more precisely with the optimized membership function.

V. RESULTS AND DISCUSSION
The experimental results of the fuzzy based classifier are discussed below. The proposed system is implemented using MATLAB 2014 and the experimentation is performed on an i5 processor with 3GB RAM.
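Since the implementation itself is in MATLAB, the following Python sketch is only an illustration of the fuzzification and defuzzification stages described in Section IV. It assumes a triangular membership function, one-to-one If-Then rules (so each rule's firing strength equals the membership degree of its input set) and a weighted-average defuzzifier. The blood pressure break points, the risk levels and all function names are hypothetical and are not taken from the experiments.

```python
def triangular(x, a, b, c):
    """Membership degree of x in a triangular fuzzy set with feet a, c and peak b."""
    if not a < b < c:
        raise ValueError("expected a < b < c")
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzify(x, fuzzy_sets):
    """Map a crisp value to its membership degree in each named fuzzy set."""
    return {name: triangular(x, *abc) for name, abc in fuzzy_sets.items()}


def defuzzify(strengths, output_peaks):
    """Weighted average of the output-set peaks (a simple centroid-like defuzzifier)."""
    total = sum(strengths.values())
    if total == 0.0:
        raise ValueError("no rule fired for this input")
    return sum(strengths[name] * output_peaks[name] for name in strengths) / total


# Hypothetical fuzzy sets for resting blood pressure in mm Hg (illustrative only).
bp_sets = {"low": (80, 100, 120), "medium": (100, 130, 160), "high": (140, 170, 200)}
memberships = fuzzify(150, bp_sets)

# Hypothetical rules: IF pressure is low/medium/high THEN risk is low/medium/high.
risk_peaks = {"low": 0.1, "medium": 0.5, "high": 0.9}
risk = defuzzify(memberships, risk_peaks)
```

For the crisp input 150 the "medium" and "high" sets are each activated to degree 1/3 and "low" is not activated, so the defuzzified risk lies halfway between the medium and high peaks, at 0.7.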
The test data outcome can be positive, meaning that the person is predicted to have heart disease, or negative, meaning that the person is predicted not to have heart disease.
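The positive and negative outcomes give the four confusion counts from which the performance measures named in this section are usually derived. The Python sketch below uses the standard textbook definitions, which are assumed here rather than taken from the MATLAB code. The confusion counts are hypothetical, and the two lists simply repeat the per-dataset precision and recall values reported later in this section so that their averages can be recomputed.

```python
from statistics import mean


def classification_metrics(tp, fp, fn, tn):
    """Standard measures from confusion counts (all denominators assumed non-zero)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # recall is also called sensitivity
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


# Hypothetical confusion counts for one test set.
metrics = classification_metrics(tp=16, fp=1, fn=2, tn=11)

# Per-dataset values as reported in this section.
reported_precision = [1.0, 0.941176471, 0.869565217, 0.862068966]
reported_recall = [0.888888889, 0.888888889, 0.769230769, 0.714285714]
mean_precision = mean(reported_precision)
mean_recall = mean(reported_recall)
sum_recall = sum(reported_recall)
```

Recomputing the averages this way gives about 0.9182 for precision and 0.8153 for recall, in line with the reported averages. Under these definitions every measure lies between 0 and 1, so the reported average F-Measure of 3.261294 should be read with care: it coincides with the sum, not the mean, of the four recall values.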
Fig. 5: Graph for dataset results with the performance measures Precision, Recall, F-Measure, Specificity, Sensitivity and Accuracy

From Table II, the evaluation metrics are analyzed for the four datasets, from which we can observe the efficiency of the proposed detection system. The results for the measures Precision, Recall and F-Measure are graphically represented in Fig. 5. The precision values for the four datasets are 1, 0.941176471, 0.869565217 and 0.862068966, and the average precision value over all the datasets is 0.918203663. The recall values for the four datasets are 0.888888889, 0.888888889, 0.769230769 and 0.714285714, and their average value is 0.815323. Likewise, the average F-Measure value for the entire dataset is 3.261294. With these metrics, specificity and accuracy are the main measures for evaluating the accuracy of our proposed system, which yields a specificity of 0.641666 and an accuracy of 0.80260 on average for the datasets. The proposed system achieves high accuracy owing to its reduced error rates. From Fig. 5 we also find the minimal error rates for the four datasets.

REFERENCES
[3] Md. Monirul Kabir, Md. Shahjahan, and Kazuyuki Murase, "A new hybrid ant colony optimization algorithm for feature selection", Expert Systems with Applications, Vol. 39, No. 3, pp. 3747–3763, 2012.
[4] Shima Kashef and Hossein Nezamabadi-pour, "An advanced ACO algorithm for feature subset selection", Neurocomputing, Vol. 147, pp. 271–279, 2015.
[5] Khalid M. Salama and Alex Alves Freitas, "Investigating the impact of various classification quality measures in the predictive accuracy of ABC-Miner", in Evolutionary Computation (CEC), 2013 IEEE Congress on, pp. 2321–2328, IEEE, 2013.
[6] Wen Xiong and Cong Wang, "A novel hybrid clustering based on adaptive ACO and PSO", in Computer Science and Service System (CSSS), 2011 International Conference on, pp. 1960–1963, IEEE, 2011.
[7] K. Ravikumar and A. Gnanabaskaran, "ACO based spatial data mining for traffic risk analysis", in Innovative Computing Technologies (ICICT), 2010 International Conference on, pp. 1–6, IEEE, 2010.
[8] M. Fathi Ganji and Mohammad Saniee Abadeh, "Using fuzzy ant colony optimization for diagnosis of diabetes disease", in Electrical Engineering (ICEE), 2010 18th Iranian Conference on, pp. 501–505, IEEE, 2010.
[9] Yu-Min Chiang, Huei-Min Chiang, and Shang-Yi Lin, "The application of ant colony optimization for gene selection in microarray-based cancer classification", in Machine Learning and Cybernetics, 2008 International Conference on, Vol. 7, pp. 4001–4006, IEEE, 2008.
[10] Pei-Chiang Li, "The Application of Ant Colony Optimization to the Classification Rule Problem", 2007.
[11] Surendra Kumar and C. S. P. Rao, "Application of ant colony, genetic algorithm and data mining-based techniques for scheduling", Robotics and Computer-Integrated Manufacturing, Vol. 25, No. 6, pp. 901–908, 2009.