Kauser Ahmed P
VIT University
Abstract
The adoption of information technology has led to the creation of several applications in health care informatics, which generates large amounts of data. These data can be processed with data mining techniques to predict diseases. Data mining is the process of analyzing and extracting data and presenting it as knowledge that captures the relationships within the available data. Data mining techniques include association, clustering, classification and prediction. In this paper, various data mining tools are compared on their performance in analyzing health care data and predicting disease.
Keywords: Clustering, Classification, Data mining tools, Disease prediction, Health care.
INTRODUCTION
Significant advances in information technology have resulted in an excessive growth of data in health care informatics [1]. Health care informatics data includes hospital details, patient details, disease details and treatment costs. These large volumes of data are generated from different sources and in different formats, and they can contain irrelevant attributes and missing values. Applying data mining techniques is a key approach to extracting important knowledge from large disease data sets. Data mining offers various methods for this task: techniques such as classification, clustering and rule mining can be used to analyze the data and extract meaningful information. Important current applications of data mining in health care include predicting the future outcomes of diseases from data previously collected on similar diseases, diagnosing disease from patient data, analyzing treatment costs and resource demand, preprocessing noisy and missing data, and minimizing the waiting time for disease diagnosis. Data mining tools such as Weka, RapidMiner and Orange [2, 3, 4] are used to analyze health care data and obtain better predictions. New and current data mining tools and technologies are used in disease diagnosis and health care informatics to improve health care services cost-effectively and to minimize the time needed for disease diagnosis.

The organization of this paper is as follows. Section 1 describes the fundamentals of data mining. Section 2 lists various data mining techniques used in health care. Data mining tools are discussed in Section 3. Results and discussion are highlighted in Section 4. Concluding remarks are given in Section 5.

1. FUNDAMENTALS OF DATA MINING
Data mining is concerned with computationally extracting previously unknown knowledge from huge data sets. Extracting useful knowledge from these enormous data sets and providing decision-making results for the diagnosis and treatment of diseases is very important. Data mining can be used to extract knowledge by analyzing data and predicting various diseases. Health care data mining has great potential to discover hidden patterns in medical data sets. Various data mining techniques are available, and their suitability depends on the health care data at hand. Data mining applications in health care have considerable potential and effectiveness because they automate the process of finding predictive information in huge databases.

Disease prediction plays an important role in data mining. Diagnosing a disease requires performing a number of tests on the patient; data mining techniques, however, can reduce the number of tests needed. This reduced test set plays a significant role in performance and time. Health care data mining is an important task because it allows doctors to see which attributes, such as age, weight and symptoms, are most important for diagnosis. This helps doctors diagnose diseases more efficiently.

Knowledge discovery in databases (KDD) is the process of finding useful information and patterns in data. KDD can be carried out using data mining, which applies algorithms to extract the information and patterns derived by the KDD process. The stages of the KDD process are highlighted in Fig.1.
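The point that some attributes matter more for diagnosis than others can be made concrete with a ranking score. One common choice, used for example by decision-tree learners, is information gain: the drop in class entropy obtained by splitting the records on an attribute. The minimal sketch below assumes categorical attributes; the patient table, its attribute names (`age`, `weight`, `chest_pain`) and the `disease` label are hypothetical and invented only for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Class entropy minus the weighted class entropy after splitting on `attribute`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Hypothetical toy records (illustration only, not real patient data).
patients = [
    {"age": "old",   "weight": "high",   "chest_pain": "yes", "disease": "yes"},
    {"age": "old",   "weight": "normal", "chest_pain": "yes", "disease": "yes"},
    {"age": "old",   "weight": "high",   "chest_pain": "yes", "disease": "yes"},
    {"age": "young", "weight": "normal", "chest_pain": "yes", "disease": "yes"},
    {"age": "old",   "weight": "normal", "chest_pain": "no",  "disease": "no"},
    {"age": "young", "weight": "high",   "chest_pain": "no",  "disease": "no"},
    {"age": "young", "weight": "high",   "chest_pain": "no",  "disease": "no"},
    {"age": "young", "weight": "normal", "chest_pain": "no",  "disease": "no"},
]

# Rank candidate attributes from most to least informative about the diagnosis.
ranking = sorted(
    ((information_gain(patients, a, "disease"), a) for a in ("age", "weight", "chest_pain")),
    reverse=True,
)
for gain, attribute in ranking:
    print(f"{attribute}: {gain:.3f}")
```

On this toy table `chest_pain` separates the classes perfectly (gain 1.000), `age` helps a little (about 0.189) and `weight` not at all (0.000), so a reduced test set would keep the first attribute and drop the last.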
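The five KDD stages (selection, preprocessing, transformation, data mining and interpretation) can be read as a chain of functions, each consuming the previous stage's output. The sketch below is only an illustration under stated assumptions: the two hypothetical clinic sources, dropping records with missing values, min-max scaling as the transformation, and per-class attribute means standing in for the mining step are all choices made here for brevity, not the paper's procedure.

```python
def select(sources, attributes):
    """Selection: gather records from several sources, keeping only chosen attributes."""
    return [{a: rec.get(a) for a in attributes} for src in sources for rec in src]

def preprocess(records):
    """Preprocessing: drop records that contain missing (None) values."""
    return [r for r in records if all(v is not None for v in r.values())]

def transform(records, numeric):
    """Transformation: min-max scale numeric attributes to a common [0, 1] range."""
    out = [dict(r) for r in records]
    for a in numeric:
        lo = min(r[a] for r in records)
        hi = max(r[a] for r in records)
        for r in out:
            r[a] = (r[a] - lo) / (hi - lo) if hi > lo else 0.0
    return out

def mine(records, numeric, label):
    """Data mining (simple stand-in): mean of each scaled attribute per class label."""
    groups = {}
    for r in records:
        groups.setdefault(r[label], []).append(r)
    return {
        cls: {a: sum(r[a] for r in rows) / len(rows) for a in numeric}
        for cls, rows in groups.items()
    }

def interpret(model):
    """Interpretation: present the result as readable text for the end user."""
    return [
        f"{cls}: " + ", ".join(f"{a}={v:.2f}" for a, v in sorted(means.items()))
        for cls, means in sorted(model.items())
    ]

# Two hypothetical sources in slightly different shapes (illustration only).
clinic_a = [{"age": 60, "bmi": 31.0, "diagnosis": "diabetic"},
            {"age": 25, "bmi": 22.0, "diagnosis": "healthy"}]
clinic_b = [{"age": 50, "bmi": None, "diagnosis": "diabetic", "ward": 3},
            {"age": 40, "bmi": 24.0, "diagnosis": "healthy", "ward": 1},
            {"age": 55, "bmi": 29.0, "diagnosis": "diabetic", "ward": 2}]

numeric = ["age", "bmi"]
data = select([clinic_a, clinic_b], numeric + ["diagnosis"])
data = transform(preprocess(data), numeric)
report = interpret(mine(data, numeric, "diagnosis"))
print("\n".join(report))
```

Selection discards the irrelevant `ward` attribute, preprocessing removes the record with a missing BMI, and the report lists `diabetic: age=0.93, bmi=0.89` and `healthy: age=0.21, bmi=0.11`. Keeping each stage as a separate function mirrors the staged structure of KDD and lets any one stage be swapped out independently.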
Kauser Ahmed P /J. Pharm. Sci. & Res. Vol. 9(10), 2017, 1886-1888
The stages of the KDD process are as follows. In the selection stage, data are obtained from different sources. In the preprocessing stage, unwanted, missing and noisy data are removed, and the resulting clean data are converted to a common format in the transformation stage. Data mining techniques are then applied to obtain the desired output. Finally, in the interpretation stage, the result is presented to the end user in a meaningful manner.

2. DATA MINING TECHNIQUES
Data mining techniques such as classification, clustering and association rules are widely used in disease data analysis.
Classification
Classification is a machine-learning-based data mining technique that assigns each item in a data set to one of a predefined set of groups or classes. It uses mathematical techniques such as decision trees, linear programming, neural networks and statistics to classify the data into different groups. Modern classification techniques provide more intelligent methods for effective disease prediction [5]. Types of classification techniques include support vector machines, discriminant analysis, naive Bayes, decision trees, and linear and non-linear regression.
Clustering
Clustering is a data mining technique that automatically groups objects with similar characteristics into clusters. A clustering technique defines the classes itself and places objects in them; the classes are not predefined. Clustering techniques include K-means, Fuzzy C-means (FCM), Rough C-means (RCM), Rough-Fuzzy C-means (RFCM), Robust RFCM (rRFCM), hierarchical clustering and Gaussian mixture models.
Association rule mining
Association rule learning is a popular and well-researched method for finding interesting relations between data items in large databases. It is intended to identify strong rules in databases using different measures of interestingness based on the input data set. Association rule mining is the process of finding rules, frequent patterns, associations, correlations or causal structures among sets of items. A typical use is understanding customer buying habits by finding associations and correlations between the different items that customers place in their "shopping basket". The main applications of association rule mining include basket data analysis, cross-marketing and catalog design. The above data mining techniques can also be used in the diagnosis of diseases [6].

[…] tasks like data preprocessing, clustering, classification, regression, visualization and feature selection. New algorithms can also be implemented in WEKA alongside its existing data mining and machine learning techniques. WEKA can load data from various sources, including files, URLs and databases. Supported file formats include WEKA's own ARFF format, CSV, LibSVM format and C4.5 format. WEKA also provides many evaluation criteria, such as the confusion matrix, precision, recall, and true positive and false negative counts. Advantages of the WEKA tool include being open source, platform independent and portable, offering a graphical user interface, and containing a very large collection of data mining algorithms.
RAPIDMINER
RAPIDMINER (RM) [8] is open source software that provides a good environment for data mining processes. It offers a drag-and-drop facility for constructing the dataflow and supports different file formats. Regression, classification and clustering tasks can be performed easily with different learning algorithms. RapidMiner supports a large number of classification and regression algorithms, decision trees, association rules and clustering algorithms, and many features are available for data preprocessing, normalization, filtering and data analysis. It can import data from different traditional and standard databases.
ORANGE
ORANGE [9] is an open source data mining tool developed at the Bioinformatics Laboratory of the University of Ljubljana. Applications can be implemented using scripting and visual programming. A Python library is available for data manipulation and widget modification. Visual programming is performed by placing widgets on a canvas and connecting their inputs and outputs. The tool is suitable for machine learning and data mining algorithms. It can easily be used both by data mining researchers and by inexperienced users who want to develop and test their own algorithms, and it encourages reusing as much code as possible.
KNIME
KNIME (Konstanz Information Miner) [10] is a general-purpose open source data mining tool developed and maintained by a Swiss company. It is implemented on the Eclipse platform and provides facilities for data integration, processing, exploration and analysis. KNIME can be integrated with other data mining tools such as R and WEKA.
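Naive Bayes, one of the classification techniques listed in Section 2, and the evaluation criteria that WEKA reports (confusion matrix, precision, recall) fit together in a short example. The sketch below is a from-scratch Gaussian naive Bayes, not WEKA's implementation: it assumes numeric features that are independent and normally distributed within each class, and the (glucose, BMI) measurements and `sick`/`well` labels are hypothetical.

```python
from collections import Counter
from math import log, pi

def fit_gaussian_nb(X, y):
    """Estimate per-class priors, feature means and feature variances."""
    model = {}
    for cls in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == cls]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # Population variance plus a tiny constant to avoid division by zero.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(zip(*rows), means)
        ]
        model[cls] = (n / len(y), means, variances)
    return model

def predict(model, x):
    """Return the class with the highest log-posterior for feature vector x."""
    def log_posterior(cls):
        prior, means, variances = model[cls]
        score = log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * log(2 * pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)

def evaluate(actual, predicted, positive):
    """Confusion-matrix counts plus precision and recall for one positive class."""
    matrix = Counter(zip(actual, predicted))  # keys are (actual, predicted) pairs
    tp = matrix[(positive, positive)]
    fp = sum(c for (a, p), c in matrix.items() if p == positive and a != positive)
    fn = sum(c for (a, p), c in matrix.items() if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return matrix, precision, recall

# Hypothetical (glucose, BMI) measurements, invented for illustration only.
X_train = [(150, 32), (160, 35), (145, 30), (170, 36),
           (90, 22), (100, 24), (85, 21), (95, 23)]
y_train = ["sick"] * 4 + ["well"] * 4
X_test = [(155, 33), (165, 34), (105, 25), (92, 22), (98, 23), (140, 31)]
y_test = ["sick", "sick", "sick", "well", "well", "well"]

model = fit_gaussian_nb(X_train, y_train)
y_pred = [predict(model, x) for x in X_test]
matrix, precision, recall = evaluate(y_test, y_pred, positive="sick")
print(y_pred)
print(dict(matrix), f"precision={precision:.2f}", f"recall={recall:.2f}")
```

One sick patient with borderline values is predicted `well` (a false negative) and one well patient is predicted `sick` (a false positive), so precision and recall for `sick` are both 2/3. Working in log space avoids underflow when many small likelihoods are multiplied.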
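K-means, the first clustering technique listed in Section 2, alternates between assigning each object to its nearest centroid and moving each centroid to the mean of its members, stopping when the centroids no longer change. The minimal sketch below assumes numeric feature vectors and Euclidean distance, and for reproducibility it takes the first k points as initial centroids; practical implementations usually use random or k-means++ initialization instead. The six two-dimensional points are hypothetical.

```python
from math import dist

def kmeans(points, k, max_iter=100):
    """Plain k-means (Lloyd's algorithm) seeded with the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: assignments will not change
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical objects described by two measurements (illustration only).
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (8.5, 9.0), (1.2, 0.8), (9.0, 8.5)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

The two well-separated groups end up in clusters 0 and 1 (`labels == [0, 1, 0, 1, 0, 1]`), and no class labels were supplied: the clusters emerge from the data, which is the distinction from classification drawn in Section 2.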
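Association rules of the form A → B, as described in Section 2, are commonly scored by support (the fraction of transactions containing every item of A and B) and confidence (the support of A ∪ B divided by the support of A). The sketch below finds frequent itemsets level by level, in the spirit of the Apriori algorithm but without its subset-pruning optimization, and then derives rules that meet minimum support and confidence thresholds. The shopping-basket transactions and both thresholds are hypothetical.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def frequent_itemsets(transactions, min_support):
    """Level-wise search: build (k+1)-item candidates only from frequent k-itemsets."""
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    level = [frozenset([i]) for i in items]
    while level:
        kept = [s for s in level if support(s, transactions) >= min_support]
        frequent.update({s: support(s, transactions) for s in kept})
        # Candidates: unions of two frequent sets that are exactly one item larger.
        size = len(level[0]) + 1
        level = list({a | b for a, b in combinations(kept, 2) if len(a | b) == size})
    return frequent

def rules(frequent, min_confidence):
    """Yield (antecedent, consequent, support, confidence) for each strong rule."""
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is itself frequent.
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    yield antecedent, itemset - antecedent, sup, confidence

# Hypothetical shopping baskets (illustration only).
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "milk"},
]
freq = frequent_itemsets(baskets, min_support=0.4)
for a, c, s, conf in rules(freq, min_confidence=0.7):
    print(f"{sorted(a)} -> {sorted(c)}  support={s:.2f}  confidence={conf:.2f}")
```

With these thresholds `eggs` is discarded as infrequent, and five rules survive, for example `butter → bread` with confidence 1.00 and `bread → milk` with confidence 0.75. The same code applies unchanged if each "basket" is instead a set of symptoms or findings recorded for one patient.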
[Fig.2: Classification accuracy of data mining tools (Orange, Weka, RapidMiner) on the Iris data set.]

Based on the analysis, the following results were derived for the different data mining tools.
WEKA is the best tool for a beginner, since it contains many built-in and experimental features and no prior knowledge of coding is required.
RapidMiner is the only tool that is independent of language limitations and has statistical and predictive analytical capabilities.
ORANGE and RapidMiner, in comparison, are tools for advanced users, since they require advanced knowledge of coding.

REFERENCES
1. Kavakiotis I, Tsave O, Salifoglou A, Maglaveras N, Vlahavas I, Chouvarda I. Machine learning and data mining methods in diabetes research. Computational and Structural Biotechnology Journal. 2017; 104-116.
2. Patil PH, Thube S, Ratnaparkhi B, Rajeswari K. Analysis of Different Data Mining Tools using Classification, Clustering and Association Rule Mining. International Journal of Computer Applications. 2014; 93(8):35-39.
3. Usha Rani D. Survey on Data Mining Tools and Techniques in Medical Field. International Journal of Advanced Networking & Applications. 2017; 8(5):51-54.
4. Devi SK, Krishnapriya S, Kalita D. Prediction of Heart Disease using Data Mining Techniques. Indian Journal of Science and Technology. 2016; 9(39):1-5.
5. Bhatla N, Jyoti K. An analysis of heart disease prediction using different data mining techniques. International Journal of Engineering. 2012; 1(8):1-4.
6. Fatima M, Pasha M. Survey of Machine Learning Algorithms for Disease Diagnostic. Journal of Intelligent Learning Systems and Applications. 2017; 9(1):1-16.
7. http://www.cs.waikato.ac.nz/ml/weka/ [Last accessed: 22/08/2017].
8. https://rapidminer.com/ [Last accessed: 22/08/2017].
9. https://orange.biolab.si/ [Last accessed: 22/08/2017].
10. https://www.knime.com/ [Last accessed: 22/08/2017].
11. https://archive.ics.uci.edu/ml/datasets/iris [Last accessed: 22/08/2017].