
Forecasting System for Heart Diseases: A Classification Approach

2023, International Journal of Computer Science and Mobile Computing (IJCSMC)

https://doi.org/10.47760/ijcsmc.2023.v12i06.001

The medical field produces enormous quantities of data containing unobserved patterns that can be useful for decision making. This paper aimed to develop a forecasting system that detects the early presence of heart disease based on fourteen variables from patients' historical data. The system used the naïve Bayes data mining technique to analyze the input medical data. The Rapid Application Development (RAD) software life cycle was used in designing and developing the system. The system was simulated using synthetic datasets, namely Cleveland and Statlog. The results showed that the system provided adequate features for predicting heart diseases: the dataset input training page, the data verification page, and the forecasting results page.


I. INTRODUCTION

Heart diseases remain the leading cause of death around the world. Identifying them at an early stage can help prevent heart attacks. Clinical experts generate data rich in hidden information, but this data is not properly utilized for prediction [1].

Several mechanisms have been used for managing diagnostic results, and computer-based forecasting systems may play a vital role. The health care field generates big data on clinical assessments, patient reports, cures, follow-ups, and medication. The growth in the volume of data calls for an appropriate way to extract and process it effectively and efficiently [2].

Data mining techniques help to process data and turn it into useful information. Prediction results from data mining are valuable in different fields such as healthcare management. This field requires accurate and timely analysis, which can save many patients' lives. Data mining techniques play a vital role in healthcare analysis [3].

This study focused on forecasting heart disease in a patient based on a historical dataset. The datasets used are Cleveland and Statlog.

II. METHODS AND TOOLS

This study used the Rapid Application Development (RAD) software life cycle in designing and developing the system. This model aims to develop the system in a short span of time [4]. The first phase of this model is Analysis and Quick Design. In this phase, related studies are reviewed, hardware and software requirements are identified, and the system's functionalities are conceptualized. The second phase is the Build phase, in which the system's designs are coded and the chosen data mining technique is embedded. The third phase is Demonstrate, Refine, and Testing. In this phase, the system's functionalities are tested using synthetic datasets from the UCI Machine Learning Repository. Fourteen (14) medical data with categorical values were used in testing the system: age, sex, chest pain type, resting blood pressure, cholesterol, fasting blood sugar, electrocardiographic result, maximum heart rate achieved, exercise-induced angina, ST depression, slope, number of major vessels colored by fluoroscopy, defect type, and the target value.

A. Naïve Bayesian Algorithm

The Naïve Bayes classifier is based on Bayes' theorem. The algorithm assumes conditional independence: the value of an attribute for a given class is assumed to be independent of the values of the other attributes [5].
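For reference, the conditional independence assumption can be written out as follows. This is the standard textbook form rather than a formula quoted from the paper; the symbols $C_k$ (a class, here with or without heart disease) and $x_1, \dots, x_n$ (the attribute values of one patient) are notation introduced here for illustration.

```latex
% Bayes' theorem with the naive (conditional independence) assumption
P(C_k \mid x_1, \dots, x_n)
  = \frac{P(C_k)\,\prod_{i=1}^{n} P(x_i \mid C_k)}{P(x_1, \dots, x_n)}

% The denominator is the same for every class, so the predicted class is
\hat{y} = \arg\max_{k} \; P(C_k)\,\prod_{i=1}^{n} P(x_i \mid C_k)
```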

Steps:

1. Convert the dataset into a frequency table.
2. Create a likelihood table by finding the probabilities.
3. Calculate the posterior probability of each class.
4. The class with the highest posterior probability is the outcome of the prediction.
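The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' program code: the function names, the two toy attributes (chest pain type and exercise-induced angina), the six toy records, and the use of add-one (Laplace) smoothing are all assumptions made here.

```python
from collections import Counter, defaultdict


def train(records, labels):
    """Step 1: build the frequency tables from categorical records."""
    class_counts = Counter(labels)
    # value_counts[(attribute_index, class_label)][value] = frequency
    value_counts = defaultdict(Counter)
    # distinct values seen per attribute (needed for smoothing)
    attribute_values = defaultdict(set)
    for record, label in zip(records, labels):
        for i, value in enumerate(record):
            value_counts[(i, label)][value] += 1
            attribute_values[i].add(value)
    return class_counts, value_counts, attribute_values


def posteriors(model, record):
    """Steps 2-3: turn frequencies into likelihoods, then posteriors."""
    class_counts, value_counts, attribute_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_class in class_counts.items():
        score = n_class / total  # prior P(class)
        for i, value in enumerate(record):
            count = value_counts[(i, label)][value]
            # Add-one smoothing is an assumption, not stated in the paper.
            score *= (count + 1) / (n_class + len(attribute_values[i]))
        scores[label] = score
    evidence = sum(scores.values())
    return {label: score / evidence for label, score in scores.items()}


def predict(model, record):
    """Step 4: pick the class with the highest posterior probability."""
    probs = posteriors(model, record)
    return max(probs, key=probs.get)


# Hypothetical toy data: (chest pain type, exercise-induced angina)
toy_records = [
    ("typical", "yes"), ("typical", "yes"), ("typical", "no"),
    ("atypical", "no"), ("atypical", "no"), ("atypical", "yes"),
]
toy_labels = [
    "heart disease", "heart disease", "heart disease",
    "no heart disease", "no heart disease", "no heart disease",
]

model = train(toy_records, toy_labels)
print(predict(model, ("typical", "yes")))
```

Counting first and dividing only at prediction time keeps the frequency tables easy to update when new verified records are added.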

The Naïve Bayes model is easy to build and particularly useful for very large datasets. Despite its simplicity, Naïve Bayes can outperform even highly sophisticated classification techniques. An advantage of the Naïve Bayes classifier is that it requires only a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because the variables are assumed to be independent, only the variances of the variables for each class need to be determined. It can be used for both binary and multiclass classification problems [2].

Naïve Bayes is one of the data mining techniques that has shown strong performance compared with other techniques across different heart disease datasets. Palaniappan and Awang compared various data mining techniques in the diagnosis of heart disease patients, including Naïve Bayes, decision tree, and neural network. The results indicated that Naïve Bayes achieved the best accuracy. Rajkumar and Reena investigated Naïve Bayes, k-nearest neighbour, and decision list in the diagnosis of heart disease patients. Their results also showed that Naïve Bayes achieved the best accuracy [6].

Fig. 1 Using Naïve Bayes Classification Algorithm

Medical data are fed to the classification algorithm for training, as shown in Figure 1. The algorithm must find the relationship between the attributes in order to predict the outcome. When a new case arrives, the trained classifier is used to assign it to one of the predefined classes. For example, the training set in the medical database would contain relevant patient data already recorded, where the forecast result is whether the patient had heart disease.

B. Data Source

The publicly available heart disease databases were used, namely the Cleveland and Statlog datasets. Fourteen (14) medical data with categorical values were used in testing the system: age, sex, chest pain type, resting blood pressure, cholesterol, fasting blood sugar, electrocardiographic result, maximum heart rate achieved, exercise-induced angina, ST depression, slope, number of major vessels colored by fluoroscopy, defect type, and the target value. Table 1 describes the three datasets used in this research. The Synthetic_03 dataset is the combination of the Cleveland and Statlog datasets.

Figure 2 shows the page where the user can enter the medical data of a patient for data testing. The medical data were made categorical for better understanding. Users select a value in every combo box and should select the values carefully in order to get an accurate forecast result from the developed system.

Figure 3 shows the page where the doctor's approval is needed if the patient's data did not match any record in the historical database. If the doctor approves the prediction result, the patient's data are added to the historical database; otherwise, the data are not stored. This feature lets the system's data preserve its integrity, since all predictions are verified by a specialist.

Figure 4 shows the prediction results page. If the patient's data fully (100%) match a record in the database, the forecast is either with heart disease or no heart disease; otherwise, the doctor's approval of the data is needed.
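The matching and approval flow described for Figures 3 and 4 can be illustrated with a short Python sketch. Everything named here is hypothetical: the `HistoricalDatabase` class, the `forecast` function, and the `predict` and `doctor_approves` callbacks stand in for the system's actual pages and storage, which the paper does not show.

```python
from dataclasses import dataclass, field


@dataclass
class HistoricalDatabase:
    """Hypothetical in-memory store: attribute tuple -> verified label."""
    records: dict = field(default_factory=dict)

    def lookup(self, features):
        # Returns the stored label on an exact (100%) match, else None.
        return self.records.get(tuple(features))

    def add(self, features, label):
        self.records[tuple(features)] = label


def forecast(db, features, predict, doctor_approves):
    """Return (label, status) following the described verification flow."""
    known = db.lookup(features)
    if known is not None:
        # Exact match with a historical record: report its label directly.
        return known, "matched"
    # No match: the classifier proposes a label and a doctor must verify it.
    predicted = predict(features)
    if doctor_approves(features, predicted):
        # Only approved predictions are stored, preserving data integrity.
        db.add(features, predicted)
        return predicted, "approved"
    return predicted, "rejected"
```

Passing the classifier and the approval step in as callbacks keeps the flow independent of how the prediction is computed or how the doctor's decision is collected.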

Table 1 Dataset's Description

Figure 2 Dataset Input Training Page

Figure 3

Figure 4 Forecasting Result's Page

III. RESULTS AND DISCUSSION

IV. CONCLUSION

Based on the findings of the study, this paper developed the features of a forecasting system for heart diseases using the tools cited in the hardware and software requirements. The researchers wrote the program code equivalent of the Naïve Bayes classification algorithm and embedded it in the developed forecasting system. Integrating the Naïve Bayes algorithm into the developed system provides forecasts that can be used as a basis for interpretation, diagnosis, and decision making.

V. RECOMMENDATION

Based on the findings and conclusions made, the researchers recommend the following: (1) The developed forecasting system needs to be deployed in a real scenario for testing and user approval. A system evaluation must also be done. (2) Future researchers may use the features of the developed system for heart diseases as a basis to create and design other prediction systems, especially for human diseases. (3) The developed system can be given additional features, such as a Patients Information Repository System for diagnosing heart diseases.

Table 1 columns: Datasets; No. of records; No. of attributes; No. with heart disease; No. without heart disease