
Minor Project FINAL Review in Signal Processing Domain: Supervisor


Minor Project FINAL Review in Signal Processing Domain

Enhancing the performance of classifiers in detecting the abnormalities in medical data using nature inspired optimization techniques

SUPERVISOR
ASWATH.S 
Assistant Professor / ECE

BATCH MEMBERS
1. K. CHIKITHA (VTU12182)
2. B.DEEPTHI (VTU14238)
3. K.REETHI (VTU12191)
[Department of ECE]
• In the field of Artificial Intelligence (AI), there are many nature-inspired techniques for solving a wide range of problems.
• Evolutionary computation is a family of algorithms for global optimization inspired by biological evolution.
• Some evolutionary algorithms are modeled on the behavior of particular biological species. In technical terms, they are population-based trial-and-error problem solvers with a metaheuristic or stochastic optimization character.
• In evolutionary computation, an initial set of candidate solutions is generated and iteratively updated. Each new generation is produced by stochastically removing less desired solutions and introducing small random changes.
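
The generate, mutate and select loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the study: the function name `evolve`, the Gaussian mutation, the pairwise tournaments and all parameter values are assumptions chosen for brevity.

```python
import random

def evolve(fitness, dim, pop_size=20, generations=100, sigma=0.1, seed=0):
    """Minimise `fitness` with a simple mutate-and-select loop (illustrative only)."""
    rng = random.Random(seed)
    # Initial set of candidate solutions, drawn uniformly from [-5, 5]^dim.
    population = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        # Introduce small random changes (Gaussian mutation) to every candidate.
        children = [[x + rng.gauss(0, sigma) for x in parent] for parent in population]
        # Stochastically remove less desired solutions: pair parents and children
        # at random and keep the fitter member of each pair.
        pool = population + children
        rng.shuffle(pool)
        population = [min(pool[i], pool[i + 1], key=fitness)
                      for i in range(0, len(pool), 2)]
    return min(population, key=fitness)

def sphere(x):
    """Toy objective with its minimum at the origin."""
    return sum(v * v for v in x)

best = evolve(sphere, dim=2)
print(best, sphere(best))
```

Because the fittest candidate always wins its pairing, the best fitness in the population never gets worse from one generation to the next.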

ABSTRACT
• The aim of this study is to design a model that detects diabetes in patients with the highest possible accuracy.
• The model is evaluated on the Pima Indians Diabetes Database (PIDD).
• This work applies nature inspired optimization techniques to diabetes prediction. Nature inspired algorithms such as the bat algorithm, hybrid bat algorithm, grey wolf optimization algorithm and firefly algorithm are mostly used for numerical data, and are used here to improve accuracy and other performance metrics.
• Seven classification algorithms were used to detect diabetes, with their hyperparameters tuned.
• The performance of most of the classifiers was enhanced by the nature inspired algorithms, which produced better performance metrics than the base model and randomized search CV.
• The firefly algorithm and the hybrid bat algorithm perform best, with the highest accuracy of 77.2% for the random forest classifier.
OBJECTIVE
• The main objective of this study is to develop a machine learning (ML)-based system for detecting diabetic patients.
• To classify the abnormalities in medical data using a machine learning algorithm.
• To design a model that uses nature inspired techniques to enhance the performance of classifiers in detecting abnormalities in medical data.
• To compare the performance of different classifiers under different techniques.
• This is a binary (2-class) classification project with supervised learning.
INTRODUCTION
Diabetes is a familiar word in the present world and a crucial challenge in both developed and developing countries.
The number of diabetic patients is increasing day by day, and as a result the number of deaths is also increasing.
Insulin, a hormone produced by the pancreas, allows glucose from food to pass from the bloodstream into the body's cells.
The analysis of diabetes data is challenging because most medical data are nonlinear, non-normal, correlation structured, and complex in nature.
Machine learning (ML) is an effective approach for detecting diabetes.
An ML system has the advantage that it can use both feature selection and classifiers.
There are many ML approaches for detecting diabetes, mostly supervised classifiers such as random forest, decision tree, logistic regression, naive Bayes and SVM.
There are many techniques in ML for finding the best performance metrics for a model. In this study, randomized search CV, the bat algorithm, the hybrid bat algorithm, the grey wolf optimization algorithm and the firefly algorithm are used.
The best classifier is chosen with the help of performance metrics such as accuracy, precision and log loss; the classifier with the best values of these metrics is treated as the best model.
In this study, the following works are carried out:
• Enhancing the classifiers' performance by using nature inspired algorithms.
• Choosing the best model out of all proposed models.
• Finding the best nature inspired algorithm.
LITERATURE SURVEY

1. Title: Diabetes prediction using ensembling of different machine learning classifiers
   Authors: Md Kamrul Hasan; Md Ashraful; Dola Das
   Journal and year: IEEE, 2020
   Inference: The paper proposes a robust framework for diabetes prediction that uses outlier rejection, filling of missing values, data standardization, feature selection, K-fold cross-validation, and different machine learning (ML) classifiers (k-nearest neighbour, decision trees, random forest, AdaBoost, naive Bayes) together with a multilayer perceptron (MLP).

2. Title: Classification and prediction of diabetes disease using machine learning paradigm
   Authors: Jitranjan Sahoo; Manoranjan Dash; Abhilash Pati
   Journal and year: IRJET, 2020
   Inference: The experiment was carried out on the PIMA Indian Diabetes data set, and the results confirmed that the designed system achieved an accuracy of 79.17% using the logistic regression classification algorithm. The designed system can also be adapted to predict other diseases.

3. Title: Prediction of Diabetes using Classification Algorithms
   Authors: Deepti Sisodia; Dilip Singh Sisodia
   Journal and year: ICCIDS, 2018
   Inference: Three machine learning classification algorithms are studied and evaluated on various measures. Experiments are performed on the Pima Indians Diabetes Database. The results confirm the adequacy of the designed system, with an achieved accuracy of 76.30% using the naive Bayes classification algorithm. In future, the designed system and the classification algorithms used can be applied to predict or diagnose other diseases.

4. Title: Nature-Inspired Algorithm for Training Multilayer Perceptron Networks in e-health Environments for High-Risk Pregnancy Care
   Authors: Mario W. L. Moreira; Neeraj Kumar; Joel J. P. C. Rodrigues
   Journal and year: Journal of Medical Systems, 2018
   Inference: The paper proposes a nature inspired technique, particle swarm optimization (PSO), to reduce the computational cost of the ANN-based method known as the multilayer perceptron (MLP) without reducing its accuracy. Overall, the approach outperformed other approaches by 26.4% in accuracy and 14.9% in true positive rate (TPR), and showed a decrease of 35.4% in false positive rate (FPR).

5. Title: Bat algorithm (BA): review, applications and modifications
   Authors: Amar Yahya Zebari; Saman M. Almufti; Chyavan Mohammed Abdulrahman
   Journal and year: International Journal of Scientific World, 2020
   Inference: The paper shows that the bat algorithm (BA) has become a powerful nature inspired metaheuristic algorithm for many continuous and discrete optimization problems. It reports that BA has performed better than other nature inspired algorithms and has been applied to many problems, such as classification and data mining, image processing and fuzzy logic.

6. Title: A Hybrid Bat Algorithm
   Authors: Fister Jr.; Dusan Fister; Xin-She Yang
   Journal and year: Elektrotehniski Vestnik / Electrotechnical Review, 2013
   Inference: The paper improves the bat algorithm (BA) by developing a new variant, the hybrid bat algorithm (HBA), which is a hybrid of BA with DE strategies. The results show that HBA significantly improves on the results of the original BA.
BLOCK DIAGRAM

METHODOLOGY
• The machine learning model is used to detect whether a person is diabetic or non-diabetic.
• Import the relevant libraries to train and test on the data set, and install the required packages for the nature inspired algorithms.
• Split the data into a training data set and a testing data set in the ratio 80:20, and perform model selection.
• Eight different classifiers are considered: Logistic Regression, K Nearest Neighbour, Decision Tree Classifier, Random Forest Classifier, Extra Trees Classifier, Gaussian NB, Support Vector Classification (SVC) and Linear SVC.
• Optimization techniques such as randomized search CV, the bat algorithm, the hybrid bat algorithm, grey wolf optimization and the firefly algorithm were implemented for the different classifiers to enhance their performance.
• Different performance metrics, namely Accuracy Score, Precision Score, Recall Score, F1 Score, Log Loss and ROC AUC, are used for evaluating the model.
• Based on the values of these performance metrics, the classifier with the best values is considered the best model.
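
As a rough illustration of how most of the metrics listed above are computed for a binary classifier, the following sketch derives them from true labels, predicted labels and predicted probabilities. The function name `classification_metrics` and the toy labels are hypothetical and are not taken from the study; ROC AUC is left out to keep the sketch short. Note that log loss is an error measure, so lower values indicate a better model, unlike the other metrics.

```python
import math

def classification_metrics(y_true, y_pred, y_prob, eps=1e-15):
    """Binary-classification metrics from labels (0/1) and predicted P(class = 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Clip probabilities so that log(0) never occurs.
    clipped = [min(max(p, eps), 1 - eps) for p in y_prob]
    log_loss = -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                    for t, p in zip(y_true, clipped)) / len(pairs)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "log_loss": log_loss}

# Toy example (made-up labels, not results from the study).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]
metrics = classification_metrics(y_true, y_pred, y_prob)
print(metrics)
```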
NATURE INSPIRED ALGORITHM
• Nature inspired algorithms are a set of novel problem-solving methodologies and approaches derived from natural processes.
• Nature inspired algorithms are highly efficient at finding improved solutions to multi-dimensional and multimodal problems.
• The traditional optimization approach in calculus is to find the first-order derivative of the objective function and equate it to zero to obtain the critical points. These critical points then give the maximum or minimum value of the objective function.
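
As a small worked example of the derivative-based approach, consider a one-dimensional objective chosen purely for illustration (it does not come from the study):

```latex
f(x) = x^2 - 4x + 5, \qquad
f'(x) = 2x - 4 = 0 \;\Rightarrow\; x^{*} = 2, \qquad
f''(x) = 2 > 0 \;\Rightarrow\; x^{*} \text{ is a minimum}, \qquad
f(x^{*}) = 2^2 - 4 \cdot 2 + 5 = 1 .
```

This only works when the objective is differentiable and available in closed form. Population-based methods such as the ones below need only the ability to evaluate the objective.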
a. Bat Algorithm

The basic bat algorithm is inspired by the bio-sonar, or echolocation, behavior of bats.

In nature, bats emit ultrasonic waves into their surroundings for hunting or navigation.

After emitting these waves, a bat receives their echoes and, based on the received echoes, locates itself and identifies obstacles and prey in its path.

Algorithm – Bat Algorithm
• Define the fitness function.
• Generate the bat population randomly.
• Initialize the velocity, frequency, pulse rate and loudness of each bat.
• Continue the iterations up to the maximum number of iterations:
  • Generate a new solution by adjusting the frequency and updating the velocity and position.
  • If a random number exceeds the pulse rate:
    • Select a local solution around the best solution.
  • End if
  • If a random number is below the loudness and the new solution is better:
    • Store the new solution.
    • Update the pulse rate and the loudness.
  • End if
  • Rank the bats and obtain the current best solution.
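
The steps above can be sketched in Python as follows. This is a minimal illustration that assumes the commonly published form of the bat algorithm update rules (a random frequency in a fixed range, a velocity pulled by the distance to the best bat, loudness decay `alpha` and pulse-rate growth `gamma`). The parameter values, the box bounds and the `sphere` test objective are assumptions, not settings from the study.

```python
import math
import random

def bat_algorithm(objective, dim, bounds=(-5.0, 5.0), n_bats=20, iterations=200,
                  f_min=0.0, f_max=2.0, alpha=0.9, gamma=0.9, r0=0.5, seed=0):
    """Minimise `objective` with a basic bat algorithm (illustrative sketch)."""
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(value):
        return max(lo, min(hi, value))

    # Random initial positions, zero velocities, full loudness, zero pulse rate.
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_bats)]
    v = [[0.0] * dim for _ in range(n_bats)]
    loudness = [1.0] * n_bats
    pulse = [0.0] * n_bats
    fit = [objective(p) for p in x]
    best_i = min(range(n_bats), key=fit.__getitem__)
    best, best_fit = x[best_i][:], fit[best_i]

    for t in range(1, iterations + 1):
        mean_loudness = sum(loudness) / n_bats
        for i in range(n_bats):
            # New solution: adjust frequency, then update velocity and position.
            freq = f_min + (f_max - f_min) * rng.random()
            v[i] = [vj + (xj - bj) * freq for vj, xj, bj in zip(v[i], x[i], best)]
            cand = [clip(xj + vj) for xj, vj in zip(x[i], v[i])]
            # Local solution around the best bat when the pulse-rate test fires.
            if rng.random() > pulse[i]:
                cand = [clip(bj + rng.uniform(-1, 1) * mean_loudness) for bj in best]
            cand_fit = objective(cand)
            # Accept an improving solution with probability tied to loudness,
            # then lower the loudness and raise the pulse rate.
            if rng.random() < loudness[i] and cand_fit < fit[i]:
                x[i], fit[i] = cand, cand_fit
                loudness[i] *= alpha
                pulse[i] = r0 * (1 - math.exp(-gamma * t))
            # Track the best solution evaluated so far.
            if cand_fit < best_fit:
                best, best_fit = cand[:], cand_fit
    return best, best_fit

def sphere(x):
    """Toy objective with its minimum at the origin."""
    return sum(c * c for c in x)

best, best_fit = bat_algorithm(sphere, dim=2)
print(best, best_fit)
```

For hyperparameter tuning, `objective` would wrap a classifier, for example by returning one minus its cross-validated accuracy for a given hyperparameter vector.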
b. Hybrid Bat Algorithm

The Hybrid Bat Algorithm was obtained by hybridizing the original BA using Differential Evolution (DE) strategies.

DE optimizes a problem by maintaining a population of candidate solutions, creating new candidate solutions by combining the existing ones according to its simple formulae, and then keeping whichever candidate solution has the best score or fitness on the optimization problem at hand. DE supports a differential mutation, a differential crossover and a differential selection.

Algorithm – Hybrid Bat Algorithm
• Define the objective function.
• Initialize the bat population and the velocities.
• Define the pulse frequency Q_i ∈ [Q_min, Q_max].
• Initialize the pulse rates and the loudness.
• While the maximum number of iterations is not reached:
  • Generate new solutions by adjusting frequency and updating velocities and locations/solutions.
  • If a random number exceeds the pulse rate:
    • Modify the solution using a DE strategy.
  • End if
  • Generate a new solution by flying randomly.
  • If a random number is below the loudness and the new solution is better:
    • Accept the new solutions.
    • Increase the pulse rate and reduce the loudness.
  • End if
  • Rank the bats and find the current best.
• End while
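
The DE step that the hybrid variant adds can be sketched as below. This is a minimal illustration that assumes the widely used DE/rand/1/bin strategy (differential mutation followed by binomial crossover). The function name `de_rand_1_bin` and the values of the scale factor `F` and crossover rate `CR` are assumptions, not settings from the study.

```python
import random

def de_rand_1_bin(population, i, F=0.5, CR=0.9, rng=random):
    """Build a trial vector for individual `i` using DE/rand/1/bin."""
    dim = len(population[i])
    # Pick three distinct individuals, all different from i.
    a, b, c = rng.sample([k for k in range(len(population)) if k != i], 3)
    # One component is always taken from the mutant, so the trial can differ.
    j_rand = rng.randrange(dim)
    trial = []
    for j in range(dim):
        if rng.random() < CR or j == j_rand:
            # Differential mutation: base vector plus a scaled difference.
            trial.append(population[a][j]
                         + F * (population[b][j] - population[c][j]))
        else:
            # Binomial crossover: keep the component of the current individual.
            trial.append(population[i][j])
    return trial

population = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0], [4.0, 9.0], [3.0, 7.0]]
trial = de_rand_1_bin(population, 0, rng=random.Random(1))
print(trial)
```

Differential selection then keeps the trial vector only if its fitness is better than that of the individual it would replace.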
c. Grey Wolf Optimization Algorithm

The GWO algorithm mimics the leadership hierarchy and hunting mechanism of grey wolves in nature.

Four types of grey wolves, namely alpha, beta, delta, and omega, are employed for simulating the leadership hierarchy.

In addition, the three main steps of hunting (searching for prey, encircling prey, and attacking prey) are implemented to perform optimization.

Algorithm – Grey Wolf Optimization
1. Initialize the grey wolf population X_i, where i = 1, 2, ..., n.
2. Initialize a, A and C.
3. Compute the fitness of each search agent.
4. X_alpha is the best search agent.
5. X_beta is the second best search agent.
6. X_delta is the third best search agent.
7. While the maximum number of iterations is not reached:
8.   For each search agent, modify the current search agent's position using the position update equation.
9.   End for
10.  Modify a, A and C.
11.  Compute the fitness of all search agents.
12.  Modify X_alpha, X_beta and X_delta.
13.  Move to the next iteration.
14. End while
15. Return X_alpha.
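
The procedure above can be sketched in Python as follows. This is a minimal illustration that assumes the commonly published grey wolf update, in which each wolf moves to the average of three positions estimated from the alpha, beta and delta wolves while the coefficient `a` decreases linearly from 2 towards 0. The bounds, population size and `sphere` test objective are assumptions, not settings from the study.

```python
import random

def grey_wolf_optimizer(objective, dim, bounds=(-5.0, 5.0), n_wolves=20,
                        iterations=200, seed=0):
    """Minimise `objective` with a basic grey wolf optimizer (illustrative sketch)."""
    rng = random.Random(seed)
    lo, hi = bounds
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    best, best_fit = None, float("inf")
    for t in range(iterations):
        # Rank the pack: alpha, beta and delta are the three fittest wolves.
        # Copies are taken because the wolves are updated in place below.
        ranked = sorted(wolves, key=objective)
        alpha, beta, delta = (w[:] for w in ranked[:3])
        alpha_fit = objective(alpha)
        if alpha_fit < best_fit:
            best, best_fit = alpha[:], alpha_fit
        a = 2.0 * (1 - t / iterations)  # decreases linearly from 2 towards 0
        for w in wolves:
            for j in range(dim):
                estimate = 0.0
                for leader in (alpha, beta, delta):
                    A = 2 * a * rng.random() - a       # exploration vs exploitation
                    C = 2 * rng.random()
                    D = abs(C * leader[j] - w[j])      # distance to the leader
                    estimate += leader[j] - A * D
                # New position: average of the three leader-guided estimates.
                w[j] = max(lo, min(hi, estimate / 3))
    # Include the final pack in the best-so-far record.
    final = min(wolves, key=objective)
    if objective(final) < best_fit:
        best, best_fit = final[:], objective(final)
    return best, best_fit

def sphere(x):
    """Toy objective with its minimum at the origin."""
    return sum(c * c for c in x)

best, best_fit = grey_wolf_optimizer(sphere, dim=2)
print(best, best_fit)
```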
d. Firefly Algorithm

Among swarm-intelligence-based algorithms, the firefly algorithm (FA) is now one of the most widely used. The firefly algorithm was developed by Xin-She Yang in 2008, based on inspiration from the natural behavior of tropical fireflies.

FA tries to mimic the flashing pattern and attraction behavior of fireflies.

The purpose of these flashing lights is two-fold: to attract mating partners and to warn potential predators.

This flashing light and its intensity obey some rules, including physical laws.

The attractiveness is proportional to the brightness, and both decrease as the distance increases. For any two flashing fireflies, the less bright one will move towards the brighter one.

If there is no firefly brighter than a particular firefly, it will move randomly.

Algorithm – Firefly Algorithm
1. Generate the initial population of fireflies x_i, where i = 1, 2, ..., n.
2. The light intensity I_i at x_i is determined by the objective function f(x_i).
3. Define the light absorption coefficient.
4. While the maximum number of generations is not reached:
5.   For i = 1 to n (all n fireflies):
6.     For j = 1 to n (all n fireflies):
7.       If firefly j is brighter than firefly i:
           Move firefly i towards firefly j in d dimensions via Lévy flights.
8.       End if
         Attractiveness varies with distance via the light absorption coefficient.
         Evaluate new solutions and update light intensity.
9.     End for
10.  End for
     Rank the fireflies and find the current best.
11. End while
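
The attraction rule described above can be sketched in Python as follows. This is a minimal illustration with two simplifications that should be read as assumptions: the Lévy-flight step mentioned in the algorithm is swapped for a plain uniform random step, and the brightest firefly stays in place instead of moving randomly. The attractiveness formula `beta0 * exp(-gamma * r^2)`, the parameter values and the `sphere` test objective are the commonly published defaults, not settings from the study.

```python
import math
import random

def firefly_algorithm(objective, dim, bounds=(-5.0, 5.0), n_fireflies=15,
                      iterations=100, beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """Minimise `objective` with a basic firefly algorithm (illustrative sketch)."""
    rng = random.Random(seed)
    lo, hi = bounds
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_fireflies)]
    cost = [objective(p) for p in x]  # lower cost means a brighter firefly
    best_i = min(range(n_fireflies), key=cost.__getitem__)
    best, best_cost = x[best_i][:], cost[best_i]
    for _ in range(iterations):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # j is brighter, so i moves towards j
                    r2 = sum((a - b) ** 2 for a, b in zip(x[i], x[j]))
                    # Attractiveness decays with the squared distance.
                    beta = beta0 * math.exp(-gamma * r2)
                    x[i] = [max(lo, min(hi, a + beta * (b - a)
                                        + alpha * (rng.random() - 0.5)))
                            for a, b in zip(x[i], x[j])]
                    # Evaluate the new solution and update the light intensity.
                    cost[i] = objective(x[i])
                    if cost[i] < best_cost:
                        best, best_cost = x[i][:], cost[i]
        alpha *= 0.97  # gradually shrink the random step
    return best, best_cost

def sphere(x):
    """Toy objective with its minimum at the origin."""
    return sum(c * c for c in x)

best, best_cost = firefly_algorithm(sphere, dim=2)
print(best, best_cost)
```

The nested loop over all pairs of fireflies makes each generation quadratic in the population size, which is one reason this method tends to be slower than the others when every evaluation trains a classifier.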
DATA SET EXPLORATION
• The data set used for this study is the PIMA data set.
• The data set consists of 768 observations with 8 medical predictor features.
• The target feature is the outcome: 1 means the person is diabetic and 0 means the person is non-diabetic.
• Class distribution: about 65% non-diabetic and 35% diabetic.
• The 8 medical predictor features are:
  • Pregnancies: Number of times pregnant
  • Glucose: Plasma glucose concentration at 2 hours in an oral glucose tolerance test
  • Blood Pressure: Diastolic blood pressure (mm Hg)
  • Skin Thickness: Triceps skin fold thickness (mm)
  • Insulin: 2-hour serum insulin (mu U/ml)
  • BMI: Body mass index (weight in kg/(height in m)²)
  • Diabetes Pedigree Function: Diabetes pedigree function
  • Age: Age (years)
• Outcome: 0 or 1
SOFTWARE USED

Jupyter Notebook
MODEL EVALUATION RESULTS

FOR BASE LINE MODEL THE PERFORMANCE METRICS:
• In the base line model, the performance metrics are calculated without any optimization.
• In the base line model, logistic regression gives the best accuracy, around 74.0%.

FOR RANDOMIZED SEARCH CV THE PERFORMANCE METRICS:
• With randomized search CV, random forest gives the best accuracy, around 74.6%.

FOR BAT ALGORITHM MODEL THE PERFORMANCE METRICS:
• With the bat algorithm, random forest gives the best accuracy, around 75.9%.

FOR HYBRID BAT MODEL THE PERFORMANCE METRICS:
• With the hybrid bat algorithm, random forest gives the best accuracy, around 77.2%.

FOR GREY WOLF OPTIMIZATION MODEL THE PERFORMANCE METRICS:
• With grey wolf optimization, decision tree gives the best accuracy, around 75.9%.

FOR FIREFLY MODEL THE PERFORMANCE METRICS:
• With the firefly algorithm, random forest gives the best accuracy, around 77.2%.
COMPARISON OF EACH TECHNIQUE BASED ON PERFORMANCE METRICS
• The highest accuracy, 77.2%, is obtained by the hybrid bat and firefly models using the random forest classifier.
• The highest precision, 72%, is obtained by the firefly model using the random forest classifier.
• The highest recall, 62.9%, is obtained in all the models with all classifiers.
• The highest F1 score, 65.3%, is obtained by the hybrid bat model using the random forest classifier.
• The highest ROC AUC, 73.5%, is obtained by the hybrid bat model using the random forest classifier.
• The highest log loss, 12.5%, is obtained by the grey wolf optimization model using the linear SVC classifier.
• The longest run time, 57.9 seconds, is taken by the firefly model using the random forest classifier.
PROJECT PLAN WITH OUTCOME
Outcome: Conference / Paper Publication

S.No | Project Activity | Description | Date of Completion
1 | Literature Survey & Problem Identification | The literature survey on the project title will be done from refereed journals | 16.09.2021
2 | Review with Supervisor | Discussion on objectives | 17.09.2021
3 | Data set exploration | The concept of the project will be finalized as a set of data to be chosen. The required machine learning tool, its functions and modules will be finalized | 24.09.2021
4 | Review with Supervisor | Concept discussion | 25.09.2021
5 | Implementation of classification algorithm | The classification algorithms are carried out in Jupyter Notebook and the best classifier is finalized | 01.10.2021
6 | Review with Supervisor | Discussion on the results | 02.10.2021
7 | Implementation of optimization algorithm | The best nature inspired algorithm is chosen and worked on, and the best optimization algorithm is finalized | 09.10.2021
8 | Result validation and report preparation | Result validation; report preparation | 16.10.2021; 23.10.2021
9 | Review of work by Supervisor | The implementation of suggestions given by the review panel will be verified by the supervisor | 24.10.2021
10 | Paper publication | Research article preparation | 30.10.2021
REFERENCES
• Arkadip Ray, Avijit Kumar Chaudhuri, "Smart healthcare disease diagnosis and patient management: Innovation, improvement and skill development", Machine Learning with Applications, Volume 3, March 2021.
• Minyechil Alehegn, Rahul Joshi, Preeti Mulay, "Analysis and prediction of diabetes mellitus using machine learning algorithm", International Journal of Pure and Applied Mathematics 118, July 2018.
• Md. Maniruzzaman, Md. Jahanur Rahman, Benojir Ahammed, Md. Menhazul Abedin, "Classification and prediction of diabetes disease using machine learning paradigm", Health Information Science and Systems, January 2020.
• Francesco Mercaldo, Vittoria Nardone, Antonella Santone, "Diabetes mellitus affected patients' classification and diagnosis through machine learning techniques", Procedia Computer Science, September 2017.
• Kashif Naseer Qureshi, Sadia Din, Gwanggil Jeon, Francesco Piccialli, "An Accurate and Dynamic Predictive Model for a Smart M-Health System Using Machine Learning", Computers and Electrical Engineering, Volume 95, October 2021.
• Deepti Sisodia, Dilip Singh Sisodia, "Prediction of Diabetes using Classification Algorithms", Procedia Computer Science, International Conference on Computational Intelligence and Data Science (ICCIDS), May 2018.
• S. R. Jino Ramson, K. Lova Raju, S. Vishnu, Theodoros Anagnostopoulos, "Nature Inspired Optimization Techniques for Image Processing: A Short Review", Springer International Publishing AG, Intelligent Systems Reference Library, Volume 150, January 2019.
THANK YOU
