Minor Project FINAL Review in Signal Processing Domain: Supervisor
SUPERVISOR
ASWATH.S
Assistant Professor / ECE
BATCH MEMBERS
1. K. CHIKITHA (VTU12182)
2. B.DEEPTHI (VTU14238)
3. K.REETHI (VTU12191)
[Department of ECE]
In the field of Artificial Intelligence (AI), there is a large number of nature-inspired techniques to solve a
wide range of problems.
Evolutionary computation is a family of algorithms for global optimization inspired by biological
evolution.
Based on the behavior of certain biological species, some evolutionary algorithms are developed. In
technical terms, they are a family of population-based trial and error problem solvers with a metaheuristic
or stochastic optimization character.
In evolutionary computation, an initial set of candidate solutions is generated and iteratively updated.
Each new generation is produced by stochastically removing less desired solutions and introducing small
random changes.
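The generate-and-update loop described above can be sketched in a few lines. This is a minimal illustration and not the code used in this project: the function name `evolve`, the sphere test function, the search range and all parameter values are assumptions chosen for the example, and the removal step uses simple truncation (keep the better half) instead of a stochastic rule.

```python
import random


def evolve(fitness, dim, pop_size=20, generations=100, sigma=0.1, seed=0):
    """Minimise `fitness` with a very small evolutionary loop (illustrative only)."""
    rng = random.Random(seed)
    # Initial set of candidate solutions, drawn uniformly from [-5, 5]^dim (assumed range).
    population = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        # Rank the candidates; lower fitness is better in this sketch.
        population.sort(key=fitness)
        # Remove the less desired half (truncation selection, a simplification).
        survivors = population[: pop_size // 2]
        # Refill the population with copies of survivors plus a small random change.
        children = [
            [gene + rng.gauss(0.0, sigma) for gene in rng.choice(survivors)]
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children
    return min(population, key=fitness)


def sphere(x):
    """Toy objective with its minimum at the origin."""
    return sum(v * v for v in x)


best = evolve(sphere, dim=2)
print(best, sphere(best))
```

Because the better half is always kept, the best fitness in the population never gets worse from one generation to the next.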
ABSTRACT
The aim of this study is to design a model that can detect diabetes in patients with maximum
accuracy.
The model is evaluated on the Pima Indians Diabetes Database (PIDD).
This work applies nature inspired optimization techniques to the prediction of diabetes. Nature
inspired algorithms such as the bat algorithm, hybrid bat algorithm, grey wolf optimization algorithm and
firefly algorithm are mostly used for numerical data and for improving accuracy and other
performance metrics.
Seven classification algorithms were used to detect diabetes, with their hyperparameters tuned.
In this work, the performance of most of the classifiers was enhanced with the help of nature inspired algorithms,
which produced the best results in the performance metrics when compared with the base model and randomized
search CV.
The firefly algorithm and the hybrid bat algorithm perform best, with the highest accuracy of 77.2% in the case of
the random forest classifier.
OBJECTIVE
The main objective of this study is to develop a machine learning (ML)-based system for detecting diabetic patients.
To classify the abnormalities in medical data using a machine learning algorithm.
To design a model that uses nature inspired techniques to enhance the performance of classifiers in detecting abnormalities in medical data.
To compare the performance of different classifiers under different techniques.
This is a binary (2-class) classification project with supervised learning.
INTRODUCTION
Diabetes is a very familiar word in the present world and a crucial challenge in both developed and developing countries.
The number of diabetic patients is increasing day by day, and as a result the number of deaths is also increasing.
The hormone insulin, produced by the pancreas, allows glucose to pass from the bloodstream into the cells of the body.
The analysis of diabetes data is a challenging issue because most medical data are nonlinear, non-normal, correlation-structured, and complex in nature.
The best approach for detecting diabetes is machine learning.
An ML system has the advantage that it can use both feature selection and classifiers.
ML also helps in providing the best results in detecting diabetes.
There are many approaches in ML for detecting diabetes, mostly supervised classifiers.
These supervised classifiers include random forest, decision tree, logistic regression, naive Bayes and SVM.
There are many techniques in ML for finding the best performance metrics for a model.
In this study, randomized search CV, the bat algorithm, the hybrid bat algorithm, the grey wolf optimization algorithm and the firefly algorithm are used.
The best classifier is chosen with the help of performance metrics such as accuracy, precision and log loss; the classifier with the best values of these metrics is
treated as the best model.
In this study, the following work is carried out:
Enhancing the performance of the classifiers by using nature inspired algorithms.
Choosing the best model out of all the proposed models.
Finding the best nature inspired algorithm.
LITERATURE SURVEY
1. Title of the Paper: Diabetes prediction using ensembling of different machine learning classifiers
   Author Name: Hasan; Md Kamrul; Md Ashraful; Dola Das
   Journal Name & Year: IEEE, published in 2020
   Inference: This paper proposes a robust framework for diabetes prediction in which outlier rejection, filling of missing values, data standardization, feature selection, K-fold cross-validation, and different Machine Learning (ML) classifiers (k-Nearest Neighbour, Decision Trees, Random Forest, AdaBoost, Naive Bayes) and a Multilayer Perceptron (MLP) were used.
2. Title of the Paper: Classification and prediction of diabetes disease using machine learning paradigm
   Author Name: Jitranjan Sahoo; Manoranjan Dash; Abhilash Pati
   Journal Name & Year: IRJET, published in 2020
   Inference: The experiment was carried out on the PIMA Indian Diabetes data set, and the results confirmed that the designed system had an accuracy of 79.17% using the logistic regression classification algorithm. The designed system using this ML algorithm can also be customized to predict other diseases.
3. Title of the Paper: Prediction of Diabetes using Classification Algorithms
   Author Name: Deepti Sisodia; Dilip Singh Sisodia
   Journal Name & Year: ICCIDS, published in 2018
   Inference: In this work, three machine learning classification algorithms are studied and evaluated on various measures. Experiments are performed on the Pima Indians Diabetes Database.
4. Title of the Paper: Nature-Inspired Algorithm for Training Multilayer Perceptron Networks in e-health Environments for High-Risk Pregnancy Care
   Author Name: Mario W. L. Moreira; Neeraj Kumar; Joel J. P. C. Rodrigues
   Journal Name & Year: Journal of Medical Systems, published in 2018
   Inference: In this paper, the use of a biologically inspired technique, known as particle swarm optimization (PSO), is proposed for reducing the computational cost of the ANN-based method referred to as the multilayer perceptron (MLP), without reducing its precision rate.
5. Title of the Paper: Bat algorithm (BA): review, applications and modifications
   Author Name: Amar Yahya Zebari; Saman M. Almufti; Chyavan Mohammed Abdulrahman
   Journal Name & Year: International Journal of Scientific World, published in 2020
   Inference: This paper shows that the bat algorithm (BA) has become a powerful nature inspired metaheuristic algorithm for many continuous and discrete optimization problems. The algorithm has proved to be better than other nature inspired algorithms. It has also been applied to many problems such as classification and data mining, image processing and fuzzy logic.
6. Title of the Paper: A Hybrid Bat Algorithm
   Author Name: Fister Jr.; Dusan Fister; Xin-She Yang
   Journal Name & Year: Elektrotehniski Vestnik / Electrotechnical Review, published in 2013
   Inference: In this paper, the bat algorithm (BA) is improved by developing a new variant, the so-called hybrid bat algorithm (HBA). HBA is a hybrid of BA with DE strategies. The results show that HBA significantly improves on the results of the original BA.
BLOCK DIAGRAM
METHODOLOGY
The machine learning model is used to detect whether a person is diabetic or non-diabetic.
The relevant libraries are imported to train and test on the data set, and the required packages related to nature inspired algorithms are installed.
The data is split into a training data set and a testing data set in the ratio 80:20, and model selection is performed.
Eight different classifiers are considered: Logistic Regression, K Nearest Neighbour, Decision Tree Classifier, Random Forest Classifier, Extra Trees Classifier, Gaussian NB, Support Vector Classification (SVC) and Linear SVC.
Optimization techniques such as randomized search CV, the bat algorithm, the hybrid bat algorithm, grey wolf optimization and the firefly algorithm were implemented for the different classifiers to enhance their performance.
Different performance metrics such as Accuracy Score, Precision Score, Recall Score, F1 Score, Log Loss and ROC AUC are used for evaluating the model.
Based on the values of these performance metrics, the classifier with the highest values is considered the
best model.
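To make the evaluation step concrete, the listed metrics can be computed directly from true labels and predicted probabilities. The sketch below is only an illustration in plain Python: the function names, the 0.5 decision threshold and the eight toy labels and probabilities are assumptions for the example and are not results from this study. Note that log loss is an error measure, so a smaller value is better, unlike the other metrics.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = diabetic)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def roc_auc(y_true, y_prob):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def binary_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, recall, F1, log loss and ROC AUC for one classifier."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    eps = 1e-15  # keep probabilities away from 0 and 1 so the logarithm is defined
    clipped = [min(max(p, eps), 1 - eps) for p in y_prob]
    log_loss = -sum(
        t * math.log(p) + (1 - t) * math.log(1 - p) for t, p in zip(y_true, clipped)
    ) / len(y_true)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "log_loss": log_loss,
        "roc_auc": roc_auc(y_true, y_prob),
    }


# Toy labels and predicted probabilities, invented for this example.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1, 0.7, 0.3]
metrics = binary_metrics(y_true, y_prob)
print(metrics)
```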
NATURE INSPIRED ALGORITHM
Nature inspired algorithms are a set of novel problem-solving methodologies and approaches derived
from natural processes.
Nature inspired algorithms are highly efficient in finding improved solutions to multi-dimensional
and multimodal problems.
The traditional optimization approach in calculus is to find the first order derivative
of the objective function and equate it to zero to
get the critical points. These critical points then give
the maximum or minimum value of the objective
function.
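As a small worked illustration of this derivative-based approach, consider the following function, which is chosen here only as an example and does not come from the study:

```latex
f(x) = x^2 - 4x + 5, \qquad
f'(x) = 2x - 4 = 0 \;\Rightarrow\; x = 2, \qquad
f''(2) = 2 > 0 \;\Rightarrow\; f(2) = 1 \text{ is a minimum.}
```

This works because f has a closed form that can be differentiated. The accuracy of a classifier as a function of its hyperparameters has no such closed form, which is one reason to use a search method that only needs to evaluate the objective.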
a. Bat Algorithm
The basic bat algorithm is inspired by the bio-sonar, or echolocation, characteristics of bats.
In nature, bats release ultrasonic waves into the environment around them for the purposes of hunting or navigation.
After emitting these waves, a bat receives their echoes, and based on the received echoes it locates itself and identifies obstacles in its way and prey.
Algorithm: Bat Algorithm
• The fitness function is defined.
• The bat population is generated randomly.
• The algorithm parameters are initialized.
• The iterations continue up to the maximum number of iterations.
• A new solution is generated for each bat.
• If the local search condition holds, a local solution is selected around the best solution.
• If the acceptance condition holds, the new solution is stored and the bat parameters are updated.
• The bats are ranked and the current best solution is obtained.
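A minimal sketch of a bat algorithm is given below. The update rules, the parameter names (`f_min`, `f_max`, `alpha`, `gamma`, `r0`) and all default values are assumptions taken from the commonly published form of the algorithm, since the equations are not reproduced in these slides; the sphere function is only a stand-in objective. It is not the implementation used in this project.

```python
import math
import random


def bat_algorithm(fitness, dim, n_bats=20, n_iter=200, lower=-5.0, upper=5.0,
                  f_min=0.0, f_max=2.0, alpha=0.9, gamma=0.9, r0=0.5, seed=0):
    """Minimise `fitness` with a basic bat algorithm (illustrative sketch)."""
    rng = random.Random(seed)

    def clip(v):
        return min(max(v, lower), upper)

    # Random initial positions, zero velocities, full loudness, initial pulse rate r0.
    pos = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_bats)]
    vel = [[0.0] * dim for _ in range(n_bats)]
    fit = [fitness(p) for p in pos]
    loudness = [1.0] * n_bats
    pulse = [r0] * n_bats
    best_i = min(range(n_bats), key=fit.__getitem__)
    best, best_fit = pos[best_i][:], fit[best_i]

    for t in range(1, n_iter + 1):
        mean_loudness = sum(loudness) / n_bats
        for i in range(n_bats):
            # New solution from a random frequency and the velocity update.
            freq = f_min + (f_max - f_min) * rng.random()
            vel[i] = [v + (x - b) * freq for v, x, b in zip(vel[i], pos[i], best)]
            cand = [clip(x + v) for x, v in zip(pos[i], vel[i])]
            if rng.random() > pulse[i]:
                # Local random walk around the current best solution.
                cand = [clip(b + rng.uniform(-1.0, 1.0) * mean_loudness) for b in best]
            cand_fit = fitness(cand)
            if rng.random() < loudness[i] and cand_fit <= fit[i]:
                # Accept the new solution, then lower loudness and raise the pulse rate.
                pos[i], fit[i] = cand, cand_fit
                loudness[i] *= alpha
                pulse[i] = r0 * (1.0 - math.exp(-gamma * t))
            if cand_fit < best_fit:
                best, best_fit = cand[:], cand_fit
    return best, best_fit


def sphere(x):
    """Toy objective with its minimum at the origin."""
    return sum(v * v for v in x)


best, best_fit = bat_algorithm(sphere, dim=2)
print(best, best_fit)
```

For hyperparameter tuning, the fitness function would instead decode a position vector into classifier settings and return an error measure such as one minus the validation accuracy; that wiring is left out here.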
b. Hybrid Bat Algorithm
and then keeping whichever candidate solution has the best
The purpose of the flashing lights of fireflies is two-fold: to attract mating partners and to warn potential predators.
This flashing light and its intensity can obey some rules, including physical laws.
The attractiveness is proportional to the brightness, and both decrease as the distance increases. For any two flashing fireflies, the less bright one will move towards the brighter one.
If there is no firefly brighter than a particular firefly, it will move randomly.
• Accept the new solutions
• end if
• Attractiveness varies with distance
• Evaluate new solutions and update light intensity.
• end for
• end for
• Rank the fireflies and find the current best
• end while
• end
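The firefly rules described above (a less bright firefly moves towards a brighter one, attractiveness falls with distance, and a firefly with no brighter neighbour moves randomly) can be sketched as follows. The exponential attractiveness term, the parameter names (`beta0`, `gamma`, `alpha`) and their values are assumptions based on the commonly published form of the firefly algorithm, and the sphere function is a stand-in objective; this is not the code used in this project.

```python
import math
import random


def firefly_algorithm(fitness, dim, n_fireflies=15, n_iter=100, lower=-5.0, upper=5.0,
                      beta0=1.0, gamma=0.01, alpha=0.3, seed=0):
    """Minimise `fitness` with a basic firefly algorithm (illustrative sketch)."""
    rng = random.Random(seed)

    def clip(v):
        return min(max(v, lower), upper)

    pos = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_fireflies)]
    fit = [fitness(p) for p in pos]  # lower cost means a brighter firefly
    best_i = min(range(n_fireflies), key=fit.__getitem__)
    best, best_fit = pos[best_i][:], fit[best_i]

    for _ in range(n_iter):
        for i in range(n_fireflies):
            moved = False
            for j in range(n_fireflies):
                if fit[j] < fit[i]:
                    # Firefly j is brighter: move i towards j.
                    r2 = sum((a - b) ** 2 for a, b in zip(pos[i], pos[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness falls with distance
                    pos[i] = [clip(a + beta * (b - a) + alpha * (rng.random() - 0.5))
                              for a, b in zip(pos[i], pos[j])]
                    fit[i] = fitness(pos[i])  # update light intensity
                    moved = True
            if not moved:
                # No brighter firefly exists: take a purely random step.
                pos[i] = [clip(a + alpha * (rng.random() - 0.5)) for a in pos[i]]
                fit[i] = fitness(pos[i])
            if fit[i] < best_fit:
                # Remember the best solution seen so far.
                best, best_fit = pos[i][:], fit[i]
    return best, best_fit


def sphere(x):
    """Toy objective with its minimum at the origin."""
    return sum(v * v for v in x)


best, best_fit = firefly_algorithm(sphere, dim=2)
print(best, best_fit)
```

The best solution is tracked separately because the brightest firefly also takes random steps and can move to a worse position.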
DATA SET EXPLORATION
The data set used for this study is the PIMA data set.
The data set consists of 768 observations with 8 medical predictor features.
The target feature is the outcome: when the outcome is 1 the person is diabetic, and when it is 0 the person is non-diabetic.
The 8 medical predictor features are:
Pregnancies : Number of times pregnant
Glucose : Plasma glucose concentration at 2 hours in an oral glucose tolerance test
Blood Pressure : Diastolic blood pressure (mm Hg)
Skin Thickness : Triceps skin fold thickness (mm)
Insulin : 2-Hour serum insulin (mu U/ml)
BMI : Body mass index (weight in kg/(height in m)²)
Diabetes Pedigree Function : Diabetes pedigree function
Age : Age (years)
Outcome : 0 (non-diabetic) or 1 (diabetic)
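As an illustration of how the 768 observations divide under the 80:20 split used in this study, the sketch below shuffles row indices and cuts them into training and testing parts. The helper name, the seed and the identifier-style feature names are assumptions for the example; no data file is read.

```python
import random

# The 8 predictor features, written as identifiers (naming is an assumption).
FEATURES = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
            "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"]


def split_indices(n_rows, test_fraction=0.2, seed=42):
    """Shuffle row indices and return (train_indices, test_indices)."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(768)
print(len(train_idx), len(test_idx))  # 614 154
```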
SOFTWARE USED
Jupyter Notebook
MODEL EVALUATION RESULTS
FOR BASE LINE MODEL THE PERFORMANCE METRICS:
• In the base line model, the performance metrics are calculated in the usual way.
• In the base line model, logistic regression gives the best accuracy, around 74.0%.
FOR RANDOMIZED SEARCH CV THE PERFORMANCE METRICS:
• In the randomized search CV model, the performance metrics are calculated in the usual way.
• In the randomized search CV model, random forest gives the best accuracy, around 74.6%.
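The idea behind randomized search can be sketched as follows: a fixed number of hyperparameter settings is sampled at random and the best-scoring one is kept. This is a simplified illustration without cross-validation. The search space, the function names and the toy scoring function (which stands in for a validation accuracy) are all hypothetical and are not the settings used in this study.

```python
import random


def random_search(score, space, n_iter=25, seed=0):
    """Sample `n_iter` random settings from `space` and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        # Draw one value per hyperparameter, independently and uniformly.
        params = {name: rng.choice(values) for name, values in space.items()}
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


# Hypothetical search space for a random forest.
space = {"n_estimators": [50, 100, 200, 400], "max_depth": [2, 4, 6, 8, 10]}


def toy_score(params):
    """Stand-in for validation accuracy; peaks at n_estimators=200, max_depth=6."""
    return 1.0 - abs(params["n_estimators"] - 200) / 400 - abs(params["max_depth"] - 6) / 10


best_params, best_score = random_search(toy_score, space)
print(best_params, best_score)
```

Unlike the nature inspired algorithms, each sample here is drawn independently of the previous ones, so the search does not use earlier results to guide later trials.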
FOR BAT ALGORITHM MODEL THE PERFORMANCE METRICS:
• In the bat algorithm model, the performance metrics are calculated in the usual way.
• In the bat algorithm model, random forest gives the best accuracy, around 75.9%.
FOR HYBRID BAT MODEL THE PERFORMANCE METRICS:
• In the hybrid bat model, the performance metrics are calculated in the usual way.
• In the hybrid bat model, random forest gives the best accuracy, around 77.2%.
FOR GREY WOLF OPTIMIZATION MODEL, THE PERFORMANCE METRICS:
• In the grey wolf optimization model, the performance metrics are calculated in the usual way.
• In the grey wolf optimization model, decision tree gives the best accuracy, around 75.9%.
FOR FIREFLY MODEL THE PERFORMANCE METRICS:
• In the firefly model, the performance metrics are calculated in the usual way.
• In the firefly model, random forest gives the best accuracy, around 77.2%.
COMPARISON OF EACH TECHNIQUE BASED ON PERFORMANCE METRICS
• The highest accuracy, 77.2%, is obtained in the hybrid bat model and the firefly model with the
random forest classifier.
• The highest precision, 72%, is obtained in the firefly model with the random
forest classifier.
• The highest recall, 62.9%, is obtained in all the models with all classifiers.
• The highest F1 score, 65.3%, is obtained in the hybrid bat model with the random
forest classifier.
• The highest ROC AUC, 73.5%, is obtained in the hybrid bat model with the random
forest classifier.
• The highest log loss, 12.5%, is obtained in the grey wolf optimization model with
the linear SVC classifier.
• The longest run time, 57.9 seconds, is taken by the firefly model with the random forest
classifier.
PROJECT PLAN WITH OUTCOME
S.No | Project Activity | Description | Date of Completion
1 | Literature Survey & Problem Identification | The literature survey on the project title will be done from refereed journals | 16.09.2021
2 | Review with Supervisor | Discussion on objectives | 17.09.2021
4 | Review with Supervisor | Concept discussion | 25.09.2021
  | Classification algorithm | Jupyter notebook and best classifier is finalized |
7 | Implementation of optimization algorithm | The best nature inspired algorithm is chosen and worked on, and the best optimization algorithm is finalized | 09.10.2021
8 | Result validation and Report preparation | Result validation; Report preparation | 16.10.2021; 23.10.2021
Outcome : Conference / Paper Publication