Assistant Professor / ECE
1. K. CHIKITHA (VTU12182)
2. B. DEEPTHI (VTU14238)
3. K. REETHI (VTU12191)
[Department of ECE]
In the field of Artificial Intelligence (AI), there is a large number of nature-inspired techniques to solve a
wide range of problems.
Evolutionary computation is a family of algorithms for global optimization inspired by biological evolution.
Some evolutionary algorithms are modeled on the behavior of particular biological species.
In technical terms, they are a family of population-based trial-and-error problem solvers with a metaheuristic
or stochastic optimization character.
In evolutionary computation, an initial set of candidate solutions is generated and iteratively updated.
Each new generation is produced by stochastically removing less desired solutions and introducing small
random changes.
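The generate, mutate and select loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in this study: the function names, the sphere objective, the search range and all parameter values are assumptions chosen for the example.

```python
import random


def evolve(fitness, dim, pop_size=20, generations=200, sigma=0.3, seed=0):
    """Minimise `fitness` with a minimal evolutionary loop (illustrative only)."""
    rng = random.Random(seed)
    # Initial set of candidate solutions, drawn uniformly from [-5, 5]^dim.
    population = [[rng.uniform(-5.0, 5.0) for _ in range(dim)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Introduce small random changes: Gaussian mutation of every candidate.
        children = [[x + rng.gauss(0.0, sigma) for x in parent]
                    for parent in population]
        pool = population + children
        # Always keep the single best candidate so progress is never lost.
        elite = min(pool, key=fitness)
        # Stochastically remove less desired solutions: each remaining slot
        # goes to the better of two randomly drawn candidates.
        survivors = [min(rng.sample(pool, 2), key=fitness)
                     for _ in range(pop_size - 1)]
        population = [elite] + survivors
    return min(population, key=fitness)


def sphere(x):
    """Toy objective (an assumption) with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best = evolve(sphere, dim=2)
print(best, sphere(best))
```

Keeping one elite candidate is a design choice of this sketch; it guarantees that the best solution found so far survives the stochastic removal step.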
ABSTRACT
The aim of this study is to design a model that detects diabetes in patients with maximum accuracy.
The model is evaluated on the Pima Indians Diabetes Database (PIDD).
This work applies nature-inspired optimization techniques to diabetes prediction. Nature-inspired
algorithms such as the bat algorithm, hybrid bat algorithm, grey wolf optimization algorithm and
firefly algorithm are mostly used on numerical data and are applied here to improve accuracy and other
performance metrics.
Seven classification algorithms were used to detect diabetes, with their hyperparameters tuned.
For the majority of classifiers, the nature-inspired algorithms improved performance and produced
better results across the performance metrics than the base model and randomized search CV.
The firefly algorithm and hybrid bat algorithm achieved the highest accuracy, 77.2%, with the
random forest classifier.
The main objective of this study is to develop a machine learning (ML)-based system for detecting diabetic patients.
To classify abnormalities in medical data using a machine learning algorithm.
To design a model that uses nature-inspired techniques to enhance the performance of classifiers in detecting abnormalities in medical data.
To compare the performance of different classifiers under different techniques.
This is a binary (2-class) classification project with supervised learning.
Diabetes is a familiar term today and a crucial challenge in both developed and developing countries.
The number of diabetic patients is increasing day by day, and as a result the number of deaths is increasing as well.
Insulin, a hormone produced by the pancreas, allows glucose from food to pass from the bloodstream into the body's cells.
The analysis of diabetes data is challenging because most medical data are nonlinear, non-normal, correlation-structured, and complex in nature.
Machine learning is among the best approaches for detecting diabetes.
An ML system has the advantage that it can combine feature selection with classifiers.
ML also helps provide the best results in detecting diabetes.
There are many ML approaches for detecting diabetes, mostly supervised classifiers.
These supervised classifiers include random forest, decision tree, logistic regression, naive Bayes and SVM.
There are many techniques in ML for finding the model with the best performance metrics.
In this study, randomized search CV, the bat algorithm, the hybrid bat algorithm, the grey wolf optimization algorithm and the firefly algorithm are used.
The best classifier is selected using performance metrics such as accuracy, precision and log loss; the classifier with the best values on these metrics is
treated as the best model.
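As a rough illustration of how a randomized hyperparameter search works, the sketch below samples settings at random and keeps the best-scoring one. It is dependency-free and makes several assumptions: the search space, the parameter names and `toy_score` are hypothetical, and the stand-in score replaces the cross-validated evaluation that a real randomized search CV would perform on the training data.

```python
import random


def random_search(score, space, n_iter=50, seed=0):
    """Return the best of n_iter randomly sampled hyperparameter settings."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        # Sample one value per hyperparameter, independently and uniformly.
        params = {name: rng.choice(values) for name, values in space.items()}
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


# Hypothetical search space for a random forest; the names are illustrative.
space = {"n_estimators": [50, 100, 200, 400], "max_depth": [2, 4, 8, 16]}


def toy_score(params):
    """Stand-in for a cross-validated accuracy; peaks at (200, 8)."""
    return (1.0
            - abs(params["n_estimators"] - 200) / 400
            - abs(params["max_depth"] - 8) / 16)


best_params, best_score = random_search(toy_score, space)
print(best_params, best_score)
```

The nature-inspired algorithms play the same role as `random_search` here: they propose hyperparameter settings, but use the scores of earlier candidates to guide where to look next.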
In this study, the following work is carried out:
Enhancing classifier performance using nature-inspired algorithms.
Choosing the best model out of all proposed models.
Finding the best nature-inspired algorithm.
The machine learning model is used to detect whether a person is diabetic or non-diabetic.
Import the libraries needed to train and test on the data set, and install the packages required for the nature-inspired algorithms.
Split the data into training and testing sets in an 80:20 ratio, then perform model selection.
Eight different classifiers, namely Logistic Regression, K Nearest Neighbour, Decision Tree Classifier, Random Forest Classifier, Extra Trees Classifier, Gaussian NB, Support Vector Classification (SVC) and Linear SVC, are used.
Optimization techniques such as randomized search CV, the bat algorithm, the hybrid bat algorithm, grey wolf optimization and the firefly algorithm were applied to the different classifiers to enhance their performance.
Performance metrics such as Accuracy Score, Precision Score, Recall Score, F1 Score, Log Loss and ROC AUC are used to evaluate the model.
Based on the values of these performance metrics, the classifier with the highest values is considered the
best model.
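To make the evaluation step concrete, the following sketch computes most of the listed metrics from true labels and predicted probabilities. The labels and probabilities are made up for illustration, the function name `binary_metrics` is an assumption, and ROC AUC is left out to keep the example short.

```python
import math


def binary_metrics(y_true, y_prob, threshold=0.5):
    """Compute common binary-classification metrics from predicted probabilities."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    eps = 1e-15  # clip probabilities so log() never sees 0
    clipped = [min(max(p, eps), 1 - eps) for p in y_prob]
    log_loss = -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                    for t, p in zip(y_true, clipped)) / len(y_true)
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1, "log_loss": log_loss}


# Made-up labels and probabilities, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1, 0.7, 0.3]
metrics = binary_metrics(y_true, y_prob)
print(metrics)
```

Note that log loss is an error measure, so lower values are better, whereas for accuracy, precision, recall and F1 higher values are better.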
Nature-inspired algorithms are a set of novel problem-solving methodologies and approaches derived from natural processes.
Nature-inspired algorithms are highly efficient at finding improved solutions to multi-dimensional and multi-modal problems.
The traditional optimization approach in mathematics finds the first-order derivative of the objective function and equates it to zero to
obtain the critical points. These critical points then give the maximum or minimum value of the objective function.
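As a small worked example of the derivative-based approach, consider an objective chosen purely for illustration (it does not come from this study):

```latex
\begin{align*}
f(x)   &= x^2 - 4x + 7 \\
f'(x)  &= 2x - 4 = 0 \quad\Rightarrow\quad x = 2 \\
f''(x) &= 2 > 0 \quad\Rightarrow\quad x = 2 \text{ is a minimum, with } f(2) = 3
\end{align*}
```

This works only when the objective can be differentiated in closed form. A classifier's accuracy as a function of its hyperparameters generally cannot, which is where derivative-free, nature-inspired search becomes useful.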
a. Bat Algorithm
The basic bat algorithm is bio-inspired by the bio-sonar, or echolocation, characteristics of bats.
In nature, bats release ultrasonic waves into the environment around them for the purposes of hunting or navigation.
After emitting these waves, they receive the echoes, and based on the received echoes they locate themselves
and identify obstacles in their way and prey.
Algorithm – Bat Algorithm
• The fitness function needs to be defined here.
• The bat population is generated randomly.
• [...] must be initialized.
• The iterations need to be continued up to [...]
• A new solution has to be generated by using [...]
• if [...]
• A local solution needs to be selected around the best solution using [...]
• end if
• if [...]
• The new solution needs to be stored.
• [...] and [...] are updated using [...]
• end if
• The bats must be ranked and the current best solution is obtained.
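The bat algorithm steps can be sketched in Python as follows. Because the formulas did not survive in the text above, this sketch assumes the commonly published formulation of the bat algorithm (random frequency, velocity update relative to the current best, a local walk scaled by the mean loudness, then loudness decay and pulse-rate growth on acceptance). All parameter names, default values, the search bounds and the sphere objective are assumptions, and the study's own implementation may differ.

```python
import math
import random


def bat_algorithm(fitness, dim, n_bats=20, n_iter=200, f_min=0.0, f_max=2.0,
                  alpha=0.9, gamma=0.9, r0=0.5, lower=-5.0, upper=5.0, seed=0):
    """Minimise `fitness` with a basic bat algorithm (illustrative sketch)."""
    rng = random.Random(seed)

    def clip(value):
        return max(lower, min(upper, value))

    # Random initial population; velocities, loudness and pulse rate initialised.
    x = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_bats)]
    v = [[0.0] * dim for _ in range(n_bats)]
    loudness = [1.0] * n_bats
    pulse = [0.0] * n_bats  # grows toward r0 as iterations proceed
    fit = [fitness(bat) for bat in x]
    best_i = min(range(n_bats), key=fit.__getitem__)
    best, best_fit = x[best_i][:], fit[best_i]

    for t in range(1, n_iter + 1):
        mean_loudness = sum(loudness) / n_bats
        for i in range(n_bats):
            # New solution from a random frequency and a velocity update.
            freq = f_min + (f_max - f_min) * rng.random()
            v[i] = [v[i][d] + (x[i][d] - best[d]) * freq for d in range(dim)]
            cand = [clip(x[i][d] + v[i][d]) for d in range(dim)]
            # Local solution around the current best solution.
            if rng.random() > pulse[i]:
                cand = [clip(best[d] + rng.uniform(-1.0, 1.0) * mean_loudness)
                        for d in range(dim)]
            cand_fit = fitness(cand)
            # Store the new solution if it improves this bat, then reduce
            # loudness and raise the pulse rate.
            if rng.random() < loudness[i] and cand_fit < fit[i]:
                x[i], fit[i] = cand, cand_fit
                loudness[i] *= alpha
                pulse[i] = r0 * (1.0 - math.exp(-gamma * t))
            # Rank: track the current best solution.
            if fit[i] < best_fit:
                best, best_fit = x[i][:], fit[i]
    return best, best_fit


def sphere(x):
    """Toy objective (an assumption) with its minimum of 0 at the origin."""
    return sum(value * value for value in x)


best, best_fit = bat_algorithm(sphere, dim=2)
print(best, best_fit)
```

For hyperparameter tuning, `fitness` would be replaced by a function that trains a classifier with the candidate settings and returns an error to minimise, such as one minus the validation accuracy.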
b. Hybrid Bat Algorithm
[...] and then keeping whichever candidate solution has the best [...]
• Accept the new solutions
Firefly Algorithm
The purpose of these flashing lights is two-fold: to attract mating partners and to warn potential
predators.
Obviously, this flashing light and its intensity can obey some rules, including physical laws.
The attractiveness is proportional to the brightness, and both decrease as the distance
increases. For any two flashing fireflies, the less bright one will move towards the brighter one.
If there is no firefly brighter than a particular firefly, it will move randomly.
• end if
• Attractiveness varies with distance via [...]
• Evaluate new solutions and update light intensity.
• end for
• end for
• Rank the fireflies and find the current best
• end while
• end
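The firefly rules described above can be sketched in Python as follows. The attractiveness formula is not given in the text, so this sketch assumes the commonly used form beta0 * exp(-gamma * r^2); the parameter names, default values, the shrinking random step and the sphere objective are likewise assumptions rather than the study's implementation.

```python
import math
import random


def firefly_algorithm(cost, dim, n_fireflies=15, n_iter=100, beta0=1.0,
                      gamma=0.1, alpha=0.2, lower=-5.0, upper=5.0, seed=0):
    """Minimise `cost` with a basic firefly algorithm (illustrative sketch)."""
    rng = random.Random(seed)

    def clip(value):
        return max(lower, min(upper, value))

    x = [[rng.uniform(lower, upper) for _ in range(dim)]
         for _ in range(n_fireflies)]
    light = [cost(f) for f in x]  # lower cost means a brighter firefly
    best_i = min(range(n_fireflies), key=light.__getitem__)
    best, best_cost = x[best_i][:], light[best_i]
    history = []

    for _ in range(n_iter):
        for i in range(n_fireflies):
            moved = False
            for j in range(n_fireflies):
                if light[j] < light[i]:  # firefly j is brighter than i
                    # Attractiveness decreases with the squared distance.
                    r2 = sum((a - b) ** 2 for a, b in zip(x[i], x[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    # Move i towards j, plus a small random step.
                    x[i] = [clip(a + beta * (b - a)
                                 + alpha * (rng.random() - 0.5))
                            for a, b in zip(x[i], x[j])]
                    # Evaluate the new solution and update light intensity.
                    light[i] = cost(x[i])
                    moved = True
            if not moved:
                # No brighter firefly exists: move randomly.
                x[i] = [clip(a + alpha * (rng.random() - 0.5)) for a in x[i]]
                light[i] = cost(x[i])
            # Rank: keep track of the best solution seen so far.
            if light[i] < best_cost:
                best, best_cost = x[i][:], light[i]
        alpha *= 0.97  # gradually shrink the random step
        history.append(best_cost)
    return best, best_cost, history


def sphere(x):
    """Toy objective (an assumption) with its minimum of 0 at the origin."""
    return sum(value * value for value in x)


best, best_cost, history = firefly_algorithm(sphere, dim=2)
print(best, best_cost)
```

The nested loop over all pairs of fireflies makes each iteration quadratic in the population size, which is one plausible reason a firefly search takes longer to run than the other optimizers.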
The data set used for this study is the PIMA data set.
The data set consists of 768 observations with 8 medical predictor features.
The target feature is the outcome: an outcome of 1 means the person is diabetic and 0 means the person is non-diabetic.
The 8 medical predictor features are:
Pregnancies : Number of times pregnant
Glucose : Plasma glucose concentration at 2 hours in an oral glucose tolerance test
Blood Pressure : Diastolic blood pressure (mm Hg)
Skin Thickness : Triceps skin fold thickness (mm)
Insulin : 2-Hour serum insulin (mu U/ml)
BMI : Body mass index (weight in kg/(height in m)²)
Diabetes Pedigree Function : Diabetes pedigree function
Age : Age (years)
Outcome : Class label (0 or 1)
Outcome distribution : 65% non-diabetic, 35% diabetic
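With 768 observations, the 80:20 split used in this study can be sketched as below. The helper name, the seed and the use of row indices instead of the actual records are assumptions made to keep the example self-contained.

```python
import math
import random


def train_test_split_indices(n_rows, test_ratio=0.2, seed=42):
    """Shuffle row indices and split them into train and test parts."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    # Round the test size up so that no test row is lost to truncation.
    n_test = math.ceil(n_rows * test_ratio)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_split_indices(768)
print(len(train_idx), len(test_idx))  # 614 154
```

Fixing the seed keeps the split reproducible, so every classifier and every optimization technique is compared on the same 614 training and 154 testing rows.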
Jupyter Notebook
• In the baseline model, the performance metrics are calculated in the normal way.
• In the baseline model, logistic regression gives the best accuracy, around 74.0%.
• With randomized search CV, random forest gives the best accuracy, around 74.6%.
• With the bat algorithm, random forest gives the best accuracy, around 75.9%.
• With the hybrid bat algorithm, random forest gives the best accuracy, around 77.2%.
• With grey wolf optimization, decision tree gives the best accuracy, around 75.9%.
• With the firefly algorithm, random forest gives the best accuracy, around 77.2%.
TECHNIQUE BASED ON PERFORMANCE
• The highest accuracy, 77.2%, is obtained by the hybrid bat and firefly models using the random forest classifier.
• The highest precision, 72%, is obtained by the firefly model using the random forest classifier.
• The highest recall, 62.9%, is obtained in all the models with all classifiers.
• The highest F1 score, 65.3%, is obtained by the hybrid bat model using the random forest classifier.
• The highest ROC AUC, 73.5%, is obtained by the hybrid bat model using the random forest classifier.
• The highest log loss, 12.5%, is obtained by the grey wolf optimization model using the linear SVC classifier.
• The longest run time, 57.9 seconds, is taken by the firefly model using the random forest classifier.
S.No | Project Activity | Description | Date of
1 | Literature Survey & Problem Identification | The literature survey on the project title will be done from refereed journals | 16.09.2021
2 | Review with Supervisor | Discussion on objectives | 17.09.2021
4 | Review with Supervisor | Concept Discussion | 25.09.2021
Classification algorithm Jupyter notebook and best classifier is finalized
Implementation of
optimization algorithm
The best nature-inspired algorithm is chosen and
worked on, and the best optimization algorithm is
Outcome : Conference / Paper Publication 8
Result validation and Report
Result validation
Report preparation