This document presents a study on using genetic algorithms for the classification of datasets. The objective is to create a classification engine, based on genetic algorithms, that trains itself from training data and then classifies new inputs. It describes the genetic algorithm workflow, including population initialization, the fitness function, selection, crossover, mutation, and termination criteria, and compares the performance of genetic-algorithm classification against other classifiers such as ZeroR, J48, and IBk on datasets from the UCI machine learning repository. The results show the genetic algorithm achieving the highest accuracy: 81.33% with 10-fold cross-validation and 80.95% with a 66% train-test split. Future work plans to evaluate the approach on more datasets and classifiers.
Use of Genetic Algorithms for Classification of Datasets
{Nitin Kumar, 216CS1140} CSE Department, NIT Rourkela
Introduction

Genetic Algorithms (GA) are search-oriented, iterative optimization strategies built on the concepts of natural selection and biological genetics. In a genetic algorithm, we maintain a set of candidate solutions to the problem at hand. These solutions are transformed into other solutions using mutation and recombination, producing new child solutions, and this process is continued for several iterations. This study compares the classification accuracy of several popular classification algorithms against GA.

Objective

The aim of this project is to create a classification engine, based on a genetic algorithm, that will train itself from the training data provided and be able to classify new inputs. The engine will refine itself to the maximum possible level, with adequate attention paid to ensuring it is not over-fitted, so that it can classify any input instance precisely rather than merely fitting the training data.

Genetic Algorithm Flow Chart

Genetic algorithms are useful for search and optimization problems. GA uses genetics as its model of problem solving. Each solution in a genetic algorithm is represented by a chromosome. Chromosomes are made up of genes, the individual elements (alleles) that represent the problem variables. The collection of all chromosomes is called the population. The following concepts and operators are central to GA:

• Population: The evolution usually starts from a population of randomly generated individuals. In other words, the population is a group of chromosomes.

• Fitness Function: Individual solutions are selected through a fitness process. The fitness function, in simple words, depicts how "fit" or how "bad" a solution is for a given problem.

Figure 1: Flow Chart

Methods of GA

1. Selection: This operator is used to select individuals for reproduction. Common selection methods are:
• Roulette-wheel selection
• Random selection
• Rank selection
• Tournament selection
• Boltzmann selection

2. Crossover: This is the process of taking two parent chromosomes and producing a child from them. The operator is applied to create better strings. Common crossover operators are:
• One-point crossover
• Multi-point crossover
• Uniform crossover
• Whole arithmetic recombination

3. Mutation: A randomly inverted gene in a parent chromosome produces the child chromosome. Mutation is essential for maintaining the diversity of the population. Commonly used mutation techniques are:
• Bit-flip mutation
• Random resetting
• Swap mutation
• Scramble mutation
• Inversion mutation

4. Survivor selection: The policy that determines which chromosomes to retain for the next generation and which to discard is called the survivor selection policy. The most commonly used survivor selection schemes are:
• Age-based selection
• Fitness-based selection

5. Termination condition: It is highly important to design the termination condition of any genetic algorithm carefully, since it determines when the genetic process ends. A termination condition that has not been designed correctly may lead to infinite loops.

Results

• ZeroR classifier: The ZeroR classifier, applied to our training dataset, shows an accuracy of 59.90 percent with 10-fold cross-validation and 63.16 percent with a 66 percent split of the training dataset.

• J48 classifier: The J48 classifier, applied to our training dataset, shows an accuracy of 76.46 percent with 10-fold cross-validation and 76.56 percent with a 66 percent split of the training dataset.

• IBk classifier: The IBk classifier, applied to our training dataset, shows an accuracy of 75.16 percent with 10-fold cross-validation and 74.16 percent with a 66 percent split of the training dataset.

• Genetic Algorithm: The GA classifier, applied to our training dataset, achieves an accuracy of 81.33 percent with 10-fold cross-validation and 80.95 percent with a 66 percent split of the training dataset.

Figure 2: Classification Accuracy
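The GA workflow described above (population, fitness, selection, crossover, mutation, survivor selection, termination) can be sketched as a minimal genetic algorithm. The fitness function (the toy "OneMax" problem of maximizing 1-bits), the chromosome encoding, and all parameter values below are illustrative assumptions, not the study's actual classifier configuration:

```python
import random

# Illustrative assumptions: binary chromosomes, OneMax fitness,
# small population, fixed generation count.
CHROM_LEN, POP_SIZE, GENERATIONS, MUT_RATE = 20, 30, 50, 0.01

def fitness(chrom):
    # Depicts how "fit" a solution is: here, the number of 1-genes.
    return sum(chrom)

def tournament_select(pop, k=3):
    # Tournament selection: the best of k randomly chosen individuals wins.
    return max(random.sample(pop, k), key=fitness)

def one_point_crossover(p1, p2):
    # One-point crossover: child takes a prefix of p1 and a suffix of p2.
    cut = random.randrange(1, len(p1))
    return p1[:cut] + p2[cut:]

def mutate(chrom):
    # Bit-flip mutation: each gene is inverted with probability MUT_RATE.
    return [1 - g if random.random() < MUT_RATE else g for g in chrom]

def run_ga():
    # Population: evolution starts from randomly generated individuals.
    pop = [[random.randint(0, 1) for _ in range(CHROM_LEN)]
           for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):  # termination: fixed generation budget
        # Survivor selection: generational replacement with elitism of one.
        best = max(pop, key=fitness)
        pop = [best] + [
            mutate(one_point_crossover(tournament_select(pop),
                                       tournament_select(pop)))
            for _ in range(POP_SIZE - 1)
        ]
    return max(pop, key=fitness)

best = run_ga()
print(fitness(best))
```

Any of the other operators listed above (roulette-wheel selection, uniform crossover, swap mutation, and so on) could be substituted into the same loop without changing its overall structure.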
References

[1] Shanabog CS Nandish and UM Ashwinkumar. Use of genetic algorithms for classification of datasets. In Recent Trends in Electronics, Information & Communication Technology (RTEICT), 2017 2nd IEEE International Conference on, pages 2016–2020, 2017.

Future Work

In this poster we have presented an approach for the classification of datasets. To validate the proposed method, we tested it on machine learning datasets taken from the UCI repository. Future work may include different datasets and a larger number of classifiers; using different datasets would also help to test the performance of the proposed system.

Conclusion

The genetic algorithm provided the maximum classification accuracy: 81.33 percent with 10-fold cross-validation and 80.95 percent with a 66 percent split.
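The baseline classifiers compared in the Results are Weka algorithms, but they have close scikit-learn analogues (DummyClassifier for ZeroR, DecisionTreeClassifier for J48's C4.5-style trees, KNeighborsClassifier for IBk). A sketch of running such a 10-fold cross-validation comparison follows; it uses the bundled iris data as a stand-in, not the study's actual UCI dataset, so the numbers it prints will differ from those reported above:

```python
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier            # analogue of ZeroR
from sklearn.tree import DecisionTreeClassifier      # analogue of J48
from sklearn.neighbors import KNeighborsClassifier   # analogue of IBk
from sklearn.model_selection import cross_val_score

# Stand-in dataset; the study used other UCI data.
X, y = load_iris(return_X_y=True)

classifiers = {
    "ZeroR": DummyClassifier(strategy="most_frequent"),
    "J48":   DecisionTreeClassifier(),
    "IBk":   KNeighborsClassifier(n_neighbors=1),
}

for name, clf in classifiers.items():
    # 10-fold cross-validation, matching the evaluation protocol above.
    scores = cross_val_score(clf, X, y, cv=10)
    print(f"{name}: {scores.mean():.2%} mean accuracy")
```

Extending this harness with further datasets and classifiers is essentially the evaluation proposed in the Future Work section.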