An Effective Algorithm for Correlation Attribute Subset Selection by Using Genetic Algorithm Based on Naive Bayes
Work reported so far on feature subset selection for
dimensionality reduction cannot claim that its solutions are
optimal, because correlation attribute subset selection is an
optimization problem. The scope for further work therefore
remains open. Algorithms such as ACO, GA, correlation-based GA,
metaheuristic search and PSO have been applied to subset
selection in the past. In this paper, we perform correlation
attribute subset selection using a genetic algorithm based on
the Naive Bayes classifier. The aim is to improve the
performance of classifiers while using a significantly reduced
set of features. A genetic algorithm is proposed as the
optimization tool, with a Naive Bayes classifier used to compute
the classification accuracy that is taken as the fitness value
of each individual subset.

Figure 3. List of parameters
IV. RESULT ANALYSIS
In the proposed method, a source bank dataset is taken as input
in an ARFF file; ARFF is the Attribute-Relation File Format.
All attributes of the dataset are then encoded, a number of
attributes are selected at random, and the classification
accuracy is computed with the selected attributes.

After k-fold validation, the subset of selected attributes is
shown at the end of the output screen. Here k is a number
supplied by the user. The classification ratio shows the
performance of the program.
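The encoding and random selection steps described above can be sketched as a small bit-string genetic algorithm. This is only an illustration, not the GABASS tool itself: the class name `SubsetGaSketch`, its parameters, and the choice of binary tournament selection, one-point crossover and bit-flip mutation are all assumptions, since the text does not specify the operators used. The fitness function is passed in by the caller; in the setting of this paper it would be the classification accuracy obtained with the selected attributes.

```java
import java.util.Random;
import java.util.function.ToDoubleFunction;

/** Illustrative bit-string GA for attribute subset selection; not the GABASS tool itself. */
final class SubsetGaSketch {

    /**
     * Searches for a high-fitness attribute mask (true = attribute selected).
     * The fitness function is supplied by the caller, e.g. classification accuracy.
     */
    static boolean[] search(int numAttributes, int populationSize, int generations,
                            double mutationRate, long seed,
                            ToDoubleFunction<boolean[]> fitness) {
        Random rng = new Random(seed);

        // Encoding: one bit per attribute, initialised at random.
        boolean[][] population = new boolean[populationSize][numAttributes];
        for (boolean[] individual : population) {
            for (int i = 0; i < numAttributes; i++) {
                individual[i] = rng.nextBoolean();
            }
        }

        boolean[] best = population[0].clone();
        double bestScore = fitness.applyAsDouble(best);

        for (int g = 0; g < generations; g++) {
            // Evaluate every individual and remember the best mask seen so far.
            double[] scores = new double[populationSize];
            for (int p = 0; p < populationSize; p++) {
                scores[p] = fitness.applyAsDouble(population[p]);
                if (scores[p] > bestScore) {
                    bestScore = scores[p];
                    best = population[p].clone();
                }
            }
            if (g == generations - 1) {
                break; // the last generation is only evaluated, not bred
            }
            // Breed the next generation (assumed operators, not taken from the paper).
            boolean[][] next = new boolean[populationSize][];
            for (int p = 0; p < populationSize; p++) {
                boolean[] a = tournament(population, scores, rng);
                boolean[] b = tournament(population, scores, rng);
                int cut = rng.nextInt(numAttributes); // one-point crossover position
                boolean[] child = new boolean[numAttributes];
                for (int i = 0; i < numAttributes; i++) {
                    child[i] = i < cut ? a[i] : b[i];
                    if (rng.nextDouble() < mutationRate) {
                        child[i] = !child[i]; // bit-flip mutation
                    }
                }
                next[p] = child;
            }
            population = next;
        }
        return best;
    }

    /** Binary tournament: pick two individuals at random and keep the fitter one. */
    private static boolean[] tournament(boolean[][] population, double[] scores, Random rng) {
        int i = rng.nextInt(population.length);
        int j = rng.nextInt(population.length);
        return scores[i] >= scores[j] ? population[i] : population[j];
    }
}
```

A call such as `SubsetGaSketch.search(numAttributes, 20, 15, 0.05, 42L, fitness)` returns the best mask evaluated during the run; the population size, generation count and mutation rate shown here are placeholder values, not the settings used in the experiments.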
A complete description of the implemented tool is given here,
along with a description of the results obtained.
A tool was designed in Java to select the subset of
features automatically based on GABASS. The tool has a GUI, as
shown in figure 6.1, with three command buttons:
a. File
b. Preprocess
c. Classify
Clicking the File button opens a browser from which an ARFF
format dataset file is chosen and taken as input. An ARFF
(Attribute-Relation File Format) file is an ASCII text file that
describes a list of instances sharing a set of attributes. One
such list of features can be seen in figure 2. Clicking a
particular attribute shows its values and the number of
instances of each.

Figure 2. Attribute list

Clicking the Classify button shows a number of input boxes, some
check boxes and two buttons, as in figure 3. The input boxes
take user-defined values for the different parameters. All
check boxes are optional and are used at the user's discretion.

Figure 4. Selected subset of attributes

In this work, five different methods are used for feature
selection. Forward Selection Multi cross Validation, Bootstrap
backward elimination, Relief, MIFS and the proposed GABASS
method were implemented, and five different feature subsets were
obtained. Forward Selection Multi cross Validation and Bootstrap
backward elimination are wrapper-based methods; Relief and MIFS
are filter-based methods. The classification accuracy of these
methods was calculated with the SIPINA tool of the TANAGRA
software. The feature subsets selected by the five methods are
detailed in the following table. The k-fold cross-validation
method was used to measure performance.

In feature selection methodology, the first task of any learning
approach is to define a relevant set of features. Several
methods have been proposed for the feature selection problem,
including filter, wrapper and embedded methods. In this work, we
focus on feature subset selection to select a minimally sized
subset of optimal features. Feature selection is an optimization
problem; genetic algorithm based attribute subset selection
(GABASS) using the Naive Bayes classifier is used for this
purpose. GABASS is found to be the best technique for selection
when the population is very large. GABASS provides good results,
and its power lies in its good adaptation to varied and
fast-changing environments.
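The fitness evaluation at the core of this approach, Naive Bayes accuracy measured by k-fold cross-validation on a candidate attribute subset, might look roughly like the following sketch. It rests on several assumptions that are not taken from the paper: the class name `NaiveBayesFitnessSketch` and its parameters are hypothetical, all attributes are nominal and already encoded as integer value indices, probabilities use Laplace smoothing, and folds are formed by taking every k-th instance without shuffling or stratification.

```java
/** Illustrative fitness: k-fold Naive Bayes accuracy on the selected nominal attributes. */
final class NaiveBayesFitnessSketch {

    /**
     * @param rows       instances; rows[r][a] is the value index of attribute a
     * @param labels     class index of each instance, in 0..numClasses-1
     * @param numClasses number of distinct classes
     * @param numValues  number of distinct values of each attribute
     * @param mask       true where the attribute belongs to the candidate subset
     * @param k          number of folds
     * @return fraction of held-out instances classified correctly
     */
    static double accuracy(int[][] rows, int[] labels, int numClasses,
                           int[] numValues, boolean[] mask, int k) {
        int correct = 0;
        for (int fold = 0; fold < k; fold++) {
            // Train: count classes and (attribute, value, class) co-occurrences.
            int[] classCount = new int[numClasses];
            int[][][] counts = new int[mask.length][][];
            for (int a = 0; a < mask.length; a++) {
                counts[a] = new int[numValues[a]][numClasses];
            }
            int trainSize = 0;
            for (int r = 0; r < rows.length; r++) {
                if (r % k == fold) {
                    continue; // instance r is held out in this fold
                }
                trainSize++;
                classCount[labels[r]]++;
                for (int a = 0; a < mask.length; a++) {
                    if (mask[a]) {
                        counts[a][rows[r][a]][labels[r]]++;
                    }
                }
            }
            // Test: classify each held-out instance with the trained counts.
            for (int r = fold; r < rows.length; r += k) {
                int predicted = 0;
                double bestLog = Double.NEGATIVE_INFINITY;
                for (int c = 0; c < numClasses; c++) {
                    // Laplace-smoothed log prior plus log likelihoods of selected attributes.
                    double logP = Math.log((classCount[c] + 1.0) / (trainSize + numClasses));
                    for (int a = 0; a < mask.length; a++) {
                        if (mask[a]) {
                            logP += Math.log((counts[a][rows[r][a]][c] + 1.0)
                                    / (classCount[c] + numValues[a]));
                        }
                    }
                    if (logP > bestLog) {
                        bestLog = logP;
                        predicted = c;
                    }
                }
                if (predicted == labels[r]) {
                    correct++;
                }
            }
        }
        return (double) correct / rows.length;
    }
}
```

A genetic search can call `accuracy` once per candidate mask and use the returned fraction as that individual's fitness. Numeric attributes would need discretization or a different likelihood model, which this sketch leaves out.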
