Article info
Article history: Received 29 November 2022; Received in revised form 28 April 2023; Accepted 15 June 2023; Available online 19 June 2023.
Keywords: Feature selection; Multi-objective optimization; Grey wolf optimization; Guided mutation strategy; Dynamic agitation mechanism.

Abstract
Feature selection aims to choose a subset of features with minimal feature-feature correlation and maximum feature-class correlation, which can be considered as a multi-objective problem. Grey wolf optimization mimics the leadership hierarchy and group hunting mechanism of grey wolves in nature. However, it can easily fall into local optima in multi-objective optimization. To address this, a novel multi-objective binary grey wolf optimization based on a guided mutation strategy (GMS), called MOBGWO-GMS, is proposed. In the initialization phase, the population is initialized based on feature correlation, and features are selected using a uniform operator. The proposed GMS uses the Pearson correlation coefficient to provide direction for local search, improving the local exploration ability of the population. Moreover, a dynamic agitation mechanism is used for perturbation to prevent population stagnation due to the use of a single strategy. The strategy is dynamically adjusted to maintain population diversity and improve detection ability. To evaluate the classification ability of quasi-optimal subsets, a wrapper-based k-nearest neighbor classifier was employed. The effectiveness of the proposed algorithm was demonstrated through an extensive comparison with eight well-known algorithms on fourteen benchmark datasets. Experimental results showed that the proposed approach is superior in the optimal trade-off between the two fitness evaluation criteria and can easily jump out of local optima compared to other algorithms.

https://doi.org/10.1016/j.asoc.2023.110558
© 2023 Elsevier B.V. All rights reserved.
algorithm-II (NSGA-II) [22], multi-objective artificial bee colony algorithm (MOABC) [23], multi-objective particle swarm optimization (MOPSO) [24], and other algorithms have been proposed for multi-objective FS problems.

Grey wolf optimization (GWO) [25] has recently received extensive attention from scholars owing to its strong global search capability, few parameters, and easy implementation. According to previous studies, GWO has good exploration and exploitation ability to update the solutions during optimization, and it has been successfully applied to anomaly detection [26], Arabic text classification [27], disease diagnosis [28], and feature selection [29] problems. Although the algorithm performs well in many fields, researchers have continuously improved it to address its inherent defects and make it better suited to the FS problem. However, from some of the literature, we found that some random mechanisms in GWO may significantly affect the convergence and diversity of the population. In particular, the leader is selected randomly from the Pareto optimal solutions in the iterative process, which has a great influence on the late convergence of the algorithm. Unfortunately, in many multi-objective GWOs, little research has focused on this issue. It should be noted that when an algorithm converges at a later stage, each individual needs to explore near its current location, rather than being influenced by a randomly selected leader to explore a feature space far from its current location.

Most MH algorithms use random initialization strategies in the initial population stage. The disadvantage of this method is that, on the one hand, the initialized population carries little gainful information and wastes computational resources in the early iterations of the algorithm; on the other hand, especially for high-dimensional datasets, the initialized solutions are concentrated together, which cannot adequately explore the search space and expand the search range. To overcome these limitations, Wang et al. [30] introduced a population initialization method with mixed initialization and threshold selection, in which principal component analysis is utilized to sort the importance of features. Song et al. [31] proposed an effective population initialization strategy according to the correlation between features and class labels to accelerate the convergence of the population. The Pearson correlation coefficient (PCC) is both an upgrade of the Euclidean distance and an improvement of the cosine similarity in the absence of dimension values, which can avoid the problem of rating scale inflation. In short, the PCC can more accurately calculate the correlation between two sets of data.
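As a concrete illustration of this use of the PCC, the following sketch computes an absolute feature-class PCC weight for each feature. The array names `X` and `y` and the use of NumPy are assumptions for the example, not the exact routine used later in this paper.

```python
import numpy as np

def pcc_feature_weights(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Absolute Pearson correlation between each feature column of X and the class label y.

    A higher weight indicates a stronger feature-class correlation (a more relevant feature).
    """
    Xc = X - X.mean(axis=0)                                  # center the features
    yc = y - y.mean()                                        # center the labels
    cov = Xc.T @ yc / (len(y) - 1)                           # feature-class covariances
    denom = Xc.std(axis=0, ddof=1) * yc.std(ddof=1)          # product of standard deviations
    pcc = np.divide(cov, denom, out=np.zeros_like(cov), where=denom > 0)
    return np.abs(pcc)                                       # |PCC| used as a relevance weight
```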
Our paper aims to create a novel guided mutation strategy (GMS) that collaborates with GWO to jointly search for optimal solutions and prevents a single search mechanism from getting trapped in a local optimum. The GMS utilizes feature-feature and feature-class correlations to guide the population search, thereby increasing the likelihood of avoiding local optima and guiding the population towards better evolution. Unlike existing hybrid search algorithms, the proposed algorithm is not a mere combination of strategies; instead, it adapts the search strategy based on the current iterative state until an optimal solution is found. The main innovations and contributions of this paper include the following aspects.

1. A novel initialization strategy is proposed. The strategy first assigns a weight to each feature (the weight is calculated by the PCC for the feature-class correlation), then applies binary tournament selection to initialize the population; a rough sketch of this weighted tournament idea follows this list. The proposed uniform operator maintains a uniform distribution of individuals.

2. A new GMS is designed to combine with GWO to accelerate convergence. GMS is unique in that it is guided by both the correlation between features and the correlation between features and classes. For local exploration, this strategy contains a triple-mutation mechanism.

3. A dynamic agitation mechanism is proposed to detect whether the algorithm is trapped in a local optimum and dynamically change the search strategy. The archives are handled concurrently to prevent over-storage and performance impacts.
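A rough sketch of the weighted-tournament initialization idea referred to in contribution 1 is given below. The selection size, the handling of the uniform operator, and all names are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def tournament_init(weights, pop_size, n_select, rng=None):
    """Binary population initialization: each individual selects n_select features,
    each chosen by a binary tournament on the feature-class PCC weights."""
    rng = np.random.default_rng() if rng is None else rng
    n_features = len(weights)
    population = np.zeros((pop_size, n_features), dtype=int)
    for individual in population:
        while individual.sum() < n_select:
            candidates = np.flatnonzero(individual == 0)       # not-yet-selected features
            if len(candidates) == 1:
                individual[candidates[0]] = 1
                break
            i, j = rng.choice(candidates, size=2, replace=False)
            winner = i if weights[i] >= weights[j] else j      # tournament on the weights
            individual[winner] = 1
    return population
```

Compared with a purely random start, such a weighted initialization biases early individuals toward features with higher feature-class correlation.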
The remainder of this paper is organized as follows. An introduction to the related work is given in Section 2 and the background information is presented in Section 3. Section 4 characterizes the proposed method. The experimental design and results are demonstrated in Section 5. The conclusions and some perspectives for future work are provided in Section 6.

2. Related work

2.1. Single-objective meta-heuristic algorithm for feature selection

Swarm-based MHs have become increasingly popular for solving FS problems due to their excellent optimization capabilities. Some examples of such MHs are GA, PSO, HHO, ACO, ABC, WOA, and their variants. Most researchers tend to treat the FS problem as a SOP since they may only focus on a certain optimization objective. Tahir et al. [32] introduced a binary chaotic genetic algorithm (BCGA) that utilizes chaotic maps to enhance the population initialization and mutation method of the conventional GA. The BCGA replaces the random variable selection process with chaotic sequences, thus improving the quality of solutions. However, this improvement comes at the cost of greater computational complexity and increased time consumption due to additional algorithmic operations. Li et al. [33] developed a hybrid FS technique for high-dimensional datasets that combines a Q-learning prescreening approach based on the maximum information coefficient with an improved PSO method. Paniri et al. [34] proposed using the PCC to measure both the redundancy and relevance of features. Specifically, the minimum correlation between features is used to measure redundancy, while the maximum correlation between features and class labels is used to measure relevance. Karimi et al. [35] employed a differential reinforcement learning algorithm to accomplish semi-supervised classification tasks, which explores the feature space by conducting a non-linear function search. Hashemi et al. [36] created multiple filter methods for ranking features, which are then used to guide population evolution. Bayati et al. [37] presented an efficient method that combines subspace learning and a memetic algorithm to search the feature space, which demonstrates high efficacy in feature space exploration.

The GWO algorithm is a simple and comprehensible optimization algorithm that is easy to implement. Its limited parameter tuning makes it a suitable choice for FS problems [38]. Recent studies have applied GWO to FS, and Table 1 provides a summary of these applications.

2.2. Multi-objective meta-heuristic algorithm for feature selection

When selecting features, multiple conflicting objectives need to be optimized simultaneously. These objectives may involve maximizing predictive accuracy, minimizing the size of feature subsets, minimizing computational complexity, and other factors. Therefore, numerous researchers frequently regard FS as a MOP. NSGA-II [22], a genetic algorithm derived for classical MOPs, was applied to tackle the multi-objective feature subset selection problem in [46]. Amoozegar et al. [47] proposed a multi-objective FS based on PSO, which ranks solutions based on their frequency within the archive set, and individuals with higher rankings are utilized to guide the evolution of both other individuals and archives. Hu et al. [48] developed a fuzzy
Table 1
Summary of research on GWO algorithms for feature selection in recent years.

COGWO2D (2018) [39]
Basic information: Ibrahim et al. integrated GWO with differential evolution.
Merits: The COGWO2D enhances the effectiveness of local search. Moreover, this method incorporates disruption operators to preserve population diversity.
Limits: The OBL strategy and the disruption operator contribute to an increase in computational complexity.

MEGWO (2019) [40]
Basic information: Tu et al. introduced a multi-strategy integrated FS technique that incorporates several strategies.
Merits: The enhanced global-best lead strategy, adaptable cooperative strategy, and disperse foraging strategy collectively balance the trade-off between exploitation and exploration.
Limits: Each strategy introduces additional parameters.

GWOCSA (2019) [41]
Basic information: Arora et al. proposed a FS technique that utilizes both the GWO and the crow search algorithm.
Merits: The GWOCSA aims to combine the strengths of both algorithms to produce highly favorable quasi-optimal solutions.
Limits: The GWOCSA in its simple mode is prone to premature convergence.

EGWO (2019) [42]
Basic information: Zhao et al. improved the GWO algorithm by incorporating a chaotic search technique to identify the optimal feature subset.
Merits: The EGWO's effectiveness in diagnosing patients with paraquat poisoning was evaluated, and it achieved AUC, accuracy, sensitivity, and specificity values exceeding 90%.
Limits: The search capacity of GWO requires additional enhancement.

Improved BGWO (2020) [43]
Basic information: Hu et al. enhanced the updating equation of the BGWO and implemented it in addressing binary optimization problems.
Merits: The improved BGWO methods have better classification accuracy.
Limits: The focus of the experiments has been to analyze the essential role of the transfer function in GWO, and the comparison algorithms lack originality.

TMGWO (2020) [20]
Basic information: Abdel-Basset et al. developed a two-stage mutation FS technique that enhances the local search capability of GWO.
Merits: The first stage of the mutation method can reduce the number of selected features, while the following stage can enhance the classification accuracy of TMGWO by introducing supplementary information.
Limits: The TMGWO takes much time to complete the evaluation process.

IGWO (2021) [44]
Basic information: Shen et al. designed a two-stage method that combines GWO with a neural network for small samples of high-dimensional biomedical data.
Merits: The benefits of the IGWO become increasingly prominent in terms of time consumption, classification accuracy, and feature subset size as the dimensionality of the FS problem grows.
Limits: The IGWO may have a potential drawback due to its inherent complexity, which could make implementation challenging. Furthermore, selecting suitable parameters could prove to be a difficult task.

GWORS (2021) [45]
Basic information: Sathiyabhama et al. presented a new FS method that utilizes a combination of GWO and rough set theory to identify important features from mammograms.
Merits: GWORS integrates both GWO and rough set theory to improve the accuracy of the feature subset.
Limits: When dealing with unfamiliar datasets, where the feature size is unknown, the accuracy of GWORS may decrease significantly.

BIGWO (2022) [28]
Basic information: Rajammal et al. created an approach that utilizes GWO to categorize Parkinson's disease using a set of features optimized for best performance.
Merits: Applying the mutation operation of BIGWO improves exploration capacity in FS, resulting in higher accuracy when classifying Parkinson's disease.
Limits: The dataset used in the experiment is low-dimensional, and the method faces challenges when applied to other complex classification tasks.
multi-objective FS technique which utilizes a fuzzy dominance relation to assess the quality of quasi-optimal particles and defines a fuzzy crowding distance metric to select the global leader of particles. Zhang et al. [49] studied a bi-objective artificial bee colony algorithm that simultaneously considers the feature cost and classification, which utilizes a convergence-guided search and a dual external archive strategy, leading to improved retrieval of different species of bees. Wang et al. [50] developed a bi-objective FS framework which incorporates a sample selection strategy and a stepped sample utilization strategy based on K-means clustering to obtain better feature subsets with less running time. Piri et al. [51] introduced a bi-objective optimization technique based on the Harris hawk optimizer, which utilizes the crowding distance as a third criterion for selecting quasi-optimal solutions. Zhou et al. [52] introduced a two-stage particle cooperative strategy for a multi-objective FS method, which includes optimizing a distance metric as the third objective. Their approach combines randomly initialized particles and Relief-filtered particles to form an initial population and designs a reset operation.

In recent years, researchers have investigated the potential of the GWO algorithm for tackling multi-objective FS problems. Mirjalili et al. [58] introduced the first multi-objective version of GWO, and subsequent researchers have further improved their algorithms, leading to the development of MOGWO. Table 2 summarizes recent research on multi-objective FS based on GWO.

From Tables 1 and 2, although researchers have put significant effort into solving FS problems based on GWO, the algorithms mentioned still have the following limitations and challenges. In particular, the existing research is not sufficient for solving MOPs based on GWO.

1. Random initialization may result in individual distributions within discrete or aggregated groups. The quality of the initial populations cannot be guaranteed, which can limit GWO's ability to explore and exploit the search space effectively.
2. GWO may become trapped in a local optimum if it does not explore the search space adequately, resulting in suboptimal feature subsets.
3. In most of the literature, the lead wolves were randomly selected from the non-dominated solutions. The track of the lead wolves has a certain degree of randomness, which fails to speed up convergence in the later stage.
4. Many improved multi-objective GWO algorithms only prove their effectiveness in low-dimensional spaces and need further research when dealing with complex high-dimensional spaces.

3. Background

In this section, multi-objective optimization, grey wolf optimization and the Pearson correlation coefficient are introduced.
Table 2
Summary of research on MOGWO algorithms for feature selection in recent years.

MO-GWO (2015) [53]
Basic information: Emary et al. proposed a multi-objective FS method that combines GWO and mutual information.
Merits: MO-GWO can filter out redundant information early and improve search performance in later stages.
Limits: Its effectiveness was only tested on low-dimensional datasets with fewer than 18 features. It is difficult to extend MO-GWO to high-dimensional datasets.

NSGWO (2017) [54]
Basic information: Sahoo and Chandra developed two different multi-objective GWO algorithms, named MOGWO and NSGWO. MOGWO utilizes a linear scalarization method, while NSGWO is an a posteriori approach.
Merits: The NSGWO algorithm achieves a remarkable accuracy of 91.1% when used for predicting cervical cancer.
Limits: The NSGWO lacks more appropriate and comprehensive indicators to reflect the quality of its solutions, in order to increase practicality and effectiveness.

BMOGWO-S (2020) [55]
Basic information: Al-Tashi et al. proposed a binary multi-objective FS algorithm that utilizes a combination of MOGWO and the sigmoid function. The algorithm employs an artificial neural network to assess the classification performance of different feature subsets.
Merits: BMOGWO-S is a cost-effective algorithm for identifying a set of non-dominated solutions. It is computationally efficient compared to other algorithms.
Limits: There is a shortage of algorithms available for the purpose of algorithm comparison. The advantage of BMOGWO-S is primarily dependent on the performance of the original GWO. The dataset used in the experiments has a low dimensionality.

MOBGWO (2021) [56]
Basic information: Mendeley et al. proposed a multi-objective binary GWO method for FS, which is tested on a dataset for predicting energy usage in appliances.
Merits: The MOBGWO method demonstrates efficacy in predicting the energy consumption of appliances and can be applied to different types of datasets.
Limits: The execution time of the selected regressors is heavily reliant on their configuration parameters, which should be adjusted according to the dataset's characteristics.

Aided BMOGWO-S (2023) [57]
Basic information: Ukken et al. developed a multi-objective feature selection algorithm based on the GWO technique. The algorithm utilizes statistical information from the dataset to select relevant features, while also incorporating techniques to improve its runtime efficiency.
Merits: The performance of aided BMOGWO-S can be achieved through the utilization of statistical information.
Limits: When dealing with datasets containing few features, the performance of the aided BMOGWO-S may be less effective than filter-based methods, especially when considering the time it takes to run the method.
3.1. Multi-objective optimization

A multi-objective optimization problem (MOP) is a mathematical problem involving the simultaneous optimization of multiple objective functions, and the objective functions of such problems are often conflicting. The general mathematical description of a multi-objective optimization problem is as follows:

$\min F(X) = \{f_1(X), f_2(X), \ldots, f_m(X)\}^{T}, \quad \text{s.t.}\; X \in \Omega$  (1)

where $F(X)$ is the objective function, $m$ is the number of objective functions, and $X \in \Omega$ means $X$ belongs to the decision space.

$\forall i \in \{1, 2, \ldots, m\}: f_i(c) \le f_i(d) \;\;\text{and}\;\; \exists j \in \{1, 2, \ldots, m\}: f_j(c) < f_j(d)$  (2)

where $m$ is the number of objective functions, and $c$ and $d$ are two solutions of a MOP.

In a MOP, the optimal solution must be chosen by weighing two or more conflicting objectives. To clarify the advantages and disadvantages of MOP solutions, a special relationship needs to be defined to compare solutions. For example, let $c$ and $d$ be two solutions of the MOP. If Eq. (2) is met, then $c$ is better than $d$. Among all solutions to the MOP, if a solution $c$ satisfies Eq. (2) with respect to any other solution, then $c$ is called a Pareto solution. The combination of all Pareto solutions is called the Pareto solution set or Pareto front.

3.2. Grey wolf optimization

Mirjalili et al. [25] designed grey wolf optimization (GWO) to imitate the strict social dominance hierarchy and group hunting mechanism of grey wolves in nature. The grey wolves in the GWO algorithm have four levels: α wolves, β wolves, δ wolves, and ω wolves. α wolves are the leaders and are mainly responsible for establishing the rules of behavior for the entire population. β wolves and δ wolves are subordinates of the α wolves, whose primary duties are to help and tail the α wolves. ω wolves are the lowest-ranked wolves and must obey the other lead wolves. The optimization process of GWO includes grey wolf social stratification, tracking prey, encircling prey and attacking prey.

(1) Social hierarchy. The three fittest solutions are treated as the α, β, and δ wolves, respectively, and the remaining candidate solutions are considered ω wolves. During the hunting process, the ω wolves follow the α, β, and δ wolves.

(2) Encircling prey. The following equation models the prey being encircled by the grey wolves:

$\vec{x}_i^{\,t+1} = \left| \vec{x}_p^{\,t} + \vec{A} \cdot \vec{D} \right|$  (3)

where $\vec{x}_i^{\,t+1}$ is the position vector of the $i$th wolf at iteration $t+1$, $\vec{x}_p^{\,t}$ is the position vector of the prey, and $\vec{A}$, $\vec{D}$ are two coefficient vectors which affect the position of the wolves.

The vectors $\vec{A}$ and $\vec{D}$ are obtained through the following calculation:

$\vec{A} = \left| 2\vec{a} \cdot rand_1 - \vec{a} \right|$  (4)

$\vec{D} = \left| \vec{C} \cdot \vec{x}_p^{\,t} - \vec{x}^{\,t} \right|$  (5)

where the components of $\vec{a}$ decrease linearly from 2 to 0 during the iteration process, $rand_{i=1,2}$ are random vectors in $[0, 1]$, and $\vec{x}^{\,t}$ is the position vector of the $i$th wolf at iteration $t$.

The vectors $\vec{C}$ and $\vec{a}$ are calculated as follows:

$\vec{C} = 2 \cdot rand_2$  (6)

$\vec{a} = 2 - I \cdot \left( \frac{2}{MAX_{iteration}} \right)$  (7)

where $I$ is the current iteration number and $MAX_{iteration}$ is the total number of iterations.

$\vec{C}_i = 2 \cdot rand_{i=1,2,3}$  (11)

where $\vec{x}_\alpha$, $\vec{x}_\beta$ and $\vec{x}_\delta$ represent the positions of the α wolf, β wolf and δ wolf, respectively.

(4) Attacking prey and searching for prey. During this process, the vector $\vec{A}$ controls whether to attack the prey: when $|A| < 1$, the wolf is forced to attack the prey; when $|A| > 1$, the grey wolf is forced to leave the prey in the hope of finding more suitable prey.
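To make the above concrete, the sketch below implements the dominance test of Eq. (2) and one leader-guided GWO position update in the spirit of Eqs. (3)-(7) and (11). The averaging over the α-, β- and δ-guided candidates follows the standard GWO of Mirjalili et al. [25] (the corresponding Eqs. (8)-(10) are not reproduced in this excerpt), and all function and variable names are illustrative assumptions.

```python
import numpy as np

def dominates(c: np.ndarray, d: np.ndarray) -> bool:
    """Eq. (2): objective vector c Pareto-dominates d (both objectives minimized)."""
    return bool(np.all(c <= d) and np.any(c < d))

def gwo_update(positions, x_alpha, x_beta, x_delta, t, max_iter, rng=None):
    """One continuous GWO position update guided by the alpha, beta and delta wolves."""
    rng = np.random.default_rng() if rng is None else rng
    positions = np.asarray(positions, dtype=float)
    a = 2 - t * (2 / max_iter)                       # Eq. (7): a decreases linearly from 2 to 0
    new_positions = np.empty_like(positions)
    for i, x in enumerate(positions):
        guided = []
        for leader in (x_alpha, x_beta, x_delta):
            A = 2 * a * rng.random(x.shape) - a      # coefficient A (standard GWO form of Eq. (4))
            C = 2 * rng.random(x.shape)              # Eqs. (6)/(11): C = 2 * rand
            D = np.abs(C * leader - x)               # Eq. (5): distance to the leader
            guided.append(leader - A * D)            # leader-guided candidate position
        new_positions[i] = np.mean(guided, axis=0)   # average of the three candidates
    return new_positions
```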
Fig. 6. Increase only mutation strategy.
Fig. 8. Simultaneous increase and decrease mutation strategy.
Table 4
Parameter settings of nine algorithms.

Common parameters (all algorithms): upper bound ub = 1; lower bound lb = 0; population size N = 50; iteration count maxIter = 50.

Private parameters:
MOABC: the epsilon size is set to epsilon = 0.001; the weight controlling the importance of food sources in the production of new food sources is w1 = 0.8; the weight controlling the importance of information provided by an employed bee is w2 = 0.7.
MOBGWO: no other parameters.
MOBHHO: beta = 1.5 during the Levy flight.
NSGA-II: the crossover rate is set to 1; the mutation rate is set to 1/D, where D is the dimension of the feature space.
NSGA-II-SDR: the crossover rate is set to 0.1; the mutation rate is set to 1/D; the distribution index of both is set to 20.
MOPSO: cognitive factor c1 = 1 and social factor c2 = 2; the inertial weight is set to w = 0.9; the maximum velocity (Vmax) is set to (ub − lb)/2.
CMDPSOFS: the inertial weight (w) is chosen randomly from the range [0.1, 0.5], while the acceleration constants (c1 and c2) are randomly chosen from the range [1.5, 2]; the mutation rate is Mr = 1/D, where D is the size of the feature space.
MOBGWO-GMS: the threshold for directional mutations is set to thres = 0.5.
SparseEA: the crossover rate is set to 1.0; the mutation rate is set to 1/D, where D is the dimension of the feature space.
Fig. 9. The initial population distribution generated by the proposed initialization and the traditional random initialization.
individuals and increases the amount of useful information. To evaluate the efficiency of the proposed initialization method, we compare the distribution produced by the traditional random initialization method with that of the proposed method on three datasets, namely Waveform, Musk and Madelon, each with a population size of 100. As shown in Fig. 9, the proposed initialization method achieves higher accuracy on the Waveform and Madelon datasets. The traditional random initialization method tends to select a limited range of features, resulting in individuals clumped together. Our proposed method generates individuals that are widely and uniformly distributed in space, which enhances the global search ability by providing a more diverse and balanced population. Therefore, it can be concluded that the proposed initialization method can enhance the global search ability by providing more diverse and balanced populations.

GMS is developed to enhance GWO's ability to solve multi-objective feature selection problems by avoiding local optima. We consider that random factors and limited useful information can cause GWO to become trapped in local optima. To address this, GMS conducts local searches by using feature-feature and feature-class correlations. The GMS search is triggered when GWO falls into a local optimum. To evaluate GMS's effectiveness, we compare the search paths of GWO and GMS on the Madelon dataset over the last ten iterations. We randomly selected five individuals and compared the performance of the two operators under the same conditions, as shown in Fig. 10. It can be seen that the GWO operator has high exploration blindness, resulting in poor local exploration. The repeated paths taken by individuals lead to a significant waste of computing resources. In contrast, the GMS strategy constructs two shared pools that facilitate the triple-mutation of selected and unselected features based on the two types of correlations. This triple-mutation operation enables individuals to search for positions nearby in later iterations, leading to faster convergence of the population.

5.5. Results and discussions

The results were analyzed qualitatively and quantitatively. For the qualitative analysis of the results, the 20 Pareto fronts obtained for each algorithm and dataset were combined into two groups, named ''best'' and ''average''. Section 5.5.1 presents their comparison. For the quantitative analysis of the results, the quality of the solutions is comprehensively evaluated by calculating three significant indicators (C-metric, HV and IGD). In addition, we analyzed the computational complexity and runtime of all algorithms.

5.5.1. Pareto frontier analysis

The UCI Machine Learning Repository provides fourteen datasets that are categorized into three subgroups based on their dimensionality: low, medium, and high. To obtain two sets of non-dominated solutions, twenty independent experiments were conducted, and their results were combined to form the ''best'' and ''average'' sets. Each dataset was randomly split into a training and a testing set, resulting in four sets of experimental results: ''best train'', ''best test'', ''average train'', and ''average test''. Each of the four experiment groups underwent individual
Fig. 11. Obtained ‘‘best train’’ and ‘‘best test’’ Pareto fronts from the nine methods on low-dimensional datasets.
analysis. Figs. 11 to 13 show the Pareto frontiers of the proposed MOBGWO-GMS algorithm and the other algorithms on low-, medium-, and high-dimensional datasets in terms of ''best train'' and ''best test'' results. Each figure is divided into two parts, with the left showing the ''best train'' and the right showing the ''best test''. The horizontal axis denotes the number of selected features, while the vertical axis represents the classification error rate. The dataset name and group are indicated at the top of each figure, along with the original feature size and the classification error rate using the full feature set in parentheses.
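For reference, each point on these fronts corresponds to a wrapper evaluation of a binary feature mask with a k-nearest neighbor classifier, returning the two objectives plotted here (classification error rate and ratio of selected features). The sketch below illustrates such an evaluation; the use of scikit-learn, k = 5 and a single train/test split are assumptions of the example rather than the exact experimental protocol.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def evaluate_subset(mask, X_train, y_train, X_test, y_test, k=5):
    """Wrapper evaluation of a binary feature mask: (error rate, selected-feature ratio)."""
    mask = np.asarray(mask)
    selected = np.flatnonzero(mask)
    if selected.size == 0:                    # an empty subset cannot be evaluated
        return 1.0, 0.0
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train[:, selected], y_train)
    error_rate = 1.0 - knn.score(X_test[:, selected], y_test)
    return error_rate, selected.size / mask.size
```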
On the low-dimensional datasets (Vehicle, German, Ionosphere, Waveform, Sonar, and Libras), it can be seen from Fig. 11 that all algorithms can effectively reduce the number of features, and the classification error rate is also significantly reduced. In particular, MOBGWO-GMS shows superior performance with fewer features in most datasets, as depicted by the ''best train'' results on the left side of Fig. 11. For example, when 4 or 5 features are selected in the German dataset, the corresponding classification error rate is lower than in other algorithms. Similarly, the ''best test'' analysis on the right side of Fig. 11 confirms that MOBGWO-GMS obtains better non-dominated solution sets compared to the other algorithms on the Ionosphere and Waveform datasets. In other words, MOBGWO-GMS achieves a better classification error rate with fewer features.

On the medium-dimensional datasets (HillValley, Musk, Madelon, and Isolet5), Fig. 12 displays the performance of the proposed algorithm. The left and right sides of Fig. 12 represent the ''best train'' and ''best test'' results for each dataset, respectively. It can be seen that the proposed algorithm achieves the best or sub-optimal results in all cases. MOBGWO-GMS excels in reducing the number of features in the Musk, Madelon, and Isolet5 datasets, making it a favorable choice when the decision maker is cost-conscious. MOBGWO-GMS achieves high classification accuracy, especially on the Madelon dataset, where it achieves around 90%, outperforming the other algorithms. In Madelon and Isolet5, the solution sets obtained by MOBGWO-GMS dominate those of the other algorithms. MOBGWO-GMS also performs better than the other algorithms in terms of the number of features and classification accuracy in the ''best test'' results, and its solution set outperforms the other algorithms on all four datasets. Notably, the proposed
Fig. 12. Obtained ‘‘best train’’ and ‘‘best test’’ Pareto fronts from the nine methods on medium-dimensional datasets.
Fig. 13. Obtained ‘‘best train’’ and ‘‘best test’’ Pareto fronts from the nine methods on high-dimensional datasets.
initialization strategy allows our algorithm to search for solutions in lower-dimensional feature ranges in datasets such as Musk and Isolet5. Furthermore, the GMS strategy helps the algorithm find more potential non-dominated solutions. MOBGWO-GMS offers a subset of non-dominated features that can satisfy different decision-makers' needs.

When it comes to the experimental results on the high-dimensional datasets (SRBCT, DBWorld, Leukemia1 and Brain_Tumor1), their ''best train'' and ''best test'' results are shown in Fig. 13. In the ''best train'' results on the left side of Fig. 13, we can see that the Pareto solution set obtained by MOBGWO-GMS dominates the solution sets obtained by the other algorithms, which demonstrates that MOBGWO-GMS can not only reduce the dimension of features more effectively on high-dimensional datasets but also has excellent classification accuracy. In the ''best test'' results, MOBGWO-GMS obtains a similar conclusion. Although the classification error rate on the SRBCT dataset is slightly lower than that of the MOABC algorithm, MOBGWO-GMS can achieve similar classification accuracy to MOABC using fewer than 30 features. The proposed algorithm can achieve the lowest classification error rate, particularly in the SRBCT, DBWorld, Leukemia1, and Brain_Tumor1 datasets, with error rates of 14.28%, 16.67%, 16.67%, and 8.03% (using 19, 162, 46, and 88 features, respectively). Therefore, the performance of the MOBGWO-GMS method in dealing with high-dimensional feature selection problems is remarkable.

Figs. 14 to 16 depict the average Pareto fronts for the low-, medium-, and high-dimensional datasets, respectively. The left side of each graph represents the ''average train'' and the right side represents the ''average test''. On the low-dimensional datasets, as shown in Fig. 14, the average Pareto frontier obtained by MOBGWO-GMS outperforms the other algorithms on some datasets, for example, the Vehicle, German, Ionosphere and Sonar datasets in the training set, and the German, Waveform and Libras datasets in the test set. On the mid-dimensional datasets, as shown in Fig. 15, we can see that our proposed algorithm outperforms the other algorithms in Madelon (average train and test) and Musk (average test). MOBGWO-GMS is able to control the search range to a smaller number of features. For example, in the Musk dataset, the search range of the various algorithms can be clearly displayed. The main reason for this is our proposed initialization strategy. The same conclusion is especially obvious in all high-dimensional datasets, as shown in Fig. 16, which delivers the search range of the various algorithms. Meanwhile, MOBGWO-GMS is also significantly competitive in terms of classification error rate. The GMS strategy can explore more possible solutions and make the algorithm converge to the optimal Pareto front quickly.
Fig. 14. Obtained ‘‘average train’’ and ‘‘average test’’ Pareto fronts from the nine methods on low-dimensional datasets.
Fig. 15. Obtained ‘‘average train’’ and ‘‘average test’’ Pareto fronts from the nine methods on medium-dimensional datasets.
In conclusion, MOBGWO-GMS demonstrates strong competitiveness in multi-objective feature selection. The algorithm effectively reduces feature dimensionality and minimizes classification error rates. Furthermore, the algorithm holds tremendous potential in handling high-dimensional data.

5.5.2. Performance metrics

The purpose of this section is to assess the performance of the MOBGWO-GMS algorithm using three multi-objective feature selection metrics: C-metric, HV, and IGD.

The C-metric is a widely used measure for evaluating the effectiveness of solution sets by comparing the scores of two Pareto fronts. In our study, we evaluated the performance of the nine algorithms using the C-metric index on both the training and testing sets. Specifically, we calculated C(MOBGWO-GMS, X) and C(X, MOBGWO-GMS) using Eq. (24), where X represents one of the eight comparison algorithms. If C(MOBGWO-GMS, X) is greater than C(X, MOBGWO-GMS), it confirms that the proposed algorithm has a better convergence ability towards the optimal solution [47].
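A minimal sketch of this coverage comparison is given below, with C(A, B) taken as the fraction of solutions in front B dominated by at least one solution in front A (both objectives minimized). Eq. (24) itself is not reproduced in this excerpt, and the strict-dominance handling below is an assumption of the sketch.

```python
import numpy as np

def dominates(a, b):
    """Pareto dominance for minimization objective vectors."""
    return bool(np.all(a <= b) and np.any(a < b))

def c_metric(A, B):
    """C(A, B): fraction of solutions in front B dominated by at least one solution in front A."""
    A, B = np.asarray(A, dtype=float), np.asarray(B, dtype=float)
    covered = sum(any(dominates(a, b) for a in A) for b in B)
    return covered / len(B)

# C(MOBGWO-GMS, X) close to 1 together with C(X, MOBGWO-GMS) close to 0
# indicates that the proposed front covers almost all of X's front.
```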
The computed C-metric values for all datasets in the training and testing sets are displayed in Tables 5 and 6. The first eight columns of each table represent C(MOBGWO-GMS, X), while the second eight columns show C(X, MOBGWO-GMS). Upon analyzing the results, it can be observed that:

(1) In the training set, all algorithms (X) exhibit much higher values of C(MOBGWO-GMS, X) compared to C(X, MOBGWO-GMS). For instance, in the Musk dataset, MOBGWO-GMS dominates 99% of the solutions obtained by MOABC (C(MOBGWO-GMS, MOABC) = 0.99), whereas none of the solutions in MOBGWO-GMS are dominated by MOABC (C(MOABC, MOBGWO-GMS) = 0). Notably, the fact that the mean (Avg) and standard deviation (Std) of C(X, MOBGWO-GMS) are both zero implies that none of the
Fig. 16. Obtained ‘‘average train’’ and ‘‘average test’’ Pareto fronts from the nine methods on high-dimensional datasets.
Table 5
Average and standard deviation of C-metric values between MOBGWO-GMS and the eight other methods in training data.
Training set C(MOBGWO-GMS,X) as X: C(X,MOBGWO-GMS) as X:
MOABC MOBGWO MOBHHO NSGA-II NSGAII-SDR MOPSO CMDPSOFS SparseEA MOABC MOBGWO MOBHHO NSGA-II NSGAII-SDR MOPSO CMDPSOFS SparseEA
Mean 0.9208 0.7138 0.9395 0.6605 0.2333 0.7259 0.7083 0.8196 0.0393 0.2094 0.0207 0.2295 0.1553 0.2261 0.2600 0.1442
Vehicle
Std 0.1610 0.1816 0.0845 0.1801 0.1567 0.1857 0.1923 0.0807 0.0834 0.1067 0.0426 0.1457 0.0587 0.1460 0.1873 0.0552
Mean 0.8531 0.8244 0.9839 0.8175 0.4167 0.8323 0.8788 0.5945 0.0616 0.1517 0 0.1064 0.2155 0.0999 0.1084 0.3244
German
Std 0.1475 0.1118 0.0394 0.2353 0.1431 0.1876 0.1270 0.1581 0.1055 0.0664 0 0.1445 0.1186 0.1728 0.1068 0.1393
Mean 1 0.7592 1 0.6933 0.2208 0.9117 0.9667 0.7117 0 0.1408 0 0.2092 0.3108 0.0300 0.0083 0.3025
Ionosphere
Std 0 0.2224 0 0.2559 0.1716 0.1515 0.0872 0.1477 0 0.1826 0 0.1870 0.1374 0.1342 0.0373 0.1509
Mean 1 0.9571 1 0.7214 0.3875 0.8291 0.8734 0.8751 0 0.0277 0 0.1926 0.1035 0.0851 0.0346 0.0987
Waveform
Std 0 0.0706 0 0.1910 0.1879 0.1746 0.1293 0.0486 0 0.0474 0 0.1706 0.0414 0.0905 0.0468 0.0356
Mean 0.9275 0.8065 1 0.7508 0.4458 0.8441 0.9168 0.7067 0 0.0975 0 0.1923 0.1380 0.0715 0.0171 0.2639
Sonar
Std 0.1426 0.2097 0 0.2080 0.2392 0.1740 0.1741 0.1493 0 0.1731 0 0.2262 0.0528 0.0709 0.0536 0.0932
Mean 1 0.7631 0.9521 0.7627 0.6208 0.9068 0.9363 0.7983 0 0.1108 0 0.1783 0.1006 0.0619 0.0036 0.2027
Libras
Std 0 0.1298 0.1302 0.2599 0.1130 0.0868 0.1623 0.1175 0 0.0645 0 0.2143 0.0170 0.0534 0.0160 0.0932
Mean 1 0.8312 0.9909 0.5992 0.6042 0.9750 0.8723 0.3417 0 0.0350 0 0.1646 0.2163 0 0.1346 0.5800
HillValley
Std 0 0.2610 0.0407 0.2267 0.0892 0.1118 0.1210 0.2403 0 0.1182 0 0.2478 0.0692 0 0.1156 0.2129
Mean 0.9900 0.9181 0.9944 0.5880 0.7092 0.8800 0.9209 0.8371 0 0.0111 0 0.0744 0.0841 0.0081 0.0205 0.1443
Musk
Std 0.0447 0.1766 0.0248 0.2243 0.1028 0.2225 0.1149 0.0525 0 0.0275 0 0.1356 0.0304 0.0252 0.0375 0.0563
Mean 1 0.8329 1 1 1 1 1 0.6308 0 0.1648 0 0 0 0 0 0.2182
Madelon
Std 0 0.0490 0 0 0 0 0 0.2137 0 0.0196 0 0 0 0 0 0.1044
Mean 0.7579 0.5652 0.9547 0.1157 1 0.3102 0.6907 0.7302 0 0.0782 0 0.0227 0 0.0092 0 0.1493
Isolet5
Std 0.2508 0.3104 0.1019 0.0193 0 0.2751 0.3048 0.1765 0 0.0797 0 0.0430 0 0.0283 0 0.1366
Mean 1 0.9958 1 1 0.9064 1 1 0.8882 0 0 0 0 0.0700 0 0 0.1164
SRBCT
Std 0 0.0186 0 0 0.0366 0 0 0.0594 0 0 0 0 0.0270 0 0 0.0972
Mean 1 0.9917 1 1 0.8977 1 0.9833 0.4458 0 0.0031 0 0 0.0833 0 0.0031 0.5069
DBWorld
Std 0 0.0373 0 0 0.0148 0 0.0745 0.1583 0 0.0140 0 0 0.0227 0 0.0140 0.2482
Mean 0.8822 1 1 1 0.9102 0.8997 0.9950 0.4317 0.0946 0 0 0 0.0875 0.0599 0.0083 0.6189
Leukemia1
Std 0.0226 0 0 0 0.0315 0.0845 0.0224 0.1547 0.0274 0 0 0 0.0324 0.0509 0.0373 0.2211
Mean 0.8713 1 1 1 0.9175 0.9234 0.9524 0.6431 0.1288 0 0 0 0.1004 0.0650 0 0.3554
Brain_Tumor1
Std 0.0467 0 0 0 0.0518 0.0824 0.1286 0.2152 0.0536 0 0 0 0.0763 0.0779 0 0.1873
solutions in X dominates any of the solutions in MOBGWO-GMS. This trend is evident in the Ionosphere, Waveform, Libras, Isolet5, Leukemia1, and Brain_Tumor1 datasets for MOABC, MOBHHO, NSGA-II, and CMDPSOFS.

(2) In the testing set, we observed that the value of C(MOBGWO-GMS, X) is significantly higher than the corresponding value of C(X, MOBGWO-GMS). Moreover, C(X, MOBGWO-GMS) is even closer to zero. These results suggest that MOBGWO-GMS has strong convergence capabilities, allowing it to converge towards optimal solutions.

HV is another popular metric used for assessing the quality of a Pareto front. Its purpose is to calculate the hypervolume of the space formed by the Pareto front and a chosen reference point. In this study, the reference point chosen was (1.1, 1.1) due to its ability to measure both the convergence and diversity of the set of solutions. A higher HV value indicates a higher quality of the solution set. Each experiment involved calculating the hypervolume HV using the optimal frontier ''best'' as the Pareto frontier. The experiment was divided into two groups: the training set and the testing set. The average of 20 independent algorithm runs was computed. The statistical significance of the HV results was assessed using a paired t-test with a significance level of 0.05, as illustrated in Tables 7 and 8.
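A minimal sketch of the 2-D hypervolume computation with the reference point (1.1, 1.1) used here (both objectives minimized); the sweep-based implementation and function name are assumptions, not the authors' code.

```python
import numpy as np

def hypervolume_2d(front, ref=(1.1, 1.1)):
    """Hypervolume (area) dominated by a 2-D Pareto front, bounded by the reference point."""
    pts = np.asarray(front, dtype=float)
    pts = pts[np.argsort(pts[:, 0])]          # sort by the first objective
    hv, best_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 < best_f2:                      # dominated points contribute no extra area
            hv += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return hv
```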
The results of the t-test comparison are indicated in the tables with the symbols ''+'', ''−'', and ''=''; a small sketch of this comparison follows the list below.

1. ''+'' indicates that the performance of MOBGWO-GMS is significantly better than that of the corresponding algorithm;
2. ''−'' indicates that the performance of MOBGWO-GMS is inferior to that of the corresponding algorithm;
3. ''='' indicates that there is no significant difference in the performance of the two algorithms.
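A sketch of the paired t-test used to assign these symbols for the HV values (SciPy's ttest_rel; the per-run HV arrays and the symbol mapping for a higher-is-better metric are assumptions of the example).

```python
from scipy.stats import ttest_rel

def hv_symbol(hv_proposed, hv_baseline, alpha=0.05):
    """Paired t-test over the per-run HV values of two algorithms (HV: higher is better)."""
    stat, p_value = ttest_rel(hv_proposed, hv_baseline)
    if p_value >= alpha:
        return "="                          # no significant difference
    return "+" if stat > 0 else "-"         # '+' when the proposed algorithm is significantly higher
```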
Table 7 demonstrates that the proposed MOBGWO-GMS algorithm significantly outperforms the other eight algorithms in terms of the HV metric. Except for the HillValley dataset, the MOBGWO-GMS algorithm achieves the highest HV values. The MOBGWO-GMS algorithm exhibits superior performance in the HV metric compared to the other algorithms, which is attributed to its ability to maintain a diverse population. From Table 8, except for the Vehicle, German, Ionosphere, and Leukemia1 datasets, MOBGWO-GMS achieves the largest HV value on the remaining datasets. Our algorithm ranks among the top three in terms of HV value on the three low-dimensional datasets (Vehicle, German, Ionosphere), although it may not be optimal there. On the Leukemia1 dataset, the HV value of our algorithm is 0.0847 lower than that of the SparseEA algorithm.
Table 6
Average and standard deviation of C-metric values between MOBGWO-GMS and the eight other methods in testing data.
Testing set C(MOBGWO-GMS,X) as X: C(X,MOBGWO-GMS) as X:
MOABC MOBGWO MOBHHO NSGA-II NSGAII-SDR MOPSO CMDPSOFS SparseEA MOABC MOBGWO MOBHHO NSGA-II NSGAII-SDR MOPSO CMDPSOFS SparseEA
Mean 0.9083 0.4461 0.8436 0.5658 0.2000 0.2685 0.4475 0.2494 0.0468 0.4685 0.0907 0.2333 0.2416 0.6493 0.4858 0.6495
Vehicle
Std 0.1891 0.2737 0.2279 0.2485 0.1675 0.2197 0.2771 0.2431 0.0974 0.2836 0.1634 0.1399 0.1067 0.2488 0.3159 0.2592
Mean 0.8792 0.1250 0.7750 0.3088 0.2292 0.4193 0.2761 0.7149 0.0403 0.7644 0.1278 0.3990 0.3369 0.4573 0.7682 0.2317
German
Std 0.2067 0.1804 0.2194 0.2797 0.2292 0.3297 0.2447 0.1277 0.1001 0.1855 0.1703 0.1973 0.1734 0.2618 0.2711 0.0949
Mean 0.8417 0.6708 1 0.8017 0.1833 0.8431 0.7803 0.0767 0 0.2950 0 0.1642 0.3683 0.1450 0.0167 0.8400
Ionosphere
Std 0.2386 0.3004 0 0.2555 0.2016 0.1375 0.2869 0.1466 0 0.3212 0 0.2294 0.1728 0.1175 0.0513 0.1887
Mean 1 0.6357 1 0.4169 0.2417 0.8799 0.8705 0.8170 0 0.3145 0 0.5326 0.1344 0.0524 0.0867 0.1446
Waveform
Std 0 0.1809 0 0.2127 0.1597 0.1651 0.1069 0.0727 0 0.2049 0 0.2341 0.0484 0.0772 0.0717 0.0575
Mean 0.9833 0.6808 1 0.7387 0.4083 0.6863 0.6894 0.5770 0 0.2454 0 0.1395 0.2862 0.2302 0.0225 0.3120
Sonar
Std 0.0745 0.2667 0 0.2135 0.2564 0.1701 0.2404 0.2326 0 0.2168 0 0.2353 0.1430 0.0706 0.0697 0.1328
Mean 1 0.5326 0.8776 0.8570 0.6458 0.7537 0.9661 0.4885 0 0.2017 0 0.0929 0.1412 0.1278 0.0125 0.4969
Libras
Std 0 0.2599 0.2199 0.2323 0.0850 0.1573 0.0863 0.2612 0 0.0866 0 0.1654 0.0385 0.0416 0.0385 0.2426
Mean 0.7476 0.5517 0.9524 1 0.6042 0.9373 0.6182 0.4250 0 0.1000 0 0 0.3625 0.1208 0.0792 0.4750
HillValley
Std 0.3082 0.2972 0.1591 0 0.0892 0.0800 0.3578 0.2260 0 0.1446 0 0 0.1022 0.1564 0.1517 0.2479
Mean 1 0.7514 1 0.9333 0.7433 0.9317 0.8975 0.6599 0 0.1498 0 0.0071 0.1498 0.0430 0.1084 0.3039
Musk
Std 0 0.2073 0 0.1502 0.0517 0.0899 0.0695 0.1846 0 0.0785 0 0.0319 0.0633 0.0732 0.0868 0.1566
Mean 1 0.8678 1 0.8073 0.8367 0.9813 1 0.4208 0 0.0797 0 0 0.1642 0.0100 0 0.2845
Madelon
Std 0 0.1955 0 0.2767 0.0353 0.0612 0 0.3261 0 0.0955 0 0 0.0393 0.0447 0 0.1593
Mean 0.9620 0.4698 0.7764 0.3183 0.8532 0.9065 0.5388 0.7910 0 0.0933 0.0087 0.0145 0.0347 0 0.0069 0.1242
Isolet5
Std 0.0801 0.3220 0.2244 0.2823 0.0614 0.1922 0.3260 0.1208 0 0.0890 0.0315 0.0350 0.0087 0 0.0308 0.0810
Mean 0.9150 0.6820 1 0.9625 0.9242 0.8858 0.9917 0.5333 0 0.2580 0 0 0.2580 0.0867 0 0.5852
SRBCT
Std 0.2014 0.2021 0 0.1223 0.0096 0.1341 0.0373 0.1160 0 0.1172 0 0 0.1172 0.1145 0 0.1889
Mean 1 0.8364 0.9633 0.9417 0.9804 0.8165 0.7983 0.5250 0 0.2433 0 0 0.3367 0.1517 0.1100 0.4450
DBWorld
Std 0 0.1361 0.1134 0.1816 0.0000 0.2009 0.2202 0.1118 0 0.1680 0 0 0.1066 0.1849 0.1632 0.1965
Mean 0.7695 1 1 1 0.9804 0.6650 0.8074 0.5000 0.3808 0 0 0 0.3808 0.3808 0.3517 0.5942
Leukemia1
Std 0.0788 0 0 0 0.0000 0.1516 0.0841 0 0.1066 0 0 0 0.1066 0.1066 0.1572 0.1736
Mean 0.3000 0.9000 0.9500 0.9000 0.8302 0.5583 0.5750 0.7000 0.4225 0 0 0 0.4225 0.4225 0.3308 0.1942
Brain_Tumor1
Std 0.2513 0.2052 0.1539 0.2052 0.2419 0.2181 0.2260 0.2513 0.1126 0 0 0 0.1126 0.1126 0.2001 0.2865
Table 7
Average, standard deviation and t-test of HV index values of the nine methods on training data.
HV (Training set) MOABC MOBGWO MOBHHO NSGA-II NSGAII-SDR MOPSO CMDPSOFS SparseEA MOBGWO-GMS
Mean 0.6375 0.7183 0.7215 0.7246 0.5809 0.7172 0.7317 0.7085 0.7410
Vehicle + + + + + + + +
Std 0.0234 0.0050 0.0097 0.0118 0.0051 0.0213 0.0070 0.0066 0.0046
Mean 0.6949 0.7462 0.7394 0.7561 0.7272 0.7442 0.7518 0.7535 0.7727
German + + + + + + + +
Std 0.0332 0.0054 0.0184 0.0066 0.0107 0.0262 0.0177 0.0069 0.0049
Mean 0.7944 0.9124 0.8207 0.9078 0.8541 0.8436 0.8740 0.8995 0.9215
Ionosphere + + + + + + + +
Std 0.0228 0.0122 0.0191 0.0107 0.0236 0.0472 0.0225 0.0080 0.0059
Mean 0.6766 0.7925 0.7288 0.8122 0.6491 0.7271 0.7534 0.8022 0.8313
Waveform + + + + + + + +
Std 0.0249 0.0107 0.0113 0.0192 0.0612 0.0382 0.0218 0.0058 0.0031
Mean 0.7018 0.8643 0.7565 0.8884 0.7335 0.7775 0.7799 0.8745 0.9086
Sonar + + + + + + + +
Std 0.0179 0.0149 0.0171 0.0143 0.0442 0.0274 0.0322 0.0075 0.0120
Mean 0.5667 0.7523 0.6758 0.7915 0.5567 0.6550 0.6526 0.7959 0.8071
Libras + + + + + + + +
Std 0.0144 0.0118 0.0164 0.0088 0.1095 0.0255 0.0188 0.0087 0.0165
Mean 0.4811 0.6349 0.5527 0.6749 0.5941 0.5433 0.5323 0.6649 0.6716
HillValley + + + = + + + +
Std 0.0132 0.0105 0.0124 0.0099 0.0193 0.0210 0.0138 0.0054 0.0090
Mean 0.6368 0.8325 0.7309 0.8683 0.7927 0.7081 0.6852 0.8764 0.9062
Musk + + + + + + + +
Std 0.0126 0.0136 0.0126 0.0131 0.0207 0.0288 0.0145 0.0072 0.0075
Mean 0.4509 0.6357 0.5336 0.6693 0.5211 0.4881 0.4776 0.6946 0.8825
Madelon + + + + + + + +
Std 0.0085 0.0202 0.0069 0.0165 0.0118 0.0142 0.0082 0.0142 0.0266
Mean 0.5690 0.7790 0.6734 0.7588 0.6466 0.6060 0.5895 0.8532 0.8555
Isolet5 + + + + + + + =
Std 0.0083 0.0065 0.0102 0.0092 0.0120 0.0165 0.0072 0.0069 0.0092
Mean 0.5741 0.8602 0.7423 0.6979 0.5979 0.5811 0.6037 0.9592 0.9856
SRBCT + + + + + + + +
Std 0.0063 0.0119 0.0133 0.0116 0.0154 0.0142 0.0059 0.0080 0.0029
Mean 0.4474 0.7443 0.5567 0.5218 0.5361 0.4741 0.4225 0.9041 0.9525
DBWorld + + + + + + + +
Std 0.0118 0.0364 0.0126 0.0599 0.0424 0.0195 0.0037 0.0136 0.0196
Mean 0.8107 0.5466 0.7157 0.6307 0.5642 0.5495 0.5296 0.9486 0.9758
Leukemia1 + + + + + + + +
Std 0.0082 0.0036 0.0094 0.0068 0.0099 0.0063 0.0059 0.0168 0.0109
Mean 0.7692 0.5138 0.6638 0.5857 0.5496 0.5209 0.5423 0.8982 0.8992
Brain_Tumor1 + + + + + + + =
Std 0.0062 0.0035 0.0084 0.0062 0.0065 0.0060 0.0041 0.0055 0.0177
Table 8
Average, standard deviation and t-test of HV index values of the nine methods on testing data.
HV (Testing set) MOABC MOBGWO MOBHHO NSGA-II NSGAII-SDR MOPSO CMDPSOFS SparseEA MOBGWO-GMS
Mean 0.6438 0.7144 0.7058 0.6782 0.6023 0.7676 0.7485 0.7386 0.7344
Vehicle + + + + + − − =
Std 0.0231 0.0166 0.0118 0.0263 0.0096 0.0306 0.0121 0.0064 0.0238
Mean 0.7218 0.8071 0.7757 0.7785 0.7499 0.7917 0.8154 0.7573 0.8042
German + = + + + + − +
Std 0.0250 0.0067 0.0170 0.0228 0.0106 0.0205 0.0090 0.0103 0.0135
Mean 0.7938 0.9088 0.7996 0.9066 0.8452 0.8276 0.8659 0.9447 0.9215
Ionosphere + + + + + + + −
Std 0.0235 0.0172 0.0246 0.0169 0.0499 0.0312 0.0281 0.0109 0.0163
Mean 0.6696 0.8191 0.7220 0.8288 0.6632 0.7538 0.7585 0.8185 0.8379
Waveform + + + + + + + +
Std 0.0217 0.0085 0.0111 0.0147 0.0650 0.0279 0.0168 0.0071 0.0059
Mean 0.6967 0.8848 0.7199 0.8994 0.7869 0.8056 0.8195 0.8602 0.9190
Sonar + + + = + + + +
Std 0.0254 0.0210 0.0182 0.0402 0.0659 0.0289 0.0289 0.0123 0.0236
Mean 0.4102 0.6116 0.5157 0.5961 0.5065 0.5061 0.4680 0.6340 0.6430
Libras + + + + + + + =
Std 0.0098 0.0215 0.0148 0.0284 0.0603 0.0156 0.0186 0.0098 0.0472
Mean 0.4846 0.6578 0.5325 0.5930 0.5562 0.5033 0.5560 0.6413 0.6757
HillValley + + + + + + + +
Std 0.0122 0.0162 0.0148 0.0176 0.0173 0.0190 0.0132 0.0053 0.0298
Mean 0.5699 0.8152 0.6637 0.8397 0.6979 0.6761 0.6334 0.8654 0.8954
Musk + + + + + + + +
Std 0.0114 0.0195 0.0121 0.0142 0.0409 0.0237 0.0155 0.0080 0.0204
Mean 0.4623 0.6538 0.5490 0.6508 0.5013 0.4944 0.4783 0.6824 0.7722
Madelon + + + + + + + +
Std 0.0071 0.0242 0.0076 0.0142 0.0187 0.0143 0.0124 0.0065 0.0363
Mean 0.5211 0.7395 0.6426 0.7092 0.5905 0.5540 0.5624 0.7968 0.8032
Isolet5 + + + + + + + =
Std 0.0051 0.0096 0.0086 0.0126 0.0176 0.0139 0.0080 0.0084 0.0152
Mean 0.5369 0.7150 0.4732 0.6148 0.3817 0.4540 0.4570 0.8757 0.8872
SRBCT + + + + + + + =
Std 0.0216 0.0212 0.0158 0.0188 0.0219 0.0217 0.0129 0.0001 0.0720
Mean 0.4486 0.7718 0.6620 0.6210 0.4749 0.5731 0.5858 0.9172 0.8775
DBWorld + + + + + + + =
Std 0.0267 0.0154 0.0218 0.0035 0.0045 0.0300 0.0151 0.0000 0.2129
Mean 0.5923 0.3526 0.5352 0.4946 0.4720 0.4587 0.4383 0.9585 0.8738
Leukemia1 + + + + + + + −
Std 0.0017 0.0015 0.0132 0.0092 0.0051 0.0299 0.0102 0.0000 0.0576
Mean 0.7396 0.4790 0.5968 0.5611 0.5422 0.4882 0.4945 0.7933 0.8221
Brain_Tumor1 + + + + + + + +
Std 0.0024 0.0017 0.0060 0.0025 0.0129 0.0066 0.0039 0.0000 0.0601
Table 9
Average, standard deviation and t-test of IGD index values of the nine methods on training data.
IGD (Training set) MOABC MOBGWO MOBHHO NSGA-II NSGAII-SDR MOPSO CMDPSOFS SparseEA MOBGWO-GMS
Mean 0.1353 0.0996 0.1112 0.0970 0.2716 0.0991 0.0989 0.1143 0.0856
Vehicle + + + + + + + +
Std 0.0153 0.0088 0.0069 0.0076 0.0059 0.0105 0.0064 0.0072 0.0044
Mean 0.1132 0.1295 0.1149 0.1234 0.1820 0.0994 0.1069 0.1137 0.1010
German + + + + + = = +
Std 0.0114 0.0096 0.0037 0.0098 0.0147 0.0079 0.0075 0.0141 0.0113
Mean 0.1348 0.0551 0.1274 0.0608 0.1236 0.1022 0.0829 0.0737 0.0470
Ionosphere + + + + + + + +
Std 0.0242 0.0109 0.0163 0.0077 0.0288 0.0386 0.0150 0.0124 0.0050
Mean 0.1240 0.0836 0.1062 0.0526 0.2804 0.0960 0.0748 0.0727 0.0464
Waveform + + + + + + + +
Std 0.0187 0.0080 0.0092 0.0077 0.0636 0.0215 0.0101 0.0070 0.0071
Mean 0.2306 0.0920 0.1960 0.0846 0.2620 0.1595 0.1658 0.1183 0.0711
Sonar + + + + + + + +
Std 0.0191 0.0101 0.0132 0.0119 0.0557 0.0227 0.0269 0.0100 0.0097
Mean 0.3043 0.0829 0.1858 0.0392 0.2431 0.1930 0.2112 0.0520 0.0319
Libras + + + = + + + +
Std 0.0190 0.0133 0.0199 0.0170 0.1230 0.0301 0.0242 0.0052 0.0078
Mean 0.3309 0.1015 0.2235 0.0660 0.1402 0.2302 0.2556 0.0788 0.0671
HillValley + + + = + + + +
Std 0.0240 0.0119 0.0213 0.0089 0.0174 0.0308 0.0214 0.0108 0.0115
Mean 0.2799 0.0825 0.1728 0.0563 0.1094 0.1953 0.2244 0.0732 0.0494
Musk + + + + + + + +
Std 0.0176 0.0067 0.0150 0.0076 0.0165 0.0303 0.0191 0.0079 0.0088
Mean 0.4648 0.2090 0.3586 0.2000 0.3362 0.4140 0.4333 0.1763 0.0317
Madelon + + + + + + + +
Std 0.0104 0.0209 0.0120 0.0192 0.0131 0.0207 0.0122 0.0240 0.0112
Mean 0.3742 0.1287 0.2305 0.2034 0.2349 0.3448 0.3493 0.0408 0.0364
Isolet5 + + + + + + + +
Std 0.0090 0.0141 0.0123 0.0106 0.0093 0.0209 0.0079 0.0031 0.0049
Mean 0.4650 0.1439 0.2664 0.3192 0.4026 0.4497 0.4454 0.0466 0.0111
SRBCT + + + + + + + +
Std 0.0037 0.0053 0.0137 0.0079 0.0106 0.0144 0.0049 0.0116 0.0042
Mean 0.5521 0.2154 0.4348 0.4597 0.4521 0.5190 0.5695 0.0857 0.0302
DBWorld + + + + + + + +
Std 0.0125 0.0376 0.0215 0.0567 0.0342 0.0189 0.0041 0.0155 0.0196
Mean 0.1681 0.4929 0.2844 0.3817 0.4421 0.4852 0.4924 0.0353 0.0187
Leukemia1 + + + + + + + +
Std 0.0062 0.0045 0.0137 0.0049 0.0071 0.0083 0.0054 0.0188 0.0077
Mean 0.1593 0.4876 0.2789 0.3846 0.4191 0.4766 0.4645 0.0424 0.0291
Brain_Tumor1 + + + + + + + +
Std 0.0065 0.0030 0.0110 0.0038 0.0066 0.0096 0.0040 0.0111 0.0166
Table 10
Average, standard deviation and t-test of IGD index values of the nine methods on testing data.
IGD (Testing set) MOABC MOBGWO MOBHHO NSGA-II NSGAII-SDR MOPSO CMDPSOFS SparseEA MOBGWO-GMS
Mean 0.1312 0.1024 0.1307 0.1394 0.2470 0.0527 0.0774 0.0753 0.0870
Vehicle + + + + + − = −
Std 0.0119 0.0152 0.0139 0.0327 0.0109 0.0197 0.0098 0.0121 0.0217
Mean 0.0986 0.0903 0.0738 0.1151 0.1651 0.0618 0.0525 0.1155 0.0737
German + + = + + = − +
Std 0.0100 0.0199 0.0081 0.0460 0.0175 0.0150 0.0158 0.0167 0.0229
Mean 0.1559 0.0593 0.1698 0.0648 0.1348 0.1342 0.1060 0.0306 0.0509
Ionosphere + = + + + + + −
Std 0.0267 0.0170 0.0256 0.0146 0.0604 0.0227 0.0303 0.0097 0.0165
Mean 0.1309 0.0691 0.1092 0.0543 0.2700 0.0793 0.0764 0.0601 0.0385
Waveform + + + + + + + +
Std 0.0185 0.0124 0.0088 0.0110 0.0662 0.0167 0.0086 0.0085 0.0095
Mean 0.2409 0.0763 0.2437 0.0774 0.1985 0.1322 0.1377 0.1215 0.0629
Sonar + + + = + + + +
Std 0.0315 0.0142 0.0191 0.0425 0.0756 0.0263 0.0273 0.0157 0.0201
Mean 0.3969 0.1692 0.2963 0.1943 0.2864 0.2797 0.3223 0.1732 0.1551
Libras + = + + + + + =
Std 0.0169 0.0197 0.0205 0.0281 0.0745 0.0195 0.0260 0.0125 0.0440
Mean 0.3438 0.0836 0.2552 0.1554 0.1828 0.2680 0.2500 0.0987 0.0584
HillValley + + + + + + + +
Std 0.0196 0.0180 0.0231 0.0213 0.0184 0.0265 0.0205 0.0071 0.0352
Mean 0.3263 0.0945 0.2374 0.0813 0.2076 0.2176 0.2648 0.0804 0.0614
Musk + + + + + + + +
Std 0.0172 0.0131 0.0156 0.0099 0.0430 0.0255 0.0209 0.0090 0.0191
Mean 0.4641 0.1944 0.3537 0.2249 0.3580 0.4130 0.4427 0.1845 0.0932
Madelon + + + + + + + +
Std 0.0082 0.0206 0.0102 0.0138 0.0209 0.0196 0.0117 0.0073 0.0318
Mean 0.3817 0.1370 0.2416 0.1861 0.2577 0.3370 0.3491 0.0845 0.0727
Isolet5 + + + + + + + +
Std 0.0055 0.0065 0.0067 0.0094 0.0132 0.0157 0.0095 0.0073 0.0116
Mean 0.4821 0.2320 0.5399 0.3398 0.6152 0.5344 0.5403 0.0933 0.0892
SRBCT + + + + + + + =
Std 0.0122 0.0166 0.0288 0.0136 0.0302 0.0218 0.0127 0.0000 0.0802
Mean 0.5532 0.1883 0.3511 0.3577 0.4986 0.4605 0.4467 0.0684 0.0644
DBWorld + + + + + + + =
Std 0.0288 0.0186 0.0303 0.0051 0.0059 0.0193 0.0051 0.0000 0.0611
Mean 0.3908 0.6729 0.4815 0.4840 0.5119 0.5494 0.5695 0.0111 0.1137
Leukemia1 + + + + + + + −
Std 0.0028 0.0069 0.0174 0.0107 0.0068 0.0285 0.0124 0.0000 0.0697
Mean 0.1664 0.4964 0.3167 0.3792 0.4253 0.4890 0.4753 0.1507 0.1171
Brain_Tumor1 + + + + + + + +
Std 0.0028 0.0024 0.0083 0.0035 0.0090 0.0132 0.0051 0.0000 0.0706
IGD is the third metric used to measure the quality of the Pareto solution sets in our experiments. Tables 9 and 10 present the IGD analysis of the nine algorithms on the fourteen datasets, reporting the mean and standard deviation obtained from 20 independent runs; a t-test was also performed on the results of the experiments. Based on the findings from Tables 9 and 10, we can observe that:

(1) In the training set, MOBGWO-GMS achieves the lowest IGD value on all datasets except for the German dataset. Although MOPSO had the lowest IGD value on the German dataset, a statistical analysis using the t-test showed no significant difference in performance between MOPSO and our algorithm.

(2) In the testing set, our algorithm achieves a slightly higher IGD value than other algorithms on low-dimensional datasets such as Vehicle, German, and Ionosphere. However, on the other datasets, our algorithm still obtained the smallest IGD value compared to the other algorithms.
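A minimal IGD sketch using Euclidean distances (the reference front here would typically be the combined non-dominated set across all runs; the variable names are assumptions of the example).

```python
import numpy as np

def igd(reference_front, obtained_front):
    """Inverted Generational Distance: mean distance from each reference point to its
    nearest solution in the obtained front (lower is better)."""
    ref = np.asarray(reference_front, dtype=float)
    obt = np.asarray(obtained_front, dtype=float)
    dists = np.linalg.norm(ref[:, None, :] - obt[None, :, :], axis=-1)   # pairwise distances
    return dists.min(axis=1).mean()
```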
In summary, the above metrics demonstrate that the proposed algorithm performs exceptionally well on most datasets. The experiments reveal that the algorithm outperforms other methods by having a lower classification error rate while using the same number of features. Moreover, the proposed approach is superior in achieving an optimal balance between the two fitness evaluation criteria, resulting in a better Pareto front.

5.5.3. Sensitivity analysis of the parameter

In MOBGWO-GMS, a threshold called thres is utilized to guide the leader wolf's selection from the Pareto solution set. To determine an appropriate range for the thres value, experiments were conducted, and the results are presented in Fig. 17. The figure shows the correlation between thres and the performance of our proposed method, as measured by the HV metric. We found that the Waveform dataset obtains the largest HV at thres = 0.6, the Isolet5 dataset at thres = 0.5, and the DBWorld dataset at thres = 0.4, as shown in Fig. 17. Based on our experimental data, we find that the optimal value of thres is influenced by the dimension of the dataset, but is typically around 0.5. Therefore, we propose to set thres to 0.5 for the experimental comparisons. Additionally, the implementation of the triple-mutation strategy in GMS introduces only one additional parameter (thres), which significantly improves the portability of our algorithm.

5.5.4. Computational complexity and runtime analysis

Algorithm 1 gives the pseudocode of the MOBGWO-GMS algorithm. The dimension of the dataset is M, the population size is N, and the maximum number of iterations is maxIter. According to Algorithm 1, the time complexity analysis is
Fig. 17. HV with respect to the value of thres for the Waveform, Isolet5, and DBWorld datasets. The vertical dashed lines indicate the best thres values.
Table 11
Time complexity analysis of nine algorithms.

Algorithm      Overall complexity of the algorithm
MOABC          O(maxIter ∗ (N ∗ D + M ∗ N²))
MOBGWO         O(maxIter ∗ (N ∗ D + M ∗ N²))
MOBHHO         O(maxIter ∗ (N ∗ D + M ∗ N²))
NSGA-II        O(maxIter ∗ M ∗ N²)
NSGAII-SDR     O(maxIter ∗ M ∗ N³)
MOPSO          O(maxIter ∗ (N ∗ D + M ∗ N²))
CMDPSOFS       O(maxIter ∗ (N ∗ D + M ∗ N²))
SparseEA       O(maxIter ∗ (N ∗ D + M ∗ N²))
MOBGWO-GMS     O(maxIter ∗ (N ∗ D + M ∗ N²))

Complexity of the KNN classifier (common to all algorithms): O(N ∗ S ∗ D ∗ log(S) ∗ K).
Note: M: number of objective functions; N: population size; D: dimension of the problem; S: number of samples; K: K-fold cross validation; maxIter: maximum number of iterations.
Table 12
Average running CPU time of nine algorithms.
Dataset MOABC MOBGWO MOBHHO NSGA-II NSGAII-SDR MOPSO CMDPSOFS SparseEA MOBGWO-GMS
Vehicle 2.9134 1.3587 2.5663 1.6712 1.5866 1.4847 2.5141 1.5991 2.2895
German 2.8879 1.3301 2.5354 11.2481 1.6365 1.4763 2.4255 1.8137 2.4178
Ionosphere 2.5157 1.3009 2.2905 2.0833 1.5665 1.3625 2.1286 1.6711 2.0467
Waveform 4.6683 2.7515 4.7444 5.8407 3.0626 2.6086 4.3276 3.2728 4.4544
Sonar 2.4614 1.2727 2.2803 2.6101 1.3982 1.2521 2.0300 1.6081 2.4126
Libras 2.6164 1.3450 2.4004 2.6169 1.5304 1.3317 2.2436 1.9175 2.6421
HillValley 2.6541 1.3486 2.4454 2.6623 1.6408 1.3102 2.0519 1.6016 2.1065
Musk 2.6732 1.2698 2.4881 2.5252 1.6869 1.3780 2.0815 1.8743 1.9568
Madelon 8.5921 2.2138 9.2071 6.8140 4.2155 4.4407 6.7070 3.4174 3.3393
Isolet5 7.6602 2.1940 8.1016 6.5277 4.1845 3.9585 6.3248 3.8327 2.8599
SRBCT 2.8537 1.4805 2.7919 2.6473 1.8586 1.5690 2.2725 5.7360 2.0007
DBWorld 3.1721 1.4280 3.0847 2.7836 1.7878 1.6315 2.4028 9.1485 1.8639
Leukemia1 4.1380 1.5612 3.6187 3.1399 2.2868 2.0617 3.2094 11.3248 2.0625
Brain_Tumor1 4.5473 1.6424 4.0188 3.8111 2.6046 2.2864 3.4998 12.1604 2.1380
5.5.4. Computational complexity and runtime analysis

Algorithm 1 gives the pseudocode of the MOBGWO-GMS algorithm. The dimension of the dataset is M, the number of populations is N, and the maximum number of iterations is maxIter. According to Algorithm 1, the time complexity analysis is described as follows. The complexity of the algorithm is mainly divided into two stages: stage 1 is concentrated in steps 2 and 3, and stage 2 is concentrated in the while loop of steps 6–28. In the first stage, the complexity of the new initialization strategy proposed in step 2 is O(M ∗ N); the complexity of the standard fitness evaluation and non-dominated sorting in step 3 is O(M ∗ N + M ∗ N²). In the second stage, the complexity mainly comes from the GMS strategy (steps 17–23) and step 26. In the worst case of GMS, that is, when the size of GoodPop is N, the complexity is as high as O(maxIter ∗ M ∗ N²), which is much higher than the complexity O(maxIter ∗ M ∗ N) of steps 9–15 (the BGWO strategy); the complexity of step 26 is O(maxIter ∗ (M ∗ N + M ∗ N²)). It can be seen that the complexity of the second stage is much higher than that of the first stage. Therefore, the total time complexity of the MOBGWO-GMS algorithm is O(maxIter ∗ M ∗ N²). Table 11 displays the time complexities of the compared algorithms.
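Collecting the per-stage costs quoted above into one expression may help the reader see how the stated bound arises. The grouping below simply restates the authors' analysis in the notation of this subsection (M the dataset dimension, N the population size, maxIter the iteration budget), with the KNN evaluation cost of Table 11 left out:

```latex
\[
\begin{aligned}
T_{\text{stage 1}} &= \underbrace{O(MN)}_{\text{initialization (step 2)}}
  \;+\; \underbrace{O(MN + MN^{2})}_{\text{evaluation and non-dominated sorting (step 3)}},\\
T_{\text{stage 2}} &= \text{maxIter}\cdot\Big[
    \underbrace{O(MN)}_{\text{BGWO update (steps 9--15)}}
  + \underbrace{O(MN^{2})}_{\text{GMS with } |\text{GoodPop}| = N \text{ (steps 17--23)}}
  + \underbrace{O(MN + MN^{2})}_{\text{step 26}}\Big],\\
T_{\text{total}} &= T_{\text{stage 1}} + T_{\text{stage 2}} \;=\; O(\text{maxIter}\cdot MN^{2}).
\end{aligned}
\]
```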
For the sake of simplicity, we compare the CPU runtime of the nine algorithms in a single iteration of the main loop. Table 12 shows the average running CPU time of each algorithm when independently run 20 times on the fourteen datasets. As shown in Table 12, the runtime of the MOBGWO-GMS algorithm is significantly higher than that of the MOPSO and MOBGWO algorithms on each dataset. The main reason is that the triple-mutation strategies of MOBGWO-GMS consume some time during the local search. However, compared with the other algorithms, MOBGWO-GMS remains competitive: its runtime is much lower than that of the NSGA-II, MOABC, and MOBHHO algorithms on the German and Sonar datasets. Although the GMS calculation requires a certain amount of time to execute, incorporating the GMS does not increase the computational complexity as expressed in big-O notation.

6. Conclusions and future works

In this paper, our goal is to achieve a Pareto front with a low classification error and a reduced number of features while preserving population diversity. MOBGWO-GMS uses the Pearson correlation coefficient to guide the mutation strategy, employs a uniform population initialization method, and applies three different increase and decrease mutation strategies to search the search space efficiently. To verify the competitiveness of MOBGWO-GMS, eight well-known algorithms and fourteen benchmark datasets were used for comparison.
The experimental results show that, in most cases, the MOBGWO-GMS method has a significant competitive advantage in terms of both dimensionality reduction and classification error. Especially on medium- and large-scale datasets, which are more common in the domains of machine learning and data mining, it performs excellently and can find a set of optimal feature subsets with good convergence and diversity.
It should be noted that the algorithm's performance is not exceptional on some low-dimensional datasets. This may be because lower-dimensional datasets have fewer optimal solutions, and other algorithms may also escape local optima. Our next focus is to optimize the efficiency of the algorithm by minimizing its complexity. Furthermore, we intend to apply the improved algorithm to the classification of large-scale imbalanced datasets.
CRediT authorship contribution statement

Xiaobo Li: Supervision, Resources, Project administration, Validation, Funding acquisition, Writing – review & editing. Qiyong Fu: Conceptualization, Methodology, Software, Investigation, Writing – original draft. Qi Li: Formal analysis, Data curation, Funding acquisition. Weiping Ding: Formal analysis, Data curation, Software. Feilong Lin: Formal analysis, Data curation, Software. Zhonglong Zheng: Formal analysis, Data curation.
Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Data availability

No data was used for the research described in the article.
Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant Nos. 61373057, 62273310, 62272419; the Key Project of Zhejiang Provincial Natural Science Foundation of China under Grant No. LZ22F020010; and the Zhejiang Provincial Natural Science Foundation of China under Grant No. LY22F030006.