Stacking MF Networks To Combine The Outputs Provided by RBF Networks
1 Introduction
A Radial Basis Functions (RBF) network is an architecture commonly applied to solve
classification problems. This network can also be trained by gradient descent [2,3],
so, with a fully supervised training, it can be an element of an ensemble of neural
networks. Previous comparisons showed that the performance of RBF networks was better
than that of Multilayer Feedforward (MF) networks, the Simple Ensemble being the best
method to train an ensemble of RBF networks.
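As a concrete illustration, a minimal RBF classifier of this kind can be written as
follows. This is a sketch under our own assumptions (NumPy, a squared-error loss and
the class name RBFNet), not the exact implementation of [2,3]; its point is that the
centres, widths and output weights are all updated by gradient descent, which is what
makes the network usable as a member of a fully supervised ensemble.

import numpy as np

class RBFNet:
    def __init__(self, n_inputs, n_hidden, n_classes, rng=np.random):
        self.centers = rng.randn(n_hidden, n_inputs)          # Gaussian unit centres
        self.widths = np.ones(n_hidden)                       # Gaussian unit widths
        self.weights = 0.1 * rng.randn(n_hidden, n_classes)   # linear output layer

    def hidden(self, x):
        # Gaussian activations: exp(-||x - c||^2 / (2 * width^2))
        d2 = ((x - self.centers) ** 2).sum(axis=1)
        return np.exp(-d2 / (2 * self.widths ** 2))

    def forward(self, x):
        return self.hidden(x) @ self.weights                  # linear output units

    def train_step(self, x, target, lr=0.01):
        # One fully supervised step on the squared error 0.5*||y - t||^2:
        # gradients reach the output weights, the centres and the widths.
        h = self.hidden(x)
        err = h @ self.weights - target                       # dLoss/dy
        dh = self.weights @ err                               # backprop into hidden layer
        diff = x - self.centers
        self.weights -= lr * np.outer(h, err)
        self.centers -= lr * (dh * h / self.widths ** 2)[:, None] * diff
        self.widths -= lr * dh * h * (diff ** 2).sum(axis=1) / self.widths ** 3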
Among the methods of combining the outputs of an ensemble of neural networks, the two
most popular are Majority Voting and the Output Average [4]. Recently, two new
combination methods based on the idea of Stacked Generalization, called Stacked and
Stacked+, were successfully proposed in [1]. These new methods consist of training a
single MF network to combine the outputs of the networks of the ensemble.
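For concreteness, the sketch below shows the two classical combiners next to the
level-1 input construction used by Stacked; experts stands for any list of trained
networks exposing a forward method (such as the hypothetical RBFNet sketched above),
and the function names are our own illustrative choices.

import numpy as np

def output_average(experts, x):
    # Average the expert output vectors and pick the highest-scoring class.
    return np.mean([net.forward(x) for net in experts], axis=0).argmax()

def majority_vote(experts, x):
    # Each expert votes for its predicted class; ties go to the lowest index.
    votes = [int(net.forward(x).argmax()) for net in experts]
    return np.bincount(votes).argmax()

def stacked_input(experts, x):
    # Stacked [1]: the MF combiner is trained on the concatenated expert
    # outputs instead of on the original attributes.
    return np.concatenate([net.forward(x) for net in experts])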
In this paper, we apply these combiners to an ensemble of Radial Basis Functions
networks trained with the Simple Ensemble. Moreover, we increase the number of
networks used to combine the ensemble in our experiments in order to obtain a better
combiner. Finally, we compare the results obtained with these two new combiners to
the results we previously obtained with other classical combiners.
To test Stacked and Stacked+ with RBF ensembles, we have selected nine databases
from the UCI repository. This paper is organized as follows. The concepts related to
ensemble training and combination are briefly reviewed in section 2, whereas the
experimental setup, results and discussion are presented in section 3.
2 Theory
Fig. 1. A MF network (input, hidden and output layers; inputs x1, ..., xn; outputs y1, ..., yq)
Fig. 2. A RBF network (Gaussian hidden units and linear output units; inputs x1, ..., xn; outputs y1, ..., yq)
In our experiments we have increased the number of MF combination networks from one
to three and nine, as an ensemble of combiners, to test whether the system could be
improved by adding more combination networks. With this procedure we combine the
expert networks of an ensemble of RBF networks with an ensemble of MF networks.
Finally, we apply the output average in order to combine the combination networks.
Figure 3 shows the diagram of Stacked and Stacked+ used in our experiments.
Fig. 3. The combination schemes used in our experiments: Stacked (output ysg(x)) and Stacked+ (output ysg+(x)), both applied to the input vector x = {x1, x2, ..., xn}
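The whole procedure can be summarized by the following sketch. MFNet is a
hypothetical multilayer feedforward classifier with fit and forward methods;
following Fig. 3 and [1], we assume that Stacked+ appends the original attributes to
the expert outputs, while Stacked uses the expert outputs alone.

import numpy as np

def combiner_input(experts, x, plus=False):
    z = np.concatenate([net.forward(x) for net in experts])   # expert outputs
    return np.concatenate([z, x]) if plus else z              # Stacked+ adds x

def train_combiners(experts, X_train, T_train, n_combiners=3, plus=False):
    # Build the level-1 training set from the expert outputs and train an
    # ensemble of MF combination networks on it (1, 3 or 9 in our experiments).
    Z = np.array([combiner_input(experts, x, plus) for x in X_train])
    combiners = []
    for _ in range(n_combiners):
        mf = MFNet(n_inputs=Z.shape[1], n_outputs=T_train.shape[1])  # hypothetical MF net
        mf.fit(Z, T_train)
        combiners.append(mf)
    return combiners

def classify(experts, combiners, x, plus=False):
    # Output average over the ensemble of combination networks.
    z = combiner_input(experts, x, plus)
    return np.mean([mf.forward(z) for mf in combiners], axis=0).argmax()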
3 Experimental Setup
In this section we describe the experimental setup and the datasets used in our
experiments. Then, we show the main results obtained with the combination methods on
the different datasets. Moreover, we calculate two general measurements in order to
compare the methods. Finally, we discuss the results.
In our experiments we have used ensembles of 3 and 9 RBF networks previously trained
with the Simple Ensemble on nine different classification problems. Moreover, we have
trained 1, 3 and 9 MF networks with Stacked and Stacked+ in order to combine the
networks of the ensemble. In addition, we have generated 10 different random
partitions of the data into training, validation and test sets, and repeated the
whole learning process 10 times, in order to obtain a mean performance of the
ensemble and an error bar on that performance given by standard error theory.
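A minimal sketch of this evaluation protocol, assuming a hypothetical run_once(seed)
function that performs one complete train/validation/test cycle on a random partition
and returns the test accuracy:

import numpy as np

def evaluate(run_once, n_partitions=10):
    accs = np.array([run_once(seed) for seed in range(n_partitions)])
    # Mean performance and its standard error over the 10 repetitions.
    return accs.mean(), accs.std(ddof=1) / np.sqrt(len(accs))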
3.1 Databases
The datasets we have used in our experiments and their characteristics are described in
this subsection. We have applied the new combination methods, Stacked and Stacked+,
to nine classification problems from the UCI repository [13]. The datasets we have used
are the following ones:
Balance Scale Database (bala)
The aim is to determine whether a balance scale tips to the right, tips to the left,
or is balanced. This dataset contains 625 instances, 4 attributes and 3 classes.
Cylinder Bands Database (band)
Used in decision tree induction for mitigating process delays known as “cylinder bands”
in rotogravure printing. This dataset contains 512 instances, 19 attributes and 2 classes.
BUPA liver disorders (bupa)
The aim of this dataset is to try to detect liver disorders. This dataset contains 345
instances, 6 attributes and 2 classes.
Australian Credit Approval (cred)
This dataset concerns credit card applications. This dataset contains 653 instances, 15
attributes and 2 classes.
Glass Identification Database (glas)
The aim of the dataset is to determine whether the analysed glass was a type of
‘float’ glass or not, for forensic science purposes. This dataset contains 2311
instances, 34 attributes and 2 classes.
Heart Disease Databases (hear)
The aim of the dataset is to determine the presence of heart disease in the patient.
This dataset contains 297 instances, 13 attributes and 2 classes.
The Monk’s Problem 1 (mok1)
Artificial problem with binary inputs. This dataset contains 432 instances, 6 attributes
and 2 classes.
The Monk’s Problem 2 (mok2)
Artificial problem with binary inputs. This dataset contains 432 instances, 6 attributes
and 2 classes.
Congressional Voting Records Database (vote)
Classification of congresspersons as Republican or Democrat. All attributes are
boolean. This dataset contains 435 instances, 16 attributes and 2 classes.
Table 1 shows the training parameters (number of clusters, iterations, adaptation
step and the width of the Gaussian units) of the expert networks and the performance
of a single network on each database. Moreover, we have added to this table the
performance of the ensembles of 3 and 9 RBF networks previously trained with the
Simple Ensemble, in order to see if the new combination methods proposed increase the
performance of the classification systems.
Table 2 shows the training parameters we have used to train the combination networks
(hidden units, adaptation step, momentum rate and number of iterations) with the two
new combiners, Stacked and Stacked+.
3.2 Results
The main results obtained with the application of the stacking methods are presented
in this subsection. Tables 3 and 4 show the results obtained when combining ensembles
of 3 and 9 networks with Stacked and Stacked+.
In [14] the complete results of the combination of RBF ensembles with 14 different
combination methods are published. Although we have omitted these results to keep
the length of the paper short, the general measurements related to these combination
methods appear in subsection 3.3. These methods are: Majority Vote (vote), Winner
Takes All (wta), Borda Count (borda), Bayesian Combination (bayesian), Weighted
Average (w.ave), Choquet Integral (choquet), Choquet Integral with Data-Dependent
Densities (choquet.dd), Weighted Average with Data-Dependent Densities (w.ave.dd),
BADD Defuzzification Strategy (badd), Zimmermann’s Compensatory Operator (zimm),
Dynamically Averaged Networks versions 1 and 2 (dan and dan2) and Nash Vote (nash).
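As an illustration of one of these classical combiners, the sketch below implements a
generic Borda Count: each network ranks the classes by its output scores, the ranks
are summed, and the class with the highest total wins. This is the textbook form of
the method, not necessarily the exact variant evaluated in [14].

import numpy as np

def borda_count(experts, x):
    n_classes = len(experts[0].forward(x))
    totals = np.zeros(n_classes)
    for net in experts:
        order = np.argsort(net.forward(x))   # class indices, worst score first
        ranks = np.empty(n_classes)
        ranks[order] = np.arange(n_classes)  # rank 0 = worst, n_classes-1 = best
        totals += ranks
    return totals.argmax()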
3.3 General Measurements
We have also calculated the percentage of error reduction (PER) of the results with
respect to a single network in order to obtain a general value for the comparison
among all the methods we have studied. We have used equation 1 to calculate the PER
value.
$$ PER = 100 \cdot \frac{Error_{single\,network} - Error_{ensemble}}{Error_{single\,network}} \qquad (1) $$
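As a hypothetical worked example, if a single network has an error of 20% and the
ensemble an error of 15%, equation 1 gives PER = 100 · (20 − 15)/20 = 25%; a negative
PER would indicate that the ensemble performs worse than the single network.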
Furthermore, we have calculated the mean increase of performance (IoP) with respect
to a single network and the mean percentage of error reduction (PER) across all
databases, for the methods proposed in this paper, Stacked and Stacked+, and for the
combination methods that appear in [14]. The PER is calculated by equation 1 whereas
the IoP with respect to a single network is calculated by equation 2. Table 5 shows
the results of the mean PER and the mean IoP.
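Assuming the definition used in the authors’ related work, equation 2 computes the
IoP as the difference between the single-network error and the ensemble error:

$$ IoP = Error_{single\,network} - Error_{ensemble} \qquad (2) $$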
3.4 Discussion
The main results (tables 3-4) show that Stacked and Stacked+ achieve an improvement
on a wide majority of problems: bupa, cred, glas, hear, mok1 and vote.
The results also show that the improvement in performance from training an ensemble
of nine combination networks, Stacked9/Stacked9+, instead of three,
Stacked3/Stacked3+, is low. Taking into account the computational cost, the best
alternative might be an ensemble of three combination networks, Stacked3/Stacked3+.
Comparing the results of the different traditional combination methods with those of
Stacked and Stacked+, we can see that there is an improvement from the use of these
new methods. For example, on the databases band and bala the results with the methods
based on Stacked Generalization are quite good. The largest difference between the
simple average and any other method is around 4.0% on the problem bala and around
1.5% on the problem band.
Comparing the general measurements, the mean PER and the mean IoP, we can see that
Stacked and Stacked+ are the best alternatives for combining an ensemble of RBF
networks. Stacked+ with 3 combination networks is the best way to combine ensembles
of 3 and 9 RBF networks according to the values of the general measurements.
4 Conclusions
In this paper, we have presented experimental results obtained by using Stacked and
Stacked+, two new methods based on Stacked Generalization, to combine the outputs of
an ensemble of RBF networks on nine different databases.
We have trained ensembles of 3 and 9 combination networks (MF) to combine a
previously trained ensemble of expert networks (RBF). The results show that, in
general, there is a reasonable improvement from the use of Stacked and Stacked+ on a
wide majority of databases.
In addition, we have calculated the mean percentage of error reduction over all
databases. According to its values, the new combination methods, Stacked and
Stacked+, are the best methods to combine ensembles of RBF networks.
Finally, taking into account the computational cost and the values of the general
measurements, we can conclude that training 3 combination networks, as an ensemble of
MF networks, should be considered the best alternative when combining ensembles of
RBF networks.
Acknowledgments
This research was supported by project number P1·1B2004-03, entitled ‘Desarrollo de
métodos de diseño de conjuntos de redes neuronales’ (‘Development of design methods
for neural network ensembles’), of Universitat Jaume I - Bancaja in Castellón de la
Plana, Spain.
References
1. Torres-Sospedra, J., Hernández-Espinosa, C., Fernández-Redondo, M.: Combining MF
networks: A comparison among statistical methods and stacked generalization. In:
Schwenker, F., Marinai, S. (eds.) ANNPR 2006. LNCS (LNAI), vol. 4087. Springer,
Heidelberg (2006)
2. Hernández-Espinosa, C., Fernández-Redondo, M., Torres-Sospedra, J.: First
experiments on ensembles of radial basis functions. In: Roli, F., Kittler, J.,
Windeatt, T. (eds.) MCS 2004. LNCS, vol. 3077, pp. 253–262. Springer, Heidelberg
(2004)
3. Torres-Sospedra, J., Hernández-Espinosa, C., Fernández-Redondo, M.: An
experimental study on training radial basis functions by gradient descent. In:
Schwenker, F., Marinai, S. (eds.) ANNPR 2006. LNCS (LNAI), vol. 4087. Springer,
Heidelberg (2006)
4. Drucker, H., Cortes, C., Jackel, L.D., LeCun, Y., Vapnik, V.: Boosting and other ensemble
methods. Neural Computation 6, 1289–1301 (1994)
5. Karayiannis, N.B.: Reformulated radial basis neural networks trained by gradient descent.
IEEE Transactions on Neural Networks 10, 657–671 (1999)
6. Karayiannis, N.B., Randolph-Gips, M.M.: On the construction and training of
reformulated radial basis function neural networks. IEEE Transactions on Neural
Networks 14, 835–846 (2003)
7. Bishop, C.M.: Neural Networks for Pattern Recognition. Oxford University Press, Inc., New
York, NY, USA (1995)
8. Kuncheva, L.I.: Combining Pattern Classifiers: Methods and Algorithms. Wiley-Interscience,
Chichester (2004)
9. Wolpert, D.H.: Stacked generalization. Neural Networks 5, 241–259 (1992)
10. Ghorbani, A.A., Owrangh, K.: Stacked generalization in neural networks: Generalization
on statistically neutral problems. In: IJCNN 2001. Proceedings of the International Joint
conference on Neural Networks, Washington DC, USA, pp. 1715–1720. IEEE Computer
Society Press, Los Alamitos (2001)
11. Ting, K.M., Witten, I.H.: Stacked generalizations: When does it work? In: International Joint
Conference on Artificial Intelligence proceedings, vol. 2, pp. 866–873 (1997)
12. Ting, K.M., Witten, I.H.: Issues in stacked generalization. Journal of Artificial Intelligence
Research 10, 271–289 (1999)
13. Newman, D.J., Hettich, S., Blake, C.L., Merz, C.J.: UCI repository of machine
learning databases (1998), http://www.ics.uci.edu/~mlearn/MLRepository.html
14. Torres-Sospedra, J., Hernández-Espinosa, C., Fernández-Redondo, M.: A comparison
of combination methods for ensembles of RBF networks. In: IJCNN 2005. Proceedings of
the International Joint Conference on Neural Networks, Montreal, Canada, vol. 2,
pp. 1137–1141 (2005)