Neural Processing Letters 8: 181–192, 1998.
© 1998 Kluwer Academic Publishers. Printed in the Netherlands.
The Softmap Algorithm
STEVEN RAEKELBOOM and MARC M. VAN HULLE
Laboratorium voor Neuro- en Psychofysiologie, K.U. Leuven, Campus Gasthuisberg, Herestraat 49,
B-3000 Leuven, Belgium. E-mail: marc@neuro.kuleuven.ac.be
Abstract. A new unsupervised competitive learning rule is introduced, called the Self-organizing
free-topology map (Softmap) algorithm, for nonparametric density estimation. The receptive fields
of the formal neurons are overlapping, radially-symmetric kernels, the radii of which are adapted to
the local input density together with the weight vectors which define the kernel centers. A fuzzy code
membership function is introduced in order to encompass, in a novel way, the presence of overlapping
receptive fields in the competitive learning scheme. Furthermore, a computationally simple heuristic
is introduced for determining the overall degree of smoothness of the resulting density estimate.
Finally, the density estimation performance is compared to that of the variable kernel method, VBAR
and Kohonen’s SOM algorithm.
Key words: nonparametric density estimation, vector quantization, unsupervised competitive learning, variable kernel method
1. Introduction
The Self-Organizing (feature) Map (SOM) algorithm [1, 2] was originally conceived for nonparametric regression analysis, whereby the converged topographic
map was intended to capture the principal dimensions of the input space, but it
has also been regarded as a discrete, nonparametric model of the input probability
density ([2], p. 152). However, even though unsupervised competitive learning
(UCL) algorithms such as the SOM were not intended to model the fine-structure of
the density distribution, there is a more fundamental problem with them: Ritter and
Schulten [3] have shown that the converged weight vectors undersample high probability regions and oversample low probability regions in the density distribution.
In other words, the neuron weights do not perform an equiprobabilistic quantization of the input space. Instead, these learning rules are aimed at minimizing
the average distortion due to quantization, usually the mean squared error (MSE).
Adding a ‘conscience’ to frequently winning neurons to feel ‘guilty’ and to reduce
their winning rate [4], rather than adding a neighborhood function, as done for the
SOM algorithm, may be a good heuristic to achieve an equiprobabilistic quantization; however, it is no more than that since, also for this case, equiprobabilistic
quantization is in general not equivalent to MSE minimization [5].
A more direct way to density estimation is to optimize an information–theoretic
criterion, instead of a distortion criterion, when adapting the neuron weights. Following this lead, the Vectorial Boundary Adaptation Rule (VBAR) [6, 7, 8] was
introduced for developing lattices with regular and fixed topologies, just like the
SOM algorithm, but for which a formal guarantee exists that an equiprobabilistic
weight distribution is achieved at convergence. A drawback of this rule is that the
lattice dimension should match that of the input space in which it is developed.
Furthermore, the rule is far too complicated to be practical for building quantizers
in higher-dimensional spaces.
In the present article, we will introduce a new unsupervised competitive learning rule, called the Self-organizing free-topology map (Softmap) algorithm, for
nonparametric modeling of stationary density distributions of arbitrary dimensionality. To each neuron a kernel is associated, the radius of which is adapted so as to
model the local input density (equiprobabilistic quantization). We will introduce
a technique for determining the overall degree of smoothing performed by the
kernels. The technique is different from, and more practical than, Parzen’s [9, 10].
2. Softmap Algorithm
Consider a network $A$ consisting of $N$ formal neurons, which quantize the input space $V \subseteq \Re^d$, and a set $\mathcal{V} = \{v^m\}$ of $M$ input samples, $M > N$, drawn from a probability density function $f(v)$, with $v \in V$. To each of the $N$ neurons corresponds a circular (or hyperspherical, in general) Receptive Field (RF) region $S_i$ centered at $\mathbf{w}_i = [w_{i1}, \ldots, w_{id}]$ and with radius $\sigma_i$ (Figure 1). We associate with each $S_i$ a code membership function

$$\mathbb{1}_i(v) = \begin{cases} 1 & \text{if } v \in S_i \\ 0 & \text{if } v \notin S_i. \end{cases} \qquad (1)$$
Depending on this function, the weights and radii are adapted in an iterative learning scheme in such a way that, at convergence, an equitable distribution of neuron weights is obtained for which, in addition, all neurons have an equal probability to be active (equiprobabilistic quantization). This will be done by using two
learning rules which form the basis of the Softmap algorithm.
The purpose of the radii updates is to achieve an equiprobabilistic quantization in $V$-space: $P_i = \frac{1}{N}$, $\forall i \in A$, with $P_i$ the probability that receptive field region $S_i$ is activated given the training set $\mathcal{V}$. Before deriving the learning rule for the radii, consider first the following reasoning. Assume that, for each neuron, we determine for all $M$ samples of $\mathcal{V}$ the Euclidean distance to the neuron's receptive field center $\mathbf{w}_i$, and rank these distances using a sorting algorithm such as Quicksort. We then determine the radius $\sigma_i^{M/N}$ of $S_i$ for which the corresponding code membership function satisfies $\lfloor \frac{M}{N} \rfloor \le \sum_{v^m \in \mathcal{V}} \mathbb{1}_i(v^m) \le \lfloor \frac{M}{N} \rfloor + 1$, where $\lfloor \cdot \rfloor$ denotes the integer part of its argument, so that we have for $P_i$: $\frac{1}{M} \lfloor \frac{M}{N} \rfloor \le P_i \le \frac{1}{M} \left( \lfloor \frac{M}{N} \rfloor + 1 \right)$.
Figure 1. Neurons i and j have circular receptive field regions Si and Sj (circles drawn with
continuous lines) centered at wi and wj (small open dots) in 2-dimensional input space V . The
fuzzy means of the input samples falling in Si and Sj are Mi and Mj , respectively (small filled
dots). The arrow indicates the update of wi given the present fuzzy mean Mi (not to scale).
The dashed circles indicate the update of the receptive field regions given their respective
distributions of input samples falling in- and outside Si and Sj (also not to scale). Note that
the updated circles should also shift since their centers are updated, but this is not shown for
clarity’s sake. The shaded area indicates the intersection between regions Si and Sj (overlap).
In an iterative learning scheme, we update the $\sigma_i$'s after each epoch, given the present $\sigma_i^{M/N}$.
An equiprobabilistic quantization is achieved by minimizing the energy function (from this point on, we shall ignore whether or not $M/N$ is an integer):

$$E_\sigma = \sum_{i=1}^{N} \left| \sigma_i - \sigma_i^{M/N} \right|, \qquad (2)$$

i.e., a positive definite function, with $\sigma_i$ the present radius of neuron $i$ and $\sigma_i^{M/N}$ the desired one. The batch update rule is then obtained by performing gradient descent on $E_\sigma$:

$$\Delta \sigma_i = -\eta \frac{\partial E_\sigma}{\partial \sigma_i} = -\eta\, \mathrm{sgn}\!\left( \sigma_i - \sigma_i^{M/N} \right), \quad \forall i \in A, \qquad (3)$$
with $\mathrm{sgn}(\cdot)$ the sign function, and $\eta$ the learning rate (a sufficiently small, positive number). Since it is cumbersome to determine $\sigma_i^{M/N}$, we consider the following analogy:

$$\mathrm{sgn}\!\left( \sigma_i - \sigma_i^{M/N} \right) = \mathrm{sgn}\!\left( P_i - \frac{1}{N} \right) \qquad (4)$$
which holds except when, for $P_i = \frac{1}{N}$, $\sigma_i \neq \sigma_i^{M/N}$. This poses no problem since it still represents an equiprobabilistic quantization for neuron $i$: we only have to define the sign function in such a way that $\mathrm{sgn}(0) \doteq 0$ so that no update will be performed when this case occurs. Hence, the batch update rule for the radii becomes

$$\Delta \sigma_i = -\eta\, \mathrm{sgn}\!\left( P_i - \frac{1}{N} \right), \quad \forall i \in A. \qquad (5)$$
The receptive field centers wi should be updated in such a way that an equitable distribution of them is obtained at convergence. This can be achieved by
reducing the overlap between the RF regions. To this end, we define a fuzzy code
membership function $\Xi_i$ [11]:

$$\Xi_i(v) = \frac{\mathbb{1}_i(v)}{\sum_{k \in A} \mathbb{1}_k(v)}, \quad \forall i \in A, \qquad (6)$$

and use this function to obtain the fuzzy mean of the inputs that activate $S_i$:

$$M_i = \frac{\sum_{v^m \in \mathcal{V}} v^m\, \Xi_i(v^m)}{\sum_{v^m \in \mathcal{V}} \Xi_i(v^m)}, \qquad (7)$$
if the denominator is non-zero, else wi is left unchanged. This formulation enables
us to compile the energy function for the weights:
$$E_w = \sum_{i=1}^{N} \left| M_i - \mathbf{w}_i \right|, \qquad (8)$$

with $\mathbf{w}_i$ the present RF center and $M_i$ the desired one, given the current distribution of RF regions. The batch update rule for the RF centers becomes

$$\Delta \mathbf{w}_i = -\eta \frac{\partial E_w}{\partial \mathbf{w}_i} = \eta\, \mathrm{Sgn}(M_i - \mathbf{w}_i), \quad \forall i \in A, \qquad (9)$$
with Sgn(.) the sign function taken componentwise. Since an input shared by
different neurons will lead to smaller weight updates, there will be a competitive
element in the learning process and the RF centers will tend to be ‘pulled’ apart by
the unbalanced weight update strengths (Figure 1). Note that, since in general the
RF regions overlap, $\Xi_i$ is a measure of the probability that neuron i belongs to the
subset of the activated neurons, rather than the probability that neuron i is active.
Both rules, Equations (5) and (9), define the Softmap algorithm. As is usually
the case with learning rates, an ‘optimal’ value is not always readily available.
However, simulations have shown that, by taking them equal, the weights and
radii will converge in a satisfactory manner. Furthermore, the training set size M
depends on the size of the network N. Again empirically, we found that by taking
M = O(10N), or higher, satisfactory quantization results are obtained.
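To make the scheme concrete, the following is a minimal NumPy sketch of one batch epoch of Softmap, combining the code membership function (1), the radius update (5), the fuzzy membership (6), the fuzzy means (7), and the weight update (9). The function and variable names are ours, and details such as the learning-rate schedule are omitted.

```python
import numpy as np

def softmap_epoch(V, W, sigma, eta):
    """One batch epoch of Softmap (a sketch of Equations (1) and (5)-(9)).

    V     : (M, d) array of input samples
    W     : (N, d) array of receptive-field (RF) centers w_i
    sigma : (N,)   array of RF radii sigma_i
    eta   : common learning rate for the weights and the radii
    """
    M, d = V.shape
    N = W.shape[0]

    # Code membership, Eq. (1): 1 if sample v^m falls inside RF region S_i.
    dist = np.linalg.norm(V[:, None, :] - W[None, :, :], axis=2)    # (M, N)
    member = (dist <= sigma[None, :]).astype(float)                 # (M, N)

    # Radius update, Eq. (5): drive the activation probability P_i towards 1/N.
    P = member.sum(axis=0) / M
    sigma = sigma - eta * np.sign(P - 1.0 / N)

    # Fuzzy code membership, Eq. (6): share each sample over all RFs it activates.
    n_active = member.sum(axis=1, keepdims=True)                    # (M, 1)
    Xi = np.divide(member, n_active, out=np.zeros_like(member), where=n_active > 0)

    # Fuzzy means, Eq. (7), and weight update, Eq. (9); centers whose RF
    # received no samples are left unchanged.
    mass = Xi.sum(axis=0)                                           # (N,)
    for i in range(N):
        if mass[i] > 0:
            M_i = (Xi[:, i:i + 1] * V).sum(axis=0) / mass[i]
            W[i] = W[i] + eta * np.sign(M_i - W[i])                 # componentwise Sgn(.)
    return W, sigma
```

Iterating this epoch function with a linearly decreasing $\eta$, as done in the simulations of Section 4, would constitute the full training loop.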
3. Nonparametric Density Estimation
Since the Softmap algorithm is aimed at producing an equiprobabilistic quantization of the input space, in terms of the RF regions Si , the volumes of the
hyperspheres with radii σi yield local density estimates at the hypersphere centers
wi . For example, for the 2-dimensional case:
$$\hat{f}(\mathbf{w}_i) = \frac{1}{N \pi \sigma_i^2}, \quad \forall i \in A. \qquad (10)$$
In order to obtain an estimate at any position v ∈ V , we use the previous estimates
fb(wi ) for adjusting the radii (bandwidths) as done in the variable kernel density
estimation technique (VK) of Breiman et al. [12], i.e., a widely-used nonparametric
density estimation technique [10]. The basic idea is to place unit-volume, radially-symmetric kernels $K$ at a discrete number of ‘observations’ $\mathbf{w}_i$ and to use the initial, local density estimates $\hat{f}(\mathbf{w}_i)$ for obtaining the bandwidths $\lambda_i$ of these kernels.
The major difference with classic VK is that we only place kernels at a restricted
number of sites instead of at every input sample, N ≪ M, whereas for VK N ≡ M
(also with respect to the formulas in this article). Note that, in order to obtain a local
density estimate for every input sample, VK relies on another density estimation
technique, such as the histogram-based estimator [10]. The kernel-based density
estimate then becomes, for the d-dimensional case
$$\hat{f}_h(v) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{(h \lambda_i)^d}\, K\!\left( \frac{v - \mathbf{w}_i}{h \lambda_i} \right), \qquad (11)$$

with $h$ a global smoothing parameter, $K$ the standard normal distribution, and

$$\lambda_i = \left( \frac{g}{\hat{f}(\mathbf{w}_i)} \right)^{\alpha}, \qquad g = \sqrt[N]{\prod_{i=1}^{N} \hat{f}(\mathbf{w}_i)}, \qquad (12)$$
with g the geometric mean, and α ∈ [0, 1] a sensitivity parameter. Several authors
[12, 13] have shown that an empirical and theoretical justification exists for taking
$\alpha = \frac{1}{d}$.
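As an illustration, here is a small NumPy sketch of Equations (10)–(12) for the two-dimensional case, turning converged Softmap centers and radii into a kernel-based density model. The helper name softmap_density_model and the default $\alpha = 1/2$ (i.e., $1/d$ for $d = 2$) are our choices.

```python
import numpy as np

def softmap_density_model(W, sigma, h, alpha=0.5):
    """Kernel-based density model built on converged Softmap centers/radii,
    a sketch of Eqs. (10)-(12) for d = 2 (alpha = 1/d)."""
    N, d = W.shape
    f_init = 1.0 / (N * np.pi * sigma**2)      # Eq. (10): local estimates at the centers
    g = np.exp(np.mean(np.log(f_init)))        # geometric mean of the initial estimates
    lam = (g / f_init)**alpha                  # Eq. (12): local bandwidth factors

    def f_hat(v):
        # Eq. (11): average of standard-normal kernels with bandwidths h*lambda_i.
        u = (v[None, :] - W) / (h * lam[:, None])
        K = np.exp(-0.5 * np.sum(u**2, axis=1)) / (2.0 * np.pi)
        return np.mean(K / (h * lam)**d)

    return f_hat, f_init
```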
Although the smoothing parameter has a large effect on the shape of the density estimate, its choice is not straightforward. Silverman [10] discusses a range
of techniques for determining h, such as the subjective graph method, the least-squares or likelihood cross-validation method, the test graph method, and so on. For the likelihood cross-validation method, computations easily get out of hand since, for each h-estimate, on the order of $M^2$ kernel evaluations have to be performed.
Perhaps Parzen’s estimation of the density roughness [9] is the most widely-used
technique. It is based on the minimization of the MSE at position v, for an optimal
overall degree of smoothing hP (v). For example, in the univariate case:
$$h_P(v) = C(K)\, N^{-\frac{1}{5}}\, |f''(v)|^{-\frac{2}{5}}\, f(v)^{\frac{1}{5}}, \qquad (13)$$
with C(K) a constant depending only on the type of kernel used. Unfortunately,
the previous equation depends on the availability of the density estimate as well
as its second derivative. The alternative we propose is a relatively simple and
computationally-efficient heuristic. It also relies on the minimization of an MSE value, and it also starts from the same premises as the ones which led to Equation (13) but, instead of minimizing the MSE by referring to the unknown exact density $f$, we select the overall degree of smoothing in such a way that the MSE between the kernel-based density estimate $\hat{f}_h$, located at the $\mathbf{w}_i$'s, and the initial density estimate $\hat{f}$ of Equation (10) (i.e., without using kernels), at the same loci, is minimized:
$$h^* = \arg\min_h \mathrm{MSE}(\hat{f}_h, \hat{f}) \doteq \arg\min_h \frac{\sum_{i=1}^{N} \left( \hat{f}_h(\mathbf{w}_i) - \hat{f}(\mathbf{w}_i) \right)^2}{N}. \qquad (14)$$
The underlying hypothesis is that
$$h^* = \arg\min_h \mathrm{MSE}(\hat{f}_h, \hat{f}) \approx h_{\mathrm{opt}} = \arg\min_h \mathrm{MSE}(\hat{f}_h, f), \qquad (15)$$
at the observation sites wi , with hopt the theoretically-optimal h-value. The viability of this hypothesis depends on the quality of the initial density estimates.
However, since in our case the initial estimates are expected to be more accurate than those used in VK (Softmap uses more samples for determining each one of them), the proposed approach seems feasible. We will test this point
experimentally in the next section.
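As a concrete illustration of Equation (14), the following sketch selects $h^*$ by a simple grid search over candidate $h$-values, reusing the softmap_density_model sketch introduced above. The function name and the grid are our choices; the simulations below use a step size of 0.00375.

```python
import numpy as np

def select_h_star(W, sigma, h_grid, alpha=0.5):
    """Heuristic of Eq. (14): choose the h minimizing the MSE between the
    kernel-based estimate f_hat_h and the initial estimate f_hat, both
    evaluated at the RF centers w_i."""
    best_h, best_mse = None, np.inf
    for h in h_grid:
        f_hat, f_init = softmap_density_model(W, sigma, h, alpha)
        mse = np.mean([(f_hat(w) - fi)**2 for w, fi in zip(W, f_init)])
        if mse < best_mse:
            best_h, best_mse = h, mse
    return best_h, best_mse

# For example: h_star, _ = select_h_star(W, sigma, np.arange(0.00375, 0.3, 0.00375))
```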
4. Simulations
As an example density function, we consider a juxtaposition of two bivariate and
truncated exponential distributions (Figures 2A and 3A). This distribution is particularly difficult due to the presence of discontinuities and the fast rising slopes.
On the other hand, for reasons of comparison with other methods, we have introduced a bounded support for the original exponential distribution by truncating and
renormalizing it. The resulting density function $f$ is, analytically,

$$f(v) = I(v_1)\, \frac{\gamma \beta \exp(-\gamma v_1 - \beta v_2)}{\gamma_1 \beta_1} + (1 - I(v_1))\, \frac{\gamma \beta \exp(-\gamma v_1 - \beta (1 - v_2))}{\gamma_2 \beta_1}, \qquad (16)$$
with $v = [v_1, v_2]$, $\gamma_1 = 2(1 - \exp(-\gamma/2))$, $\gamma_2 = 2(\exp(-\gamma/2) - \exp(-\gamma))$, $\beta_1 = 1 - \exp(-\beta)$, and $I(v_1) = 1$ if $v_1 < 0.5$, else $I(v_1) = 0$.
Figure 2. Greyscale representation of density function Equation (16) (A), and distribution of
neuron weights (crosses) of a 15 × 15 lattice trained with Kohonen’s SOM algorithm, with
vanishing and non-vanishing neighborhood range (termed SOM0 and SOM1 ) (B,C), VBAR
(D), and Softmap (E).
For $\gamma$ and $\beta$, the usual parameters of an exponential distribution along the $v_1$ and $v_2$ axes, we take 3 and 1.5, respectively.
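For completeness, here is a small sketch of how a training set might be drawn from Equation (16). The article does not specify the sampling procedure, so the rejection sampler below (with a bound of 4.0, which exceeds the density maximum of about 3.73 for $\gamma = 3$, $\beta = 1.5$) is purely our assumption.

```python
import numpy as np

def f_example(v1, v2, gamma=3.0, beta=1.5):
    """Density of Eq. (16): two juxtaposed, truncated bivariate exponentials on [0,1]^2."""
    g1 = 2.0 * (1.0 - np.exp(-gamma / 2.0))
    g2 = 2.0 * (np.exp(-gamma / 2.0) - np.exp(-gamma))
    b1 = 1.0 - np.exp(-beta)
    left = gamma * beta * np.exp(-gamma * v1 - beta * v2) / (g1 * b1)
    right = gamma * beta * np.exp(-gamma * v1 - beta * (1.0 - v2)) / (g2 * b1)
    return np.where(v1 < 0.5, left, right)

def sample_example(M, bound=4.0, seed=0):
    """Rejection sampling from Eq. (16) on the unit square (our choice of sampler)."""
    rng = np.random.default_rng(seed)
    samples = []
    while len(samples) < M:
        v = rng.uniform(size=(M, 2))
        u = rng.uniform(size=M) * bound
        accept = u < f_example(v[:, 0], v[:, 1])
        samples.extend(v[accept])
    return np.asarray(samples[:M])
```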
Before a nonparametric model of f can be obtained, given a finite set of samples
drawn from it, initial density estimates are needed. For the sake of comparison, we
first implement classic VK, thus with N ≡ M, and determine the initial density
estimates by using the histogram-based technique: (1) the unit square $[0,1]^2$ is
discretized by using a uniform grid sized 40 × 40 quadrilaterals, and (2) the proportion of M samples falling in each quadrilateral, divided by the quadrilateral’s
surface area, yields the initial density estimate for each one of the samples falling
in that quadrilateral. The initial density estimates are then used for selecting the
bandwidths of the M kernels in classic VK. The overall degree of smoothing is
determined by using our technique, Equation (14) (step size $\Delta h = 0.00375$). Since VK is computationally intensive (M kernels have to be evaluated), we increase M until the MSE performance in Equation (14) stabilizes, which occurs for M around 10,000, so that all results shown will be for M = 10,000 samples (results
labeled by VK).
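A sketch of these histogram-based initial estimates follows; the 40 × 40 grid and unit-square support are taken from the description above, while the function name is ours.

```python
import numpy as np

def histogram_initial_estimates(V, bins=40):
    """Initial density estimates for classic VK: proportion of samples per grid
    cell, divided by the cell's surface area, evaluated at every sample in [0,1]^2."""
    M = V.shape[0]
    counts, _, _ = np.histogram2d(V[:, 0], V[:, 1], bins=bins, range=[[0, 1], [0, 1]])
    cell_area = (1.0 / bins) ** 2
    ix = np.clip((V[:, 0] * bins).astype(int), 0, bins - 1)
    iy = np.clip((V[:, 1] * bins).astype(int), 0, bins - 1)
    return counts[ix, iy] / (M * cell_area)
```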
We also determine the initial density estimates using Softmap, for N ≪ M,
as well as two other network-based techniques: VBAR [6, 7, 8], and the SOM
algorithm [1, 2]. For all three techniques, we use the same training sets V consisting
of M = 45, 000 samples for training the lattices sized N = 15 × 15 neurons until
tmax = 200 epochs have elapsed. (Note that the samples used in classic VK were
also drawn from this training set.) In addition, for Softmap only, a lattice sized N = 10 × 10 neurons is also trained, now using M = 20,000 samples.
For all three network-based techniques, the neuron weights are initialized on a
uniform grid in the unit square $[0,1]^2$, since Softmap is not topology-preserving; the radii for Softmap are initialized as $\frac{1}{2\sqrt{N}}$. For VBAR and the SOM algorithm, the
learning rates are linearly decreased, starting from 0.00035 and 0.001, respectively,
until zero is reached at tmax . For Softmap, the learning rates of the weights and
the radii start at 0.025 and are also linearly decreased. Furthermore, only for the
SOM algorithm, we use a Gaussian neighborhood function $\Lambda$, expressed in lattice coordinates, and decrease its range $\sigma_\Lambda$ in the following way:

$$\sigma_\Lambda(t) = \max\!\left( \sigma_{\Lambda\min},\ \sigma_{\Lambda 0} \exp\!\left( -2\, \sigma_{\Lambda 0}\, \frac{t}{t_{\max}} \right) \right), \qquad (17)$$

with $t$ the present epoch, $t_{\max}$ the maximum number of epochs, $\sigma_{\Lambda 0}$ the range spanned by the neighborhood function at $t = 0$, with $\sigma_{\Lambda 0} = 6$, and $\sigma_{\Lambda\min}$ the minimum neighborhood range. For reasons to be explained, two implementations of the SOM are considered, namely with $\sigma_{\Lambda\min} = 0$ and $1$, termed SOM$_0$ and SOM$_1$, respectively.
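A one-line sketch of this schedule, with the epoch budget and range parameters taken from the setup above (the function name is ours):

```python
import numpy as np

def neighborhood_range(t, t_max=200, sigma0=6.0, sigma_min=1.0):
    """Neighborhood-range schedule of Eq. (17); sigma_min = 0 for SOM_0, 1 for SOM_1."""
    return max(sigma_min, sigma0 * np.exp(-2.0 * sigma0 * t / t_max))
```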
The converged neuron weights of the 15 × 15 lattices are shown in Figures 2B–
E. The initial density estimates are obtained as follows. For Softmap, we use the
radii as indicated in Equation (10). For VBAR and the SOM algorithm, no such
direct density measure is available except for the distribution of the neuron weights
themselves, but this is difficult to quantify. In order to solve this problem, we
determine the surface area of each quadrilateral in the trained lattice and locate the
corresponding density estimate at the average of all input samples falling in each
lattice quadrilateral. The final density estimates are then obtained, for all three
network-based techniques, by locating adaptive kernels at the sites of the initial
density estimates; the optimal degree of smoothing performed by these kernels is
determined as indicated in Equation (14) ($\Delta h = 0.00375$).
In order to quantify the density estimation performance of these algorithms, we consider the following error metrics (Table I). We consider the MSE between the initial density estimate $\hat{f}$ and the final one $\hat{f}_{h^*}$, i.e. $\mathrm{MSE}(\hat{f}_{h^*}, \hat{f})$ of Equation (14), and between the final density estimate and the analytical function $f$ at the same loci as for the final density estimate, $\mathrm{MSE}(\hat{f}_{h^*}, f)$. Furthermore, we also determine the MSE between the final density estimate and $f$ when we would use the optimal degree of smoothing $h_{\mathrm{opt}}$ instead of $h^*$, $\mathrm{MSE}(\hat{f}_{h_{\mathrm{opt}}}, f)$. The corresponding kernels are evaluated on a uniform $20 \times 20$ grid within the unit square $[0,1]^2$ so as to yield the nonparametric models shown in Figures 3B–E, by using the $h^*$-value which yielded the lowest $\mathrm{MSE}(\hat{f}_{h^*}, \hat{f})$ (Table I). For the sake of comparison, we also show the nonparametric model obtained with VK (Figure 3F).
From Table I we observe that VK and VBAR yield superior $\mathrm{MSE}(\hat{f}_{h^*}, f)$ results compared to Softmap and the SOM algorithm (SOM$_0$ and SOM$_1$). On the other hand, Softmap yields a smoother nonparametric model which better captures the overall shape of the density function (Figure 3E).
Figure 3. Density function Equation (16) (A), and nonparametric models obtained by allocating adaptive kernels at the weights shown in Figures 2B–E (B–E), and by using the variable
kernel method (F). See text for details.
This is even more the case when comparing the density models generated by Softmap and the SOM$_0$ implementation (Figure 3B): due to the occurrence of a series of phase transitions in SOMs with a short neighborhood range [14], the quality of the probability density estimate ensuing from the converged lattices is considerably deteriorated [7]. However, if we allow for a non-vanishing neighborhood range, these phase transitions can be avoided, and a smoother weight distribution will be obtained [15]. This is confirmed by the SOM$_1$ result shown in Figure 2C. The corresponding density model (Figure 3C) is now smoother, and the $\mathrm{MSE}(\hat{f}_{h^*}, f)$ improves (Table I), but it suffers from border effects due to the reduced ability of the weight distribution to fully cover the input space (compare Figures 2B and C).
The nonparametric density function obtained with VK is supported by many more kernels than in the case of Softmap but, on the other hand, as
Table I. Density function estimation using four different methods (rows): the variable kernel
method (VK), Kohonen’s SOM algorithm, VBAR and Softmap. The term between brackets
[.] indicates the lattice size. See text for explanation.
method            h*        MSE(f̂_h*, f̂)   MSE(f̂_h*, f)   h_opt     MSE(f̂_h_opt, f)
VK                0.01875   0.0135          0.0594         0.03000   0.0413
SOM_0 [15^2]      0.02625   0.0211          0.1635         0.03750   0.1225
SOM_1 [15^2]      0.03000   0.0059          0.1246         0.02625   0.1183
VBAR [15^2]       0.04500   0.1723          0.0688         0.03750   0.0653
Softmap [10^2]    0.11250   0.1311          0.1939         0.11250   0.1939
Softmap [15^2]    0.07125   0.1482          0.1361         0.07500   0.1356
Figure 4. Mean squared error (MSE) as a function of the overall degree of smoothing h, plotted for a 10 × 10 (thin and thick continuous lines) and a 15 × 15 lattice (thin and thick dashed lines) trained with Softmap when 200 epochs have elapsed. The thin continuous and thin dashed lines correspond to the MSE between the final density estimate $\hat{f}_h$ and the theoretical function $f$, respectively; the thick continuous and thick dashed lines correspond to the MSE between the final and the initial density estimates, $\hat{f}_h$ and $\hat{f}$, respectively.
mentioned above, the initial density estimates are expected to be less accurate than
those obtained by Softmap. When comparing the weight distributions (Figures 2B–
E), we observe that VBAR, unlike Softmap, yields initial estimates around the
discontinuity region in f , which in turn results in lower MSE values. Owing to
its heuristic nature, Softmap attracts proportionally more neuron weights in high
density areas. As a result of this, the final density estimate will not be as well
defined in the neighborhood of the discontinuities as with VBAR. This results in a higher degree of smoothing $h^*$ (Table I) and, hence, a smoother density model. We
also observe that our technique for determining the overall degree of smoothing
works well since for almost all cases h∗ is quite close to the optimal value hopt
(Table I). The MSE as a function of the overall degree of smoothing is plotted
for Softmap in Figure 4. Finally, in light of these results, we can conclude that,
despite its heuristic nature, Softmap yields an acceptable nonparametric density
model. The advantage of Softmap lies in the simplicity of its design, its computational efficiency and its ability to model stationary density functions of arbitrary
dimensionality.
Acknowledgements
M.M.V.H. is a research associate of the Fund for Scientific Research – Flanders
(Belgium) and is supported by research grants received from the Fund for Scientific
Research (G.0185.96), the National Lottery (9.0185.96), the Research Fund of the
K.U. Leuven (F/95/138), the Flemish Regional Ministry of Education (Belgium)
(GOA 95/99–06), and the European Commission (ECVnet EP8212).
References
1. T. Kohonen, “Self-organized formation of topologically correct feature maps”, Biol. Cybern., Vol. 43, pp. 59–69, 1982.
2. T. Kohonen, Self-organizing maps, Springer: Heidelberg, 1995.
3. H. Ritter and K. Schulten, “On the stationary state of Kohonen’s self-organizing sensory
mapping”, Biol. Cybern., Vol. 54, pp. 99–106, 1986.
4. S. Grossberg, “Adaptive pattern classification and universal recoding: I. Parallel development
and coding of neural feature detectors”, Biol. Cybern., Vol. 23, pp. 121–134, 1976.
5. M.M. Van Hulle and D. Martinez, “On an unsupervised learning rule for scalar quantization
following the maximum entropy principle”, Neural Computation, Vol. 5, pp. 939-953, 1993.
6. M.M. Van Hulle, “Nonparametric Density Estimation and -regression Achieved with a
Learning Rule for Equiprobabilistic Topographic Map Formation”, in Proc. IEEE NNSP96,
pp. 33–41, Seika, Kyoto, 1996.
7. M.M. Van Hulle, “Topographic Map Formation by Maximizing Unconditional Entropy: A
Plausible Strategy for ‘On-line’ Unsupervised Competitive Learning and Non-Parametric
Density Estimation”, IEEE Transactions on Neural Networks, Vol. 7(5), pp. 1299–1305, 1996.
8. M.M. Van Hulle, “Topology-preserving map formation achieved with a purely local unsupervised competitive learning rule”, Neural Networks, Vol. 10(3), pp. 431–446, 1997.
9. E. Parzen, “On estimation of a probability density function and mode”, Ann. Math. Statist.,
Vol. 33, pp. 1065–1076, 1962.
10. B.W. Silverman, Density Estimation for Statistics and Data Analysis, Chapman and Hall: London, 1992.
11. L.A. Zadeh, “Fuzzy sets”, Information and control, Vol. 8, pp. 338–353, 1965.
12. L. Breiman, W. Meisel and E. Purcell, “Variable kernel estimates of multivariate densities”,
Technometrics, Vol. 19, pp. 135–144, 1977.
13. I.S. Abramson, “On bandwidth variation of kernel estimates – a square root law”, Ann. Statist.,
Vol. 10, pp. 1217–1223, 1982.
14. R. Der and M. Herrmann, “Instabilities in Self-Organized Feature Maps with Short Neighborhood Range,” in M. Verleysen (ed.) Proc. of the European Symposium on Artificial Neural
Networks – ESANN’94, pp. 271–276, Brussels, Belgium, 1994.
15. F. Mulier and V. Cherkassky, “Self-organization as an iterative kernel smoothing process”,
Neural Computation, Vol. 7, pp. 1165–1177, 1995.