
FuGeNeSys—A Fuzzy Genetic Neural System for Fuzzy Modeling

Marco Russo

Abstract—The author has developed a novel approach to fuzzy modeling from input–output data. Using the basic techniques of soft computing, the method allows supervised approximation of multi-input multi-output (MIMO) systems. Typically, a small number of rules is produced. The learning capacity of FuGeNeSys is considerable, as is shown by the numerous applications developed. The paper gives a significant example of how the fuzzy models developed are generally better than those found in the literature in terms of simplicity and of both approximation and classification capabilities.
Index Terms— Fuzzy neural networks, genetic algorithms,
learning systems.

I. INTRODUCTION

THE term soft computing (SC) was recently coined by Zadeh [1]. In the paper in which he uses this new term, he introduces the concept of machine intelligence quotient (MIQ) and states that the current trend in research is toward producing machines with an increasingly higher MIQ.

An increase in MIQ is, in fact, being achieved in two totally separate directions.

On the one hand, there are machines based on classical logic and, thus, on precision, which are evolving toward faster and faster machines with an increasing degree of parallelism.

On the other hand, new forms of logic are being developed such as fuzzy logic (FL), neural networks (NN's), and probabilistic reasoning (PR) [comprising genetic algorithms (GA's)], the strength of which (by contrast) lies in their capacity to deal with uncertainty and imprecision.

In the former case, solutions are found for problems that are very precise and, consequently, have a high computational cost. This is why Zadeh speaks of hard computing (HC).

In the latter case, on the other hand, he speaks of SC as imprecise or uncertain solutions can be found at a much lower cost in terms of calculation effort.

There are a number of cases in which excessive precision is quite useless so a nontraditional approach is preferable. In some problems, this is the only way because the computational complexity of a classical approach would be prohibitive.

Fig. 1 shows, at least as far as FL, NN's, and GA's are concerned, how the various components of SC can be approximately ordered on a time scale and on a scale relating to their learning capacity.

Fig. 1. Soft computing.

The time scale shown in Fig. 1 is ordered according to the learning time. FL, as conceived of by Zadeh, is not capable of learning anything. NN's and GA's (on the other hand) have this capacity although it can be said that on average pure GA's generally need a longer learning time. From another viewpoint, the order is inverted. GA's, in fact, need no a priori knowledge, NN's need very little and FL at times needs quite detailed knowledge of the problem to be solved.

In effect, each of these three areas of SC has its advantages and disadvantages. FL does not share the inherent concept of NN's, i.e., automatic learning. So it is impossible to use when experts are not available. It does, however, have a great advantage over the other two techniques. Expressed according to fuzzy canons, the knowledge base is computationally much less complex and the linguistic representation is very close to human reasoning.

NN's, at least as regards the typical features of gradient descendent learning networks, are quite different. In the first place, they were conceived of specifically for learning. They are, therefore, fundamental when all that is available is some significant examples of the problem to be solved rather than a solution algorithm. There are two evident disadvantages in using NN's. In general, they can learn correctly from examples, but what is learned is not easy for humans to understand, i.e., the knowledge base extracted from them does not have such an intuitive representation as that provided, for example, by FL. Second, the type of functions that can be used in NN's have to possess precise regularity features and the derivative of these functions has to be known a priori.

Manuscript received April 2, 1996; revised June 29, 1997. The author is with the Institute of Computer Science and Telecommunications, Faculty of Engineering, University of Catania, Catania, 95125 Italy. Publisher Item Identifier S 1063-6706(98)05153-4.

Similar considerations hold for GA's, with certain clarifications. Their learning speed is usually slower. However, they have two great advantages over NN's. The functions that can be used in GA's can be much more general in nature and knowledge of the gradient of the functions is not usually required. Finally, as these algorithms explore in several directions at the same time, they are affected much less than NN's by the problem of local extremes; that is, a GA has far less likelihood than an NN of finding a local extreme rather than a global one. Even if the extreme found is not a global one, it is likely to correspond to a less significant learning error.

On the basis of these considerations, it is the author's opinion that a technique which makes use of a combination of SC aspects, i.e., GA's, NN's, and FL, would be an interesting prospect. A hybrid technique, in fact, would inherit all the advantages, but not the less desirable features, of the single SC components.

It has to possess a good learning capacity, a better learning time than that of pure GA's, and less sensitivity to the problem of local extremes than NN's. In addition, it has to generate a fuzzy knowledge base, which has a linguistic representation and a very low degree of computational complexity.

This paper will give a detailed illustration of a tool developed by the author called FuGeNeSys [2], which is based on a mixed GA's + NN's + FL technique.

The tool allows supervised learning of fuzzy rules from significant examples. The main features of FuGeNeSys are as follows.

• The number of necessary rules is always very low. In the various applications developed by the author [2]–[15], this number has almost always been below ten. This means that it is not tied to the number of inputs, as in the previous coding methods, but only to the complexity of the application itself. In Section V-C he will explicitly show this peculiarity.
• The original genetic coding method proposed does not explode as the number of input variables and the number of outputs increase. The total number of bits is proportional to the number of inputs and outputs and not to a power of the number of inputs.
• Simplified fuzzy knowledge bases can be generated, i.e., the tool is capable of eliminating the unnecessary antecedents in any rule. This is proved in all fuzzy inferences shown in the paper and in those presented in the author's previous works.
• Significant features are correctly identified. A whole section (V-B) is devoted to showing this characteristic.
• The tool can be used in both classification and interpolation problems (see Sections V-A and V-C).
a) The misclassification error is comparable to, if not better than, that of other similar techniques described in literature.
b) Also, the function approximation error is very good. The author compared his results with neuro fuzzy techniques. In the author's opinion, with this kind of problem, the neuro fuzzy approaches are the most accurate among all hybrid fuzzy learning methods present in literature.

The tool's learning capacity is also demonstrated by the various research activities the author has taken part in and in which FuGeNeSys has been used. The fields of application in which successful use has been made of FuGeNeSys range from pharmacology [3], [16] to telecommunications [4]–[8], from Hw-Sw codesign [9], [10] to high-energy physics [11]–[13], and from the design of Hopfield networks [14], [15] to robotics [2].

FuGeNeSys has also been used in [17] to show the effect of an incorrect choice of the performance index for any supervised learning method in order to determine the validity of the learning method adopted.

The paper is organized as follows. Section II gives a detailed introduction to GA's. Section III is an introduction to FL and Section IV is a detailed description of FuGeNeSys. In Section V, FuGeNeSys performance is analyzed with varying user-defined parameters. There is also a subsection comparing FuGeNeSys with other techniques. Finally, Section VI gives the author's conclusions.

II. GENETIC ALGORITHMS

GA's are a vast class of stochastic algorithms used to solve optimization problems. They use models based on the natural process of adaptation of living species, the first concepts of which were introduced by Darwin [18] in 1859 with his theory of the evolution of species.

Various authors have proposed algorithms that simulate genetic systems [19]–[21]. Interesting overviews can also be found in [19] and [22]–[25].

A. The Natural Processes Behind Darwin's Theory

Behind the theory of evolution there are only four basic processes:
• selection;
• reproduction;
• mutation;
• competition.

Selection is the natural process whereby individuals that can reproduce are chosen.

Reproduction is the mechanism nature uses to transfer the genes of an individual to its offspring.

Mutation is the phenomenon by which, because of nondeterministic factors, the process of reproduction does not transfer genetic characteristics faithfully; that is, the resulting genes contain a number of "copying" errors.

Competition is a direct consequence of the growth of populations in environments with a limited number of resources.

B. Standard GA's

In this section, a summary of the basic steps on which classical GA's are based will be made.

Let us assume, for the sake of simplicity, that the function to be optimized is

(1)
and, thus, that the space of solutions is represented by the vectors of its arguments.

1) Choice of the Fitness Function: The first step is that of analyzing the optimization problem and identifying a suitable fitness function

(2)

which will make it possible to discern the better of two possible solutions.

It has to be chosen in such a way that the extreme of the function to be optimized (1) corresponds to the absolute minimum or maximum of the fitness function.

Sometimes the fitness function chosen is the function to be optimized itself, if it has a suitable codomain, but this does not generally happen because, as we shall see, the type of fitness function chosen may greatly affect the result of the search for a solution.

2) Definition of Suitable Coding: One of the most interesting problems in GA's is coding the solution space.

Typically, any potential solution (individual) is coded as a vector called a chromosome, the elements of which are called genes and are situated in well-defined positions, usually indicated as alleles.

The best-known forms of coding [20], [21], [26] are strictly binary. Every possible solution is simply represented by a binary string. In this type of coding we speak of a Boolean allele value.

Nothing, however, prevents allele values from being continuous or even structured (in [27] and [28], applications are described in which allele values are even LISP (processing language) expressions).

3) Initialization of the Population: The next step consists of generation, usually random, of a certain number of individuals. These represent a set of solutions, which, due to the properties of GA's, evolve in such a way as to possess an increasing fitness value.¹

¹This assertion is true when the fitness function must be raised; vice versa, the evolution evolves toward its minimization.

4) Decoding and Calculation of Fitness: Each individual in the population is decoded in such a way that it is possible to calculate the fitness function.

In the case of binary coding, for this to be done, a certain function has to be defined

(3)

which, when applied to a string of bits, returns a potential solution for which the fitness function fit can be calculated.

5) Calculation of the Reproduction Probability: There are various methods for choosing the individuals that are to reproduce. Typically, but not always, selection is random. One of the best-known criteria for random choice is that of roulette wheel selection [19]. By this method, the individuals with the highest fitness are selected stochastically. Hence, the "fittest" individuals have a higher probability of reproducing. Each chromosome of the population is chosen with a certain probability of reproduction. The better the fitness function, the higher this probability has to be. The reproduction probability is often defined directly from the fitness function [23]

(4)

when the fitness function is nonnegative. Using this reproduction probability makes it possible to simulate a sort of roulette, which stochastically indicates which individuals in the population can reproduce.

Fig. 2. An example of crossover with two "cuts."

6) Generation of New Individuals: On the basis of the reproduction probability values, a new population of chromosomes is generated. The scheme followed in "constructing" these new chromosomes is quite simple. Starting from the chromosomes selected by the roulette, new strings of offspring are generated using specific genetic operators such as crossover and genetic mutation.

The crossover operator comprises the basic mechanism for redistributing genetic characteristics between the individuals. It is applied to two chromosomes called parents and creates two new individuals. They are generated by selecting one or more points at which the parents' strings are "cut." The new individuals will have a genetic code in which substrings of the genetic codes of both parents alternate. An example of crossover with two cuts is shown in Fig. 2.

The crossover algorithm described is the classical one. There are a number of different versions that various authors have used to enhance the performance of GA's when applied to specific optimization problems.

The mutation operator simply introduces the artificial concept of copying error. It serves to emulate the natural phenomenon whereby offspring sometimes happen to have genes with totally different characteristics from their parents because of errors due to various factors. This operator is simulated by introducing a certain mutation probability according to which an offspring bit is stochastically negated.

Immediately after the generation of a new individual, the decoding operation has to be performed and its fitness calculated (see Section II-B.4).

7) Termination Condition: At this point, GA termination tests are performed. The global process is usually stopped when a valid solution to the problem has been found or when the time available for the search is up. If none of the
termination tests gives a positive result, the steps outlined previously (starting from Section II-B.5) are repeated.
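To make the steps of Section II-B concrete, the following is a minimal sketch of a classical binary-coded GA with roulette-wheel selection, two-cut crossover (Fig. 2), and stochastic bit mutation. It is not the paper's implementation: the toy fitness function, the 16-bit coding, and all numeric settings are assumptions chosen only for illustration.

```python
# Minimal sketch of the classical GA loop of Section II-B (illustrative only;
# the fitness function, coding, and all numeric settings are assumptions).
import random

BITS, POP, P_MUT, GENERATIONS = 16, 40, 0.02, 200

def decode(bits):
    # Map a 16-bit string onto the interval [-1, 1] (assumed coding).
    return -1.0 + 2.0 * int("".join(map(str, bits)), 2) / (2**BITS - 1)

def fitness(bits):
    # Nonnegative fitness so that roulette-wheel selection applies:
    # here we maximize f(x) = 1 - x^2 (an arbitrary toy function).
    x = decode(bits)
    return 1.0 - x * x

def roulette(pop, fits):
    # Fitness-proportional ("roulette wheel") selection, Section II-B.5.
    total = sum(fits)
    r = random.uniform(0.0, total)
    acc = 0.0
    for ind, f in zip(pop, fits):
        acc += f
        if acc >= r:
            return ind
    return pop[-1]

def crossover_two_cuts(a, b):
    # Two-cut crossover as in Fig. 2: swap the substring between the cuts.
    i, j = sorted(random.sample(range(1, BITS), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def mutate(bits):
    # Stochastic bit negation with probability P_MUT (copying error).
    return [1 - g if random.random() < P_MUT else g for g in bits]

pop = [[random.randint(0, 1) for _ in range(BITS)] for _ in range(POP)]
for _ in range(GENERATIONS):
    fits = [fitness(ind) for ind in pop]
    new_pop = []
    while len(new_pop) < POP:
        p1, p2 = roulette(pop, fits), roulette(pop, fits)
        c1, c2 = crossover_two_cuts(p1, p2)
        new_pop += [mutate(c1), mutate(c2)]
    pop = new_pop[:POP]

best = max(pop, key=fitness)
print(decode(best), fitness(best))
```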

C. Distributed and Parallel GA's

A population in which each individual can cross over with any other in each generation is called panmictic. This category includes the classical models such as those described above (Section II-B). The nature of GA's suggested implementing them on networks of computers or parallel computers (SIMD or MIMD machines). The main problem to be dealt with in this kind of implementation almost always relates to communication between parallel processes. It was, therefore, decided to use strategies with apomictic structures, i.e., ones which do not involve complete mixing. The idea behind apomictic populations is that of dividing the populations up into a certain number of subpopulations (often indicated as demes), which are connected to each other in some way.

There are three apomictic models.
• Islands—in this model migration can be between any pair of subpopulations;
• Stepping Stone—in this model genetic migration can only occur between adjacent demes;
• Continuous—in this model selection and crossover are local in an area with a preestablished radius and the isolation of demes is a direct consequence of the distance between them.

The last of these models gives rise to what are called fine-grain GA's in which the individuals are typically placed on a planar grid.

Fig. 3. Typical implementation of a fine-grain model.

Fig. 3 shows a typical implementation of a fine-grain model. First, there is global selection to identify a subpopulation in the genetic grid. Then there is local selection within the selected deme to find two parents. Subsequently, the reproduction, crossover, and genetic mutation operations are performed. The last step is a local conflict to choose which of the offspring is to survive and which member of the subpopulation is to be replaced.

D. Considerations on Selection Algorithms

The task of the selection operator is to concentrate the computational effort toward regions of the solution space that appear to be promising.

To explain the concept of selection more clearly, the following are two diametrically opposed examples.

Let us assume that we are adopting a selection procedure which concentrates all the hardware resources on the most promising region in the search space. It is evident that in such a case, no computational resources are devoted to the search for other possibilities that, at present, seem less interesting but which in the future may lead to much better solutions. In this case, the GA can be said to degenerate into a local search method such as the well-known gradient descending method.

A uniformly random selection algorithm, on the other hand, distributes calculation resources over the whole genetic population. In this case, selection guarantees a search in several areas, but generally disperses a large amount of the hardware resources searching in sectors of the solution space that are of no use. In practice, this kind of selection behaves like a Monte Carlo method, randomly sampling the space of admissible solutions.

A compromise between these two types of selection is what is generally attempted in evolution-based algorithms.

A description of some of the best-known selection methods now follows.

1) Fitness-Proportional Selection: This selection model is the most widely used and was devised by Holland; it is a good tradeoff between the completely global method and an exclusively local one [see (4)].

The method does have some drawbacks, however. If, for instance, a certain fitness function is used, use of another fitness function built simply by adding an additional constant component to the first one should not make any difference to the performance a GA can achieve. This, however, does not happen. The selection probabilities may change considerably. Let us assume, for example, that we have three individuals with respective fitness values of 0, 10, and 90. Their probability of selection according to (4) is 0, 10, and 90%. If the fitness function adopted is equal to the previous one, except for an additional factor of 100, the probabilities decidedly change, becoming 25, 27.5, and 47.5%.

Another problem typical of fitness-proportional selection is that of the so-called superindividuals. A superindividual is a member of the population whose fitness value is much higher than the average value. In this case, in a very short time the population will become homogeneous because at each genetic iteration the number of individuals similar to the superindividual will increase considerably. If the superindividual represents coding of a solution close to the one being sought, the search time will be reduced drastically. Unfortunately, however, although the superindividual has a much higher fitness value than the other individuals, it often represents an unacceptable solution to the problem to be optimized. In this
case, the GA fails as the search proceeds very slowly. The crossover operator is, in fact, of no use as the individuals are very similar and evolution only proceeds thanks to the mutation operator. The GA thus degenerates into a Monte Carlo-type search method. This phenomenon is known as premature GA convergence.

2) Selection by Linear Ordering: Linear selection [24], [29] [often called linear ranking selection (LRS)] was first suggested by Baker [30] to eliminate some of the problems inherent in fitness-proportional selection.

In the LRS, the individuals are sorted according to their fitness values. The maximum rank is assigned to the best individual and the lowest rank (1) to the worst. The selection probability is linearly assigned to the individuals according to their rank

(5)

where the ranking parameter is a real number comprised in the interval [0,2].

3) Tournament Selection: An additional selection method, called tournament selection [19], [25], combines the idea of ranking in a very interesting and efficient way.

According to this method, one first needs to select a group of individuals and then choose the best one among those. This process is repeated a number of times equal to the population size.

It is clear that large values of the group size increase the selective pressure of this procedure; the typical value accepted by many applications is two (the so-called tournament size).
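The following sketch illustrates the three selection schemes just discussed. The fitness-proportional probabilities reproduce the 0/10/90 translation example from the text; the linear-ranking expression is the usual Baker-style form and is only an assumed stand-in for (5); the tournament size of two matches the typical value mentioned above.

```python
# Illustrative selection operators (assumed standard forms, not the paper's code).
import random

def proportional_probs(fitness):
    total = sum(fitness)
    return [f / total for f in fitness]

# Translation-sensitivity example from the text: fitness values 0, 10, 90.
print(proportional_probs([0, 10, 90]))        # -> [0.0, 0.1, 0.9]
print(proportional_probs([100, 110, 190]))    # -> [0.25, 0.275, 0.475]

def linear_ranking_probs(fitness, eta_plus=1.5):
    # Rank 1 = worst, rank N = best; eta_plus in [0, 2], eta_minus = 2 - eta_plus.
    n = len(fitness)
    eta_minus = 2.0 - eta_plus
    order = sorted(range(n), key=lambda i: fitness[i])   # worst ... best
    probs = [0.0] * n
    for rank, i in enumerate(order, start=1):
        probs[i] = (eta_minus + (eta_plus - eta_minus) * (rank - 1) / (n - 1)) / n
    return probs

def tournament_select(population, fitness, k=2):
    # Pick k individuals at random and keep the best (k is the tournament size).
    contenders = random.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: fitness[i])]

pop = ["a", "b", "c", "d"]
fit = [1.0, 4.0, 2.0, 3.0]
print(linear_ranking_probs(fit))
print(tournament_select(pop, fit, k=2))
```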
E. Classification of Selection Models

A scheme classifying the various selection models can be found in [31]. Using the same taxonomy, the following orthogonal classes emerge.

• Dynamic/Static—static selection is when the probability of selection depends on the fitness value, as in the case of fitness-proportional selection. Dynamic selection, on the other hand, is represented by ordering or local tournament.
• Preservative/Extinctive—preservative selection guarantees that each individual will have a probability other than zero of being selected. This category includes fitness-proportional selection (if it is strictly positive) and selection by linear ordering if the worst rank still receives a nonzero probability. In extinctive selection, on the other hand, some individuals are precluded from the possibility of reproducing. This is the case in selection by local tournament and in linear ordering when the worst individuals receive zero probability.
• Elitist/Pure—elitist selection guarantees selection of the fittest individual, while in pure selection the same individual may be discarded. It is important here to mention the work of Rudolph [32]: classical algorithms based on pure selection never converge, regardless of the type of crossover or mutation operators, the fitness value, or the type of initialization. Elitist selection, on the other hand, allows convergence to the absolute extreme being sought.
• Generational/Stationary State—in generational selection, the set of parents is determined only once until a new population of offspring is determined. Stationary-state selection is conceived of in such a way that the parents are extracted step by step and the offspring are gradually introduced into the population.

F. Nonstandard Genetic Operators

At times, to improve performance in terms of both convergence speed and accuracy, nonclassical genetic operators are introduced. A large number of such operators are described in literature, but the best-known would seem to be the hill-climbing operator. Given a certain individual, this operator finds an alternative similar individual who represents a local minimum close to the original individual in the space of solutions. Typical techniques are gradient descending ones. The author has used this algorithm and objectively confirmed its importance (see Section V).

G. Previous Fuzzy Logic Genetic Coding Methods

In Park et al. [33], only the conclusion part of the fuzzy inference and some control gains are coded. The total number of rules is a power of the number of inputs.

In Karr and Gentry [34], GA's are used to alter just the fuzzy set shapes of a predefined rule base.

They adopt the so-called concatenated, mapped, unsigned binary coding. They use seven bits for each parameter. They work with trapezoidal membership functions (four parameters) or triangular membership functions (three parameters).

They fix the number of linguistic terms for each of the inputs and for each output. In the triangular membership function case, the number of bits in each individual is fixed by these choices. The number of rules increases dramatically with the number of the inputs.

In [35] and [36], there is another interesting genetic representation. They use the Takagi–Sugeno–Kang (TSK) fuzzy model [37] where the consequent part of a fuzzy rule is a weighted linear combination of the input values. For each rule, it needs one membership function per input for the antecedent part and a set of weights for the consequent part.

They fix the maximum number of triangular fuzzy sets for each input variable. For each one, they defined the membership function chromosome (MFC), which contains the parameters of a triangular shape. These are the eight bits of the left base and right base and the distance from the previous center point (see Fig. 4).

For the consequent part, they introduce the rule–consequent parameters chromosome (RPC) that contains eight bits for each rule weight.

The total number of bits in each individual and the maximum number of rules again increase dramatically with the number of the inputs.

In the fitness function, they included an additional term to penalize systems according to the number of rules of the system.

In Ishibuchi et al. [38], GA's are used to select fuzzy rules from a given fuzzy database. Starting from the work in [39], they generate multiple fuzzy rule tables. Each table
corresponds to a different fuzzy partition. They start from a very coarse partition—only two membership functions per input—and continue with finer partitions. They consider a three-valued coding for each rule.

• 1 denotes that the rule belongs to the compact final fuzzy inference.
• −1 denotes that the rule does not belong.
• Zero denotes a dummy rule; that is, a rule that does not include any pattern and therefore is not useful.

Therefore, the final fuzzy inference is composed of all the rules whose coding is equal to one.

The main drawback of this method is that the number of starting fuzzy rules increases dramatically with the number of inputs, so the method cannot work with high-dimensional pattern spaces. The total number of bits grows as a power of the finest partition of the input term set, the exponent being the number of inputs; even with only two inputs and a partition equal to six the requirement is already large, and it explodes as soon as more inputs or finer partitions are used.

III. FUZZY LOGIC

A. Fuzzy Sets

In classical set theory, given a value and a subset of a reference set, the element either belongs to the set or not. So it is possible to associate a Boolean numerical value (that is, one belonging to the set {0,1}) to each of the elements of the reference set, specifying whether or not it belongs to the subset.

If we consider some of the most common adjectives used in daily conversation such as tall, intelligent, tired, ill, etc., we see that none of them is particularly precise: they are not used in terms of classical logic but in terms of a generalization with multiple values.

The concept of a fuzzy set is a step toward this kind of generalization. A fuzzy set [40] is a set whose membership function can assume all the values in the closed interval [0,1]. A fuzzy subset of a generic set called the universe of discourse has the following membership function:

(6)

which associates a number between zero and one to each element of the universe of discourse.

B. Fuzzy Reasoning

In general, a fuzzy conditional rule is made up of a premise and a conclusion

(7)

The premise is made up of a number of fuzzy predicates (henceforth, also called antecedents) of the general form (Tom IS fast) that are eventually negated or combined by different operators such as AND or OR computed with t-norms or t-conorms.

In the latter example, Tom is the value of the linguistic variable defined in the universe of discourse of men and fast is one of the names of the term set of the linguistic variable.

Fig. 4. Takagi and Lee's membership function representation.

The following is an example of a fuzzy conditional rule using such operators:

(8)

A fuzzy inference is made up of several rules with the same output variables. The inference procedure [41] establishes how the conclusions are to be inferred from this set of rules.

Fig. 5 gives a practical example of an inference, known as the max–min method. The figure outlines the calculation of two rules and, assuming given crisp values for the two inputs, the fuzzy output set obtained.

By this method the membership function of the fuzzy output set (still with reference to Fig. 5) is obtained as [42]

(9)

where

(10)

(11)

In (9), we can see an operation to aggregate the results (max) of the whole rule base, preceded by an implication operation (min) based on the degree of activation of the single rules [see (10)]. This value is obtained by applying the logical connector min to the activation values of the single antecedents (11).
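As an illustration of the max–min method of (9)–(11), the following sketch computes the activation of two rules with the min operator, clips each consequent set (min implication), aggregates with max, and then defuzzifies the aggregated set with a discrete center-of-gravity calculation. The triangular membership functions and the crisp inputs are invented for the example.

```python
# Numerical sketch of max-min inference (Fig. 5): activation = min of the
# antecedent memberships, implication = min, aggregation = max.
import numpy as np

def tri(x, a, b, c):
    # Triangular membership function with support [a, c] and peak at b.
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

x1, x2 = 3.0, 7.0                      # crisp inputs (assumed)
y = np.linspace(0.0, 10.0, 201)        # discretized output universe

rules = [
    # (membership of x1, membership of x2, output fuzzy set over y)
    (tri(x1, 0, 2, 6),  tri(x2, 4, 6, 9),  tri(y, 0, 3, 6)),
    (tri(x1, 2, 5, 8),  tri(x2, 5, 8, 10), tri(y, 4, 7, 10)),
]

clipped = []
for mu1, mu2, out_set in rules:
    theta = min(mu1, mu2)                        # rule activation (min of antecedents)
    clipped.append(np.minimum(theta, out_set))   # min implication

mu_out = np.maximum.reduce(clipped)              # max aggregation over the rule base
cog = np.sum(y * mu_out) / np.sum(mu_out)        # discrete center-of-gravity defuzzification
print(round(float(cog), 3))
```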

Fig. 5. The max–min method.

C. Fuzzification and Defuzzification

In a fuzzy system, the desired inputs and outputs are usually numerical values. It is, therefore, necessary to make a sort of "translation" from numerical inputs to fuzzy inputs and then from fuzzy outputs to the desired numerical outputs (crisp values). The first transformation is universally called fuzzification and the second defuzzification.

1) Fuzzification: Fuzzification consists of building the membership relation for the fuzzy input.

Inputs are normally crisp so this phase is necessary. Given the values of the crisp inputs of a fuzzy system, it is necessary to construct the same number of fuzzy sets, i.e., a fuzzy operator is needed for which

(12)

The fuzzification operation is often greatly simplified. In such cases, the numerical value is associated with the corresponding singleton, whose membership function is one at that value and zero otherwise

(13)

At times, fuzzy numbers [40] are used to model uncertainty and imprecision.

2) Defuzzification Methods: There are various defuzzification methods.

Defuzzification is usually achieved by average calculation. One of the best-known methods is the center of gravity (COG) method, which is the method used to calculate the barycenter of a mass. A discrete representation of COG is

(14)

where the sum runs over the quantization steps by which the universe of discourse of the output membership function is discretized.

Besides the classical methods, based on the concept of fuzzy output sets, there are a number of computationally simplified defuzzification methods by which it is possible to combine aggregation and defuzzification into a single phase.

One of them is the weighted-mean (WM) method in which

(15)

where the sum runs over the rules, each rule contributing its degree of activation and a numerical value associated with its consequent. This value is usually calculated using another defuzzification method. In the Yager method [43], the value is the mean value of the alpha-level set equal to the degree of activation; that is, it is the mean point of the segment obtained as the intersection between the fuzzy set of the consequent and a line parallel to the universe of discourse at a height equal to that degree. The fuzzy sets are, of course, assumed to be convex.

The weighted-sum (WS) method [44] is even less computationally onerous, but this time the output of the single rule is relative, not absolute. The contributions of all the rules are, in fact, simply summed

(16)

where the symbols used are the same as those used to describe (15).
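A minimal sketch of the two simplified defuzzification schemes, WM (15) and WS (16), is given below; the activation degrees and consequent values are arbitrary numbers chosen for illustration.

```python
# Weighted-mean (WM) and weighted-sum (WS) defuzzification, cf. (15) and (16).
def weighted_mean(thetas, ys):
    # WM: activation-weighted consequents divided by the total activation.
    return sum(t * y for t, y in zip(thetas, ys)) / sum(thetas)

def weighted_sum(thetas, ys):
    # WS: the rule contributions are simply summed (relative, not absolute).
    return sum(t * y for t, y in zip(thetas, ys))

thetas = [0.6, 0.3, 0.3]    # rule activation degrees (assumed)
ys = [2.0, 5.0, 8.0]        # per-rule consequent values (assumed)
print(weighted_mean(thetas, ys))   # about 4.25
print(weighted_sum(thetas, ys))    # about 5.1
```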

IV. FuGeNeSYS

A. The Coding Used

Each individual in the population is made up of a set of rules, each of which comprises inputs (antecedents) and outputs (consequents).

The membership functions are Gaussian of the following form:

(17)

Therefore, to code these functions we need two kinds of information: the center and the sigma.

In FuGeNeSys, it was decided to code the center and the inverse of the sigma. This was done for two different reasons. The first was to eliminate the division operation and thus optimize the learning time. The second was to eliminate the annoying singularity in the equation of the Gaussian in the proximity of a null sigma. From the point of view of implementation, the Gaussian function considered is written as

(18)

For each antecedent it is therefore necessary to code information concerning a center and the width value. A 16-bit coding was chosen for each of them. If the coded width value is equal to zero, it is assumed that in the rule in question the corresponding antecedent is missing.

FuGeNeSys only considers crisp inputs, so the degree of membership of each antecedent of each rule corresponding to the relevant crisp input will be given by

(19)

Various versions of FuGeNeSys have been implemented. Substantially, the difference between them is in the calculation of the conclusion and the method used for defuzzification.

The consequents have been coded in two different ways. The first, which proved to be the most computationally complex, was used in the early versions of FuGeNeSys [3], [6]–[8], [10]. In the second type of coding, two defuzzification methods are possible, one of which is computationally quite simple.

The first method for coding the information about the consequents is based on the use of a sort of weighted mean which additionally associates a weight to each rule and consequent. The following formula was used for defuzzification:

(20)

where the degree of activation of the premise of each rule, the weight of each consequent, and the corresponding output singleton appear. This formula coincides with the MAX–DOT method if it is assumed that the weight and the singleton are, respectively, the center and area of any fuzzy set symmetrical with respect to that center.

It is useful to note here that to calculate the degree of truth of a generic premise, it was assumed that all the connectors in the premises were AND's, for which the algebraic minimum operator was chosen. The degree of activation of a rule is thus given by the minimum degree of truth of the various antecedents in the rule

(21)

Using the defuzzification method shown in (20), two separate values were again stored for each consequent, both coded with 16 bits.

It became apparent that the learning capacity of FuGeNeSys did not deteriorate significantly if a single value—the output singleton—was coded for each consequent. The defuzzification method in this case is still of the centroid type, but the formula is

(22)

This method is clearly a WM defuzzification and is similar to the Yager method. So for each consequent a single value is coded using 16 bits. With this coding it is also possible to use a simpler defuzzification method such as WS

(23)

Currently, one of these two methods can be chosen. In the various applications in which FuGeNeSys has been used, it has been found that the second one works better as far as the approximation of functions is concerned, while the first one is preferable in classification problems.

It is important to note that the first method (22) has the advantage that the knowledge base learned seems easier for a human to understand.

The WS defuzzification method, however, uses on average less time for a genetic iteration than the other methods as it is computationally less onerous.

Currently, each individual in the genetic population is represented by a fixed number of 16-bit words, determined by the number of inputs, outputs, and rules: two words are required to store the fuzzy set of each antecedent of each rule, and one word is required to store each consequent of each rule.
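The following is a hedged sketch of how an individual coded as in this section could be evaluated: Gaussian antecedents stored as a center and an inverse width (a zero width meaning the antecedent is absent), rule activation by the minimum operator (21), and a single output singleton per consequent defuzzified with either WM (22) or WS (23). The exact expression (18) and the 16-bit quantization are not reproduced; the floating-point form below is an assumption.

```python
# Hedged sketch of evaluating a FuGeNeSys-style rule base (assumed details).
import math

def membership(x, c, gamma):
    # Gaussian written with the inverse width gamma = 1/sigma to avoid the
    # division and the singularity at sigma = 0 (assumed form).
    return math.exp(-0.5 * ((x - c) * gamma) ** 2)

def rule_activation(inputs, antecedents):
    # antecedents: list of (c, gamma); gamma == 0 removes that antecedent.
    degrees = [membership(x, c, g)
               for x, (c, g) in zip(inputs, antecedents) if g != 0.0]
    return min(degrees) if degrees else 1.0

def infer(inputs, rules, weighted_mean=True):
    # rules: list of (antecedents, output singleton); WM (22) or WS (23).
    thetas = [rule_activation(inputs, ant) for ant, _ in rules]
    num = sum(t * s for t, (_, s) in zip(thetas, rules))
    return num / sum(thetas) if weighted_mean else num

# Two toy rules over two inputs (all numbers invented).
rules = [
    ([(0.2, 5.0), (0.8, 4.0)], 1.0),
    ([(0.7, 6.0), (0.0, 0.0)], 3.0),   # second antecedent eliminated (gamma = 0)
]
print(infer([0.3, 0.75], rules, weighted_mean=True))
print(infer([0.3, 0.75], rules, weighted_mean=False))
```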
B. The Evolution Algorithm Used

To implement FuGeNeSys, an apomictic, continuous, and fine-grain evolution algorithm was used. This was to allow the user to choose to slow down the process of genetic homogenization.

In practice, the individuals are assumed to be on a rectangular grid and are selected from a deme with a user-defined radius. The selection is static, preservative, elitist, steady-state, and fitness proportional.

As mentioned previously, in fine-grain models there are two types of selection—global and local. In this case, global selection is fitness-proportional as in (4) and identifies the center of the subpopulation deme. Local selection uses the same equation but only in a subgrid with a pre-established radius.

Two parents generate a single offspring using a crossover operator with a single cut. Then the mutation operator is
applied, and the offspring generated in the deme goes to replace the individual with the lowest fitness value.

Finally, a hill-climbing operator was introduced. It starts whenever an individual is generated with a fitness value higher than the best one obtained so far.

In a first step, the individual selected is transformed into a neuro fuzzy system. That is, the equivalent neural network is built and all trainable weights are initialized. Then the system is trained. In this phase, it is possible that some neurons are deleted, too. Finally, the trained system is retransformed into a genetic individual and introduced in the genetic grid in the place of the starting individual selected.

Fig. 6. The architecture of the neuro fuzzy system. The system has two inputs, one output, and two rules. It is based on the WS defuzzification.

For simplicity, in the case of WS defuzzification the neuro fuzzy system has four layers. In Fig. 6, there is an example with only two inputs, one output, and two rules.

• The first is the input layer; it has at most one neuron per input.
• The second one is the fuzzification layer. In this layer, the activation degrees of the antecedents are calculated. The weights that can be learned are the centers and the widths of the Gaussian membership functions. The total number of neurons is at most the number of inputs times the number of rules.
• Next, there is the inference layer. It calculates the degree of truth of the rules. It has at most one neuron per rule and weights fixed to one.
• Last, there is the linear output layer in which the WS defuzzification is calculated. The weights that can be learned are the output singletons. There is one neuron per output.

The neural network is trained with a backpropagation procedure. This automatically adjusts all the learning speeds (one for each type of weight) and proceeds until the error decreases.
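As a rough illustration of the steady-state, fine-grain evolution scheme described at the beginning of this section, the sketch below performs one step at a time: global fitness-proportional selection of a deme center, local selection of two parents inside the deme, single-cut crossover, mutation, and replacement of the worst individual of the deme. The grid size, deme radius, toroidal wrap-around, genome, and toy fitness are all assumptions, and the hill-climbing/backpropagation refinement is omitted.

```python
# Sketch of one steady-state step of a fine-grain (cellular) GA on a grid.
import random

SIDE, RADIUS, GENOME, P_MUT = 6, 1, 32, 0.02

def fitness(genome):
    return sum(genome)                     # toy fitness: number of ones

def roulette(indices, fits):
    total = sum(fits[i] for i in indices) or 1e-9
    r = random.uniform(0.0, total)
    acc = 0.0
    for i in indices:
        acc += fits[i]
        if acc >= r:
            return i
    return indices[-1]

grid = {(r, c): [random.randint(0, 1) for _ in range(GENOME)]
        for r in range(SIDE) for c in range(SIDE)}
fits = {pos: fitness(g) for pos, g in grid.items()}

def deme(center):
    r0, c0 = center
    return [((r0 + dr) % SIDE, (c0 + dc) % SIDE)      # toroidal grid (assumed)
            for dr in range(-RADIUS, RADIUS + 1)
            for dc in range(-RADIUS, RADIUS + 1)]

def evolution_step():
    center = roulette(list(grid), fits)               # global selection
    local = deme(center)
    p1, p2 = roulette(local, fits), roulette(local, fits)   # local selection
    cut = random.randrange(1, GENOME)                 # single-cut crossover
    child = grid[p1][:cut] + grid[p2][cut:]
    child = [1 - g if random.random() < P_MUT else g for g in child]
    worst = min(local, key=lambda pos: fits[pos])     # replace the worst of the deme
    grid[worst], fits[worst] = child, fitness(child)

for _ in range(500):
    evolution_step()
print(max(fits.values()))
```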
C. The Fitness Function Used

The fitness function was derived from a series of considerations dictated by experience. First of all, a function that was always greater than zero was chosen so as to be able to use stochastic selection (see Section II-B.5).

The function to be maximized is simply a rational fraction in which some parameters can be varied directly by the FuGeNeSys user.

The author has added some correction terms to improve the quality of the solution FuGeNeSys obtains, using a term whose weight is user-defined and which forces the solution to search for simplified rules. In practice, the correction factor added privileges rules with few antecedents and/or consequents.

Another correction factor that has been introduced allows the user to force the program to search for solutions that take the smallest possible number of features into account.

Finally, a third factor has been introduced, again with a user-defined weight, which prevents the various rules from including some whose contribution is of negligible importance. In certain situations, this factor, which has to be used with extreme caution, allows the learning time to be speeded up considerably.

The analytical expression of the fitness function of an individual is given by

(24)

In (24), the coefficients are the user-defined numerical coefficients mentioned above. Let us define the parameters that appear in (24). The first parameter represents the learning error given by

(25)

where the sum runs over the total number of patterns and over the outputs of the function being approximated, each term being normalized by the difference between the maximum and minimum values the corresponding output can have; the calculated and desired outputs for each pattern are compared. As the learning error is the denominator of the fitness function expression, it is clear that the fitness value increases as the error decreases.

The second parameter indicates the ratio between the number of antecedents and consequents used in all the rules and the total number available. If the corresponding coefficient is greater than zero and the other parameters are equal, the fitness of an individual representing the coding of a "simpler" fuzzy inference is greater. So, making the coefficient positive means forcing the FuGeNeSys program to search for less complicated solutions. This option can be used whenever it is necessary to reduce (as far as possible) the complexity of the fuzzy inference learned. It is also possible to use this feature to analyze the dependence of the fitness function on the features chosen. It may, in fact, happen that by forcing the coefficient to sufficiently high values, some features are eliminated from the rules resulting from the learning phase.

Care must be taken, however, not to let this ratio take excessively small values. When the ratio decreases, in fact, although increasingly simple rules are found, there is a collateral effect of an increase in the learning error.

Fig. 7. Examples in which there is no need to use the option (f2) and where it can help (f1).

The second correction parameter is given by the percentage of inputs used in a fuzzy inference as compared with the total number. The considerations made for the previous parameter hold again here, but this time they are restricted to the selection of features. The difference between the two parameters is that while the former simplifies the fuzzy rules by forcing the program to eliminate antecedents (and as an indirect consequence may lead to the elimination of some features), the latter is expressly meant to eliminate features. Consequently, it is more efficient in achieving this goal.

The last parameter was introduced to try to give as reasonably equal as possible a "weight" to all the rules. The introduction of this parameter privileges fuzzy inferences in which each rule contributes significantly to the determination of a certain minimum number of patterns, as defined by the user. In other words, its use makes each rule contribute to the defuzzification process with a significantly high degree of truth of the premise for a certain number of patterns. So making the corresponding coefficient greater than zero forces FuGeNeSys to search for solutions which present rules that all contribute significantly to the approximation or classification of a certain minimum number of learning patterns.

Use of this possibility does not necessarily lead to good solutions. If, in fact, we consider a function that presents quite a high derivative in a certain limited area of the domain (see f2 in Fig. 7), it will probably be necessary to use one or more rules that make a nonnull contribution only for a limited set of learning points as compared with the total number. In this case, an option such as the one just described cannot be used, because the solutions found would very probably present a high error in the proximity of the area mentioned. This option must clearly be used with great caution.

The parameter is defined by the following expression:

(26)

The value freq is calculated as follows. A set of counters, one for each rule, are set to zero. For each learning pattern we see which of the rules contributes with the highest truth value for the premise. Then, still for the same pattern, the counter of each rule whose premise has a sufficiently high truth value is increased by one unit. The two threshold values involved are user definable.
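The exact analytical expression of (24)–(26) is not reproduced here; the function below is only an assumed shape consistent with the description above: the learning error appears in the denominator, and three user-weighted correction terms reward simple rules, few features, and rules that all fire strongly on at least a minimum number of patterns. Every name in it (k_s, k_f, k_phi, min_truth, min_count) is hypothetical.

```python
# Assumed sketch of a fitness of the kind described in Section IV-C.
def learning_error(y_pred, y_true, output_ranges):
    # Normalized mean absolute error over all patterns and outputs (assumed metric).
    total, count = 0.0, 0
    for yp, yt in zip(y_pred, y_true):
        for o, (p, t) in enumerate(zip(yp, yt)):
            total += abs(p - t) / output_ranges[o]
            count += 1
    return total / count

def fitness(y_pred, y_true, output_ranges, used_terms, total_terms,
            used_inputs, total_inputs, rule_truths,
            k_s=0.0, k_f=0.0, k_phi=0.0, min_truth=0.5, min_count=2):
    e = learning_error(y_pred, y_true, output_ranges)
    s = used_terms / total_terms            # antecedent/consequent usage ratio
    f = used_inputs / total_inputs          # fraction of features actually used
    # phi: fraction of rules firing with truth >= min_truth on >= min_count patterns.
    counts = [sum(1 for t in per_pattern if t >= min_truth)
              for per_pattern in rule_truths]
    phi = sum(1 for c in counts if c >= min_count) / len(counts)
    return (1.0 + k_s * (1.0 - s) + k_f * (1.0 - f) + k_phi * phi) / (e + 1e-9)

# Tiny usage example: two patterns, one output with range 10, two rules.
y_true = [[2.0], [8.0]]
y_pred = [[2.5], [7.0]]
rule_truths = [[0.9, 0.1], [0.2, 0.8]]   # rule_truths[r][p] = premise truth of rule r on pattern p
print(fitness(y_pred, y_true, [10.0], used_terms=5, total_terms=9,
              used_inputs=2, total_inputs=3, rule_truths=rule_truths,
              k_s=0.5, k_f=0.5, k_phi=0.2))
```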
Fig. 8. The function chosen.

V. FuGeNeSYS EVALUATION

A. Relationship Among the Variable Parameters of FuGeNeSys and Its Learning Performance

This section gives results which relate the various magnitudes modifiable in FuGeNeSys to its learning performance. As we shall see, numerous learning phases were run. For most of them a learning problem that is quite well known in literature was chosen

(27)

In [45] and [44], the function (27) is approximated (see Fig. 8). Both the learning and testing examples are taken to be uniformly distributed in the interval defined for the independent variable.

In this section, the author gives the results obtained relating the performance of FuGeNeSys to the size of the genetic grid and the number of iterations.

Fig. 9 shows the results obtained. The left-hand side of the histogram shows the percent learning error [see (33)], while the right-hand side shows the testing error (the learning and testing patterns are always the same and contain 20 and 101 points, respectively). One axis shows the number of iterations, another the error obtained, and the remaining axis the size of the grid. All the simulations were performed using the WS defuzzification method. The procedure was as follows. Having established the size of the grid, a certain number of runs were performed until a pre-established confidence interval was reached, discarding all the runs where the learning error was higher than 5% and performing at least 50 runs. The few runs that were discarded are to be attributed to the phenomenon of premature convergence normally encountered in genetic algorithms. The confidence interval value chosen was one with which, on average, 95% of the solutions fell within the value given in Fig. 9—more or less 0.1%.

The error depends directly on the two parameters considered. It decreases as the number of iterations and the size of the grid increase. In addition, as the size of the population
increases, the performance of FuGeNeSys gets better, but to obtain the best performance, the number of genetic iterations must also be increased.

Fig. 9. Percent learning and testing error as a function of the iteration number and grid dimension (WS defuzzification method).

Fig. 10. Percent learning and testing error as a function of the iteration number and grid dimension (WM defuzzification method).

Using WM as the defuzzification method, we obtain the histogram shown in Fig. 10.

The behavior is similar to that shown in Fig. 9, but performance has clearly deteriorated. As the results were on average worse, the few simulations discarded were those which led to a learning error greater than 10%.

The obvious difference between the two results is to be attributed to the fact that the second defuzzification method is less suitable for this kind of function.

Where the WM method is suitable, on the other hand, is in problems of classification rather than interpolation. Fig. 11 shows an example of a function to be approximated in which the learning performance is decidedly in favor of this defuzzification method.

Fig. 11. The function used as problem of classification.

Figs. 12 and 13 give the results relating to the FuGeNeSys capability of generalization (testing) when the WS and WM methods are used.

In general, it can be said that the more "softly" a function varies, the more convenient it is to use the WS method. In contrast, the more "sharply" it varies, the more convenient it is to use the WM method (for example, see f1 and f2 in Fig. 7).

It should, however, be pointed out that the average amount of time required for an iteration clearly suggests using the less computationally onerous method, i.e., WS.

Another study performed is related to the importance of the introduction of the hill-climbing genetic operator, which involves backpropagation to the beginning of the generation of the grid every time a new, better individual is born.

Let us recall that backpropagation is performed as long as there are subsequent improvements. As soon as the opposite
happens, the best solution obtained is kept. The weights of each type of parameter to be learned are dynamically adapted by means of an automatic procedure.

Fig. 12. Testing results using WS.

Fig. 13. Testing results using WM.

Fig. 14. Ten runs without backpropagation using WS as defuzzification method.

Fig. 14 shows the results of ten runs without the hill-climbing operator, using a grid of 10 × 10, five rules, 5000 iterations, and WS as the defuzzification method. The function to be approximated is still (27). The results obtained (using WS) are on average five to six times worse (see Fig. 9).

B. Learning with Features Detection

This section will illustrate how FuGeNeSys is able to identify features correctly.

As a significant example, we chose to construct some patterns to extract a fuzzy knowledge base which approximates the function

(28)

The 50 patterns generated to teach FuGeNeSys this function were built as follows. Each pattern has eight inputs and one output. Two of the inputs correspond to the independent variables, both randomly and uniformly generated in the interval [0,100]. Three inputs are linear combinations of these two. The other three were generated quite randomly. More precisely, we have

(29)

Table I gives the learning examples in detail. Of course, a learning program that can detect features correctly has to be able to eliminate the three completely random inputs. It also has to eliminate three more of the five remaining features so as to end up with only two independent ones.

Various runs were performed and each time the result obtained was as expected.

Table II gives the results of three different runs (1000 iterations) using WS as the defuzzification method. As can be seen, only two independent features are selected. Once again, the error is a monotonically decreasing function of the number of rules. The values attributed to the fitness function coefficients are the same in all runs.

Fig. 15 gives the results obtained with three rules. The universes of discourse of the three variables in the figure are [0,200], [60,300] and [0,17 000], respectively. For the output, the singleton corresponding to the crisp output value is shown.

C. Comparisons

Some comparisons were made between the performance obtained with FuGeNeSys and that described in the literature.

1) Classification Problems: There are a lot of methods for generating fuzzy rules, but only a few approaches deal with classification tasks.

Typically, the generation method consists of two phases. The first is the fuzzy partition of the input pattern space
into fuzzy subspaces and the second phase deals with the determination of the rule for each subspace.

TABLE I
THE LEARNING SAMPLES

TABLE II
FuGeNeSYS BEHAVIOR AS A FUNCTION OF THE NUMBER OF RULES

Fig. 15. The three rules learned.

The performance of this system depends on the choice of the fuzzy partition. If it is too fine, many rules cannot be generated because of the lack of training patterns. If the partition is too coarse, the performance may be too low.

This approach needs a lot of rules as the number of inputs increases, as shown in Section II-G.

In [38], the authors use the concept of distributed fuzzy rules [39] where all fuzzy rules corresponding to several fuzzy partitions are simultaneously tested. They mix coarse and fine fuzzy partitions and with a GA select the minimum number of rules that permits the maximization of the number of correctly classified patterns.

This method works well. In fact, for example, with the well-known iris data [46] they obtain the correct classification of all the 150 patterns with only 13 rules.

The result obtained outperformed all the results reported in [47] for nine classification methods: fuzzy nearest neighbor, fuzzy c-means, hierarchical fuzzy c-means, fuzzy integral with perceptron, fuzzy pattern matching with minimum operator, and others.

Iris data are a four-input, one-output example. The number of rules was 2274, including 1582 dummy rules. So they constructed their compact fuzzy classification system by selecting only significant fuzzy rules from the remaining 692.

In this section the author shows the results obtained with the aid of FuGeNeSys for the same problem.

To the three classes iris setosa, virginica, and versicolor he respectively associated the integer numbers one, two, and three. He successfully trained a system of only five rules using WM defuzzification that correctly classifies all the 150 patterns.

In Fig. 16 the five rules are shown. The inputs are, in order, sepal length (SL) in cm (in the range [4.3,7.9]), sepal width (SW) in cm (in the range [2.0,4.4]), petal length (PL) in cm (in the range [1.0,6.9]), and petal width (PW) in cm (in the range [0.1,2.5]). Naturally, the output is in the range [0,2]. If the output is below 0.5, the class is iris setosa; if it is comprised in [0.5,1.5], the class is iris virginica. Otherwise it is the remaining class.
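For reference, a tiny helper that mirrors the class decoding just described (thresholds 0.5 and 1.5 on the single crisp output):

```python
# Map the crisp output in [0, 2] to an iris class using the quoted thresholds.
def decode_iris_class(y):
    if y < 0.5:
        return "Iris setosa"
    if y <= 1.5:
        return "Iris virginica"
    return "Iris versicolor"

print([decode_iris_class(v) for v in (0.1, 1.2, 1.9)])
```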
FuGeNeSys thus has an excellent classification capacity and is capable of extracting extremely compact fuzzy inferences.

2) Approximation Problems: In [44] and [45], function (27) is used to test the learning capacity of the method proposed. In both cases, 21 points were taken as learning
patterns and 101 uniformly distributed in the domain of the independent variable as testing patterns. The following formulas were taken as performance indexes for the learning and generalization capacity, respectively:

(30)

(31)

where the output produced by the tool after the learning phase is compared with the desired value.

Fig. 16. The iris fuzzy classifier trained.

Fig. 17. Ten runs arranged according to testing error. The ordinate value is the index defined by Ralescu [45].

Fig. 18. An example of the robot trajectory.

Fig. 17 gives the results of ten runs (using WS), none of which were discarded, ordered according to their testing error. 20 000 iterations were performed with five rules. The number of iterations has to be higher than that used in the two methods described due to the genetic nature of the algorithm and the total absence of a precalculation phase prior to application of the algorithm.

The number of rules was determined as follows. In [44], where the best performance was obtained in terms of both learning capacity and compactness of the fuzzy knowledge base, only four rules were used for a total of 16 parameters to be optimized. As FuGeNeSys only requires three parameters per rule, the number of parameters to be optimized was set to 15. Therefore, five rules were chosen.

As can be seen, in all cases performance is better than that reported in [45], while two out of the ten runs give better performance than the second method in [44]. Also, in this case, FuGeNeSys has an excellent learning capacity (function approximation) and is capable of extracting extremely compact fuzzy inferences. In this case, the number of rules extracted possesses a number of parameters even lower than those reported in [44].

3) Fuzzy Control: Typically, fuzzy control applications need a number of fuzzy rules which becomes very large as soon as the number of inputs increases. However, the most recent works are gaining in compactness.

Homaifar and McCormick [48] needed 25 rules to control the two-input, one-output fuzzy cart-centering controller and 35 rules to control the two-input, one-output fuzzy truck-backing controller.

An approach that produces more compact fuzzy databases is described in [49]. Lin and Lin needed ten fuzzy rules in the action network and six in the critic network to control the cart-pole system. Further, they needed 14 rules in the action network and nine in the critic network to control the four-input, one-output ball and beam system.

Actually, it is impossible to compare FuGeNeSys directly with other GA approaches for fuzzy control.

In fact, FuGeNeSys is a tool that is able to extract fuzzy knowledge only from samples of the function or of the classification problem to be approximated.

In a future version of FuGeNeSys the author thinks he will modify the fitness function, in a similar way as done in the related works, to obtain a system that works in control fields.

However, he has already used FuGeNeSys in a motion planning problem [2]. He considered the problem of navigation of a mobile robot in unknown two-dimensional environments with obstacles.
The robot is memoryless, i.e., it does not build an internal map of the environment. Obstacles are convex and comparable with the robot size.
The five inputs of the fuzzy controller the author developed are the outputs of three proximity sensors, the rotation angle required to reach the desired output direction, and the previous direction output of the fuzzy controller. The fuzzy controller outputs are two: the direction and the speed.
The author manually generated several samples of the desired controlling sequence in some typical cases. The system he obtained is made of only five rules. Fig. 18 shows an example of the robot trajectory obtained.

Fig. 18. An example of the robot trajectory.
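Purely as an illustrative sketch of this five-input, two-output interface, and not of the learned rule base itself, the fragment below fixes the controller's signature in code; every name and the stand-in steering logic are hypothetical, since the text only states what the inputs and outputs are.

from dataclasses import dataclass

@dataclass
class Percept:
    # The five controller inputs described in the text; the individual sensor
    # names are hypothetical, as the text only mentions "three proximity sensors".
    proximity_a: float
    proximity_b: float
    proximity_c: float
    angle_to_goal: float        # rotation angle needed to face the desired direction
    previous_direction: float   # direction output of the previous control step

@dataclass
class Command:
    # The two controller outputs described in the text.
    direction: float
    speed: float

def controller(p: Percept) -> Command:
    # Stand-in logic only (NOT the five learned fuzzy rules): steer toward the
    # goal and slow down when any proximity reading signals a nearby obstacle
    # (readings close to 0 are assumed here to mean "obstacle close").
    nearest = min(p.proximity_a, p.proximity_b, p.proximity_c)
    speed = 1.0 if nearest > 0.5 else 0.2
    return Command(direction=p.angle_to_goal, speed=speed)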
VI. CONCLUSIONS

The FuGeNeSys program presented in the paper generates simple, effective fuzzy models of complex systems from knowledge of the input–output data. The learning technique used is essentially based on GA's. To enhance the learning speed, a hill-climbing genetic operator based on neural techniques has been used. FuGeNeSys is also capable of correctly selecting significant features. It has been demonstrated that the results achieved represent a significant advance in the use of mixed techniques in soft computing.

ACKNOWLEDGMENT

The author would like to thank Prof. G. V. Russo and the reviewers for their helpful suggestions in improving the quality of the final manuscript.
REFERENCES

[1] L. A. Zadeh, "Fuzzy logic, neural networks, and soft computing," Commun. ACM, vol. 37, pp. 77–84, Mar. 1994.
[2] M. Russo, "Metodi hardware e software per logiche di tipo non tradizionale," Ph.D. dissertation, Univ. Catania, Catania, Italy, 1996.
[3] N. A. Santagati, E. Lo Pinto, and M. Russo, "Fuzzy logic can help medicinal chemistry," in Proc. Artificial Neural Networks Eng. Conf., St. Louis, MO, June 1995, pp. 297–302.
[4] F. Beritelli, S. Casale, and M. Russo, "A voice/unvoiced speech discrimination technique based on fuzzy logic," in Proc. 4th Eur. Conf. Speech Commun. Technol., Madrid, Spain, Sept. 1995, vol. 1, pp. 389–392.
[5] ——, "Encoding of PARCOR coefficients using fuzzy rules," in 8th IEEE Mediterranean Electrotech. Conf., Bari, Italy, May 1996, pp. 1659–1662.
[6] ——, "Robust phase reversal tone detection using soft computing," in IEEE Proc. 3rd Int. Symp. Uncertainty Modeling Anal. Annu. Conf. North Amer. Fuzzy Inform. Processing Soc., College Park, MD, Sept. 1995, pp. 589–594.
[7] F. Beritelli, S. Casale, and M. Russo, "Multilevel speech classification based on fuzzy logic," in Proc. IEEE Workshop Speech Coding Telecommun., Annapolis, MD, Sept. 1995, pp. 97–98.
[8] ——, "Multimode speech coding decision based on fuzzy logic," in Proc. Int. Conf. Signal Image Processing, Las Vegas, NV, Nov. 1995, pp. 72–75.
[9] V. Catania, M. Malgeri, and M. Russo, "Applying fuzzy logic to codesign partitioning," IEEE Micro, vol. 17, pp. 62–70, May 1997.
[10] V. Catania and M. Russo, "Analog gates for a VLSI fuzzy processor," in 8th Int. Conf. VLSI Design, New Delhi, India, Jan. 1995, pp. 299–304.
[11] A. Gabrielli, E. Gandolfi, M. Masetti, and M. Russo, "Design of a VLSI very high speed reconfigurable digital fuzzy processor," in Proc. ACM Symp. Appl. Comput., Nashville, TN, Feb. 1995, pp. 477–481, invited talk.
[12] G. V. Russo, C. Petta, D. Lo Presti, N. Randazzo, and M. Russo, "Silicon drift detectors readout and ON-LINE data reduction using a fast VLSI dedicated fuzzy processor," in Proc. 2nd Annu. Joint Conf. Inform. Sci., Wrightsville Beach, NC, Sept. 1995, pp. 64–67.
[13] ——, "Silicon drift detector readout and ON-LINE data reduction using a fast VLSI fuzzy processor," Inform. Sci., vol. 95, pp. 233–260, Dec. 1996.
[14] V. Catania, S. Cavalieri, and M. Russo, "Tuning Hopfield neural network by a fuzzy approach," in Proc. IEEE Int. Conf. Neural Networks, Washington, DC, June 1996, vol. II, pp. 1067–1072.
[15] S. Cavalieri and M. Russo, "Improving Hopfield neural network performance by fuzzy logic-based coefficient tuning," Neurocomput., vol. 18, pp. 107–126, 1998.
[16] M. Russo, N. A. Santagati, and E. Lo Pinto, "Medicinal chemistry and fuzzy logic," Inform. Sci., vol. 105/1–4, pp. 299–314, May 1998.
[17] M. Russo, "Comments on 'A new approach to fuzzy-neural system modeling'," IEEE Trans. Fuzzy Syst., vol. 4, pp. 209–210, May 1996.
[18] C. Darwin, On the Origin of Species by Means of Natural Selection. London, U.K.: Murray, 1859.
[19] D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning. Reading, MA: Addison-Wesley, 1989.
[20] J. H. Holland, "Adaptive plans optimal for payoff-only environments," in Proc. 2nd Int. Conf. Syst. Sci., HI, Feb. 1969, pp. 917–920.
[21] ——, Adaptation in Natural and Artificial Systems. Ann Arbor, MI: Univ. Michigan Press, 1975.
[22] D. B. Fogel, "An introduction to simulated evolutionary optimization," IEEE Trans. Neural Networks, vol. 5, pp. 3–14, Jan. 1995.
[23] E. Rietman, Genesis Redux—Experiments Creating Artificial Life, 1st ed. New York: McGraw-Hill, 1994.
[24] A. Tettamanzi, "Algoritmi evolutivi per l'ottimizzazione," Ph.D. dissertation, Univ. Milan, Milan, Italy, 1995.
[25] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs. New York: Springer-Verlag, 1994.
[26] S. Uckun, S. Bagchi, and K. Kawamura, "Managing genetic search in job shop scheduling," IEEE Expert, vol. 8, pp. 15–24, Oct. 1993.
[27] J. R. Koza, "Evolution and co-evolution of computer programs to control independently-acting agents," in From Animals to Animats, J. A. Meyer and S. W. Wilson, Eds. Cambridge, MA: MIT Press, 1991.
[28] J. R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection. Cambridge, MA: MIT Press, 1993.
[29] T. Blinkle, "Theory of evolutionary algorithms and application to system synthesis," Ph.D. dissertation, Swiss Fed. Inst. Technol., Zurich, Switzerland, 1996.
[30] J. E. Baker, "An analysis of the effects of selection in genetic algorithms," Ph.D. dissertation, Vanderbilt Univ., Nashville, TN, 1989.
[31] D. Back and F. Hoffmeister, "Extended selection methods for genetic algorithms," in Proc. 1st Int. Conf. Genetic Algorithms, San Mateo, CA, 1991.
[32] G. Rudolph, "Convergence analysis of canonical genetic algorithms," IEEE Trans. Neural Networks, vol. 5, pp. 96–101, Jan. 1994.
[33] S. H. Park, Y. H. Kim, Y. K. Choi, H. C. Cho, and H. T. Jeon, "Self-organization of fuzzy rule base using genetic algorithms," in Proc. 5th IFSA World Congress, Seoul, Korea, July 1993, pp. 881–886.
[34] C. L. Karr and E. J. Gentry, "Fuzzy control of pH using genetic algorithms," IEEE Trans. Fuzzy Syst., vol. 1, pp. 46–53, Feb. 1993.
[35] M. A. Lee and H. Takagi, "Integrating design stages of fuzzy systems using genetic algorithms," in Proc. IEEE Int. Conf. Fuzzy Syst., San Francisco, CA, Mar. 1993, pp. 612–617.
[36] ——, "Embedding a priori knowledge into an integrated fuzzy system design method based on genetic algorithms," in Proc. 5th IFSA World Congress, Seoul, Korea, July 1993, pp. 1293–1296.
[37] T. Takagi and M. Sugeno, "Fuzzy identification of systems and its applications to modeling and control," IEEE Trans. Syst., Man, Cybern., vol. 15, pp. 116–132, 1985.
[38] H. Ishibuchi, K. Nozaki, N. Yamamoto, and H. Tanaka, "Selecting fuzzy if–then rules for classification problems using genetic algorithms," IEEE Trans. Fuzzy Syst., vol. 3, pp. 260–270, Aug. 1995.
[39] H. Ishibuchi, K. Nozaki, and H. Tanaka, "Distributed representation of fuzzy rules and its application to pattern classification," Fuzzy Sets Syst., vol. 52, pp. 21–32, 1992.
[40] L. A. Zadeh, "Fuzzy sets," Inform. Contr., vol. 8, pp. 338–353, 1965.
[41] J. L. Castro, "Fuzzy logic controllers are universal approximators," Tech. Rep. DECSAI-93101, 6/1993, Dept. Comput. Sci. Artificial Intell., Univ. Granada, Spain, 1993.
[42] R. Jager, "Fuzzy logic in control," Ph.D. dissertation, Univ. Delft, Delft, The Netherlands, 1995.
[43] M. Figueiredo, F. Gomides, A. Rocha, and R. Yager, "Comparison of Yager's level set method for fuzzy logic control with Mamdani and Larsen methods," IEEE Trans. Fuzzy Syst., vol. 2, pp. 156–159, May 1993.
[44] Y. Lin and G. A. Cunningham, III, "A new approach to fuzzy-neural system modeling," IEEE Trans. Fuzzy Syst., vol. 3, pp. 190–198, May 1995.
[45] H. Narazaki and A. L. Ralescu, "An improved synthesis method for multilayered neural networks using qualitative knowledge," IEEE Trans. Fuzzy Syst., vol. 1, pp. 125–137, May 1993.
[46] R. A. Fisher, "The use of multiple measurements in taxonomic problems," Ann. Eugen., vol. 7, pp. 179–188, 1936.
[47] M. Grabisch and F. Dispot, "A comparison of some methods of fuzzy classification on real data," in Proc. 2nd Int. Conf. Fuzzy Logic Neural Networks, Iizuka, Japan, July 1992, pp. 659–662.
[48] A. Homaifar and E. McCormick, "Simultaneous design of membership functions and rule sets for fuzzy controllers using genetic algorithms," IEEE Trans. Fuzzy Syst., vol. 3, pp. 129–139, May 1995.
[49] C. J. Lin and C. T. Lin, "Reinforcement learning for an ART-based fuzzy adaptive learning control network," IEEE Trans. Neural Networks, vol. 7, pp. 709–731, May 1996.

Marco Russo received the "laurea" degree in electronic engineering and the Ph.D. degree in electronics, computer science, and telecommunications engineering, both from the University of Catania, Italy, in 1992 and 1996, respectively.
He is currently an Assistant Researcher at the University of Catania. His research areas are fuzzy logic, neural networks, genetic algorithms, hybrid systems, dedicated hardware, signal processing, high-energy physics, and artificial intelligence applications. He is on the editorial board of the International Journal of Knowledge-Based Intelligent Engineering Systems.