Introduction To Fuzzy Systems, Neural Networks, and Genetic Algorithms
Hideyuki TAKAGI
Kyushu Institute of Design
3 What Are Neural Networks

3.1 Analogy from biological neural networks

A biological neuron consists of a dendrite, a cell body,
and an axon (Figure 5(a)). The connections between
the dendrite and the axons of other neurons are
called synapses. Electric pulses coming from other
neurons are translated into chemical information at
each synapse. The cell body inputs these pieces of
information and fires an electric pulse if the sum of
the inputs exceeds a certain threshold. The network
consisting of these neurons is a NN, the most essen-
tial part of our brain activity.
The main function of the biological neuron is to output pulses according to the sum of multiple signals from other neurons, with the characteristics of a pseudo-step function. The second function of the neuron is to change the transmission rate at the synapses to optimize the whole network.
An artificial neuron model simulates the multiple inputs and one output, the switching function of the input-output relation, and the adaptive synaptic weights (Figure 5(b)). The first neuron model, proposed in 1943, used a step function for the switching function [(McCulloch & Pitts, 1943)]. However, the perceptron [(Rosenblatt, 1958)], a NN consisting of this type of neuron, has limited capability because of the constraints of its binary on/off signals. Today, several continuous functions, such as sigmoidal or radial functions, are used as neuron characteristic functions, which results in higher performance of NNs.

Figure 5: (a) a biological neuron, and (b) an artificial neuron model that outputs y = f(Σ wi xi).
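The neuron model of Figure 5(b) is small enough to sketch directly. The following is a minimal illustration, not code from this chapter; the sigmoidal characteristic function and the example values are assumptions.

```python
import math

def neuron(inputs, weights, f=None):
    """Artificial neuron of Figure 5(b): y = f(sum of w_i * x_i)."""
    if f is None:
        f = lambda u: 1.0 / (1.0 + math.exp(-u))  # sigmoidal characteristic function
    u = sum(w * x for w, x in zip(weights, inputs))
    return f(u)

# A step function, as in the 1943 McCulloch-Pitts model:
step = lambda u: 1.0 if u > 0 else 0.0

y = neuron([1.0, 0.0, 1.0], [0.4, 0.2, 0.9])                  # smooth output in (0, 1)
y_binary = neuron([1.0, 0.0, 1.0], [0.4, 0.2, 0.9], f=step)   # binary on/off output
```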
Several learning algorithms that change the synaptic weights have been proposed. The combination of artificial NNs and these learning algorithms has been applied to several engineering purposes.

3.2 Several types of artificial neural networks

Many NN models and learning algorithms have been proposed. Typical network structures include feed-back and feed-forward NNs. Learning algorithms are categorized into supervised learning and unsupervised learning. This section provides an overview of these models and algorithms.

Figure 6: (a) a feed-back neural network, and (b) a feed-forward neural network.

The feed-back networks are NNs that have connections between network outputs and some or all other neuron units (see Figure 6(a)). Certain unit outputs in the figure are used as activated inputs to the network, and other unit outputs are used as network outputs.

Due to the feed-back, there is no guarantee that the networks become stable. Some networks converge to one stable point, other networks fall into a limit cycle, and others become chaotic or divergent. These characteristics are common to all non-linear systems which have feed-back.

To guarantee stability, constraints on the synaptic weights are introduced so that the dynamics of the feed-back NN can be expressed by a Lyapunov function. Concretely, the constraint that the mutual connection weights of two units be equivalent is implemented. The Hopfield network is one such NN.

It is important to understand two aspects of the Hopfield network: (1) synaptic weights are determined by analytically solving constraints, not by performing an iterative learning process, and they are fixed while the Hopfield network runs; (2) final network outputs are obtained by running the feed-back network until it yields a solution of the application task.
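As a sketch of these two aspects, the toy Hopfield network below fixes symmetric weights analytically from the patterns to be stored (the standard outer-product prescription, an assumption here rather than a formula given in this chapter) and then runs the feed-back dynamics until the state stops changing.

```python
def hopfield_weights(patterns):
    """Aspect (1): weights are fixed analytically, not learned iteratively.
    Mutual connection weights are symmetric (w[i][j] == w[j][i]), w[i][i] = 0."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w

def run(w, state, max_steps=100):
    """Aspect (2): outputs are obtained by running the feed-back network
    until it reaches a stable point; the weights stay fixed while it runs."""
    for _ in range(max_steps):
        new = [1 if sum(wij * s for wij, s in zip(row, state)) >= 0 else -1
               for row in w]
        if new == state:      # converged to a stable point
            return new
        state = new
    return state

stored = [[1, -1, 1, -1, 1, -1],
          [1, 1, 1, -1, -1, -1]]
w = hopfield_weights(stored)
print(run(w, [1, -1, 1, -1, 1, 1]))   # settles on the nearer stored pattern
```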
Another type of NN, contrasted with the feed-back type, is the feed-forward type. The feed-forward network is a filter which outputs the processed input signal. Several algorithms determine the synaptic weights so as to make the outputs match the desired results.

Supervised learning algorithms adjust synaptic weights using input-output data to match the input-output characteristics of a network to desired characteristics. The most frequently used algorithm, the backpropagation algorithm, is explained in detail in the next section.

Figure 7: (a) Hebbian learning algorithms strengthen the weight wi when the input xi activates the neuron y. (b) Competitive learning algorithms strengthen only the weights connected to the unit whose output is the biggest.

Unsupervised learning algorithms use a mechanism that changes the synaptic weight values according to the input values to the network, unlike supervised learning, which changes the weights according to supervised data for the output of the network. Since the output characteristics are determined by the …
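The two unsupervised rules of Figure 7 can be sketched as updates that depend only on the network's own inputs and outputs; the learning rate and the exact winner-update form below are illustrative assumptions.

```python
def hebbian_update(weights, x, y, lr=0.1):
    """Figure 7(a): strengthen the weight w_i when input x_i activates neuron y."""
    return [w + lr * xi * y for w, xi in zip(weights, x)]

def competitive_update(units, x, lr=0.1):
    """Figure 7(b): only the unit with the biggest output has its weights updated."""
    outputs = [sum(w * xi for w, xi in zip(unit, x)) for unit in units]
    winner = outputs.index(max(outputs))
    units[winner] = [w + lr * (xi - w)          # pull the winner toward the input
                     for w, xi in zip(units[winner], x)]
    return units
```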
Figure 11: Analysis of the inside of a NN. Triangular points are training data, and the horizontal and vertical axes are the input and output axes. (a1)-(a5) are the input-output characteristics of a NN trained for 10, 100, 200, 400, and 500 iterations, respectively. (b1)-(b4), the characteristics of four trained sigmoidal functions in a hidden layer, are f(22.3x − 16.1), f(1.49x − 0.9), f(−20.7x + 10.3), and f(−21.5x + 4.9), respectively, where the argument w1 x + w0 is formed from the trained weights wi and the input variable x. (c1) is the same as (a5): the input-output characteristics of the trained NN. (c2) is the overlapped display of (b1)-(b4). Comparison of (c1) and (c2) shows that the final NN input-output characteristics are formed by weighting and combining the sigmoidal functions inside.
The features of evolutionary computation are that its search or optimization is conducted (1) based on multiple searching points or solution candidates (population-based search), (2) using operations inspired by biological evolution, such as crossover and mutation, (3) based on probabilistic search and probabilistic operations, and (4) using little information about the searching space, such as the differential information mentioned in section 3.3. Typical paradigms of evolutionary computation include GA (genetic algorithms), ES (evolution strategies), EP (evolutionary programming), and GP (genetic programming).

GAs usually represent solutions as chromosomes with bit coding (genotype) and search for better solution candidates in the genotype space using the GA operations of selection, crossover, and mutation. The crossover operation is the dominant operator.

ESs represent solutions as chromosomes with real number coding (phenotype) and search for better solutions in the phenotype space using the ES operations of crossover and mutation. The mutation of a real number is realized by adding Gaussian noise, and the ES controls the parameters of the Gaussian distribution, allowing it to converge to a global optimum.
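A sketch of the ES mutation just described: Gaussian noise is added to a real-coded solution, and the standard deviation sigma, the parameter of the Gaussian distribution, is itself adapted. The log-normal adaptation rule below is one common choice, assumed here for illustration.

```python
import math
import random

def es_mutate(x, sigma, tau=0.1):
    """Mutate a real-coded (phenotype) solution by adding Gaussian noise.
    sigma is also mutated, so the ES controls its own Gaussian distribution."""
    new_sigma = sigma * math.exp(tau * random.gauss(0.0, 1.0))
    child = [xi + random.gauss(0.0, new_sigma) for xi in x]
    return child, new_sigma
```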
EPs are similar to GAs. The primary difference is that mutation is the only EP operator. EPs use real number coding, and the mutation sometimes changes the structure (length) of the EP code. It is said that the similarities and differences come from their backgrounds: GAs started from the simulation of genetic evolution, while EPs started from that of environmental evolution.

GPs use tree structure coding to represent a computer program or to create new structures for tasks. The crossover operation acts not on a numerical value but on a branch of the tree structure. Consider the relationship of these paradigms with NN applications: GAs and ESs determine the best synaptic weights, which is NN learning; GP, however, determines the best NN structure, which is NN configuration.

It is beyond the scope of this chapter to describe these paradigms in detail. We will focus only on GAs in the following sections and see how the GA searches for solutions.
Table 1: Technical terms used in the GA literature.
4.2 GA as a searching method

It is important to be acquainted with the technical terms of GAs. Table 1 lists some of the terms frequently used.

There are advantages and one distinct disadvantage to using GAs as a search method. The advantages are: (1) fast convergence to a near global optimum, (2) superior global searching capability in a space which has a complex searching surface, and (3) applicability to a searching space where we cannot use gradient information.

The first and second advantages originate in population-based searching. Figure 12 shows this situation. The gradient method determines the next searching point using the gradient information at the current searching point. The GA, on the other hand, determines the multiple next searching points using the evaluation values of multiple current searching points. When only the gradient information is used, the next searching point is strongly influenced by the local geometric information of the current searching points, and the search is sometimes trapped at a local minimum (see Figure 12). On the contrary, the GA determines the next searching points using the fitness values of the current searching points, which are widely spread throughout the searching space, and it has the mutation operator to escape from a local minimum. This is why these advantages are realized.

The key disadvantage of GAs is that convergence becomes slow near the global optimum, since the GA search is based not on gradient information but on GA operations. There are several proposals to combine the two searching methods.

Figure 12: searching points of the GA (multiple points spread over the space) and of the gradient method (a single point following local gradients).

4.3 GA operations

Figures 13 and 14 show the flows of the GA process and GA data, respectively. Six possible solutions are expressed in bit code in Figure 14; this is the genotype expression. The solution expression of the bit code is decoded to values used in an application task; this is the phenotype expression. The multiple possible solutions are applied to the application task, and each is evaluated. These evaluation values are fitness values. The GA feeds back the fitness values and selects current possible solutions according to their fitness values; these become the parent solutions that determine the next searching points. This idea is based on the expectation that better parents can probabilistically generate better offspring. The offspring in the next generation are generated by applying the GA operations, crossover and mutation, to the selected parent solutions. This process is iterated until the GA search converges to the required level. The individual GA operations are explained in the following sections.

Figure 13: The flow of the GA process: initializing individuals, evaluation, selection of parents, crossover, and mutation.

Figure 14: The flow of GA data: the GA engine applies the GA operations (selection, crossover, mutation) to chromosomes (genotype), which are decoded into possible solutions for the task (phenotype) and evaluated.
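The iterative flow of Figure 13 can be sketched as a short loop. This is an illustrative skeleton, not code from this chapter: `evaluate` stands for whatever the application task provides (assumed to return a positive fitness), and the operators used are the simplest forms of the selection, crossover, and mutation explained in the following sections.

```python
import random

def ga_loop(evaluate, n_bits=12, pop_size=6, generations=100, mutation_rate=0.01):
    """Figure 13: initialize individuals, then iterate evaluation,
    selection of parents, crossover, and mutation."""
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        fitness = [evaluate(bits) for bits in pop]     # decode genotype, score phenotype
        best = pop[fitness.index(max(fitness))]
        next_pop = [best]                              # elitist strategy (section 4.4)
        while len(next_pop) < pop_size:
            p1, p2 = random.choices(pop, weights=fitness, k=2)  # roulette wheel selection
            cut = random.randint(1, n_bits - 1)                 # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - b if random.random() < mutation_rate else b for b in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=evaluate)
```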
4.4 GA operation: selection

Selection is an operation to choose parent solutions; new solution vectors in the next generation are calculated from them. Since it is expected that better parents generate better offspring, parent solution vectors which have higher fitness values have a higher probability of being selected.

There are several selection methods. Roulette wheel selection is a typical one. The probability of being a winner is proportional to the area rate of a chosen number on a roulette wheel. From this analogy, roulette wheel selection gives individuals selection probabilities in proportion to their fitness values.

The scale of fitness values is not always suitable as a scale of selection probability. Suppose there are a few individuals whose fitness values are very high and others whose values are very low. Then the few parents are almost always selected, and the variety of their offspring becomes small, which results in convergence to a local minimum.
Rank-based selection uses an order scale of fitness values instead of a distance scale. For example, it gives the selection probabilities of (100, 95, 90, 85, 80, ...) for the fitness values of (98, 84, 83, 41, 36, ...).

Since the scales of fitness and selection are different, scaling is needed to calculate selection probabilities from fitness values. Rank-based selection can be called linear scaling. There are several scaling methods, and the scaling influences GA convergence.

The elitist strategy is an approach that copies the best n parents into the next generation as they are. The fitness values of offspring do not always become better than those of their parents. The elitist strategy prevents the best fitness value in the offspring generation from becoming worse than that in the previous generation by copying the best parent(s) into the offspring generation.
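A sketch of the two selection schemes above: `roulette_select` draws individuals with probability proportional to their scores, so feeding it raw fitness values gives roulette wheel selection, while feeding it rank scores gives rank-based selection. The step size of 5 mirrors the (100, 95, 90, ...) example; the function names are illustrative.

```python
import random

def roulette_select(population, scores):
    """Spin the wheel: selection probability is proportional to each score."""
    return random.choices(population, weights=scores)[0]

def rank_scores(fitness, top=100, step=5):
    """Rank-based scaling: replace raw fitness by an order scale (100, 95, 90, ...)."""
    order = sorted(range(len(fitness)), key=lambda i: fitness[i], reverse=True)
    scores = [0] * len(fitness)
    for rank, i in enumerate(order):
        scores[i] = top - step * rank
    return scores

population = ["A", "B", "C", "D", "E"]
fitness = [98, 84, 83, 41, 36]
print(rank_scores(fitness))                           # [100, 95, 90, 85, 80]
parent = roulette_select(population, rank_scores(fitness))
```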
4.5 GA operation: crossover

Crossover is an operation to combine multiple parents and make offspring; it is the most essential operation in GAs. There are several ways to combine parent chromosomes. The simplest is called one-point crossover: the parent chromosomes are cut at one point, and the cut parts are exchanged. Crossover that uses two cut points is called two-point crossover. Their natural extension is called multi-point crossover or uniform crossover. Figure 15 shows these standard types of crossover.

There are several variations of crossover. One unique variation is the simplex crossover [(Bersini & Seront, 1992)]. The simplex crossover uses two better parents and one poor parent and makes one offspring (the bottom of Figure 15). When both better parents have the same 0 or 1 at a certain bit position, the offspring copies that bit into the same bit position. When the better parents have different bits at a certain bit position, the complement of the poor parent's bit is copied to the offspring. This is analogous to learning something from bad behavior. The left end bit of the example in Figure 15 is the former case, and the second left bit is the latter case.

Figure 15: Standard types of crossover (one-point, two-point, and uniform crossover, the last exchanging bits according to a template) applied to parent 1 = 100101 and parent 2 = 010011, and the simplex crossover, in which the better parents 100101 and 010011 and the poor parent 011001 yield the offspring 100111.
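Sketches of one-point crossover and the simplex crossover as described above, for bit strings stored as lists of 0s and 1s; the final line checks the simplex example of Figure 15.

```python
import random

def one_point_crossover(p1, p2):
    """Cut both parents at one random point and exchange the cut parts."""
    cut = random.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def simplex_crossover(better1, better2, poor):
    """Where the better parents agree, copy their bit; where they differ,
    copy the complement of the poor parent's bit (learning from bad behavior)."""
    return [b1 if b1 == b2 else 1 - p for b1, b2, p in zip(better1, better2, poor)]

# The simplex example of Figure 15:
assert simplex_crossover([1, 0, 0, 1, 0, 1],
                         [0, 1, 0, 0, 1, 1],
                         [0, 1, 1, 0, 0, 1]) == [1, 0, 0, 1, 1, 1]
```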
4.6 GA operation: mutation

When parent chromosomes have similar bit patterns, the distance between the parents and the offspring created by crossover is small in the genotype space. This means that crossover cannot escape from a local minimum if individuals are concentrated near the local minimum. If the parents in Figure 16 are only the black and white circles, offspring obtained by combining the bit strings of any of these parents will be located nearby.

Mutation is an operation to avoid this trapping in a local area by changing bits of a chromosome. Suppose the white individual jumps to the gray point in the figure. Then the searching area of the GA spreads widely. The mutated point has the second highest fitness value in the figure. If the point with the highest fitness value and the mutated gray point are selected and mate, then their offspring can be expected to be close to the global optimum.

If the mutation rate is too high, the GA search becomes a random search, and it becomes difficult to converge quickly to the global optimum.
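A bit-flip mutation sketch; the per-bit rate is an assumed parameter, kept small because, as noted above, too high a rate turns the GA into a random search.

```python
import random

def mutate(bits, rate=0.01):
    """Flip each bit with a small probability, letting an individual
    jump to a distant point in the genotype space."""
    return [1 - b if random.random() < rate else b for b in bits]
```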
4.7 Example

The knapsack problem provides a concrete image of GA applications and demonstrates how to use GAs. Figure 17 illustrates the knapsack problem and its GA coding. The knapsack problem is the task of finding the optimum combination of goods whose total amount is the closest to the amount of the knapsack, …
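A sketch of the GA coding that Figure 17 describes: each bit of the chromosome says whether the corresponding good is packed, and the fitness rewards a total amount close to the amount the knapsack holds. The item amounts, the capacity, and the penalty for overfilling are all made-up assumptions. A fitness function like this can be handed to a loop like the one sketched in section 4.3.

```python
def knapsack_fitness(bits, amounts, capacity):
    """Decode the genotype (bit string) into a set of packed goods and
    score how close their total amount comes to the knapsack capacity."""
    total = sum(a for b, a in zip(bits, amounts) if b)
    if total > capacity:          # overfilled: assumed zero fitness
        return 0.0
    return total / capacity       # 1.0 means a perfectly filled knapsack

amounts = [2, 3, 5, 7, 11]        # made-up amounts of the goods
capacity = 17                     # made-up knapsack amount
print(knapsack_fitness([0, 1, 1, 1, 0], amounts, capacity))  # 15/17 = 0.88...
```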
Figure 18: Rolling mill control by fuzzy and neural systems. The scanned pattern of the plate surface is recognized by an NN. The output value of each output unit of the NN is used as the rule strength of the corresponding fuzzy rule.

Figure 19: FAX OCR: a hand-written character recognition system for a FAX ordering system.
… [(Takagi & Hayashi, 1991)]. The purpose of this model is to design the entire shape of non-linear multi-dimensional membership functions with an NN. The outputs of the NN are the rule strengths of each rule, which are a combination of the membership values in the antecedents.

The Hitachi rolling mill uses the model to control 20 rolls that flatten iron, stainless steel, and aluminum plates to a uniform thickness [(Nakajima et al., 1993)]. An NN inputs the scanned surface shape of a plate reel and outputs the similarity between the input shape pattern and standard template patterns (see Figure 18). Since there is a fuzzy control rule for each standard surface pattern, the outputs of the NN indicate how strongly each control rule is activated. Treating the NN outputs as the rule strengths of all fuzzy rules, each control value is weighted, and the final control values of the 20 rolls are obtained to make the plate flat at the scanned line.
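The combination step can be sketched as a weighted sum: the NN's similarity outputs serve as rule strengths, and each rule's control values are weighted by them. The normalization and all numbers below are illustrative assumptions; the chapter does not give the plant's exact formula.

```python
def combine_controls(strengths, rule_controls):
    """Weight each rule's control values by its NN-derived rule strength
    and normalize, giving one control value per roll."""
    total = sum(strengths)
    n_rolls = len(rule_controls[0])
    return [sum(s * rc[roll] for s, rc in zip(strengths, rule_controls)) / total
            for roll in range(n_rolls)]

# Hypothetical example: 3 template patterns (one fuzzy rule each), 20 rolls.
strengths = [0.7, 0.2, 0.1]                            # NN outputs for one scanned shape
rule_controls = [[0.1] * 20, [0.5] * 20, [0.9] * 20]   # made-up per-rule control tables
print(combine_controls(strengths, rule_controls))
```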
Another approach parameterizes an FS and tunes the parameters to optimize the FS using an NN [(Ichihashi & Watanabe, 1990)] or a GA [(Karr et al., 1989)]. For example, the shape of a triangular membership function is defined by three parameters: left, center, and right. The consequent part is also parameterized.
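The three-parameter triangular membership function just mentioned is easy to write down; an NN or GA then tunes (left, center, right), and similar parameters in the consequent part, to optimize the FS. The example values are arbitrary.

```python
def triangular_mf(x, left, center, right):
    """Membership grade of x for a triangle defined by (left, center, right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= center:
        return (x - left) / (center - left)
    return (right - x) / (right - center)

# A chromosome (or NN output) can encode the three parameters directly:
print(triangular_mf(4.0, left=2.0, center=5.0, right=8.0))  # 0.666...
```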
The approach using an NN was applied to develop several commercial products. The first neuro-fuzzy consumer products were introduced to the Japanese market in 1991. This is the type of Figure 20, and some of the applications are listed in section 5.3.

The approach using a GA has been applied to some Korean consumer products since 1994. Some of them are: cool air flow control of refrigerators; slow motor control of washing machines to wash wool and/or lingerie [(Kim et al., 1995)]; neuro-fuzzy models for dish washers, rice cookers, and microwave ovens to estimate the number of dishes, to estimate the amount of rice, and to increase control performance; and FSs for refrigerators, washing machines, and vacuum cleaners [(Shin et al., 1995)]. These neuro-fuzzy systems and FSs are tuned by GAs.

5.2 NN configuration based on fuzzy rule base

NARA is a structured NN constructed based on the IF-THEN fuzzy rule structure [(Takagi et al., 1992)]. A priori knowledge of a task is described by fuzzy rules, and small sub-NNs are combined according to the rule structure. Since a priori knowledge of the task is embedded into NARA, the complexity of the task can be reduced.

NARA has been used for a FAX ordering system. When electric shop retailers order products from a Matsushita Electric dealer, they write an order form by hand and send it by FAX. The FAX machine at the dealer site passes the FAX image to NARA. NARA recognizes the hand-written characters and sends character codes to the dealer's delivery center (Figure 19).

5.3 Combination of NN and FS

There are many consumer products which use both an NN and an FS in a combination of ways. Although there are many possible combinations of the two systems, …
Figure 20: Combination types of an FS and an NN: (a) independent use, (b) developing tool type, (c)
correcting output mechanism, and (d) cascade combination.