Definition of Soft Computing
Basically, soft computing is not a homogeneous body of concepts and techniques. Rather, it is a
partnership of distinct methods that in one way or another conform to its guiding principle. At this
juncture, the dominant aim of soft computing is to exploit the tolerance for imprecision and
uncertainty to achieve tractability, robustness, and low solution cost. The principal constituents of soft
computing are fuzzy logic, neurocomputing, and probabilistic reasoning, with the latter subsuming
genetic algorithms, belief networks, chaotic systems, and parts of learning theory. In the partnership
of fuzzy logic, neurocomputing, and probabilistic reasoning, fuzzy logic is mainly concerned with
imprecision and approximate reasoning; neurocomputing with learning and curve-fitting; and
probabilistic reasoning with uncertainty and belief propagation.
Soft computing could therefore be seen as a family of techniques and methods that allow real practical
situations to be dealt with in the same way humans deal with them, i.e. on the basis of
intelligence, common sense, consideration of analogies, and so on. In this sense, soft computing
is a family of problem-resolution methods headed by approximate reasoning and by methods of
functional approximation and optimisation, including search methods. Soft computing is therefore the
theoretical basis for the area of intelligent systems and it is evident that the difference between the
area of artificial intelligence and that of intelligent systems is that the first is based on hard computing
and the second on soft computing. Soft Computing is still growing and developing.
From this other viewpoint, on a second level, soft computing can then be expanded into other
components which contribute to a definition by extension, such as the one first given. From the
beginning (Bonissone 2002), the components considered most important at this second level
are probabilistic reasoning, fuzzy logic and fuzzy sets, neural networks, and genetic algorithms, which,
because of their interdisciplinary nature, applications, and results, immediately stood out over other
methodologies such as the previously mentioned chaos theory, evidence theory, etc. The popularity of
genetic algorithms, together with their proven efficiency in a wide variety of areas and applications,
their attempt to imitate natural creatures (e.g. plants, animals, humans), which are clearly soft (i.e.
flexible, adaptable, creative, intelligent, etc.), and especially their extensions and different versions,
transforms this fourth second-level ingredient into the well-known evolutionary algorithms, which
consequently comprise the fourth fundamental component of soft computing, as shown in Figure 2.
There are two main characteristics of fuzzy systems that give them better performance
for specific applications, as the sketch below illustrates.
Fuzzy systems are suitable for uncertain or approximate reasoning, especially for systems
whose mathematical models are difficult to derive.
Fuzzy logic allows decision making with estimated values under incomplete or uncertain
information.
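As a minimal illustration of decision making with estimated values, the following Python sketch (the triangular membership functions and the temperature values are hypothetical, not from the text) evaluates how strongly an imprecise measurement belongs to two fuzzy sets:

```python
def triangular(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical linguistic terms for a temperature variable.
cold = lambda t: triangular(t, -5.0, 5.0, 15.0)
warm = lambda t: triangular(t, 10.0, 20.0, 30.0)

t = 12.0  # an imprecise measurement
# Degrees of membership drive rules such as "IF temperature is cold THEN heat":
print("cold:", cold(t), "warm:", warm(t))  # cold: 0.3 warm: 0.2
```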
The brain is a massively parallel information processing system. This contrasts with conventional
computers, in which a single processor executes a single series of instructions.
As a discipline of Artificial Intelligence, Neural Networks attempt to bring computers a little
closer to the brain's capabilities by imitating certain aspects of information processing in the
brain, in a highly simplified way.
The brain is not homogeneous. At the largest anatomical scale, we distinguish cortex,
midbrain, brainstem, and cerebellum. Each of these can be hierarchically subdivided into
many regions, and areas within each region, either according to the anatomical structure of
the neural networks within it, or according to the function performed by them. The overall
pattern of projections (bundles of neural connections) between areas is extremely complex,
and only partially known. The best mapped (and largest) system in the human brain is the
visual system, where the first 10 or 11 processing stages have been identified.
We distinguish feedforward projections that go from earlier processing stages (near the
sensory input) to later ones (near the motor output), from feedback connections that go in the
opposite direction. In addition to these long-range connections, neurons also link up with
many thousands of their neighbours. In this way they form very dense, complex local
networks.
The basic computational unit in the nervous system is the nerve cell, or neuron. A biological
neuron has (see Figure 61):
Dendrites (inputs)
Cell body
Axon (output)
Brains learn. From what we know of neuronal structures, one way brains learn is by altering
the strengths of connections between neurons, and by adding or deleting connections between
neurons. Furthermore, they learn on-line, based on experience, and typically without the
benefit of a benevolent teacher. The efficacy of a synapse can change as a result of
experience, providing both memory and learning through long-term potentiation. One way
this happens is through release of more neurotransmitters. Many other changes may also be
involved, see Figure 62.
Hebb rule
Both learning paradigms discussed above result in an adjustment of the weights of the connections
between units, according to some modification rule. Virtually all learning rules for models of this type
can be considered as a variant of the Hebbian learning rule suggested by Hebb in the classic book
Organization of Behaviour (Hebb 1949). The Hebb rule determines the change in the weight of the
connection from ui to uj by Δwij = η · yi · yj, where η is the learning rate and yi, yj represent the
activations of ui and uj respectively. Thus, if both ui and uj are activated, the weight of the connection
from ui to uj should be increased.
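A minimal sketch of the rule, assuming a matrix W whose entry W[i, j] holds the weight of the connection from ui to uj (the function name and the learning-rate value are our own choices):

```python
import numpy as np

def hebb_update(W, y_pre, y_post, eta=0.1):
    """One Hebbian step: strengthen w_ij when u_i and u_j are co-active.

    W      : weight matrix, W[i, j] is the connection from u_i to u_j
    y_pre  : activations y_i of the presynaptic units
    y_post : activations y_j of the postsynaptic units
    eta    : the learning rate (eta in the text)
    """
    return W + eta * np.outer(y_pre, y_post)  # delta w_ij = eta * y_i * y_j
```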
Examples can be given of input/output associations which can be learned by a two-layer Hebb rule
pattern associator. In fact, it can be proved that if the set of input patterns used in training are mutually
orthogonal, the association can be learned by a two-layer pattern associator using Hebbian learning.
However, if the set of input patterns are not mutually orthogonal, interference may occur and the
network may not be able to learn associations. This limitation of Hebbian learning can be overcome
by using the delta rule.
Delta rule
The delta rule (Russell 2005), also called the Least Mean Square (LMS) method, is one of the most
commonly used learning rules. For a given input vector, the output vector is compared to the correct
answer. If the difference is zero, no learning takes place; otherwise, the weights are adjusted to reduce
this difference. The change in weight from ui to uj is given by Δwij = η · yi · ej, where η is the
learning rate, yi represents the activation of ui, and ej is the difference between the expected output
and the actual output of uj. If the set of input patterns form a linearly independent set, then arbitrary
associations can be learned using the delta rule.
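A sketch of the corresponding update for a two-layer linear associator, under the same naming assumptions as the Hebbian sketch above:

```python
import numpy as np

def delta_update(W, y_in, target, eta=0.1):
    """One delta-rule (LMS) step for a two-layer linear associator.

    y_in   : input activations y_i
    target : the correct answer for this input vector
    """
    actual = y_in @ W                    # actual output of the u_j units
    e = target - actual                  # e_j: expected minus actual output
    return W + eta * np.outer(y_in, e)   # delta w_ij = eta * y_i * e_j
```

If the difference e is zero, the update leaves the weights unchanged, exactly as described above.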
This learning rule not only moves the weight vector nearer to the ideal weight vector, it does so in the
most efficient way. The delta rule implements a gradient descent by moving the weight vector from
the point on the surface of the paraboloid down toward the lowest point, the vertex.
In the case of linear activation functions where the network has no hidden units, the delta rule will
always find the best set of weight vectors. On the other hand, that is not the case for hidden units. The
error surface is not a paraboloid and so does not have a unique minimum point. There is no such
powerful rule as the delta rule for networks with hidden units. There have been a number of theories
in response to this problem. These include the generalized delta rule and the unsupervised competitive
learning model.
Generalizing the ideas of the delta rule, consider a hierarchical network with an input layer, an output
layer and a number of hidden layers. We consider only the case where there is one hidden layer. The
network is presented with input signals which produce output signals that act as input to the middle
layer. Output signals from the middle layer in turn act as input to the output layer to produce the final
output vector. This vector is compared to the desired output vector. Since both the output and the
desired output vectors are known, we can calculate the difference between them and obtain the error
of the neural network. The error is backpropagated from the output layer through the middle layer to
the units that are responsible for generating that output. The delta rule can be used to adjust all the
weights. More details are presented in (Fausett 1994).
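The following sketch shows one training step for such a network with a single hidden layer, using sigmoid units; the function names and the learning rate are our own assumptions, not taken from (Fausett 1994):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, t, W1, W2, eta=0.5):
    """One generalized-delta-rule step: input -> hidden -> output.

    x, t : input vector and desired output vector
    W1   : input-to-hidden weights; W2 : hidden-to-output weights
    """
    # Forward pass through the middle layer to the final output vector.
    h = sigmoid(x @ W1)
    y = sigmoid(h @ W2)
    # Compare actual and desired outputs, then backpropagate the error
    # from the output layer through the middle layer.
    delta_out = (t - y) * y * (1 - y)
    delta_hid = (delta_out @ W2.T) * h * (1 - h)
    # The delta rule adjusts all the weights.
    W2 += eta * np.outer(h, delta_out)
    W1 += eta * np.outer(x, delta_hid)
    return W1, W2
```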
surviving result from 100 failed and 100 surviving savings and loan (S&L) institutions between
January 1986 and December 1987. The results showed that the three-layer ANN gained more predictive
power than the logit model. The latter is equivalent to a two-layer (no middle-layer) network.
5.4. Prediction of stock price index
With limited knowledge about the stock market and with only data available from a public
library, Ward Systems Group, Inc. [26] created an example showing how one might set up an ANN
application to predict stock market behavior. The first step was to decide what to predict or classify
(i.e., the target outputs). Obviously there are many possible outputs that could be predicted, such as
turning points, market direction, etc. For this application, the next month's average Standard
and Poor's stock price index was selected. The next step was to consider which input
facts or parameters are necessary or useful for predicting the target outputs. In this case, the
stock price index for the current month was chosen because it should be an important factor in
predicting next month's index. In addition, nine other publicly available economic indicators were
selected: unadjusted retail sales, average three-month Treasury bill rate, total U.S. Government
securities, industrial production index, New York gold price, outstanding commercial paper and
acceptances, Swiss Franc value, U.S. Government receipts, and U.S. Government expenditures (see
Figure 7).
Next, the case characteristics for the problem were entered into the system. These included the
defining characteristics (the names of the input parameters) and the classifying characteristics
(the names of the output results). Finally, examples of previous results were entered in order to
train the network. These case histories contain information for all the months in the years of
1974 to 1979. The goal is to see if the system could predict the monthly stock price indexes in
1980.
After several hours of training, the network was able to predict the next month's stock price
index for all of 1980. The results showed that
such a neural system can produce the first 8 months' predictions with less than 3.2% average absolute
error and the entire 12 months' predictions with only 4% average error. Therefore, through a
carefully designed ANN, it is possible to predict the volatile stock market.
6. Limitations of artificial neural networks
The artificial neural network is undoubtedly a powerful tool for decision making, but there are
several weaknesses in its use.
(1) An ANN is not a general-purpose problem solver. It is good at complex numerical computation
for the purposes of solving systems of linear or non-linear equations, organizing data into
equivalent classes, and adapting the solution model to environmental changes. However, it is
not good at such mundane tasks as calculating payroll, balancing checkbooks, and generating invoices.
Neither is it good at logical inference, a job suited for expert systems. Therefore, users
must know when a problem could be solved with an ANN.
(2) There is no structured methodology available for choosing, developing, training, and verifying
an ANN [23]. The solution quality of an ANN is known to be affected by the number of
layers, the number of neurons at each layer, the transfer function of each neuron, and the size of
the training set. One would think that the more data in the training set, the better the accuracy of
the output. But this is not so. While too small a training set will prevent the network from
developing generalized patterns of the inputs, too large a one will break down the generalized
patterns and make the network sensitive to input noise. In any case, the selection of these
parameters is more of an art than a science. Users of ANNs must conduct experiments (or sensitivity
analyses) to identify the best possible configuration of the network. This calls for easy-to-use and
easy-to-modify ANN development tools, which are gradually appearing on the market.
(3) There is no single standardized paradigm for ANN development. Because of its interdisciplinary
nature, there has been duplicated effort in ANN research. For example, the
backpropagation learning algorithm was independently developed by three groups of researchers
at different times: Werbos [29], Parker [19], and Rumelhart, Hinton, and Williams [21].
To resolve this problem, the ANN community should establish a repository of available paradigms
to facilitate knowledge transfer between researchers.
Moreover, to make an ANN work, it must be tailored specifically to the problem it is intended
to solve. To do so, users of ANN must select a particular paradigm as the starting prototype.
However, there are many possible paradigms. Without proper training, users may easily get
lost among them. Fortunately, most of the ANN development tools commercially available today provide
scores of sample paradigms that work on various classes of problems. A user may follow
this guidance and tailor a paradigm to his or her own needs.
Genetic algorithms
A genetic algorithm is a type of search algorithm: it searches a solution space for an optimal
solution to a problem. The key characteristic of the genetic algorithm is how the searching is done.
The algorithm creates a population of possible solutions to the problem and lets them evolve over
multiple generations to find better and better solutions. The generic form of the genetic algorithm is
shown in Figure 51. The items in bold in the algorithm are defined here.
The population consists of the collection of candidate solutions that we are considering during the
course of the algorithm. Over the generations of the algorithm, new members are born into the
population, while others die out of the population. A single solution in the population is referred to
as an individual. The fitness of an individual is a measure of how good the solution represented
by the individual is. A better solution has a higher fitness value; obviously, this is dependent on the
problem to be solved. The selection process is analogous to the survival of the fittest in the natural
world. Individuals are selected for breeding (or cross-over) based upon their fitness values. The
crossover occurs by mingling two solutions together to produce two new individuals. During each
generation, there is a small chance for each individual to mutate.
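Since Figure 51 is not reproduced here, the generic form can be sketched in Python as follows; the function names, the fitness-proportional selection, and the fixed generation count are our assumptions about that figure:

```python
import random

def genetic_algorithm(random_individual, fitness, crossover, mutate,
                      pop_size=100, generations=50, mutation_rate=0.01):
    """Skeleton of the generic genetic algorithm described above."""
    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):      # termination: a set number of generations
        scored = [(fitness(ind), ind) for ind in population]
        total = sum(f for f, _ in scored)

        def select():                 # selection weighted by fitness
            r = random.uniform(0, total)
            for f, ind in scored:
                r -= f
                if r <= 0:
                    return ind
            return scored[-1][1]

        next_pop = []
        while len(next_pop) < pop_size:   # breed until the population is replaced
            a, b = crossover(select(), select())
            next_pop += [a, b]
        population = [mutate(ind) if random.random() < mutation_rate else ind
                      for ind in next_pop[:pop_size]]
    return max(population, key=fitness)
```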
To use a genetic algorithm, there are several questions that need to be answered:
How is an individual represented?
How is an individual's fitness calculated?
How are individuals selected for breeding?
How are individuals crossed-over?
How are individuals mutated?
What is the size of the population?
What are the termination conditions?
Most of these questions have problem specific answers. The last two, however, can be discussed in a
more general way. The size of the population is highly variable. The population should be as large as
possible; the limiting factor is, of course, the running time of the algorithm. A larger population
means more time-consuming computation.
The algorithm in Figure 51 has a very vague end point: the meaning of "until the termination
conditions are met" is not immediately obvious. The reason for this is that there is no one way to end
the algorithm. The simplest approach is to run the search for a set number of generations.
Another approach is to end the algorithm after a certain number of generations pass with no
improvement of the fitness of the best individual in the population. There are other possibilities as
well. Since most of the other questions are dependent upon the search problem, we will look at two
example problems that can be solved using genetic algorithms: finding a mathematical function's
maximum and the travelling salesman problem.
3.2.1 Function maximization
Example (Thede 2004) One application for a genetic algorithm is to find values for a collection of
variables that will maximize a particular function of those variables. While this type of problem could
be solved otherwise, it is useful as an example of the operation of genetic algorithms. For this
example, let's assume that we are trying to determine the values of the variables that produce the
maximum value of this function:
f(w, x, y, z) = w³ + x² − y² − z² + 2yz − 3wx + wz − xy + 2
This could probably be solved using multivariable calculus, but it is a good simple example of the use
of genetic algorithms. To use the genetic algorithm, we need to answer the questions listed in the
previous section.
How is an individual represented?
What information is needed to have a solution of the maximization problem? It is clear that we need
only four values: w, x, y, and z. Assuming that we have values for these four variables, we have a candidate
solution for our problem.
The question is how to represent these four values. A simple way to do this is to use an array of four
values (integers or floating point numbers). However, for genetic algorithms it is usually better to
have a larger individual; this way, variations can be made in a more subtle way. Research shows
(Holland 1975) that representation of individuals using bit strings offers the best performance. We can
simply choose a size in bits for each variable, and then concatenate the four values together into a
single bit string. For example, we will choose to represent each variable as a four-bit integer, making
our entire individual a 16-bit string. Thus, an individual such as
1101 0110 0111 1100
represents a solution where w = 13, x = 6, y = 7, and z = 12.
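A small helper makes this representation concrete (the function name is ours; the spaces in the strings are only for readability):

```python
def decode(individual):
    """Split a 16-bit string into four 4-bit unsigned integers w, x, y, z."""
    bits = individual.replace(" ", "")
    return tuple(int(bits[i:i + 4], 2) for i in range(0, 16, 4))

print(decode("1101 0110 0111 1100"))  # -> (13, 6, 7, 12)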
How is an individual's fitness calculated?
Next, we consider how to determine the fitness of each individual. There is generally a differentiation
between the fitness and evaluation functions. The evaluation function is a function that returns an
absolute measure of the individual. The fitness function is a function that measures the value of the
individual relative to the rest of the population.
In our example, an obvious evaluation function would be to simply calculate the value of f for the
given variables. For example, assume we have a population of 4 individuals:
1010 1110 1000 0011
0110 1001 1111 0110
0111 0110 1110 1011
0001 0110 1000 0000
The first individual represents w = 10, x = 14, y = 8, and z = 3, for an f value of 671. The values for
the entire population can be seen in the following table:
The fitness function can be chosen from many options. For example, the individuals could be listed in
order from the lowest to the highest evaluation function values and an ordinal ranking applied, or
the fitness function could be the individual's evaluation value divided by the average evaluation
value.
Looking at both of these approaches would give us something like this:
The key is that the fitness of an individual should represent the value of the individual relative to the
rest of the population, so that the best individual has the highest fitness.
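Reusing the decode helper from the sketch above, the evaluation function and the average-based fitness function for this population can be written as:

```python
def f(w, x, y, z):
    # The objective function of the example.
    return w**3 + x**2 - y**2 - z**2 + 2*y*z - 3*w*x + w*z - x*y + 2

population = ["1010 1110 1000 0011", "0110 1001 1111 0110",
              "0111 0110 1110 1011", "0001 0110 1000 0000"]

evals = [f(*decode(ind)) for ind in population]   # evals[0] == 671
average = sum(evals) / len(evals)
fitness = [e / average for e in evals]  # value relative to the population
```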
How are individuals selected for breeding?
The key to the selection process is that it should be probabilistically weighted so that higher fitness
individuals have a higher probability of being selected. Other than these specifications, the method of
selection is open to interpretation.
One possibility is to use the ordinal method for the fitness function, then calculate a probability of
selection that is equal to the individual's fitness value divided by the total fitness of all the
individuals. In the example above, that would give the first individual a 40% chance of being selected,
the second a 20% chance, the third a 30% chance, and the fourth a 10% chance. It gives better
individuals more chances to be selected.
A similar approach could be used with the average fitness calculation. This would give the first
individual a 72% chance, the second a 5% chance, the third a 22% chance, and the fourth a 1%
chance. This method makes the probability more dependent on the relative evaluation functions of
each individual.
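Both variants reduce to dividing each individual's fitness by the total fitness, as in this small sketch:

```python
def roulette_probabilities(fitness):
    """Selection probability = individual's fitness / total population fitness."""
    total = sum(fitness)
    return [f / total for f in fitness]

# Ordinal fitness from the example (ranks 4, 2, 3, 1 for the four individuals):
print(roulette_probabilities([4, 2, 3, 1]))  # -> [0.4, 0.2, 0.3, 0.1]
```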
How are individuals crossed-over?
Once we have selected a pair of individuals, they are bred, or, in genetic algorithm language, they
are crossed over. Typically two children are created from each set of parents. One method for
performing the cross-over is described here, but there are other approaches. Two locations are
randomly chosen within the individual. These define corresponding substrings in each individual. The
substrings are swapped between the two parent individuals, creating two new children. For example, let's
look at our four individuals again:
1010 1110 1000 0011
0110 1001 1111 0110
0111 0110 1110 1011
0001 0110 1000 0000
Let's assume that the first and third individuals are chosen for cross-over. Keep in mind that the
selection process is random. The fourth and fourteenth bits are randomly selected to define the
substring to be swapped, so the cross-over looks like this:
Thus, two new individuals are created. We should create new individuals until we replace the entire
population; in our example, we need one more cross-over operation. Assume that the first and fourth
individuals are selected this time. Note that an individual may be selected multiple times for breeding,
while other individuals might never be selected. Further assume that the eleventh and sixteenth bits
are randomly selected for the cross-over point. We could apply the second cross-over like this:
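Since the figures with the swapped substrings are not reproduced here, the operation can be sketched as follows; the two cut points are chosen at random, as in the example, and the bit strings are assumed to contain no spaces:

```python
import random

def two_point_crossover(parent1, parent2):
    """Swap a randomly chosen substring between two bit-string parents."""
    i, j = sorted(random.sample(range(len(parent1)), 2))
    child1 = parent1[:i] + parent2[i:j] + parent1[j:]
    child2 = parent2[:i] + parent1[i:j] + parent2[j:]
    return child1, child2
```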
Another method for selecting the solution to be copied is tournament selection. Typically the genetic
program chooses two solutions at random. The solution with the higher fitness wins. This method
simulates biological mating patterns in which two members of the same sex compete to mate with a
third one of a different sex. Finally, the third method is selection by rank. In rank selection, selection is
based on the rank (not the numerical value) of the fitness values of the solutions in the population
(Koza 1992).
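A sketch of tournament selection with two randomly chosen contestants (the function name and the fitness-as-function interface are ours):

```python
import random

def tournament_select(population, fitness, k=2):
    """Pick k individuals at random; the one with the higher fitness wins."""
    contestants = random.sample(population, k)
    return max(contestants, key=fitness)
```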
The creation of offspring by the crossover operation is accomplished by deleting the crossover
fragment of the first parent and then inserting the crossover fragment of the second parent. The second
offspring is produced in a symmetric manner. For example, consider the two S-expressions in Figure
54, written in a modified Scheme programming language and represented as trees.
An important improvement that genetic programming displays over genetic algorithms is its ability to
create two new solutions from the same solution. In Figure 55 the same parent is used twice to create
two new children. This figure illustrates one of the main advantages of genetic programming over
genetic algorithms. In genetic programming identical parents can yield different offspring, while in
genetic algorithms identical parents would yield identical offspring. The bold selections indicate the
subtrees to be swapped.
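Since Figures 54 and 55 are not reproduced here, the following sketch shows the idea on S-expressions stored as nested Python lists (the helper names and the example tree are our own assumptions):

```python
import copy
import random

def subtrees(tree, path=()):
    """Enumerate (path, subtree) pairs; tree[0] is the operator of a node."""
    yield path, tree
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return copy.deepcopy(new)
    tree = copy.copy(tree)
    tree[path[0]] = replace(tree[path[0]], path[1:], new)
    return tree

def gp_crossover(p1, p2):
    """Swap randomly chosen crossover fragments between two parents."""
    path1, sub1 = random.choice(list(subtrees(p1)))
    path2, sub2 = random.choice(list(subtrees(p2)))
    return replace(p1, path1, sub2), replace(p2, path2, sub1)

# The same parent used twice can yield two different children:
parent = ['+', ['*', 'a', 2], 'b']      # the S-expression (+ (* a 2) b)
child1, child2 = gp_crossover(parent, copy.deepcopy(parent))
```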
Mutation Operator
Mutation is another important feature of genetic programming. Two types of mutation are possible.
In the first kind, a function can only replace a function and a terminal can only replace a terminal. In
the second kind, an entire subtree can replace another subtree. Figure 56 illustrates the concept of mutation.
Genetic programming uses two different types of mutations. The top parse tree is the original agent.
The bottom left parse tree illustrates a mutation of a single terminal (2) for another single terminal (a).
It also illustrates a mutation of a single function (-) for another single function (+). The parse tree on
the bottom right illustrates a replacement of a subtree by another subtree.
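Reusing the subtrees and replace helpers from the crossover sketch above, the first kind of mutation can be sketched like this; the function and terminal sets are illustrative assumptions, not from the text:

```python
import copy
import random

def point_mutate(tree, functions=('+', '-', '*'), terminals=('a', 'b', 2)):
    """A function may only replace a function; a terminal only a terminal."""
    tree = copy.deepcopy(tree)
    path, node = random.choice(list(subtrees(tree)))
    if isinstance(node, list):           # internal node: swap its operator
        node[0] = random.choice(functions)
        return tree
    return replace(tree, path, random.choice(terminals))

print(point_mutate(['+', ['*', 'a', 2], 'b']))  # e.g. ['+', ['-', 'a', 2], 'b']
```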