Abstract: Evolutionary algorithms have been used with neural networks in two main ways: (i) to optimize the network architecture in terms of the number of hidden layers and the number of neurons in each layer, and (ii) to train the weights of a fixed architecture. While most previous work focuses on only one of these two options, this paper investigates an evolutionary approach called Breeder Genetic Programming (BGP) in which the architecture and the weights are optimized simultaneously. The genotype of each network is represented as a tree whose depth and width are dynamically adapted to the particular application by specifically defined genetic operators. The weights are trained by Gaussian approximation. The fitness function has been chosen as a function that increases when the mean square error at the end of training and the number of epochs needed for learning increase. This paper is concerned with finding an optimal Bayesian neural network using Breeder Genetic Programming to classify, i.e. identify, acoustic radar (SODAR) patterns. The results demonstrate that the method is capable of successfully identifying the different acoustic radar patterns.
Key words: Bayesian Neural Network, Breeder Genetic Programming, Gaussian Approximation, SODAR Pattern Identification.
On the other hand, a small network will achieve good generalization if it converges; however, it generally needs a large amount of training time. Therefore, the size of the network should be as small as possible, but sufficiently large to ensure an accurate fitting of the training set [3].

Koza [4] provides an alternative approach to representing neural networks, under the framework of so-called genetic programming, which enables modification not only of the weights but also of the architecture of a neural network. However, this method provides neither a general method for representing an arbitrary feedforward network, nor a mechanism for finding a network of minimal complexity.
This paper describes a new genetic programming method, called Breeder Genetic Programming (BGP), that employs a fitness function which increases when the number of epochs (EP) needed for learning and the mean square error (MSE) of the network increase. The weights are trained not by backpropagation, but by a Bayesian method.

In this manner, 90 data points were obtained from the charts for each category. These were used as the input to the network. The six acoustic radar pattern categories correspond to the output classes.

2. Representing Neural Networks as Trees

Multilayer feedforward neural networks are networks of simple processing elements, called neurons or units, organized in layers. The external inputs are presented to the input layer and fed forward via one or more layers of hidden units to the output layer. There is no connection between units in the same layer.

A commonly adopted architecture involves full connectivity between neighboring layers only. We allow both partial connectivity and direct connections between non-neighboring layers, since this is important for finding a parsimonious architecture. Specifically, this architecture allows some of the input units to be connected directly to output units.

For the genetic optimization, we represent a feedforward network as a set of m trees [5], each corresponding to one output unit. Figure 1 shows a genotype-to-phenotype mapping example. A genotype is depicted that produces the shown phenotype. There are three input nodes, one hidden node, one output node, and seven connection definitions, one of which is recurrent. The second connection is disabled, so the connection that it specifies (between nodes 2 and 4) is not expressed in the phenotype.

Fig 1. A genotype to phenotype mapping example
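To make the Figure 1 description concrete, the sketch below shows one possible way to write such a genotype down in Python. The node layout (three inputs, one hidden, one output), the disabled connection between nodes 2 and 4, and the single recurrent definition follow the example in the text; the ConnectionGene structure and all weight values are illustrative assumptions and are not the exact encoding of [5].

from dataclasses import dataclass

@dataclass
class ConnectionGene:
    src: int            # source node id
    dst: int            # destination node id
    weight: float
    enabled: bool = True
    recurrent: bool = False

# Node ids: 1-3 are input units, 4 is the hidden unit, 5 is the output unit.
node_genes = {1: "input", 2: "input", 3: "input", 4: "hidden", 5: "output"}

connection_genes = [
    ConnectionGene(1, 4, 0.7),
    ConnectionGene(2, 4, -0.5, enabled=False),   # disabled: not expressed in the phenotype
    ConnectionGene(3, 4, 0.5),
    ConnectionGene(1, 5, 0.2),                   # direct input-to-output connection
    ConnectionGene(3, 5, -0.4),
    ConnectionGene(4, 5, 0.9),
    ConnectionGene(5, 4, 0.1, recurrent=True),   # the one recurrent definition
]

def phenotype(connections):
    """Return only the connections that are actually expressed."""
    return [c for c in connections if c.enabled]

if __name__ == "__main__":
    for c in phenotype(connection_genes):
        print(f"{c.src} -> {c.dst}  (w = {c.weight})")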
3. Genetic Breeding of Neural Networks

3.1. Breeder Genetic Programming (BGP)

For the evolution of optimal neural networks we use concepts based on the Breeder Genetic Algorithm (BGA). While the usual genetic algorithms model natural evolution, the BGA models the rational selection performed by human breeders.

The BGA represents a class of random optimization techniques drawn from the science of population genetics, which have proved their ability to solve hard optimization problems with continuous parameters. The BGA, which can be seen as a recombination of Evolution Strategies (ES) and Genetic Algorithms (GA), uses truncation selection, which is very similar to the (µ, λ) strategy in ESs (in this strategy, also called the comma strategy and denoted (µ, λ), the parents die off after generating offspring and are not taken into account when forming the next generation), while the search process is mainly driven by recombination, which makes BGAs very similar to GAs. It has been proven that BGAs can solve problems more efficiently than GAs, due to their theoretically faster convergence to the optimum, and they can, like GAs, be easily written in parallel form [6].

Our approach differs from the BGA in that we use chromosomes of variable size, a characteristic of Genetic Programming (GP). Thus we call the method Breeder Genetic Programming (BGP). BGP also differs from the usual GP: while GP uses proportional selection combined with crossover as the main operator, BGP uses truncation selection combined with crossover plus Gaussian approximation.
The BGP evolutionary learning algorithm is summarized in eight steps, as explained below. The algorithm maintains a population A consisting of M individuals Ai of variable size. Each individual represents a neural network. The networks of the initial population, A(0), are generated with a random number of layers. The receptive field of each neural unit and its width are also chosen randomly. The (g+1)th population, A(g+1), is created from A(g) in three steps: selection, Gaussian approximation, and mating.

The selection step starts from the population A(g). Only the M% of elements showing the best fitness are chosen to give origin to the individuals of the next generation (i.e., are accepted into the mating pool B(g)). Usually the truncation ratio lies in the range 10% to 50%. The fitness function has been chosen as a function that increases when the mean square error at the end of training and the number of epochs needed for learning increase.

After selection, each individual in B(g) undergoes Bayesian neural network training, in which the weights of the network are adapted by Gaussian approximation. This results in the revised mating pool B(g). The mating phase repeatedly selects two random parent individuals in B(g) to mate and generate one offspring in the new population A(g+1) by applying crossover operators, until the population size amounts to M. Notice that not only may the sizes of individuals within one population differ,

    |Ai(g)| ≠ |Aj(g)|,  i ≠ j and i, j ∈ {1, …, M}    (1)

but the size of the same individual in subsequent populations may also differ,

    |Ai(g+1)| ≠ |Ai(g)|,  i ∈ {1, …, M}    (2)

A new population is generated repeatedly until an acceptable solution is found or the variance of the fitness, V(g), falls below a specified limit value Vmin, i.e., the procedure terminates if

    V(g) = (1/M) Σ_{i=1..M} ( Fi(g) − F̄(g) )²  ≤  Vmin    (3)

where F̄(g) is the average fitness of the individuals in A(g). The algorithm also stops if a specified number of generations, gmax, has been carried out.

In summary, the BGP algorithm proceeds in the following eight steps:

1. Generate an initial population A(0) of M networks at random. Set the current generation g←0.
2. Evaluate the fitness values Fi(g) of the networks using the training set of N examples.
3. If the termination condition is satisfied, stop the evolution; otherwise, continue with step 4.
4. Select the upper M% of the networks of the gth population into the mating pool B(g).
5. Train each network in B(g) by the Bayesian neural network method, resulting in the revised mating pool B(g).
6. Create the (g+1)th population A(g+1) of size M by applying genetic operators to randomly chosen parent networks in B(g).
7. Replace the worst-fit network in A(g+1) by the best in A(g).
8. Set g←g+1 and return to step 2.

3.2. Genetic Operators

The weights of a network are trained by applying a Gaussian approximation to each of the individuals accepted by truncation selection. Given a chromosome Si of the network, the Bayesian network finds a better chromosome by repeatedly applying the Gaussian approximation and adapting the weights until no weight configuration with better fitness is found in a sweep through the individual.

After that, the crossover operator adapts the size and shape of the network architecture. A crossover operation starts by choosing at random two parent individuals from the mating pool B(g). The actual crossover of two individuals, i and j, is done on their genotypical representations Si and Sj. The nodes in the tree are numbered according to depth-first search order, and the crossover sites Ci and Cj are chosen at random subject to the following conditions:

    1 ≤ Ci ≤ size(Si)   and   1 ≤ Cj ≤ size(Sj)    (4)

Here, the length of an individual, size(Si), is defined as the total number of units and weights. Given the crossover points, the subtrees of the two parent individuals Si and Sj are exchanged to form one offspring. The labels of the nodes Ci and Cj must belong to the same class.
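As an illustration of the subtree exchange just described, the following minimal sketch performs one crossover on two variable-size genotypes represented as nested Python lists. It is an illustrative stand-in only: the node labels are hypothetical, and the class-compatibility constraint on the crossover sites is not enforced here.

import copy
import random

def nodes_with_parents(tree, parent=None, idx=None, out=None):
    """List all subtrees in depth-first order as (node, parent, child_slot)."""
    if out is None:
        out = []
    out.append((tree, parent, idx))
    for i, child in enumerate(tree[1:], start=1):
        nodes_with_parents(child, tree, i, out)
    return out

def size(tree):
    """Length of an individual: total number of nodes in its tree."""
    return len(nodes_with_parents(tree))

def crossover(si, sj, rng=random):
    """Graft a randomly chosen subtree of sj into a random site of si, giving one offspring."""
    offspring = copy.deepcopy(si)
    donor = copy.deepcopy(sj)
    sites_i = nodes_with_parents(offspring)
    sites_j = nodes_with_parents(donor)
    node_i, parent_i, slot_i = sites_i[rng.randrange(len(sites_i))]   # crossover site Ci
    node_j, _, _ = sites_j[rng.randrange(len(sites_j))]               # crossover site Cj
    if parent_i is None:              # site Ci is the root of Si
        return node_j
    parent_i[slot_i] = node_j         # replace the chosen subtree in place
    return offspring

# Example: two genotypes of different size, cf. eq. (1).
s1 = ["out", ["hidden", ["x1"], ["x2"]], ["x3"]]
s2 = ["out", ["x1"], ["hidden", ["x2"], ["hidden", ["x3"], ["x1"]]]]
child = crossover(s1, s2, random.Random(0))
print(size(s1), size(s2), size(child))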
3.3. Fitness Function

To evaluate the goodness of an individual, the network is trained with a fixed number of patterns and then evaluated according to determined parameters. The parameters which seem to best describe the goodness of a network configuration are the mean square error MSE at the end of the training and the number of epochs EP needed for learning. Clearly it is desirable to attain values as low as possible for both parameters MSE and EP: in fact, a neural network must learn as fast as possible (a small value of EP) and with a good approximation of the desired output.

It is necessary to establish an upper limit EPmax on the number of epochs EP used by the network for training, thus 0 ≤ EP ≤ EPmax. Moreover, it is desirable that MSE be as close as possible to a small value emin (0 ≤ emin ≤ MSE), which represents the minimum error required (below emin we can say the network has learned its task).

    F(x) = EP/EPmax + MSE    (5)

Please note that, according to the above formula, F(x) can be lower than 1 if and only if the learning phase is such that the minimum required error is reached by the artificial neural network within a number of epochs lower than EPmax [7].
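For concreteness, eq. (5) can be read directly as the short Python sketch below. Since F increases with both EP and MSE, "best fitness" here means the smallest F, so truncation selection keeps the individuals with the lowest values. The record structure and the numbers are illustrative assumptions only.

def fitness(ep, mse, ep_max):
    """Eq. (5): F = EP/EPmax + MSE; lower values indicate better individuals."""
    return ep / ep_max + mse

def truncation_select(population, ratio):
    """Keep the best `ratio` fraction of the population (truncation selection).
    Each individual is assumed to be a dict with 'ep', 'mse' and 'ep_max' fields."""
    ranked = sorted(population, key=lambda ind: fitness(ind["ep"], ind["mse"], ind["ep_max"]))
    keep = max(1, int(len(ranked) * ratio))
    return ranked[:keep]

# Example with made-up numbers: only a network that learns fast and accurately scores below 1.
pool = [
    {"ep": 1500, "mse": 0.04, "ep_max": 5000},   # F = 0.34
    {"ep": 4800, "mse": 0.30, "ep_max": 5000},   # F = 1.26
    {"ep": 3000, "mse": 0.10, "ep_max": 5000},   # F = 0.70
]
print([round(fitness(p["ep"], p["mse"], p["ep_max"]), 2) for p in pool])
print(truncation_select(pool, 0.3))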
4. Bayesian Network

A Bayesian network is defined over a set of discrete variables, partitioned into three sets: input, output, and intermediate variables, respectively. The model consists of a qualitative part (a directed graph) and quantitative parts (dependency models). The vertices of the graph represent the random variables, and the edges define the independence relations (each variable is independent of its non-descendants given its parents [9]). There is a probabilistic dependency model for each variable that describes its dependency on its parents.

Let U = {x1, …, xn}, n ≥ 1, be a set of variables. A Bayesian network B over the set of variables U is a network structure BS, which is a directed acyclic graph (DAG) over U, together with a set of probability tables BP = {p(u | pa(u)) | u ∈ U}, where pa(u) is the set of parents of u in BS. A Bayesian network represents the probability distribution [9]

    P(U) = Π_{u∈U} p(u | pa(u))    (6)

4.1. The Bayesian Solution

The Bayesian viewpoint allows the following advantages to be gained by simple application of the rules of probability theory:

A. By viewing the product of learning as an ensemble of probable classifiers, we can take our uncertainty into account when making predictions. This improves the quality of the predictions.

B. By viewing the regularizer with regularization constant α as defining a prior probability, we can solve the overfitting problem, i.e., the problem of setting hyperparameters such as α that control the "complexity" of the model; with probability theory we can effectively infer the appropriate value of the regularization constant α from the data. This removes the need to waste resources on cross-validation, which is traditionally used to set such hyperparameters. This is an especially important advantage of the Bayesian approach when there are many such regularization constants.

Assume we have a set of n independent training items (x1, y1), …, (xn, yn), each of which gives the value of an output vector yi associated with an "input" vector xi. Given the prior probability distribution over the weight vector, p(w), and the likelihood function (error measure) p(y | x, w), the posterior probability of the weight vector is given by Bayes' rule as follows:

    p(w | (x1,y1), …, (xn,yn)) = p(w) p((x1,y1), …, (xn,yn) | w) / p((x1,y1), …, (xn,yn))
                               = p(w) p(y1, …, yn | x1, …, xn, w) / p(y1, …, yn | x1, …, xn)    (7)

In the Bayesian approach to statistical prediction, one does not use a single "best" set of network weights, but rather integrates the predictions from all possible weight vectors over the posterior weight distribution, which combines information from the data with a prior bias toward more plausible weights. Assuming the additional input is represented by xn+1, the desired output yn+1 can be computed as follows [10]:

    yn+1 = ∫_{R^N} f(xn+1, w) p(w | (x1,y1), …, (xn,yn)) dw    (8)

The prior and predictive distributions represented by the above relations give the optimal solution of the neural network problem, but this technique involves multi-dimensional (multivariate) integration, and this type of integration is the main source of difficulty in the Bayesian approach in many applications [11]. Therefore we can consider training to be the problem of neural networks, while multivariate integration is the problem of the Bayesian neural network. Analytic solutions are impossible, so we need to look for alternative methods to solve these integrations. Among these methods are:

A. Direct Numerical Integration.
B. Monte Carlo Simulation.
C. Gaussian Approximation.

In this paper, I will deal with Gaussian approximation to find the best weights of the network.
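For intuition about why eq. (8) is hard and how the listed methods attack it, the sketch below approximates the predictive integral by Monte Carlo simulation (method B) for a deliberately tiny one-parameter stand-in for the network, f(x, w) = w·x, with a Gaussian prior and Gaussian noise. The data, noise level, prior width and sample count are illustrative assumptions, not values from this paper.

import math
import random

random.seed(1)

def f(x, w):                                    # one-parameter stand-in for the network of eq. (9)
    return w * x

data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]     # made-up (x_i, y_i) training items
sigma, omega = 0.5, 2.0                         # assumed noise level and prior width

def likelihood(w):
    """p(y_1..y_n | x_1..x_n, w) up to a constant, under the Gaussian model."""
    return math.exp(-sum((y - f(x, w)) ** 2 for x, y in data) / (2 * sigma ** 2))

# Monte Carlo: draw w from the Gaussian prior p(w) and weight each draw by the
# likelihood; the ratio below then approximates the predictive integral of eq. (8).
draws = [random.gauss(0.0, omega) for _ in range(20000)]
x_new = 4.0
num = sum(likelihood(w) * f(x_new, w) for w in draws)
den = sum(likelihood(w) for w in draws)
print("Monte Carlo estimate of y_(n+1) at x =", x_new, ":", round(num / den, 3))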
4.2. Gaussian Approximation

In this paper, I will deal only with neural networks used for regression, though the method should be applicable to classification networks as well. Assume we have a set of n independent training items (x1, y1), …, (xn, yn), each of which gives the value of an output vector yi associated with an "input" vector xi. We also have the value of the input vector for an additional test item, xn+1. The task is to predict the corresponding output vector yn+1 [10].

A neural network of some given architecture and with weight vector w defines a mapping from an input vector x to a predicted output vector y given by

    y = f(x, w)    (9)

Assuming a Gaussian model, the conditional probability distribution for the output vector given the input vector, based on this mapping, is as follows:

    p(y | x, w) = (2πσ²)^(−N/2) exp( −‖y − f(x, w)‖² / (2σ²) )    (10)

where y is the desired output, f(x, w) is the actual output, N is the dimension of the output vector, and σ is the noise level in the output.

The conventional maximum likelihood approach to neural network training is to use some gradient-based algorithm to find the weight vector that maximizes the degree of fit to the data, given by the log-likelihood L(w), defined as

    L(w) = Σ_{i=1..n} log p(yi | xi, w) = −Σ_{i=1..n} ‖yi − f(xi, w)‖² / (2σ²) + C    (11)

The prior distribution over the weight vector is likewise represented by a Gaussian distribution, as follows:

    p(w) = (2πω²)^(−N/2) exp( −‖w‖² / (2ω²) )    (13)

5. Bayesian Neural Network Training

The Bayesian neural network learning algorithm is a form of supervised learning used to train mainly feedforward neural networks to perform many tasks [12]. It uses the sigmoid function to determine the activation of the neurons, and it uses the MSE as a measure to determine the convergence of the network outputs toward the desired output. Figure 2 shows the structure of the Bayesian neural network, which consists of three kinds of layers [13]:

• Input Layer (i): This layer is responsible for receiving the network inputs, which are distributed to the hidden layer neurons.

• Hidden Layer (j): This layer extracts information about the distribution of features within the set of training examples, and its output is sent to a neighboring hidden layer or to the output layer through the associated weights [14].

• Output Layer (k): This layer is responsible for receiving the stimulus pattern code from the hidden layers through the associated weights and finding the actual output of the network. The output of each layer can be computed as follows:

    hj = f( Σ_{i=1..l} wij xi )    (14)
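To make eq. (14) and the layer structure concrete, here is a minimal forward-pass sketch in plain Python with the sigmoid as activation f; the layer sizes and the weight values are illustrative assumptions only.

import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def layer_output(inputs, weights):
    """Eq. (14): each unit j computes h_j = f( sum_i w_ij * x_i ) with f = sigmoid."""
    return [sigmoid(sum(w_ij * x_i for w_ij, x_i in zip(w_row, inputs)))
            for w_row in weights]

# A toy 3-2-1 network (3 inputs, 2 hidden units, 1 output unit).
w_input_hidden = [[0.2, -0.4, 0.1],    # weights into hidden unit 1
                  [0.7,  0.3, -0.6]]   # weights into hidden unit 2
w_hidden_output = [[0.5, -0.8]]        # weights into the single output unit

x = [0.9, 0.1, 0.4]
h = layer_output(x, w_input_hidden)    # hidden layer activations h_j
y = layer_output(h, w_hidden_output)   # network output
print(h, y)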
How can the weights of the Bayesian neural network be determined? First, we generate random weights between 0 and 1 for the output and hidden layers of the network. Then the weights are calculated through the prior probability distribution, which represents a Gaussian approximation, as follows:

    p(wij) = (2πω²)^(−L/2) exp( −wij² / (2ω²) )    (19)

The error at hidden node j is computed from the errors at the nodes it feeds as

    l′j = hj (1 − hj) Σ_k ( l′k wjk )    (22)

Fig 2. The Structure of Bayesian Neural Network

The weights between the input layer and the hidden layer are adjusted as follows:

    Δwij = η l′j xi + α Δw′ij    (23)

where Δwij represents the change in the weights between the input layer and the hidden layer, η is the learning rate (a value in [0.1, 0.3]), Δw′ij represents the difference between the current and prior weight, α is the momentum rate, and l′j is the error at node j. The new weight becomes

    wij = Δw′ij + w″ij    (24)

where w″ij is the prior weight. The weights between the hidden layer and the output layer are adjusted as follows:

    Δwjk = η l′k hj + α Δw′jk    (25)

where Δwjk represents the change in the weights between the hidden layer and the output layer, η is the learning rate, l′k is the error at output node k, hj is the output of hidden node j, and α is the momentum rate.
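The sketch below strings eqs. (22)-(25) together for a single update of a tiny 2-2-1 sigmoid network. Two caveats: the defining equation for the output-node error l′k does not appear in the text recovered here, so the usual sigmoid-derivative form l′k = yk(1 − yk)(dk − yk) is assumed; and eq. (24) is read as adding the freshly computed change from eq. (23) to the prior weight. All shapes and numbers are illustrative.

# One illustrative weight update following eqs. (22)-(25).
eta, alpha = 0.2, 0.09          # learning rate and momentum rate (within the quoted ranges)

x = [0.9, 0.1]                  # input pattern
h = [0.62, 0.41]                # hidden activations h_j (from eq. 14)
y = [0.55]                      # network output y_k
d = [1.0]                       # desired output

w_jk       = [[0.5, -0.8]]                  # hidden-to-output weights, indexed [k][j]
dw_prev_ij = [[0.0, 0.0], [0.0, 0.0]]       # previous input-to-hidden changes (Δw'_ij)
w_prior_ij = [[0.3, -0.2], [0.1, 0.4]]      # prior input-to-hidden weights (w''_ij)
dw_prev_jk = [[0.0, 0.0]]                   # previous hidden-to-output changes (Δw'_jk)

# Assumed output-node error (its defining equation is not shown in the recovered text):
l_k = [yk * (1 - yk) * (dk - yk) for yk, dk in zip(y, d)]

# Eq. (22): error at each hidden node j, back-propagated from the output nodes.
l_j = [h[j] * (1 - h[j]) * sum(l_k[k] * w_jk[k][j] for k in range(len(l_k)))
       for j in range(len(h))]

# Eqs. (23) and (24): change and new value for each input-to-hidden weight.
new_w_ij = [[0.0, 0.0], [0.0, 0.0]]
for j in range(len(h)):
    for i in range(len(x)):
        dw = eta * l_j[j] * x[i] + alpha * dw_prev_ij[j][i]   # eq. (23)
        new_w_ij[j][i] = dw + w_prior_ij[j][i]                # eq. (24)

# Eq. (25): change for each hidden-to-output weight.
dw_jk = [[eta * l_k[k] * h[j] + alpha * dw_prev_jk[k][j] for j in range(len(h))]
         for k in range(len(l_k))]

print(l_j, new_w_ij, dw_jk)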
6. Implementation and Results

In this paper, 90 data points were obtained from the charts for each category. These were used as the input features. The six pattern categories correspond to the output classes (i.e., the output classes are: convective boundary layer, temperature inversion layer with a flat top, temperature inversion layer with small spikes, temperature inversion layer with tall spikes, rising inversion layer, and rising inversion layer capping a convective boundary layer [15]).

A three-layered Bayesian neural network, with breeder genetic programming, was used for classifying the patterns. Random sets of 20 sequential observations, from the total of 90 data points in each pattern category, constituted a training pattern with 20 attributes (input nodes) for that class. Six nodes were used at the output layer. Various truncation selection ratios were used (10%, 20%, 30%, 40%, 50%); each one produces a different network in shape and size (number of nodes in the hidden layer) and also in training time. The selection ratio Q% refers to a random class-wise selection of Q% of the training data (a sequence of 20 points in the chart) from the entire dataset. The remaining (100−Q)% of the data constitute the test set in each case.

6.1. Case Study Number One

This case study uses the following parameters for the breeder genetic programming (BGP):
Population size = 50
Truncation ratio = 10%
No. of generations = 50

And the following parameters for the Bayesian neural network:
Max no. of epochs = 20000
Learning rate = 0.2
Momentum term = 0.09
Noise level value = 0.15

As a result, the network consists of three layers, its hidden layer has 6 nodes, the value of the fitness function is 0.992068, and the network is able to identify all the patterns after 18795 epochs with an MSE of 0.052318.

6.2. Case Study Number Two

This case study uses the following parameters for the Bayesian neural network:
Noise level value = 0.17

As a result, the network consists of three layers, its hidden layer has 10 nodes, the value of the fitness function is 0.80561, and the network is able to identify all the patterns after 2405 epochs with an MSE of 0.00395.

6.3. Case Study Number Three

This case study uses the following parameters for the breeder genetic programming (BGP):
Population size = 100
Truncation ratio = 30%
No. of generations = 50

And the following parameters for the Bayesian neural network:
Max no. of epochs = 7000
Learning rate = 0.11
Momentum term = 0.09
Noise level value = 0.18

As a result, the network consists of three layers, its hidden layer has 13 nodes, the value of the fitness function is 0.78501, and the network is able to identify all the patterns after 3100 epochs with an MSE of 0.342153.

6.4. Case Study Number Four

This case study uses the following parameters for the breeder genetic programming (BGP):
Population size = 100
Truncation ratio = 40%
No. of generations = 50

And the following parameters for the Bayesian neural network:
Max no. of epochs = 10000
Learning rate = 0.25
Momentum term = 0.04

6.5. Case Study Number Five

This case study uses the following parameters for the Bayesian neural network:
Max no. of epochs = 1000
Learning rate = 0.3
Momentum term = 0.089
Noise level value = 0.18

As a result, the network consists of three layers, its hidden layer has 15 nodes, the value of the fitness function is 0.59468, and the network is able to identify all the patterns after 585 epochs with an MSE of 0.00968.
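The reported fitness values are consistent with eq. (5): the short calculation below, using only the epoch counts, epoch limits and MSE values quoted above, reproduces them.

# Eq. (5) applied to the reported case-study numbers (F = EP/EPmax + MSE).
def fitness(ep, mse, ep_max):
    return ep / ep_max + mse

print(fitness(18795, 0.052318, 20000))   # case one:   reported 0.992068
print(fitness(3100,  0.342153, 7000))    # case three: reported 0.78501
print(fitness(585,   0.00968,  1000))    # case five:  reported 0.59468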
REFERENCES

[1] R. G. Reeves and F. J. Janza, "Manual of Remote Sensing: Theory, Instruments and Techniques". Falls Church, VA: Amer. Soc. Photogramm., 1975.