Fundamentals of Artificial Neural Network: Workshop On "Neural Network Approach For Image Processing", Feb 4 & 5, 2011
CHAPTER 1
1.1 Introduction
A majority of information processing today is carried out by digital computers. Many
tasks are ideally suited to solution by conventional computers: scientific and
mathematical problem solving; database creation, manipulation and maintenance;
electronic communication; word processing, graphics and desktop publishing; even the
simple control functions that add intelligence to and simplify our household tools and
appliances are handled quite effectively by today's computers. In contrast, cognitive tasks
such as speech and image processing are hard to solve by the conventional algorithmic
approach. Human beings are typically much better than a digital computer at perceiving
and identifying an object of interest in a natural scene, at interpreting natural language,
and at many other natural cognitive tasks. One reason why we are much better at
recognizing objects in a complex scene, for example, is the way our brain is organized.
The brain employs a computational architecture that is well suited to solving complex
problems that a digital computer would have a difficult time with. Since conventional
computers are obviously not suited to this type of problem, we borrow features from the
physiology of the brain as the basis for new computing models, and this technology has
come to be known as the Artificial Neural Network (ANN). This chapter gives an
overview of the fundamental principles of artificial neural networks.
The brain is the central element of the nervous system, and the information-processing
cells of the brain are the neurons. Figure 1.1 shows the structure of a biological neuron.
It is composed of a cell body, or soma, and two types of out-reaching tree-like branches:
the axon and the dendrites. The cell body has a nucleus that contains information about
hereditary traits and a plasma that holds the molecular equipment for producing material
needed by the neuron. A neuron receives signals from other neurons through its dendrites
(receivers) and transmits signals generated by its cell body along the axon (transmitter),
which eventually branches into strands and substrands. At the terminals of these strands
are the synapses. A synapse is an elementary structure and a fundamental unit of
communication between two neurons. A synapse's effectiveness can be adjusted by the
signals passing through it, so that synapses can learn from the activities in which they
participate.
It is estimated that the brain consists of about 10¹¹ neurons, interconnected to form a vast
and complex neural network.
In 1943, McCulloch and Pitts proposed a binary threshold unit as a computational model
for an artificial neuron. Figure 1.2 shows the fundamental representation of an artificial
neuron. Each node has a node function associated with it, which, together with a set of
local parameters, determines the output of the node for a given input. The input signals
are transmitted by means of connection links. Each link possesses an associated weight,
which multiplies the incoming signal to form the net input of the neuron. Positive weights
correspond to excitatory synapses, while negative weights model inhibitory ones. This
mathematical neuron computes a weighted sum of its n input signals x_i, i = 1, 2, ..., n,
given by
$$\text{net} = x_1 w_1 + x_2 w_2 + \dots + x_n w_n = \sum_{i=1}^{n} x_i w_i$$
If this sum exceeds the neuron’s threshold value then the neuron fires (the output is ON).
This is illustrated in Figure 1.3.
[Figure 1.2: Basic artificial neuron — inputs x1 … xn with weights w1 … wn, threshold θ, and output y.]
$$y = f\left(\sum_{i=1}^{n} w_i x_i - \theta\right)$$
where f(.) is a step function defined by
$$f(\text{net}) = \begin{cases} 1, & \text{net} > 0 \\ 0, & \text{net} \le 0 \end{cases}$$
Equivalently, the neuron can be written as y = f(Σ w_i x_i) with the step function thresholding
at θ instead of at 0. The bias or offset θ can also be absorbed into the weights by introducing
an extra input x₀ that is permanently set to 1, with an adjustable weight w₀ = −θ, leading to
the simpler expression
$$y = f\left(\sum_{i=0}^{n} w_i x_i\right)$$
Note that the lower limit of the sum is now 0 rather than 1, and the value of the input x₀ is
always 1.
[Figure 1.3: Step activation function, (a) thresholding at 0, (b) thresholding at θ.]
Other choices of activation function besides the thresholding function are given in
Figure 1.4.
Networks consisting of MP (McCulloch-Pitts) neurons with binary (on-off) output signals can
be configured to perform several logical functions. Figure 1.5 shows some examples of logic
circuits realized using the MP model; a small numerical illustration follows.
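As an illustration (a minimal MATLAB sketch, not taken from Figure 1.5; the weights [1 1] and the thresholds 1.5 and 0.5 are assumed values), a single MP unit can realize the AND and OR functions:

% McCulloch-Pitts unit: fires (output 1) when the weighted input sum exceeds the threshold.
mp = @(x, w, theta) double(sum(x .* w) > theta);
inputs = [0 0; 0 1; 1 0; 1 1];                 % all combinations of two binary inputs
for r = 1:size(inputs, 1)
    x = inputs(r, :);
    y_and = mp(x, [1 1], 1.5);                 % fires only when both inputs are 1
    y_or  = mp(x, [1 1], 0.5);                 % fires when at least one input is 1
    fprintf('x = [%d %d]  AND = %d  OR = %d\n', x(1), x(2), y_and, y_or);
end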
The arrangement of neurons into layers and the pattern of connections within and between
layers is called the architecture of the network. The neurons within a layer may be either
fully interconnected or not interconnected at all. The number of layers in the network is
defined as the number of layers of weighted interconnection links between the particular
slabs of neurons. Based on the connection pattern (architecture), ANNs can be grouped into
three categories:
(i) Feedforward networks
(ii) Feedback networks
(iii) Competitive networks
Learning, in the ANN context, can be viewed as the problem of updating the network
architecture and connection weights so that the network can efficiently perform a specific
task. The network usually must learn the appropriate weights from available training
patterns; performance is improved over time by iteratively updating the weights in the
network. The ability of ANNs to learn automatically from examples makes them an
attractive and exciting tool for many tasks.
There are three main learning paradigms for network training:
1. Supervised learning,
2. Unsupervised learning, and
3. Reinforcement learning.
CHAPTER 2
2.1 Introduction
In Chapter 1, the mathematical details of a neuron at the single-cell level and as a network
were described. Although single neurons can perform certain simple pattern-detection
functions, the power of neural computation comes from connecting neurons in a network
structure. Larger networks generally offer greater computational capabilities, although the
converse is not true. Arranging neurons in layers or stages is intended to mimic the layered
structure of certain portions of the brain.
In this chapter, we describe the most popular Artificial Neural Network (ANN)
architecture, namely the feedforward (FF) network. First we briefly review the perceptron
model. Next, a multilayer feedforward network architecture is presented. This type of
network is sometimes called a multilayer perceptron because of its similarity to perceptron
networks with more than one layer. We derive the generalized delta (backpropagation)
learning rule and see how it is implemented in practice. We also examine variations in the
learning process to improve efficiency, ways to avoid some potential problems that can
arise during training, optimal parameter settings and various other training methods, as
well as the capabilities and limitations of the ANN.
A two-layer feedforward neural network with hard-limiting threshold units in the output
layer is called a single-layer perceptron model. A perceptron model consisting of two
input units and three output units is shown in Figure 2.1. The number of units in the
input layer corresponds to the dimensionality of the input pattern vectors. The units in the
input layer are linear, as the input layer merely fans out the input to each of the output
units. The number of output units depends on the number of distinct classes in the
pattern-classification task.
The perceptron network is trained with the perceptron algorithm, which is a
supervised learning paradigm: the network is given a desired output for each input
pattern. During the learning process, the actual output y_i generated by the network may
not equal the desired output d_i. The perceptron learning rule is based on the
error-correction principle, which uses the error signal (d_i − y_i) to modify the connection
weights and thereby gradually reduce this error.
[Figure 2.1: Single-layer perceptron with two input units (X1, X2), three output units (Y1, Y2, Y3), and weights w11 … w23.]
The algorithm for weight adjustment using the perceptron learning law is given below.
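The following is a minimal MATLAB sketch of this error-correction rule for a single hard-limiting output unit; the function name, the data matrix X, the target vector d, the learning rate eta and the epoch count are illustrative placeholders, not part of the original listing.

% Perceptron learning law (sketch): X is P-by-n (one pattern per row),
% d is P-by-1 with desired outputs (0/1), eta is the learning rate.
function w = perceptron_train(X, d, eta, nEpochs)
[P, n] = size(X);
Xa = [ones(P, 1) X];                        % augment with x0 = 1 to absorb the bias
w  = 0.1 * (2 * rand(n + 1, 1) - 1);        % small random initial weights
for epoch = 1:nEpochs
    for p = 1:P
        y = double(Xa(p, :) * w > 0);       % hard-limiting (threshold) output
        w = w + eta * (d(p) - y) * Xa(p, :)';   % error-correction weight update
    end
end
end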
There are certain restrictions on the class of problems for which the perceptron model can be
used: the perceptron network can be applied only if the patterns are linearly separable.
Because many classification problems do not possess this linear-separability property, the
condition places a severe restriction on the applicability of the perceptron network. A
feedforward network with a hidden layer is the obvious choice in such cases, and its details
are given in the next section.
Problem 1: Apply the perceptron learning law starting from the initial weight vector
w' = [1, -1, 0, 0.5]'.
Figure 2.3 shows the structure of a multilayer feedforward neural network. This type of
architecture belongs to a large class of feedforward neural networks with the neurons
arranged in cascaded layers. Neural network architectures in this class share a common
feature: all neurons in a layer (sometimes called a slab) are connected to all neurons in the
adjacent layers through unidirectional branches. That is, the branches and links can only
broadcast information in one direction, the "forward" direction. The branches have
associated transmittances, that is, synaptic weights, that can be adjusted according to a
defined learning rule. Feedforward networks do not allow connections between neurons
within any layer of the architecture. At every neuron, the output of the linear combiner, that
is, the neuron activity level, is the input to a nonlinear activation function f(.), whose output
is the response of the neuron.
The neurons in the network typically have activity levels in the range [-1, 1], although in
some applications the range [0, 1] is used. The network in Figure 2.3 actually has three
layers. The first layer does not perform any computation but only serves to feed the input
signal to the neurons of the "second" layer (called the hidden layer), whose outputs are then
input to the "third" layer (the output layer). The output of the output layer is the network
response. This network can perform nonlinear input/output mappings. In general there can
be any number of hidden layers in the architecture; however, from a practical perspective,
only one or two hidden layers are typically used. In fact, it can be shown that a Multi Layer
Perceptron (MLP) with only one hidden layer and a sufficient number of neurons acts as a
universal approximator of nonlinear mappings.
[Figure 2.3: Multilayer feedforward network with inputs x1-x3, hidden-layer neurons h1-h4, outputs y1-y2, and adjustable weights w on every branch.]
Backpropagation learning is the most commonly used algorithm for training the MLP. It is a
gradient-descent method that minimizes the mean square error between the actual and target
outputs of a multilayer perceptron.
In iteration k, the hidden-layer activations are

$$\text{net}_h^k = \sum_{i=0}^{n} x_i\, w_{hi}^k, \qquad z_h^k = f(\text{net}_h^k) = \frac{1}{1 + e^{-\text{net}_h^k}}, \qquad h = 1, 2, \dots, H$$

and the output-layer activations are

$$\text{net}_j^k = \sum_{h=0}^{H} z_h^k\, w_{jh}^k, \qquad y_j^k = f(\text{net}_j^k) = \frac{1}{1 + e^{-\text{net}_j^k}}, \qquad j = 1, 2, \dots, m$$

Error function:

$$E^k = \frac{1}{2} \sum_{j=1}^{m} \left(d_j - y_j^k\right)^2$$

The weights are adjusted so that this error is minimized. For the output-layer weights,

$$\frac{\partial E^k}{\partial w_{jh}^k} = -\left(d_j^k - y_j^k\right) f'(\text{net}_j^k)\, z_h^k = -\delta_j^k\, z_h^k$$

where $\delta_j^k = (d_j^k - y_j^k)\, f'(\text{net}_j^k)$ represents the error scaled by the slope of the
activation function. For the hidden-layer weights,

$$\frac{\partial E^k}{\partial w_{hi}^k} = -\left(\sum_{j=1}^{m} \delta_j^k\, w_{jh}^k\right) f'(\text{net}_h^k)\, x_i = -\delta_h^k\, x_i$$

where $\delta_h^k = \left(\sum_{j=1}^{m} \delta_j^k\, w_{jh}^k\right) f'(\text{net}_h^k)$. Moving down the negative
gradient with learning rate η gives the weight updates

$$w_{hi}^{k+1} = w_{hi}^k + \Delta w_{hi}^k = w_{hi}^k + \eta\, \delta_h^k\, x_i, \qquad w_{jh}^{k+1} = w_{jh}^k + \eta\, \delta_j^k\, z_h^k$$
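To make the update equations concrete, here is a minimal MATLAB sketch of one backpropagation iteration for a single training pattern, using the logistic activation whose derivative is f'(net) = f(net)(1 - f(net)); the variable names (x, d, Wh, Wo, eta) are assumed placeholders, and the bias terms are folded in as x0 = 1 and z0 = 1.

% One backpropagation iteration (single pattern).
% x : n-by-1 input, d : m-by-1 desired output, eta : learning rate.
% Wh: H-by-(n+1) hidden weights, Wo: m-by-(H+1) output weights (placeholders).
f = @(net) 1 ./ (1 + exp(-net));        % logistic activation

xa = [1; x];                            % augment input with x0 = 1
zh = f(Wh * xa);                        % hidden-layer outputs z_h
za = [1; zh];                           % augment with z0 = 1
y  = f(Wo * za);                        % network outputs y_j

deltaO = (d - y) .* y .* (1 - y);       % output deltas: (d - y) f'(net_j)
deltaH = (Wo(:, 2:end)' * deltaO) .* zh .* (1 - zh);   % back-propagated hidden deltas

Wo = Wo + eta * deltaO * za';           % output-layer weight update
Wh = Wh + eta * deltaH * xa';           % hidden-layer weight update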
This section discusses the various design issues that concern the inner workings of the
backpropagation algorithm.
The backpropagation algorithm operates by sequentially presenting data drawn from a
training set to a predefined network architecture. There are two choices in implementing
this algorithm. In the first, once a pattern is presented to the network, its gradients are
calculated and the network weights are changed immediately based on these instantaneous
(local) gradient values; this is called pattern mode training. Alternatively, one could
accumulate the error gradients over an entire epoch and then change the weights in one
shot; this is called batch mode training.
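As a rough sketch of the difference (the helper backprop_gradients and all variable names are hypothetical placeholders), the two modes differ only in when the accumulated gradient is applied:

% Pattern mode: the weights change after every pattern.
for p = 1:P
    [gWh, gWo] = backprop_gradients(Wh, Wo, X(p, :)', D(p, :)');   % hypothetical helper
    Wh = Wh + eta * gWh;
    Wo = Wo + eta * gWo;
end

% Batch mode: the gradients are accumulated over the epoch, then applied once.
GWh = zeros(size(Wh));  GWo = zeros(size(Wo));
for p = 1:P
    [gWh, gWo] = backprop_gradients(Wh, Wo, X(p, :)', D(p, :)');
    GWh = GWh + gWh;    GWo = GWo + gWo;
end
Wh = Wh + eta * GWh;
Wo = Wo + eta * GWo;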
Both the generalization ability and the approximation ability of a feedforward neural
network are closely related to the architecture of the network and to the size of the training
set. Choosing an appropriate network architecture means choosing the number of layers and
the number of hidden neurons per layer. Although the backpropagation algorithm can be
applied to any number of hidden layers, a three-layered network can approximate any
continuous function. Selecting the number of neurons in the hidden layers of a multilayer
network is therefore an important issue: the number of nodes must be large enough to form
a decision region as complex as the problem requires, yet not so large that the weights
cannot be reliably estimated from the available training data. A commonly used selection
procedure is the following (a MATLAB sketch of the data split appears after the list):
• Divide the data set into a training set T_training and a test set T_test.
• Subdivide T_training into two subsets: one to train the network, T_learning, and
one to validate the network, T_validation.
• Train different network architectures on T_learning and evaluate their
performance on T_validation.
• Select the best network.
• Finally, retrain this network architecture on T_training.
• Test for generalization ability using T_test.
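A minimal sketch of this split, assuming the patterns are stored row-wise in matrices X (inputs) and Y (targets) and that a 70/15/15 division is acceptable (both are illustrative assumptions):

% Illustrative split of the data into learning, validation and test sets.
P      = size(X, 1);                  % number of patterns (rows)
idx    = randperm(P);                 % shuffle the pattern order
nLearn = round(0.70 * P);
nVal   = round(0.15 * P);

learnIdx = idx(1:nLearn);
valIdx   = idx(nLearn+1 : nLearn+nVal);
testIdx  = idx(nLearn+nVal+1 : end);

Xlearn = X(learnIdx, :);  Ylearn = Y(learnIdx, :);
Xval   = X(valIdx, :);    Yval   = Y(valIdx, :);
Xtest  = X(testIdx, :);   Ytest  = Y(testIdx, :);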
It is important to choose the set of initial weights for the network correctly; sometimes this
choice decides whether or not the network is able to learn the training-set function. It is
common practice to initialize the weights to small random values within some interval
[−ε, ε]. Initializing all weights of the network to the same value can lead to network
paralysis, where the network learns nothing because the weight changes are uniformly zero.
Very small ranges of weight randomization should also be avoided, since they may lead to
very slow learning in the initial stages. Conversely, an overly large choice of weights may
drive the network into saturation, where the weight changes become almost negligible over
consecutive epochs. To get good results, the initial weights (and biases) are usually set to
random numbers between -0.5 and 0.5, or between -1 and 1.
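A minimal sketch of such an initialization (the layer sizes nin, nhid, nout and the half-width epsw are assumed placeholders):

% Illustrative random weight initialization in the interval [-epsw, epsw].
epsw = 0.5;                                     % half-width of the initialization range
Wh = epsw * (2 * rand(nhid, nin  + 1) - 1);     % hidden-layer weights (including bias column)
Wo = epsw * (2 * rand(nout, nhid + 1) - 1);     % output-layer weights (including bias column)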
The motivation for training a backpropagation net is to achieve a balance between
memorization and generalization; it is not necessarily advantageous to continue training
until the error reaches its minimum value. The weight adjustments are based on the training
patterns, and training continues as long as the error on the validation set decreases. When
the validation error begins to increase, the net is starting to memorize the training patterns,
and at this point training is terminated.
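A rough sketch of this stopping criterion (the helpers train_one_epoch and validation_error, and all variable names, are hypothetical placeholders):

% Illustrative early-stopping loop based on the validation error.
bestErr = inf;
for epoch = 1:maxEpochs
    [Wh, Wo] = train_one_epoch(Wh, Wo, Xlearn, Ylearn, eta);   % hypothetical helper
    valErr   = validation_error(Wh, Wo, Xval, Yval);            % hypothetical helper
    if valErr < bestErr
        bestErr = valErr;  WhBest = Wh;  WoBest = Wo;            % keep the best weights so far
    else
        break;                        % validation error started to rise: stop training
    end
end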
Multilayer feedforward neural networks trained with the backpropagation algorithm account
for a majority of real-world applications of neural networks. This is because the
backpropagation algorithm is easy to implement and fast and efficient to operate. Some
applications of ANNs are listed below.
• Speech recognition
• Data Mining
• Robot arm control
• Bio-Informatics
• Power system security assessment
• Load forecasting
• Image processing
CHAPTER 3
MATLAB stands for MATrix LABoratory. It is a software package for high-performance
numerical computation and visualization. It provides an interactive environment with
hundreds of reliable and accurate built-in mathematical functions. MATLAB's built-in
functions provide excellent tools for linear algebraic computations, data analysis, signal
processing, optimization, numerical solution of ordinary differential equations and many
other types of scientific computation. The basic building block of MATLAB is the matrix;
the fundamental data type is the array. Vectors, scalars, real matrices and complex matrices
are all handled as special cases of this basic data type.
Dimensioning of a matrix is automatic in MATLAB. MATLAB is case-sensitive, and most
MATLAB commands and built-in functions are in lower-case letters. The output of every
command is displayed on the screen unless MATLAB is directed otherwise: a semicolon at
the end of a command suppresses the screen output, except for graphics and on-line help
commands.
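For example (an illustrative pair of commands):
Eg: >> a = 5        % the result is echoed on the screen
    >> b = 7;       % the semicolon suppresses the echo, but b is still created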
This is the main window, characterized by the MATLAB command prompt '>>'. All
commands, including those for running user-written programs, are typed in this window at
the command prompt. In addition to the command window, the MATLAB desktop contains
four other sub-windows:
• Launch Pad:
This sub-window lists all MATLAB related applications and toolboxes.
• Work Space:
This sub-window lists all variables that have been generated so far and shows
their type and size. Various operations can be performed on these variables such
as plotting.
• Command History:
All commands typed at the MATLAB command prompt are recorded in this window,
even across multiple sessions.
• Current Directory:
This is the sub-window, in which all files from the current directory are listed.
The output of all graphics commands typed in the command window is sent to the graphics
or figure window, a separate gray window with a white background color. The user can
create as many figure windows as the system memory will allow.
In the edit window, programs can be written, edited, created and saved in files called
"M-files". MATLAB provides its own built-in editor for this purpose.
A script file is a user-created file containing a sequence of MATLAB commands. It may be
created by selecting a new M-file from the File menu in the edit window. The file must be
saved with a '.m' extension, thereby making it an M-file. The file is executed by typing its
name, without the extension, at the command prompt in the command window. If the '%'
symbol is placed before a line in a MATLAB program, that line is treated as a comment line.
Eg. % MATLAB is a user friendly program
The character limit for a line in a MATLAB program is 4096 characters.
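As an illustration (the file name circle_area.m and its contents are assumed examples, not part of the original text), a small script file could contain:
% circle_area.m -- computes the area of a circle
r = input('enter radius ');     % prompt the user for the radius
area = pi * r^2;                % area of the circle
disp(area)                      % display the result
The script is executed by typing circle_area (without the .m extension) at the command prompt.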
The following are some important commands for working with directories:
• pwd (print working directory)
This command displays the present working directory.
Eg: >>pwd
C:\matlabR12\work
• cd (change directory)
This command is used to change the current working directory to a new directory.
Eg: >>cd newdirectory
• dir (directory)
On execution of this command, the contents present in the current directory can
be viewed.
• addpath
This command is used to add the specified directories to the existing path.
Eg: >>addpath D:\matlabR12\work
>>addpath C:\mywork
(or)
>>addpath D:\matlabR12\work C:\mywork
3.5 Variables
Expressions typed without a variable name are evaluated by MATLAB, and the result is
stored in and displayed by the variable ans. The result can also be assigned to a variable
name. A variable name must begin with a letter and can be at most 31 characters long; after
the first letter, any combination of letters, digits and underscores may be used. Variables
used in script files are global. A set of variables can also be declared global so that it is
accessible to all or some functions without being passed in the input argument list.
Eg: >>global x y;
An M-file can prompt for input from the keyboard by using input command.
Eg: >>V=input(‘enter radius’)
displays the string - enter radius - and waits for a number to be entered. That number
will be assigned to the variable, V.
The following are some of the commands used in matrix and vector manipulation:
• Square brackets with no elements between them create a null (empty) matrix.
Eg: >>X = [ ] produces a null matrix
• Any row(s) or column(s) of a matrix can be deleted by setting the row or
column to a null vector.
Eg: >>A(2,:) = [ ] deletes the second row of the matrix A.
• MATLAB provides a much higher level of index specification: it allows a range of
rows and columns to be specified at the same time.
Eg: >>B=A(2:4,1:4) creates a matrix B consisting of the elements in rows 2 to 4
and columns 1 to 4 of A.
• When the specified range covers all rows (or columns) of the matrix, a colon alone
can be used.
Eg: >>B=A(2:3,:) creates a matrix B consisting of all elements in the 2nd and
3rd rows of A.
• Normally no dimension declaration is required, but if the matrix is large it is
advisable to initialize it.
Eg: >>A=zeros(m,n); creates (initializes) an m x n matrix A with all elements
set to zero.
• The size of the matrix can be determined by the command size(matrix name)
in the command window.
Eg: >>a=size(A);
• To obtain the number of rows and columns of a matrix separately, type
Eg: >>[m,n] = size(matrix name)
• Transpose of a matrix
Eg: >>B = A’
• Appending a row or a column to a matrix
>>A=[A, u] → appends the column u to A
>>A=[v; A] → appends the row v at the top of A
where u is a column vector and v is a row vector of compatible size.
• Deleting a row or a column of a matrix → any row or column of a matrix can be
deleted by setting it to a null vector.
>>U(:,a) = [ ] will delete the ath column from the matrix U.
>>V(b,:) = [ ] will delete the bth row from the matrix V.
• Performing arithmetic operations on a matrix and on an array
Eg: >> A=[1 2; 3 4]
>>A=A*A;     (matrix multiplication)
• For the element-wise (array) operation:
>>A=A.*A
This construction provides logical branching for computations; nested if statements are also
possible, as long as each if has a matching end statement.
Eg: >>i=6;
>>j=21;
>>if i>5
      k=i;
  end
This construction also provides logical branching for computations. A flag is used as a
switch, and the values of the flag make up the different cases for execution.
Eg: color = input('color =','s');
switch color
    case 'red'
        c=[1 0 0]
    case 'blue'
        c=[0 0 1]
    otherwise
        error('Invalid color')
end
3.8.5 Error
The command error inside a function or a script aborts the execution, displays the error
message and returns the control to the keyboard.
3.8.6 Break
The command break inside a for loop or while loop terminates the execution of the loop,
even if the condition for execution of the loop is still true.
Eg: for i=1:10
        if v(i)<0
            break
        end
        a=a+v(i);
    end
3.8.7 Return
The command return inside a function or a script terminates its execution immediately and
returns control to the invoking function or to the keyboard.
3.9 Exercises
4. Given A = [3 7 -4 12; -5 9 10 2; 6 13 8 11; 15 5 4 1]
a. Create a vector V consisting of the elements in the second column of A
b. Create a vector W consisting of the elements in the second row of A
5. Given A = [3 7 -4 12; -5 9 10 2; 6 13 8 11; 15 5 4 1]
(i) Create a 4 x 3 array B consisting of all elements in the second through fourth
columns of A
(ii) Create a 3 x 4 array C consisting of all elements in the second through fourth
rows of A
(iii) Create a 2 x 3 array D consisting of all elements in the first two rows and the
last three columns of A
CHAPTER 4
4.1 Introduction
Load the data using the command "load" and store it in a variable,
(eg.) >> load PR.dat
>> XY = PR;
Separate the input and output data into training and test cases:
X1, Y1 for training
X2, Y2 for testing
Normalize the input/output data if they are in different ranges. The normalized value of the
ith pattern in the jth input variable is
X1n(i,j)=(X1(i,j)-minX(j))/(maxX(j)-minX(j));
and similarly the normalized value of the ith pattern in the jth output variable is
Y1n(i,j)=(Y1(i,j)-minY(j))/(maxY(j)-minY(j));
• Specify the number of hidden-layer neurons and the number of output-layer neurons
using the assignments
nhid =---
nout= ---
• Specify the number of epochs and the error goal using the assignments
net.trainParam.epochs=----;
net.trainParam.goal=-----;
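Putting these steps together, the following is a minimal sketch using the older Neural Network Toolbox function newff; the transfer functions, epoch count and error goal shown are illustrative assumptions rather than prescribed values.

% Illustrative creation and training of a feedforward network (Neural Network Toolbox).
P = X1n';                            % the toolbox expects patterns column-wise
T = Y1n';
net = newff(minmax(P), [nhid nout], {'logsig','purelin'});   % two-layer feedforward net
net.trainParam.epochs = 500;         % illustrative number of epochs
net.trainParam.goal   = 1e-3;        % illustrative error goal
net = train(net, P, T);              % train with backpropagation
Y   = sim(net, P);                   % simulate the trained network on the training inputs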