Multi-Layer Perceptron Tutorial

This tutorial continues from the previous neural network tutorial, the Perceptron tutorial. We will now
introduce the structure of the multi-layer perceptron and the back-propagation algorithm, without
doubt the most popular neural network structure to date. If you are in a hurry and just want to
experiment with the code you can get it from here, but I would recommend reading on to see how the
network functions.

Tutorial Prerequisites

• The reader should be familiar with the perceptron neural network
• The reader should have a basic understanding of C/C++
• The reader should know how to compile and run a program from the command line in
Windows or Linux

Tutorial Goals

• The reader will understand the structure of the Multi-layer Perceptron neural network
• The reader will understand the back-propagation algorithm
• The reader will know about the wide array of applications this network is used in
• The reader will learn all the above via an actual practical application in optical character
recognition

Tutorial Body

This network was introduced around 1986 with the advent of the back-propagation algorithm.
Until then there was no rule via which we could train neural networks with more than one layer.
As the name implies, a Multi-layer Perceptron is just that: a network composed of many
neurons, divided into layers. These layers are:

• The input layer, where the input of the network goes. The number of neurons here depends
on the number of inputs we want our network to get.
• One or more hidden layers. These layers come between the input and the output, and their
number can vary. The hidden layers encode the input and map it to the output. It has been
proven (the universal approximation theorem) that a multi-layer perceptron with a single hidden
layer, given enough hidden neurons, can approximate any continuous function of its inputs over a
bounded input range to arbitrary accuracy.
• The output layer, where the outcome of the network can be seen. The number of neurons here
depends on the problem we want the neural net to learn.

The Multi-layer perceptron differs from the simple perceptron in many ways. What the two have in
common is weight randomization: all weights are given random values within a certain range,
usually [-0.5,0.5]. Apart from that, for each pattern that is fed to the network three
passes over the net are made. Let's see them one by one in detail.
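The weight randomization step can be sketched as follows. This is a minimal sketch, not the tutorial's own code: it uses the standard `<random>` header instead of the `rand()` call the tutorial uses later, and the function name `randomWeights` and its default seed are assumptions for illustration.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: returns `count` weights drawn uniformly from [-0.5, 0.5).
// A fixed seed makes runs reproducible; pass a different seed for other runs.
std::vector<float> randomWeights(std::size_t count, unsigned seed = 42)
{
    std::mt19937 gen(seed);
    std::uniform_real_distribution<float> dist(-0.5f, 0.5f);

    std::vector<float> weights;
    weights.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        weights.push_back(dist(gen));   // one random weight per connection
    return weights;
}
```

Seeding explicitly makes a training run repeatable, which helps when comparing parameter choices.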

• Calculate the output:

In this phase we calculate the output of the network. For each layer, we calculate the firing value
of each neuron as the sum of the outputs of all the neurons of the previous layer connected to it,
each multiplied by the corresponding connection weight. That sounded a little big, so here it is
in pseudocode:

value[neuron,layer] = 0;
for(int i = 0; i < previousLayerNeurons; i ++)
    value[neuron,layer] += weight(i,neuron) * value[i,layer-1];

value[neuron,layer] = activationFunction(value[neuron,layer]);

As can be seen from the pseudocode, here too we have activation functions. They squash the
output of each neuron into a fixed range. One requirement, though: the activation function must
be differentiable, since back-propagation uses its derivative, so a hard step function will not do.
The code in this tutorial uses the sigmoid. So, we gradually propagate forward through the network
until we reach the output layer and produce some output values. Because the weights start out
random, these values initially have nothing to do with our goal values. But it is here that the
back-propagation learning algorithm kicks in.
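The forward computation of a single layer can be sketched in C++ as below. The function name `forwardLayer` and the flat weight layout `weights[i * nOut + j]` (weight from neuron i of the previous layer to neuron j) are assumptions for illustration. Like the tutorial's network, the sketch uses the sigmoid and has no bias terms.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Logistic sigmoid activation, squashing any input into (0, 1).
float sigmoid(float u) { return 1.0f / (1.0f + std::exp(-u)); }

// Computes one layer's outputs from the previous layer's outputs.
// Assumed layout: weights[i * nOut + j] connects previous neuron i to neuron j.
// Like the tutorial's code, no bias term is used.
std::vector<float> forwardLayer(const std::vector<float>& prev,
                                const std::vector<float>& weights,
                                std::size_t nOut)
{
    std::vector<float> out(nOut, 0.0f);
    for (std::size_t j = 0; j < nOut; ++j) {
        for (std::size_t i = 0; i < prev.size(); ++i)
            out[j] += weights[i * nOut + j] * prev[i];   // weighted sum
        out[j] = sigmoid(out[j]);                        // activation
    }
    return out;
}
```

To run the whole forward pass, call it once per layer and feed each result into the next call, for example `forwardLayer(forwardLayer(input, w1, hiddenN), w2, outputN)`.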

• Back-Propagation:

The back-propagation learning algorithm uses the delta rule. It computes the deltas (local
gradients) of each neuron, starting from the output neurons and going backwards until it reaches
the input layer. To compute the deltas of the output neurons, though, we first need the error of
each output neuron. That's pretty simple: since the multi-layer perceptron is trained in a
supervised way, the error is the difference between the desired output and the network's actual
output.
$$e_j(n) = d_j(n) - o_j(n)$$
where $e(n)$ is the error vector, $d(n)$ is the desired output vector and $o(n)$ is the actual output
vector. Now to compute the deltas:

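$$\delta_j^{(L)}(n) = e_j^{(L)}(n)\, f'\big(u_j^{(L)}(n)\big), \quad \text{for neuron } j \text{ in the output layer } L$$

where $f'(u_j^{(L)}(n))$ is the derivative of the activation function evaluated at $u_j^{(L)}(n)$, the
weighted input sum of the jth neuron of layer L.

$$\delta_j^{(l)}(n) = f'\big(u_j^{(l)}(n)\big) \sum_k \delta_k^{(l+1)}(n)\, w_{kj}^{(l+1)}(n), \quad \text{for neuron } j \text{ in hidden layer } l$$

where $f'(u_j^{(l)}(n))$ is the derivative of the activation function evaluated at the weighted input
sum of the jth neuron in layer l, and the sum runs over all neurons k of the next layer, multiplying
each of their deltas by the weight $w_{kj}^{(l+1)}(n)$ that connects neuron j to neuron k.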
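The two delta equations translate fairly directly into code. In this minimal sketch the helper names and the weight layout are hypothetical. The sigmoid derivative is written in terms of the neuron's output y, using f'(u) = f(u)(1 - f(u)) = y(1 - y).

```cpp
#include <cstddef>
#include <vector>

// Derivative of the sigmoid written in terms of its output y = f(u):
// f'(u) = f(u) * (1 - f(u)) = y * (1 - y).
float sigmoidDerivFromOutput(float y) { return y * (1.0f - y); }

// Output-layer deltas: delta_j = (d_j - o_j) * f'(u_j).
std::vector<float> outputDeltas(const std::vector<float>& outputs,
                                const std::vector<float>& desired)
{
    std::vector<float> delta(outputs.size());
    for (std::size_t j = 0; j < outputs.size(); ++j)
        delta[j] = (desired[j] - outputs[j]) * sigmoidDerivFromOutput(outputs[j]);
    return delta;
}

// Hidden-layer deltas: delta_j = f'(u_j) * sum_k delta_k(next) * w_kj(next).
// Assumed layout: nextWeights[j * nextDelta.size() + k] connects neuron j
// of this layer to neuron k of the next layer.
std::vector<float> hiddenDeltas(const std::vector<float>& hiddenOutputs,
                                const std::vector<float>& nextDelta,
                                const std::vector<float>& nextWeights)
{
    const std::size_t nNext = nextDelta.size();
    std::vector<float> delta(hiddenOutputs.size());
    for (std::size_t j = 0; j < hiddenOutputs.size(); ++j) {
        float sum = 0.0f;
        for (std::size_t k = 0; k < nNext; ++k)
            sum += nextDelta[k] * nextWeights[j * nNext + k];
        delta[j] = sigmoidDerivFromOutput(hiddenOutputs[j]) * sum;
    }
    return delta;
}
```

For deeper networks, apply `hiddenDeltas` repeatedly from the last hidden layer back to the first. Each call's result becomes `nextDelta` for the layer before it.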

This backward computation of the hidden deltas is a very important part of the delta rule and the
whole essence of back-propagation. Why, you might ask? Because, as Wikipedia puts it, a derivative
is how much a function changes as its input changes. By propagating the derivatives backwards, we
tell all the neurons in the previous layers how their weights need to change to match the desired
output. And all that starts from the initial error calculation at the output layer. Just like magic!

• Weight adjustment

Having calculated the deltas for all the neurons, we are now ready for the third and final pass over
the network, this time to adjust the weights according to the generalized delta rule:
$$w_{ji}^{(l)}(n+1) = w_{ji}^{(l)}(n) + \alpha \big[w_{ji}^{(l)}(n) - w_{ji}^{(l)}(n-1)\big] + \eta\, \delta_j^{(l)}(n)\, y_i^{(l-1)}(n)$$
Do not be discouraged by all the mathematical mumbo jumbo. It is actually quite simple. What
the above says is:
The new weights for layer l are calculated by adding two terms to the current weights. The first
is the difference between the current weights and the previous weights, multiplied by the
coefficient α. This is called the momentum coefficient and, true to its name, it speeds up the
training of a multi-layer perceptron by adding part of the previous weight change to the current
weight change. This is a double-edged sword, though: if the momentum coefficient is too large,
the weight updates overshoot and the network may oscillate instead of converging.

The second term is the delta of the neuron in layer l whose incoming weight we change, multiplied
by the output of the neuron in the previous layer (l-1) that feeds that weight. All of that is then
multiplied by the constant η, which we know from the previous perceptron tutorial as the teaching
step. And that is basically it! That's what the multi-layer perceptron is all about. It is no
doubt a very powerful neural network and a very powerful tool in statistical analysis.
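For a single weight, the generalized delta rule is a one-line computation. The sketch below uses a hypothetical function `updatedWeight` whose parameters map directly to the symbols above. With w(n) = 0.3, w(n-1) = 0.2, α = 0.5, η = 0.25, δ = 0.5 and y = 1, the new weight is 0.3 + 0.05 + 0.125 = 0.475.

```cpp
// One application of the generalized delta rule to a single weight:
//   w(n+1) = w(n) + alpha * [w(n) - w(n-1)] + eta * delta_j(n) * y_i(n)
// w      : current weight w_ji(n)
// wPrev  : the same weight one step earlier, w_ji(n-1)
// alpha  : momentum coefficient
// eta    : teaching step (learning rate)
// delta  : delta of the receiving neuron j in layer l
// yIn    : output of the sending neuron i in layer l-1
float updatedWeight(float w, float wPrev, float alpha, float eta,
                    float delta, float yIn)
{
    const float momentumTerm = alpha * (w - wPrev);   // reuse part of the last change
    const float gradientTerm = eta * delta * yIn;     // move along the local gradient
    return w + momentumTerm + gradientTerm;
}
```

Setting `alpha` to zero reduces this to the plain delta rule without momentum.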

Practical example

It would not be a tutorial if we just explained how it works and gave you the equations. As
already mentioned, the Multi-layer perceptron has many applications: statistical analysis, pattern
recognition and optical character recognition are just some of them. Our example will focus on a
simple instance of optical character recognition. Specifically, the final program will use an MLP
to tell apart a number of monochrome .bmp bitmap files and tell us which digit each image
depicts. I used 8x8 pixel images, but readers are free to make their own monochrome images at
other resolutions, since the program reads the size from the bitmap itself. Below you can see an
example of such bitmaps.

They are ugly, right? Shouldn't telling them apart be hard for a computer? This ugliness could be
considered noise, and MLPs are really good at separating noise from the actual data that help
them reach a conclusion. But let's go on and see some code to understand how it is done.

#include <vector>

class FileReader; //declared further below

class MLP
{
private:
    std::vector<float> inputNeurons;
    std::vector<float> hiddenNeurons;
    std::vector<float> outputNeurons;
    std::vector<float> weights;
    FileReader* reader;

    int inputN,outputN,hiddenN,hiddenL;

public:
    MLP(int hiddenL,int hiddenN);
    ~MLP();

    //assigns values to the input neurons
    bool populateInput(int fileNum);
    //calculates the whole network, from input to output
    void calculateNetwork();
    //trains the network according to our parameters
    bool trainNetwork(float teachingStep,float lmse,float momentum,int trainingFiles);
    //recalls the network for a given bitmap file
    void recallNetwork(int fileNum);
};

The above is our multi-layer perceptron class. As you can see, it has vectors for all the neurons
and their connection weights. It also holds a pointer to a FileReader object. As we will see below,
FileReader is a class we will write to read the bitmap files that populate our input. The MLP's
functions are similar to those of the perceptron: it populates its input by reading the bitmap
images, calculates an output for the network and trains the network. Moreover, you can recall the
network for a given 'fileNum' image to see what digit the network thinks the image represents.

#include <cstdio>  //printf
#include <cstdlib> //rand, RAND_MAX

//Multi-layer perceptron constructor
MLP::MLP(int hL,int hN)
{
    outputN = 10; //one output per digit: 1 to 9 and zero
    hiddenL = hL;
    hiddenN = hN;

    //initialize the filereader (only once, so no instance is leaked)
    reader = new FileReader();

    //read the first image to see what kind of input our net will have
    inputN = reader->getBitmapDimensions();
    if(inputN == -1)
    {
        printf("There was an error detecting img0.bmp\n");
        return;
    }

    //let's allocate the memory for the weights
    weights.reserve(inputN*hiddenN + hiddenN*hiddenN*(hiddenL-1) + hiddenN*outputN);

    //also let's set the size for the neuron vectors
    inputNeurons.resize(inputN);
    hiddenNeurons.resize(hiddenN*hiddenL);
    outputNeurons.resize(outputN);

    //randomize weights for inputs to 1st hidden layer
    for(int i = 0; i < inputN*hiddenN; i++)
        weights.push_back( (float)rand() / ((float)RAND_MAX + 1.0f) - 0.5f );//[-0.5,0.5]

    //if there are more than 1 hidden layers, randomize their weights
    for(int l = 1; l < hiddenL; l++)
        for(int j = 0; j < hiddenN*hiddenN; j++)
            weights.push_back( (float)rand() / ((float)RAND_MAX + 1.0f) - 0.5f );//[-0.5,0.5]

    //and finally randomize the weights for the output layer
    for(int i = 0; i < hiddenN*outputN; i++)
        weights.push_back( (float)rand() / ((float)RAND_MAX + 1.0f) - 0.5f );//[-0.5,0.5]
}

The network takes the number of hidden layers and hidden neurons as parameters, so it knows
how to initialize its neuron and weight vectors. Moreover, it reads the first bitmap, 'img0.bmp',
to get the dimensions that all the images will have, as can be seen from this line:
inputN = reader->getBitmapDimensions();
That is a requirement our tutorial's program will have. You are free to provide any bitmap size
you want for the first image 'img0.bmp', but all the following images must have the same size.
As mentioned earlier, the weights are initialized with random values in the range [-0.5,0.5].
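The constructor fills a single flat `weights` vector in three blocks. The training code accesses it through helper macros (`inputToHidden`, `hiddenToHidden`, `hiddenToOutput`), which are defined in the .zip rather than in this text. The struct below is a hypothetical reconstruction of that index arithmetic. It assumes the order in which the constructor pushes the weights and a from-major layout inside each block, so the real macros may differ.

```cpp
#include <cstddef>

// Hypothetical index helpers for a flat weight vector laid out in the same
// order as the constructor fills it:
//   [input->hidden 1][hidden 1->2] ... [hidden L-1->L][hidden L->output]
// Within each block, the weight from neuron `from` to neuron `to` is assumed
// to sit at from * (size of destination layer) + to.
struct WeightLayout {
    std::size_t inputN, hiddenN, hiddenL, outputN;

    std::size_t total() const {
        return inputN * hiddenN + hiddenN * hiddenN * (hiddenL - 1) + hiddenN * outputN;
    }
    std::size_t inputToHidden(std::size_t in, std::size_t hid) const {
        return in * hiddenN + hid;
    }
    // `layer` runs from 2 to hiddenL, as in calculateNetwork()
    std::size_t hiddenToHidden(std::size_t layer, std::size_t from, std::size_t to) const {
        return inputN * hiddenN + (layer - 2) * hiddenN * hiddenN + from * hiddenN + to;
    }
    std::size_t hiddenToOutput(std::size_t hid, std::size_t out) const {
        return inputN * hiddenN + (hiddenL - 1) * hiddenN * hiddenN + hid * outputN + out;
    }
};
```

For 8x8 images (64 inputs), two hidden layers of 10 neurons and 10 outputs, this gives 640 + 100 + 100 = 840 weights, the same count the constructor reserves.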
void MLP::calculateNetwork()
{
    //let's propagate towards the hidden layer
    for(int hidden = 0; hidden < hiddenN; hidden++)
    {
        hiddenAt(1,hidden) = 0;
        for(int input = 0; input < inputN; input++)
        {
            hiddenAt(1,hidden) += inputNeurons.at(input)*inputToHidden(input,hidden);
        }
        //and finally pass it through the activation function
        hiddenAt(1,hidden) = sigmoid(hiddenAt(1,hidden));
    }

    //now if we got more than one hidden layers
    for(int i = 2; i <= hiddenL; i++)
    {
        //for each one of these extra layers calculate their values
        for(int j = 0; j < hiddenN; j++)//to
        {
            hiddenAt(i,j) = 0;
            for(int k = 0; k < hiddenN; k++)//from
            {
                hiddenAt(i,j) += hiddenAt(i-1,k)*hiddenToHidden(i,k,j);
            }
            //and finally pass it through the activation function
            hiddenAt(i,j) = sigmoid(hiddenAt(i,j));
        }
    }

    //and now hidden to output
    for(int i = 0; i < outputN; i++)
    {
        outputNeurons.at(i) = 0;
        for(int j = 0; j < hiddenN; j++)
        {
            outputNeurons.at(i) += hiddenAt(hiddenL,j) * hiddenToOutput(j,i);
        }
        //and finally pass it through the activation function
        outputNeurons.at(i) = sigmoid( outputNeurons.at(i) );
    }
}
The calculateNetwork function finds the output of the network that corresponds to the
current input. It simply propagates the input signals through each layer until they reach the
output layer. There is nothing really special about the code above; it is just an implementation of
the equations presented earlier. As we saw in the constructor, the network in our tutorial has 10
different outputs. Each output indicates how likely it is that the input pattern is a certain digit.
So output 1 being close to 1.0 would mean that the input pattern is most likely a 1, and so on.
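Reading the answer off the ten outputs means picking the largest one. The sketch below shows one way to do that. The function names are hypothetical. The percentage calculation (each output as a share of the total) is also an assumption, since the tutorial does not say how its program computes the percentages it prints.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Returns the digit whose output neuron fires most strongly.
int bestMatch(const std::vector<float>& outputs)
{
    auto it = std::max_element(outputs.begin(), outputs.end());
    return static_cast<int>(std::distance(outputs.begin(), it));
}

// Assumption: expresses each output as a share of the total activation,
// so the values add up to 100%. The tutorial's own program may compute
// its percentages differently.
std::vector<float> asPercentages(const std::vector<float>& outputs)
{
    float sum = 0.0f;
    for (float o : outputs) sum += o;
    std::vector<float> pct(outputs.size(), 0.0f);
    if (sum > 0.0f)
        for (std::size_t i = 0; i < outputs.size(); ++i)
            pct[i] = 100.0f * outputs[i] / sum;
    return pct;
}
```

Because every sigmoid output lies in (0, 1) independently, several outputs can be high at once. Choosing the largest one still gives a single answer.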

The training function is too long to post in full here, so I recommend taking a look at the
.zip with the source code to see all of it. We will just focus on the implementation of the
back-propagation algorithm.

for(int i = 0; i < outputN; i++)
{
    //let's get the delta of the output layer
    //and the accumulated error
    if(i != target)
    {
        outputDeltaAt(i) = (0.0 - outputNeurons[i])*dersigmoid(outputNeurons[i]);
        error += (0.0 - outputNeurons[i])*(0.0 - outputNeurons[i]);
    }
    else
    {
        outputDeltaAt(i) = (1.0 - outputNeurons[i])*dersigmoid(outputNeurons[i]);
        error += (1.0 - outputNeurons[i])*(1.0 - outputNeurons[i]);
    }
}

//we start propagating backwards now, to get the error of each neuron
//in every layer

//let's get the delta of the last hidden layer first
for(int i = 0; i < hiddenN; i++)
{
    hiddenDeltaAt(hiddenL,i) = 0;//zero the values from the previous iteration

    //add to the delta for each connection with an output neuron
    for(int j = 0; j < outputN; j++)
    {
        hiddenDeltaAt(hiddenL,i) += outputDeltaAt(j) * hiddenToOutput(i,j);
    }

    //The derivative here is part of the delta rule
    //for the weight adjustment about to follow
    hiddenDeltaAt(hiddenL,i) *= dersigmoid(hiddenAt(hiddenL,i));
}

//now for each additional hidden layer, provided they exist
for(int i = hiddenL-1; i > 0; i--)
{
    //add to each neuron's hidden delta
    for(int j = 0; j < hiddenN; j++)//from
    {
        hiddenDeltaAt(i,j) = 0;//zero the values from the previous iteration

        for(int k = 0; k < hiddenN; k++)//to
        {
            //the next hidden layer's deltas multiplied by the
            //weights connecting this neuron to them
            hiddenDeltaAt(i,j) += hiddenDeltaAt(i+1,k) * hiddenToHidden(i+1,j,k);
        }

        //The derivative here is part of the delta rule
        //for the weight adjustment about to follow
        hiddenDeltaAt(i,j) *= dersigmoid(hiddenAt(i,j));
    }
}

As you can see above, this is the second pass over the network, the so-called back-propagation
presented earlier, since this time we are going backwards. Having calculated the output and
knowing the desired output (called target in the code above), we calculate the deltas according
to the equations we saw at the start of the tutorial. If you don't like math, then here it is for you
in code. As you can see, several helper macros are used to distinguish between the weights and
deltas of different layers.
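The if/else on `target` in the code above is equivalent to comparing the outputs with a one-hot desired vector. That vector is the d(n) of the error equation: 1.0 for the target digit and 0.0 for every other digit. Here is a minimal sketch with hypothetical function names.

```cpp
#include <cstddef>
#include <vector>

// Builds the desired output vector for one training image:
// 1.0 at the target digit's position, 0.0 everywhere else.
std::vector<float> oneHot(int target, std::size_t outputN)
{
    std::vector<float> desired(outputN, 0.0f);
    desired[static_cast<std::size_t>(target)] = 1.0f;
    return desired;
}

// Sum of squared errors for one pattern, accumulated like `error` above.
float squaredError(const std::vector<float>& outputs, const std::vector<float>& desired)
{
    float error = 0.0f;
    for (std::size_t i = 0; i < outputs.size(); ++i) {
        const float e = desired[i] - outputs[i];
        error += e * e;
    }
    return error;
}
```

For example, if all ten outputs sit at 0.5, every output misses its desired value by 0.5, so the squared error for that image is 10 × 0.25 = 2.5.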

//Weights modification
tempWeights = weights;//keep the current weights somewhere, we will need them

//input to hidden weights
for(int i = 0; i < inputN; i++)
{
    for(int j = 0; j < hiddenN; j++)
    {
        inputToHidden(i,j) += momentum*(inputToHidden(i,j) - _prev_inputToHidden(i,j))
                            + teachingStep * hiddenDeltaAt(1,j) * inputNeurons[i];
    }
}

//hidden to hidden weights, provided more than 1 layer exists
for(int i = 2; i <= hiddenL; i++)
{
    for(int j = 0; j < hiddenN; j++)//from
    {
        for(int k = 0; k < hiddenN; k++)//to
        {
            hiddenToHidden(i,j,k) += momentum*(hiddenToHidden(i,j,k) - _prev_hiddenToHidden(i,j,k))
                                   + teachingStep * hiddenDeltaAt(i,k) * hiddenAt(i-1,j);
        }
    }
}

//last hidden layer to output weights
for(int i = 0; i < outputN; i++)
{
    for(int j = 0; j < hiddenN; j++)
    {
        hiddenToOutput(j,i) += momentum*(hiddenToOutput(j,i) - _prev_hiddenToOutput(j,i))
                             + teachingStep * outputDeltaAt(i) * hiddenAt(hiddenL,j);
    }
}

prWeights = tempWeights;//the pre-update weights become the previous weights

And finally, this is the third and final pass over the network (for each image, of course), which
goes through the layers again from the input layer to the output layer. Here we use the deltas
calculated in the previous pass to adjust the weights of the network, to make up for the error we
found in the initial calculation. This is just a code implementation of the weight adjustment
equation we saw in the theoretical part of the tutorial.

We can see the teaching step at work here. Moreover, the careful reader will have noticed that we
keep the previous weight values in a temporary vector. That is because of the momentum. If you
recall, we mentioned that the momentum adds a percentage of the previously applied weight
change to each new weight change, which speeds up training. Hence the term
momentum.
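The bookkeeping that momentum requires can be sketched for the whole weight vector at once. In this hypothetical helper, `gradStep[i]` stands for the teachingStep * delta * input term of weight i. The copy taken before the update plays the role of `tempWeights`, and assigning it afterwards plays the role of `prWeights`. With a constant gradient step of 0.1 and momentum 0.5, starting from zero, the weight moves by 0.1 on the first step and by 0.15 on the second. This growing step is the speed-up described above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of momentum bookkeeping across iterations.
// gradStep[i] stands for teachingStep * delta * input for weight i.
void momentumStep(std::vector<float>& weights, std::vector<float>& prevWeights,
                  const std::vector<float>& gradStep, float momentum)
{
    std::vector<float> temp = weights;            // keep w(n) before changing it
    for (std::size_t i = 0; i < weights.size(); ++i)
        weights[i] += momentum * (weights[i] - prevWeights[i]) + gradStep[i];
    prevWeights = temp;                           // w(n) becomes w(n-1) next time
}
```

If the gradient step stays constant, the per-step change approaches gradStep / (1 - momentum). With momentum 0.5 that is twice the plain step.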
Well, that's really all there is to know about back-propagation training and the
Multi-layer perceptron. Let's take a look at the FileReader class.

#include <fstream>

class FileReader
{
private:
    char* imgBuffer;
    //a DWORD
    char* check;
    bool firstImageRead;
    //the input filestream used to read
    std::ifstream fs;

    //image stuff
    int width;
    int height;

public:
    FileReader();
    ~FileReader();
    bool readBitmap(int fileNum);
    //reads the first bitmap file, the one designated with a '0'
    //and gets the dimensions. All other .bmp files are assumed
    //to have identical dimensions
    int getBitmapDimensions();
    //returns a pointer to integers with all the goals
    //that each bitmap should have. Reads it from a file
    int* getImgGoals();
    //returns a pointer to the currently read data
    char* getImgData();
    //helper function converting bytes to an int
    int bytesToInt(char* bytes,int number);
};

This is the FileReader class. It contains imgBuffer, which holds the data of the currently read
bitmap, and the input file stream used to read the bitmaps. It also keeps the width and height of
the first image. How its functions are implemented is outside the scope of this tutorial, but you
can check the code in the .zip file to see how it is done. What you need to know is that this class
reads the image named 'img0.bmp' and assumes that all the other images are monochrome bitmaps
with the same dimensions, all located in the same directory
as the executable.
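The implementation of FileReader is in the .zip, but reading the dimensions takes only a few lines. BMP files store multi-byte integers little-endian. With the standard 40-byte BITMAPINFOHEADER, the image width and height are 32-bit values at byte offsets 18 and 22. The sketch below is a hypothetical stand-in for `bytesToInt` and `getBitmapDimensions`, not the tutorial's code. Assuming one input neuron per pixel, it returns the pixel count, or -1 for a bad header to match the check in the constructor.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a bytesToInt-style helper: BMP files store
// multi-byte integers little-endian (least significant byte first).
std::int32_t littleEndianInt32(const unsigned char* bytes)
{
    const std::uint32_t v = static_cast<std::uint32_t>(bytes[0])
                          | (static_cast<std::uint32_t>(bytes[1]) << 8)
                          | (static_cast<std::uint32_t>(bytes[2]) << 16)
                          | (static_cast<std::uint32_t>(bytes[3]) << 24);
    return static_cast<std::int32_t>(v);
}

// Width and height live at byte offsets 18 and 22 of a BMP file that uses
// the standard 40-byte BITMAPINFOHEADER. Their product is the input count.
int pixelCount(const std::vector<unsigned char>& header)
{
    if (header.size() < 26 || header[0] != 'B' || header[1] != 'M')
        return -1;                                    // not a BMP header
    const std::int32_t width  = littleEndianInt32(&header[18]);
    const std::int32_t height = littleEndianInt32(&header[22]);
    // A negative height marks a top-down bitmap; only the magnitude matters here.
    return static_cast<int>(width) * static_cast<int>(height < 0 ? -height : height);
}
```

For the 8x8 images used in this tutorial, this yields 64 inputs.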

On the right you can see how to save a monochrome bitmap as a .bmp file using the MS Windows
Paint program. You can create your own bitmap images and save them like that; just remember to
use incrementing numbers to name the files and to update goals.txt accordingly. Moreover, all
images should have the same dimensions.

Assuming you have the bitmap images AND the goals.txt file in the same directory as the
executable, you can run the tutorial as shown in the above image. It uses the cmd command line in
Windows, but it should work fine in Linux too. If you call it incorrectly, you will be shown the
correct usage. In Windows you can stop training at any time and start recalling images; in Linux
you can do so every 1000 epochs (for now; using the pdCurses library to remove this limitation is
on the TODO list). You are prompted for the image number, the one following 'img' in the filename,
and the network recalls that image and tells you what it thinks the image represents. Afterwards,
as you can see from the image above, you also get some percentages showing how closely the
network thinks the image matches each digit from 0 to 9.

Well, this was it. I hope you enjoyed this tutorial and managed to understand the workings of
the multi-layer perceptron neural network. You can find the source code and the images I used to
train the network in the tutorial's source code. I used really small dimensions, 8x8, just so the
network trains fast. If you stick with the parameters I used above, the network should converge.
Since this network has many outputs, some of which look alike, the mean square error cannot go
really low. That is because some digits look almost the same (especially the way I painted them),
specifically 7 with 4, and 0 with 8. Still, when it comes to picking the best matching pattern, the
network performs brilliantly. As for the least mean square error, feel free to stop training once it
goes below 0.45 or so.

As always, if you have any comments about the tutorial or constructive criticism, or if you find any
bugs in the code, please email me at:
lefteris *at* realintelligence *dot* net

Multi-layer Perceptron Tutorial


This tutorial continues from the last neural network tutorial the Perceptron tutorial. We will now
introduce the structure of the multi-layer perceptron and the back-propagation algorithm, without
doubt the most popular neural network structure to date. If you are in a hurry and just want to
mess with the code you can get it from here but I would recommend reading on to see how the
network functions.

Tutorial Prerequisities

 The reader should be familiar with the perceptron neural network


 The reader should have a basic understanding of C/C++
 The reader should know how to compile and run a program from the command line in
Windows or Linux

Tutorial Goals

 The reader will understand the structure of the Multi-layer Perceptron neural network
 The reader will understand the back-propagation algorithm
 The reader will know about the wide array of applications this network is used in
 The reader will learn all the above via an actual practical application in optical character
recognition

Tutorial Body

This network was introduced around 1986 with the advent of the back-propagation algorithm.
Until then there was no rule via which we could train neural networks with more than one layer.
As the name implies, a Multi-layer Perceptron is just that, a network that is comprised of many
neurons, divided in layers. These layers are divided as follows:

 The input layer, where the input of the network goes. The number of neurons here depends
on the number of inputs we want our network to get
 One or more hidden layers. These layers come between the input and the output and their
number can vary. The function that the hidden layer serves is to encode the input and map it to
the output. It has been proven that a multi-layer perceptron with only one hidden layer can
approximate any function that connects its input with its outputs if such a function exists.
 The output layer, where the outcome of the network can be seen. The number of neurons here
depends on the problem we want the neural net to learn

The Multi-layer perceptron differs from the simple perceptron in many ways. The same part is
that of weight randomization. All weights are given random values between a certain range,
usually [-0.5,0.5]. Having that aside though, for each pattern that is fed to the network three
passes over the net are made. Let's see them one by one in detail.

 Calculate the output:

In this phase we calculate the output of the network. For each layer, we calculate the firing value
of each neuron by getting the sum of the products of the multiplications of all the neurons
connected to said neuron from the previous layer and their corresponding weights. That sounded
a little big though so here it is in pseudocode:

for(int i = 0; i < previousLayerNeurons; i ++)


value[neuron,layer] += weight(i,neuron) * value[i,layer-1]

value[neuron,layer] = activationFunction(value[neuron,layer]);

As can be seen from the pseudocode, here too we have activation functions. They are used to
normalize the output of each neuron and the functions that are most commonly used in the
perceptron apply here too.So, we gradually propagate forward in the network until we reach the
output layer, and create some output values. Just like the perceptron these values are initially
completely random and have nothing to do with our goal values. But it is here that the back-
propagation learning algorithm kicks in.

 Back-Propagation:

The back propagation learning algorithm uses the delta-rule. What this does is that it computes
the deltas, (local gradients) of each neuron starting from the output neurons and going backwards
until it reaches the input layer. To compute the deltas of the output neurons though we first have
to get the error of each output neuron. That's pretty simple, since the multi-layer perceptron is a
supervised training network so the error is the difference between the network's output and the
desired output.
ej(n) = dj(n) - oj(n)
where e(n) is the error vector, d(n) is the desired output vector and o(n) is the actual output
vector. Now to compute the deltas:

deltaj(L)(n) = ej(L)(n) * f'(uj(L)(n)) , for neuron j in the output layer L


where f'(uj(l)(n)) is the derivative of the value of the jth neuron of layer L

deltaj(l)(n) = f'(uj(l)(n)) Σk(deltak(l+1)(n)*wkj(l+1)(n)), for neuron j in hidden layer l


where f'(uj(l)(n)) is the derivative of the value of the jth neuron in layer l and inside the Sum we
have the products of all the deltas of the neurons of the next layer multiplied by their
corresponding weights.

This part is a very impontant part of the delta rule and the whole essence of back propagation.
Why you might ask? Because just like wikipedia says a derivative is how much a function
changes as its input changes. By propagating the derivatives backwards , we are informing all the
neurons in the previous layers of the change that is needed in our weights to match the desired
output. And all that starts from the initial error calculation at the output layer. Just like magic!

 Weight adjustment

Having calculated the deltas for all the neurons we are now ready for one third and final pass of
the network, this time to adjust the weights according to the generalized delta rule:
wji(l)(n+1) = wji(l)(n) + α * [wji(l)(n) - wji(l)(n-1)] + η * deltaj(l)(n)yi(l-1)(n)
Do not be discouraged by lots of mathematical mumbo jumbo. It is actually quite simple. What
the above says is:
The new weights for layer l are calculated by adding two things to the current weights. The first
is the difference between the current weights and the previous weights multiplied by the
coefficient we symbolize with α. This coefficient is called the momentum coefficient, and true to
its name it adds speed to the training of any multi-layer perceptron by adding part of the already
occured weight changes to the current weight change. This is a double edged sword though since
if your momentum constant is too large the network will not converge and it will probably get
stuck in a local minima.

The other thing that adds to the weight change is the delta of the layer whose weights we change
(l) multiplied by the outputs of the neurons of the previous layer (l-1) and all that multiplied by
the constant η which we know to be the teaching step from the previous tutorial about the
perceptron. And that is basically it! That's what the multi layer perceptron is all about. It is no
doubt a very powerfull neural network and a very powerful tool in statistical analysis.

Practical example

It would not be a tutorial if we just explained how it works and gave you the equations. As was
already mentioned the Multi-layer perceptron has many applications. Satistical analysis, pattern
recognition, optical character recognition are just some of them. Our example will focus on just a
simple instance of optical character recognition. Specifically the final program will be able to use
an MLP to differentiate between a number of .bmp monochrome bitmap files and tell us which
number each image depicts.I used 8x8 pixels resolution for the images but it is up to the reader to
make his own resolutions and/or monochrome images since the program will read the size from
the bitmap itself. Below you can see an example of such bitmaps.

They are ugly, right? Differentiating between them should be hard for a computer? This ugliness
could be considered noice. And MLPs are really good at differentiating between noice and actual
data that help it reach a conclusion. But let's go on and see some code to understand how it is
done.

class MLP
{
private:
std::vector<float> inputNeurons;
std::vector<float>> hiddenNeurons;
std::vector<float> outputNeurons;
std::vector<float> weights;
FileReader* reader;

int inputN,outputN,hiddenN,hiddenL;

public:
MLP(int hiddenL,int hiddenN);
~MLP();

//assigns values to the input neurons


bool populateInput(int fileNum);
//calculates the whole network, from input to output
void calculateNetwork();
//trains the network according to our parameters
bool trainNetwork(float teachingStep,float lmse,float momentum,int
trainingFiles);

//recalls the network for a given bitmap file


void recallNetwork(int fileNum);

};

The above is our multi-layer perceptron class. As you can see it has vectors for all the neurons
and their connection weights. It also contains a FileReader object. As we will see below this
FileReader is a class we will make to read the bitmap files to populate our input. The functions
the MLP has are similar to the perceptron. It populates its input by reading the bitmap images,
calculates an output for the network and trains the network. Moreover you can recall the network
for a given 'fileNum' image to see what number the network thinks the image represents.

//Multi-layer perceptron constructor


MLP::MLP(int hL,int hN)
{
//initialize the filereader
reader = new FileReader();

outputN = 10; //the 9 possible numbers and zero


hiddenL = hL;
hiddenN = hN;

//initialize the filereader


reader = new FileReader();

//read the first image to see what kind of input will our net have
inputN = reader->getBitmapDimensions();
if(inputN == -1)
{
printf("There was an error detecting img0.bmp\n\r");
return ;
}

//let's allocate the memory for the weights


weights.reserve(inputN*hiddenN+(hiddenN*hiddenN*(hiddenL-1))
+hiddenN*outputN);

//also let's set the size for the neurons vector


inputNeurons.resize(inputN);
hiddenNeurons.resize(hiddenN*hiddenL);
outputNeurons.resize(outputN);

//randomize weights for inputs to 1st hidden layer


for(int i = 0; i < inputN*hiddenN; i++)
{
weights.push_back( (( (float)rand() / ((float)(RAND_MAX)+(float)
(1)) )) - 0.5 );//[-0.5,0.5]

//if there are more than 1 hidden layers, randomize their weights
for(int i=1; i < hiddenL; i++)
{
for(int j = 0; j < hiddenN*hiddenN; j++)
{
weights.push_back( (( (float)rand() / ((float)(RAND_MAX)+(float)
(1)) )) - 0.5 );//[-0.5,0.5]
}
}

//and finally randomize the weights for the output layer


for(int i = 0; i < hiddenN*outputN; i ++)
{
weights.push_back( (( (float)rand() / ((float)(RAND_MAX)+(float)
(1)) )) - 0.5 );//[-0.5,0.5]
}

The network takes the number of hidden neurons and hidden layers as parameters so it can know
how to initialize its neurons and weights vectors. Moreover it reads the first bitmap, 'img0.bmp'
to take the dimensions that all the images will have as can be seen from this line:
inputN = reader->getBitmapDimensions();
That is a requirement our tutorial's program will have. You are free to provide any bitmap size
you want for the first image 'img0.bmp' but you are required to have all the following images be
of the same size. As in most neural networks the weights are initialized in the range between [-
0.5,0.5].
void MLP::calculateNetwork()
{
    //let's propagate towards the hidden layer
    for(int hidden = 0; hidden < hiddenN; hidden++)
    {
        hiddenAt(1,hidden) = 0;

        for(int input = 0; input < inputN; input++)
        {
            hiddenAt(1,hidden) += inputNeurons.at(input)*inputToHidden(input,hidden);
        }

        //and finally pass it through the activation function
        hiddenAt(1,hidden) = sigmoid(hiddenAt(1,hidden));
    }

    //now, if there is more than one hidden layer
    for(int i = 2; i <= hiddenL; i++)
    {
        //for each one of these extra layers calculate their values
        for(int j = 0; j < hiddenN; j++)//to
        {
            hiddenAt(i,j) = 0;

            for(int k = 0; k < hiddenN; k++)//from
            {
                hiddenAt(i,j) += hiddenAt(i-1,k)*hiddenToHidden(i,k,j);
            }

            //and finally pass it through the activation function
            hiddenAt(i,j) = sigmoid(hiddenAt(i,j));
        }
    }

    //and now hidden to output
    for(int i = 0; i < outputN; i++)
    {
        outputNeurons.at(i) = 0;

        for(int j = 0; j < hiddenN; j++)
        {
            outputNeurons.at(i) += hiddenAt(hiddenL,j) * hiddenToOutput(j,i);
        }

        //and finally pass it through the activation function
        outputNeurons.at(i) = sigmoid( outputNeurons.at(i) );
    }
}
The calculateNetwork function computes the output of the network for the currently given input. It simply propagates the input signals through each layer until they reach the output layer. There is nothing special about this code; it is a direct implementation of the equations presented above. As we saw in the constructor, the network in this tutorial has 10 outputs, one for each digit. Each output indicates how strongly the network believes the input pattern is that digit, so output 1 being close to 1.0 means the network is quite confident that the input pattern is a 1, and so on.
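The forward pass can also be written without the tutorial's helper macros. The sketch below rests on stated assumptions: each layer's weights are stored row-major as (from, to), which mirrors the argument order of `inputToHidden(input,hidden)` but is not necessarily how those macros index the flat vector, and `sigmoid` is assumed to be the logistic function. `layerForward` and `forward` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Logistic activation; assumed to match the tutorial's sigmoid() helper.
float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// One fully connected layer: out[j] = sigmoid(sum over i of in[i] * w(i, j)),
// with w stored row-major as (from, to), i.e. w(i, j) = w[i * outN + j].
std::vector<float> layerForward(const std::vector<float>& in,
                                const std::vector<float>& w,
                                std::size_t outN)
{
    assert(w.size() == in.size() * outN);
    std::vector<float> out(outN);
    for (std::size_t j = 0; j < outN; ++j) {
        float sum = 0.0f;
        for (std::size_t i = 0; i < in.size(); ++i)
            sum += in[i] * w[i * outN + j];
        out[j] = sigmoid(sum);  // pass the weighted sum through the activation
    }
    return out;
}

// Forward pass through every layer in turn: weights[k] feeds layer k + 1,
// which has layerSizes[k] neurons. The last result is the output layer.
std::vector<float> forward(std::vector<float> values,
                           const std::vector<std::vector<float>>& weights,
                           const std::vector<std::size_t>& layerSizes)
{
    assert(weights.size() == layerSizes.size());
    for (std::size_t k = 0; k < weights.size(); ++k)
        values = layerForward(values, weights[k], layerSizes[k]);
    return values;
}
```

With one hidden layer, `weights` would hold two blocks of sizes inputN*hiddenN and hiddenN*10, and `layerSizes` would be {hiddenN, 10}.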

The training function is too long to post here in full, so I recommend taking a look at the .zip with the source code to see all of it. We will focus only on the implementation of the back-propagation algorithm.

for(int i = 0; i < outputN; i++)
{
    //let's get the delta of the output layer
    //and the accumulated error
    if(i != target)
    {
        outputDeltaAt(i) = (0.0 - outputNeurons[i])*dersigmoid(outputNeurons[i]);
        error += (0.0 - outputNeurons[i])*(0.0 - outputNeurons[i]);
    }
    else
    {
        outputDeltaAt(i) = (1.0 - outputNeurons[i])*dersigmoid(outputNeurons[i]);
        error += (1.0 - outputNeurons[i])*(1.0 - outputNeurons[i]);
    }
}

//we start propagating backwards now, to get the error of each neuron
//in every layer

//let's get the delta of the last hidden layer first
for(int i = 0; i < hiddenN; i++)
{
    hiddenDeltaAt(hiddenL,i) = 0;//zero the values from the previous iteration

    //add to the delta for each connection with an output neuron
    for(int j = 0; j < outputN; j++)
    {
        hiddenDeltaAt(hiddenL,i) += outputDeltaAt(j) * hiddenToOutput(i,j);
    }

    //multiply by the activation derivative, as the delta formula requires
    hiddenDeltaAt(hiddenL,i) *= dersigmoid(hiddenAt(hiddenL,i));
}

//now for each additional hidden layer, provided they exist
for(int i = hiddenL-1; i > 0; i--)
{
    //add to each neuron's hidden delta
    for(int j = 0; j < hiddenN; j++)//from
    {
        hiddenDeltaAt(i,j) = 0;//zero the values from the previous iteration

        for(int k = 0; k < hiddenN; k++)//to
        {
            //the deltas of layer i+1 multiplied by the weights
            //connecting this neuron to them
            hiddenDeltaAt(i,j) += hiddenDeltaAt(i+1,k) * hiddenToHidden(i+1,j,k);
        }

        //multiply by the activation derivative, as the delta formula requires
        hiddenDeltaAt(i,j) *= dersigmoid(hiddenAt(i,j));
    }
}

As you can see, this is the second pass over the network, the back-propagation proper, so called because this time we go backwards. Having calculated the output and knowing the desired output (called target in the code above), we compute the deltas according to the equations we saw at the start of the tutorial. If you don't like math, here they are in code. Note that many helper macros are used to distinguish between the weights and deltas of different layers.
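Stripped of the macros, the two delta formulas look like this for one hidden layer feeding the output layer. This is a sketch, not the tutorial's code: it assumes a one-hot target (1.0 for the target digit and 0.0 elsewhere, as in the `if(i != target)` branch above), a logistic activation whose derivative is computed from the neuron's output as y(1 - y), and row-major (from, to) weight storage. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sigmoid derivative expressed through the neuron's output y = sigmoid(u):
// f'(u) = y * (1 - y). The tutorial passes activations to dersigmoid(),
// which suggests it uses this form, but that is an assumption.
float dersigmoidFromOutput(float y) { return y * (1.0f - y); }

// Output-layer deltas for a one-hot target: delta_j = (d_j - o_j) * f'(u_j).
std::vector<float> outputDeltas(const std::vector<float>& outputs, std::size_t target)
{
    assert(target < outputs.size());
    std::vector<float> delta(outputs.size());
    for (std::size_t j = 0; j < outputs.size(); ++j) {
        float desired = (j == target) ? 1.0f : 0.0f;
        delta[j] = (desired - outputs[j]) * dersigmoidFromOutput(outputs[j]);
    }
    return delta;
}

// Hidden-layer deltas: delta_j = f'(u_j) * sum over k of nextDelta[k] * w(j, k),
// where w(j, k) = w[j * nextDelta.size() + k] connects hidden j to next-layer k.
std::vector<float> hiddenDeltas(const std::vector<float>& hidden,
                                const std::vector<float>& nextDelta,
                                const std::vector<float>& w)
{
    assert(w.size() == hidden.size() * nextDelta.size());
    std::vector<float> delta(hidden.size());
    for (std::size_t j = 0; j < hidden.size(); ++j) {
        float sum = 0.0f;
        for (std::size_t k = 0; k < nextDelta.size(); ++k)
            sum += nextDelta[k] * w[j * nextDelta.size() + k];
        delta[j] = sum * dersigmoidFromOutput(hidden[j]);
    }
    return delta;
}
```

For deeper networks the same `hiddenDeltas` call is simply repeated layer by layer, feeding each result in as `nextDelta` for the layer before it.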

//Weights modification
tempWeights = weights;//keep the previous weights somewhere, we will need them

//input to hidden weights
for(int i = 0; i < inputN; i++)
{
    for(int j = 0; j < hiddenN; j++)
    {
        inputToHidden(i,j) += momentum*(inputToHidden(i,j) - _prev_inputToHidden(i,j))
                            + teachingStep * hiddenDeltaAt(1,j) * inputNeurons[i];
    }
}

//hidden to hidden weights, provided more than 1 layer exists
for(int i = 2; i <= hiddenL; i++)
{
    for(int j = 0; j < hiddenN; j++)//from
    {
        for(int k = 0; k < hiddenN; k++)//to
        {
            hiddenToHidden(i,j,k) += momentum*(hiddenToHidden(i,j,k) - _prev_hiddenToHidden(i,j,k))
                                   + teachingStep * hiddenDeltaAt(i,k) * hiddenAt(i-1,j);
        }
    }
}

//last hidden layer to output weights
for(int i = 0; i < outputN; i++)
{
    for(int j = 0; j < hiddenN; j++)
    {
        hiddenToOutput(j,i) += momentum*(hiddenToOutput(j,i) - _prev_hiddenToOutput(j,i))
                             + teachingStep * outputDeltaAt(i) * hiddenAt(hiddenL,j);
    }
}

prWeights = tempWeights;//these become the "previous" weights for the next update

Finally, this is the third and last pass over the network (for each image, of course), a forward sweep from the input layer to the output layer. Here we use the deltas calculated in the previous pass to adjust the weights of the network and correct the error found in the first pass. This is simply a code implementation of the weight adjustment equation from the theoretical part of the tutorial.

We can see the teaching step at work here. The careful reader will also have noticed that we keep the previous weight values in a temporary vector. That is because of the momentum: as mentioned earlier, momentum adds a percentage of the weight change already applied to each subsequent weight change, which can speed up training. Hence the term momentum.
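The update rule with momentum can be isolated into a single function. In this sketch (hypothetical names, row-major (from, to) weights assumed), `prev` plays the role of the tutorial's `prWeights`: it holds w(n-1) on entry and is overwritten with w(n) on return, the same bookkeeping that `tempWeights` and `prWeights` perform in the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Generalized delta rule with momentum for one block of weights:
//   w(n+1) = w(n) + alpha * [w(n) - w(n-1)] + eta * deltaTo[j] * yFrom[i]
// where w(i, j) = w[i * deltaTo.size() + j] connects neuron i to neuron j.
void updateWeights(std::vector<float>& w, std::vector<float>& prev,
                   const std::vector<float>& deltaTo,
                   const std::vector<float>& yFrom,
                   float eta, float alpha)
{
    assert(w.size() == yFrom.size() * deltaTo.size());
    assert(prev.size() == w.size());
    const std::vector<float> current = w;  // w(n), saved before w changes
    for (std::size_t i = 0; i < yFrom.size(); ++i)
        for (std::size_t j = 0; j < deltaTo.size(); ++j) {
            std::size_t idx = i * deltaTo.size() + j;
            w[idx] += alpha * (current[idx] - prev[idx])  // momentum term
                    + eta * deltaTo[j] * yFrom[i];        // delta rule term
        }
    prev = current;  // w(n) becomes the "previous" weights next time
}
```

With `alpha = 0` this reduces to the plain delta rule; larger values carry more of the previous change forward, but a value that is too large can keep the network from converging.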
That's really all there is to know about back-propagation training and the multi-layer perceptron. Let's take a look at the FileReader class.

#include <fstream>

class FileReader
{
private:
    char* imgBuffer;
    //a DWORD
    char* check;
    bool firstImageRead;
    //the input filestream used to read
    std::ifstream fs;

    //image stuff
    int width;
    int height;

public:
    FileReader();
    ~FileReader();
    bool readBitmap(int fileNum);
    //reads the first bitmap file, the one designated with a '0',
    //and gets the dimensions. All other .bmp files are assumed
    //to have identical dimensions
    int getBitmapDimensions();
    //returns a pointer to integers with all the goals
    //that each bitmap should have. Reads it from a file
    int* getImgGoals();
    //returns a pointer to the currently read data
    char* getImgData();
    //helper function converting bytes to an int
    int bytesToInt(char* bytes,int number);
};

This is the FileReader class. It contains imgBuffer, which holds the data of the currently read bitmap, the input file stream used to read the bitmaps, and the width and height of the first (reference) image. How its functions are implemented is beyond the scope of this tutorial, but you can check the code in the .zip file to see how it is done. What you need to know is that this class reads the image named 'img0.bmp' and assumes that all the other images are monochrome bitmaps with the same dimensions, located in the same directory as the executable.
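The FileReader implementation is not shown, but the header fields it needs are fixed by the BMP format: the pixel-data offset is a 4-byte little-endian integer at byte 10, the width and height are 4-byte integers at bytes 18 and 22, and the bits per pixel (1 for a monochrome bitmap) is a 2-byte integer at byte 28. The sketch below parses those fields from an in-memory header. `leToInt` is only a guess at what a helper like `bytesToInt` might do, and every name here is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Little-endian bytes -> integer (hypothetical stand-in for bytesToInt).
std::int32_t leToInt(const unsigned char* bytes, int count)
{
    std::uint32_t value = 0;
    for (int i = count - 1; i >= 0; --i)  // most significant byte comes last
        value = (value << 8) | bytes[i];
    return static_cast<std::int32_t>(value);
}

struct BmpInfo {
    std::uint32_t dataOffset;  // where the pixel rows start in the file
    std::int32_t width;
    std::int32_t height;
    int bitsPerPixel;          // 1 for a monochrome bitmap
};

// Reads the fields of the BMP file header and info header that a monochrome
// reader needs. `header` must hold at least the first 30 bytes of the file.
BmpInfo parseBmpHeader(const std::vector<unsigned char>& header)
{
    assert(header.size() >= 30 && header[0] == 'B' && header[1] == 'M');
    BmpInfo info;
    info.dataOffset   = static_cast<std::uint32_t>(leToInt(&header[10], 4));
    info.width        = leToInt(&header[18], 4);
    info.height       = leToInt(&header[22], 4);
    info.bitsPerPixel = leToInt(&header[28], 2);
    return info;
}

// Bytes per stored pixel row: BMP rows are padded to a multiple of 4 bytes.
std::size_t bmpRowStride(std::int32_t width, int bitsPerPixel)
{
    return ((static_cast<std::size_t>(width) * bitsPerPixel + 31) / 32) * 4;
}
```

The stride matters because an 8-pixel-wide monochrome row holds only 1 byte of pixel data but occupies 4 bytes in the file, so a reader has to skip the padding when it copies pixels into its buffer.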

On the right you can see how to save a monochrome bitmap as a .bmp file using the MS Windows Paint program. You can create your own bitmap images and save them the same way; just remember to name the files with incrementing numbers and to update goals.txt accordingly. All images must have the same dimensions.

Assuming the image bitmaps AND the goals.txt file are in the same directory as the executable, you can run the tutorial as shown in the image above. It uses the Windows cmd command line, but it should work fine on Linux too. If you call the program incorrectly, you will be shown the correct usage. On Windows you can stop training at any time, and on Linux every 1000 epochs (for now; using the pdCurses library is on the TODO list), and start recalling images. You are prompted for the image number, the one after 'img' in the filename, and the network recalls that image and tells you what it thinks the image represents. As the image above shows, you also get percentages indicating how closely the network thinks the image matches each number from 0 to 9.
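Picking the recognized digit and reporting percentages might look like the sketch below. How the tutorial's program actually computes its percentages is not shown, so normalizing the ten outputs by their sum is an assumption, one reasonable choice among several (printing each sigmoid output times 100 would be another). The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Index of the strongest output neuron, i.e. the digit the network picks.
std::size_t bestMatch(const std::vector<float>& outputs)
{
    assert(!outputs.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < outputs.size(); ++i)
        if (outputs[i] > outputs[best])
            best = i;
    return best;
}

// Hypothetical reporting scheme: scale the outputs so they sum to 100.
std::vector<float> toPercentages(const std::vector<float>& outputs)
{
    float total = 0.0f;
    for (float o : outputs)
        total += o;
    assert(total > 0.0f);  // sigmoid outputs are positive, so this normally holds
    std::vector<float> pct;
    pct.reserve(outputs.size());
    for (float o : outputs)
        pct.push_back(100.0f * o / total);
    return pct;
}
```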

Well, that's it. I hope you enjoyed this tutorial and now understand how the multi-layer perceptron neural network works. The source code and the images I used to train the network are included in the tutorial's source code. I used really small images, 8x8, so that the network trains fast. If you stick with the parameters I used above, the network should converge. Because the network has many outputs and some of the digits look alike (especially the way I painted them), specifically 7 and 4, and 0 and 8, the mean square error cannot go very low. Still, when it comes to picking the best-matching pattern, the network performs very well. As a stopping criterion, feel free to stop training once the mean square error drops below about 0.45.
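The error that such a threshold is compared against can be sketched as follows. The per-pattern sum of squared errors matches how `error` is accumulated in the back-propagation code above; averaging it over all training patterns in an epoch is an assumption, since the full training function is only in the .zip. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum of squared errors for one pattern with a one-hot target,
// accumulated the same way as `error` in the back-propagation loop.
float patternError(const std::vector<float>& outputs, std::size_t target)
{
    assert(target < outputs.size());
    float error = 0.0f;
    for (std::size_t i = 0; i < outputs.size(); ++i) {
        float e = ((i == target) ? 1.0f : 0.0f) - outputs[i];
        error += e * e;
    }
    return error;
}

// Epoch error averaged over all training patterns (the averaging is assumed).
float epochError(const std::vector<std::vector<float>>& allOutputs,
                 const std::vector<std::size_t>& targets)
{
    assert(!allOutputs.empty() && allOutputs.size() == targets.size());
    float total = 0.0f;
    for (std::size_t p = 0; p < allOutputs.size(); ++p)
        total += patternError(allOutputs[p], targets[p]);
    return total / static_cast<float>(allOutputs.size());
}
```

Training would then stop once this value falls below the chosen threshold, for example 0.45.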

As always, if you have any comments or constructive criticism about the tutorial, or if you find any bugs in the code, please email me at:
lefteris *at* realintelligence *dot* net

Multi-layer Perceptron Tutorial


This tutorial continues from the last neural network tutorial the Perceptron tutorial. We will now
introduce the structure of the multi-layer perceptron and the back-propagation algorithm, without
doubt the most popular neural network structure to date. If you are in a hurry and just want to
mess with the code you can get it from here but I would recommend reading on to see how the
network functions.

Tutorial Prerequisities

 The reader should be familiar with the perceptron neural network


 The reader should have a basic understanding of C/C++
 The reader should know how to compile and run a program from the command line in
Windows or Linux

Tutorial Goals

 The reader will understand the structure of the Multi-layer Perceptron neural network
 The reader will understand the back-propagation algorithm
 The reader will know about the wide array of applications this network is used in
 The reader will learn all the above via an actual practical application in optical character
recognition

Tutorial Body

This network was introduced around 1986 with the advent of the back-propagation algorithm.
Until then there was no rule via which we could train neural networks with more than one layer.
As the name implies, a Multi-layer Perceptron is just that, a network that is comprised of many
neurons, divided in layers. These layers are divided as follows:

 The input layer, where the input of the network goes. The number of neurons here depends
on the number of inputs we want our network to get
 One or more hidden layers. These layers come between the input and the output and their
number can vary. The function that the hidden layer serves is to encode the input and map it to
the output. It has been proven that a multi-layer perceptron with only one hidden layer can
approximate any function that connects its input with its outputs if such a function exists.
 The output layer, where the outcome of the network can be seen. The number of neurons here
depends on the problem we want the neural net to learn

The Multi-layer perceptron differs from the simple perceptron in many ways. The same part is
that of weight randomization. All weights are given random values between a certain range,
usually [-0.5,0.5]. Having that aside though, for each pattern that is fed to the network three
passes over the net are made. Let's see them one by one in detail.

 Calculate the output:

In this phase we calculate the output of the network. For each layer, we calculate the firing value
of each neuron by getting the sum of the products of the multiplications of all the neurons
connected to said neuron from the previous layer and their corresponding weights. That sounded
a little big though so here it is in pseudocode:

for(int i = 0; i < previousLayerNeurons; i ++)


value[neuron,layer] += weight(i,neuron) * value[i,layer-1]

value[neuron,layer] = activationFunction(value[neuron,layer]);

As can be seen from the pseudocode, here too we have activation functions. They are used to
normalize the output of each neuron and the functions that are most commonly used in the
perceptron apply here too.So, we gradually propagate forward in the network until we reach the
output layer, and create some output values. Just like the perceptron these values are initially
completely random and have nothing to do with our goal values. But it is here that the back-
propagation learning algorithm kicks in.

 Back-Propagation:

The back propagation learning algorithm uses the delta-rule. What this does is that it computes
the deltas, (local gradients) of each neuron starting from the output neurons and going backwards
until it reaches the input layer. To compute the deltas of the output neurons though we first have
to get the error of each output neuron. That's pretty simple, since the multi-layer perceptron is a
supervised training network so the error is the difference between the network's output and the
desired output.
ej(n) = dj(n) - oj(n)
where e(n) is the error vector, d(n) is the desired output vector and o(n) is the actual output
vector. Now to compute the deltas:

deltaj(L)(n) = ej(L)(n) * f'(uj(L)(n)) , for neuron j in the output layer L


where f'(uj(l)(n)) is the derivative of the value of the jth neuron of layer L

deltaj(l)(n) = f'(uj(l)(n)) Σk(deltak(l+1)(n)*wkj(l+1)(n)), for neuron j in hidden layer l


where f'(uj(l)(n)) is the derivative of the value of the jth neuron in layer l and inside the Sum we
have the products of all the deltas of the neurons of the next layer multiplied by their
corresponding weights.

This part is a very impontant part of the delta rule and the whole essence of back propagation.
Why you might ask? Because just like wikipedia says a derivative is how much a function
changes as its input changes. By propagating the derivatives backwards , we are informing all the
neurons in the previous layers of the change that is needed in our weights to match the desired
output. And all that starts from the initial error calculation at the output layer. Just like magic!

 Weight adjustment

Having calculated the deltas for all the neurons we are now ready for one third and final pass of
the network, this time to adjust the weights according to the generalized delta rule:
wji(l)(n+1) = wji(l)(n) + α * [wji(l)(n) - wji(l)(n-1)] + η * deltaj(l)(n)yi(l-1)(n)
Do not be discouraged by lots of mathematical mumbo jumbo. It is actually quite simple. What
the above says is:
The new weights for layer l are calculated by adding two things to the current weights. The first
is the difference between the current weights and the previous weights multiplied by the
coefficient we symbolize with α. This coefficient is called the momentum coefficient, and true to
its name it adds speed to the training of any multi-layer perceptron by adding part of the already
occured weight changes to the current weight change. This is a double edged sword though since
if your momentum constant is too large the network will not converge and it will probably get
stuck in a local minima.

The other thing that adds to the weight change is the delta of the layer whose weights we change
(l) multiplied by the outputs of the neurons of the previous layer (l-1) and all that multiplied by
the constant η which we know to be the teaching step from the previous tutorial about the
perceptron. And that is basically it! That's what the multi layer perceptron is all about. It is no
doubt a very powerfull neural network and a very powerful tool in statistical analysis.

Practical example

It would not be a tutorial if we just explained how it works and gave you the equations. As was
already mentioned the Multi-layer perceptron has many applications. Satistical analysis, pattern
recognition, optical character recognition are just some of them. Our example will focus on just a
simple instance of optical character recognition. Specifically the final program will be able to use
an MLP to differentiate between a number of .bmp monochrome bitmap files and tell us which
number each image depicts.I used 8x8 pixels resolution for the images but it is up to the reader to
make his own resolutions and/or monochrome images since the program will read the size from
the bitmap itself. Below you can see an example of such bitmaps.

They are ugly, right? Differentiating between them should be hard for a computer? This ugliness
could be considered noice. And MLPs are really good at differentiating between noice and actual
data that help it reach a conclusion. But let's go on and see some code to understand how it is
done.

class MLP
{
private:
std::vector<float> inputNeurons;
std::vector<float>> hiddenNeurons;
std::vector<float> outputNeurons;
std::vector<float> weights;
FileReader* reader;

int inputN,outputN,hiddenN,hiddenL;

public:
MLP(int hiddenL,int hiddenN);
~MLP();

//assigns values to the input neurons


bool populateInput(int fileNum);
//calculates the whole network, from input to output
void calculateNetwork();
//trains the network according to our parameters
bool trainNetwork(float teachingStep,float lmse,float momentum,int
trainingFiles);

//recalls the network for a given bitmap file


void recallNetwork(int fileNum);

};

The above is our multi-layer perceptron class. As you can see it has vectors for all the neurons
and their connection weights. It also contains a FileReader object. As we will see below this
FileReader is a class we will make to read the bitmap files to populate our input. The functions
the MLP has are similar to the perceptron. It populates its input by reading the bitmap images,
calculates an output for the network and trains the network. Moreover you can recall the network
for a given 'fileNum' image to see what number the network thinks the image represents.

//Multi-layer perceptron constructor


MLP::MLP(int hL,int hN)
{
//initialize the filereader
reader = new FileReader();

outputN = 10; //the 9 possible numbers and zero


hiddenL = hL;
hiddenN = hN;

//initialize the filereader


reader = new FileReader();

//read the first image to see what kind of input will our net have
inputN = reader->getBitmapDimensions();
if(inputN == -1)
{
printf("There was an error detecting img0.bmp\n\r");
return ;
}

//let's allocate the memory for the weights


weights.reserve(inputN*hiddenN+(hiddenN*hiddenN*(hiddenL-1))
+hiddenN*outputN);

//also let's set the size for the neurons vector


inputNeurons.resize(inputN);
hiddenNeurons.resize(hiddenN*hiddenL);
outputNeurons.resize(outputN);

//randomize weights for inputs to 1st hidden layer


for(int i = 0; i < inputN*hiddenN; i++)
{
weights.push_back( (( (float)rand() / ((float)(RAND_MAX)+(float)
(1)) )) - 0.5 );//[-0.5,0.5]

//if there are more than 1 hidden layers, randomize their weights
for(int i=1; i < hiddenL; i++)
{
for(int j = 0; j < hiddenN*hiddenN; j++)
{
weights.push_back( (( (float)rand() / ((float)(RAND_MAX)+(float)
(1)) )) - 0.5 );//[-0.5,0.5]
}
}

//and finally randomize the weights for the output layer


for(int i = 0; i < hiddenN*outputN; i ++)
{
weights.push_back( (( (float)rand() / ((float)(RAND_MAX)+(float)
(1)) )) - 0.5 );//[-0.5,0.5]
}

The network takes the number of hidden neurons and hidden layers as parameters so it can know
how to initialize its neurons and weights vectors. Moreover it reads the first bitmap, 'img0.bmp'
to take the dimensions that all the images will have as can be seen from this line:
inputN = reader->getBitmapDimensions();
That is a requirement our tutorial's program will have. You are free to provide any bitmap size
you want for the first image 'img0.bmp' but you are required to have all the following images be
of the same size. As in most neural networks the weights are initialized in the range between [-
0.5,0.5].
void MLP::calculateNetwork()
{
//let's propagate towards the hidden layer
for(int hidden = 0; hidden < hiddenN; hidden++)
{
hiddenAt(1,hidden) = 0;

for(int input = 0 ; input < inputN; input ++)


{
hiddenAt(1,hidden) +=
inputNeurons.at(input)*inputToHidden(input,hidden);
}

//and finally pass it through the activation function


hiddenAt(1,hidden) = sigmoid(hiddenAt(1,hidden));
}

//now if we got more than one hidden layers


for(int i = 2; i <= hiddenL; i ++)
{

//for each one of these extra layers calculate their values


for(int j = 0; j < hiddenN; j++)//to
{
hiddenAt(i,j) = 0;

for(int k = 0; k < hiddenN; k++)//from


{
hiddenAt(i,j) += hiddenAt(i-1,k)*hiddenToHidden(i,k,j);
}

//and finally pass it through the activation function


hiddenAt(i,j) = sigmoid(hiddenAt(i,j));
}
}

int i;
//and now hidden to output
for(i =0; i < outputN; i ++)
{
outputNeurons.at(i) = 0;

for(int j = 0; j < hiddenN; j++)


{
outputNeurons.at(i) += hiddenAt(hiddenL,j) * hiddenToOutput(j,i);
}

//and finally pass it through the activation function


outputNeurons.at(i) = sigmoid( outputNeurons.at(i) );
}

}
The calculate network function just finds the output of the network that correspponds to the
currently given input. It just propagates the input signals through each layer until they reach the
output layer. Nothing really special with the above code, it is just an implementation of the
equations that were presented above. The neural network of our tutorial as we saw in the
constructor has 10 different ouput. Each of these output represent the possibility that the input
pattern is a certain number. So, output 1 being close to 1.0 would mean that the input pattern is
most certainly 1 and so on...

The training function is too big to just post it all in here, so I recommend you take a look at the
.zip with the source code to see it in full. We will just focus in the implementation of the back-
propagation algorithm.

for(int i = 0; i < outputN; i ++)


{
//let's get the delta of the output layer
//and the accumulated error
if(i != target)
{
outputDeltaAt(i) = (0.0 -
outputNeurons[i])*dersigmoid(outputNeurons[i]);
error += (0.0 - outputNeurons[i])*(0.0-outputNeurons[i]);
}
else
{
outputDeltaAt(i) = (1.0 -
outputNeurons[i])*dersigmoid(outputNeurons[i]);
error += (1.0 - outputNeurons[i])*(1.0-outputNeurons[i]);
}

//we start popagating backwards now, to get the error of each neuron
//in every layer

/let's get the delta of the last hidden layer first


for(int i = 0; i < hiddenN; i++)
{
hiddenDeltaAt(hiddenL,i) = 0;//zero the values from the previous
iteration

//add to the delta for each connection with an output neuron


for(int j = 0; j < outputN; j ++)
{
hiddenDeltaAt(hiddenL,i) += outputDeltaAt(j) * hiddenToOutput(i,j)
;
}

//The derivative here is only because of the


//delta rule weight adjustment about to follow
hiddenDeltaAt(hiddenL,i) *= dersigmoid(hiddenAt(hiddenL,i));
}
//now for each additional hidden layer, provided they exist
for(int i = hiddenL-1; i >0; i--)
{
//add to each neuron's hidden delta
for(int j = 0; j < hiddenN; j ++)//from
{

hiddenDeltaAt(i,j) = 0;//zero the values from the previous


iteration

for(int k = 0; k < hiddenN; k++)//to


{
//the previous hidden layers delta multiplied by the
weights
//for each neuron
hiddenDeltaAt(i,j) += hiddenDeltaAt(i+1,k) *
hiddenToHidden(i+1,j,k);
}

//The derivative here is only because of the


//delta rule weight adjustment about to follow
hiddenDeltaAt(i,j) *= dersigmoid(hiddenAt(i,j));
}
}

As you can see above this is the second pass over the network, the so called back-propagation as
we presented it above, since we are going backwards this time. Having calculated the output and
knowing the desired output (called target, in the above code) we start the delta calculation
according to the equations that we saw at the start of the tutorial. If you don't like math, then here
it is for you in code. As you can see many helper macros are used to differentiate between
weights of diffeent layers and deltas.

//Weights modification
tempWeights = weights;//keep the previous weights somewhere, we will need them

//hidden to Input weights


for(int i = 0; i < inputN; i ++)
{
for(int j = 0; j < hiddenN; j ++)
{
inputToHidden(i,j) += momentum*(inputToHidden(i,j) -
_prev_inputToHidden(i,j)) +
teachingStep* hiddenDeltaAt(1,j) * inputNeurons[i];
}
}

//hidden to hidden weights, provided more than 1 layer exists


for(int i = 2; i <=hiddenL; i++)
{
for(int j = 0; j < hiddenN; j ++)//from
{
for(int k =0; k < hiddenN; k ++)//to
{
hiddenToHidden(i,j,k) += momentum*(hiddenToHidden(i,j,k) -
_prev_hiddenToHidden(i,j,k)) +
teachingStep * hiddenDeltaAt(i,k)
* hiddenAt(i-1,j);
}
}
}

//last hidden layer to output weights


for(int i = 0; i < outputN; i++)
{
for(int j = 0; j < hiddenN; j ++)
{
hiddenToOutput(j,i) += momentum*(hiddenToOutput(j,i) -
_prev_hiddenToOutput(j,i)) +
teachingStep * outputDeltaAt(i) *
hiddenAt(hiddenL,j);
}
}

prWeights = tempWeights;

And finally this is the third and final pass over the network (for each image of course), which is a
forward propagation from the input layer to the output layer. Here we use the previously
calculated deltas to adjust the weights of the network, to make up for the error we found at the
initial calculation. This is just an implementation in code of the weight adjustment equations we
saw in the theoretical part of the tutorial.

We can see the teaching step at work here. Moreover the careful reader will have noticed that we
keep the previous weight vector values in a temporary vector. That is because of the momentum.
If you recall, we mentioned that the momentum adds a percentage of the already applied weight
change to each subsequent weight change, achieving faster training speeds. Hence the term
momentum.
Well that's actually all there is to know about the back-propagation algorithm training and the
Multi-layer perceptron. Let's take a look at the fileReader class.

class FileReader
{

private:
char* imgBuffer;
//a DWORD
char* check;
bool firstImageRead;
//the input filestream used to read
ifstream fs;

//image stuff
int width;
int height;

public:
FileReader();
~FileReader();
bool readBitmap(int fileNum);
//reads the first bitmap file, the one designated with a '0'
//and gets the dimensions. All other .bmp are assumed with
//equal and identical dimensions
int getBitmapDimensions();
//returns a pointer to integers with all the goals
//that each bitmap should have. Reads it from a file
int* getImgGoals();
//returns a pointer to the currently read data
char* getImgData();
//helper function convering bytes to an int
int bytesToInt(char* bytes,int number);

};

This is the fileReader, class. It contains the imgBuffer, which hold the data of the currently read
bitmap, the input file stream used to read the bitmaps and it also keeps the width and height of
the initializer image. Seeing how the functions are implemented is out of the scope of this
tutorial but you can check the code in the .zip file to see how it is done. What you need to know
is that this class will read the image designated as 'img0.bmp' and assume all the other images
will be monochrome bitmaps with the same dimensions and that all are located in the same path
as the executable.

On the right you can see how to save a monochrome bitmap as a .bmp file using MS windows
paint program. You can create your own bitmap images, and save them like that but just
remember use incrementing numbers to name the files and update goals.txt accordingly.
Moreover all images should have the same dimensions.

Assuming you have the image bitmaps AND the goals.txt file in the same directory as the
executable you can run the tutorial like you can see in the above image. It is using the cmd
command line in windows, but it should work fine in Linux too. You can see how it is called by
looking at the above image. If you call it incorrectly you will be prompted for correct calling.
Any time during training (in Windows) and in Linux each 1000 epochs (for now, it is in the
TODO list, to use the pdCurses library), you are able to stop and start recalling images. You are
just prompted for the image number, the one coming after 'img' in the filename and the network
recalls that image and tells you what it thinks that image represents. Afterwards as you can see
from the image above you also get some percentages to know how much the network thinks the
image match the numbers from 0 to 9.

Well this was it. I hope you enjoyed this tutorial and managed to comprehend the workings of
the multi-layer perceptron neural network. You can find the source code and the images I used to
train the network in the tutorial's source code. I used really small dimensions , 8x8 , just so it can
get trained fast. If you stick with the parameters I used above you are sure to converge. Since this
network has many outputs, some of which look alike the mean square error can not go really
low. That is since some numbers are almost the same, (especially the way I painted them).
Specifically 7 with 4 , and 0 with 8. Still as far as picking the best matching pattern the network
performs brilliantly. For least mean square error you can feel free to stop training when it goes
below 0.45 or so.

As always if you have any comments about the tutorial, constructive criticism or found any bugs
in the code please email me at:
lefteris *at* realintelligence *dot* net

Multi-layer Perceptron Tutorial


This tutorial continues from the last neural network tutorial the Perceptron tutorial. We will now
introduce the structure of the multi-layer perceptron and the back-propagation algorithm, without
doubt the most popular neural network structure to date. If you are in a hurry and just want to
mess with the code you can get it from here but I would recommend reading on to see how the
network functions.

Tutorial Prerequisities

 The reader should be familiar with the perceptron neural network


 The reader should have a basic understanding of C/C++
 The reader should know how to compile and run a program from the command line in
Windows or Linux

Tutorial Goals

 The reader will understand the structure of the Multi-layer Perceptron neural network
 The reader will understand the back-propagation algorithm
 The reader will know about the wide array of applications this network is used in
 The reader will learn all the above via an actual practical application in optical character
recognition

Tutorial Body

This network was introduced around 1986 with the advent of the back-propagation algorithm.
Until then there was no rule via which we could train neural networks with more than one layer.
As the name implies, a Multi-layer Perceptron is just that, a network that is comprised of many
neurons, divided in layers. These layers are divided as follows:

 The input layer, where the input of the network goes. The number of neurons here depends
on the number of inputs we want our network to get
 One or more hidden layers. These layers come between the input and the output and their
number can vary. The function that the hidden layer serves is to encode the input and map it to
the output. It has been proven that a multi-layer perceptron with only one hidden layer can
approximate any function that connects its input with its outputs if such a function exists.
 The output layer, where the outcome of the network can be seen. The number of neurons here
depends on the problem we want the neural net to learn

The Multi-layer perceptron differs from the simple perceptron in many ways. The same part is
that of weight randomization. All weights are given random values between a certain range,
usually [-0.5,0.5]. Having that aside though, for each pattern that is fed to the network three
passes over the net are made. Let's see them one by one in detail.

 Calculate the output:

In this phase we calculate the output of the network. For each layer, we calculate the firing value
of each neuron by getting the sum of the products of the multiplications of all the neurons
connected to said neuron from the previous layer and their corresponding weights. That sounded
a little big though so here it is in pseudocode:

for(int i = 0; i < previousLayerNeurons; i ++)


value[neuron,layer] += weight(i,neuron) * value[i,layer-1]

value[neuron,layer] = activationFunction(value[neuron,layer]);

As the pseudocode shows, here too we have activation functions. They squash the output of each neuron into a fixed range, just as in the perceptron. One difference is that back-propagation needs the activation function to be differentiable, so the perceptron's hard step function will not do; the sigmoid is the usual choice and is the one used in our code. In this way we propagate forward through the network until we reach the output layer and obtain the output values. Just like in the perceptron, these values are essentially random at first, since the weights are random, and have nothing to do with our goal values. This is where the back-propagation learning algorithm kicks in.
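To make the forward pass concrete, here is a minimal, self-contained C++ sketch of one fully connected layer with the sigmoid activation. The function name forwardLayer and the weight layout w[i][j] (from neuron i of the previous layer to neuron j of this layer, matching weight[i][neuron] in the pseudocode) are assumptions made for this example, not the tutorial's own code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Logistic sigmoid: squashes any real number into (0, 1).
float sigmoid(float x)
{
    return 1.0f / (1.0f + std::exp(-x));
}

// Forward pass of one fully connected layer (hypothetical helper).
// prev[i] : output of neuron i in the previous layer
// w[i][j] : weight of the connection from previous-layer neuron i to neuron j
std::vector<float> forwardLayer(const std::vector<float>& prev,
                                const std::vector<std::vector<float>>& w)
{
    assert(w.size() == prev.size());          // one weight row per input neuron
    const std::size_t n = w.empty() ? 0 : w[0].size();
    std::vector<float> out(n);
    for (std::size_t j = 0; j < n; ++j) {
        float sum = 0.0f;                     // weighted input of neuron j
        for (std::size_t i = 0; i < prev.size(); ++i)
            sum += w[i][j] * prev[i];
        out[j] = sigmoid(sum);                // firing value of neuron j
    }
    return out;
}
```

Calling forwardLayer once per layer, feeding each result into the next call, propagates the input all the way to the output layer.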

• Back-propagation:

The back-propagation learning algorithm uses the delta rule. It computes the delta (the local gradient) of each neuron, starting from the output neurons and working backwards to the first hidden layer; the input neurons have no incoming weights, so they need no delta. To compute the deltas of the output neurons, though, we first need the error of each output neuron. That is simple: the multi-layer perceptron is trained in a supervised fashion, so the error is the difference between the desired output and the network's actual output:

$$e_j(n) = d_j(n) - o_j(n)$$

where $e(n)$ is the error vector, $d(n)$ the desired output vector and $o(n)$ the actual output vector at training step $n$. Now to compute the deltas:

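$$\delta_j^{(L)}(n) = e_j^{(L)}(n)\, f'\big(u_j^{(L)}(n)\big)$$

for neuron $j$ in the output layer $L$, where $u_j^{(L)}(n)$ is the weighted sum of the inputs of the $j$th neuron of layer $L$ (its value before the activation function is applied) and $f'$ is the derivative of the activation function. For neuron $j$ in hidden layer $l$:

$$\delta_j^{(l)}(n) = f'\big(u_j^{(l)}(n)\big) \sum_k \delta_k^{(l+1)}(n)\, w_{kj}^{(l+1)}(n)$$

where $f'\big(u_j^{(l)}(n)\big)$ is the derivative of the activation function at the $j$th neuron of layer $l$, and the sum runs over all neurons $k$ of the next layer, multiplying each one's delta by the weight $w_{kj}^{(l+1)}$ connecting neuron $j$ to neuron $k$.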

This step is the heart of the delta rule and the whole essence of back-propagation. Why, you might ask? Because, as Wikipedia puts it, a derivative measures how much a function changes as its input changes. By propagating the deltas backwards, we tell every neuron in the earlier layers how its weights need to change to bring the network's output closer to the desired output. And all of it starts from the error calculated at the output layer. Just like magic!
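The two delta equations can be sketched in C++ as follows. This is a minimal illustration, not the tutorial's code: the names outputDeltas, hiddenDeltas and dersigmoidFromOutput are hypothetical, and it assumes the sigmoid activation, whose derivative can be written in terms of the neuron's output y as f'(u) = y(1 - y). That is why the derivative is computed from neuron outputs here; the tutorial's training code likewise passes neuron outputs to its dersigmoid function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sigmoid derivative expressed through the neuron's output y = sigmoid(u):
// f'(u) = y * (1 - y).
float dersigmoidFromOutput(float y)
{
    return y * (1.0f - y);
}

// Output layer L: delta_j = (d_j - o_j) * f'(u_j).
std::vector<float> outputDeltas(const std::vector<float>& desired,
                                const std::vector<float>& output)
{
    assert(desired.size() == output.size());
    std::vector<float> delta(output.size());
    for (std::size_t j = 0; j < output.size(); ++j)
        delta[j] = (desired[j] - output[j]) * dersigmoidFromOutput(output[j]);
    return delta;
}

// Hidden layer l: delta_j = f'(u_j) * sum_k delta_k(l+1) * w_kj(l+1).
// hiddenOut[j] : output of neuron j in layer l
// nextDelta[k] : delta of neuron k in layer l+1
// w[j][k]      : weight from neuron j (layer l) to neuron k (layer l+1)
std::vector<float> hiddenDeltas(const std::vector<float>& hiddenOut,
                                const std::vector<float>& nextDelta,
                                const std::vector<std::vector<float>>& w)
{
    assert(w.size() == hiddenOut.size());
    std::vector<float> delta(hiddenOut.size());
    for (std::size_t j = 0; j < hiddenOut.size(); ++j) {
        assert(w[j].size() == nextDelta.size());
        float sum = 0.0f;
        for (std::size_t k = 0; k < nextDelta.size(); ++k)
            sum += nextDelta[k] * w[j][k];
        delta[j] = dersigmoidFromOutput(hiddenOut[j]) * sum;
    }
    return delta;
}
```

With several hidden layers, hiddenDeltas is applied repeatedly, from the last hidden layer back to the first, each time passing in the deltas just computed for the layer after it.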

• Weight adjustment:

Having calculated the deltas for all the neurons, we are now ready for the third and final pass over the network, this time to adjust the weights according to the generalized delta rule:

$$w_{ji}^{(l)}(n+1) = w_{ji}^{(l)}(n) + \alpha\,\big[w_{ji}^{(l)}(n) - w_{ji}^{(l)}(n-1)\big] + \eta\,\delta_j^{(l)}(n)\, y_i^{(l-1)}(n)$$

where $y_i^{(l-1)}(n)$ is the output of neuron $i$ in layer $l-1$. Do not be discouraged by all the mathematical notation. It is actually quite simple. Here is what it says:
The new weights for layer $l$ are obtained by adding two terms to the current weights. The first is the difference between the current and the previous weights, multiplied by a coefficient we denote $\alpha$. This is called the momentum coefficient and, true to its name, it speeds up the training of a multi-layer perceptron by adding a fraction of the previous weight change to the current one. It is a double-edged sword, though: if the momentum is too large, the weight updates can overshoot and oscillate, and the network may fail to converge.

The second term is the delta of the layer whose weights we are changing ($l$), multiplied by the outputs of the neurons of the previous layer ($l-1$), all multiplied by the constant $\eta$, which we know as the teaching step from the previous tutorial on the perceptron. And that is basically it! That is what the multi-layer perceptron is all about. It is without doubt a very powerful neural network and a very powerful tool in statistical analysis.
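A single application of the generalized delta rule to the weights entering one layer might look like the C++ sketch below. It is an illustration under stated assumptions: the function name updateWeights and the nested-vector layout w[i][j] (from neuron i of layer l-1 to neuron j of layer l) are hypothetical, and the caller is assumed to keep a copy of the previous weights between steps, which is what the momentum term needs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// One generalized-delta-rule step for the weights entering layer l:
//   w_ji(n+1) = w_ji(n) + alpha * (w_ji(n) - w_ji(n-1)) + eta * delta_j * y_i
// w     : current weights w(n), updated in place to w(n+1)
// prevW : previous weights w(n-1), replaced by w(n) for the next step
// delta : deltas of layer l;  y : outputs of layer l-1
void updateWeights(Matrix& w, Matrix& prevW,
                   const std::vector<float>& delta,
                   const std::vector<float>& y,
                   float eta, float alpha)
{
    assert(w.size() == y.size() && prevW.size() == w.size());
    const Matrix current = w;                 // w(n), saved before we change it
    for (std::size_t i = 0; i < w.size(); ++i) {
        assert(w[i].size() == delta.size() && prevW[i].size() == delta.size());
        for (std::size_t j = 0; j < delta.size(); ++j)
            w[i][j] += alpha * (current[i][j] - prevW[i][j])  // momentum term
                     + eta * delta[j] * y[i];                 // delta-rule term
    }
    prevW = current;
}
```

On the very first step, setting prevW equal to w makes the momentum term zero, so the update reduces to the plain delta rule.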

Practical example

It would not be a tutorial if we just explained how it works and gave you the equations. As already mentioned, the multi-layer perceptron has many applications; statistical analysis, pattern recognition and optical character recognition are just some of them. Our example focuses on a simple instance of optical character recognition: the final program uses an MLP to tell apart a number of monochrome .bmp bitmap files and report which digit each image depicts. I used 8x8 pixel images, but readers are free to use their own resolution and their own monochrome images, since the program reads the size from the bitmap itself. Below you can see an example of such bitmaps.

They are ugly, right? Surely telling them apart should be hard for a computer? This ugliness can be considered noise, and MLPs are really good at separating noise from the actual data that helps them reach a conclusion. But let's go on and look at some code to see how it is done.

#include <vector>

class FileReader; //defined further below

class MLP
{
private:
    std::vector<float> inputNeurons;
    std::vector<float> hiddenNeurons;
    std::vector<float> outputNeurons;
    std::vector<float> weights;
    FileReader* reader;

    int inputN, outputN, hiddenN, hiddenL;

public:
    MLP(int hiddenL, int hiddenN);
    ~MLP();

    //assigns values to the input neurons
    bool populateInput(int fileNum);
    //calculates the whole network, from input to output
    void calculateNetwork();
    //trains the network according to our parameters
    bool trainNetwork(float teachingStep, float lmse, float momentum, int trainingFiles);
    //recalls the network for a given bitmap file
    void recallNetwork(int fileNum);
};

The above is our multi-layer perceptron class. As you can see, it has vectors for all the neurons and their connection weights. It also holds a pointer to a FileReader object; FileReader is a class we will write below to read the bitmap files that populate our input. The MLP's functions are similar to the perceptron's: it populates its input by reading the bitmap images, calculates the network's output and trains the network. Moreover, you can recall the network for a given 'fileNum' image to see which digit the network thinks the image represents.

#include <cstdio>  //printf
#include <cstdlib> //rand, RAND_MAX

//Multi-layer perceptron constructor
MLP::MLP(int hL, int hN)
{
    outputN = 10; //one output per digit, 0 to 9
    hiddenL = hL;
    hiddenN = hN;

    //initialize the filereader (only once)
    reader = new FileReader();

    //read the first image to see what kind of input our net will have
    inputN = reader->getBitmapDimensions();
    if(inputN == -1)
    {
        printf("There was an error detecting img0.bmp\n");
        return;
    }

    //let's reserve the memory for the weights
    weights.reserve(inputN*hiddenN + hiddenN*hiddenN*(hiddenL-1) + hiddenN*outputN);

    //also let's set the size of the neuron vectors
    inputNeurons.resize(inputN);
    hiddenNeurons.resize(hiddenN*hiddenL);
    outputNeurons.resize(outputN);

    //randomize the weights from the inputs to the 1st hidden layer, in [-0.5,0.5]
    for(int i = 0; i < inputN*hiddenN; i++)
    {
        weights.push_back((float)(rand() / ((double)RAND_MAX + 1.0) - 0.5));
    }

    //if there is more than 1 hidden layer, randomize their weights
    for(int i = 1; i < hiddenL; i++)
    {
        for(int j = 0; j < hiddenN*hiddenN; j++)
        {
            weights.push_back((float)(rand() / ((double)RAND_MAX + 1.0) - 0.5));
        }
    }

    //and finally randomize the weights for the output layer
    for(int i = 0; i < hiddenN*outputN; i++)
    {
        weights.push_back((float)(rand() / ((double)RAND_MAX + 1.0) - 0.5));
    }
}

The constructor takes the number of hidden layers and hidden neurons as parameters, so it knows how to size its neuron and weight vectors. It also reads the first bitmap, 'img0.bmp', to find the dimensions that all the images will have, as this line shows:
inputN = reader->getBitmapDimensions();
This is a requirement of our tutorial's program: you may use any bitmap size you like for the first image, 'img0.bmp', but all the following images must have the same size. As is common in neural networks, the weights are initialized randomly in the range [-0.5, 0.5].
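The constructor above relies on rand(). As an alternative, here is a hedged sketch using the C++11 <random> header, which makes the distribution explicit and lets you choose a seed for reproducible runs. The names randomWeights and weightCount are hypothetical; weightCount mirrors the size passed to weights.reserve() in the constructor (no bias weights, as in the tutorial's network).

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Number of weights in the tutorial's layout: input->hidden, the
// (hiddenL - 1) hidden->hidden blocks, and hidden->output.
std::size_t weightCount(std::size_t inputN, std::size_t hiddenN,
                        std::size_t hiddenL, std::size_t outputN)
{
    assert(hiddenL >= 1);
    return inputN * hiddenN + hiddenN * hiddenN * (hiddenL - 1) + hiddenN * outputN;
}

// Draws `count` weights uniformly from (nominally) [-0.5, 0.5) with a fixed
// seed, so the same seed always produces the same initial network.
std::vector<float> randomWeights(std::size_t count, unsigned seed)
{
    std::mt19937 gen(seed);
    std::uniform_real_distribution<float> dist(-0.5f, 0.5f);
    std::vector<float> w(count);
    for (float& x : w)
        x = dist(gen);
    return w;
}
```

For example, with 64 inputs (an 8x8 image), one hidden layer of 10 neurons and 10 outputs, weightCount(64, 10, 1, 10) gives 64*10 + 10*10 = 740 weights.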
void MLP::calculateNetwork()
{
    //let's propagate towards the first hidden layer
    for(int hidden = 0; hidden < hiddenN; hidden++)
    {
        hiddenAt(1,hidden) = 0;
        for(int input = 0; input < inputN; input++)
        {
            hiddenAt(1,hidden) += inputNeurons.at(input)*inputToHidden(input,hidden);
        }
        //and finally pass it through the activation function
        hiddenAt(1,hidden) = sigmoid(hiddenAt(1,hidden));
    }

    //now, if we have more than one hidden layer
    for(int i = 2; i <= hiddenL; i++)
    {
        //calculate the values of each of these extra layers
        for(int j = 0; j < hiddenN; j++)//to
        {
            hiddenAt(i,j) = 0;
            for(int k = 0; k < hiddenN; k++)//from
            {
                hiddenAt(i,j) += hiddenAt(i-1,k)*hiddenToHidden(i,k,j);
            }
            //and finally pass it through the activation function
            hiddenAt(i,j) = sigmoid(hiddenAt(i,j));
        }
    }

    //and now from the last hidden layer to the output
    for(int i = 0; i < outputN; i++)
    {
        outputNeurons.at(i) = 0;
        for(int j = 0; j < hiddenN; j++)
        {
            outputNeurons.at(i) += hiddenAt(hiddenL,j) * hiddenToOutput(j,i);
        }
        //and finally pass it through the activation function
        outputNeurons.at(i) = sigmoid(outputNeurons.at(i));
    }
}
The calculateNetwork function simply finds the network output that corresponds to the current input, propagating the input signals through each layer until they reach the output layer. There is nothing special about the code above; it is just an implementation of the equations presented earlier. As we saw in the constructor, the network in this tutorial has 10 outputs, one per digit. Each output indicates how strongly the network believes the input pattern is that digit, so output 1 being close to 1.0 means the input pattern is most likely a 1, and so on.
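The one-hot goal values and the way an answer is read back from the 10 outputs can be sketched as follows. The helper names makeTarget, squaredError and recognizedDigit are hypothetical; picking the strongest output is one reasonable way to read the result, and the tutorial's own recallNetwork may report it differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Desired outputs for a digit: 1.0 at the digit's position, 0.0 elsewhere.
std::vector<float> makeTarget(int digit, std::size_t outputN = 10)
{
    assert(digit >= 0 && static_cast<std::size_t>(digit) < outputN);
    std::vector<float> target(outputN, 0.0f);
    target[static_cast<std::size_t>(digit)] = 1.0f;
    return target;
}

// Sum of squared errors over all outputs for one pattern.
float squaredError(const std::vector<float>& target, const std::vector<float>& output)
{
    assert(target.size() == output.size());
    float error = 0.0f;
    for (std::size_t i = 0; i < output.size(); ++i)
        error += (target[i] - output[i]) * (target[i] - output[i]);
    return error;
}

// The recognized digit is the index of the output neuron that fires strongest.
int recognizedDigit(const std::vector<float>& output)
{
    assert(!output.empty());
    return static_cast<int>(std::distance(output.begin(),
                            std::max_element(output.begin(), output.end())));
}
```

This squared error is the same quantity the training code accumulates in its error variable for each pattern.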

The training function is too big to post here in full, so I recommend taking a look at the .zip with the source code to see all of it. We will focus only on the implementation of the back-propagation algorithm.

for(int i = 0; i < outputN; i++)
{
    //let's get the delta of the output layer
    //and the accumulated error
    if(i != target)
    {
        outputDeltaAt(i) = (0.0 - outputNeurons[i])*dersigmoid(outputNeurons[i]);
        error += (0.0 - outputNeurons[i])*(0.0 - outputNeurons[i]);
    }
    else
    {
        outputDeltaAt(i) = (1.0 - outputNeurons[i])*dersigmoid(outputNeurons[i]);
        error += (1.0 - outputNeurons[i])*(1.0 - outputNeurons[i]);
    }
}

//we start propagating backwards now, to get the error of each neuron
//in every layer

//let's get the delta of the last hidden layer first
for(int i = 0; i < hiddenN; i++)
{
    hiddenDeltaAt(hiddenL,i) = 0;//zero the values from the previous iteration

    //add to the delta for each connection with an output neuron
    for(int j = 0; j < outputN; j++)
    {
        hiddenDeltaAt(hiddenL,i) += outputDeltaAt(j) * hiddenToOutput(i,j);
    }

    //multiply by the derivative of the activation function,
    //as the delta rule requires
    hiddenDeltaAt(hiddenL,i) *= dersigmoid(hiddenAt(hiddenL,i));
}

//now for each additional hidden layer, provided they exist
for(int i = hiddenL-1; i > 0; i--)
{
    //add to each neuron's hidden delta
    for(int j = 0; j < hiddenN; j++)//from
    {
        hiddenDeltaAt(i,j) = 0;//zero the values from the previous iteration

        for(int k = 0; k < hiddenN; k++)//to
        {
            //the next hidden layer's deltas multiplied by the
            //weights connecting this neuron to each of its neurons
            hiddenDeltaAt(i,j) += hiddenDeltaAt(i+1,k) * hiddenToHidden(i+1,j,k);
        }

        //multiply by the derivative of the activation function
        hiddenDeltaAt(i,j) *= dersigmoid(hiddenAt(i,j));
    }
}

As you can see, this is the second pass over the network, the back-propagation pass presented earlier, so called because this time we move backwards. Having calculated the output and knowing the desired output (called target in the code above: the target digit's output should be 1.0 and all the others 0.0), we compute the deltas according to the equations from the start of the tutorial. If you don't like math, here they are in code. As you can see, several helper macros are used to access the weights and deltas of the different layers.
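The helper macros themselves are in the source .zip. Purely as an illustration, here is one possible way to index a single flat weights vector in the order the constructor fills it: input to first hidden layer, then each hidden-to-hidden block, then last hidden layer to output. The WeightLayout struct and its row-major ordering are hypothetical and may not match the macros in the actual source.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical index helpers for one flat weights vector, laid out in the
// same order as the constructor pushes the weights (row-major by "from").
struct WeightLayout {
    std::size_t inputN, hiddenN, hiddenL, outputN;

    // input neuron i -> first-hidden-layer neuron j
    std::size_t inputToHidden(std::size_t i, std::size_t j) const {
        return i * hiddenN + j;
    }
    // neuron i of hidden layer l-1 -> neuron j of hidden layer l, for l in [2, hiddenL]
    std::size_t hiddenToHidden(std::size_t l, std::size_t i, std::size_t j) const {
        assert(l >= 2 && l <= hiddenL);
        return inputN * hiddenN + (l - 2) * hiddenN * hiddenN + i * hiddenN + j;
    }
    // neuron i of the last hidden layer -> output neuron j
    std::size_t hiddenToOutput(std::size_t i, std::size_t j) const {
        return inputN * hiddenN + (hiddenL - 1) * hiddenN * hiddenN + i * outputN + j;
    }
    // total number of weights (same as the constructor's reserve() size)
    std::size_t total() const {
        return inputN * hiddenN + (hiddenL - 1) * hiddenN * hiddenN + hiddenN * outputN;
    }
};
```

Packing everything into one vector keeps the weights contiguous in memory, at the cost of needing helpers like these to find a particular connection.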

//Weights modification
tempWeights = weights;//keep the current weights, we will need them for the momentum

//input to first hidden layer weights
for(int i = 0; i < inputN; i++)
{
    for(int j = 0; j < hiddenN; j++)
    {
        inputToHidden(i,j) += momentum*(inputToHidden(i,j) - _prev_inputToHidden(i,j))
                            + teachingStep * hiddenDeltaAt(1,j) * inputNeurons[i];
    }
}

//hidden to hidden weights, provided more than 1 hidden layer exists
for(int i = 2; i <= hiddenL; i++)
{
    for(int j = 0; j < hiddenN; j++)//from
    {
        for(int k = 0; k < hiddenN; k++)//to
        {
            hiddenToHidden(i,j,k) += momentum*(hiddenToHidden(i,j,k) - _prev_hiddenToHidden(i,j,k))
                                   + teachingStep * hiddenDeltaAt(i,k) * hiddenAt(i-1,j);
        }
    }
}

//last hidden layer to output weights
for(int i = 0; i < outputN; i++)
{
    for(int j = 0; j < hiddenN; j++)
    {
        hiddenToOutput(j,i) += momentum*(hiddenToOutput(j,i) - _prev_hiddenToOutput(j,i))
                             + teachingStep * outputDeltaAt(i) * hiddenAt(hiddenL,j);
    }
}

prWeights = tempWeights;//the weights before this update become the previous weights

And this is the third and final pass over the network (for each image, of course), which again goes forward from the input layer to the output layer. Here we use the deltas calculated earlier to adjust the network's weights and reduce the error we found in the first pass. It is simply a code implementation of the weight adjustment equation from the theoretical part of the tutorial.

You can see the teaching step at work here. The careful reader will also have noticed that we keep the previous weight values in a temporary vector. That is because of the momentum: as mentioned earlier, the momentum term adds a fraction of the previously applied weight change to each new weight change, which speeds up training. Hence the name momentum.
That is really all there is to know about back-propagation training and the multi-layer perceptron. Let's take a look at the FileReader class.

#include <fstream>

class FileReader
{
private:
    char* imgBuffer;
    //a DWORD
    char* check;
    bool firstImageRead;
    //the input filestream used to read
    std::ifstream fs;

    //image stuff
    int width;
    int height;

public:
    FileReader();
    ~FileReader();
    bool readBitmap(int fileNum);
    //reads the first bitmap file, the one designated with a '0',
    //and gets the dimensions. All other .bmp files are assumed
    //to have identical dimensions
    int getBitmapDimensions();
    //returns a pointer to integers with all the goals
    //that each bitmap should have. Reads them from a file
    int* getImgGoals();
    //returns a pointer to the currently read data
    char* getImgData();
    //helper function converting bytes to an int
    int bytesToInt(char* bytes, int number);
};

This is the FileReader class. It contains imgBuffer, which holds the data of the currently read bitmap, the input file stream used to read the bitmaps, and the width and height of the first (initializer) image. How the functions are implemented is beyond the scope of this tutorial, but you can check the code in the .zip file to see how it is done. What you need to know is that this class reads the image named 'img0.bmp' and assumes that all the other images are monochrome bitmaps with the same dimensions, located in the same directory as the executable.
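For readers who want to write their own reader, here is a hedged sketch of two details any monochrome BMP reader has to handle. BMP header fields are stored little-endian (with the common 40-byte BITMAPINFOHEADER, the width and height are 4-byte values at file offsets 18 and 22), and 1-bit pixel rows are padded to a multiple of 4 bytes, with the leftmost pixel in the most significant bit. The functions littleEndianToInt, rowStride and pixelBit are hypothetical, not the tutorial's bytesToInt; they take unsigned char because sign extension of plain char bytes is a common source of bugs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reads `count` (1 to 4) little-endian bytes as an unsigned integer.
// Signed BMP fields such as the height should be cast to std::int32_t.
std::uint32_t littleEndianToInt(const unsigned char* bytes, int count)
{
    assert(count >= 1 && count <= 4);
    std::uint32_t value = 0;
    for (int i = count - 1; i >= 0; --i)      // most significant byte is last
        value = (value << 8) | bytes[i];
    return value;
}

// Bytes per pixel row in a 1-bit-per-pixel BMP: rows are padded to 4 bytes.
std::size_t rowStride(std::size_t width)
{
    return ((width + 31) / 32) * 4;
}

// Returns the palette index (0 or 1) of pixel x in stored row `row`.
// With a positive height, stored row 0 is the BOTTOM row of the image.
int pixelBit(const unsigned char* pixels, std::size_t width,
             std::size_t row, std::size_t x)
{
    assert(x < width);
    const unsigned char byte = pixels[row * rowStride(width) + x / 8];
    return (byte >> (7 - x % 8)) & 1;          // leftmost pixel is the top bit
}
```

Whether palette index 1 means black or white depends on the palette stored in the file, so a reader should check the palette rather than assume.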

On the right you can see how to save a monochrome bitmap as a .bmp file using the MS Windows Paint program. You can create your own bitmap images and save them the same way; just remember to name the files with incrementing numbers and to update goals.txt accordingly. Moreover, all images must have the same dimensions.

Assuming you have the bitmap images AND the goals.txt file in the same directory as the executable, you can run the tutorial as shown in the image above. It uses the cmd command line in Windows, but it should work fine in Linux too. If you call the program incorrectly, you will be shown the correct usage. On Windows you can interrupt training at any time, and on Linux every 1000 epochs (using the pdCurses library to change this is on the TODO list), to start recalling images. You are prompted for the image number, the one that follows 'img' in the filename, and the network recalls that image and tells you what it thinks the image represents. As the image above shows, you also get percentages telling you how closely the network thinks the image matches each digit from 0 to 9.

Well, that was it. I hope you enjoyed this tutorial and now understand the workings of the multi-layer perceptron neural network. The source code and the images I used to train the network are included with the tutorial's source code. I used really small images, 8x8, just so the network trains fast. If you stick with the parameters I used above, the network should converge. Because the network has many outputs and some of the digits look alike (especially the way I painted them, namely 7 with 4 and 0 with 8), the mean square error cannot go very low. Still, when it comes to picking the best-matching pattern, the network performs brilliantly. As for the least mean square error, feel free to stop training once it drops below about 0.45.

As always if you have any comments about the tutorial, constructive criticism or found any bugs
in the code please email me at:
lefteris *at* realintelligence *dot* net

Multi-layer Perceptron Tutorial


This tutorial continues from the last neural network tutorial the Perceptron tutorial. We will now
introduce the structure of the multi-layer perceptron and the back-propagation algorithm, without
doubt the most popular neural network structure to date. If you are in a hurry and just want to
mess with the code you can get it from here but I would recommend reading on to see how the
network functions.

Tutorial Prerequisities

 The reader should be familiar with the perceptron neural network


 The reader should have a basic understanding of C/C++
 The reader should know how to compile and run a program from the command line in
Windows or Linux

Tutorial Goals

 The reader will understand the structure of the Multi-layer Perceptron neural network
 The reader will understand the back-propagation algorithm
 The reader will know about the wide array of applications this network is used in
 The reader will learn all the above via an actual practical application in optical character
recognition

Tutorial Body

This network was introduced around 1986 with the advent of the back-propagation algorithm.
Until then there was no rule via which we could train neural networks with more than one layer.
As the name implies, a Multi-layer Perceptron is just that, a network that is comprised of many
neurons, divided in layers. These layers are divided as follows:

 The input layer, where the input of the network goes. The number of neurons here depends
on the number of inputs we want our network to get
 One or more hidden layers. These layers come between the input and the output and their
number can vary. The function that the hidden layer serves is to encode the input and map it to
the output. It has been proven that a multi-layer perceptron with only one hidden layer can
approximate any function that connects its input with its outputs if such a function exists.
 The output layer, where the outcome of the network can be seen. The number of neurons here
depends on the problem we want the neural net to learn

The Multi-layer perceptron differs from the simple perceptron in many ways. The same part is
that of weight randomization. All weights are given random values between a certain range,
usually [-0.5,0.5]. Having that aside though, for each pattern that is fed to the network three
passes over the net are made. Let's see them one by one in detail.

 Calculate the output:

In this phase we calculate the output of the network. For each layer, we calculate the firing value
of each neuron by getting the sum of the products of the multiplications of all the neurons
connected to said neuron from the previous layer and their corresponding weights. That sounded
a little big though so here it is in pseudocode:

for(int i = 0; i < previousLayerNeurons; i ++)


value[neuron,layer] += weight(i,neuron) * value[i,layer-1]

value[neuron,layer] = activationFunction(value[neuron,layer]);

As can be seen from the pseudocode, here too we have activation functions. They are used to
normalize the output of each neuron and the functions that are most commonly used in the
perceptron apply here too.So, we gradually propagate forward in the network until we reach the
output layer, and create some output values. Just like the perceptron these values are initially
completely random and have nothing to do with our goal values. But it is here that the back-
propagation learning algorithm kicks in.

 Back-Propagation:

The back propagation learning algorithm uses the delta-rule. What this does is that it computes
the deltas, (local gradients) of each neuron starting from the output neurons and going backwards
until it reaches the input layer. To compute the deltas of the output neurons though we first have
to get the error of each output neuron. That's pretty simple, since the multi-layer perceptron is a
supervised training network so the error is the difference between the network's output and the
desired output.
ej(n) = dj(n) - oj(n)
where e(n) is the error vector, d(n) is the desired output vector and o(n) is the actual output
vector. Now to compute the deltas:

deltaj(L)(n) = ej(L)(n) * f'(uj(L)(n)) , for neuron j in the output layer L


where f'(uj(l)(n)) is the derivative of the value of the jth neuron of layer L

deltaj(l)(n) = f'(uj(l)(n)) Σk(deltak(l+1)(n)*wkj(l+1)(n)), for neuron j in hidden layer l


where f'(uj(l)(n)) is the derivative of the value of the jth neuron in layer l and inside the Sum we
have the products of all the deltas of the neurons of the next layer multiplied by their
corresponding weights.

This part is a very impontant part of the delta rule and the whole essence of back propagation.
Why you might ask? Because just like wikipedia says a derivative is how much a function
changes as its input changes. By propagating the derivatives backwards , we are informing all the
neurons in the previous layers of the change that is needed in our weights to match the desired
output. And all that starts from the initial error calculation at the output layer. Just like magic!

 Weight adjustment

Having calculated the deltas for all the neurons we are now ready for one third and final pass of
the network, this time to adjust the weights according to the generalized delta rule:
wji(l)(n+1) = wji(l)(n) + α * [wji(l)(n) - wji(l)(n-1)] + η * deltaj(l)(n)yi(l-1)(n)
Do not be discouraged by lots of mathematical mumbo jumbo. It is actually quite simple. What
the above says is:
The new weights for layer l are calculated by adding two things to the current weights. The first
is the difference between the current weights and the previous weights multiplied by the
coefficient we symbolize with α. This coefficient is called the momentum coefficient, and true to
its name it adds speed to the training of any multi-layer perceptron by adding part of the already
occured weight changes to the current weight change. This is a double edged sword though since
if your momentum constant is too large the network will not converge and it will probably get
stuck in a local minima.

The other thing that adds to the weight change is the delta of the layer whose weights we change
(l) multiplied by the outputs of the neurons of the previous layer (l-1) and all that multiplied by
the constant η which we know to be the teaching step from the previous tutorial about the
perceptron. And that is basically it! That's what the multi layer perceptron is all about. It is no
doubt a very powerfull neural network and a very powerful tool in statistical analysis.

Practical example

It would not be a tutorial if we just explained how it works and gave you the equations. As was
already mentioned the Multi-layer perceptron has many applications. Satistical analysis, pattern
recognition, optical character recognition are just some of them. Our example will focus on just a
simple instance of optical character recognition. Specifically the final program will be able to use
an MLP to differentiate between a number of .bmp monochrome bitmap files and tell us which
number each image depicts.I used 8x8 pixels resolution for the images but it is up to the reader to
make his own resolutions and/or monochrome images since the program will read the size from
the bitmap itself. Below you can see an example of such bitmaps.

They are ugly, right? Differentiating between them should be hard for a computer? This ugliness
could be considered noice. And MLPs are really good at differentiating between noice and actual
data that help it reach a conclusion. But let's go on and see some code to understand how it is
done.

class MLP
{
private:
std::vector<float> inputNeurons;
std::vector<float>> hiddenNeurons;
std::vector<float> outputNeurons;
std::vector<float> weights;
FileReader* reader;

int inputN,outputN,hiddenN,hiddenL;

public:
MLP(int hiddenL,int hiddenN);
~MLP();

//assigns values to the input neurons


bool populateInput(int fileNum);
//calculates the whole network, from input to output
void calculateNetwork();
//trains the network according to our parameters
bool trainNetwork(float teachingStep,float lmse,float momentum,int
trainingFiles);

//recalls the network for a given bitmap file


void recallNetwork(int fileNum);

};

The above is our multi-layer perceptron class. As you can see it has vectors for all the neurons
and their connection weights. It also contains a FileReader object. As we will see below this
FileReader is a class we will make to read the bitmap files to populate our input. The functions
the MLP has are similar to the perceptron. It populates its input by reading the bitmap images,
calculates an output for the network and trains the network. Moreover you can recall the network
for a given 'fileNum' image to see what number the network thinks the image represents.

//Multi-layer perceptron constructor


MLP::MLP(int hL,int hN)
{
//initialize the filereader
reader = new FileReader();

outputN = 10; //the 9 possible numbers and zero


hiddenL = hL;
hiddenN = hN;

//initialize the filereader


reader = new FileReader();

//read the first image to see what kind of input will our net have
inputN = reader->getBitmapDimensions();
if(inputN == -1)
{
printf("There was an error detecting img0.bmp\n\r");
return ;
}

//let's allocate the memory for the weights


weights.reserve(inputN*hiddenN+(hiddenN*hiddenN*(hiddenL-1))
+hiddenN*outputN);

//also let's set the size for the neurons vector


inputNeurons.resize(inputN);
hiddenNeurons.resize(hiddenN*hiddenL);
outputNeurons.resize(outputN);

//randomize weights for inputs to 1st hidden layer


for(int i = 0; i < inputN*hiddenN; i++)
{
weights.push_back( (( (float)rand() / ((float)(RAND_MAX)+(float)
(1)) )) - 0.5 );//[-0.5,0.5]

//if there are more than 1 hidden layers, randomize their weights
for(int i=1; i < hiddenL; i++)
{
for(int j = 0; j < hiddenN*hiddenN; j++)
{
weights.push_back( (( (float)rand() / ((float)(RAND_MAX)+(float)
(1)) )) - 0.5 );//[-0.5,0.5]
}
}

//and finally randomize the weights for the output layer


for(int i = 0; i < hiddenN*outputN; i ++)
{
weights.push_back( (( (float)rand() / ((float)(RAND_MAX)+(float)
(1)) )) - 0.5 );//[-0.5,0.5]
}

The network takes the number of hidden neurons and hidden layers as parameters so it can know
how to initialize its neurons and weights vectors. Moreover it reads the first bitmap, 'img0.bmp'
to take the dimensions that all the images will have as can be seen from this line:
inputN = reader->getBitmapDimensions();
That is a requirement our tutorial's program will have. You are free to provide any bitmap size
you want for the first image 'img0.bmp' but you are required to have all the following images be
of the same size. As in most neural networks the weights are initialized in the range between [-
0.5,0.5].
void MLP::calculateNetwork()
{
//let's propagate towards the hidden layer
for(int hidden = 0; hidden < hiddenN; hidden++)
{
hiddenAt(1,hidden) = 0;

for(int input = 0 ; input < inputN; input ++)


{
hiddenAt(1,hidden) +=
inputNeurons.at(input)*inputToHidden(input,hidden);
}

//and finally pass it through the activation function


hiddenAt(1,hidden) = sigmoid(hiddenAt(1,hidden));
}

//now if we got more than one hidden layers


for(int i = 2; i <= hiddenL; i ++)
{

//for each one of these extra layers calculate their values


for(int j = 0; j < hiddenN; j++)//to
{
hiddenAt(i,j) = 0;

for(int k = 0; k < hiddenN; k++)//from


{
hiddenAt(i,j) += hiddenAt(i-1,k)*hiddenToHidden(i,k,j);
}

//and finally pass it through the activation function


hiddenAt(i,j) = sigmoid(hiddenAt(i,j));
}
}

int i;
//and now hidden to output
for(i =0; i < outputN; i ++)
{
outputNeurons.at(i) = 0;

for(int j = 0; j < hiddenN; j++)


{
outputNeurons.at(i) += hiddenAt(hiddenL,j) * hiddenToOutput(j,i);
}

//and finally pass it through the activation function


outputNeurons.at(i) = sigmoid( outputNeurons.at(i) );
}

}
The calculate network function just finds the output of the network that correspponds to the
currently given input. It just propagates the input signals through each layer until they reach the
output layer. Nothing really special with the above code, it is just an implementation of the
equations that were presented above. The neural network of our tutorial as we saw in the
constructor has 10 different ouput. Each of these output represent the possibility that the input
pattern is a certain number. So, output 1 being close to 1.0 would mean that the input pattern is
most certainly 1 and so on...

The training function is too big to just post it all in here, so I recommend you take a look at the
.zip with the source code to see it in full. We will just focus in the implementation of the back-
propagation algorithm.

for(int i = 0; i < outputN; i ++)


{
//let's get the delta of the output layer
//and the accumulated error
if(i != target)
{
outputDeltaAt(i) = (0.0 -
outputNeurons[i])*dersigmoid(outputNeurons[i]);
error += (0.0 - outputNeurons[i])*(0.0-outputNeurons[i]);
}
else
{
outputDeltaAt(i) = (1.0 -
outputNeurons[i])*dersigmoid(outputNeurons[i]);
error += (1.0 - outputNeurons[i])*(1.0-outputNeurons[i]);
}

//we start popagating backwards now, to get the error of each neuron
//in every layer

/let's get the delta of the last hidden layer first


for(int i = 0; i < hiddenN; i++)
{
hiddenDeltaAt(hiddenL,i) = 0;//zero the values from the previous
iteration

//add to the delta for each connection with an output neuron


for(int j = 0; j < outputN; j ++)
{
hiddenDeltaAt(hiddenL,i) += outputDeltaAt(j) * hiddenToOutput(i,j)
;
}

//The derivative here is only because of the


//delta rule weight adjustment about to follow
hiddenDeltaAt(hiddenL,i) *= dersigmoid(hiddenAt(hiddenL,i));
}
//now for each additional hidden layer, provided they exist
for(int i = hiddenL-1; i >0; i--)
{
//add to each neuron's hidden delta
for(int j = 0; j < hiddenN; j ++)//from
{

hiddenDeltaAt(i,j) = 0;//zero the values from the previous


iteration

for(int k = 0; k < hiddenN; k++)//to


{
//the previous hidden layers delta multiplied by the
weights
//for each neuron
hiddenDeltaAt(i,j) += hiddenDeltaAt(i+1,k) *
hiddenToHidden(i+1,j,k);
}

//The derivative here is only because of the


//delta rule weight adjustment about to follow
hiddenDeltaAt(i,j) *= dersigmoid(hiddenAt(i,j));
}
}

As you can see above this is the second pass over the network, the so called back-propagation as
we presented it above, since we are going backwards this time. Having calculated the output and
knowing the desired output (called target, in the above code) we start the delta calculation
according to the equations that we saw at the start of the tutorial. If you don't like math, then here
it is for you in code. As you can see many helper macros are used to differentiate between
weights of diffeent layers and deltas.

//Weights modification
//keep a copy of the weights as they are before this update; we will need them
tempWeights = weights;

//input to first hidden layer weights
for(int i = 0; i < inputN; i++)
{
    for(int j = 0; j < hiddenN; j++)
    {
        inputToHidden(i,j) += momentum*(inputToHidden(i,j) - _prev_inputToHidden(i,j))
                              + teachingStep*hiddenDeltaAt(1,j)*inputNeurons[i];
    }
}

//hidden to hidden weights, provided more than 1 hidden layer exists
for(int i = 2; i <= hiddenL; i++)
{
    for(int j = 0; j < hiddenN; j++)//from
    {
        for(int k = 0; k < hiddenN; k++)//to
        {
            hiddenToHidden(i,j,k) += momentum*(hiddenToHidden(i,j,k) - _prev_hiddenToHidden(i,j,k))
                                     + teachingStep*hiddenDeltaAt(i,k)*hiddenAt(i-1,j);
        }
    }
}

//last hidden layer to output weights
for(int i = 0; i < outputN; i++)
{
    for(int j = 0; j < hiddenN; j++)
    {
        hiddenToOutput(j,i) += momentum*(hiddenToOutput(j,i) - _prev_hiddenToOutput(j,i))
                               + teachingStep*outputDeltaAt(i)*hiddenAt(hiddenL,j);
    }
}

//the pre-update weights become the "previous" weights for the next momentum term
prWeights = tempWeights;

And finally, this is the third and final pass over the network (for each image, of course), which goes forward from the input layer to the output layer. Here we use the previously calculated deltas to adjust the weights of the network, to make up for the error we found in the initial calculation. This is simply the weight adjustment equations from the theoretical part of the tutorial, implemented in code.

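We can see the teaching step at work here. Moreover, the careful reader will have noticed that we keep the previous weight values in a temporary vector. That is because of the momentum: the momentum term needs both the current weights and the weights from before the last update, so before changing anything we copy the current weights into tempWeights, and once the update is done they become prWeights, the "previous" weights for the next update. If you recall, the momentum adds a percentage of the already applied weight change to each subsequent weight change, which usually speeds up training. Hence the term momentum.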
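The momentum update can be isolated into a small function. This is a minimal C++ sketch that assumes one layer's weights live in a plain `std::vector<float>`. `momentumUpdate` and `deltaTimesInput` are hypothetical names, and the local copy and `prev` vector play the roles of tempWeights and prWeights above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the tutorial's source) applying
//   w(n+1) = w(n) + alpha * (w(n) - w(n-1)) + eta * delta_j * y_i
// to every weight of one layer. deltaTimesInput[k] holds delta_j * y_i for weight k.
// On entry prev holds w(n-1); on exit it holds w(n), ready for the next call.
void momentumUpdate(std::vector<float>& w, std::vector<float>& prev,
                    const std::vector<float>& deltaTimesInput,
                    float eta, float alpha) {
    assert(w.size() == prev.size() && w.size() == deltaTimesInput.size());
    std::vector<float> before = w;  // same role as tempWeights
    for (std::size_t k = 0; k < w.size(); ++k) {
        w[k] += alpha * (w[k] - prev[k]) + eta * deltaTimesInput[k];
    }
    prev = before;                  // same role as prWeights = tempWeights
}
```

With a constant gradient term, consecutive steps grow. For example, with eta = alpha = 0.5 and deltaTimesInput = 0.5, a weight starting at 1.0 moves by 0.25 on the first call and by 0.375 on the second, which is the speed-up momentum is meant to provide.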

Well, that's actually all there is to know about back-propagation training and the multi-layer perceptron. Let's take a look at the FileReader class.

#include <fstream>

class FileReader
{
private:
    char* imgBuffer;
    //a DWORD
    char* check;
    bool firstImageRead;
    //the input file stream used to read the bitmaps
    std::ifstream fs;

    //dimensions of the first image (img0.bmp)
    int width;
    int height;

public:
    FileReader();
    ~FileReader();
    //reads bitmap number fileNum (img<fileNum>.bmp) into imgBuffer
    bool readBitmap(int fileNum);
    //reads the first bitmap file, the one designated with a '0',
    //and gets the dimensions. All other .bmp files are assumed
    //to have the same dimensions
    int getBitmapDimensions();
    //returns a pointer to integers with the goal (target number)
    //of each bitmap. Reads them from goals.txt
    int* getImgGoals();
    //returns a pointer to the currently read data
    char* getImgData();
    //helper function converting bytes to an int
    int bytesToInt(char* bytes,int number);
};

This is the FileReader class. It contains imgBuffer, which holds the data of the currently read bitmap, and the input file stream used to read the bitmaps. It also keeps the width and height of the first image. How the functions are implemented is beyond the scope of this tutorial, but you can check the code in the .zip file to see how it is done. What you need to know is that this class reads the image named 'img0.bmp' and assumes that all the other images are monochrome bitmaps with the same dimensions, located in the same directory as the executable.
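Since the FileReader implementation is left to the .zip, here is a hypothetical C++ sketch of the part that matters to the network: pulling the dimensions out of a .bmp header. It assumes the standard 54-byte BITMAPFILEHEADER + BITMAPINFOHEADER layout with little-endian fields. `readLE32`, `readLE16`, `BmpInfo` and `parseBmpHeader` are illustrative names, not the tutorial's `bytesToInt`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Read a little-endian 32-bit signed value starting at byte offset off.
std::int32_t readLE32(const std::vector<unsigned char>& b, std::size_t off) {
    assert(off + 4 <= b.size());
    return static_cast<std::int32_t>(
        static_cast<std::uint32_t>(b[off]) |
        (static_cast<std::uint32_t>(b[off + 1]) << 8) |
        (static_cast<std::uint32_t>(b[off + 2]) << 16) |
        (static_cast<std::uint32_t>(b[off + 3]) << 24));
}

// Read a little-endian 16-bit unsigned value starting at byte offset off.
std::uint16_t readLE16(const std::vector<unsigned char>& b, std::size_t off) {
    assert(off + 2 <= b.size());
    return static_cast<std::uint16_t>(b[off] | (b[off + 1] << 8));
}

struct BmpInfo {
    std::int32_t pixelOffset;    // offset 10: where the pixel rows start
    std::int32_t width;          // offset 18
    std::int32_t height;         // offset 22 (negative means rows stored top-down)
    std::uint16_t bitsPerPixel;  // offset 28: 1 for a monochrome bitmap
    std::int32_t rowStride;      // bytes per row, padded to a multiple of 4
};

// Returns false if the buffer is too short or lacks the "BM" signature.
bool parseBmpHeader(const std::vector<unsigned char>& b, BmpInfo& info) {
    if (b.size() < 54 || b[0] != 'B' || b[1] != 'M') return false;
    info.pixelOffset = readLE32(b, 10);
    info.width = readLE32(b, 18);
    info.height = readLE32(b, 22);
    info.bitsPerPixel = readLE16(b, 28);
    info.rowStride = ((info.width * info.bitsPerPixel + 31) / 32) * 4;
    return true;
}
```

For an 8x8 monochrome image this gives width = height = 8 and a row stride of 4 bytes, because each 1-bit row is padded to a 4-byte boundary. With one input neuron per pixel, the network would get 64 inputs.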

On the right you can see how to save a monochrome bitmap as a .bmp file using the MS Windows Paint program. You can create your own bitmap images and save them the same way. Just remember to name the files with incrementing numbers and to update goals.txt accordingly. Moreover, all images must have the same dimensions.

Assuming you have the image bitmaps AND the goals.txt file in the same directory as the executable, you can run the tutorial as shown in the above image. It uses the cmd command line in Windows, but it should work fine in Linux too. If you call it incorrectly, you will be prompted with the correct usage. During training you can stop and start recalling images: in Windows at any time, and in Linux every 1000 epochs (for now; using the pdCurses library is on the TODO list). You are prompted for the image number, the one that comes after 'img' in the filename, and the network recalls that image and tells you what it thinks the image represents. As you can see from the image above, you also get some percentages showing how closely the network thinks the image matches each of the numbers from 0 to 9.
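The recall report boils down to picking the output neuron with the highest activation. The following C++ sketch is hypothetical, and `bestMatch` and `printRecall` are illustrative names. How the tutorial turns outputs into percentages is not shown here, so this version assumes each sigmoid output (in [0,1]) is simply scaled by 100.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Index of the output neuron with the highest activation (first one on ties),
// i.e. the number the network thinks the image represents.
int bestMatch(const std::vector<float>& outputs) {
    assert(!outputs.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < outputs.size(); ++i)
        if (outputs[i] > outputs[best]) best = i;
    return static_cast<int>(best);
}

// Assumed report format: each raw sigmoid output shown as a percentage.
void printRecall(const std::vector<float>& outputs) {
    for (std::size_t i = 0; i < outputs.size(); ++i)
        std::printf("%zu: %.1f%%\n", i, outputs[i] * 100.0f);
    std::printf("The image most likely represents %d\n", bestMatch(outputs));
}
```

Note that these "percentages" need not add up to 100%, since each output neuron is an independent sigmoid.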

Well, this was it. I hope you enjoyed this tutorial and managed to understand the workings of the multi-layer perceptron neural network. You can find the source code and the images I used to train the network in the tutorial's source code. I used really small dimensions, 8x8, just so it can be trained fast. If you stick with the parameters I used above, the network should converge. Since this network has many outputs and some of the digits look alike, the mean square error cannot go really low: some numbers are almost the same (especially the way I painted them), specifically 7 and 4, and 0 and 8. Still, when it comes to picking the best matching pattern, the network performs brilliantly. As for the least mean square error, feel free to stop training once it drops below 0.45 or so.
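As a sketch of such a stopping test, the following C++ computes the squared error of one image against a one-hot goal (1.0 for the target number, 0.0 for the other outputs), mirroring how the output deltas are built, and stops once the epoch's average falls below a threshold such as 0.45. `squaredError` and `shouldStop` are hypothetical names. Whether the tutorial averages or sums the error over images is an assumption here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Squared error of one image's outputs against a one-hot goal vector.
float squaredError(const std::vector<float>& outputs, int target) {
    assert(target >= 0 && static_cast<std::size_t>(target) < outputs.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < outputs.size(); ++i) {
        float goal = (static_cast<int>(i) == target) ? 1.0f : 0.0f;
        float e = goal - outputs[i];
        sum += e * e;
    }
    return sum;
}

// Assumed criterion: average per-image error over one epoch below threshold.
bool shouldStop(const std::vector<float>& perImageErrors, float threshold) {
    assert(!perImageErrors.empty());
    float total = 0.0f;
    for (float e : perImageErrors) total += e;
    return total / perImageErrors.size() < threshold;
}
```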

As always, if you have any comments about the tutorial or constructive criticism, or if you find any bugs in the code, please email me at:
lefteris *at* realintelligence *dot* net

Multi-layered Perceptron Trained With Backpropagation (in Scala)
My next field of study is neural networks; I’ve been reading about them for some time. Interestingly, it’s not easy to find resources that aren’t full of not-so-trivial math, so I had to get a calculus refresher as well. By the way, if you’re interested in learning algorithms strictly from a programming standpoint, I can’t recommend Programming Collective Intelligence enough; it’s a very good practical book.

Before we get into the details: neural networks are used in supervised machine learning to solve classification problems and, more generally, to build non-linear models out of raw data. One of their main strengths is that they’re good at generalizing (responding sensibly to data that hasn’t been seen before). They were very popular in the 1960s, and many thought they were the early stage of a new kind of intelligence. That didn’t really happen, so they fell into disgrace, only to make a comeback in the mid-80s when back-propagation was popularized. More recently, new machine learning techniques such as Support Vector Machines came about, showing somewhat better results (with somewhat complex math). But neural nets came back with a vengeance! With techniques still evolving, deep belief nets and LSTMs are now showing really good results on some categories of problems (on that subject, a couple of Google Tech Talks by Hinton are worth the time).

So the simplest form of neural network is Rosenblatt’s Perceptron, which really isn’t much of a
network given that it’s usually represented as a single neuron. Behold the Wikipedia illustration
ripoff:

Basically, you have a series of input signals that stimulate the neuron into producing an output signal. Each input has a weight, and the output is the sum of all inputs times their respective weights, passed through an activation function f. The role of the function is simply to normalize the sum.
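In code, such a neuron reduces to a weighted sum followed by f. Here is a minimal C++ sketch with the step function and the hyperbolic tangent from the next paragraph as the two candidate activations. The bias term and the function names are assumptions of this sketch, not part of the illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Weighted sum of the inputs plus a bias (a weight on a constant input of 1).
double weightedSum(const std::vector<double>& x, const std::vector<double>& w, double bias) {
    assert(x.size() == w.size());
    double sum = bias;
    for (std::size_t i = 0; i < x.size(); ++i) sum += w[i] * x[i];
    return sum;
}

// Step activation: 1 for s > 0, otherwise 0 (this sketch maps s == 0 to 0).
double stepActivation(double s) { return s > 0.0 ? 1.0 : 0.0; }

// Smooth alternative: hyperbolic tangent, with outputs in (-1, 1).
double tanhActivation(double s) { return std::tanh(s); }
```

A neuron's output is then `stepActivation(weightedSum(x, w, b))` or `tanhActivation(weightedSum(x, w, b))`.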

The f function can be as simple as a step function (i.e. 0 for x < 0 and 1 for x > 0), although most problems are better handled with something smoother (like a hyperbolic tangent). At this point you may start to wonder: but where is the learning? It’s all in the weights, brother. For example, say you want to predict the chances of someone having cancer from a set of basic physiological measurements before recommending a more thorough and expensive examination. Each of these measurements (blood pressure, white blood cell count, …) is going to be one of the X inputs in the above picture, and the expected output would ideally be a number between 0 and 1 telling you how sure the network is. To get there, you need to adjust the weight of each connection by training the network. Enter supervised learning.

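In supervised learning, the network is fed a data set for which the answers are already known. The difference between the expected output and the one produced by the network is the error, and it can be used directly to adjust the weights:

$$w_i \leftarrow w_i + \eta \, (d - y) \, x_i$$

where $\eta$ is the learning factor, $d$ the expected output, $y$ the network’s output and $x_i$ the $i$-th input.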

In plain English, the new weight for a given input is the old weight added to the product of a
learning factor (usually in the 0.1 to 0.01 ballpark, depending on how fast or precise you want
the learning to happen), the error and the input. Pretty easy, no?
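Here is a minimal C++ sketch of this rule for a two-input perceptron with a step activation and a bias (treated as a weight on a constant input of 1). `Perceptron`, `trainEpoch` and `trainPerceptron` are illustrative names. The sketch uses a learning factor of 0.25 rather than 0.1 only so that every weight stays exactly representable in binary floating point.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Each sample is {x0, x1, expected output d} with d in {0, 1}.
using Samples = std::array<std::array<double, 3>, 4>;

struct Perceptron {
    std::array<double, 2> w{0.0, 0.0};
    double bias = 0.0;

    // Step activation: 1 if the weighted sum is > 0, otherwise 0.
    int predict(double x0, double x1) const {
        return (w[0] * x0 + w[1] * x1 + bias) > 0.0 ? 1 : 0;
    }

    // One pass over the samples using w_i <- w_i + eta * (d - y) * x_i.
    // Returns the number of misclassified samples.
    int trainEpoch(const Samples& samples, double eta) {
        int mistakes = 0;
        for (const auto& s : samples) {
            int error = static_cast<int>(s[2]) - predict(s[0], s[1]);  // d - y
            if (error != 0) {
                ++mistakes;
                w[0] += eta * error * s[0];
                w[1] += eta * error * s[1];
                bias += eta * error;  // the bias input is always 1
            }
        }
        return mistakes;
    }
};

// Trains until an epoch has no mistakes (returns true) or maxEpochs is reached (false).
bool trainPerceptron(Perceptron& p, const Samples& samples, double eta, int maxEpochs) {
    assert(eta > 0.0);
    for (int epoch = 0; epoch < maxEpochs; ++epoch)
        if (p.trainEpoch(samples, eta) == 0) return true;
    return false;
}
```

Trained on the AND truth table, it reaches zero mistakes after a few epochs. On XOR it never does, which is the limitation described next.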

But there’s a catch: with Rosenblatt’s Perceptron you can only model a linearly separable problem (one whose solutions can be neatly separated by a straight line). XOR is the classic counterexample, since no single straight line separates (0,1) and (1,0) from (0,0) and (1,1). Sometimes linear separability is enough, but often, like in my cancer detector example, it’s not. So what can we do about it? Just stack the neurons on top of each other.
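To see why stacking helps, here is a C++ sketch of a hand-wired 2-2-1 network of step neurons that computes XOR. The weights are picked by hand for illustration, not learned: one hidden neuron acts as OR, the other as AND, and the output fires for "OR but not AND". Learning such weights automatically is what back-propagation is for. The function names are illustrative.

```cpp
#include <cassert>

// A two-input step neuron: fires (1) if w0*x0 + w1*x1 + bias > 0.
int stepNeuron(double w0, double x0, double w1, double x1, double bias) {
    return (w0 * x0 + w1 * x1 + bias) > 0.0 ? 1 : 0;
}

// Hidden layer: OR and AND units; output layer: OR and not AND.
int xorNetwork(int a, int b) {
    assert((a == 0 || a == 1) && (b == 0 || b == 1));
    int orUnit = stepNeuron(1.0, a, 1.0, b, -0.5);   // fires if a OR b
    int andUnit = stepNeuron(1.0, a, 1.0, b, -1.5);  // fires if a AND b
    return stepNeuron(1.0, orUnit, -1.0, andUnit, -0.5);
}
```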

Here we have 3 layers; each neuron is connected to all the neurons of the neighboring layers (the illustration above is slightly wrong, as it is missing the connections between the hidden and the output layer). The middle layers (there’s only one in the illustration) are called hidden layers, as they’re invisible to the outside world. The network can be as large as needed, with many hidden layers composed of many neurons. The thing is, depending on the problem, a network that is too large can actually perform poorly (over-fitting) and will take a long time to train. That’s where perceptrons can be annoying: in theory they can approximate virtually any function, but in practice several educated guesses need to be made about the size and structure of the network to get good results. That being said, people have been using them quite successfully for some time.

Multi-layer perceptrons were first described in 1969, but it took a while to figure out how to train them: 17 years, actually. The trick is to propagate the error calculated at the output (just like for Rosenblatt’s perceptron) back through the hidden layers, all the way to the input layer. The algorithm is fairly well explained on Wikipedia, so I won’t go into the details. Plus, you’re going to get some code anyway.
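For readers who want the algorithm spelled out, here is a minimal C++ sketch of back-propagation for a single hidden layer of tanh units and one tanh output, with error E = ½(d − y)². `TinyNet` and its members are illustrative names, and the bias gradients are left out for brevity. The usage check compares the analytical gradients with a finite-difference estimate, which is a standard way to verify a backward pass.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Inputs -> one hidden layer of tanh units -> one tanh output.
struct TinyNet {
    std::vector<std::vector<double>> w1;  // w1[j][i]: weight from input i to hidden unit j
    std::vector<double> b1;               // hidden biases
    std::vector<double> w2;               // w2[j]: weight from hidden unit j to the output
    double b2 = 0.0;                      // output bias

    // Forward pass: stores the hidden outputs in h and returns the network output y.
    double forward(const std::vector<double>& x, std::vector<double>& h) const {
        assert(w1.size() == b1.size() && w1.size() == w2.size());
        h.assign(w1.size(), 0.0);
        double net = b2;
        for (std::size_t j = 0; j < w1.size(); ++j) {
            double s = b1[j];
            for (std::size_t i = 0; i < x.size(); ++i) s += w1[j][i] * x[i];
            h[j] = std::tanh(s);
            net += w2[j] * h[j];
        }
        return std::tanh(net);
    }

    // E = 0.5 * (d - y)^2 for one sample.
    double error(const std::vector<double>& x, double d) const {
        std::vector<double> h;
        double e = d - forward(x, h);
        return 0.5 * e * e;
    }

    // Backward pass: g1[j][i] = dE/dw1[j][i] and g2[j] = dE/dw2[j].
    void gradients(const std::vector<double>& x, double d,
                   std::vector<std::vector<double>>& g1, std::vector<double>& g2) const {
        std::vector<double> h;
        double y = forward(x, h);
        // output delta: error times the tanh derivative, 1 - y^2
        double deltaOut = (d - y) * (1.0 - y * y);
        g2.assign(w2.size(), 0.0);
        g1.assign(w1.size(), std::vector<double>(x.size(), 0.0));
        for (std::size_t j = 0; j < w1.size(); ++j) {
            g2[j] = -deltaOut * h[j];
            // hidden delta: output delta sent back through w2[j], times tanh'
            double deltaHidden = (1.0 - h[j] * h[j]) * deltaOut * w2[j];
            for (std::size_t i = 0; i < x.size(); ++i) g1[j][i] = -deltaHidden * x[i];
        }
    }
};
```

A training step is then `w -= eta * g` for every weight, which is the same as adding eta × delta × input, the delta rule.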

Multi-layer perceptrons are still fairly basic as far as neural networks go nowadays, but it’s necessary to understand them well to understand other networks. They also perform decently and are easy to implement, so why not do it, huh? I’ve implemented one in Scala; it gave me a good occasion to try the language a little more. At the beginning I wanted the implementation to be purely functional, but somewhere along the way it felt more natural to start mutating the objects. I’m not sure whether that’s because of Scala, because of object orientation or because of me. My Scala is probably still a little rough, so let me know if there are more idiomatic ways.

package spike

import scala.util.Random

class Neuron(nm: String, ns: List[Neuron], rnd: Random) {

  // output function f(x) = a * tanh(b * x) (LeCun's recommended constants) and learning rate
  val (a, b, rate) = (1.7159, 2.0 / 3.0, 0.1)
  val dendrites = connect(ns)
  val name = nm

  // need to remember output and gather error for training
  var (out, error, bias) = (0.0, 0.0, rnd.nextDouble() * 2.0 - 1.0)

  def input: Double = {
    error = 0.0 // error reset on new input
    dendrites.map(_.input).sum + bias
  }

  def output: Double = {
    out = a * math.tanh(b * input)
    out
  }

  def expectation(expected: Double): Unit = updateError(expected - out)

  // Accumulate this neuron's error, then pass it on multiplied by f'(net),
  // so the lower neurons receive weighted deltas, as back-propagation requires
  def updateError(delta: Double): Unit = {
    error += delta
    val localGradient = delta * deriv(out)
    dendrites.foreach(_.updateError(localGradient))
  }

  def adjust: Unit = {
    val adjustment = error * deriv(out) * rate
    dendrites.foreach(_.adjust(adjustment))
    bias += adjustment
  }

  override def toString = name + "[" + dendrites.mkString(",") + "]\n "

  // Derivative of the output function, written in terms of the output y:
  // f'(x) = a * b * (1 - tanh^2(b * x)) = a * b * (1 - (y / a)^2)
  private def deriv(y: Double) = a * b * (1 - math.pow(y / a, 2))

  // Initial weights drawn uniformly from [-1/sqrt(n), 1/sqrt(n)], n = number of inputs
  private def connect(ns: List[Neuron]): List[Dendrite] =
    ns.map(n => new Dendrite(n, (rnd.nextDouble() * 2 - 1) * math.pow(ns.size, -0.5)))
}

// Dendrites are a neuron's input connections, just like in your brain.
class Dendrite(n: Neuron, w: Double) {
  // the input neuron
  val neuron = n
  // weight of the signal
  var weight = w

  def input: Double = weight * neuron.out

  // send the error back to the input neuron, scaled by this connection's weight
  def updateError(delta: Double): Unit = neuron.updateError(delta * weight)

  def adjust(adjustment: Double): Unit = weight += adjustment * neuron.out

  override def toString = "--[" + weight + "]-->" + neuron.name
}

class Net(layout: List[Int], rnd: Random) {
  val layers = build(layout, rnd)

  def output(ins: List[Double]) = {
    // the input layer simply holds the given values
    layers.head.zip(ins).foreach { case (n, in) => n.out = in }
    // every higher layer recomputes its outputs from the layer below
    layers.tail.foldLeft(ins) { (_, l) => l.map(_.output) }
  }

  def train(ins: List[Double], outs: List[Double]): Unit = {
    output(ins)
    layers.last.zip(outs).foreach { case (n, expected) => n.expectation(expected) }
    layers.foreach(_.foreach(_.adjust))
  }

  override def toString = layers.mkString("\n")

  private def build(layout: List[Int], rnd: Random) =
    layout.zip(1 to layout.size).foldLeft(List(List[Neuron]())) {
      case (z, (n, l)) => buildLayer("L" + l, n, z.head, rnd) :: z
    }.reverse.tail

  private def buildLayer(name: String, n: Int, lower: List[Neuron], rnd: Random) =
    (0 until n).map(i => new Neuron(name + "N" + i, lower, rnd)).toList
}

object XOR extends App {
  val net = new Net(List(2, 3, 2, 1), new Random)
  println(net)
  for (i <- 1 to 150) {
    net.train(List(1, 1), List(-1))
    net.train(List(-1, -1), List(-1))
    net.train(List(1, -1), List(1))
    net.train(List(-1, 1), List(1))
    if (i % 33 == 0) println(net)
  }
  println("Training done.")
  println("** Output for (1,1) " + net.output(List(1, 1)))
  println("** Output for (1,-1) " + net.output(List(1, -1)))
  println("** Output for (-1,1) " + net.output(List(-1, 1)))
  println("** Output for (-1,-1) " + net.output(List(-1, -1)))
}

C

/*
* See bottom for address of author.
*
* title: bpsim.c
* author: Josiah C. Hoskins
* date: June 1987
*
* purpose: backpropagation learning rule neural net simulator
* for the tabula rasa Little Red Riding Hood example
*
* description: Bpsim provides an implementation of a neural network
* containing a single hidden layer which uses the
* generalized backpropagation delta rule for learning.
* A simple user interface is supplied for experimenting
* with a neural network solution to the Little Red Riding
* Hood example described in the text.
*
* In addition, bpsim contains some useful building blocks
* for further experimentation with single layer neural
* networks. The data structure which describes the general
* processing unit allows one to easily investigate different
* activation (output) and/or error functions. The utility
* function create_link can be used to create links between
* any two units by supplying your own create_in_out_links
* function. The flexibility of creating units and links
* to your specifications allows one to modify the code
* to tune the network architecture to problems of interest.
*
* There are some parameters that perhaps need some
* explanation. You will notice that the target values are
* either 0.1 or 0.9 (corresponding to the binary values
* 0 or 1). With the sigmoidal function used in out_f the
* weights become very large if 0 and 1 are used as targets.
* The ON_TOLERANCE value is used as a criterion for an output
* value to be considered "on", i.e., close enough to the
* target of 0.9 to be considered 1. The learning_rate and
* momentum variables may be changed to vary the rate of
* learning, however, in general they each should be less
* than 1.0.
*
* Bpsim has been compiled using CI-C86 version 2.30 on an
* IBM-PC and the Sun C compiler on a Sun 3/160.
*
* Note to compile and link on U*IX machines use:
* cc -o bpsim bpsim.c -lm
*
* For other machines remember to link in the math library.
*
* status: This program may be freely used, modified, and distributed
* except for commercial purposes.
*
* Copyright (c) 1987 Josiah C. Hoskins
*/
/* Modified to function properly under Turbo C by replacing malloc(...)
with calloc(...,1). Thanks to Pavel Rozalski, who detected the error.
He suspected that Turbo C's "malloc" doesn't automatically set pointers
to NULL - and he was right!
Thomas Muhr, Berlin April, 1988
*/

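#include <math.h>
#include <stdio.h>
#include <stdlib.h> /* calloc, exit, atoi, rand */
#include <ctype.h>

/* POSIX <stdlib.h> already declares a random() returning long; rename the
   simulator's own random() so that the two do not clash */
#define random bpsim_random

/* BUFSIZ, the line buffer size used below, is provided by <stdio.h> */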

#define FALSE 0
#define TRUE (!FALSE)
#define NUM_IN 6 /* number of input units */
#define NUM_HID 3 /* number of hidden units */
#define NUM_OUT 7 /* number of output units */
#define TOTAL (NUM_IN + NUM_HID + NUM_OUT)
#define BIAS_UID (TOTAL) /* threshold unit */

/* macros to provide indexes for processing units */
#define IN_UID(X) (X)
#define HID_UID(X) (NUM_IN + (X))
#define OUT_UID(X) (NUM_IN + NUM_HID + (X))
#define TARGET_INDEX(X) ((X) - (NUM_IN + NUM_HID))

#define WOLF_PATTERN 0
#define GRANDMA_PATTERN 1
#define WOODCUT_PATTERN 2
#define PATTERNS 3 /* number of input patterns */
#define ERROR_TOLERANCE 0.01
#define ON_TOLERANCE 0.8 /* a unit's output is on if > ON_TOLERANCE */
#define NOTIFY 10 /* iterations per dot notification */
#define DEFAULT_ITER 250

struct unit { /* general processing unit */
    int uid;                  /* integer uniquely identifying each unit */
    char *label;
    double output;            /* activation level */
    double (*unit_out_f)();   /* note output fcn == activation fcn */
    double delta;             /* delta for unit */
    double (*unit_delta_f)(); /* ptr to function to calc delta */
    struct link *inlinks;     /* for propagation */
    struct link *outlinks;    /* for back propagation */
} *pu[TOTAL+1];               /* one extra for the bias unit */

struct link { /* link between two processing units */
    char *label;
    double weight;            /* connection or link weight */
    double data;              /* used to hold the change in weights */
    int from_unit;            /* uid of from unit */
    int to_unit;              /* uid of to unit */
    struct link *next_inlink;
    struct link *next_outlink;
};

int iterations = DEFAULT_ITER;
double learning_rate = 0.2;
double momentum = 0.9;
double pattern_err[PATTERNS];

/*
 * Input Patterns
 * {Big Ears, Big Eyes, Big Teeth, Kindly, Wrinkled, Handsome}
 *  unit 0    unit 1    unit 2     unit 3  unit 4    unit 5
 */
double input_pat[PATTERNS+1][NUM_IN] = {
    {1.0, 1.0, 1.0, 0.0, 0.0, 0.0}, /* Wolf */
    {0.0, 1.0, 0.0, 1.0, 1.0, 0.0}, /* Grandma */
    {1.0, 0.0, 0.0, 1.0, 0.0, 1.0}, /* Woodcutter */
    {0.0, 0.0, 0.0, 0.0, 0.0, 0.0}, /* Used for Recognize Mode */
};

/*
 * Target Patterns
 * {Scream, Run Away, Look for Woodcutter, Approach, Kiss on Cheek,
 *  Offer Food, Flirt with}
 */
double target_pat[PATTERNS][NUM_OUT] = {
    {0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1}, /* response to Wolf */
    {0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.1}, /* response to Grandma */
    {0.1, 0.1, 0.1, 0.9, 0.1, 0.9, 0.9}, /* response to Woodcutter */
};

/*
* function declarations
*/
void print_header();
char get_command();
double out_f(), delta_f_out(), delta_f_hid(), random(), pattern_error();

int
main()
{
    char ch;
    extern struct unit *pu[];

    print_header();
    create_processing_units(pu);
    create_in_out_links(pu);
    for (;;) {
        ch = get_command("\nEnter Command (Learn, Recognize, Quit) => ");
        switch (ch) {
        case 'l':
        case 'L':
            printf("\n\tLEARN MODE\n\n");
            learn(pu);
            break;
        case 'r':
        case 'R':
            printf("\n\tRECOGNIZE MODE\n\n");
            recognize(pu);
            break;
        case 'q':
        case 'Q':
            exit(0); /* quitting is a normal exit */
        default:
            fprintf(stderr, "Invalid Command\n");
            break;
        }
    }
}

void
print_header()
{
    printf("%s%s%s",
           "\n\tBPSIM -- Back Propagation Learning Rule Neural Net Simulator\n",
           "\t\t for the tabula rasa Little Red Riding Hood example.\n\n",
           "\t\t Written by Josiah C. Hoskins\n");
}

/*
 * create input, hidden, output units (and threshold or bias unit)
 */
create_processing_units(pu)
struct unit *pu[];
{
    int id; /* processing unit index */
    struct unit *create_unit();

    for (id = IN_UID(0); id < IN_UID(NUM_IN); id++)
        pu[id] = create_unit(id, "input", 0.0, NULL, 0.0, NULL);
    for (id = HID_UID(0); id < HID_UID(NUM_HID); id++)
        pu[id] = create_unit(id, "hidden", 0.0, out_f, 0.0, delta_f_hid);
    for (id = OUT_UID(0); id < OUT_UID(NUM_OUT); id++)
        pu[id] = create_unit(id, "output", 0.0, out_f, 0.0, delta_f_out);
    pu[BIAS_UID] = create_unit(BIAS_UID, "bias", 1.0, NULL, 0.0, NULL);
}

/*
 * create links - fully connected for each layer
 * note: the bias unit has one link to ea hid and out unit
 */
create_in_out_links(pu)
struct unit *pu[];
{
    int i, j; /* i == to and j == from unit id's */
    struct link *create_link();

    /* fully connected units */
    for (i = HID_UID(0); i < HID_UID(NUM_HID); i++) { /* links to hidden */
        pu[BIAS_UID]->outlinks =
            pu[i]->inlinks = create_link(pu[i]->inlinks, i,
                                         pu[BIAS_UID]->outlinks, BIAS_UID,
                                         (char *)NULL, random(), 0.0);
        for (j = IN_UID(0); j < IN_UID(NUM_IN); j++) /* from input units */
            pu[j]->outlinks =
                pu[i]->inlinks = create_link(pu[i]->inlinks, i, pu[j]->outlinks, j,
                                             (char *)NULL, random(), 0.0);
    }
    for (i = OUT_UID(0); i < OUT_UID(NUM_OUT); i++) { /* links to output */
        pu[BIAS_UID]->outlinks =
            pu[i]->inlinks = create_link(pu[i]->inlinks, i,
                                         pu[BIAS_UID]->outlinks, BIAS_UID,
                                         (char *)NULL, random(), 0.0);
        for (j = HID_UID(0); j < HID_UID(NUM_HID); j++) /* from hidden units */
            pu[j]->outlinks =
                pu[i]->inlinks = create_link(pu[i]->inlinks, i, pu[j]->outlinks, j,
                                             (char *)NULL, random(), 0.0);
    }
}

/*
 * return a random number between 0.0 and 1.0 (inclusive)
 */
double
random()
{
    return ((rand() % 32768) / 32767.0);
}

/*
 * the next two functions are general utility functions to create units
 * and create links
 */
struct unit *
create_unit(uid, label, output, out_f, delta, delta_f)
int uid;
char *label;
double output, delta;
double (*out_f)(), (*delta_f)();
{
    struct unit *unitptr;

    /* malloc() does not clear the memory it returns (Turbo C's included),
       so calloc() is used to make sure the link pointers start out NULL */
    if (!(unitptr = (struct unit *)calloc(1, sizeof(struct unit)))) {
        fprintf(stderr, "create_unit: not enough memory\n");
        exit(1);
    }
    /* initialize unit data */
    unitptr->uid = uid;
    unitptr->label = label;
    unitptr->output = output;
    unitptr->unit_out_f = out_f; /* ptr to output fcn */
    unitptr->delta = delta;
    unitptr->unit_delta_f = delta_f;
    return (unitptr);
}

struct link *
create_link(start_inlist, to_uid, start_outlist, from_uid, label, wt, data)
struct link *start_inlist, *start_outlist;
int to_uid, from_uid;
char *label;
double wt, data;
{
    struct link *linkptr;

    /* calloc() rather than malloc(), as in create_unit() */
    if (!(linkptr = (struct link *)calloc(1, sizeof(struct link)))) {
        fprintf(stderr, "create_link: not enough memory\n");
        exit(1);
    }
    /* initialize link data */
    linkptr->label = label;
    linkptr->from_unit = from_uid;
    linkptr->to_unit = to_uid;
    linkptr->weight = wt;
    linkptr->data = data;
    linkptr->next_inlink = start_inlist;
    linkptr->next_outlink = start_outlist;
    return (linkptr);
}

char
get_command(s)
char *s;
{
    char command[BUFSIZ];

    fputs(s, stdout);
    fflush(stdout); /* fflush() is only defined for output streams */
    if (fgets(command, BUFSIZ, stdin) == NULL)
        exit(0); /* end of input: quit */
    return ((command[0])); /* return 1st letter of command */
}

learn(pu)
struct unit *pu[];
{
    register int i, temp;
    char tempstr[BUFSIZ];
    extern int iterations;
    extern double learning_rate, momentum;
    static char prompt[] = "Enter # iterations (default is 250) => ";
    static char quote1[] = "Perhaps, Little Red Riding Hood ";
    static char quote2[] = "should do more learning.\n";

    fputs(prompt, stdout);
    fflush(stdout);
    /* fgets() replaces gets(), which cannot limit the input length */
    if (fgets(tempstr, BUFSIZ, stdin) != NULL && (temp = atoi(tempstr)) > 0)
        iterations = temp;

    printf("\nLearning ");
    for (i = 0; i < iterations; i++) {
        if ((i % NOTIFY) == 0) {
            printf(".");
            fflush(stdout);
        }
        /* save the error during the last PATTERNS iterations, so that the
           latest error of every pattern is recorded */
        bp_learn(pu, i >= iterations - PATTERNS);
    }
    printf(" Done\n\n");
    printf("Error for Wolf pattern = \t%f\n", pattern_err[0]);
    printf("Error for Grandma pattern = \t%f\n", pattern_err[1]);
    printf("Error for Woodcutter pattern = \t%f\n", pattern_err[2]);
    if (pattern_err[WOLF_PATTERN] > ERROR_TOLERANCE) {
        printf("\nI don't know the Wolf very well.\n%s%s", quote1, quote2);
    } else if (pattern_err[GRANDMA_PATTERN] > ERROR_TOLERANCE) {
        printf("\nI don't know Grandma very well.\n%s%s", quote1, quote2);
    } else if (pattern_err[WOODCUT_PATTERN] > ERROR_TOLERANCE) {
        printf("\nI don't know Mr. Woodcutter very well.\n%s%s", quote1, quote2);
    } else {
        printf("\nI feel pretty smart, now.\n");
    }
}

/*
 * back propagation learning
 */
bp_learn(pu, save_error)
struct unit *pu[];
int save_error;
{
    static int count = 0;
    static int pattern = 0;
    extern double pattern_err[PATTERNS];

    init_input_units(pu, pattern); /* initialize input pattern to learn */
    propagate(pu); /* calc outputs to check versus targets */
    if (save_error)
        pattern_err[pattern] = pattern_error(pattern, pu);
    bp_adjust_weights(pattern, pu);
    if (pattern < PATTERNS - 1)
        pattern++;
    else
        pattern = 0;
    count++;
}

/*
 * initialize the input units with a specific input pattern to learn
 */
init_input_units(pu, pattern)
struct unit *pu[];
int pattern;
{
    int id;

    for (id = IN_UID(0); id < IN_UID(NUM_IN); id++)
        pu[id]->output = input_pat[pattern][id];
}

/*
 * calculate the activation level of each unit
 */
propagate(pu)
struct unit *pu[];
{
    int id;

    for (id = HID_UID(0); id < HID_UID(NUM_HID); id++)
        (*(pu[id]->unit_out_f))(pu[id], pu);
    for (id = OUT_UID(0); id < OUT_UID(NUM_OUT); id++)
        (*(pu[id]->unit_out_f))(pu[id], pu);
}

/*
 * function to calculate the activation or output of units
 */
double
out_f(pu_ptr, pu)
struct unit *pu_ptr, *pu[];
{
    double sum = 0.0;
    struct link *tmp_ptr;

    tmp_ptr = pu_ptr->inlinks;
    while (tmp_ptr) {
        /* sum up (outputs from inlinks times weights on the inlinks) */
        sum += pu[tmp_ptr->from_unit]->output * tmp_ptr->weight;
        tmp_ptr = tmp_ptr->next_inlink;
    }
    pu_ptr->output = 1.0/(1.0 + exp(-sum)); /* logistic sigmoid */
    return (pu_ptr->output);
}

/*
 * half of the sum of the squares of the errors of the
 * output versus target values
 */
double
pattern_error(pat_num, pu)
int pat_num; /* pattern number */
struct unit *pu[];
{
    int i;
    double temp, sum = 0.0;

    for (i = OUT_UID(0); i < OUT_UID(NUM_OUT); i++) {
        temp = target_pat[pat_num][TARGET_INDEX(i)] - pu[i]->output;
        sum += temp * temp;
    }
    return (sum/2.0);
}

bp_adjust_weights(pat_num, pu)
int pat_num; /* pattern number */
struct unit *pu[];
{
    int i; /* processing units id */
    double temp1, temp2;
    struct link *inlink_ptr;

    /* calc deltas (all of them, before any weight changes) */
    for (i = OUT_UID(0); i < OUT_UID(NUM_OUT); i++) /* for each output unit */
        (*(pu[i]->unit_delta_f))(pu, i, pat_num); /* calc delta */
    for (i = HID_UID(0); i < HID_UID(NUM_HID); i++) /* for each hidden unit */
        (*(pu[i]->unit_delta_f))(pu, i); /* calc delta */
    /* calculate weights: change = learning_rate * delta * input
       + momentum * previous change */
    for (i = OUT_UID(0); i < OUT_UID(NUM_OUT); i++) { /* for output units */
        inlink_ptr = pu[i]->inlinks;
        while (inlink_ptr) { /* for each inlink to output unit */
            temp1 = learning_rate * pu[i]->delta *
                    pu[inlink_ptr->from_unit]->output;
            temp2 = momentum * inlink_ptr->data;
            inlink_ptr->data = temp1 + temp2; /* new delta weight */
            inlink_ptr->weight += inlink_ptr->data; /* new weight */
            inlink_ptr = inlink_ptr->next_inlink;
        }
    }
    for (i = HID_UID(0); i < HID_UID(NUM_HID); i++) { /* for ea hid unit */
        inlink_ptr = pu[i]->inlinks;
        while (inlink_ptr) { /* for each inlink to hidden unit */
            temp1 = learning_rate * pu[i]->delta *
                    pu[inlink_ptr->from_unit]->output;
            temp2 = momentum * inlink_ptr->data;
            inlink_ptr->data = temp1 + temp2; /* new delta weight */
            inlink_ptr->weight += inlink_ptr->data; /* new weight */
            inlink_ptr = inlink_ptr->next_inlink;
        }
    }
}

/*
 * calculate the delta for an output unit:
 * (target - output) * output * (1 - output)
 */
double
delta_f_out(pu, uid, pat_num)
struct unit *pu[];
int uid, pat_num;
{
    double temp1, temp2, delta;

    /* calc deltas */
    temp1 = (target_pat[pat_num][TARGET_INDEX(uid)] - pu[uid]->output);
    temp2 = (1.0 - pu[uid]->output);
    delta = temp1 * pu[uid]->output * temp2; /* calc delta */
    pu[uid]->delta = delta; /* store delta to pass on */
    return (delta);
}

/*
 * calculate the delta for a hidden unit:
 * output * (1 - output) * sum of (delta * weight) over its outlinks
 */
double
delta_f_hid(pu, uid)
struct unit *pu[];
int uid;
{
    double delta, error_sum;
    struct link *outlink_ptr;

    outlink_ptr = pu[uid]->outlinks;
    error_sum = 0.0;
    while (outlink_ptr) {
        error_sum += pu[outlink_ptr->to_unit]->delta * outlink_ptr->weight;
        outlink_ptr = outlink_ptr->next_outlink;
    }
    delta = pu[uid]->output * (1.0 - pu[uid]->output) * error_sum;
    pu[uid]->delta = delta;
    return (delta);
}

recognize(pu)
struct unit *pu[];
{
    int i;
    char tempstr[BUFSIZ];
    static char *p[] = {"Big Ears?", "Big Eyes?", "Big Teeth?",
                        "Kindly?\t", "Wrinkled?", "Handsome?"};

    for (i = 0; i < NUM_IN; i++) {
        printf("%s\t(y/n) ", p[i]);
        fflush(stdout);
        if (fgets(tempstr, BUFSIZ, stdin) == NULL)
            tempstr[0] = '\0'; /* treat end of input as "no" */
        if (tempstr[0] == 'Y' || tempstr[0] == 'y')
            input_pat[PATTERNS][i] = 1.0;
        else
            input_pat[PATTERNS][i] = 0.0;
    }
    init_input_units(pu, PATTERNS);
    propagate(pu);
    print_behaviour(pu);
}

print_behaviour(pu)
struct unit *pu[];
{
    int id, count = 0;
    static char *behaviour[] = {
        "Screams", "Runs Away", "Looks for Woodcutter", "Approaches",
        "Kisses on Cheek", "Offers Food", "Flirts with Woodcutter" };

    printf("\nLittle Red Riding Hood: \n");
    for (id = OUT_UID(0); id < OUT_UID(NUM_OUT); id++) { /* links to out units */
        if (pu[id]->output > ON_TOLERANCE)
            printf("\t%s\n", behaviour[count]);
        count++;
    }
    printf("\n");
}

/*
! Thomas Muhr  Knowledge-Based Systems Dept.  Technical University of Berlin !
! BITNET/EARN: [email protected] !
! UUCP: [email protected] (Please don't use from outside Germany) !
! BTX: 030874162  Tel.: (Germany 0049) (Berlin 030) 87 41 62 !
*/
