Business Intelligence & Data Mining-10
Artificial Neural Networks (ANNs)
A bit of biology . . .
The most important functional unit in the human brain is a class of cells called NEURONS.
[Figure: an actual neuron and its schematic, showing the Dendrites, Cell Body, Axon, and Synapse.]
Artificial Neurons
Simulated in hardware or by software.
Input connections act as the receivers.
The node (also called unit, Processing Unit or Processing Element) simulates the neuron body.
The output connection and transfer function act as the transmitter (the axon).
The activation function employs a threshold or bias.
The connection weights act as synaptic junctions.
Learning (for a particular model) occurs via changes in the values of the various weights and the choice of functions.
[Figure: Activity in an ANN node. Inputs X1, X2, ..., Xp arrive over connections with weights w1, w2, ..., wp; the cell body computes I = w1X1 + w2X2 + ... + wpXp, and the axon transmits V = f(I).]
Mathematical Model of a Node
[Figure: incoming activations a0, a1, ..., an arrive over connections with weights w0, w1, ..., wn; an Adder Function combines them and an Activation Function produces the outgoing activation.]
Adder Function: I = Σ (from i = 0 to n) wi * ai
Activation Function (threshold form):
f(I) = 1 if I > t,
f(I) = 0 if I <= t
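A minimal sketch of this node computation (assuming Python; the function name node_output and the sample activations, weights, and threshold t = 0.5 are illustrative, not taken from the slides):

def node_output(activations, weights, t=0.0):
    # Adder function: I = sum over i of (wi * ai)
    I = sum(w * a for w, a in zip(weights, activations))
    # Threshold activation function: output 1 only if I exceeds the threshold t
    return 1 if I > t else 0

# Illustrative values: I = 0.6*1.0 + 0.4*0.5 + 0.3*(-1.0) = 0.5, which does not exceed t = 0.5
print(node_output([1.0, 0.5, -1.0], [0.6, 0.4, 0.3], t=0.5))  # prints 0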
Transfer Functions
There are various choices for the Transfer / Activation function:
Tanh: f(x) = (e^x - e^-x) / (e^x + e^-x)
Logistic: f(x) = e^x / (1 + e^x)
Threshold: f(x) = 0 if x < 0, 1 if x >= 0
[Figure: plots of the three functions.]
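A quick numerical sketch of these three functions (assuming Python with NumPy; the function names tanh_f, logistic_f, threshold_f and the sample grid of x values are illustrative):

import numpy as np

def tanh_f(x):
    # f(x) = (e^x - e^-x) / (e^x + e^-x)
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def logistic_f(x):
    # f(x) = e^x / (1 + e^x)
    return np.exp(x) / (1.0 + np.exp(x))

def threshold_f(x):
    # f(x) = 0 if x < 0, 1 otherwise
    return np.where(x < 0, 0.0, 1.0)

x = np.linspace(-3, 3, 7)
print(tanh_f(x))
print(logistic_f(x))
print(threshold_f(x))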
An ANN model
[Figure: a feed-forward network with input neurons (X2, X3, X4, ...), one hidden layer, and output neurons y1 and y2.]
Input Layer - receives the inputs
Hidden Layer - connects the Input and Output layers
Output Layer - the output of each neuron goes directly to the outside
What do we mean by a particular Model?
For an ANN:
Input: X1, X2, X3
Output: Y
It is characterized by:
# Input Neurons
# Hidden Layers
# Neurons in each Hidden Layer
# Output Neurons
The adder, activation and transfer functions
WEIGHTS for all the connections
Fitting an ANN model = specifying values for all those parameters
Example
[Figure: a network with inputs X1, X2, X3, one hidden layer, and output Y; the connections carry weights such as 0.6, -0.1, -0.2, 0.1, 0.7, 0.5, 0.1 and -0.2.]
# Input Neurons = # of Xs
# Output Neurons = # of Ys
# Hidden Layers and # Neurons in each Hidden Layer: free parameters
Weights: learnt from the data
[Figure: one observation fed forward through the network; inputs include X2 = -1 and X3 = 2.]
Activation function: logistic, f(x) = e^x / (1 + e^x)
Hidden neuron 1: weighted input I = 0.2, output f(0.2) = e^0.2 / (1 + e^0.2) = 0.55
Hidden neuron 2: weighted input I = 0.9, output f(0.9) = 0.71
Output neuron: weighted input I = 0.1*0.55 + (-0.2)*0.71 = -0.087, output f(-0.087) = 0.478
Predicted Y = 0.478
Suppose Actual Y = 2
Then Prediction Error = (2 - 0.478) = 1.522
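A minimal sketch reproducing this feed-forward calculation (assuming Python with NumPy; X1 = 1 and the input-to-hidden weight matrix below are hypothetical placeholders chosen only so that the hidden pre-activations come out to 0.2 and 0.9 as on the slide, while the output-layer weights 0.1 and -0.2 reproduce the -0.087 shown there):

import numpy as np

def logistic(x):
    # Logistic activation: f(x) = e^x / (1 + e^x)
    return np.exp(x) / (1.0 + np.exp(x))

# One observation; X2 = -1 and X3 = 2 are from the slide, X1 = 1 is assumed.
x = np.array([1.0, -1.0, 2.0])

# Hypothetical input-to-hidden weights, chosen only to reproduce
# the hidden pre-activations 0.2 and 0.9 from the worked example.
W_hidden = np.array([[-0.2, 0.6, 0.5],    # hidden neuron 1
                     [ 0.6, -0.1, 0.1]])  # hidden neuron 2

# Output-layer weights 0.1 and -0.2 reproduce the slide's -0.087.
w_out = np.array([0.1, -0.2])

h_in = W_hidden @ x            # [0.2, 0.9]
h_out = logistic(h_in)         # [0.55, 0.71]
y_in = w_out @ h_out           # -0.087
y_hat = logistic(y_in)         # 0.478

actual_y = 2.0
print("Predicted Y:", round(y_hat, 3))                   # 0.478
print("Prediction error:", round(actual_y - y_hat, 3))   # 1.522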
How to get the weights?
There is no fixed closed-form strategy; the weights are found by iterative trial-and-error adjustment.
Given a training data set, the network produces predicted outputs (V1, V2, ..., Vn), which are functions of the weights W.
Choose W such that the overall prediction error E is minimized:
E = Σ (Yi - Vi)^2
Feed Forward and Back Propagation
[Figure: each observation is fed forward through the net to produce predictions; the resulting error E = Σ (Yi - Vi)^2 is then propagated backwards through the net.]
Each weight shares the blame for the prediction error and adjusts together with the other weights.
The Back Propagation algorithm decides how to distribute the blame among all the weights and adjusts the weights accordingly:
A small portion of the blame leads to a small adjustment.
A large portion of the blame leads to a large adjustment.
E = Σ (Yi - Vi)^2, i.e. E(W) = Σ [Yi - Vi(W)]^2
Gradient Descent Method:
For every individual weight Wi, the update formula looks like
Wi(new) = Wi(old) - η * ∂E/∂Wi, where η is the learning parameter.
With two weights, E(w1, w2) = Σ [Yi - Vi(w1, w2)]^2, and a pair (w1, w2) is a point on the 2-D plane.
Move to a better point (w1, w2) where the height of the error surface is lower.
Keep moving till you reach (w*1, w*2), where the error is minimum.
Error Surface
[Figure: the error E plotted as a surface over the weight space (W1, W2), with weights ranging roughly from -3 to 6 and error values from 0 to 12. The surface has local minima and a global minimum; starting from an initial point w0, gradient descent moves downhill towards a minimum w*.]
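A minimal gradient-descent sketch on a toy two-weight error surface (assuming Python with NumPy; the function name gradient_descent, the learning rate eta = 0.1, the starting point and the quadratic error surface are all illustrative assumptions, not the surface shown on the slide):

import numpy as np

def gradient_descent(grad, w_init, eta=0.1, n_steps=100):
    # Generic update: w_new = w_old - eta * dE/dw, applied repeatedly.
    w = np.array(w_init, dtype=float)
    for _ in range(n_steps):
        w -= eta * grad(w)
    return w

# Toy error surface over (w1, w2): E(w) = (w1 - 2)^2 + (w2 + 1)^2,
# whose global minimum is at (2, -1). The starting point w0 is arbitrary.
grad_E = lambda w: 2.0 * (w - np.array([2.0, -1.0]))
w_star = gradient_descent(grad_E, w_init=[5.0, 4.0])
print(w_star)   # close to [2, -1]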
Training Algorithm
Decide the Network architecture
(# Hidden Layers, # Neurons in each Hidden Layer)
Decide the Learning parameter and Momentum
Initialize the Network with random weights
Do until the Convergence criterion is met
  For i = 1 to # Training Data points
    Feed forward the i-th observation through the Net
    Compute the prediction error on the i-th observation: E = (Yi - Vi)^2
    Back propagate the error and adjust the weights
  Next i
  Check for Convergence
End Do
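A compact sketch of this training loop (assuming Python with NumPy; the function name train, the learning rate eta, the tolerance tol, and the omission of bias terms and Momentum are simplifying assumptions; the back-propagation updates follow the standard squared-error / logistic-activation derivation rather than a formula given on the slides):

import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def train(X, Y, n_hidden=2, eta=0.1, n_cycles=500, tol=1e-6, seed=0):
    # Online (per-observation) back-propagation for one hidden layer,
    # squared-error loss E = (y - v)^2, logistic activations throughout.
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    # Initialize the network with small random weights.
    W1 = rng.normal(scale=0.1, size=(n_hidden, n_in))   # input -> hidden
    W2 = rng.normal(scale=0.1, size=n_hidden)            # hidden -> output
    prev_total = np.inf
    for cycle in range(n_cycles):
        total_error = 0.0
        for x, y in zip(X, Y):
            # Feed forward the observation through the net.
            h = logistic(W1 @ x)        # hidden activations
            v = logistic(W2 @ h)        # predicted output
            err = y - v
            total_error += err ** 2
            # Back propagate: distribute the "blame" and adjust the weights.
            delta_out = err * v * (1 - v)                 # output-node blame
            delta_hid = delta_out * W2 * h * (1 - h)      # hidden-node blame
            W2 += eta * delta_out * h
            W1 += eta * np.outer(delta_hid, x)
        # Convergence check: stop if the decrease in total error is small.
        if abs(prev_total - total_error) < tol:
            break
        prev_total = total_error
    return W1, W2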
Convergence Criterion
When to stop training the Network?
Ideally, when we reach the global minimum of the error surface.
How do we know we have reached it? We don't.
Suggestions:
1. Stop if the decrease in total prediction error (since the last cycle) is small.
2. Stop if the overall changes in the weights (since the last cycle) are small.
Drawback:
The error keeps on decreasing, so we get a very good fit to the training data.
BUT the network thus obtained has poor generalizing power on unseen data.
This phenomenon is also known as Over-fitting of the Training data.
The network is said to Memorize the training data:
when an X from the training set is given, the network faithfully produces the corresponding Y,
but for Xs which the network didn't see before, it predicts poorly.
Convergence Criterion
Modified Suggestion:
Partition the training data into a Training set and a Validation set.
Use the Training set to adjust the weights and the Validation set to monitor the prediction error; stop training when the validation error starts to rise.
[Figure: Error vs. training Cycle. The Training error keeps decreasing, while the Validation error first decreases and then increases.]
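A small sketch of this modified criterion (assuming Python with NumPy; the helper names split_train_validation and should_stop, the validation fraction, and the patience rule are illustrative assumptions rather than prescriptions from the slides):

import numpy as np

def split_train_validation(X, Y, val_fraction=0.25, seed=0):
    # Hold out a validation set purely to decide when to stop training.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = int(len(X) * val_fraction)
    val, tr = idx[:n_val], idx[n_val:]
    return X[tr], Y[tr], X[val], Y[val]

def should_stop(val_errors, patience=3):
    # Stop once the validation error has risen for `patience` consecutive cycles.
    if len(val_errors) <= patience:
        return False
    recent = val_errors[-(patience + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))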
Summary
Given a training data set, the weights are found by the Feed-Forward / Back-Propagation algorithm, which is a form of the Gradient Descent Method, a popular technique for function minimization.
Self-Organizing Maps
Overview
A Self-Organizing Map (SOM) is a way to represent higher-dimensional data in a lower-dimensional (usually 2-D or 3-D) manner, such that similar data are grouped together.
It runs unsupervised and performs the grouping on
its own.
Once the SOM converges, it can classify new data.
SOMs run in two phases:
Training phase: the map is built; the network organizes itself through a competitive process and is trained using a large number of inputs.
Mapping phase: new vectors are quickly given a location on the converged map, easily classifying or categorizing the new data.
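A minimal sketch of both phases (assuming Python with NumPy; the function names train_som and map_vector, the grid size, the learning-rate and neighbourhood-radius schedules, and the Gaussian neighbourhood are illustrative choices, since the slides do not specify them):

import numpy as np

def train_som(data, grid_h=10, grid_w=10, n_iters=2000, lr0=0.5, sigma0=3.0, seed=0):
    # Training phase: the map organizes itself through a competitive process.
    rng = np.random.default_rng(seed)
    dim = data.shape[1]
    # Each grid node holds a weight vector of the same dimension as the data.
    weights = rng.random((grid_h, grid_w, dim))
    # Grid coordinates, used to measure neighbourhood distances on the map.
    gy, gx = np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij")
    for t in range(n_iters):
        # Decay the learning rate and neighbourhood radius over time.
        frac = t / n_iters
        lr = lr0 * (1.0 - frac)
        sigma = sigma0 * (1.0 - frac) + 1e-3
        x = data[rng.integers(len(data))]
        # Competitive step: find the Best Matching Unit (closest node).
        dists = np.linalg.norm(weights - x, axis=2)
        by, bx = np.unravel_index(np.argmin(dists), dists.shape)
        # Cooperative step: nodes near the BMU on the grid move towards x.
        grid_dist2 = (gy - by) ** 2 + (gx - bx) ** 2
        influence = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
        weights += lr * influence[..., None] * (x - weights)
    return weights

def map_vector(weights, x):
    # Mapping phase: place a new vector at its Best Matching Unit on the converged map.
    dists = np.linalg.norm(weights - x, axis=2)
    return np.unravel_index(np.argmin(dists), dists.shape)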