Artificial Neural Networks
Artificial Neural Network
Neural Networks (NN) are based on the structure of the biological neural system, which consists of several connected elements called neurons.
The nodes take input data and perform simple operations on it.
Each neuron first computes a linear association of its inputs and then applies an activation:
Z = Wᵀx + b
a = σ(Z)
where W is the weight vector, b is the bias, x is the input, and ᵀ denotes the vector transpose.
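As a concrete illustration, here is a minimal NumPy sketch of this single-neuron computation; the example values of x, W and b are made up for illustration.

import numpy as np

# Minimal sketch of the single-neuron computation: Z = W^T x + b, a = sigmoid(Z).
# The example values of x, W and b below are illustrative only.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])    # input vector
W = np.array([0.2, 0.4, -0.1])    # weights
b = 0.3                           # bias

Z = W.T @ x + b                   # association (linear) step
a = sigmoid(Z)                    # activation step
print(Z, a)                       # prints Z = -0.38 and a ≈ 0.406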
Artificial Neural Network (Contd…)
Hidden Layer
For node 1 of the hidden layer:
Z1[1] = W1[1]ᵀx(i) + b1[1],  a1[1] = σ(Z1[1])
The overall computation chain for inputs X1, X2, X3 is:
X1, X2, X3  ==>  z = Wᵀx + b  ==>  a = Sigmoid(z)  ==>  L(a, Y)
Neural Networks Overview
In a neural network with one hidden layer we will have:
X1, X2, X3  =>  z[1] = W[1]X + b[1]  =>  a[1] = Sigmoid(z[1])  =>  z[2] = W[2]a[1] + b[2]  =>  a[2] = Sigmoid(z[2])  =>  L(a[2], Y)
X is the input vector (X1, X2, X3), and Y is the output variable (1x1).
The superscript [1] refers to Layer 1 and [2] refers to Layer 2.
Activation Functions
Sigmoid: a = 1 / (1 + e^(-z))
The sigmoid output lies between 0 and 1.
ReLU: any negative input given to the ReLU function turns into zero immediately, so the negative values are not mapped in the resulting graph, which decreases the ability of the model to fit or train from the data properly.
Activation Functions (Contd…)
4) Leaky ReLU Activation Function
Instead of mapping negative inputs to exactly zero, Leaky ReLU gives them a small non-zero slope.
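The sketch below (assuming a leaky-ReLU slope of 0.01, a common default) contrasts the three activation functions mentioned above.

import numpy as np

def sigmoid(z):
    # Output lies between 0 and 1
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # Every negative input is mapped to zero
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    # Negative inputs keep a small slope (alpha) instead of becoming exactly zero
    return np.where(z > 0, z, alpha * z)

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(sigmoid(z))      # [0.119 0.378 0.5   0.818]
print(relu(z))         # [0.  0.  0.  1.5]
print(leaky_relu(z))   # [-0.02  -0.005  0.     1.5]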
Evolution of Neural Networks
Year | Neural Network | Designer | Description
1943 | McCulloch-Pitts Neuron | McCulloch and Pitts | Arrangement of neurons is a combination of logic gates; its unique feature is the threshold.
1949 | Hebb Network | Hebb | If two neurons are active, then their connection strength should be increased.
1958-1988 | Perceptron | Frank Rosenblatt, Block, Minsky and Papert | Weights of paths can be adjusted.
1960 | Adaline | Widrow and Hoff | The weights are adjusted to reduce the difference between the net input to the output unit and the desired output.
1972 | Kohonen self-organizing feature map | Kohonen | Inputs are clustered to obtain a fired output neuron.
Perceptron or Single-layer Perceptron
A Single Layer Perceptron has just two layers: input and output.
It has only a single layer of weights, hence the name single-layer perceptron.
Unlike the Multilayer Perceptron, it does not contain hidden layers.
Perceptrons
The perceptron is a type of ANN that can be seen as the simplest kind of feedforward neural network: a linear classifier.
It was introduced in the late 1950s.
Perceptron convergence theorem (Rosenblatt 1962):
◦ The perceptron will learn to classify any linearly separable set of inputs.
The perceptron is a network that is:
– single-layer
– feed-forward: data only travels in one direction
The XOR function, however, is not linearly separable, so a single perceptron cannot learn it.
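For illustration, the sketch below implements the perceptron decision rule with hand-picked (not learned) weights that realise the linearly separable AND function; no such choice of weights exists for XOR.

import numpy as np

# A perceptron is a linear classifier: output = 1 if w . x + b > 0, else 0.
# The weights below are hand-picked and realise AND, which is linearly
# separable; no single (w, b) can realise XOR.
def perceptron(x, w, b):
    return 1 if np.dot(w, x) + b > 0 else 0

w = np.array([1.0, 1.0])
b = -1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "->", perceptron(np.array(x, dtype=float), w, b))
# prints 0, 0, 0, 1 -- the truth table of AND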
Perceptron or Single-layer Perceptron
A perceptron is a single-layer neural network, whereas a multi-layer perceptron is called a Neural Network.
An MLP has input and output layers, and one or more hidden layers with many neurons stacked together.
Multi Layer Perceptron (MLP)
An MLP is a type of feed-forward artificial neural network that generates a set of outputs from a set of inputs.
It connects multiple layers in a directed graph, which means that the signal path through the nodes goes only one way.
A multilayer ANN has no guarantee of convergence, but it can learn functions that are not linearly separable.
Multi Layer Perceptron (MLP)
Each layer feeds the next one with the result of its computation, its internal representation of the data.
This goes all the way through the hidden layers to the output layer.
If the algorithm only computed the weighted sums in each neuron, propagated the results to the output layer, and stopped there, it would not be able to learn the weights that minimize the cost function. If the algorithm only computed one iteration, there would be no actual learning.
This is where Backpropagation comes into play.
Perceptron Training
Assume supervised training examples give the desired output for a unit, given a set of known input activations.
Goal: learn the weight vector (synaptic weights) that causes the perceptron to produce the correct +/-1 values.
The perceptron uses an iterative update algorithm to learn a correct set of weights:
◦ Perceptron training rule
◦ Delta rule (Not in Syllabus)
Both algorithms are guaranteed to converge under somewhat different conditions.
Perceptron Training Rule
Update weights by:
w_i ← w_i + Δw_i
Δw_i = η (t − o) x_i
where t is the target output, o is the perceptron output, x_i is the i-th input, and
η is the learning rate:
◦ a small value (e.g., 0.1)
◦ sometimes decays as the number of weight-tuning operations increases
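A minimal sketch of this training rule in Python, trained on the (linearly separable) AND data; the 0/1 targets, η = 0.1 and 20 passes are assumptions for illustration.

import numpy as np

# Perceptron training rule: w_i <- w_i + eta * (t - o) * x_i
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0, 0, 0, 1], dtype=float)           # AND targets

w = np.zeros(2)                                   # weights
b = 0.0                                           # bias (acts like w_0)
eta = 0.1                                         # learning rate

for _ in range(20):                               # weight-tuning iterations
    for x_i, t_i in zip(X, t):
        o = 1.0 if np.dot(w, x_i) + b > 0 else 0.0   # current perceptron output
        w += eta * (t_i - o) * x_i                # update each weight
        b += eta * (t_i - o)                      # update the bias

print(w, b)                                       # a separating line for AND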
Perceptron Learning
Example weighted sum: 1.1 × 0.3 + 2.6 × 1.0 = 2.93
Neural Network Representation
For a given input x, the outputs of the two layers will be:
z[1] = W[1]x + b[1]
a[1] = σ(z[1])
z[2] = W[2]a[1] + b[2]
a[2] = σ(z[2])
Computing a Neural Network's Output
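A NumPy sketch of the output computation for one example, using the equations above; the layer sizes (3-4-1), the small random parameters and the random input are all assumed for illustration.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_x, n_h, n_y = 3, 4, 1                  # input, hidden and output sizes
rng = np.random.default_rng(0)

W1 = rng.standard_normal((n_h, n_x)) * 0.01
b1 = np.zeros((n_h, 1))
W2 = rng.standard_normal((n_y, n_h)) * 0.01
b2 = np.zeros((n_y, 1))

x = rng.standard_normal((n_x, 1))        # a single input example

z1 = W1 @ x + b1                         # z[1] = W[1] x + b[1]
a1 = sigmoid(z1)                         # a[1] = sigma(z[1])
z2 = W2 @ a1 + b2                        # z[2] = W[2] a[1] + b[2]
a2 = sigmoid(z2)                         # a[2] = sigma(z[2]) = predicted y
print(a2)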
Backpropagation
Repeat:
  Compute predictions (y'[i], i = 0, ..., m)
  Get derivatives: dW[1], db[1], dW[2], db[2]
  Update:
    W[1] = W[1] - α · dW[1]
    b[1] = b[1] - α · db[1]
    W[2] = W[2] - α · dW[2]
    b[2] = b[2] - α · db[2]
where dW[1] = ∂J/∂W[1], db[1] = ∂J/∂b[1], and similarly for layer 2.
Gradient Descent
Backpropagation (computing the derivatives):
dZ[2] = A[2] - Y                        # derivative of the cost function we used × derivative of the sigmoid function
dW[2] = (dZ[2] * A[1].T) / m
db[2] = Sum(dZ[2]) / m
dZ[1] = (W[2].T * dZ[2]) * g'[1](Z[1])  # element-wise product (*)
dW[1] = (dZ[1] * A[0].T) / m            # A[0] = X
db[1] = Sum(dZ[1]) / m
# Hint: the transposes in the matrix multiplications keep the dimensions consistent
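The same gradient formulas written as a NumPy function for a batch of m examples stored as columns; the sigmoid hidden activation (so g'[1](Z[1]) = A[1](1 − A[1])) and the argument layout are assumptions for illustration.

import numpy as np

# Backward pass for a one-hidden-layer network with sigmoid activations,
# following the formulas above. X is (n_x, m), Y is (1, m); A1 and A2 are
# the cached forward-pass activations.
def backward(X, Y, W2, A1, A2):
    m = X.shape[1]                                  # number of examples
    dZ2 = A2 - Y                                    # dZ[2] = A[2] - Y
    dW2 = (dZ2 @ A1.T) / m                          # dW[2] = dZ[2] A[1]^T / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m    # db[2] = Sum(dZ[2]) / m
    dZ1 = (W2.T @ dZ2) * (A1 * (1 - A1))            # element-wise product with g'[1](Z[1])
    dW1 = (dZ1 @ X.T) / m                           # A[0] = X
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m    # db[1] = Sum(dZ[1]) / m
    return dW1, db1, dW2, db2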
Random Initialization
We have previously seen that the weights are initialized to 0 in the case of a logistic regression algorithm.
For logistic regression, it was okay to initialize the weights to 0 because it does not have any hidden layer.
But should we initialize the weights of a neural network to 0?
Random Initialization
If the weights are initialized to 0, the rows of the W matrix will be identical, and the hidden units will be symmetric because both of these hidden units compute exactly the same function.
Random Initialization
When we compute backpropagation:
dZ1[1] = dZ2[1]
No matter how many hidden units we use in a layer, we always get the same output, just as if we were using a single unit.
So, instead of initializing the weights to 0, we initialize them randomly.
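A common way to do this (a sketch, with assumed layer sizes) is to draw small random values for the weights and keep the biases at zero:

import numpy as np

n_x, n_h, n_y = 3, 4, 1                        # assumed layer sizes
rng = np.random.default_rng(1)

W1 = rng.standard_normal((n_h, n_x)) * 0.01    # small random values break the symmetry
b1 = np.zeros((n_h, 1))                        # biases can safely start at zero
W2 = rng.standard_normal((n_y, n_h)) * 0.01
b2 = np.zeros((n_y, 1))
# The factor 0.01 keeps z small so the sigmoid does not start out saturated.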
Back-propagation Network Training
Example 1 (figure): back-propagation with given initial weights.
Next-Generation Firewall
Steps for Implementation
1. Data Preprocessing
2. Feature Extraction
3. Feature Selection
4. Implement Machine Learning Model
5. Training and Testing (if supervised model)
6. Calculate Parameters
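One possible end-to-end sketch of these steps with scikit-learn; the dataset, scaler, feature selector and MLP classifier are all assumed choices, not the ones used in the lecture.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler                # 1. data preprocessing
from sklearn.feature_selection import SelectKBest, f_classif    # 3. feature selection
from sklearn.neural_network import MLPClassifier                # 4. ML model
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score, f1_score            # 6. calculate parameters

X, y = load_breast_cancer(return_X_y=True)                      # 2. features already extracted here
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=10)),
    ("mlp", MLPClassifier(max_iter=500, random_state=0)),
])
model.fit(X_train, y_train)                                      # 5. training
y_pred = model.predict(X_test)                                   #    testing
print(accuracy_score(y_test, y_pred), f1_score(y_test, y_pred))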
Evaluation – Model Training
While the parameters of each model may differ, there are several methods to train a model.
◦ We want to avoid overfitting a model and maximize its predictive power.
Many software packages (e.g., WEKA, RapidMiner) will apply these methods automatically for you.
Evaluation
There are several questions we should ask after model training:
◦ How predictive is the model we learned?
◦ How reliable and accurate are the predicted results?
◦ Which model performs better?
We want our model to perform well on our training set but also have strong predictive power.
Fortunately, various metrics applied on the testing set can help us choose the “best” model for
our application.
Metrics for Performance Evaluation
A Confusion Matrix provides measures to compute a model's accuracy:
◦ True Positives (TP) – # of positive examples correctly predicted by the model
Accuracy = (TP + TN) / (TP + TN + FP + FN)
(3) Accuracy -> percentage of correct predictions made by the ML model
F-Score = 2 · (Precision × Recall) / (Precision + Recall)
(4) F-Score is the harmonic mean of precision and recall
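A small sketch computing these metrics directly from confusion-matrix counts; the counts themselves are made-up example numbers.

# Confusion-matrix counts (made-up example numbers)
TP, TN, FP, FN = 40, 45, 5, 10

accuracy  = (TP + TN) / (TP + TN + FP + FN)                  # fraction of correct predictions
precision = TP / (TP + FP)
recall    = TP / (TP + FN)
f_score   = 2 * (precision * recall) / (precision + recall)  # harmonic mean of precision and recall

print(accuracy, precision, recall, f_score)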
Metrics for Performance Evaluation
However, accuracy can be skewed due to a class imbalance.