08 NN
Artificial Intelligence
Neural Networks
Prof. Mahmoud Khalil
Summer 2024
Artificial Neural Networks
Artificial Neuron (Node or Unit): Mathematical Abstraction
An artificial neuron, also called a node or unit, is modeled as processing unit i.
[Figure: common activation functions: step function, sign function, and sigmoid function]
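As a minimal sketch (not from the slides; the function names are my own), these three activation functions in Python:

```python
import math

def step(x, t=0.0):
    """Step function: 1 if the input reaches the threshold t, else 0."""
    return 1 if x >= t else 0

def sign(x):
    """Sign function: +1 for non-negative input, -1 otherwise."""
    return 1 if x >= 0 else -1

def sigmoid(x):
    """Sigmoid (logistic) function: a smooth squashing of the input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))
```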
Normalizing Unit Thresholds
If t is the threshold value of the output unit, then
$\mathrm{step}_t\!\left(\sum_{j=1}^{n} W_j I_j\right) = \mathrm{step}_0\!\left(\sum_{j=0}^{n} W_j I_j\right)$, where $W_0 = t$ and $I_0 = -1$.
- We can always assume that the unit's threshold is 0. This allows thresholds to be learned like any other weight.
- We can even allow output values in [0, 1] by replacing step_0 with the sigmoid function.
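A tiny Python illustration of this normalization (a sketch; function names are my own): folding the threshold t into an extra weight W0 = t on a fixed input I0 = -1 leaves the unit's output unchanged.

```python
def step(x, t=0.0):
    """Step function with threshold t."""
    return 1 if x >= t else 0

def unit_with_threshold(weights, inputs, t):
    """Compare the weighted sum against an explicit threshold t."""
    return step(sum(w * i for w, i in zip(weights, inputs)), t)

def unit_normalized(weights, inputs, t):
    """Fold the threshold into an extra weight W0 = t on a fixed input I0 = -1."""
    return step(sum(w * i for w, i in zip([t] + weights, [-1] + inputs)), 0.0)

# The two formulations agree, e.g. for weights (1, 1), inputs (1, 0), threshold 1.5:
print(unit_with_threshold([1, 1], [1, 0], 1.5), unit_normalized([1, 1], [1, 0], 1.5))
```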
Units as Logic Gates
A threshold unit is active (outputs 1) when:
$\sum_{j=1}^{n} W_{j,i}\, a_j \ge W_{0,i}$
AND
x1  x2  output
0   0   0
0   1   0
1   0   0
1   1   1
Perceptron weights: w1 = 1, w2 = 1, bias weight W0 = 1.5 (on a fixed input of -1).
The unit fires when x1 + x2 >= 1.5, which holds only for (1, 1).
OR
x1  x2  output
0   0   0
0   1   1
1   0   1
1   1   1
Perceptron weights: w1 = 1, w2 = 1, bias weight W0 = 0.5 (on a fixed input of -1).
The unit fires when x1 + x2 >= 0.5, which holds unless both inputs are 0.
NOT
x1  output
0   1
1   0
Perceptron weights: w1 = -1, bias weight W0 = -0.5 (on a fixed input of -1).
The unit fires when -x1 >= -0.5, i.e., only when x1 = 0.
So, units with a threshold activation function can act as logic gates given the appropriate input and bias weights.
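As a quick check, a hedged Python sketch of such a threshold unit, wired with the AND and OR weights quoted above; for NOT, the weights w1 = -1, W0 = -0.5 are the standard choice assumed here. The helper name `threshold_unit` is my own.

```python
def threshold_unit(inputs, weights, w0):
    """Output 1 when the weighted input sum reaches the bias weight W0, else 0."""
    total = sum(w * a for w, a in zip(weights, inputs))
    return 1 if total >= w0 else 0

def AND(x1, x2):
    return threshold_unit([x1, x2], [1, 1], w0=1.5)

def OR(x1, x2):
    return threshold_unit([x1, x2], [1, 1], w0=0.5)

def NOT(x1):
    return threshold_unit([x1], [-1], w0=-0.5)

# Reproduce the truth tables from the slides.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, AND(a, b), OR(a, b))
print(NOT(0), NOT(1))  # 1 0
```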
Network Structures
• Feed-forward networks
• Activation flows from the input layer to the output layer
• Single-layer perceptrons
• Multi-layer perceptrons
• Feed-forward networks implement functions and have no internal state (only weights).
• Recurrent networks
• Feed their outputs back into their own inputs
• The network is a dynamical system (stable states, oscillations, chaotic behavior)
• The response of the network depends on its initial state
• Can support short-term memory
• More difficult to understand
Feed-Forward Network
Two input units, two hidden units, one output unit.
Given an input vector x = (x1, x2), the activations of the input units are set to the values of the input vector, i.e., (a1, a2) = (x1, x2), and the network computes the output by feeding these activations forward: each hidden unit applies the activation function to its weighted sum of the input activations, and the output unit applies it to its weighted sum of the hidden activations.
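A minimal sketch of this forward pass, assuming a 2-2-1 network with sigmoid activations and no separate bias terms; the weight values below are placeholders, not from the slides.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def feed_forward(x, w_hidden, w_out):
    """Forward pass of a 2-2-1 network: input -> two hidden units -> one output."""
    a1, a2 = x                                    # input activations (a1, a2) = (x1, x2)
    hidden = [sigmoid(w1 * a1 + w2 * a2)          # each hidden unit: g(weighted sum of inputs)
              for (w1, w2) in w_hidden]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)))  # output unit: g(weighted sum)

# Placeholder weights, just to show the call.
print(feed_forward((1.0, 0.0), w_hidden=[(0.5, -0.3), (0.2, 0.8)], w_out=[1.0, -1.0]))
```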
Perceptron Learning Intuition
• Weight update
• Inputs Ij (j = 1, 2, …, n)
• Single output O; target output T.
• Consider some initial weights.
• Define the example error: Err = T − O
• Now just move the weights in the right direction!
• If the error is positive (Err > 0), we need to increase O.
• If the error is negative (Err < 0), we need to decrease O.
• Each input unit j contributes Wj·Ij to the total input:
• if Ij is positive, increasing Wj tends to increase O;
• if Ij is negative, decreasing Wj tends to increase O.
• So we use the update rule $W_j \leftarrow W_j + \alpha\, I_j\, Err$, where $\alpha$ is the learning rate.
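A hedged Python sketch of this rule (names are my own); `alpha` is the learning rate, and the output is assumed to be the 0/1 threshold function:

```python
def perceptron_output(weights, inputs):
    """0/1 threshold output: 1 if the weighted sum is positive, else 0."""
    s = sum(w * i for w, i in zip(weights, inputs))
    return 1 if s > 0 else 0

def update_weights(weights, inputs, target, alpha=0.1):
    """One application of W_j <- W_j + alpha * I_j * Err, with Err = T - O."""
    err = target - perceptron_output(weights, inputs)
    return [w + alpha * i * err for w, i in zip(weights, inputs)]
```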
Perceptron Learning: Example
• Let's consider an example.
• Framework and notation:
• 0/1 signals
• Input vector: $X = \langle x_0, x_1, x_2, \dots, x_n \rangle$
• Weight vector: $W = \langle w_0, w_1, w_2, \dots, w_n \rangle$
• The fixed input x0 = 1 and weight w0 simulate the threshold (the effective threshold is −w0).
Perceptron Learning: Example
• Consider learning the logical OR function.
• Our training samples are:
   x0  x1  x2  label
1  1   0   0   0
2  1   0   1   1
3  1   1   0   1
4  1   1   1   1
• Activation function: $S = \sum_{k=0}^{n} w_k x_k$; if $S > 0$ then $O = 1$, else $O = 0$.
Perceptron Learning: Example
• Error-correcting method:
• If the perceptron outputs 0 while it should output 1, add the input vector to the weight vector.
• If the perceptron outputs 1 while it should output 0, subtract the input vector from the weight vector.
• Otherwise do nothing.
Perceptron Learning: Example
• Example 1: I = <1, 0, 0>, label = 0, W = <0, 1, 1>
• Perceptron: S = 1·0 + 0·1 + 0·1 = 0, so the output is 0.
• It classifies the example as 0, which is correct, so do nothing.
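Continuing the example, a sketch of the error-correcting method applied to the OR samples above. The starting weights W = <0, 1, 1> follow the slide; everything else (loop bound, variable names) is illustrative.

```python
# Training samples for OR: (x0, x1, x2) with x0 = 1 simulating the threshold, plus the label.
samples = [
    ((1, 0, 0), 0),
    ((1, 0, 1), 1),
    ((1, 1, 0), 1),
    ((1, 1, 1), 1),
]

w = [0, 1, 1]  # initial weight vector from the slide

for epoch in range(10):               # a few passes are enough here
    changed = False
    for x, label in samples:
        s = sum(wk * xk for wk, xk in zip(w, x))
        o = 1 if s > 0 else 0
        if o == 0 and label == 1:      # output too low: add the input vector
            w = [wk + xk for wk, xk in zip(w, x)]
            changed = True
        elif o == 1 and label == 0:    # output too high: subtract the input vector
            w = [wk - xk for wk, xk in zip(w, x)]
            changed = True
    if not changed:                    # every sample classified correctly
        break

print(w)  # <0, 1, 1> already classifies all four OR samples correctly
```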
Expressiveness of Perceptron
Linear Separability
[Figure: labeled points in the (x1, x2) plane separated by a line]
What is the equation of the separating line?
$w_0 + w_1 x_1 + w_2 x_2 = 0$, i.e., $x_2 = -\frac{w_1}{w_2}\, x_1 - \frac{w_0}{w_2}$
The perceptron is used for classification.
Linear Separability
[Figures: OR, AND, and XOR plotted in the (x1, x2) plane]
Linear Separability
XOR is not linearly separable.
The perceptron outputs 1 when $w_1 x_1 + w_2 x_2 > T$.
• Our examples are:
   x1  x2  label
1  0   0   0
2  1   0   1
3  0   1   1
4  1   1   0
• Given these examples, we get the following inequalities for the perceptron:
• From (1): 0 + 0 ≤ T, so T ≥ 0
• From (2): w1 + 0 > T
• From (3): 0 + w2 > T
• From (4): w1 + w2 ≤ T (contradiction)
• Adding (2) and (3) gives w1 + w2 > 2T ≥ T (since T ≥ 0 from (1)), which contradicts (4), so no choice of weights and threshold works.
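To make the contradiction concrete, a small illustrative brute-force check (not from the slides): no single threshold unit over a grid of candidate weights and thresholds reproduces XOR.

```python
# XOR samples: ((x1, x2), label)
xor_samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def classify(w1, w2, t, x1, x2):
    """Single threshold unit: output 1 when w1*x1 + w2*x2 > t."""
    return 1 if w1 * x1 + w2 * x2 > t else 0

grid = [i / 2 for i in range(-8, 9)]  # candidate values -4.0, -3.5, ..., 4.0
found = any(
    all(classify(w1, w2, t, x1, x2) == label for (x1, x2), label in xor_samples)
    for w1 in grid for w2 in grid for t in grid
)
print(found)  # False: no single line separates the two XOR classes
```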
Non Linear Classifiers
• The XOR problem
x1 x2 XOR Class
0 0 0 B
0 1 1 A
1 0 1 A
1 1 0 B
Non Linear Classifiers
• There is no single line (hyperplane) that separates class A from class B. In contrast, the AND and OR operations are linearly separable problems.
The Two-Layer Perceptron
• For the XOR problem, draw two lines instead of one.
The Two-Layer Perceptron
• Then class B is located outside the shaded area and class A inside. This is a two-phase design.
• Phase 1: Draw two lines (hyperplanes), $g_1(x) = 0$ and $g_2(x) = 0$. Each of them is realized by a perceptron. The outputs of the perceptrons will be $y_i = f(g_i(x)) \in \{0, 1\}$, $i = 1, 2$, depending on the position of x.
• Phase 2: Find the position of x with respect to both lines, based on the values of y1 and y2.
The Two-Layer Perceptron
Each input x is mapped to $y = [y_1, y_2]^T$.
The Two-Layer Perceptron
The decision is now performed on the transformed data y, by a third perceptron realizing the line $g(y) = 0$.
• The architecture: [figure of the network with two hidden units and one output unit]
The Two-Layer Perceptron
• This is known as the two-layer perceptron, with one hidden and one output layer. The activation functions are step functions, $f(\cdot) \in \{0, 1\}$.
• The neurons (nodes) of the figure realize the following lines (hyperplanes):
$g_1(x) = x_1 + x_2 - \tfrac{1}{2} = 0$
$g_2(x) = x_1 + x_2 - \tfrac{3}{2} = 0$
$g(y) = y_1 - 2 y_2 - \tfrac{1}{2} = 0$
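A minimal Python sketch of this two-layer perceptron, using the lines g1, g2, g above with a step activation that outputs 1 for positive arguments, and printing the XOR truth table it realizes:

```python
def f(v):
    """Step activation: 1 if the argument is positive, else 0."""
    return 1 if v > 0 else 0

def two_layer_xor(x1, x2):
    # Hidden layer: the two perceptrons realizing g1 and g2.
    y1 = f(x1 + x2 - 0.5)   # g1(x) = x1 + x2 - 1/2
    y2 = f(x1 + x2 - 1.5)   # g2(x) = x1 + x2 - 3/2
    # Output layer: the perceptron realizing g(y).
    return f(y1 - 2 * y2 - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, two_layer_xor(a, b))  # prints the XOR truth table
```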