08 NN


CSE472, CSE386

Artificial Intelligence

Neural Networks
Prof. Mahmoud Khalil
Summer 2024
1
Artificial Neural Networks

Artificial neural networks are inspired by brains and neurons


A neural network is a graph with nodes, or units, connected by links
Each link has an associated weight, a real number
Typically, each node i outputs a real number, which is fed as input to the nodes connected to i
The output of a node is a function of the weighted sum of the node’s inputs
2
Basic Concepts
• A Neural Network maps a set of inputs to a set of outputs.
• The number of inputs/outputs is variable.
• The network itself is composed of an arbitrary number of nodes, or units, connected by links, with an arbitrary topology.
  [Figure: inputs (Input 0 … Input n) feed the Neural Network, which produces outputs (Output 0 … Output m)]
• A link from unit j to unit i serves to propagate the activation aj from j to i, and it has a weight Wj,i.
• What can a neural network do?
  • Compute a known function / approximate an unknown function
  • Pattern recognition / signal processing
  • Learn to do any of the above
3
Fully Connected NN
Different types of nodes

4
Artificial Neuron, Node, or Unit: a Mathematical Abstraction

[Figure: processing unit i with input edges (each with a weight), an input function in_i, an activation function g, and output edges (each with a weight); weights can be positive or negative and can change over time (learning)]

Input function: the weighted sum of the unit's inputs,

    in_i = Σ_{j=0}^{n} W_{j,i} a_j

Activation function g (typically non-linear), applied to the input function:

    a_i = g(in_i) = g( Σ_{j=0}^{n} W_{j,i} a_j )

⇒ a processing element producing an output based on a function of its inputs

5
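
To make the abstraction concrete, here is a minimal sketch of such a unit in Python; the particular weights, inputs, and step activation below are made up for illustration and are not from the slides.

```python
# Minimal sketch of a single artificial unit (hypothetical weights/activation).
# in_i = sum_j W[j] * a[j]   (weighted-sum input function)
# a_i  = g(in_i)             (activation function applied to the input function)

def unit_output(weights, inputs, g):
    """Compute the output of one unit: g(weighted sum of its inputs)."""
    in_i = sum(w * a for w, a in zip(weights, inputs))
    return g(in_i)

# Example with a step activation (threshold at 0) and made-up weights:
step = lambda x: 1 if x >= 0 else 0
print(unit_output([-0.5, 1.0, 1.0], [-1, 0, 1], step))  # bias input -1, then a1 = 0, a2 = 1
```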
Activation Functions

• Step function
• Sign function
• Sigmoid function

6
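
As a quick reference, here is a minimal sketch of these three activation functions in Python (standard definitions; the exact threshold convention for the step function is an assumption):

```python
import math

def step(x, t=0.0):        # step function: 1 if x >= threshold t, else 0
    return 1 if x >= t else 0

def sign(x):               # sign function: +1 if x >= 0, else -1
    return 1 if x >= 0 else -1

def sigmoid(x):            # sigmoid (logistic) function: smooth, output in (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

for x in (-2.0, 0.0, 2.0):
    print(x, step(x), sign(x), round(sigmoid(x), 3))
```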
Normalizing Unit Thresholds
If t is the threshold value of the output unit, then

    step_t( Σ_{j=1}^{n} Wj Ij ) = step_0( Σ_{j=0}^{n} Wj Ij ),  where W0 = t and I0 = −1

- So we can always assume that the unit's threshold is 0. This allows thresholds to be learned like any other weight.

- We can even allow output values in [0, 1] by replacing step_0 with the sigmoid function.

7
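
A quick numerical check of this normalization trick; the weights, threshold, and inputs below are hypothetical, chosen only to illustrate that thresholding at t is equivalent to adding a weight W0 = t on a fixed input I0 = −1 and thresholding at 0.

```python
def step(x, t=0.0):
    return 1 if x >= t else 0

# Hypothetical weights, threshold, and inputs
W, t = [0.7, -0.3], 0.4
I = [1, 1]

lhs = step(sum(w * i for w, i in zip(W, I)), t)            # step_t(sum_{j>=1} Wj*Ij)
rhs = step(sum(w * i for w, i in zip([t] + W, [-1] + I)))  # step_0 with W0 = t, I0 = -1
assert lhs == rhs
print(lhs, rhs)  # identical outputs
```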
Units as Logic Gates

Units with a threshold activation function can act as logic gates;
we can use these units to compute Boolean functions of their inputs.

Activation of threshold units when:

    Σ_{j=1}^{n} W_{j,i} a_j ≥ W_{0,i}

8
AND

x1  x2  output
0   0   0
0   1   0
1   0   0
1   1   1

[Figure: threshold unit with bias input −1 weighted W0 = 1.5, and inputs x1, x2 with weights w1 = 1, w2 = 1]

Activation of threshold units when:

    Σ_{j=1}^{n} W_{j,i} a_j ≥ W_{0,i}

9
OR
x1  x2  output
0   0   0
0   1   1
1   0   1
1   1   1

[Figure: threshold unit with bias input −1 weighted w0 = 0.5, and inputs x1, x2 with weights w1 = 1, w2 = 1]

Activation of threshold units when:

    Σ_{j=1}^{n} W_{j,i} a_j ≥ W_{0,i}

10
NOT

x1  output
0   1
1   0

[Figure: threshold unit with bias input −1 weighted w0 = −0.5, and input x1 with weight w1 = −1]

Activation of threshold units when:

    Σ_{j=1}^{n} W_{j,i} a_j ≥ W_{0,i}

So, units with a threshold activation function can act as logic gates, given the appropriate input and bias weights.

11
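
A short sketch verifying the three gates with the weights shown on these slides (the NOT weights are taken as w0 = −0.5, w1 = −1, one choice consistent with its truth table); a unit fires when the weighted sum of its inputs reaches the bias weight W0.

```python
def threshold_unit(w0, weights, inputs):
    """Output 1 if sum_j w_j * x_j >= w0 (bias input -1 with weight w0), else 0."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= w0 else 0

AND = lambda x1, x2: threshold_unit(1.5, [1, 1], [x1, x2])
OR  = lambda x1, x2: threshold_unit(0.5, [1, 1], [x1, x2])
NOT = lambda x1:     threshold_unit(-0.5, [-1], [x1])      # assumed weights

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, AND(x1, x2), OR(x1, x2))   # AND/OR truth tables
    print(x1, NOT(x1))                            # NOT truth table
```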
Network Structures
• Feed-forward networks
• Activation flows from input layer to output layer
• single-layer perceptrons
• multi-layer perceptrons
• Feed-forward networks implement functions,
have no internal state (only weights).

• Recurrent networks
  • Feed the outputs back into the network's own inputs
  • The network is a dynamical system (stable states, oscillations, chaotic behavior)
  • The response of the network depends on its initial state
  • Can support short-term memory
  • More difficult to understand
12
Feed-Forward Network
Two input units, two hidden units, one output unit

Each unit receives input only from units in the immediately preceding layer.
(Bias units omitted for simplicity.)

Given an input vector x = (x1, x2), the activations of the input units are set to the values of the
input vector, i.e., (a1, a2) = (x1, x2), and the network computes (with inputs 1, 2, hidden units 3, 4, and output unit 5):

    a5 = g( W_{3,5} a3 + W_{4,5} a4 )
       = g( W_{3,5} g( W_{1,3} a1 + W_{2,3} a2 ) + W_{4,5} g( W_{1,4} a1 + W_{2,4} a2 ) )

By adjusting the weights we get different functions:
that is how learning is done in neural networks!
13
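
A minimal sketch of the forward pass of this 2-2-1 network, using the unit numbering above; the sigmoid activation and the example weight values are assumptions for illustration, not values from the slides.

```python
import math

def g(x):                      # assumed activation: sigmoid
    return 1.0 / (1.0 + math.exp(-x))

def forward(x1, x2, W):
    """W maps (j, i) -> weight on the link from unit j to unit i."""
    a1, a2 = x1, x2                                   # input activations
    a3 = g(W[(1, 3)] * a1 + W[(2, 3)] * a2)           # hidden unit 3
    a4 = g(W[(1, 4)] * a1 + W[(2, 4)] * a2)           # hidden unit 4
    a5 = g(W[(3, 5)] * a3 + W[(4, 5)] * a4)           # output unit 5
    return a5

# Hypothetical weights; changing them changes the function the network computes.
W = {(1, 3): 1.0, (2, 3): -1.0, (1, 4): -0.5, (2, 4): 0.5, (3, 5): 2.0, (4, 5): 1.0}
print(forward(0.0, 1.0, W))
```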
Single Layer Feed-Forward Networks (perceptron)
• Single-layer neural network (perceptron network)
  • A network with all the inputs connected directly to the outputs
  • Output units all operate separately: no shared weights

Since each output unit is independent of the others, we can limit our study to single-output perceptrons.

14
Perceptron Learning Intuition
• Weight update
  • Inputs Ij (j = 1, 2, …, n)
  • Single output O; target output T
• Consider some initial weights.
• Define the example error: Err = T − O
• Now just move the weights in the right direction!
  • If the error is positive, we need to increase O: Err > 0 ⇒ increase O
  • If the error is negative, we need to decrease O: Err < 0 ⇒ decrease O
• Each input unit j contributes Wj·Ij to the total input:
  • if Ij is positive, increasing Wj tends to increase O;
  • if Ij is negative, increasing Wj tends to decrease O.
• So we use the update rule Wj ← Wj + α · Ij · Err, where α is the learning rate.

[Figure: input Ij connects through weight Wj to output O]
15
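
In code, the update rule looks as follows; a minimal sketch, with the learning rate α left as a parameter and the "output 1 if the weighted sum is positive" convention used in the example on the next slides.

```python
def perceptron_update(W, I, target, alpha=1.0):
    """One perceptron learning step: Wj <- Wj + alpha * Ij * Err, with Err = T - O."""
    S = sum(w * i for w, i in zip(W, I))
    O = 1 if S > 0 else 0          # threshold output
    err = target - O
    return [w + alpha * i * err for w, i in zip(W, I)]

print(perceptron_update([0, 0, 0], [1, 0, 1], target=1))  # -> [1, 0, 1]
```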
Perceptron Learning: Example
• Let's consider an example.
• Framework and notation:
  • 0/1 signals
  • Input vector: X = ⟨x0, x1, x2, …, xn⟩
  • Weight vector: W = ⟨w0, w1, w2, …, wn⟩
  • x0 = 1 and −w0 simulate the threshold.
  • O is the output (0 or 1) (single output).
  • Learning rate α = 1.
• Threshold function: S = Σ_{k=0}^{n} wk xk; if S > 0 then O = 1, else O = 0.
16
Perceptron Learning: Example
Err = T − O
Wj ← Wj + α · Ij · Err
• Set of training examples: each example is a pair (xi, yi), i.e., an input vector and a label y (0 or 1).
• Learning procedure, called the "error-correcting method":
  • Start with an all-zero weight vector.
  • Cycle (repeatedly) through the training examples, and for each example:
    • If the perceptron outputs 0 while it should be 1, add the input vector to the weight vector.
    • If the perceptron outputs 1 while it should be 0, subtract the input vector from the weight vector.
    • Otherwise do nothing.
  Intuitively correct (e.g., if the output is 0 but it should be 1, the weights are increased)!
17
Perceptron Learning: Example
• Consider learning the logical OR function.
• Our training samples are:

      x0  x1  x2  label
  1   1   0   0   0
  2   1   0   1   1
  3   1   1   0   1
  4   1   1   1   1

Activation function: S = Σ_{k=0}^{n} wk xk; if S > 0 then O = 1, else O = 0.
18
Perceptron Learning: Example

S = Σ_{k=0}^{n} wk xk; if S > 0 then O = 1, else O = 0

Error-correcting method:
  If the perceptron outputs 0 while it should be 1, add the input vector to the weight vector.
  If the perceptron outputs 1 while it should be 0, subtract the input vector from the weight vector.
  Otherwise do nothing.

• We'll use a single perceptron with three inputs.
  [Figure: inputs I0 (fixed at 1), I1, I2 with weights w0, w1, w2 feeding output O]
• We'll start with all weights zero: W = <0,0,0>.

• Example 1: I = <1,0,0>, label = 0, W = <0,0,0>
  • Perceptron: S = 1·0 + 0·0 + 0·0 = 0, output = 0
  • It classifies it as 0, which is correct, so do nothing.

• Example 2: I = <1,0,1>, label = 1, W = <0,0,0>
  • Perceptron: S = 1·0 + 0·0 + 1·0 = 0, output = 0
  • It classifies it as 0 while it should be 1, so we add the input to the weights.
  • W = <0,0,0> + <1,0,1> = <1,0,1>
19
Perceptron Learning: Example

• Example 3: I = <1,1,0>, label = 1, W = <1,0,1>
  • Perceptron: S = 1·1 + 1·0 + 0·1 = 1 > 0, output = 1
  • It classifies it as 1, correct, so do nothing.
  • W = <1,0,1>

• Example 4: I = <1,1,1>, label = 1, W = <1,0,1>
  • Perceptron: S = 1·1 + 1·0 + 1·1 = 2 > 0, output = 1
  • It classifies it as 1, correct, so do nothing.
  • W = <1,0,1>

20
Perceptron Learning: Example

• Epoch 2, through the examples, W = <1,0,1>.

• Example 1: I = <1,0,0>, label = 0, W = <1,0,1>
  • Perceptron: S = 1·1 + 0·0 + 0·1 = 1 > 0, output = 1
  • It classifies it as 1 while it should be 0, so subtract the input from the weights.
  • W = <1,0,1> − <1,0,0> = <0,0,1>

• Example 2: I = <1,0,1>, label = 1, W = <0,0,1>
  • Perceptron: S = 1·0 + 0·0 + 1·1 = 1 > 0, output = 1
  • It classifies it as 1, correct, so do nothing.
21
Perceptron Learning: Example

• Example 3: I = <1,1,0>, label = 1, W = <0,0,1>
  • Perceptron: S = 1·0 + 1·0 + 0·1 = 0, output = 0
  • It classifies it as 0 while it should be 1, so add the input to the weights.
  • W = I + W = <1,1,1>

• Example 4: I = <1,1,1>, label = 1, W = <1,1,1>
  • Perceptron: S = 1·1 + 1·1 + 1·1 = 3 > 0, output = 1
  • It classifies it as 1, correct, so do nothing.
  • W = <1,1,1>

22
Perceptron Learning: Example

• Epoch 3, through the examples, W = <1,1,1>.

• Example 1: I = <1,0,0>, label = 0, W = <1,1,1>
  • Perceptron: S = 1·1 + 0·1 + 0·1 = 1 > 0, output = 1
  • It classifies it as 1 while it should be 0, so subtract the input from the weights.
  • W = <1,1,1> − <1,0,0> = <0,1,1>

• Example 2: I = <1,0,1>, label = 1, W = <0,1,1>
  • Perceptron: S = 1·0 + 0·1 + 1·1 = 1 > 0, output = 1
  • It classifies it as 1, correct, so do nothing.
23
Perceptron Learning: Example

• Example 3: I = <1,1,0>, label = 1, W = <0,1,1>
  • Perceptron: S = 1·0 + 1·1 + 0·1 = 1 > 0, output = 1
  • It classifies it as 1, correct, so do nothing.

• Example 4: I = <1,1,1>, label = 1, W = <0,1,1>
  • Perceptron: S = 1·0 + 1·1 + 1·1 = 2 > 0, output = 1
  • It classifies it as 1, correct, so do nothing.

24
Perceptron Learning: Example

• Epoch 4, through the examples, W = <0,1,1>.

• Example 1: I = <1,0,0>, label = 0, W = <0,1,1>
  • Perceptron: S = 1·0 + 0·1 + 0·1 = 0, output = 0
  • It classifies it as 0, correct, so do nothing.

So the final weight vector W = <0,1,1> classifies all examples correctly, and the perceptron has learned the OR function!

[Figure: learned perceptron with W0 = 0, W1 = 1, W2 = 1]

Aside: in more realistic cases the bias (W0) will not be 0. (This was just a toy example!)
Also, in general, there are many more inputs (100 to 1000).
25
Perceptron Learning: Example

Epoch  x0  x1  x2  Target  w0  w1  w2  Output  Error  New w0  New w1  New w2
1      1   0   0   0       0   0   0   0        0     0       0       0
       1   0   1   1       0   0   0   0        1     1       0       1
       1   1   0   1       1   0   1   1        0     1       0       1
       1   1   1   1       1   0   1   1        0     1       0       1
2      1   0   0   0       1   0   1   1       -1     0       0       1
       1   0   1   1       0   0   1   1        0     0       0       1
       1   1   0   1       0   0   1   0        1     1       1       1
       1   1   1   1       1   1   1   1        0     1       1       1
3      1   0   0   0       1   1   1   1       -1     0       1       1
       1   0   1   1       0   1   1   1        0     0       1       1
       1   1   0   1       0   1   1   1        0     0       1       1
       1   1   1   1       0   1   1   1        0     0       1       1
4      1   0   0   0       0   1   1   0        0     0       1       1
26
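
The whole trace above can be reproduced with a short script; this is a minimal sketch that cycles through the four OR examples with the error-correcting rule (learning rate 1) and prints one row per example.

```python
# Error-correcting perceptron learning on the OR examples (x0 is the fixed bias input 1).
examples = [((1, 0, 0), 0), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 1)]
W = [0, 0, 0]

for epoch in range(1, 5):
    for x, target in examples:
        S = sum(w * xi for w, xi in zip(W, x))
        O = 1 if S > 0 else 0
        err = target - O
        W = [w + xi * err for w, xi in zip(W, x)]   # learning rate alpha = 1
        print(epoch, x, target, O, err, W)

# Final weights: W = [0, 1, 1], which classifies all four examples correctly.
```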
Expressiveness of Perceptron

What hypothesis space can a perceptron represent?

It can represent the Boolean functions AND, OR, and NOT shown earlier, and even more complex Boolean functions such as the majority function.

But can it represent any arbitrary Boolean function?

27
Expressiveness of Perceptron

A threshold perceptron returns 1 iff the weighted sum of its inputs (including the bias) is positive, i.e.:

    Σ_{j=0}^{n} Wj xj > 0,  i.e.,  W · x > 0

That is, iff the input is on one side of the hyperplane it defines.

Perceptron ⇒ Linear Separator

A linear discriminant function, or linear decision surface.

The weights determine the slope and the bias determines the offset.

28
Linear Separability

Consider an example with two inputs, x1 and x2:

[Figure: a scatter of positive (+) and negative (−) points in the (x1, x2) plane, separated by a line]

We can view the trained network as defining a "separation line". What is its equation?

    w1 x1 + w2 x2 − w0 = 0

i.e.,

    x2 = −(w1/w2) x1 + w0/w2

Perceptron used for classification

29
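
A small sketch that extracts the slope and offset of the separation line from a perceptron's weights using the equation above; plugging in the weights learned for OR earlier (w0 = 0, w1 = 1, w2 = 1) gives the line x2 = −x1.

```python
def separation_line(w0, w1, w2):
    """Return (slope, intercept) of x2 = -(w1/w2) * x1 + w0/w2 (requires w2 != 0)."""
    return -w1 / w2, w0 / w2

slope, intercept = separation_line(0, 1, 1)   # weights learned for OR above
print(slope, intercept)                        # -1.0 0.0: the line x2 = -x1
```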
Linear Separability

[Figure: OR — the four input points in the (x1, x2) plane can be separated by a single line]

30
Linear Separability

[Figure: AND — the four input points in the (x1, x2) plane can be separated by a single line]

31
Linear Separability

[Figure: XOR — the four input points in the (x1, x2) plane]

32
Linear Separability

[Figure: XOR — no single line separates the two classes in the (x1, x2) plane]

XOR is not linearly separable.

Minsky & Papert (1969)

Bad news: perceptrons can only represent linearly separable functions.
33
Linear Separability XOR
• Consider a threshold perceptron for the logical XOR function (two inputs), which outputs 1 when

    w1 x1 + w2 x2 > T

• Our examples are:

      x1  x2  label
  1   0   0   0
  2   1   0   1
  3   0   1   1
  4   1   1   0

• Given our examples, we have the following inequalities for the perceptron:
  • From (1): 0 + 0 ≤ T
  • From (2): w1 + 0 > T
  • From (3): 0 + w2 > T
  • From (4): w1 + w2 ≤ T — a contradiction, since (2) and (3) with T ≥ 0 from (1) give w1 + w2 > 2T ≥ T.

So, XOR is not linearly separable.

34
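
The same conclusion can be illustrated empirically: the sketch below brute-forces a small grid of (w1, w2, T) values and finds many settings that implement OR but none that implement XOR. This is only an illustrative check over a finite grid, not a proof; the proof is the inequality argument above.

```python
import itertools

def fits(truth_table, w1, w2, T):
    """True if the threshold rule w1*x1 + w2*x2 > T reproduces the given labels."""
    return all((w1 * x1 + w2 * x2 > T) == bool(label)
               for (x1, x2), label in truth_table)

OR_tt  = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
XOR_tt = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

grid = [x / 2 for x in range(-6, 7)]          # -3.0, -2.5, ..., 3.0
or_hits  = sum(fits(OR_tt,  *p) for p in itertools.product(grid, repeat=3))
xor_hits = sum(fits(XOR_tt, *p) for p in itertools.product(grid, repeat=3))
print(or_hits, xor_hits)                      # many solutions for OR, zero for XOR
```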
Convergence of Perceptron Learning Algorithm

The perceptron converges to a consistent function, if…

• … the training data are linearly separable
• … the step size α is sufficiently small
• … there are no "hidden" units

35
Non Linear Classifiers
• The XOR problem

  x1  x2  XOR  Class
  0   0   0    B
  0   1   1    A
  1   0   1    A
  1   1   0    B

36
Non Linear Classifiers
• There is no single line (hyperplane) that separates class A from class B. In contrast, the AND and OR operations are linearly separable problems.

37
The Two‐Layer Perceptron
• For the XOR problem, draw two lines instead of one.

38
The Two‐Layer Perceptron
• Then class B is located outside the shaded area and class A inside. This is a two-phase design.
• Phase 1: Draw two lines (hyperplanes):

    g1(x) = 0 and g2(x) = 0

  Each of them is realized by a perceptron. The outputs of the perceptrons will be

    yi = f(gi(x)) = 0 or 1,  i = 1, 2

  depending on the position of x.

• Phase 2: Find the position of x with respect to both lines, based on the values of y1, y2.
39
The Two‐Layer Perceptron

        1st phase          2nd phase
  x1  x2   y1      y2
  0   0   0 (−)   0 (−)    B (0)
  0   1   1 (+)   0 (−)    A (1)
  1   0   1 (+)   0 (−)    A (1)
  1   1   1 (+)   1 (+)    B (0)

• Equivalently: the computations of the first phase perform a mapping

    x → y = [y1, y2]^T

40
The Two‐Layer Perceptron
The decision is now performed on the transformed y data:

    g(y) = 0

This can be performed via a second line, which can also be realized by a perceptron.
41
The Two‐Layer Perceptron
• Computations of the first phase perform a mapping
that transforms the nonlinearly separable problem to a
linearly separable one.

• The architecture

42
The Two‐Layer Perceptron
• This is known as the two-layer perceptron, with one hidden and one output layer. The activation functions are step functions:

    f(·) ∈ {0, 1}

• The neurons (nodes) of the figure realize the following lines (hyperplanes):

    g1(x) = x1 + x2 − 1/2 = 0
    g2(x) = x1 + x2 − 3/2 = 0
    g(y)  = y1 − 2 y2 − 1/2 = 0
43
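
Putting the three hyperplanes together yields a working XOR classifier; a minimal sketch with step activations (output 1 when the argument is positive, matching the (+)/(−) entries in the table of the first phase):

```python
def step(x):
    return 1 if x > 0 else 0

def two_layer_xor(x1, x2):
    # First (hidden) layer: the two lines g1, g2
    y1 = step(x1 + x2 - 0.5)        # g1(x) = x1 + x2 - 1/2
    y2 = step(x1 + x2 - 1.5)        # g2(x) = x1 + x2 - 3/2
    # Second (output) layer: the line g in the transformed y-space
    return step(y1 - 2 * y2 - 0.5)  # g(y) = y1 - 2*y2 - 1/2

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, two_layer_xor(x1, x2))   # 0, 1, 1, 0 -> classes B, A, A, B
```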
