Different classes of abstract models:

- Supervised learning (example: Perceptron)
- Reinforcement learning
- Unsupervised learning (example: Hebb rule)
- Associative memory (example: Matrix memory)


Abstraction – so what is a neuron?

• Threshold unit (McCulloch-Pitts):

  O = \sigma\left(\sum_i w_i x_i + w_0\right), \quad \text{where } \sigma(x) = 1 \text{ for } x \ge 0 \text{ and } 0 \text{ for } x < 0

• Linear:

  O = \sum_i w_i x_i + w_0

• Sigmoid:

  O = \sigma_{\mathrm{sig}}\left(\sum_i w_i x_i + w_0\right)
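A minimal MATLAB illustration of the three unit types (the input values, weight values, and the logistic form of the sigmoid are assumptions for the example):

% Three abstract neuron models acting on the same input.
x  = [0.2; 0.7; 1.0];             % example input vector (assumed)
w  = [0.5; -0.3; 0.8];            % synaptic weights (assumed)
w0 = -0.1;                        % bias / threshold term

a = w' * x + w0;                  % weighted sum of inputs

O_threshold = double(a >= 0);     % McCulloch-Pitts: 1 if a >= 0, else 0
O_linear    = a;                  % linear unit
O_sigmoid   = 1 ./ (1 + exp(-a)); % sigmoid unit (logistic form assumed)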
THE PERCEPTRON
(Classification)

Threshold unit:

  O = \sigma\left(\sum_i w_i x_i + w_0\right), \quad \text{where } \sigma(x) = 1 \text{ for } x \ge 0 \text{ and } 0 \text{ for } x < 0

where O is the output for input pattern \mathbf{x}, the w_i are the synaptic weights, and y is the desired output.
AND

  x1  x2 | y
   1   1 | 1
   1   0 | 0
   0   1 | 0
   0   0 | 0

[Diagram: output unit O connected to inputs x_1 ... x_5 through weights w_1 ... w_5]
AND

  x1  x2 | y
   1   1 | 1
   1   0 | 0
   0   1 | 0
   0   0 | 0

Weights w_1 = w_2 = 1 with bias w_0 = -1.5 separate the classes: the output is 1 exactly when x_1 + x_2 - 1.5 > 0.

[Diagram: two-input unit with weights 1, 1 and bias -1.5, and the four AND patterns in the (x_1, x_2) plane]

The AND problem is linearly separable.
OR

  x1  x2 | y
   1   1 | 1
   1   0 | 1
   0   1 | 1
   0   0 | 0

Weights w_1 = w_2 = 1 with bias w_0 = -0.5 separate the classes: the output is 1 exactly when x_1 + x_2 - 0.5 > 0.

[Diagram: two-input unit with weights 1, 1 and bias -0.5, and the four OR patterns in the (x_1, x_2) plane]

The OR problem is linearly separable.
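A quick MATLAB check (a sketch) that these weight and bias choices implement AND and OR on the four binary patterns:

X = [1 1; 1 0; 0 1; 0 0];             % the four input patterns, one per row

% AND: w1 = w2 = 1, w0 = -1.5
o_and = double(X * [1; 1] - 1.5 > 0); % -> [1; 0; 0; 0]

% OR: w1 = w2 = 1, w0 = -0.5
o_or  = double(X * [1; 1] - 0.5 > 0); % -> [1; 1; 1; 0]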
Perceptron learning rule:

  \Delta w_i = \eta \, (y - O) \, x_i

A convergence proof exists (for linearly separable data).

[Diagram: output unit O with weights w_1 ... w_5 on inputs x_1 ... x_5]
1. Show examples of Perceptron learning with the demo program (a sketch follows below).

2. Show the program itself.

3. Talk about linear separability, define the dot product, show on the computer.
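A minimal MATLAB sketch of such a demo, trained here on the AND data above (the learning rate, initialization, and number of epochs are arbitrary choices for illustration):

% Perceptron learning on the AND problem.
X   = [1 1; 1 0; 0 1; 0 0];          % input patterns, one per row
y   = [1; 0; 0; 0];                  % desired outputs
eta = 0.1;                           % learning rate (assumed)
w   = zeros(2,1);  w0 = 0;           % initial weights and bias

for epoch = 1:50
    for k = 1:size(X,1)
        O  = double(X(k,:) * w + w0 >= 0);   % threshold-unit output
        dw = eta * (y(k) - O);               % perceptron learning rule
        w  = w  + dw * X(k,:)';
        w0 = w0 + dw;                        % bias updated as a weight on a constant input of 1
    end
end
disp([w; w0])   % the learned weights define a separating line w1*x1 + w2*x2 + w0 = 0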
Unsupervised learning – the "Hebb" rule.

  \frac{dW_i}{dt} = \eta \, x_i \, y, \quad \text{where the } x_i \text{ are the inputs}

and the output y is assumed linear:

  y = \sum_j W_j x_j
Results in 2D

Example of Hebb in 2D:

[Plot: the weight vector w and a zero-mean input cloud in the (x_1, x_2) plane, with the principal axis of the inputs tilted at an angle \theta = \pi/3; both axes run from -2 to 2]

(Note: here the inputs have a mean of zero)

• Show the program, tilt the axis, look at the divergence (a sketch follows below)
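A minimal MATLAB sketch of such a Hebb demo in 2D (the input distribution, tilt angle, learning rate, and number of steps are illustrative assumptions):

% Plain Hebb rule, dW/dt = eta * x * y, with a linear neuron y = W*x.
eta   = 0.001;                          % learning rate (assumed)
theta = pi/3;                           % tilt of the input distribution (assumed)
R     = [cos(theta) -sin(theta); sin(theta) cos(theta)];
W     = 0.1 * randn(1,2);               % small random initial weights

for t = 1:5000
    x = R * ([2; 0.5] .* randn(2,1));   % zero-mean inputs, elongated along the tilted axis
    y = W * x;                          % linear output
    W = W + eta * x' * y;               % Hebb update; note that |W| diverges without normalization
end
disp(W / norm(W))   % the direction aligns with the principal axis of the inputs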
Why do we get these results?

On the board:

• Solve a simple linear first-order ODE

• Fixed points and their stability for nonlinear ODEs

• Eigenvalues, eigenvectors
In the simplest case, the change in synaptic weight w is:

  \Delta w_i = \eta \, x_i \, y

where \mathbf{x} is the input vector and y is the neural response.

Assume for simplicity a linear neuron:

  y = \sum_j w_j x_j

So we get:

  \Delta w_i = \eta \left( \sum_j x_i x_j w_j \right)

Now take an average with respect to the distribution of inputs to get:

  E[\Delta w_i] = \eta \left( \sum_j E[x_i x_j] \, w_j \right) = \eta \sum_j Q_{ij} w_j
If a small change Δw occurs over a short time Δt, then (in matrix notation):

  \frac{\Delta \mathbf{w}}{\Delta t} \;\rightarrow\; \frac{d\mathbf{w}}{dt} = \eta \, Q \, \mathbf{w}

If \langle \mathbf{x} \rangle = 0, Q is the covariance matrix.

What then is the solution of this simple first-order linear ODE?

(Show on board.)
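For reference, the solution is obtained by expanding w in the eigenvectors of Q (the calculation usually done on the board):

  \frac{d\mathbf{w}}{dt} = \eta Q \mathbf{w}, \qquad Q \mathbf{e}_a = \lambda_a \mathbf{e}_a
  \;\Longrightarrow\;
  \mathbf{w}(t) = \sum_a c_a \, e^{\eta \lambda_a t} \, \mathbf{e}_a, \qquad c_a = \mathbf{e}_a \cdot \mathbf{w}(0).

Since Q is symmetric and positive semi-definite, the component along the eigenvector with the largest eigenvalue grows fastest, so w aligns with the principal eigenvector of Q while its norm diverges; this is the divergence seen in the demo and the reason saturation or normalization is needed.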
• Show program of Hebb rule again

• Show effect of saturation limits

• Possible solution – normalization

Oja (PCA) rule:

  \frac{dW_i}{dt} = \eta \left( x_i \, y - W_i \, y^2 \right)

Show the PCA program:

1
0.8
0.6
0.4
0.2
W1 0
-0.2
-0.4
-0.6
-0.8
-1
-1 -0.5 0 0.5 1

W2

OK – some more programming: convert the Hebb program to an Oja-rule program (a sketch follows below).
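A minimal MATLAB sketch of the conversion (same loop as the Hebb demo above; only the update line changes):

% Oja (PCA) rule: dW/dt = eta * (x*y - W*y^2).
eta   = 0.01;                           % learning rate (assumed)
theta = pi/3;                           % same tilted input distribution as before (assumed)
R     = [cos(theta) -sin(theta); sin(theta) cos(theta)];
W     = 0.1 * randn(1,2);

for t = 1:5000
    x = R * ([2; 0.5] .* randn(2,1));
    y = W * x;
    W = W + eta * (y * x' - y^2 * W);   % Oja update: Hebb term minus a decay term
end
disp(norm(W))       % the norm converges toward 1 instead of diverging
disp(W / norm(W))   % the direction converges to the first principal component of the inputs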
So OK: simulations, MATLAB, math, etc.

What does this have to do with biology, with the brain?
Another unsupervised learning model:
The BCM theory of synaptic plasticity.

The BCM theory of cortical plasticity

BCM stands for Bienenstock, Cooper and Munro; it dates back to 1982. It was designed to account for experiments which demonstrated that the development of orientation-selective cells depends on rearing in a patterned environment.
BCM Theory
(Bienenstock, Cooper, Munro 1982; Intrator, Cooper 1992)

Requires:
• Bidirectional synaptic modification (LTP/LTD)
• A sliding modification threshold
• The fixed points depend on the environment, and in a patterned environment only selective fixed points are stable.

[Diagram: BCM modification function, with LTD for responses below the threshold and LTP above it]
The integral form of the average:

  \theta_M(t) = \frac{1}{\tau} \int_{-\infty}^{t} c^2(t') \, e^{-(t-t')/\tau} \, dt'

is equivalent to this differential form:

  \tau \frac{d\theta_M}{dt} = c^2 - \theta_M

Note: it is essential that θ_M is a superlinear function of the history of c, that is, it scales with the activity raised to a power 1 + p with p > 0.

Note also that in the original BCM formulation (1982), \theta_M = \left(\overline{c}\right)^2 rather than \overline{c^2}.
What is the outcome of the BCM theory?

Assume a neuron with N inputs (N synapses), and an environment composed of N different input vectors.

An N = 2 example:

[Diagram: two input vectors x^1 and x^2 in the plane]

What are the stable fixed points of the weight vector m in this case?

(Notation: the response is c = m \cdot x)

Note: every time a new input is presented, m changes, and so does θ_M.

What are the fixed points? What are the stable fixed points?
The integral form of the average:

  \theta_M(t) = \frac{1}{\tau} \int_{-\infty}^{t} c^2(t') \, e^{-(t-t')/\tau} \, dt'

is equivalent to this differential form:

  \tau \frac{d\theta_M}{dt} = c^2 - \theta_M

Alternative form:

Show MATLAB example (a sketch follows below):

Two examples with N = 5.

Note: the stable fixed point is such that for one pattern j,

  y^j = \sum_i w_i x_i^j = \theta_M,

while for the other patterns y^{k \ne j} = 0.

(Note: here c = y)
BCM Theory
Stability

• One dimension: y = w \, x

• Quadratic form: \frac{dw}{dt} = \eta \, y \, (y - \theta_M) \, x

• Instantaneous limit: \theta_M = y^2, so

  \frac{dw}{dt} = \eta \, y \, (y - y^2) \, x = \eta \, y^2 (1 - y) \, x

[Plot: dw/dt as a function of y, with zero crossings at y = 0 and y = 1]
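A worked version of the one-dimensional stability argument implied by the plot (keeping the input x fixed):

  \dot{y} = x \, \frac{dw}{dt} = \eta \, x^2 \, y^2 (1 - y),

so \dot{y} > 0 for 0 < y < 1 and \dot{y} < 0 for y > 1: y = 1 is a stable fixed point, while at y = 0 the linearization vanishes (the y^2 factor) and any small positive response grows away from it.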
BCM Theory
Selectivity

• Two dimensions: y = w_1 x_1 + w_2 x_2 = \mathbf{w} \cdot \mathbf{x}^T

• Two patterns: y^1 = \mathbf{w} \cdot \mathbf{x}^1, \quad y^2 = \mathbf{w} \cdot \mathbf{x}^2

• Quadratic form: \frac{d\mathbf{w}}{dt} = \eta \, y^k (y^k - \theta_M) \, \mathbf{x}^k

• Threshold averaged over the patterns: \theta_M = E\left[y^2\right] = \sum_{k=1}^{2} p(\mathbf{x}^k) \, (y^k)^2

• Fixed points: \frac{d\mathbf{w}}{dt} = 0

[Diagram: the two input patterns x^1 and x^2 in the (x_1, x_2) plane]
BCM Theory: Selectivity

• Learning equation: \frac{d\mathbf{w}}{dt} = \eta \, y^k (y^k - \theta_M) \, \mathbf{x}^k

• Four possible fixed points:
  (unselective)  y^1 = 0,        y^2 = 0
  (selective)    y^1 = \theta_M, y^2 = 0
  (selective)    y^1 = 0,        y^2 = \theta_M
  (unselective)  y^1 = \theta_M, y^2 = \theta_M

• Threshold at the selective fixed point (y^2 = 0):

  \theta_M = p_1 (y^1)^2 + p_2 (y^2)^2 = p_1 (y^1)^2 \;\Rightarrow\; y^1 = 1/p_1

[Diagram: two-input neuron with weights w_1, w_2 on inputs x_1, x_2]
Summary

• The BCM rule is based on two differential equations. What are they?

• When there are two linearly independent inputs, what will the BCM stable fixed points be? What will θ_M be?

• When there are K independent inputs, what are the stable fixed points? What will θ_M be?
Bonus project – 10 extra points for section

Write MATLAB code for a BCM neuron trained with 2 inputs in 2D. Include a 1-page write-up; you will also meet with me for about 15 minutes to explain the code and results.
Associative memory:

Famous images → Names (e.g. Albert, Marilyn, ..., Harel)

Input and desired output (pattern k in column k):

  \begin{pmatrix} x_1^1 & x_1^2 & x_1^3 & x_1^4 \\ x_2^1 & x_2^2 & x_2^3 & x_2^4 \\ x_3^1 & x_3^2 & x_3^3 & x_3^4 \\ x_4^1 & x_4^2 & x_4^3 & x_4^4 \end{pmatrix}
  \;\longrightarrow\;
  \begin{pmatrix} y_1^1 & y_1^2 & y_1^3 & y_1^4 \\ y_2^1 & y_2^2 & y_2^3 & y_2^4 \\ y_3^1 & y_3^2 & y_3^3 & y_3^4 \\ y_4^1 & y_4^2 & y_4^3 & y_4^4 \end{pmatrix}

Two kinds of network:

1. Feed-forward matrix networks

2. Attractor networks (auto-associative)
Linear matrix memory:

N input neurons, M output neurons, P input–output pairs.

[Diagram: output units o_1 ... o_M fully connected to input units x_1 ... x_N through weights W]

1. Set the synaptic weights by the Hebb rule:

  W_{ij} = \sum_{k=1}^{P} x_i^k \, y_j^k

2. Present an input; the output is a linear operation:

  O_j^r = \sum_{i=1}^{N} x_i^r \, W_{ij}
Here you are on your own – write a MATLAB program to do this (a sketch follows below).

Tip: use large N, small P, and start with orthogonal patterns.
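A minimal MATLAB sketch along those lines (N, M, P and the random ±1 patterns are assumptions; random patterns are only approximately orthogonal, which is why large N and small P help):

% Linear matrix memory trained with the Hebb rule.
N = 100;  M = 20;  P = 5;       % many inputs, few patterns (assumed sizes)
X = sign(randn(P, N));          % P input patterns (rows), random +/-1, nearly orthogonal for large N
Y = sign(randn(P, M));          % P desired output patterns

W = X' * Y;                     % Hebb rule: W(i,j) = sum_k x_i^k * y_j^k

O = X * W;                      % linear output for every stored input
disp(isequal(sign(O), Y))       % 1 (true) when thresholding the output recovers every stored pattern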
A low-D example of a linear matrix memory; do on the board.

Use the simple Hebb rule between input and desired output.

- Orthogonal inputs

- Non-orthogonal inputs

Give examples. Non-orthogonal inputs might require other rules: the covariance rule, or the Perceptron.
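Why orthogonal inputs give perfect recall can be shown in one line (a sketch of the board calculation):

  O_j^r = \sum_i x_i^r W_{ij} = \sum_i x_i^r \sum_{k=1}^{P} x_i^k y_j^k = \sum_{k=1}^{P} \left(\mathbf{x}^r \cdot \mathbf{x}^k\right) y_j^k = \|\mathbf{x}^r\|^2 \, y_j^r \quad \text{(orthogonal inputs)}

For non-orthogonal inputs, the cross-talk terms (\mathbf{x}^r \cdot \mathbf{x}^k) \, y_j^k with k \ne r contaminate the recalled output, which is why the other rules above may be needed.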
Formal neural networks can accomplish many tasks, for example:

• Perform complex classification

• Learn arbitrary functions

• Account for associative memory

Some applications: robotics, character recognition, speech recognition, medical diagnostics.

This is not neuroscience, but it is motivated loosely by neuroscience and carries important information for neuroscience as well.

For example: memory, learning, and some aspects of development are assumed to be based on synaptic plasticity.
