
Neural Networks: Introduction and Review
Wen Yu

References
– Mitchell, Machine Learning, McGraw-Hill, 1997.
– Russell & Norvig, Artificial Intelligence: A Modern Approach, Prentice Hall, 1995.
– Hertz, Krogh & Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley, 1991.
– Cowan & Sharp, "Neural nets and artificial intelligence", Daedalus 117:85-121, 1988.

History of Neural Networks
Attempts to mimic the human brain date back to work in the 1930s, 1940s, and 1950s by Alan Turing, Warren McCulloch, Walter Pitts, Donald Hebb, and John von Neumann.
– 1943 McCulloch & Pitts: the neuron as a computing element
– 1948 Wiener: cybernetics
– 1949 Hebb: learning rule
– 1957 Rosenblatt (Cornell): the Perceptron, a hardware neural net for character recognition
– 1959 Widrow & Hoff (Stanford): Adaline, for adaptive control of noise on telephone lines
– 1960 Widrow & Hoff: the least mean square (LMS) algorithm

History of Neural Networks: Recession
– 1969 Minsky & Papert: limitations of the perceptron model; a single perceptron can only separate linearly separable classes.

History of Neural Networks: Revival
Later work mathematically tied together many of the ideas from previous research:
– 1982 Hopfield: recurrent network model
– 1982 Kohonen: self-organizing maps
– 1986 Rumelhart et al.: backpropagation; universal approximation
Since then, growth has exploded: over 80% of the Fortune 500 have neural net R&D programs, thousands of research papers have been published, and commercial software applications abound.

Applications of Neural Networks
– Forecasting/market prediction: finance and banking
– Manufacturing: quality control, fault diagnosis
– Medicine: analysis of electrocardiogram data, RNA and DNA sequencing, drug development without animal testing
– Pattern/image recognition: handwriting recognition, airport bomb detection
– Optimization: problems out of reach of the Simplex method
– Control: process control, robotics

The Biological Neuron
Neurons are brain cells; it is estimated that there are 10¹² neurons and 10¹⁴ synaptic connections in the human brain. A neuron consists of a cell body containing the nucleus, dendrites, and an axon; synapses connect the axon of one neuron to the dendrites of others. Information transmission happens at the synapses.

Neural Dynamics
[Figure: membrane potential (mV) vs. time (ms), showing the rest potential, the activation threshold, an action potential, and the refractory time.]
– Action potential ≈ 100 mV
– Activation threshold ≈ 20-30 mV above rest
– Rest potential ≈ −65 mV
– Spike time ≈ 1-2 ms
– Refractory time ≈ 10-20 ms

Key to Intelligence
The key is synapse weight adjustment, i.e. connection strength. Each neuron in the human brain receives input from roughly 50,000 to 80,000 other neurons, and the contribution of each signal depends on the strength of the synaptic connection.

Simple Neuron
– Nodes have input signals: dendrites carry impulses to the neuron.
– Nodes have one output signal: the axon carries the signal out of the neuron, and synapses are the local regions where signals are transmitted from the axon of one neuron to the dendrites of another.
– Input signals are weighted and summed at each node.
– Nerve impulses are binary, "go" or "no go": a neuron sums its incoming signals and fires if a threshold value is reached.

Artificial Neurons: the McCulloch-Pitts Model
Neurons work by processing information; they receive and provide information in the form of spikes. Given inputs x1, ..., xn with weights w1, ..., wn, the McCulloch-Pitts neuron computes

    z = Σ_{i=1..n} wi·xi,    y = H(z)

where H is the hard (Heaviside) threshold and y is the output.
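A minimal sketch of the McCulloch-Pitts neuron may help make the model concrete. The weights and the threshold theta below are illustrative values chosen so the unit computes OR; they are not from the slides.

```python
def mcp_neuron(inputs, weights, theta=0.5):
    """McCulloch-Pitts unit: z = sum_i w_i * x_i, then a hard threshold."""
    z = sum(w * x for w, x in zip(weights, inputs))
    return 1 if z >= theta else 0  # Heaviside threshold at theta

# With weights (0.6, 0.7) and theta = 0.5, the unit fires whenever
# either binary input is 1, i.e. it computes OR.
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, mcp_neuron(x, [0.6, 0.7]))
```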
Neural Networks
Neural networks are inspired by natural decision-making structures: real nervous systems and brains. If you connect lots of simple decision-making pieces together, they can make more complex decisions; simple functions compose into complex functions. Neural networks:
– take multiple numeric input variables
– produce multiple numeric output values
– normally threshold the outputs to turn them into discrete values
– map the discrete values onto classes, and you have a classifier
– but the only time I have used them is as approximation functions

Simulated Neuron: the Perceptron
– Inputs aj arrive from other perceptrons with weights Wi,j; learning occurs by adjusting the weights.
– The perceptron calculates the weighted sum of its inputs: ini = Σ_j Wi,j·aj.
– A threshold function calculates the output ai:
  – step function: if ini > t then ai = 1, else ai = 0
  – sigmoid: g(x) = 1/(1 + e^(−x))
– The output becomes an input for the next layer of perceptrons.

Network Structure
A single perceptron can represent AND or OR, but not XOR; combinations of perceptrons are more powerful. Perceptrons are usually organized in layers:
– input layer: takes the external input
– hidden layer(s)
– output layer: produces the external output
Feed-forward vs. recurrent:
– feed-forward: outputs only connect to later layers, which makes learning easier
– recurrent: outputs can connect to earlier layers or to the same layer, giving the network internal state

Neural Network for Quake
– Four input perceptrons, one per condition: Enemy, Dead, Sound, Low Health.
– A four-perceptron hidden layer, fully connected.
– Five output perceptrons, one per action: Attack, Wander, Retreat, Spawn, Chase.
– Choose the action with the highest output, or use probabilistic action selection: choose at random, weighted by the outputs.

Learning in Neural Networks
Networks learn from examples: each example pairs an input with the correct output t, while the network produces output o. Learning happens whenever the network's output does not match the correct output:
– adjust the weights to reduce the difference
– only change the weights by a small amount, the learning rate η
Basic perceptron learning:
– Wi,j = Wi,j + η(t − o)·aj
– if the output is too high, (t − o) is negative, so Wi,j is reduced
– if the output is too low, (t − o) is positive, so Wi,j is increased
– if aj is negative, the opposite happens

Neural Net Example
A single perceptron represents OR: two inputs, one output (1 if either input is 1), and a step function (output 1 if the weighted sum > 0.5). The initial state W1 = 0.1, W2 = 0.6 gives an error on the (1,0) input: Σ Wj·aj = 0.1, so the output is 0 where 1 was expected, and training occurs with η = 0.1:
– W1 = 0.1 + 0.1(1 − 0)·1 = 0.2
– W2 = 0.6 + 0.1(1 − 0)·0 = 0.6
After this step, try the (0,1)→1 example: Σ Wj·aj = 0.6 > 0.5, so there is no error and no training. Try the (1,0)→1 example again: Σ Wj·aj = 0.2, still an error, so training occurs:
– W1 = 0.2 + 0.1(1 − 0)·1 = 0.3
– W2 = 0.6 + 0.1(1 − 0)·0 = 0.6
And so on; the sketch below replays the whole process.
– What is a network that works for OR?
– What about AND?
– Why not XOR?
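The worked OR example can be replayed in a few lines. This sketch uses the slide's values (step threshold 0.5, learning rate η = 0.1, initial weights 0.1 and 0.6); the epoch count is an illustrative choice.

```python
def output(weights, inputs, threshold=0.5):
    """Step function: 1 if the weighted sum exceeds the threshold."""
    s = sum(w * a for w, a in zip(weights, inputs))
    return 1 if s > threshold else 0

def train_step(weights, inputs, target, eta=0.1):
    """Perceptron rule: W_j <- W_j + eta * (t - o) * a_j."""
    o = output(weights, inputs)
    return [w + eta * (target - o) * a for w, a in zip(weights, inputs)]

weights = [0.1, 0.6]
examples = [((1, 0), 1), ((0, 1), 1), ((1, 1), 1), ((0, 0), 0)]  # OR

for epoch in range(10):
    for x, t in examples:
        weights = train_step(weights, x, t)

print(weights)  # W1 has climbed past 0.5 (via 0.2, 0.3, ... as on the slides)
```

After training, output(weights, x) is correct on all four OR cases; the same loop with AND targets can learn AND, but no single perceptron converges for XOR, since XOR is not linearly separable.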
Neural Networks Evaluation
Advantages:
– handle errors well, with graceful degradation
– can learn novel solutions
Disadvantages:
– "Neural networks are the second best way to do anything"
– we cannot understand how or why the learned network works
– the training examples must match the real problem, and you need as many examples as possible
– learning takes lots of processing, though it is incremental, so learning during play might be possible

Binary Neurons
Stimulus: ui = Σ_j wij·xj. Response: yi = f(urest + ui), with a "hard" (Heaviside) threshold:

    f(z) = ON if z ≥ Θ, OFF otherwise    (Θ = threshold)

[Figure: Heaviside step response, jumping from off to on at the threshold.]
– Examples: perceptrons, Hopfield NNs, Boltzmann machines
– Main drawbacks: can only map binary functions; biologically implausible

Analog Neurons
Stimulus: ui = Σ_j wij·xj. Response: yi = f(urest + ui), with a "soft" threshold:

    f(z) = 2/(1 + e^(−z)) − 1

[Figure: sigmoid response, saturating at the off and on levels.]
– Examples: MLPs, recurrent NNs, RBF NNs, ...
– Main drawbacks: difficult to process time patterns; biologically implausible

Spiking Neurons
Notation:
– η = spike and after-spike potential
– urest = resting potential
– ε(t, u(τ)) = trace at time t of the input at time τ
– Θ = threshold
– xj(t) = output of neuron j at time t
– wij = efficacy of the synapse from neuron j to neuron i
– u(t) = input stimulus at time t
Stimulus: ui(t) = Σ_j wij·xj(t). Response:

    yi(t) = f(urest + η(t − tf) + Σ_{τ=0..t} ε(t, ui(τ)))
    f(z) = ON if z ≥ Θ and dz/dt > 0, OFF otherwise

Information is carried in the firing pattern of the units: the timing of spike trains, the time to the first spike, the phase of the signal, and correlation and synchronicity.
[Figure: spiking neuron dynamics, showing y(t), the threshold Θ, and urest + η(t − tf) over time.]

Artificial Neural Networks
An artificial neural network is organized into an input layer, one or more hidden layers, and an output layer; layers may be fully or sparsely connected.

Feedforward ANN Architectures
– Information flow is unidirectional: the network is a static mapping y = f(x).
– Examples: Multi-Layer Perceptron (MLP), Radial Basis Function (RBF) networks, Kohonen Self-Organising Map (SOM)

Recurrent ANN Architectures
– Feedback connections provide dynamic memory: y(t+1) = f(x(τ), y(τ), s(τ)), τ ∈ {t, t−1, ...}
– Examples: Jordan/Elman ANNs, Hopfield networks, Adaptive Resonance Theory (ART)
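Before moving on to activation functions, a small sketch contrasts the hard and soft thresholds defined above, assuming urest = 0 and Θ = 0 for simplicity; the sample stimuli are illustrative.

```python
import math

THETA = 0.0  # threshold (illustrative)

def hard(z):
    """Binary neuron: ON (1) when z >= Theta, OFF (0) otherwise."""
    return 1 if z >= THETA else 0

def soft(z):
    """Analog neuron: f(z) = 2/(1 + exp(-z)) - 1, a sigmoid in (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-z)) - 1.0

for z in [-4.0, -1.0, 0.0, 1.0, 4.0]:
    print(f"z = {z:+.1f}   hard: {hard(z)}   soft: {soft(z):+.3f}")
```

The hard threshold jumps from 0 to 1 at Θ, while the soft threshold saturates smoothly at the off and on levels; that smoothness is what makes gradient-based learning possible for MLPs and RBF networks.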
Activation Functions
[Figure: plots of the three activation functions.]
– Linear: y = x
– Sigmoid: y = 1/(1 + e^(−x))
– Hyperbolic tangent: y = (e^x − e^(−x))/(e^x + e^(−x))

Neural Network Mathematics
A single neuron with inputs a and b computes y = f(w1·a + w2·b). Networks compose such units. Two standard families:
– Gaussian RBF network: with centres ck and widths ak, y = Σ_{k=1..n} wk·e^(−||x − ck||²/ak²)
– MLP network: a layered composition of sigmoid neurons (see Learning with General Optimization below)

ANN Capabilities
– Learning
– Approximate reasoning
– Generalisation capability
– Noise filtering
– Parallel processing
– Distributed knowledge base
– Fault tolerance

Properties of Neural Networks
Supervised networks are universal approximators.
Theorem: any bounded function can be approximated to arbitrary precision by a neural network with a finite number of hidden neurons.

Types of Approximators
– Linear approximators (e.g. polynomials): for a given precision, the number of parameters grows exponentially with the number of variables.
– Non-linear approximators (e.g. neural networks): the number of parameters grows linearly with the number of variables.
Remaining drawbacks: the knowledge base is not transparent (a black box, partially resolved), learning is sometimes difficult or slow, and storage capability is limited.

Learning in Biological Systems as Optimisation
Learning is learning by adaptation. A young animal learns that green fruits are sour while yellowish or reddish ones are sweet; the learning happens by adapting its fruit-picking behaviour, because the animal likes to eat energy-rich, juicy fruits that fill its stomach and make it feel happy. At the neural level, learning happens by changing synaptic strengths, eliminating some synapses, and building new ones. The objective of learning in biological organisms is to optimise the amount of available resources, or happiness, or in general to achieve a state closer to the optimum.

Learning Principle for Artificial Neural Networks
Maintaining synaptic strength costs energy, so strength should be maintained where it is needed and not where it is not: ENERGY MINIMIZATION. We need an appropriate definition of energy for artificial neural networks; having that, we can use mathematical optimisation techniques to find how to change the weights of the synaptic connections between neurons. Here, ENERGY = a measure of task performance error.

Neural Network Learning
– Supervised learning: classification, control, function approximation, associative memory
– Unsupervised learning: clustering
– Reinforcement learning: control

Unsupervised Learning
The ANN adapts its weights to cluster the input data.
– Hebbian learning: the stimulus-response connection is strengthened.
– Competitive learning algorithms (Kohonen and ART): the input weights are adjusted to resemble the stimulus.

Hebbian Learning
General formulation: dwij/dt = F(wij, yi, xj), where λ is the learning coefficient and wij is the connection from neuron xj to neuron yi.
– Hebbian: F = λ·yi·xj, so dwij/dt = λ·yi·xj
– Kohonen/competitive (ART): F = λ·yi·(xj − wij), so dwij/dt = λ·yi·(xj − wij)
The Hebb postulate (1949) describes correlation-based learning: connections between concurrently firing neurons are strengthened. It was experimentally verified in 1973. The sketch below shows both rules in action.
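A minimal sketch of the two update rules in discrete-time form (a Δw step instead of dw/dt). The learning coefficient λ, the inputs, and the initial weights below are illustrative values.

```python
def hebbian_update(w, y, x, lam=0.1):
    """dw_ij/dt = lambda * y_i * x_j (Euler step with dt = 1)."""
    return [[w[i][j] + lam * y[i] * x[j] for j in range(len(x))]
            for i in range(len(y))]

def competitive_update(w, y, x, lam=0.1):
    """dw_ij/dt = lambda * y_i * (x_j - w_ij): weights drift toward x."""
    return [[w[i][j] + lam * y[i] * (x[j] - w[i][j]) for j in range(len(x))]
            for i in range(len(y))]

w = [[0.5, 0.5]]            # one output neuron y1, two inputs x1, x2
x, y = [1.0, 0.5], [1.0]    # concurrent pre- and post-synaptic activity

print(hebbian_update(w, y, x))      # [[0.6, 0.55]]: correlated weights grow
print(competitive_update(w, y, x))  # [[0.55, 0.5]]: weights move toward x
```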
Supervised Learning
A teacher presents input-output pairs to the ANN, and the weights are adjusted according to the error.
– Iterative algorithms (e.g. the delta rule, the backpropagation rule)
– One-shot learning (Hopfield)
– The quality of the training examples is critical.

Delta Rule

    ei = di − yi,    Δwij = λ·ei·xj

where λ is the learning coefficient, wij is the connection from neuron xj to yi, x = (x1, ..., xn) is the ANN input, y = (y1, ..., yn) is the ANN output, d = (d1, ..., dn) is the desired output, (x, d) is a training example, and e is the ANN error.

Least Mean Squares
The Widrow-Hoff iterative delta rule performs gradient descent on the error surface and is guaranteed to find the minimum-error configuration in single-layer ANNs.

Gradient Learning
Error measure over N training examples:

    E(W) = (1/N) Σ_{t=1..N} (F(x^t; W) − y_t)²

Rule for changing the synaptic weights:

    Δwij = −η·∂E(W)/∂wij,    wij(k+1) = wij(k) + Δwij

where η is the learning parameter (usually a constant).

Learning with a Perceptron
A perceptron is able to learn a linear function.
– Perceptron: y_out = w^T·x = Σ_j wj·xj
– Data: (x^1, y_1), (x^2, y_2), ..., (x^N, y_N)
– Error: E(t) = (y_out(t) − y_t)² = (w(t)^T·x^t − y_t)²
– Learning: w(t+1) = w(t) − η·∂E(t)/∂w = w(t) − η·(w(t)^T·x^t − y_t)·x^t

Learning with RBF Neural Networks
An RBF neural network learns a nonlinear function.
– RBF network: y_out = F(x, W) = Σ_{k=1..n} wk·e^(−||x − ck||²/ak²)
– Data: (x^1, y_1), ..., (x^N, y_N)
– Error: E(t) = (y_out(t) − y_t)² = (Σ_k wk·e^(−||x^t − ck||²/ak²) − y_t)²
– Learning: wi(t+1) = wi(t) − η·∂E(t)/∂wi, with
    ∂E(t)/∂wi = 2·(F(x^t, W(t)) − y_t)·e^(−||x^t − ci||²/ai²)

Learning with General Optimization
MLP neural network with a single hidden layer:

    y_out = F(x, W) = Σ_k w2k · 1/(1 + e^(−w1k^T·x − ak))

Error: E = (y_out − y_t)². Write zk = w1k^T·x^t + ak and σ(z) = 1/(1 + e^(−z)), so that σ'(z) = e^(−z)/(1 + e^(−z))² = σ(z)(1 − σ(z)). Gradient descent adjusts both layers:
– Output-layer weights: w2k(t+1) = w2k(t) − η·∂E(t)/∂w2k, with
    ∂E(t)/∂w2k = 2·(F − y_t)·σ(zk)
– Hidden-layer weights: w1k,j(t+1) = w1k,j(t) − η·∂E(t)/∂w1k,j; by the chain rule, with ∂zk/∂w1k,j = x^t_j,
    ∂E(t)/∂w1k,j = 2·(F − y_t)·w2k·σ'(zk)·x^t_j
A runnable sketch of this procedure follows at the end of the section.

Reinforcement Learning
– Sequential tasks, where the desired action may not be known
– A critic evaluates the ANN's behaviour; the weights are adjusted according to the critic, which may require credit assignment.
– Population-based learning: evolutionary algorithms, swarming techniques, immune networks

Learning Summary
– Feedforward, unsupervised: Kohonen, Hebbian
– Feedforward, supervised: MLP, RBF
– Recurrent, unsupervised: ART
– Recurrent, supervised: Elman, Jordan, Hopfield
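To close, here is a sketch of the single-hidden-layer MLP trained by the gradient rules derived above. The XOR data set, the hidden-layer size, η, the epoch count, and the random seed are illustrative choices, not from the slides.

```python
import math, random

random.seed(0)

def sigma(z):
    """Logistic sigmoid; sigma'(z) = sigma(z) * (1 - sigma(z))."""
    return 1.0 / (1.0 + math.exp(-z))

K, n, eta = 4, 2, 0.5                    # hidden units, inputs, learning rate
w1 = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(K)]
a  = [random.uniform(-1, 1) for _ in range(K)]    # hidden biases a_k
w2 = [random.uniform(-1, 1) for _ in range(K)]    # output weights
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]   # XOR

for epoch in range(5000):
    for x, y_t in data:
        h = [sigma(sum(w1[k][j] * x[j] for j in range(n)) + a[k])
             for k in range(K)]                    # sigma(z_k)
        F = sum(w2[k] * h[k] for k in range(K))    # y_out = F(x, W)
        err = 2.0 * (F - y_t)                      # dE/dF
        for k in range(K):
            g2 = err * h[k]                        # dE/dw2_k
            gk = err * w2[k] * h[k] * (1.0 - h[k]) # dE/dz_k (chain rule)
            for j in range(n):
                w1[k][j] -= eta * gk * x[j]        # dE/dw1_kj = gk * x_j
            a[k]  -= eta * gk                      # dz_k/da_k = 1
            w2[k] -= eta * g2

for x, y_t in data:
    F = sum(w2[k] * sigma(sum(w1[k][j] * x[j] for j in range(n)) + a[k])
            for k in range(K))
    print(x, y_t, round(F, 2))   # outputs typically end up close to targets
```

Note how the hidden-layer gradient gk carries the factor w2k·σ'(zk): that is the backpropagation step, and stacking more hidden layers simply repeats it layer by layer.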