
Intelligent Systems

Artificial Neural Networks

Slides are by: Tan, Steinbach, Karpatne, Kumar



Machine Learning Problem

• Non-linear classification: when there are only 2 features, a complex
  non-linear hypothesis (decision boundary) can still be constructed directly.

• With a large number of features, e.g. n = 100, this becomes impractical:
  including all quadratic feature terms has complexity O(n^2) ~ n^2/2,
  i.e. about 5000 features for n = 100.


Machine Learning Problem

• What is this?
• You see this:


Computer Vision: Car Detection

• Car detection: training on labeled pictures



Neuron in the Brain

(Diagram of a biological neuron, with its input and output signals labeled.)


Artificial Neural Networks (ANN)

• Three inputs X1, X2, X3 feed a black box that produces the output Y:

    X1  X2  X3   Y
     1   0   0  -1
     1   0   1   1
     1   1   0   1
     1   1   1   1
     0   0   1  -1
     0   1   0  -1
     0   1   1   1
     0   0   0  -1

• Output Y is 1 if at least two of the three inputs are equal to 1.


Artificial Neural Networks (ANN)

• The same black box modeled as a single output node: each input Xi is
  connected to the output node by a link with weight 0.3, and a bias unit
  X0 = 1 is connected with weight w0 = -0.4 (equivalently, the output node
  uses the threshold t = 0.4).  The truth table is the same as on the
  previous slide.

    h(x) = sign( 0.3 X1 + 0.3 X2 + 0.3 X3 - 0.4 )

    where sign(x) = +1 if x >= 0, and -1 if x < 0


Artificial Neural Networks (ANN)

• Model is an assembly of inter-connected nodes and weighted links
  (the Perceptron Model): input nodes X1, ..., Xd are connected to an
  output node by links with weights w1, ..., wd.

• The output node sums up each of its input values according to the
  weights of its links and compares the sum against a threshold t:

    h(x) = sign( Σ_{i=1..d} wi Xi + w0 X0 )
         = sign( Σ_{i=0..d} wi Xi ),   where X0 = 1 and w0 = -t
Perceptron

• Single layer network
  – Contains only input and output nodes

• Activation function: h(x) = sign(w · x)

• Applying the model is straightforward:

    h(x) = sign( 0.3 X1 + 0.3 X2 + 0.3 X3 - 0.4 )

    where sign(x) = +1 if x >= 0, and -1 if x < 0

  – Example: X1 = 1, X2 = 0, X3 = 1  =>  y = sign(0.2) = 1
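As a minimal sketch (not part of the original slides; the function names are
illustrative), the decision function above can be evaluated in Python with the
weights 0.3, 0.3, 0.3 and bias -0.4 taken from the example:

```python
# Sketch of applying the perceptron from the example above.
# Weights (0.3, 0.3, 0.3) and bias -0.4 come from the slide;
# the function names are illustrative.

def sign(v):
    """sign(v) = +1 if v >= 0, -1 otherwise (as defined on the slide)."""
    return 1 if v >= 0 else -1

def perceptron_predict(x, w=(0.3, 0.3, 0.3), bias=-0.4):
    """h(x) = sign(w . x + bias)."""
    return sign(sum(wi * xi for wi, xi in zip(w, x)) + bias)

# Worked example from the slide: X1 = 1, X2 = 0, X3 = 1 -> sign(0.2) = 1
print(perceptron_predict((1, 0, 1)))   # 1

# The model reproduces the "at least two inputs equal to 1" truth table.
for x in [(1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1),
          (0, 0, 1), (0, 1, 0), (0, 1, 1), (0, 0, 0)]:
    print(x, perceptron_predict(x))
```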
Perceptron Learning Rule

• Initialize the weights (w0, w1, ..., wd)

• Repeat
  – For each training example (xi, yi)
    • Compute f(w(k), xi)
    • Update the weights:

        w(k+1) = w(k) + λ ( yi - f(w(k), xi) ) xi

• Until stopping condition is met
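A minimal Python sketch of this rule (illustrative, not the textbook's code).
It uses exact rational arithmetic so that ties at the decision boundary are not
disturbed by floating-point rounding; with λ = 0.1 and the three-input data from
the earlier truth table it should reproduce the epoch-by-epoch weights shown on
the "Example of Perceptron Learning" slide, ending at (-0.6, 0.4, 0.4, 0.2).

```python
# Illustrative sketch of the perceptron learning rule above (not the
# textbook's code).  Exact rational arithmetic (Fraction) keeps the
# weight trace free of floating-point rounding at ties.
from fractions import Fraction

def sign(v):
    # sign as defined earlier: +1 if v >= 0, -1 otherwise
    return 1 if v >= 0 else -1

def f(w, x):
    """Perceptron output f(w, x) = sign(w . x); x[0] is the bias input 1."""
    return sign(sum(wi * xi for wi, xi in zip(w, x)))

def train_perceptron(data, lam, epochs):
    d = len(data[0][0])
    w = [Fraction(0)] * d                     # initialize the weights
    for _ in range(epochs):                   # repeat ...
        for x, y in data:                     # for each training example
            err = y - f(w, x)                 # error term y - f(w, x)
            # w(k+1) = w(k) + lambda * (y - f(w(k), x)) * x
            w = [wi + lam * err * xi for wi, xi in zip(w, x)]
    return w                                  # ... until stopping condition

# Data from the earlier truth table, with a bias input x0 = 1 prepended.
data = [([1, 1, 0, 0], -1), ([1, 1, 0, 1],  1), ([1, 1, 1, 0],  1),
        ([1, 1, 1, 1],  1), ([1, 0, 0, 1], -1), ([1, 0, 1, 0], -1),
        ([1, 0, 1, 1],  1), ([1, 0, 0, 0], -1)]

w = train_perceptron(data, lam=Fraction(1, 10), epochs=6)
print([float(wi) for wi in w])   # [-0.6, 0.4, 0.4, 0.2]
```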



Perceptron Learning Rule

• Weight update formula:

    w(k+1) = w(k) + λ ( yi - f(w(k), xi) ) xi ,    λ : learning rate

• Intuition:
  – Update the weight based on the error  e = yi - f(w(k), xi)
  – If y = f(x, w), e = 0: no update needed
  – If y > f(x, w), e = 2: the weight must be increased so that
    f(x, w) will increase
  – If y < f(x, w), e = -2: the weight must be decreased so that
    f(x, w) will decrease
Example of Perceptron Learning

    w(k+1) = w(k) + λ ( yi - f(w(k), xi) ) xi
    Y = sign( Σ_{i=0..d} wi Xi ),    λ = 0.1

• Training data:

    X1  X2  X3   Y
     1   0   0  -1
     1   0   1   1
     1   1   0   1
     1   1   1   1
     0   0   1  -1
     0   1   0  -1
     0   1   1   1
     0   0   0  -1

• Weights after each update during the first epoch:

    Step   w0    w1    w2    w3
     0      0     0     0     0
     1    -0.2  -0.2    0     0
     2      0     0     0    0.2
     3      0     0     0    0.2
     4      0     0     0    0.2
     5    -0.2    0     0     0
     6    -0.2    0     0     0
     7      0     0    0.2   0.2
     8    -0.2    0    0.2   0.2

• Weights at the end of each epoch:

    Epoch   w0    w1    w2    w3
     0       0     0     0     0
     1     -0.2    0    0.2   0.2
     2     -0.2    0    0.4   0.2
     3     -0.4    0    0.4   0.2
     4     -0.4   0.2   0.4   0.4
     5     -0.6   0.2   0.4   0.2
     6     -0.6   0.4   0.4   0.2


Perceptron Learning Rule

• Since f(w, x) is a linear combination of the input variables, the
  decision boundary is linear.

• For nonlinearly separable problems, the perceptron learning algorithm
  will fail because no linear hyperplane can separate the data perfectly.



General Structure of ANN

(Diagram: a feed-forward network with an input layer (x1, ..., x5), a
hidden layer, and an output layer.  Each neuron i receives inputs I1, I2,
I3 over links with weights wi1, wi2, wi3, computes the weighted sum Si,
and applies an activation function g(Si) with threshold t to produce its
output Oi.)

• Training an ANN means learning the weights of the neurons.


Nonlinearly Separable Data

• XOR data:  y = x1 ⊕ x2

    x1  x2   y
     0   0  -1
     1   0   1
     0   1   1
     1   1  -1


Multilayer Neural Network

• An artificial neural network has a more complex structure than that of
  a perceptron model.
  – Hidden layers: intermediary layers between the input and output layers.
  – The network may use types of activation functions other than the
    sign function.
  – Examples of other activation functions include the sigmoid (logistic)
    and hyperbolic tangent functions.
  – These activation functions allow the hidden and output nodes to
    produce output values that are nonlinear in their input parameters.
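As a small illustration (not from the slides), the activation functions
mentioned above can be written as:

```python
# Illustrative definitions of the activation functions mentioned above.
import math

def sign(v):
    return 1 if v >= 0 else -1            # used by the perceptron

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))     # logistic function, output in (0, 1)

def tanh(v):
    return math.tanh(v)                   # hyperbolic tangent, output in (-1, 1)
```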



Artificial Neural Networks (ANN)

• Various types of neural network topology:
  – single-layered network (perceptron) versus multi-layered network
  – feed-forward versus recurrent network

• Various types of activation functions (f):

    h(x) = f( Σ_i wi Xi )


Multi-layer Neural Network

• A multi-layer neural network can solve any type of classification task
  involving nonlinear decision surfaces.

• (Figure: the XOR data are separated by combining two hyperplanes; the
  hidden and output nodes apply a sigmoid function σ.)
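A minimal sketch of such a two-layer sigmoid network (the weights below are
hand-chosen for illustration and are not from the slides): the two hidden units
compute approximately OR and AND of the inputs, and the output unit combines
them into XOR.

```python
# Hand-chosen weights (illustrative) for a two-layer sigmoid network
# that computes XOR: h1 ~ x1 OR x2, h2 ~ x1 AND x2, y ~ h1 AND NOT h2.
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def xor_net(x1, x2):
    h1 = sigmoid(20 * x1 + 20 * x2 - 10)   # ~ x1 OR x2
    h2 = sigmoid(20 * x1 + 20 * x2 - 30)   # ~ x1 AND x2
    y  = sigmoid(20 * h1 - 20 * h2 - 10)   # ~ h1 AND (NOT h2) = x1 XOR x2
    return y

for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(x1, x2, round(xor_net(x1, x2)))  # 0, 1, 1, 0
```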



Learning Multi-layer Neural Network

• Can we apply the perceptron learning rule to each node, including the
  hidden nodes?
  – The perceptron learning rule computes the error term e = y - f(w, x)
    and updates the weights accordingly.
  – Problem: how do we determine the true value of y for hidden nodes?
  – Approximate the error in hidden nodes by the error in the output nodes.
  – Problems:
    • It is not clear how adjustments in the hidden nodes affect the
      overall error.
    • There is no guarantee of convergence to an optimal solution.



Learning the ANN model (Multilayer)

• The goal of the ANN learning algorithm is to determine a set of
  weights w that minimizes the total sum of squared errors:

    E(w) = (1/2) Σ_{i=1..N} ( yi - ŷi )^2

• The sum of squared errors depends on w because the predicted class ŷ
  is a function of the weights assigned to the hidden and output nodes.

• A simple (quadratic) error surface is encountered when ŷ is a linear
  function of its parameters w, i.e. ŷ = w · x: the error function is
  then quadratic in its parameters and a global minimum solution can be
  easily found.
Learning the ANN model (Multilayer)

• In most cases, the output of an ANN is a nonlinear function of its
  parameters because of the choice of its activation functions (e.g.,
  sigmoid or tanh function).

• As a result, it is no longer straightforward to derive a solution for
  w that is guaranteed to be globally optimal.

• Greedy algorithms such as those based on the gradient descent method
  have been developed to efficiently solve the optimization problem.



Learning the ANN model (Multilayer)

• The weight update formula used by the gradient descent method can be
  written as follows:

    wj  <-  wj - λ ∂E(w)/∂wj

  where λ is the learning rate.

• The second term states that the weight should be adjusted in a
  direction that reduces the overall error term.

• For hidden nodes, the computation is not trivial because it is
  difficult to assess their error term ∂E/∂wj without knowing what their
  output values should be.

• A technique known as back-propagation has been developed to address
  this problem.
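A compact sketch of gradient descent with back-propagation for a network with
one hidden layer of sigmoid units and squared-error loss (an illustrative
implementation, not the textbook's; the network size, learning rate, and epoch
count are arbitrary choices):

```python
# Illustrative back-propagation for a network with one hidden layer of
# sigmoid units and a single sigmoid output, trained by gradient descent
# on the squared error E = 1/2 * (y - yhat)^2.
import math, random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def train(data, n_hidden=2, lam=0.5, epochs=5000, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # small random initial weights; the last entry of each row is the bias
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, y in data:
            # forward pass
            h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + row[-1])
                 for row in W1]
            yhat = sigmoid(sum(w * hi for w, hi in zip(W2, h)) + W2[-1])
            # backward pass: hidden error terms are obtained by propagating
            # the output error back through the output weights
            d_out = (yhat - y) * yhat * (1 - yhat)
            d_hid = [d_out * W2[j] * h[j] * (1 - h[j]) for j in range(n_hidden)]
            # gradient descent update: w <- w - lam * dE/dw
            for j in range(n_hidden):
                W2[j] -= lam * d_out * h[j]
            W2[-1] -= lam * d_out
            for j in range(n_hidden):
                for i in range(n_in):
                    W1[j][i] -= lam * d_hid[j] * x[i]
                W1[j][-1] -= lam * d_hid[j]
    return W1, W2

# Example: fit the XOR data (targets coded 0/1 for the sigmoid output).
# Depending on the random initialization, gradient descent may end up in
# a local minimum, as noted later in these slides.
xor = [([0, 0], 0), ([1, 0], 1), ([0, 1], 1), ([1, 1], 0)]
W1, W2 = train(xor, n_hidden=2, lam=0.5, epochs=5000)
```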



Design Issues in ANN

• Number of nodes in the input layer:
  – One input node per binary/continuous attribute
  – k or log2(k) nodes for each categorical attribute with k values
• Number of nodes in the output layer:
  – One output node for a binary class problem
  – k nodes for a k-class problem
• Number of nodes in the hidden layer
• Initial weights and biases: random assignment is usually acceptable.
• Training examples with missing values should be removed or replaced
  with their most likely values.



Design Issues in ANN

• Number of nodes in the hidden layer:
  – Start from a fully connected network with a sufficiently large
    number of nodes and hidden layers, and then repeat the
    model-building procedure with a smaller number of nodes.
  – Alternatively, instead of repeating the model-building procedure, we
    could remove some of the nodes and repeat the model evaluation
    procedure to select the right model complexity.
• Initial weights and biases: random assignment
• Training examples with missing values should be removed or replaced
  with their most likely values.



Characteristics of ANN

• Multilayer ANNs are universal approximators, but they can suffer from
  overfitting if the network is too large.
• Gradient descent may converge to a local minimum.  One way to escape
  from a local minimum is to add a momentum term to the weight update
  formula (see the sketch after this list).
• Model building can be very time consuming, but testing can be very fast.
• Redundant attributes can be handled because the weights are learned
  automatically.
• Sensitive to noise in the training data.
• Difficult to handle missing attributes.
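As an illustration of the momentum idea mentioned above (a common formulation,
not spelled out on the slides; λ is the learning rate and μ the momentum factor):

```python
# Illustrative gradient-descent update with a momentum term: a fraction
# mu of the previous weight change is carried over into the current one,
# which can help the search move through shallow local minima.
def momentum_update(w, grad, prev_delta, lam=0.1, mu=0.9):
    """w, grad, prev_delta are lists of equal length; returns (new_w, delta)."""
    delta = [-lam * g + mu * d for g, d in zip(grad, prev_delta)]
    new_w = [wi + di for wi, di in zip(w, delta)]
    return new_w, delta   # keep delta for the next iteration
```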

