Unit III Deep Learning Chapter Notes


UNIT 3 NEURAL NETWORKS 9 hours

Basic concepts of artificial neurons, single and multi-layer perceptron, perceptron learning algorithm, its convergence proof, different activation functions, SoftMax cross entropy loss function.

Basic concepts of artificial neurons:

The term "Artificial neural network" refers to a biologically inspired sub-field of artificial
intelligence modeled after the brain. An Artificial neural network is usually a
computational network based on biological neural networks that construct the structure
of the human brain. Similar to a human brain has neurons interconnected to each other,
artificial neural networks also have neurons that are linked to each other in various
layers of the networks. These neurons are known as nodes.


What is Artificial Neural Network?


The term "Artificial Neural Network" is derived from Biological neural networks that
develop the structure of a human brain. Similar to the human brain that has neurons
interconnected to one another, artificial neural networks also have neurons that are
interconnected to one another in various layers of the networks. These neurons are
known as nodes.
The given figure illustrates the typical diagram of a biological neural network.

A typical artificial neural network looks something like the given figure.

Dendrites from the biological neural network represent inputs in artificial neural networks, the cell nucleus represents nodes, synapses represent weights, and the axon represents the output.

Relationship between the biological neural network and the artificial neural network:

Biological Neural Network     Artificial Neural Network
Dendrites                     Inputs
Cell nucleus                  Nodes
Synapse                       Weights
Axon                          Output

An artificial neural network, in the field of artificial intelligence, attempts to mimic the network of neurons that makes up the human brain, so that computers can understand things and make decisions in a human-like manner. The artificial neural network is designed by programming computers to behave simply like interconnected brain cells.

There are around 100 billion neurons in the human brain. Each neuron has somewhere between 1,000 and 100,000 connection points. In the human brain, data is stored in a distributed manner, and we can extract more than one piece of this data from our memory in parallel when necessary. We can say that the human brain is made up of incredibly powerful parallel processors.

We can understand the artificial neural network with an example. Consider a digital logic gate that takes an input and gives an output: an "OR" gate takes two inputs; if one or both inputs are "On," the output is "On"; if both inputs are "Off," the output is "Off." Here the output depends entirely on the input. Our brain does not perform the same task: the relationship between outputs and inputs keeps changing, because the neurons in our brain are "learning."

The architecture of an artificial neural network:

To understand the architecture of an artificial neural network, we have to understand what a neural network consists of: a large number of artificial neurons, termed units, arranged in a sequence of layers. Let us look at the various types of layers available in an artificial neural network.

Artificial Neural Network primarily consists of three layers:

Input Layer:
As the name suggests, it accepts inputs in several different formats provided by the
programmer.

Hidden Layer:

The hidden layer sits between the input and output layers. It performs all the computations to find hidden features and patterns.

Output Layer:

The input goes through a series of transformations via the hidden layers, which finally results in the output that is conveyed through this layer.

The artificial neural network takes the inputs, computes the weighted sum of the inputs, and adds a bias: z = Σ (wi · xi) + b. This computation is represented in the form of a transfer function.


The weighted total is then passed as input to an activation function to produce the output. Activation functions decide whether a node should fire or not; only those that fire make it to the output layer. There are distinctive activation functions available that can be applied depending on the sort of task we are performing; a small sketch follows below.
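As an illustration (plain Python with NumPy; this sketch is my own and not from the original notes), a single artificial neuron computes a weighted sum plus bias and passes it through a step activation. With the hand-picked weights below it reproduces the OR gate discussed earlier:

import numpy as np

def step(z):
    # Step activation: fire (1) if the net input is non-negative, else 0
    return 1 if z >= 0 else 0

def neuron(x, w, b):
    # Weighted sum of the inputs plus bias, passed through the activation
    return step(np.dot(w, x) + b)

# Hand-picked weights and bias that implement the OR gate
w = np.array([1.0, 1.0])
b = -0.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "->", neuron(np.array(x), w, b))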

Advantages of Artificial Neural Network (ANN)

Parallel processing capability:

Artificial neural networks can perform more than one task simultaneously.

Storing data on the entire network:

Unlike traditional programming, where data is stored in a database, here data is stored on the entire network. The disappearance of a few pieces of data in one place does not prevent the network from working.

Capability to work with incomplete knowledge:

After training, an ANN may produce output even with inadequate data. The loss of performance here depends on the significance of the missing data.

Having a memory distribution:

For an ANN to be able to adapt, it is important to determine the examples and to train the network according to the desired output by demonstrating these examples to it. The success of the network is directly proportional to the chosen instances; if the event cannot be shown to the network in all its aspects, the network can produce false output.

Having fault tolerance:

Corruption of one or more cells of an ANN does not prevent it from generating output; this feature makes the network fault-tolerant.

Disadvantages of Artificial Neural Network:


Assurance of proper network structure:

There is no particular guideline for determining the structure of artificial neural networks. The appropriate network structure is arrived at through experience and trial and error.

Unrecognized behavior of the network:

This is the most significant issue with ANNs. When an ANN produces a solution, it does not provide insight concerning why and how, which decreases trust in the network.

Hardware dependence:

Artificial neural networks need processors with parallel processing power, in accordance with their structure; hence, their realization depends on suitable hardware.

Difficulty of showing the issue to the network:

ANNs can work only with numerical data. Problems must be converted into numerical values before being introduced to the ANN. The representation mechanism chosen here directly impacts the performance of the network, and it depends on the user's ability.

The duration of the network is unknown:

Training is stopped when the error on the network is reduced to a specific value, and this value does not guarantee optimum results.

How do artificial neural networks work?


An artificial neural network can be best represented as a weighted directed graph, where the artificial neurons form the nodes and the connections between neuron outputs and neuron inputs can be viewed as directed edges with weights. The artificial neural network receives its input signal from an external source in the form of a pattern or an image, as a vector. These inputs are mathematically denoted by x(n) for every n-th input.

Afterward, each input is multiplied by its corresponding weight (these weights are the details utilized by the artificial neural network to solve a specific problem). In general terms, these weights represent the strength of the interconnections between neurons inside the network. All the weighted inputs are summed inside the computing unit.

If the weighted sum is equal to zero, a bias is added to make the output non-zero, or else to scale up the system's response. The bias can be viewed as an extra input fixed at 1 with its own weight. The total of the weighted inputs can range from 0 to positive infinity. To keep the response within the limits of the desired value, a certain maximum value is benchmarked, and the total of the weighted inputs is passed through an activation function.

The activation function refers to the set of transfer functions used to achieve the desired output. There are different kinds of activation functions, primarily linear or non-linear sets of functions. Some of the commonly used activation functions are the binary, linear, and tan hyperbolic sigmoidal activation functions. Let us take a look at each of them in detail:

Binary:
In a binary activation function, the output is either a one or a zero. To accomplish this, a threshold value is set up: if the net weighted input of the neuron is greater than the threshold, the final output of the activation function is returned as one; otherwise the output is returned as zero.

Sigmoidal Hyperbolic:
The sigmoidal hyperbolic function is generally seen as an "S"-shaped curve. Here the tan hyperbolic function is used to approximate the output from the actual net input. The function is defined as:

F(x) = 1 / (1 + exp(-βx))

where β is the steepness parameter.
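As a concrete sketch (plain Python with NumPy; this code is an illustration, not from the original notes), the activation functions mentioned above can be written as follows, with the steepness parameter beta defaulting to 1:

import numpy as np

def binary_step(x, threshold=0.0):
    # Returns 1 where the net input exceeds the threshold, else 0
    return np.where(x > threshold, 1, 0)

def sigmoid(x, beta=1.0):
    # Logistic sigmoid with steepness beta: 1 / (1 + exp(-beta * x))
    return 1.0 / (1.0 + np.exp(-beta * x))

def tanh(x):
    # Hyperbolic tangent squashes the input into (-1, 1)
    return np.tanh(x)

x = np.linspace(-3, 3, 7)
print(binary_step(x))
print(sigmoid(x))
print(tanh(x))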

Types of Artificial Neural Network:

There are various types of artificial neural networks (ANNs), which perform tasks in a way modeled on the neurons and network functions of the human brain. Most artificial neural networks bear some similarity to their more complex biological counterparts and are very effective at their intended tasks, for example, segmentation or classification.

Feedback ANN:
In this type of ANN, the output is returned into the network to achieve the best-evolved results internally. According to the University of Massachusetts Lowell Center for Atmospheric Research, feedback networks feed information back into themselves and are well suited to solving optimization problems. Internal system error corrections utilize feedback ANNs.

Feed-Forward ANN:
A feed-forward network is a basic neural network comprising an input layer, an output layer, and at least one layer of neurons. Through assessment of its output by reviewing its input, the strength of the network can be observed based on the group behavior of the associated neurons, and the output is decided. The primary advantage of this network is that it learns to evaluate and recognize input patterns. (A small sketch of a feed-forward pass is given below.)
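Here is a minimal sketch of a feed-forward pass (plain NumPy; the layer sizes and random weights are my own illustrative choices, not from the original text):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer of 3 neurons, one output neuron (sizes are arbitrary)
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)

x = np.array([0.5, -1.0])            # input pattern
hidden = sigmoid(W1 @ x + b1)        # group activity of the hidden neurons
output = sigmoid(W2 @ hidden + b2)   # the decided output
print(output)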


Single Layer Perceptron in TensorFlow


The perceptron is a single processing unit of a neural network. It was first proposed by Frank Rosenblatt in 1958 as a simple neuron used to classify its input into one of two categories. The perceptron is a linear classifier and is used in supervised learning; it helps to classify the given input data.

A perceptron is a neural network unit that performs a precise computation to detect features in the input data. The perceptron is mainly used to classify data into two parts; therefore, it is also known as a linear binary classifier.
The perceptron uses a step function that returns +1 if the weighted sum of its inputs is greater than or equal to 0, and -1 otherwise.

The activation function is used to map the input to the required interval, such as (0, 1) or (-1, 1).


A regular neural network looks like this:


The perceptron consists of 4 parts:
o Input values or one input layer: The input layer of the perceptron is made of artificial input neurons and takes the initial data into the system for further processing.
o Weights and bias:
Weight: represents the dimension or strength of the connection between units. If the weight from node 1 to node 2 is larger, then neuron 1 has a greater influence on neuron 2.
Bias: the same as the intercept added in a linear equation; an additional parameter whose task is to shift the output along with the weighted sum of the inputs to the next neuron.
o Net sum: calculates the total weighted sum.
o Activation function: determines whether a neuron is activated or not. The activation function takes the weighted sum, adds the bias to it, and produces the result.
A standard neural network looks like the below diagram.

How does it work?


The perceptron works in the simple steps given below:

a. In the first step, all the inputs x are multiplied by their weights w.
b. In the second step, all the multiplied values are added together; this is called the weighted sum.
c. In the last step, the weighted sum is applied to the appropriate activation function.

For example, a unit step activation function, as in the sketch below.
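A minimal sketch of these three steps with a unit step activation (plain NumPy; the names and example values are my own, not from the original notes):

import numpy as np

def unit_step(z):
    # Unit step activation: +1 if the weighted sum is non-negative, else -1
    return 1 if z >= 0 else -1

def perceptron_output(x, w, b):
    # Step a: multiply inputs by weights; step b: sum them (plus bias);
    # step c: apply the activation function
    return unit_step(np.dot(w, x) + b)

print(perceptron_output(np.array([1.0, 0.5]), np.array([0.4, -0.2]), b=0.1))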

There are two types of architecture. These types focus on the functionality of artificial
neural networks as follows-

o Single Layer Perceptron


o Multi-Layer Perceptron

Single Layer Perceptron


The single-layer perceptron was the first neural network model, proposed by Frank Rosenblatt in 1958. It is one of the earliest models for learning. Our goal is to find a linear decision function determined by the weight vector w and the bias parameter b.

To understand the perceptron layer, it is necessary to comprehend artificial neural networks (ANNs).

The artificial neural network (ANN) is an information processing system whose mechanism is inspired by the functionality of biological neural circuits. An artificial neural network consists of several processing units that are interconnected.

This was the first proposal when the neural model was built. The content of the neuron's local memory is a vector of weights. The output of the perceptron is calculated as the sum of the input vector's elements, each multiplied by the corresponding element of the weight vector; the value displayed at the output is the input to an activation function.

Let us focus on the implementation of a single-layer perceptron for an image classification problem using TensorFlow. The best example for illustrating the single-layer perceptron is the representation of "logistic regression."

Now, we have to perform the following necessary steps for training logistic regression:


o The weights are initialized with random values at the start of each training run.
o For each element of the training set, the error is calculated as the difference between the desired output and the actual output. The calculated error is used to adjust the weights.
o The process is repeated until the error made on the entire training set is less than a specified limit, or until the maximum number of iterations has been reached. (A minimal sketch of this kind of error-driven update, the classic perceptron learning rule, follows.)
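The sketch below (plain NumPy, my own illustration rather than part of the original tutorial) shows the perceptron learning rule implementing these steps; it learns the OR function used earlier:

import numpy as np

def train_perceptron(X, d, lr=0.1, max_epochs=100):
    # X: (n_samples, n_features) inputs; d: desired outputs in {0, 1}
    rng = np.random.default_rng(0)
    w = rng.normal(size=X.shape[1])   # random initial weights
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, target in zip(X, d):
            y = 1 if np.dot(w, x) + b >= 0 else 0   # actual output
            err = target - y                        # desired - actual
            if err != 0:
                w += lr * err * x                   # adjust the weights
                b += lr * err
                errors += 1
        if errors == 0:   # error on the whole training set is below the limit
            break
    return w, b

# Learns the OR function from the earlier example
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
w, b = train_perceptron(X, np.array([0, 1, 1, 1]))
print(w, b)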

Complete code of the single-layer perceptron:

# Import the MNIST dataset
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

import tensorflow as tf
import matplotlib.pyplot as plt

# Parameters
learning_rate = 0.01
training_epochs = 25
batch_size = 100
display_step = 1

# tf Graph input
x = tf.placeholder("float", [None, 784])  # MNIST data image of shape 28*28 = 784
y = tf.placeholder("float", [None, 10])   # 0-9 digit recognition => 10 classes

# Create model: set model weights
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

# Construct the model (softmax activation)
activation = tf.nn.softmax(tf.matmul(x, W) + b)

# Minimize error using cross entropy
cross_entropy = y * tf.log(activation)
cost = tf.reduce_mean(-tf.reduce_sum(cross_entropy, reduction_indices=1))
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

# Plot settings
avg_set = []
epoch_set = []

# Initialize the variables
init = tf.global_variables_initializer()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    # Training cycle over the dataset
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples / batch_size)

        # Loop over all batches
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            # Fit training using the batch data
            sess.run(optimizer, feed_dict={x: batch_xs, y: batch_ys})
            # Compute the average loss
            avg_cost += sess.run(cost, feed_dict={x: batch_xs, y: batch_ys}) / total_batch

        # Display logs at each epoch step
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch + 1), "cost=", "{:.9f}".format(avg_cost))
        avg_set.append(avg_cost)
        epoch_set.append(epoch + 1)

    print("Training phase finished")

    plt.plot(epoch_set, avg_set, 'o', label='Logistic Regression Training')
    plt.ylabel('cost')
    plt.xlabel('epoch')
    plt.legend()
    plt.show()

    # Test the model
    correct_prediction = tf.equal(tf.argmax(activation, 1), tf.argmax(y, 1))

    # Calculate the accuracy on the test set
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print("Model accuracy:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))

The output of the code: the average training cost at each epoch, the cost-versus-epoch plot, and finally the model accuracy on the test set.


Logistic regression is considered a form of predictive analysis. It is mainly used to describe data and to explain the relationship between one dependent binary variable and one or more nominal or independent variables.
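Since the syllabus highlights the softmax cross-entropy loss used in the code above, here is a small NumPy sketch (my own illustration, not from the original notes) of how the softmax output and the cross-entropy cost are computed:

import numpy as np

def softmax(z):
    # Subtract the row-wise max for numerical stability; rows are examples
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(p, y):
    # y is one-hot; average of -sum(y * log p) over the batch, matching
    # cost = reduce_mean(-reduce_sum(y * log(activation))) in the code above
    return np.mean(-np.sum(y * np.log(p + 1e-12), axis=1))

logits = np.array([[2.0, 0.5, -1.0]])
targets = np.array([[1.0, 0.0, 0.0]])
print(cross_entropy(softmax(logits), targets))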
Multi-layer Perceptron in TensorFlow

The multi-layer perceptron defines the most complex architecture of artificial neural networks. It is substantially formed from multiple layers of perceptrons. TensorFlow is a very popular deep learning framework released by Google, and this section will guide you in building a neural network with this library. (To understand what a multi-layer perceptron really is, it is also instructive to develop one from scratch using NumPy.)

The pictorial representation of multi-layer perceptron learning is as shown below-

MLP networks are used for supervised learning. A typical learning algorithm for MLP networks is the backpropagation algorithm.

A multi-layer perceptron (MLP) is a feed-forward artificial neural network that generates a set of outputs from a set of inputs. An MLP is characterized by several layers of nodes connected as a directed graph between the input and output layers. The MLP uses backpropagation for training the network, and it is a deep learning method.


Now, we focus on the implementation of an MLP for an image classification problem.
# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

import tensorflow as tf
import matplotlib.pyplot as plt

# Parameters
learning_rate = 0.001
training_epochs = 20
batch_size = 100
display_step = 1

# Network parameters
n_hidden_1 = 256  # 1st layer num features
n_hidden_2 = 256  # 2nd layer num features
n_input = 784     # MNIST data input (img shape: 28*28)
n_classes = 10    # MNIST total classes (0-9 digits)

# tf Graph input
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])

# Weights and bias, layer 1
h = tf.Variable(tf.random_normal([n_input, n_hidden_1]))
bias_layer_1 = tf.Variable(tf.random_normal([n_hidden_1]))
layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x, h), bias_layer_1))

# Weights and bias, layer 2
w = tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2]))
bias_layer_2 = tf.Variable(tf.random_normal([n_hidden_2]))
layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, w), bias_layer_2))

# Weights and bias, output layer
output = tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
bias_output = tf.Variable(tf.random_normal([n_classes]))
output_layer = tf.matmul(layer_2, output) + bias_output

# Cost function
cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=output_layer, labels=y))

# Optimizer
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
# optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

# Plot settings
avg_set = []
epoch_set = []

# Initialize the variables
init = tf.global_variables_initializer()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples / batch_size)

        # Loop over all batches
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            # Fit training using batch data
            sess.run(optimizer, feed_dict={x: batch_xs, y: batch_ys})
            # Compute average loss
            avg_cost += sess.run(cost, feed_dict={x: batch_xs, y: batch_ys}) / total_batch

        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch + 1), "cost=", "{:.9f}".format(avg_cost))
        avg_set.append(avg_cost)
        epoch_set.append(epoch + 1)

    print("Training phase finished")

    plt.plot(epoch_set, avg_set, 'o', label='MLP Training phase')
    plt.ylabel('cost')
    plt.xlabel('epoch')
    plt.legend()
    plt.show()

    # Test model
    correct_prediction = tf.equal(tf.argmax(output_layer, 1), tf.argmax(y, 1))

    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print("Model Accuracy:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))

The above code prints the average cost at each epoch, shows the training plot, and reports the final model accuracy.

Creating an interactive session

We have two basic options when using TensorFlow to run our code:

o Build graphs and run sessions: do all the set-up first, then execute a session to evaluate tensors and run operations.
o Create our code and run it on the fly.

For this first part, we will use the interactive session, which is more suitable for an environment like a Jupyter notebook.

sess = tf.InteractiveSession()

Creating placeholders
It is best practice to create placeholders before variable assignments when using TensorFlow. Here we will create placeholders for the inputs ("X") and outputs ("Y").

Placeholder "X": represents the 'space' allocated for the input, i.e., the images.

o Each input has 784 pixels, distributed as a 28-width by 28-height matrix.
o The 'shape' argument defines the tensor size by its dimensions (a small sketch follows below).
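Here is a minimal sketch of these placeholders (my own illustration based on the text; it uses the same TF1 placeholder API as the code above):

import tensorflow as tf

# 'None' leaves the batch dimension free to hold any number of images;
# 784 = 28 * 28 pixels per flattened MNIST image
X = tf.placeholder(tf.float32, shape=[None, 784])
# 10 output classes, one per digit (0-9)
Y = tf.placeholder(tf.float32, shape=[None, 10])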
