
ANN

IMPORTANT QUESTIONS - CAE-III

1. What is the difference between a Perceptron and a Multilayer Perceptron?

→ Single-Layer Perceptron

A single-layer perceptron has just two layers: input and output. It has only a single layer of weights, hence the name single-layer perceptron. It does not contain hidden layers, unlike the multilayer perceptron.

Input nodes are fully connected to a node or multiple nodes in the next layer. A node in the next layer takes a weighted sum of all its inputs.
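A minimal sketch of the single-layer perceptron described above: the output node takes a weighted sum of all inputs and applies a step function. The weight values, bias, and inputs are illustrative assumptions, not values from the source.

import numpy as np

x = np.array([1.0, 0.5, -0.2])         # input nodes
w = np.array([0.4, -0.6, 0.9])         # one weight per input connection
b = 0.1                                # bias term

weighted_sum = np.dot(w, x) + b        # the node takes a weighted sum of its inputs
output = 1 if weighted_sum > 0 else 0  # step activation of the single output node
print(output)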

Multi-Layer Perceptron (MLP)

A multilayer perceptron is a type of feed-forward artificial neural network that generates a set of outputs from a set of inputs. An MLP connects multiple layers in a directed graph, which means that the signal path through the nodes only goes one way. The MLP network consists of input, output, and hidden layers. Each hidden layer consists of numerous perceptrons, which are called hidden units.
2. What are the different parts of the multilayer perceptron model?
→ A multi-layer perceptron is a neural network that has multiple layers. To create a neural network, we combine neurons so that the outputs of some neurons are inputs of other neurons.

A multi-layer perceptron has one input layer with one neuron (or node) for each input, one output layer with a single node for each output, and it can have any number of hidden layers, each with any number of nodes. A schematic diagram of a Multi-Layer Perceptron (MLP) is depicted below.

In the multi-layer perceptron diagram above, we can see that there are three
inputs and thus three input nodes and the hidden layer has three nodes. The
output layer gives two outputs, therefore there are two output nodes. The
nodes in the input layer take input and forward it for further process, in the
diagram above the nodes in the input layer forwards their output to each of
the three nodes in the hidden layer, and in the same way, the hidden layer
processes the information and passes it to the output layer.
Every node in the multi-layer perceptron uses a sigmoid activation function. The sigmoid activation function takes real values as input and converts them to numbers between 0 and 1 using the sigmoid formula σ(x) = 1 / (1 + e^(-x)).
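A minimal sketch of the MLP described above (three inputs, one hidden layer of three nodes, two outputs) with a sigmoid activation at every node; the random weights are illustrative assumptions.

import numpy as np

def sigmoid(x):
    # converts real values to numbers between 0 and 1
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 3))     # input layer (3 nodes) -> hidden layer (3 nodes)
W2 = rng.normal(size=(3, 2))     # hidden layer (3 nodes) -> output layer (2 nodes)

x = np.array([0.5, -1.0, 2.0])   # the three inputs
hidden = sigmoid(x @ W1)         # hidden layer processes the forwarded inputs
output = sigmoid(hidden @ W2)    # output layer produces the two outputs
print(output)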

3. What are decision rules in a decision tree?

→ The Decision Tree algorithm, like Naive Bayes, is based on conditional probabilities. Unlike Naive Bayes, decision trees generate rules. A rule is a conditional statement that can easily be understood by humans and easily used within a database to identify a set of records.

In some applications of data mining, the accuracy of a prediction is the only thing
that really matters. It may not be important to know how the model works. In
others, the ability to explain the reason for a decision can be crucial. For example, a
Marketing professional would need complete descriptions of customer segments in
order to launch a successful marketing campaign. The Decision Tree algorithm is
ideal for this type of application.

Decision Tree Rules

Rules provide model transparency, a window on the inner workings of the model.
Rules show the basis for the model's predictions. Oracle Data Mining supports a high
level of model transparency. While some algorithms provide rules, all algorithms
provide model details. You can examine model details to determine how the
algorithm handles the attributes internally, including transformations and reverse
transformations.
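A hedged illustration of decision rules: the source discusses Oracle Data Mining, but scikit-learn's DecisionTreeClassifier and export_text are used here as an assumed stand-in to show how the rules of a trained tree can be printed in human-readable form.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Each path from the root to a leaf is a human-readable conditional rule
print(export_text(tree, feature_names=load_iris().feature_names))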

4. What is feature detection in computer vision?

→ What is a feature?

When you see a mango image, how can you identify it as a mango?

By analysing the colour, shape, and texture you can say that it is a mango.

The clues which are used to identify or recognize an image are called the features of an image. In the same way, a computer functions to detect various features in an image.

Note: The images we feed into these algorithms should be in grayscale (black and white). This helps the algorithms focus on the features.

Method 1: Harris corner detection
Harris corner detection is a method that detects corners by sliding a small window over the image and measuring how the intensity changes; a threshold is then applied and the detected corners are marked in the image. This algorithm is mainly used to detect the corners of an image.
Syntax:
cv2.cornerHarris(image, dest, blockSize, kSize, freeParameter, borderType)
Parameters:
• image – The source image in which to detect the features
• dest – Variable to store the output image of corner responses
• blockSize – Neighborhood size considered for corner detection
• kSize – Aperture parameter of the Sobel derivative used
• freeParameter – Harris detector free parameter in the response equation
• borderType – Pixel extrapolation method used at the image border
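A minimal sketch of Harris corner detection with OpenCV's Python bindings (where the destination is returned rather than passed in); the file name "mango.jpg", the parameter values, and the 1% threshold are illustrative assumptions.

import cv2
import numpy as np

img = cv2.imread("mango.jpg")                  # load the source image (assumed file name)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # convert to grayscale
gray = np.float32(gray)                        # cornerHarris expects a float32 image

# blockSize=2, kSize=3, freeParameter=0.04 are common starting values
dest = cv2.cornerHarris(gray, 2, 3, 0.04)

# Mark pixels whose corner response exceeds 1% of the maximum response
img[dest > 0.01 * dest.max()] = [0, 0, 255]    # draw corners in red (BGR)

cv2.imwrite("corners.jpg", img)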

5. Explain the Back Propagation algorithm with steps.

→ Backpropagation is an algorithm that propagates the errors from the output nodes back to the input nodes; therefore, it is simply referred to as backward propagation of errors. It is used in a wide range of neural network applications in data mining, such as character recognition and signature verification.

Backpropagation is a widely used algorithm for training feedforward neural networks. It computes the gradient of the loss function with respect to the network weights, and it does so very efficiently, rather than naively computing the gradient with respect to each individual weight. This efficiency makes it possible to use gradient methods to train multi-layer networks and update weights to minimize loss; variants such as gradient descent or stochastic gradient descent are often used.
The backpropagation algorithm works by computing the gradient of the loss function with respect to each weight via the chain rule, computing the gradient layer by layer and iterating backward from the last layer to avoid redundant computation of intermediate terms in the chain rule.

Backpropagation Algorithm:

Step 1: Inputs X arrive through the preconnected path.
Step 2: The input is modelled using actual weights W. Weights are usually chosen randomly.
Step 3: Calculate the output of each neuron from the input layer, through the hidden layer, to the output layer.
Step 4: Calculate the error in the outputs:
Backpropagation Error = Actual Output – Desired Output
Step 5: From the output layer, go back to the hidden layer to adjust the weights so as to reduce the error.
Step 6: Repeat the process until the desired output is achieved.
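A minimal sketch of these steps for a 2-4-1 network with sigmoid activations, trained on XOR; the architecture, learning rate, and epoch count are illustrative assumptions, not values from the source.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
Y = np.array([[0], [1], [1], [0]], dtype=float)              # desired outputs

W1 = rng.normal(size=(2, 4))   # Step 2: weights chosen randomly
W2 = rng.normal(size=(4, 1))
lr = 0.5

for epoch in range(10000):                   # Step 6: repeat
    # Steps 1 and 3: forward pass, layer by layer
    h = sigmoid(X @ W1)
    out = sigmoid(h @ W2)

    # Step 4: error at the output
    error = Y - out

    # Step 5: propagate the error backward and adjust the weights
    d_out = error * out * (1 - out)          # sigmoid derivative at the output
    d_h = (d_out @ W2.T) * h * (1 - h)       # sigmoid derivative at the hidden layer
    W2 += lr * h.T @ d_out
    W1 += lr * X.T @ d_h

print(out.round(3))   # predictions approach [0, 1, 1, 0]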
6. What are the limitations of a back-propagation neural network?
→ • It is sensitive to noisy data and irregularities; noisy data can lead to inaccurate results.
• Performance is highly dependent on the input data.
• Training can take too much time.
• A matrix-based approach is preferred over a mini-batch approach.

7. What is the Hessian matrix in a neural network?

→ • The first derivative of a scalar function E(w) with respect to a vector w = [w1, w2]^T is a vector called the gradient of E(w). If there are M elements in the vector, then the gradient is an M x 1 vector.

• The second derivative of E(w) is a matrix called the Hessian of E(w). The Hessian is a matrix with M² elements (M x M).

$$
\nabla E(\mathbf{w}) = \frac{d}{d\mathbf{w}} E(\mathbf{w}) =
\begin{bmatrix} \dfrac{\partial E}{\partial w_1} \\[6pt] \dfrac{\partial E}{\partial w_2} \end{bmatrix}
\qquad
\mathbf{H} = \nabla\nabla E(\mathbf{w}) = \frac{d^2}{d\mathbf{w}^2} E(\mathbf{w}) =
\begin{bmatrix}
\dfrac{\partial^2 E}{\partial w_1^2} & \dfrac{\partial^2 E}{\partial w_1 \partial w_2} \\[6pt]
\dfrac{\partial^2 E}{\partial w_2 \partial w_1} & \dfrac{\partial^2 E}{\partial w_2^2}
\end{bmatrix}
$$

• The Jacobian is a matrix consisting of first derivatives with respect to a vector.

Computing the Hessian using Backpropagation
• We have shown how backpropagation can be used to obtain the first derivatives of the error function with respect to the weights in the network.
• Backpropagation can also be used to derive the second derivatives ∂²E/(∂w_ji ∂w_lk).
• If all weight and bias parameters are elements w_i of a single vector w, then the second derivatives form the elements H_ij of the Hessian matrix H, where i, j ∈ {1, …, W} and W is the total number of weights and biases.

Role of the Hessian in Neural Computing
1. Several nonlinear optimization algorithms for neural networks are based on the second-order properties of the error surface.
2. It is the basis for a fast procedure for retraining a network after a small change in the training data.
3. Identifying the least significant weights: network pruning requires the inverse of the Hessian.
4. Bayesian neural networks: the Hessian plays a central role in the Laplace approximation.
• The Hessian inverse is used to determine the predictive distribution for a trained network.
• The Hessian eigenvalues determine the values of the hyperparameters.
• The Hessian determinant is used to evaluate the model evidence.

Evaluating the Hessian Matrix
• The full Hessian matrix can be difficult to compute in practice; quasi-Newton algorithms have been developed that use approximations to the Hessian.
• Various approximation techniques have been used to evaluate the Hessian for a neural network; it can also be calculated exactly using an extension of backpropagation.
• An important consideration is efficiency: with W parameters (weights and biases) the matrix has dimension W x W, and efficient methods have O(W²) cost.
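A minimal sketch of one approximation technique: estimating the Hessian of an error function E(w) by central finite differences of the gradient. The toy quadratic error function used here is an illustrative assumption, not from the source.

import numpy as np

def grad_E(w):
    # Gradient of E(w) = w1^2 + 3*w1*w2 + 2*w2^2 (toy error function)
    return np.array([2 * w[0] + 3 * w[1], 3 * w[0] + 4 * w[1]])

def hessian(grad, w, eps=1e-5):
    W = len(w)                        # total number of parameters
    H = np.zeros((W, W))
    for j in range(W):
        step = np.zeros(W)
        step[j] = eps
        # Column j: how the gradient changes when w_j is perturbed
        H[:, j] = (grad(w + step) - grad(w - step)) / (2 * eps)
    return H

w = np.array([1.0, -1.0])
print(hessian(grad_E, w))             # approximately [[2, 3], [3, 4]]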

8. What does cross-validation mean?

→ Cross-validation is a technique for validating model efficiency by training the model on a subset of the input data and testing it on a previously unseen subset of the input data. We can also say that it is a technique to check how well a statistical model generalizes to an independent dataset.

In machine learning, there is always a need to test the stability of the model; we cannot judge the model based only on how it fits the training dataset. For this purpose, we reserve a particular sample of the dataset which was not part of the training dataset. After that, we test our model on that sample before deployment, and this complete process comes under cross-validation. This is something different from the general train-test split.

Hence the basic steps of cross-validation are:

o Reserve a subset of the dataset as a validation set.
o Train the model using the training dataset.
o Now, evaluate model performance using the validation set. If the model performs well on the validation set, proceed with the further steps; otherwise, check for issues.
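A minimal sketch of k-fold cross-validation with scikit-learn; the iris dataset, the logistic-regression model, and k = 5 folds are illustrative assumptions.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Each of the 5 folds is used once as the validation set; the rest is used for training
scores = cross_val_score(model, X, y, cv=5)
print(scores, scores.mean())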
9. What are the advantages of cross-validation?
→
1. More accurate estimate of out-of-sample accuracy.
2. More “efficient” use of data as every observation is used for both
training and testing.

10. How does neural network pruning work?

→ Pruning is the process of deleting parameters from an existing neural network, which might involve removing individual parameters or groups of parameters, such as neurons. This procedure aims to keep the network’s accuracy while enhancing its efficiency. This can be done to cut down on the amount of computing power necessary to run the neural network.

Network Pruning

Steps to be followed while pruning:

• Determine the significance of each neuron.
• Prioritize the neurons based on their importance (assuming there is a clearly defined measure of “importance”).
• Remove the neuron that is the least significant.
• Determine whether to prune further, based on a termination condition (to be defined by the user).
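A minimal sketch of the steps above using magnitude-based pruning on a single weight matrix: weight magnitude stands in for "significance", and the 50% sparsity target is an illustrative assumption.

import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))            # weights of one layer of a trained network

def prune_by_magnitude(W, sparsity=0.5):
    # Steps 1-2: rank parameters by their absolute value (the "significance")
    threshold = np.quantile(np.abs(W), sparsity)
    # Step 3: remove (zero out) the least significant parameters
    mask = np.abs(W) >= threshold
    return W * mask, mask

W_pruned, mask = prune_by_magnitude(W, sparsity=0.5)
print("weights kept:", mask.mean())    # Step 4: check the termination condition, then stop or prune more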
Every method that has been used for a long time comes with some guidelines; by following those guidelines we can ensure the method gives the expected results. The research paper "Lost in Pruning: The Effects of Pruning Neural Networks Beyond Test Accuracy" gives certain guidelines for pruning, listed below:

• If unanticipated adjustments in the data distribution may occur during deployment, don’t prune.
• If you only have a partial understanding of the distribution shifts throughout training and pruning, prune moderately.
• If you can account for all movements in the data distribution throughout training and pruning, prune to the maximum extent possible.
• When retraining, specifically consider data augmentation to maximize the pruning potential.

11. What is pattern classification in a neural network?

→ The pattern recognition approaches discussed so far are based on direct computation through machines, using mathematics- and statistics-related techniques. Another approach is the neural approach, in which neural networks are used to recognize patterns. As is well known, the neuron is the basic unit of the brain, and together these neurons create networks that control specific tasks. Implementing such networks in computing systems led to the invention of artificial neural networks.

An artificial neural network is a computing system that tries to simulate the working of the biological neural network of the human brain. In this network, all the neurons are densely connected, which helps achieve massive parallel distributed processing. The input units receive various forms and structures of information, and based on an internal weighting system the neural network attempts to learn about the information presented in order to produce one output report.

The advantages of neural networks are their adaptive-learning, self-organization, and fault-tolerance capabilities. Because of these outstanding capabilities, neural networks are used for pattern recognition applications. An ANN initially goes through a training phase where it learns to recognize patterns in data, whether visually, aurally, or textually. Some of the best-known neural models are back-propagation, higher-order nets, time-delay neural networks, and recurrent nets.
Basic structure of a feed-forward neural network

Applications
• Image processing, segmentation, and analysis

Pattern recognition is efficient enough to give machines human-like recognition intelligence. This is used for image processing, segmentation, and analysis. For example, computers can detect different types of insects better than humans.

• Speech Recognition
All of us have heard the names Siri, Alexa, and Cortana. These are all applications of speech recognition, and pattern recognition plays a huge part in this technique.

• Fingerprint Identification
Many recognition approaches exist for performing fingerprint identification, but the pattern recognition approach is the most widely used.

• Medical Diagnosis
Pattern recognition algorithms deal with real data. It has been found that pattern recognition has a huge role in today’s medical diagnosis: from breast cancer detection to COVID-19 screening, algorithms are giving results with more than 90% accuracy.

12. What is associative memory? Explain with the help of a diagram.

→ An associative memory can be considered as a memory unit whose stored data can be identified for access by the content of the data itself rather than by an address or memory location.

Associative memory is often referred to as Content Addressable Memory (CAM).

When a write operation is performed on an associative memory, no address or memory location is given for the word; the memory itself is capable of finding an empty, unused location in which to store the word.

On the other hand, when a word is to be read from an associative memory, the content of the word, or part of the word, is specified. The words which match the specified content are located by the memory and are marked for reading.

The following diagram shows the block representation of an Associative memory.

From the block diagram, we can say that an associative memory consists of a memory
array and logic for 'm' words with 'n' bits per word.

The functional registers like the argument register A and key register K each
have n bits, one for each bit of a word. The match register M consists of m bits, one
for each memory word.

The words which are kept in the memory are compared in parallel with the content of
the argument register.

The key register (K) provides a mask for choosing a particular field or key in the
argument word. If the key register contains a binary value of all 1's, then the entire
argument is compared with each memory word. Otherwise, only those bits in the
argument that have 1's in their corresponding position of the key register are
compared. Thus, the key provides a mask for identifying a piece of information which
specifies how the reference to memory is made.
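A minimal sketch of the associative-memory read described above: the key register K masks which bits of the argument register A are compared, and the match register M flags every stored word that agrees on those bits. The 4-word, 6-bit memory contents are illustrative assumptions.

import numpy as np

memory = np.array([                 # m = 4 words, n = 6 bits per word
    [1, 0, 1, 1, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 1],
    [1, 0, 1, 0, 1, 1],
])

A = np.array([1, 0, 1, 0, 0, 0])    # argument register: content to search for
K = np.array([1, 1, 1, 0, 0, 0])    # key register: compare only the first 3 bits

# A word matches if it agrees with A in every bit position where K holds a 1
M = np.all((memory == A) | (K == 0), axis=1).astype(int)
print(M)                            # match register -> [1, 0, 0, 1]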

13. What is vector quantization in a neural network?

→ Learning Vector Quantization (LVQ) is a type of artificial neural network that is also inspired by biological models of neural systems. It is based on a prototype-based supervised classification algorithm and trains its network through a competitive learning algorithm similar to the Self-Organizing Map. It can also deal with multiclass classification problems. LVQ has two layers: one is the input layer and the other is the output layer. The architecture of Learning Vector Quantization, with the number of classes in the input data and n input features for each sample, is shown below:
How does Learning Vector Quantization work?

Say the input data has size (m, n), where m is the number of training examples and n is the number of features in each example, and the label vector has size (m, 1). First, the weights of size (n, c) are initialized from the first c training samples with different labels, and these c samples are then discarded from the training set. Here, c is the number of classes. The algorithm then iterates over the remaining input data; for each training example, it updates the winning vector (the weight vector with the shortest distance, e.g. Euclidean distance, from the training example).
The weight updation rule is given by:
if correctly_classified:
wij(new) = wij(old) + alpha(t) * (xik - wij(old))
else:
wij(new) = wij(old) - alpha(t) * (xik - wij(old))
where alpha(t) is the learning rate at time t, j denotes the winning vector, i denotes the i-th feature of the training example, and k denotes the k-th training example from the input data. After training the LVQ network, the trained weights are used for classifying new examples. A new example is labelled with the class of the winning vector.

Algorithm

Steps involved are:

• Weight initialization
• For 1 to N number of epochs
• Select a training example
• Compute the winning vector
• Update the winning vector
• Repeat steps 3, 4, and 5 for all training examples
• Classify the test sample
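A minimal sketch of LVQ training following the update rule and steps above; the toy two-feature dataset, two classes, learning-rate schedule, and epoch count are illustrative assumptions.

import numpy as np

X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9], [0.15, 0.3], [0.85, 0.75]])
y = np.array([0, 0, 1, 1, 0, 1])
c = 2                                     # number of classes

# Initialize one prototype (weight vector) per class from the first c samples with different labels
proto_idx = [np.where(y == cls)[0][0] for cls in range(c)]
W = X[proto_idx].copy()                   # prototype weights, shape (c, n)
labels = y[proto_idx]
mask = np.ones(len(X), dtype=bool)
mask[proto_idx] = False
X_train, y_train = X[mask], y[mask]       # discard the samples used for initialization

alpha = 0.3
for epoch in range(20):
    for xk, yk in zip(X_train, y_train):
        j = np.argmin(np.linalg.norm(W - xk, axis=1))   # winning vector (closest prototype)
        if labels[j] == yk:                             # correctly classified: move closer
            W[j] += alpha * (xk - W[j])
        else:                                           # misclassified: move away
            W[j] -= alpha * (xk - W[j])
    alpha *= 0.9                                        # decay the learning rate over time

# Classify a new example with the class of its winning vector
x_new = np.array([0.2, 0.25])
print(labels[np.argmin(np.linalg.norm(W - x_new, axis=1))])   # -> 0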

14. What are control applications of neural networks?

→ There is a constant need to provide better control of more complex (and probably nonlinear) systems over a wide range of uncertainty. Artificial neural networks offer a large number of attractive features for the area of control systems:
- The ability to perform arbitrary nonlinear mappings makes them a cost-efficient tool for synthesizing accurate forward and inverse models of nonlinear dynamical systems, allowing traditional control schemes to be extended to the control of nonlinear plants. This can be done without the need for detailed knowledge of the plant.

- The ability to create arbitrary decision regions means that they have
the potential to be applied to fault detection problems. Exploiting this
property, a possible use of ANNs is as control managers, deciding
which control algorithm to employ based on current operational
conditions.

- As their training can be carried out on-line or off-line, the applications considered in the last two points can be designed off-line and afterwards used in an adaptive scheme, if so desired.

- Neural networks are massively parallel computation structures. This allows calculations to be performed at high speed, making real-time implementations feasible. The development of fast architectures further reduces computation time.

- Neural networks can also provide, as demonstrated in [85], significant fault tolerance, since damage to a few weights need not significantly impair the overall performance.

15. What are the applications of speech processing?

→ Speech recognition technology and the use of digital assistants have moved quickly from our cell phones to our homes, and their application in industries such as business, banking, marketing, and healthcare is quickly becoming apparent.

1. In the workplace

Speech recognition technology in the workplace has evolved to handle simple tasks that increase efficiency, as well as tasks that have traditionally needed humans to perform them.

Examples of office tasks digital assistants are, or will be, able to perform:
• Search for reports or documents on your computer
• Create a graph or tables using data
• Dictate the information you want to be incorporated into a document
• Print documents on request
• Start video conferences
• Schedule meetings
• Record minutes
• Make travel arrangements

2. In banking

The aim of the banking and financial industry is for speech recognition to reduce friction for the customer. Voice-activated banking could largely reduce the need for human customer service and lower employee costs. A personalised banking assistant could in return boost customer satisfaction and loyalty.

How speech recognition could improve banking:

• Request information regarding your balance, transactions, and spending habits without having to open your cell phone
• Make payments
• Receive information about your transaction history

3. In marketing

Voice search has the potential to add a new dimension to the way marketers reach their consumers. With the change in how people are going to be interacting with their devices, marketers should look for developing trends in user data and behaviour.

• Data – With speech recognition, there will be a new type of data available for marketers to analyse. People’s accents, speech patterns, and vocabulary can be used to interpret a consumer’s location, age, and other information regarding their demographics, such as their cultural affiliation.
• Behaviour – While typing necessitates a certain extent of brevity, speaking allows for longer, more conversational searches. Marketers and optimisers may need to focus on long-tail keywords and producing conversational content to stay ahead of these trends.

This type of fast search could make users more impatient and increasingly dependent on the internet as their main source of information. Due to this, the amount of time users spend looking at a screen might decrease. Marketers should consider what this might mean for predominantly visual content, as there may be a shift towards focussing on auditory and information-heavy content.

4. In Healthcare

In an environment where seconds are crucial and sterile operating conditions are a priority, hands-free, immediate access to information can have a significantly positive impact on patient safety and medical efficiency.

Benefits include:

• Quickly finding information from medical records
• Nurses can be reminded of processes or given specific instructions
• Nurses can ask for administrative information, such as the number of patients on a floor and the number of available units
• At home, parents can ask for common symptoms of diseases, when they should go to the doctor, and how to look after a sick child
• Less paperwork
• Less time inputting data
• Improved workflows

The most significant concern in using speech recognition in healthcare is the content the digital assistant has access to. It has been recognised that the content will need to be supplied and validated by recognised medical institutions in order for it to be a viable option in this field.
16. What are neural networks in image processing?
→ Convolutional neural networks (CNNs) are deep learning algorithms that are very powerful for the analysis of images.

The CNN is a powerful algorithm for image processing. These algorithms are currently the best we have for the automated processing of images. Many companies use them to do things like identifying the objects in an image.

Three Layers of CNN

Convolutional neural networks are specialized for applications in image and video recognition. CNNs are mainly used in image analysis tasks like image recognition, object detection, and segmentation.

There are three types of layers in Convolutional Neural Networks:

1) Convolutional Layer: In a typical neural network, each input neuron is connected to the next hidden layer. In a CNN, only a small region of the input layer neurons connects to each neuron in the hidden layer.

2) Pooling Layer: The pooling layer is used to reduce the dimensionality of the feature map. There will be multiple activation and pooling layers inside the hidden part of the CNN.

3) Fully-Connected layer: Fully Connected Layers form the last few layers in
the network. The input to the fully connected layer is the output from the
final Pooling or Convolutional Layer, which is flattened and then fed into
the fully connected layer.
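A minimal sketch of the three layer types using Keras; the 28x28 grayscale input shape, filter counts, and 10 output classes are illustrative assumptions, not values from the source.

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(16, kernel_size=3, activation="relu"),   # 1) convolutional layer
    layers.MaxPooling2D(pool_size=2),                      # 2) pooling layer
    layers.Conv2D(32, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),                                      # flatten before the dense layers
    layers.Dense(10, activation="softmax"),                # 3) fully-connected (output) layer
])
model.summary()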
