DL Question Bank Answers (Long Answers)
UNIT 1
1.a. What are McCulloch-Pitts units and how do they work?
ANS:
McCulloch-Pitts units, also known as McCulloch-Pitts neurons, are the foundational building blocks of artificial neural networks.
They were proposed by Warren McCulloch and Walter Pitts in 1943 and are one of the earliest formalizations of artificial neurons.
McCulloch-Pitts units operate on a simple thresholding logic: the neuron sums its binary inputs and fires (outputs 1) only if the sum meets or exceeds a fixed threshold.
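A minimal Python sketch of this thresholding logic (the threshold value and the AND-gate example below are illustrative assumptions, not part of the original notes):

    def mcculloch_pitts(inputs, threshold):
        # Fire (output 1) only if the sum of the binary inputs meets the threshold.
        return 1 if sum(inputs) >= threshold else 0

    # Example: a two-input AND gate corresponds to a threshold of 2.
    print(mcculloch_pitts([1, 1], threshold=2))  # 1
    print(mcculloch_pitts([1, 0], threshold=2))  # 0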
5. Convergence Theorem:
The perceptron training process is guaranteed to converge and find a solution if the data is linearly separable. However, if the data is not linearly separable, the perceptron training process may not converge.
2.a. What is the Perceptron Learning Algorithm? Explain its steps.
ANS:
The Perceptron Learning Algorithm (PLA) is a supervised learning algorithm used to train a linear perceptron for binary classification tasks.
It was introduced by Frank Rosenblatt in 1957 and is one of the earliest learning algorithms for neural networks.
The PLA is designed to find the optimal weights and biases for a linear perceptron, allowing it to learn a decision boundary that separates the two classes in the dataset.
Algorithm Steps:
Step 1. Initialization:
Initialize the weights (w1, w2, ..., wn) and bias (b) of the perceptron to small random values or zeros.
Step 2. Training Data:
Provide a labeled training dataset where each data point is associated with a target class (either 0 or 1).
Step 3. Training Process:
- For each data point in the training dataset, do the following:
  - Compute the weighted sum of the inputs and the current weights: Σ(xi * wi) + b.
  - Apply the activation function (step function) to the weighted sum to produce the predicted output (y_pred).
  - Update the weights and bias based on the prediction and the true label (y_true) as follows:
    - If y_pred is equal to y_true (correct prediction), do not update the weights and bias.
    - If y_pred is 1 and y_true is 0 (false positive), decrease the weights and bias:
      wi_new = wi_old - α * xi
      b_new = b_old - α
    - If y_pred is 0 and y_true is 1 (false negative), increase the weights and bias:
      wi_new = wi_old + α * xi
      b_new = b_old + α
- Repeat the training process for a fixed number of iterations (epochs).
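A minimal NumPy sketch of these update rules (the learning rate, epoch count, and the toy AND dataset are illustrative assumptions):

    import numpy as np

    def train_perceptron(X, y, lr=0.1, epochs=20):
        w = np.zeros(X.shape[1])   # weights w1..wn
        b = 0.0                    # bias
        for _ in range(epochs):
            for xi, target in zip(X, y):
                y_pred = 1 if np.dot(w, xi) + b >= 0 else 0  # step activation
                if y_pred == 1 and target == 0:    # false positive: decrease
                    w -= lr * xi
                    b -= lr
                elif y_pred == 0 and target == 1:  # false negative: increase
                    w += lr * xi
                    b += lr
        return w, b

    # Toy example: the AND function is linearly separable, so training converges.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([0, 0, 0, 1])
    w, b = train_perceptron(X, y)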
2.b. What is backpropagation? Explain how it works and mention its benefits.
ANS:
Backpropagation, short for "backward propagation of errors," is a widely used algorithm for training artificial neural networks (ANNs), including multilayer perceptrons (MLPs).
It is a supervised learning algorithm that adjusts the weights of the network based on prediction errors, allowing the network to learn from the training dataset and improve its overall performance.
How Backpropagation Works:
1. Forward Pass:
During the forward pass, the input data is fed into the neural network, and the data propagates through the network layer by layer. Each neuron performs a weighted sum of its inputs, applies an activation function to produce an output, and passes that output to the next layer as its input. This process continues until the output layer produces the final predictions.
2. Loss Calculation:
After the forward pass, the neural network produces predictions for the input data. The loss function (e.g., mean squared error for regression or binary cross-entropy for binary classification) is then used to measure the difference between the predicted values and the actual target values in the training data.
3. Backward Pass:
The backward pass is the core of the backpropagation algorithm. It involves propagating the error backward through the network to compute the gradients of the loss function with respect to the model's parameters (weights and biases). The gradients indicate how the loss function changes with respect to changes in the model's parameters.
4. Gradient Descent:
Once the gradients have been computed, the model's parameters are updated using an optimization algorithm such as gradient descent. Gradient descent adjusts the weights and biases in the direction that minimizes the loss function. The learning rate determines the step size in the weight update process.
5. Iterations:
The forward pass, loss calculation, backward pass, and weight updates are performed iteratively over the entire training dataset. This process is repeated for a fixed number of epochs (iterations) or until the model's performance converges to a satisfactory level.
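A minimal NumPy sketch of these steps for a single sigmoid neuron trained with mean squared error (the sample data, learning rate, and iteration count are illustrative assumptions):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([0.5, -0.2])   # one training sample
    t = 1.0                     # its target value
    w, b, lr = np.zeros(2), 0.0, 0.1

    for _ in range(100):
        y = sigmoid(np.dot(w, x) + b)          # 1. forward pass
        loss = 0.5 * (y - t) ** 2              # 2. loss calculation (MSE)
        delta = (y - t) * y * (1 - y)          # 3. backward pass (chain rule)
        w -= lr * delta * x                    # 4. gradient descent update
        b -= lr * delta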
Key Features
Feedforward Propagation:
In a deep feedforward network, information flows only in the forward direction, from the input layer to the output layer. There are no recurrent or feedback connections.
Layer Structure:
The network is composed of an input layer, one or more hidden layers, and an output layer. Hidden layers contain neurons that transform the input data into higher-level representations.
Activation Functions:
Activation functions introduce non-linearity into the network, allowing it to capture complex relationships in data. Common activation functions include ReLU (Rectified Linear Unit), sigmoid, and hyperbolic tangent (tanh).
Weighted Sum and Bias:
Each neuron computes a weighted sum of its inputs along with a bias term. The weights and biases are learned during the training process.
Use Cases
Deep feedforward networks are used for various machine learning tasks, including:
Classification: Recognizing patterns in data and assigning them to predefined categories.
Regression: Predicting continuous values based on input data.
Feature Learning: Learning hierarchical representations of data for downstream tasks.
Function Approximation: Approximating complex functions based on input-output mappings.
2. Tanh Function (continued):
Value Range: -1 to +1
Nature: non-linear
3. ReLU Function:
ReLU stands for Rectified Linear Unit. It is the most widely used activation function, chiefly implemented in the hidden layers of a neural network.
Equation: A(x) = max(0, x). It gives an output of x if x is positive and 0 otherwise.
Value Range: [0, inf)
Nature: non-linear, which means we can easily backpropagate the errors and have multiple layers of neurons being activated by the ReLU function.
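A short NumPy sketch of these two activation functions (illustrative, not part of the original notes):

    import numpy as np

    def tanh(x):
        # Output range (-1, +1); non-linear and zero-centered.
        return np.tanh(x)

    def relu(x):
        # Output range [0, inf); A(x) = max(0, x).
        return np.maximum(0, x)

    x = np.array([-2.0, -0.5, 0.0, 1.5])
    print(tanh(x))
    print(relu(x))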
Architecture
Create an MLP with an input layer of two neurons, a hidden layer with a suitable number of neurons, and an output layer with one neuron.
In the context of learning XOR, a multilayer perceptron with at least one hidden layer can learn the XOR function. The hidden layer allows the network to capture the non-linear relationship between the inputs, which is essential to solve XOR.
Activation Function
Use a non-linear activation function in the hidden layer, such as ReLU or sigmoid. These functions enable the network to capture non-linear patterns.
Loss Function
Here, we use the binary cross-entropy loss function to get the appropriate output.
Training
- Initialize the weights and biases randomly.
- Train the network using gradient-based optimization and backpropagation.
- During each training iteration, feed the training samples through the network, calculate the loss, and adjust the parameters based on the gradient of the loss.
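A minimal Keras sketch of such an XOR network (the hidden-layer size, optimizer, and epoch count are illustrative assumptions):

    import numpy as np
    from tensorflow import keras

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0, 1, 1, 0], dtype=float)   # XOR targets

    model = keras.Sequential([
        keras.layers.Dense(4, activation="relu", input_shape=(2,)),  # hidden layer
        keras.layers.Dense(1, activation="sigmoid"),                 # output neuron
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X, y, epochs=500, verbose=0)
    print(model.predict(X).round())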
6.a. Mention the difficulties of training Deep Neural Networks.
ANS:
1. Vanishing & Exploding Gradients:
Deep networks suffer from the vanishing gradient problem, where gradients become too small during backpropagation, leading to slow or ineffective training.
2. Overfitting:
Deep networks are prone to overfitting, especially when the model is too complex relative to the amount of available training data.
3. Hyperparameter Tuning:
Deep networks have numerous hyperparameters, such as learning rates, batch sizes, and network architecture choices.
6. Convergence Speed:
It is important to ensure a model trains quickly when using large datasets and complicated architectures.
7. Handling Sequential Data:
Training deep neural networks on sequential data, such as time series or natural language sequences, presents unique challenges.
How It Works
When a feedforward neural network is simplified, it can appear as a single-layer perceptron.
This model multiplies inputs by weights as they enter the layer. Afterward, the weighted input values are added together to get the sum. As long as the sum of the values rises above a certain threshold, set at zero, the output value is usually 1, while if it falls below the threshold, it is usually -1.
As a feedforward neural network model, the single-layer perceptron is often used for classification. Machine learning can also be integrated into single-layer perceptrons. Through training, neural networks can adjust their weights based on a rule called the delta rule, which helps them compare their outputs with the intended values.
As a result of training and learning, gradient descent occurs. Multi-layered perceptrons update their weights in a similar way, but this process is known as backpropagation. In that case, the network's hidden layers are adjusted according to the output values produced by the final layer.
6. Keras:
Keras is an open-source Python library designed for developing and evaluating neural networks within deep learning and machine learning models. It is modular, flexible, and extensible, making it beginner- and user-friendly. It also offers a fully functioning model for creating neural networks, as it integrates with objectives, layers, optimizers, and activation functions.
7. SciPy:
SciPy is a free and open-source library that is based on NumPy. It can be used to perform scientific and technical computing on large sets of data. Similar to NumPy, SciPy comes with embedded modules for array optimization and linear algebra.
8. Matplotlib:
Matplotlib is a data visualization library that is used for making plots and graphs. It is an extension of SciPy and is able to handle NumPy data structures as well as complex data models made by Pandas. Matplotlib is intuitive and easy to use, making it a great choice for beginners. It is even easier to use for people with pre-existing knowledge of various other graph-plotting tools.
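A small illustrative example of using NumPy and Matplotlib together (the plotted function is an arbitrary choice, not from the original notes):

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.linspace(-5, 5, 200)
    y = np.maximum(0, x)          # ReLU curve as sample data

    plt.plot(x, y, label="ReLU")
    plt.xlabel("x")
    plt.ylabel("A(x)")
    plt.legend()
    plt.show()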
2.b. Explain deep neural networks and how they work.
ANS:
Deep neural networks (DNNs) are a type of artificial neural network (ANN) designed to model complex patterns and representations by stacking multiple layers of interconnected nodes, or artificial neurons.
3.a. Explain the use cases and key features of a Deep Feedforward neural network.
ANS: unit 1 (4.a)
3.b. Explain in detail the bias-variance trade-off.
ANS: unit 1 (4.b)
4.a. Explain the gradient learning method.
ANS: unit 1 (5.b)
4.b. Mention all the difficulties in training a deep neural network model.
ANS: unit 1 (6.a)
5.a. Explain the optimization methods in deep learning.
ANS:
1. Stochastic Gradient Descent (SGD):
Basic optimization algorithm used in deep learning.
Updates the model parameters by moving in the direction opposite to the gradient of the loss function with respect to the parameters.
Computes the gradient using a small random subset of the training data (mini-batch).
2. Batch Gradient Descent:
Computes the gradient over the entire training dataset.
Updates the parameters once per epoch.
Computationally expensive for large datasets, but can provide more accurate updates.
3. Mini-Batch Gradient Descent:
Balances the advantages of SGD and Batch Gradient Descent.
Randomly samples a small subset (mini-batch) of the training data for each update.
Provides a good compromise between computational efficiency and accurate updates.
4. Momentum:
Helps accelerate SGD in the relevant direction and dampens oscillations.
Introduces a moving average of past gradients into the update.
Reduces the variance in the updates, leading to smoother convergence.
5. Adagrad:
Adapts the learning rates of individual parameters based on their historical gradients.
Scales down the learning rates for frequently updated parameters and scales them up for infrequent ones.
Suitable for sparse data.
6. RMSprop (Root Mean Square Propagation):
Addresses the diminishing learning rate problem in Adagrad.
Divides the learning rate by the root mean square of past gradients for each parameter.
Helps maintain a more adaptive learning rate.
7. Adam (Adaptive Moment Estimation):
Combines the ideas of momentum and RMSprop.
Uses both first-order momentum and the second-order root mean square of gradients.
Adapts learning rates for each parameter individually.
Widely used and often achieves good performance across different tasks.
8. AdaDelta:
An extension of RMSprop that eliminates the need for a learning rate hyperparameter.
Adapts the learning rates based on historical gradient information.
9. Nadam:
An extension of Adam that incorporates Nesterov momentum.
Combines the advantages of Adam and Nesterov accelerated gradient.
10. L-BFGS (Limited-memory Broyden-Fletcher-Goldfarb-Shanno):
A quasi-Newton optimization method.
Uses a limited-memory approximation of the inverse Hessian matrix.
Efficient for problems with a moderate number of parameters.
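A brief Keras sketch showing how a few of these optimizers can be swapped in when compiling a model (the model architecture and hyperparameter values are illustrative assumptions):

    from tensorflow import keras

    def build_model(optimizer):
        model = keras.Sequential([
            keras.layers.Dense(16, activation="relu", input_shape=(10,)),
            keras.layers.Dense(1, activation="sigmoid"),
        ])
        model.compile(optimizer=optimizer, loss="binary_crossentropy")
        return model

    # Plain SGD, SGD with momentum, RMSprop, and Adam.
    sgd_model      = build_model(keras.optimizers.SGD(learning_rate=0.01))
    momentum_model = build_model(keras.optimizers.SGD(learning_rate=0.01, momentum=0.9))
    rmsprop_model  = build_model(keras.optimizers.RMSprop(learning_rate=0.001))
    adam_model     = build_model(keras.optimizers.Adam(learning_rate=0.001))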
Advantages:
Greedy Layer-Wise Training can be computationally more efficient than training the entire network at once.
It may help the network converge faster and achieve better generalization.
Disadvantages:
It assumes that features learned at one layer are beneficial for subsequent layers, which may not always be the case.
The final fine-tuning step is crucial to ensure the entire network works well for the intended task.
UNIT – 3
1.a. What is CNN? Draw and explain the architecture of CNN.
ANS:
A Convolutional Neural Network (CNN) is a type of deep learning neural network that is well-suited for image and video analysis. CNNs use a series of convolution and pooling layers to extract features from images and videos, and then use these features to classify or detect objects or scenes.
CNN architecture
A Convolutional Neural Network consists of multiple layers, such as the input layer, convolutional layers, pooling layers, and fully connected layers.
The convolutional layer applies filters to the input image to extract features, the pooling layer downsamples the image to reduce computation, and the fully connected layer makes the final prediction. The network learns the optimal filters through backpropagation and gradient descent.
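A minimal Keras sketch of this layer sequence for small grayscale images (the filter counts, kernel sizes, input shape, and class count are illustrative assumptions):

    from tensorflow import keras

    model = keras.Sequential([
        keras.layers.Input(shape=(28, 28, 1)),               # input layer
        keras.layers.Conv2D(32, (3, 3), activation="relu"),  # convolutional layer
        keras.layers.MaxPooling2D((2, 2)),                   # pooling layer
        keras.layers.Conv2D(64, (3, 3), activation="relu"),
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.Flatten(),
        keras.layers.Dense(10, activation="softmax"),        # fully connected output
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    model.summary()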
Types
1. Batch Normalization: It focuses on standardizing the inputs to any particular layer.
Advantages of RNN
1. An RNN remembers each and every piece of information through time. This is useful in time series prediction because of its ability to remember previous inputs as well. This is called Long Short-Term Memory.
2. Recurrent neural networks are even used with convolutional layers to extend the effective pixel neighborhood.
Disadvantages of RNN
1. Gradient vanishing and exploding problems.
2. Training an RNN is a very difficult task.
3. It cannot process very long sequences if tanh or ReLU is used as the activation function.
Applications of RNN
1. Robot control
2. Machine translation
3. Speech recognition
4. Time series prediction
5. Language modelling and generating text
3.b. What are the applications of Computer Vision in CNN?
ANS:
1. Input Sequence:
A BRNN takes a sequence of data points as input, where each point is represented as a vector with the same dimensionality. The sequence may have varying lengths.
2. Dual Processing:
The BRNN processes the data in both the forward and backward directions simultaneously.
Forward direction: Uses the input at step t and the hidden state at step t-1 to determine the hidden state at time step t.
Backward direction: Uses the input at step t and the hidden state at step t+1 to calculate the hidden state at step t in a reverse manner.
3. Computing the Hidden State:
The hidden state at each step is computed using a non-linear activation function applied to the weighted sum of the input and the previous hidden state. This mechanism allows the network to remember information from earlier steps in the sequence.
4. Determining the Output:
The output at each step is determined using a non-linear activation function applied to the weighted sum of the hidden state and the output weights. This output can either be the final output or serve as input for another layer in the network.
5. Training:
The network is trained using a supervised learning approach to minimize the difference between predicted and actual outputs. Backpropagation is employed to adjust the weights of the input-to-hidden and hidden-to-output connections during training.
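A minimal Keras sketch of a bidirectional recurrent layer over a sequence (the sequence length, feature size, and unit counts are illustrative assumptions):

    from tensorflow import keras

    model = keras.Sequential([
        keras.layers.Input(shape=(20, 8)),   # 20 time steps, 8 features per step
        # Processes the sequence in both the forward and backward directions.
        keras.layers.Bidirectional(keras.layers.SimpleRNN(16, activation="tanh")),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    model.summary()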
UNIT – 4
Architecture
An Autoencoder is a type of neural network that can learn to reconstruct images, text, and other data from compressed versions of themselves.
2. Code: The Code layer represents the compressed input that is fed to the decoder layer.
C. Contractive Autoencoders
The input is passed through a bottleneck in a contractive autoencoder and then reconstructed in the decoder. The bottleneck function is used to learn a representation of the image while passing it through.
The contractive autoencoder also has a regularization term to prevent the network from learning the identity function and simply mapping the input to the output.
To train a model that works with this constraint, we need to ensure that the derivatives of the hidden layer activations are small with respect to the input.
D. Denoising Autoencoders
Have you ever wanted to remove noise from an image but didn't know where to start? If so, then denoising autoencoders are for you!
Denoising autoencoders are similar to regular autoencoders in that they take an input and produce an output. However, they differ in that they don't use the clean image as the network's input. Instead, they feed in a noisy version, while the clean image remains the ground truth for reconstruction.
Removing image noise by hand is difficult. With a denoising autoencoder, we feed the noisy image into the network and let it map the input to a lower-dimensional manifold where filtering out noise becomes much more manageable.
The loss function usually used with these networks is L2 or L1 loss.
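A minimal Keras sketch of a denoising autoencoder on flattened images (the noise level, layer sizes, input dimensionality, and random training data are illustrative assumptions):

    import numpy as np
    from tensorflow import keras

    input_dim, code_dim = 784, 32    # e.g., flattened 28x28 images

    model = keras.Sequential([
        keras.layers.Input(shape=(input_dim,)),
        keras.layers.Dense(code_dim, activation="relu"),      # encoder / bottleneck
        keras.layers.Dense(input_dim, activation="sigmoid"),  # decoder
    ])
    model.compile(optimizer="adam", loss="mse")  # L2-style reconstruction loss

    # Train on (noisy input, clean target) pairs.
    x_clean = np.random.rand(256, input_dim).astype("float32")
    x_noisy = np.clip(x_clean + 0.2 * np.random.randn(256, input_dim), 0.0, 1.0)
    model.fit(x_noisy, x_clean, epochs=5, batch_size=32, verbose=0)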
E. Variational Autoencoders
Variational autoencoders (VAEs) are models that address a specific problem with standard autoencoders. When you train an autoencoder, it learns to represent the input in a compressed form called the latent space or the bottleneck. However, the latent space formed after training is not necessarily continuous and, in effect, might not be easy to interpolate.
Variational autoencoders deal with this specific issue and express their latent attributes as a probability distribution, forming a continuous latent space that can be easily sampled and interpolated.
3.b. How does an autoencoder differ from traditional feedforward neural networks in terms of architecture and functionality?
ANS:
Architecture:
Feedforward Neural Network:
In a traditional feedforward neural network, the architecture consists of an input layer, one or more hidden layers, and an output layer. Each layer is fully connected to the next layer, and information flows in one direction, from input to output.
Autoencoder:
An autoencoder has a more specific architecture, comprising an encoder and a decoder. The encoder compresses the input data into a lower-dimensional representation, and the decoder reconstructs the original input from this representation.
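A short Keras sketch contrasting the two architectures on the same input size (the layer sizes and class count are illustrative assumptions):

    from tensorflow import keras

    # Traditional feedforward network: maps inputs to class predictions.
    classifier = keras.Sequential([
        keras.layers.Input(shape=(64,)),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(10, activation="softmax"),    # output = class probabilities
    ])

    # Autoencoder: the encoder compresses, the decoder reconstructs the input itself.
    autoencoder = keras.Sequential([
        keras.layers.Input(shape=(64,)),
        keras.layers.Dense(16, activation="relu"),       # encoder (bottleneck)
        keras.layers.Dense(64, activation="sigmoid"),    # decoder (reconstruction)
    ])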
Functionality:
Feedforward Neural Network:
The primary purpose of a feedforward neural network is to learn a mapping from inputs to outputs. It is commonly used for tasks like classification and regression.
It doesn't inherently focus on learning compressed representations of the input data.