Assignment Week 8-Deep-Learning PDF
Deep Learning
Assignment- Week 8
Number of questions: 10 Total mark: 10 X 1 = 10
Which of the following functions can be used as an activation function in the output layer if we
wish to predict the probabilities of n classes such that the sum of p over all n classes equals 1?
a. Softmax
b. ReLU
c. Sigmoid
d. Tanh
Correct Answer: a
Detailed Solution:
The softmax function ensures that the probabilities assigned to the n classes are non-negative
and sum to 1.
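The normalisation property can be checked with a minimal sketch in plain Python. The function name `softmax` and the example scores are illustrative assumptions, not part of the assignment; subtracting the maximum score is a common numerical-stability step and does not change the result.

```python
import math

def softmax(logits):
    """Map a list of real-valued scores to probabilities that sum to 1."""
    m = max(logits)                             # shift by the max for numerical stability
    exps = [math.exp(v - m) for v in logits]    # every term is strictly positive
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for n = 3 classes
probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
```

ReLU, sigmoid and tanh act on each output independently, so nothing forces their outputs to sum to 1.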
The input image has been converted into a matrix of size 256 x 256 and is convolved with a
kernel/filter of size 3x3 with a stride of 1 and no padding. What will be the size of the
convolved matrix?
a. 253x253
b. 3x3
c. 254x254
d. 256x256
Correct Answer: c
Detailed Solution:
The size of the convolved matrix is C x C, where C=((I-F+2P)/S)+1. Here C is the size of the
convolved matrix, I the size of the input matrix, F the size of the filter matrix, P the padding
applied to the input matrix and S the stride. With P=0, I=256, F=3 and S=1 we get
C=((256-3+0)/1)+1=254. Therefore the answer is 254x254.
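The formula above can be sketched as a small helper. The name `conv_output_size` is an assumption made for illustration, and raising an error when the stride does not divide evenly is a design choice of this sketch (libraries typically round down instead).

```python
def conv_output_size(i, f, p=0, s=1):
    """Spatial size C = ((I - F + 2P) / S) + 1 of a convolution output."""
    span = i - f + 2 * p
    if span < 0 or span % s != 0:
        # Assumption of this sketch: reject settings where the filter does not fit evenly
        raise ValueError("filter does not tile the padded input with this stride")
    return span // s + 1

size = conv_output_size(256, 3, p=0, s=1)
print(size)  # 254
```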
NPTEL Online Certification Courses
Indian Institute of Technology Kharagpur
The numerical output of a sigmoid node in a neural network is:
Correct Answer: a
Detailed Solution:
The figure below shows an image of a face that is input to a convolutional neural network; the
other three images show different levels of features extracted by the network. Can you
identify from the following options which one is correct?
Correct Answer: b
Detailed Solution:
A convolutional neural network learns low-level features such as edges and lines in its early
layers, then parts of faces, and finally a high-level representation of a whole face.
Suppose you have 5 convolutional kernels of size 3 x 3 with no padding and stride 1 in the first
layer of a convolutional neural network. You pass an input of dimension 228 x 228 x 3 through
this layer. What are the dimensions of the data which the next layer will receive?
a. 217 x 217 x 3
b. 217 x 217 x 8
c. 225 x 225 x 5
d. 225 x 225 x 3
Correct Answer: c
Detailed Solution:
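A convolutional layer requires four hyperparameters: the number of filters K=5, their spatial
extent F=3, the stride S=1 and the amount of padding P=0. The output depth equals the number of
filters, so the next layer receives 5 channels, which only option c offers. Note that the
spatial size from C=((I-F+2P)/S)+1 is ((228-3+0)/1)+1=226, so the formula gives 226 x 226 x 5
rather than the 225 x 225 listed in option c.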
What is the mathematical form of the Leaky ReLU layer?
a. f(x)=max(0,x)
b. f(x)=min(0,x)
c. f(x)=min(0, αx), where α is a small constant
d. f(x)=1(x<0)(αx)+1(x>=0)(x), where α is a small constant
Correct Answer: d
Detailed Solution:
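As a minimal sketch of the form in option d: the input passes through unchanged when x >= 0 and is scaled by the small constant α when x < 0. The function name `leaky_relu` and the default `alpha=0.01` are assumptions for illustration; the question only says that α is a small constant.

```python
def leaky_relu(x, alpha=0.01):
    """f(x) = x for x >= 0, and alpha * x for x < 0."""
    return x if x >= 0 else alpha * x

# Positive inputs are unchanged; negative inputs keep a small non-zero slope
values = [leaky_relu(v) for v in (-2.0, 0.0, 3.0)]
print(values)
```

Option a is the plain ReLU, which outputs exactly 0 for every negative input.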
The input image has been converted into a matrix of size 224 x 224 and convolved with a
kernel/filter of size FxF with a stride of s and padding P to produce a feature map of dimension
222x222. Which among the following is true?
Correct Answer: c
Detailed Solution:
The size of the convolved matrix is C x C, where C=((I-F+2P)/s)+1. Here C is the size of the
convolved matrix, I the size of the input matrix, F the size of the filter matrix, P the padding
applied to the input matrix and s the stride. C is given in the question as 222 and I=224, and
the equation is satisfied by P=0, F=3 and s=1, since ((224-3+0)/1)+1=222. Thus option c is the
answer.
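To check a candidate option against the formula, one can search small hyperparameter settings for those that map a 224 x 224 input to a 222 x 222 feature map. This is a sketch under assumptions: the helper names and the search bounds (`max_f`, `max_p`, `max_s`) are hypothetical and chosen only for illustration.

```python
def conv_output_size(i, f, p, s):
    """C = ((I - F + 2P) / s) + 1, using integer arithmetic."""
    return (i - f + 2 * p) // s + 1

def settings_giving(i, c, max_f=7, max_p=2, max_s=2):
    """List (F, P, s) triples within the bounds that turn input size i into output size c."""
    found = []
    for f in range(1, max_f + 1):
        for p in range(max_p + 1):
            for s in range(1, max_s + 1):
                exact = (i - f + 2 * p) % s == 0   # only keep evenly dividing strides
                if exact and conv_output_size(i, f, p, s) == c:
                    found.append((f, p, s))
    return found

matches = settings_giving(224, 222)
print(matches)  # [(3, 0, 1), (5, 1, 1), (7, 2, 1)]
```

The setting F=3, P=0, s=1 is among the matches; the others show that the output size alone does not determine the hyperparameters, so the listed options are needed to pick the answer.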
For a transfer learning task, which layers can generally be transferred more readily to
another task?
a. Higher layers
b. Lower layers
c. Task specific
d. Cannot comment
Correct Answer: b
Detailed Solution:
Lower layers learn more general features (e.g. edge detectors) and can therefore be
transferred well to other tasks. Higher layers, on the other hand, are task specific.
Statement 1: Adding more hidden layers will solve the vanishing gradient problem for a 2-layer
neural network.
Statement 2: Making the network deeper will increase the chance of vanishing gradients.
a. Statement 1 is correct
b. Statement 2 is correct
c. Neither Statement 1 nor Statement 2 is correct
d. Vanishing gradient problem is independent of number of hidden layers of the
neural network.
Correct Answer: b
Detailed Solution:
As more layers using certain activation functions (such as sigmoid) are added to a neural
network, the gradients of the loss function with respect to the early layers approach zero,
making the network hard to train. Thus statement 2 is correct.
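The effect of depth can be illustrated with a toy chain of scalar sigmoid units. This is a sketch under strong assumptions (one unit per layer, a fixed weight `w=1.0`, a hypothetical helper name), not a model of any particular network; it only shows how the chain rule multiplies one small factor per layer.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_through_chain(x, depth, w=1.0):
    """d(output)/d(input) for `depth` stacked scalar units a = sigmoid(w * a_prev)."""
    a, grad = x, 1.0
    for _ in range(depth):
        a = sigmoid(w * a)
        grad *= w * a * (1.0 - a)   # chain rule factor: sigmoid'(z) = a * (1 - a) <= 0.25
    return grad

for depth in (2, 5, 10):
    print(depth, gradient_through_chain(0.5, depth))
```

With these assumptions each layer contributes a factor of at most 0.25, so the gradient reaching the input shrinks at least geometrically as layers are added.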
Which of the following activations can cause the vanishing gradient problem?
a. ReLU
b. Leaky ReLU
c. Sigmoid
d. Linear
Correct Answer: c
Detailed Solution:
When the sigmoid saturates, i.e. its output is close to 1 or close to 0 because the input is
large in magnitude, the derivative becomes very small, i.e. << 1. This causes vanishing
gradients and poor learning in deep networks.
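The saturation behaviour can be checked numerically using the identity sigmoid'(x) = sigmoid(x)(1 - sigmoid(x)). The helper names below are assumptions for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# The derivative peaks at x = 0 and decays quickly as |x| grows
for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(x, sigmoid_grad(x))
```

The derivative never exceeds 0.25 (its value at x = 0), whereas ReLU has derivative exactly 1 for positive inputs.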