Deep Learning
(ACSML0702)
Unit: I
INTRODUCTION
Dr. Kumod Kr. Gupta
(Asst. Professor)
AI Department
Course Details (B. Tech. 7th Sem)
Curse of Dimensionality, Bias and Variance Trade off, Overfitting and underfitting,
Regression - MAE, MSE, RMSE, R Squared, Adjusted R Squared, p-Value, Classification -
Precision, Recall, F1, Other topics, K-Fold Cross validation, RoC curve, Hyper-Parameter
Tuning Introduction – Grid search, random search, Introduction to Deep Learning.
Artificial Neural Network: Neuron, Nerve structure and synapse, Artificial Neuron and its
model, activation functions, Neural network architecture: Single layer and Multilayer feed
forward networks, recurrent networks. Various learning techniques; Perceptron and
Convergence rule, Hebb Learning. Perceptrons, Multilayer perceptron, Gradient descent and
the Delta rule, Multilayer networks, Derivation of Backpropagation Algorithm.
Introduction to CNN, Train a simple convolutional neural net, Explore the design space for convolutional nets,
Pooling layer motivation in CNN, Design a convolutional layered application, Understanding and visualizing a
CNN, Transfer learning and fine-tuning CNN, Image classification, Text classification, Image classification and
hyper-parameter tuning, Emerging NN architectures
Why use sequence models? Recurrent Neural Network Model, Notation, Back-propagation
through time (BPTT), Different types of RNNs, Language model and sequence generation,
Sampling novel sequences, Vanishing gradients with RNNs, Gated Recurrent Unit (GRU),
Long Short-Term Memory (LSTM), Bidirectional RNN, Deep RNNs
Course Outcome (CO)   At the end of the course, the student will be able to:   Bloom's Knowledge Level (KL)
CO1   Analyze ANN model and understand the ways of accuracy measurement.   K4
CO2   Develop a convolutional neural network for multi-class classification in images.   K6
CO3   Apply Deep Learning algorithm to detect and recognize an object.   K3
CO4   Apply RNNs to Time Series Forecasting, NLP, Text and Image Classification.   K4
CO5   Apply lower-dimensional representation over higher-dimensional data for dimensionality reduction and capture the important features of an object.   K3
Program Outcomes (POs)
PO8 : Ethics
PO10 : Communication
CO.K PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12
CO1 3 3 3 3 2 2 1 - 1 - 2 2
CO2 3 3 3 3 2 2 1 - 1 1 2 2
CO3 3 3 3 3 3 2 2 - 2 1 2 3
CO4 3 3 3 3 3 2 2 1 2 1 2 3
CO5 3 3 3 3 3 2 2 1 2 1 2 2
AVG 3.0 3.0 3.0 3.0 2.6 2.0 1.6 0.4 1.6 0.8 2.0 2.4
Institute Result
• It is important to understand prediction errors (bias and variance) when assessing the accuracy of any
machine-learning algorithm.
• There is a tradeoff between a model's ability to minimize bias and its ability to minimize variance; striking
the right balance (for example, when selecting the value of a regularization constant) gives the best solution.
• A proper understanding of these errors helps us avoid overfitting and underfitting of a data set
while training the algorithm.
What is Variance?
• The variability of model prediction for a given data point, which tells us the spread of our predictions, is called the
variance of the model.
• A model with high variance fits the training data with a very complex function and therefore cannot generalize
to data it has not seen before. As a result, such models perform very well on training data but have
high error rates on test data.
• When a model has high variance, it is said to be overfitting the data.
• Overfitting means fitting the training set very accurately via a complex curve and a high-order hypothesis; it is not a good
solution because the error on unseen data is high. While training a model, variance should therefore be kept low.
• In supervised learning, underfitting happens when a model is unable to capture the underlying pattern of the data.
Such models usually have high bias and low variance. It happens when we have too little data to
build an accurate model, or when we try to fit a linear model to nonlinear data. These kinds of models,
such as linear and logistic regression, are too simple to capture the complex patterns in the data.
• In supervised learning, overfitting happens when our model captures the noise along with the underlying pattern
in the data. It happens when we train our model for too long on a noisy dataset. These models have low bias and high
variance. They are typically very complex models, such as decision trees, which are prone to overfitting.
Overfitting is a problem where a machine learning algorithm performs well on the training data but markedly worse on
unseen data.
Reasons for Overfitting:
1. High variance and low bias.
2. The model is too complex.
3. The size of the training data is too small.
Techniques to Reduce Overfitting (see the sketch after this list for early stopping and dropout):
1. Increase the training data.
2. Reduce model complexity.
3. Early stopping during the training phase (keep an eye on the loss over the training period; as soon as the validation loss
begins to increase, stop training).
4. Ridge Regularization and Lasso Regularization.
5. Use dropout for neural networks to tackle overfitting.
6. Cross-Validation (K-Fold Cross-Validation).
7. Batch normalization.
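A minimal Keras sketch of two of these techniques, early stopping and dropout; the random arrays stand in for a real (noisy) training set and all layer sizes are illustrative assumptions:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, callbacks

# Illustrative random data standing in for a real training set.
X = np.random.rand(1000, 20)
y = np.random.randint(0, 2, size=1000)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),                 # dropout reduces co-adaptation / overfitting
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Early stopping: watch the validation loss and stop as soon as it starts rising.
early_stop = callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                     restore_best_weights=True)

model.fit(X, y, validation_split=0.2, epochs=100, batch_size=32,
          callbacks=[early_stop], verbose=0)
```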
Overfitting(CO1)
Regularization
• The word "regularize" means to make things regular or acceptable, and that is exactly what regularization does.
• Regularization is a form of regression used to reduce the error by fitting a function appropriately on the given
training set and avoiding overfitting.
• It discourages the fitting of an overly complex model, thus reducing the variance and the chance of overfitting. It
is also used in the case of multicollinearity (when independent variables are highly correlated).
We also introduced the concept of loss functions. We will use one such loss function here: the
Residual Sum of Squares (RSS). For ridge regression, the criterion to be minimized can be written as:

RSS + λ Σ βj²,   where RSS = Σ (yi - ŷi)²
• Here, λ is called the "tuning parameter"; it decides how heavily we want to penalize the
flexibility of our model.
• If λ = 0, the penalty term has no effect and the method performs like ordinary linear regression.
• As λ → ∞, the impact of the shrinkage penalty grows, and the ridge regression coefficient estimates
approach zero.
• As can be seen, selecting a good value of λ is critical. The penalty used by this method is based on the
"L2 norm", so ridge regression is also known as L2 regularization.
• Note:
• The tuning parameter λ controls the trade-off between bias and variance.
• As the value of λ rises, it shrinks the coefficients and thus reduces the variance.
• Up to a point, this increase in λ is beneficial, as it only reduces the variance (hence avoiding overfitting)
without losing any important properties in the data.
• But after a certain value, the model starts losing important properties, giving rise to bias in the model and
thus underfitting. Therefore, the value of λ should be chosen carefully.
• λ is typically optimized using cross-validation (K-Fold Cross-Validation), as sketched below.
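A minimal sketch of choosing λ (called alpha in scikit-learn) by cross-validation; the synthetic dataset and the candidate alpha range are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.datasets import make_regression

# Synthetic data standing in for a real training set.
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# Try a range of candidate lambda (alpha) values with 5-fold cross-validation.
alphas = np.logspace(-3, 3, 13)
ridge = RidgeCV(alphas=alphas, cv=5)
ridge.fit(X, y)

print("Best lambda (alpha):", ridge.alpha_)
print("Coefficients:", ridge.coef_)
```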
Regularization:
• The regularization approach promotes smoother functions by creating a new criterion function
that relies not only on the training error, but also on algorithmic complexity.
• In particular, the new criterion function penalizes extremely complex hypotheses; searching for
the minimum of this criterion balances the error on the training set against complexity.
• Formally, the new criterion can be written as the sum of the error on the training set plus
a regularization term, which expresses constraints or desired properties of solutions.
Early Stopping:
• The training of a learning machine corresponds to an iterative decrease of the error function defined on
the training data.
• During a training session, this error generally decreases as a function of the number of iterations
of the algorithm.
• Stopping the training before reaching a minimum of the training error is a technique for restricting
the effective hypothesis complexity.
Pruning:
• An alternative solution that is sometimes more successful than early stopping of the growth (complexity)
of the hypothesis is pruning a fully grown hypothesis that is likely to be overfitting the training data.
• Pruning is the basis of search in many decision-tree algorithms; the weakest branches of a large tree that
overfits the training data, which hardly reduce the error rate, are removed.
Roll No.  CGPA  IQ   Package (Actual_Value)  Package (Predicted_Value)  Loss Function
1         5.2   100  6.3                     6.4                        0.01
2         4.3   91   4.5                     5.3                        0.64
3         8.2   83   6.5                     5.2                        1.69
4         8.9   102  5.5                     8.9                        11.56
Cost Function (mean of the per-sample losses) = 3.475

Loss function (per sample): L = |Actual_Value - Predicted_Value|
Cost function (over n samples): C = (1/n) Σ |Actual_Value - Predicted_Value|
(The Loss Function column above shows the squared error (Actual_Value - Predicted_Value)², whose mean gives the cost 3.475; see the sketch below.)
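A small sketch that recomputes the table's numbers, plus the corresponding MAE and RMSE, with NumPy:

```python
import numpy as np

actual = np.array([6.3, 4.5, 6.5, 5.5])       # actual package values
predicted = np.array([6.4, 5.3, 5.2, 8.9])    # model predictions

errors = actual - predicted
mae = np.mean(np.abs(errors))                 # Mean Absolute Error
mse = np.mean(errors ** 2)                    # Mean Squared Error (matches the 3.475 cost above)
rmse = np.sqrt(mse)                           # Root Mean Squared Error

print(f"MAE  = {mae:.3f}")    # 1.400
print(f"MSE  = {mse:.3f}")    # 3.475
print(f"RMSE = {rmse:.3f}")   # 1.864
```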
UNIT-1 Regression
• Advantages
– Easy to understand.
– Same unit as the unit of Actual_Value.
– It is robust to outliers: outliers will not dominate the error, so if the dataset contains
outliers it is better to use MAE instead of MSE.
• Disadvantages
– The graph of the absolute error is not differentiable at zero, due to which the Gradient Descent (GD) algorithm
is not easy to implement.
– To implement GD we need to calculate a sub-gradient.
UNIT-1 Regression
• MSE (Mean Squared Error): MSE is a metric that calculates the
average squared difference between the predicted values and
the actual values. Squaring the errors gives more weight to
larger errors, making it useful for penalizing significant
deviations from the true values.

L = (Actual_Value - Predicted_Value)²            (loss for one sample)
C = (1/n) Σ (Actual_Value - Predicted_Value)²    (cost over n samples)
UNIT-1 Regression
• Advantages
– Easy to interpret.
– The loss function is differentiable, which allows GD to be implemented easily.
– One local minimum: the function has a single minimum value that we
have to find.
• Disadvantages
– The unit of the error is squared, which can be confusing to interpret; to express
the error in the original unit we take the square root of MSE (RMSE).
– It is not robust to outliers: if the dataset contains outliers, MSE is heavily
influenced by them.
UNIT-1 Regression
• Huber loss
• Huber loss is useful when a non-negligible fraction of the data, say around 25%, consists of outliers.
With that much outlier data, MSE lets the outliers pull the fit away from the 75% of the data that is
correct, while MAE effectively ignores the 25% of outlier data, which is still a significant amount.
Huber loss behaves like MSE for small errors and like MAE for large errors, giving a useful compromise
in this type of situation (see the sketch below).
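A minimal sketch of the Huber loss; δ, the threshold between the quadratic (MSE-like) and linear (MAE-like) regimes, is a user-chosen parameter:

```python
import numpy as np

def huber_loss(actual, predicted, delta=1.0):
    """Quadratic for small errors (like MSE), linear for large errors (like MAE)."""
    error = np.asarray(actual) - np.asarray(predicted)
    small = np.abs(error) <= delta
    quadratic = 0.5 * error ** 2                       # MSE-like part
    linear = delta * (np.abs(error) - 0.5 * delta)     # MAE-like part
    return np.mean(np.where(small, quadratic, linear))

# Reusing the package-prediction example values from above.
print(huber_loss([6.3, 4.5, 6.5, 5.5], [6.4, 5.3, 5.2, 8.9], delta=1.0))
```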
UNIT-1 Regression
• RMSE
• The lower the RMSE, the better the model and its predictions.
• A higher RMSE indicates that the predictions deviate substantially from the
ground truth.
UNIT-1 Regression
• Pros of the RMSE Evaluation Metric:
– RMSE is easy to understand.
– It serves as a heuristic for training models.
– It is computationally simple and easily differentiable which many
optimization algorithms desire.
– RMSE does not penalize the errors as much as MSE does due to the
square root.
• Cons of the RMSE metric:
– Like MSE, RMSE is dependent on the scale of the data. It increases in
magnitude if the scale of the error increases.
– One major drawback of RMSE is its sensitivity to outliers and the
outliers have to be removed for it to function properly.
UNIT-1 Regression
• R Squared
• R-squared (Coefficient of Determination) is a statistical measure that
quantifies the proportion of the variance in the dependent variable that is
explained by the independent variables in a regression model:

R² = 1 - (SSR / SST)

• Where:
– SSR (Sum of Squares of Residuals) represents the sum of squared differences between
the observed values and the values predicted by the model.
– SST (Total Sum of Squares) represents the sum of squared differences between the
observed values and the mean of the dependent variable.
UNIT-1 Regression
• R-squared ranges between 0 and 1, with the following
interpretations:
– R² = 0: the model does not explain any of the variability in the dependent
variable. It is a poor fit.
– 0 < R² < 1: the model explains a proportion of the variability. A higher R-squared
indicates a better fit.
– R² = 1: the model perfectly predicts the dependent variable from the
independent variables (it explains all the variability).
UNIT-1 Regression
• R-squared evaluates regression model fit but has limitations:
• High R-squared doesn't always mean good fit; high value may
imply overfitting, lacking generalization.
• Including more predictors can inflate R-squared, even if they're
weak; adjusted R-squared adjusts for this.
• "Good" R-squared varies by field; lower values acceptable in
data-rich areas.
• R-squared may miss fit quality with nonlinearity or outliers.
UNIT-1 Regression
• Adjusted R Squared

Adjusted R² = 1 - [(1 - R²)(n - 1) / (n - k - 1)]

• Where:
– n = the number of points in your data sample.
– k = the number of independent regressors, i.e. the number of variables
in your model, excluding the constant.
UNIT-1 Regression
• Adjusted R Squared
– Adjusted R-squared adjusts the statistic based on the
number of independent variables in the model
– Adjusted R2 also indicates how well terms fit a curve or line,
but adjusts for the number of terms in a model.
– If you add more and more useless variables to a model,
adjusted r-squared will decrease.
– If you add more useful variables, adjusted r-squared will
increase.
– Adjusted R2 will always be less than or equal to R2
UNIT-1 Regression
• Adjusted R Squared
– Problem Statement:
• A fund has a sample R-squared value close to 0.5 and is
doubtless offering higher risk-adjusted returns, with a
sample size of 50 and 5 predictors. Find the Adjusted R-squared
value.
– Sample size = 50, number of predictors = 5, sample R-squared
= 0.5. Substitute the values in the equation (worked out below).
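Substituting the given values (n = 50, k = 5, R² = 0.5) into the adjusted R-squared formula:

```latex
\text{Adjusted } R^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}
                     = 1 - \frac{(1 - 0.5)(50 - 1)}{50 - 5 - 1}
                     = 1 - \frac{0.5 \times 49}{44} \approx 0.443
```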
UNIT-1 Regression
• RMSE (Root Mean Squared Error): RMSE is the square root of
the MSE and is commonly used to express the average
magnitude of the prediction errors in the same units as the
dependent variable. It provides a measure of the model's
accuracy, and lower values indicate better performance.
• R Squared (Coefficient of Determination): R-squared is a
statistical measure that represents the proportion of the
variance in the dependent variable that is explained by the
independent variables in the regression model. It ranges from 0
to 1, where 1 indicates that the model explains all the variance,
and 0 indicates that the model doesn't explain any of the variance.
UNIT-1 Regression
• Adjusted R Squared: Adjusted R-squared is a modified version
of R-squared that takes into account the number of
independent variables in the model. It penalizes the addition of
irrelevant variables that might artificially inflate the R-squared
value.
• p-Value: The p-value is a measure of the evidence against a null
hypothesis in a statistical hypothesis test. In the context of
regression analysis, p-values are used to determine whether
the coefficients of the independent variables are statistically
significant. A low p-value (typically below a significance level
like 0.05) suggests that the variable has a significant impact on
UNIT-1 Classification
• A Fraud Detection Classifier
• Objective: to detect fraudulent claims.
• Assumptions:
– The output of your fraud detection model is the probability [0.0–1.0]
that a transaction is fraudulent.
– If this probability is below 0.5, you classify the transaction as non-
fraudulent; otherwise, you classify the transaction as fraudulent.
• Methodology
– Collect 10,000 manually classified transactions, with 300 fraudulent
transactions and 9,700 non-fraudulent transactions.
– Run your classifier on every transaction, predict the class label
(fraudulent or non-fraudulent) and compare the predictions with the
manually assigned (true) labels.
UNIT-1 Classification
• A True Positive (TP=100) is an outcome where the model
correctly predicts the positive (fraudulent) class.
• A True Negative (TN=9,000) is an outcome where the model
correctly predicts the negative (non-fraudulent) class.
• A False Positive (FP=700) is an outcome where the model
incorrectly predicts the positive (fraudulent) class.
• A False Negative (FN=200) is an outcome where the model
incorrectly predicts the negative (non-fraudulent) class.
UNIT-1 Classification
• Accuracy: correctly predicted values out of the total given data.
• Accuracy = ? (worked out below)
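Working this out from the confusion-matrix counts above (TP = 100, TN = 9,000, FP = 700, FN = 200):

```latex
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} = \frac{100 + 9000}{10000} = 0.91 \;(91\%)
```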
UNIT-1 Classification
• Area Under Curve
• Area Under Curve(AUC) is one of the most widely used metrics
for evaluation.
• It is used for binary classification problems.
• AUC of a classifier is equal to the probability that the classifier
will rank a randomly chosen positive example higher than a
randomly chosen negative example.
• Two basic terms used in AUC:
– True Positive Rate (Sensitivity)
– True Negative Rate (Specificity)
UNIT-1 Classification
• Area Under Curve
• A few basic terms used in AUC:
– True Positive Rate (Sensitivity): defined as TP / (TP + FN). It corresponds to the proportion of positive
data points that are correctly classified as positive, with respect to
all positive data points.
– False Positive Rate (1 - Specificity): defined as FP / (FP + TN), the proportion of negative
data points that are incorrectly classified as positive.
• False Positive Rate and True Positive Rate both have values in
the range [0, 1].
• FPR and TPR are both computed at varying threshold values
such as (0.00, 0.02, 0.04, …, 1.00) and a graph is drawn.
• AUC is the area under the curve of the plot of True Positive Rate vs.
False Positive Rate (the ROC curve).
UNIT-1 Classification
• Area Under Curve
• As evident, AUC has a range of [0, 1]. The greater the value, the
better is the performance of our model.
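A minimal sketch of computing the ROC curve and AUC with scikit-learn; y_true and y_score are illustrative stand-ins for true labels and predicted fraud probabilities:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Illustrative labels and predicted probabilities.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9])

fpr, tpr, thresholds = roc_curve(y_true, y_score)   # TPR/FPR at varying thresholds
auc = roc_auc_score(y_true, y_score)                # area under the ROC curve

print("FPR:", fpr)
print("TPR:", tpr)
print("AUC:", auc)
```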
UNIT-1 Classification
• F1-Score:
– F1 Score is used to measure a test's accuracy.
– F1 Score is the harmonic mean of precision and recall.
– The range for F1 Score is [0, 1].
– It tells you how precise your classifier is (how many of its positive predictions are correct), as well as
how robust it is (it does not miss a significant number of positive instances).
– High precision but low recall gives you a classifier that is extremely accurate on the positives it does
flag, but it then misses a large number of instances that are difficult to classify.
– The greater the F1 Score, the better the performance of our model.
– Mathematically, it can be expressed as:

F1 = 2 × (Precision × Recall) / (Precision + Recall)
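Applying these definitions to the fraud-detection example above (TP = 100, FP = 700, FN = 200), a small sketch:

```python
TP, FP, FN = 100, 700, 200   # counts from the fraud-detection example

precision = TP / (TP + FP)                                   # 100 / 800  = 0.125
recall    = TP / (TP + FN)                                   # 100 / 300  ≈ 0.333
f1        = 2 * precision * recall / (precision + recall)    # ≈ 0.182

print(f"Precision = {precision:.3f}")
print(f"Recall    = {recall:.3f}")
print(f"F1 score  = {f1:.3f}")
```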
• Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) are metrics used to evaluate a
Regression Model. These metrics tell us how accurate our predictions are and what the amount of deviation from the actual values is.
• The Mean absolute error represents the average of the absolute difference between the actual and
predicted values in the dataset. It measures the average of the residuals in the dataset.
• Mean Squared Error represents the average of the squared difference between the original and
predicted values in the data set. It measures the variance of the residuals.
• Root Mean Squared Error is the square root of Mean Squared error. It measures the standard deviation
of residuals.
Technically, RMSE is the Root of the Mean of the Square of Errors and MAE is the Mean of Absolute value
of Errors. Here, errors are the differences between the predicted values (values predicted by our regression model)
and the actual values of a variable. They are calculated as follows :
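A reconstruction of the standard formulas, where y_i denotes the actual value and ŷ_i the predicted value of the i-th of n samples:

```latex
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|,
\qquad
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}
```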
• The coefficient of determination or R-squared represents the proportion of the variance in the
dependent variable which is explained by the linear regression model. It is a scale-free score i.e.
irrespective of the values being small or large, the value of R square will be less than one.
• Adjusted R squared is a modified version of R square: it is adjusted for the number of
independent variables in the model, and it will always be less than or equal to R². In this
formula (given earlier), n is the number of observations in the data and k is the number of
independent variables in the data.
• The concept of the p-value comes from statistics and is widely used in machine learning and data
science.
• The p-value is used to determine the point of rejection: it is the smallest significance level at which
the null hypothesis would be rejected.
• It is expressed as a level of significance that lies between 0 and 1; the smaller the p-value,
the stronger the evidence to reject the null hypothesis. If the p-value is very small, it means the
observed output is unlikely to occur under the null hypothesis conditions (H0).
• A p-value of 0.05 is commonly used as the level of significance (α). Usually, it is interpreted using the two
rules given below:
– If p-value > 0.05: the large p-value shows that the null hypothesis cannot be rejected (it is accepted).
– If p-value < 0.05: the small p-value shows that the null hypothesis should be rejected, and
the result is declared statistically significant.
Some basic terms are Precision, Recall, and F1-Score. These relate to getting a finer-grained idea of how well
a classifier is doing, as opposed to just looking at overall accuracy.
I am looking at a binary classifier in this article. The same concepts do apply more broadly, just require a bit
more consideration on multi-class problems. But that is something to consider another time.
Recall / Sensitivity
Recall is a measure of how many of the positive cases the classifier
correctly predicted, over all the positive cases in the data. It is
sometimes also referred to as Sensitivity. The formula for it is:

Recall = TP / (TP + FN)
• Cross-validation is a statistical method used to estimate the skill of machine learning models.
• It is commonly used in applied machine learning to compare and select a model for a given
predictive modeling problem because it is easy to understand, easy to implement, and results in
skill estimates that generally have a lower bias than other methods.
• Cross-validation is a resampling procedure used to evaluate machine learning models on a
limited data sample.
• The procedure has a single parameter called k that refers to the number of groups that a given
data sample is to be split into. As such, the procedure is often called k-fold cross-validation.
• When a specific value for k is chosen, it may be used in place of k in the reference to the
model, such as k=10 becoming 10-fold cross-validation.
• As an analogy: if a student practices only 70 algebra questions, but the test has 30 questions of which
10 are from calculus, we cannot judge the student's ability from the practice set alone.
• In the same way, a single train/test split may give a misleading estimate; that is why we use K-Fold
Cross-Validation, to get more reliable results (see the sketch below).
https://www.youtube.com/watch?v=gJo0uNL-5Qw
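A minimal sketch of K-fold cross-validation with scikit-learn; the model and dataset are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# k = 5: split the data into 5 folds, train on 4 and validate on the held-out fold, 5 times.
scores = cross_val_score(model, X, y, cv=5)

print("Fold accuracies:", scores)
print("Mean accuracy  :", scores.mean())
```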
ROC curve (CO1)
By moving the cutoff point in one direction, false negatives increase; by moving it in the other direction, false positives increase.
The ROC curve can be used to determine the cutoff point that optimizes the sensitivity and
specificity of a given test.
Hyperparameter tuning (CO1)
• Hyperparameters in Machine learning are those parameters that are explicitly defined by the user to control
the learning process.
• These hyperparameters are used to improve the learning of the model, and their values are set before
starting the learning process of the model.
• They are usually fixed before the actual training process begins.
• These parameters express important properties of the model such as its complexity or how fast it should
learn.
• Some examples of model hyper parameters include:
• The penalty in Logistic Regression Classifier i.e. L1 or L2 regularization
• The learning rate for training a neural network.
• The C and sigma hyperparameters for support vector machines.
• The k in k-nearest neighbors.
https://www.geeksforgeeks.org/hyperparameter-tuning/
Models can have many hyperparameters, and finding the best combination of
parameter values can be treated as a search problem. The two most common strategies for
hyperparameter tuning are:
• GridSearchCV: grid search with cross-validation
• RandomizedSearchCV: randomized search with cross-validation
In general, if the number of combinations is small enough, we can use the Grid Search technique. But
when the number of combinations increases, we should try Random Search or Bayesian Search, as they are
less computationally expensive.
GridSearchCV takes a dictionary that describes the parameters that should be tried on a model to train
it. The grid of parameters is defined as a dictionary, where the keys are the parameter names and the values
are the settings to be tested; the search is then run with a call such as logreg_cv.fit(X, y) (see the sketch below).
https://www.geeksforgeeks.org/hyperparameter-tuning/
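A minimal sketch of a logistic-regression grid search along the lines of the linked tutorial; the parameter grid values and the dataset are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Keys are parameter names, values are the settings to be tested.
param_grid = {"C": np.logspace(-3, 3, 7), "penalty": ["l2"]}

logreg = LogisticRegression(max_iter=1000)
logreg_cv = GridSearchCV(logreg, param_grid, cv=10)   # 10-fold CV for every combination
logreg_cv.fit(X, y)

print("Tuned hyperparameters:", logreg_cv.best_params_)
print("Best cross-validated accuracy:", logreg_cv.best_score_)
```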
Grid Search technique (CO1)
Drawback: GridSearchCV will go through all the intermediate combinations of hyperparameters
which makes grid search computationally very expensive.
• RandomizedSearchCV
RandomizedSearchCV solves the drawbacks of GridSearchCV, as it goes through only a fixed number
of hyperparameter settings.
• It moves within the grid in a random fashion to find the best set of hyperparameters. This approach
reduces unnecessary computation.
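A corresponding sketch with RandomizedSearchCV, which samples only a fixed number (n_iter) of settings at random; the distribution over C is an illustrative assumption:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import loguniform

X, y = load_iris(return_X_y=True)

# Sample C from a log-uniform distribution instead of trying every grid point.
param_distributions = {"C": loguniform(1e-3, 1e3)}

logreg = LogisticRegression(max_iter=1000)
random_cv = RandomizedSearchCV(logreg, param_distributions,
                               n_iter=20, cv=5, random_state=0)
random_cv.fit(X, y)

print("Tuned hyperparameters:", random_cv.best_params_)
print("Best cross-validated accuracy:", random_cv.best_score_)
```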
Deep learning is a class of machine learning algorithms that use several layers of
nonlinear processing units for feature extraction and transformation. Each successive
layer uses the output from the previous layer as input.
Deep neural networks, deep belief networks and recurrent neural networks have been
applied to fields such as computer vision, speech recognition, natural language
processing, audio recognition, social network filtering, machine translation, and
bioinformatics, where they have produced results comparable to, and in some cases better
than, human experts.
Deep Learning Algorithms and Networks:
• are based on the unsupervised learning of multiple levels of features or representations
of the data, where higher-level features are derived from lower-level features to form a
hierarchical representation;
• use some form of gradient descent for training.
Deep Learning Applications (CO1)
What Makes Deep Learning State-of-the-Art? (CO1)
In a word, accuracy. Advanced tools and techniques have dramatically improved deep learning algorithms
—to the point where they can outperform humans at classifying images, win against the world’s best GO
player, or enable a voice-controlled assistant like Amazon Echo® and Google Home to find and download
that new song you like.
Pretrained models built by experts: models such as AlexNet can be retrained to perform new recognition
tasks using a technique called transfer learning. While AlexNet was trained on about 1.3 million high-resolution
images to recognize 1,000 different objects, accurate transfer learning can be achieved with much smaller
datasets (see the sketch below).
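A minimal transfer-learning sketch with torchvision; it assumes torchvision ≥ 0.13 for the weights argument, and the 5-class output is an illustrative assumption:

```python
import torch.nn as nn
from torchvision import models

num_classes = 5  # hypothetical number of classes for the new task

# Load AlexNet pretrained on ImageNet (1000 classes).
alexnet = models.alexnet(weights="DEFAULT")

# Freeze the convolutional feature extractor so only the new head is trained.
for param in alexnet.features.parameters():
    param.requires_grad = False

# Replace the final fully connected layer (originally 4096 -> 1000 classes).
alexnet.classifier[6] = nn.Linear(4096, num_classes)

# alexnet can now be fine-tuned on the smaller dataset with a standard training loop.
```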
[Diagram: Deep Learning (DL) is a subset of Machine Learning (ML), which is a subset of Artificial Intelligence (AI).]
Biological neuron vs. artificial neuron:
• Dendrites → Inputs
• Synapses → Weights (the links between neurons)
• Cell body → Processor
• Axon → Output
Our basic computational element (model neuron) is often called a node or unit. It receives input from some other
units, or perhaps from an external source. Each input has an associated weight w, which can be modified so as to
model synaptic learning. The unit computes some function f of the weighted sum of its inputs:
– threshold (hard-limiting): a = 1 if n >= 0, a = 0 if n < 0
– sigmoid: a = 1 / (1 + e^(-n))
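A minimal sketch of this model neuron in NumPy; the weights and inputs are illustrative:

```python
import numpy as np

def neuron(x, w, b=0.0, activation="sigmoid"):
    """Compute a = f(n), where n is the weighted sum of the inputs plus a bias."""
    n = np.dot(w, x) + b
    if activation == "threshold":          # hard-limiting unit
        return 1.0 if n >= 0 else 0.0
    return 1.0 / (1.0 + np.exp(-n))        # sigmoid unit

x = np.array([0.5, -1.0, 2.0])             # inputs
w = np.array([0.4, 0.7, -0.2])             # synaptic weights

print("threshold output:", neuron(x, w, activation="threshold"))
print("sigmoid output  :", neuron(x, w, activation="sigmoid"))
```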
Activation function types: bipolar continuous, unipolar continuous, unipolar binary.
Perceptron types: binary perceptrons, continuous perceptrons.
Feedforward Network
• It is a non-recurrent network having processing units/nodes arranged in
layers, where all the nodes in a layer are connected to the nodes of
the previous layer. The connections carry different weights. There is no
feedback loop, meaning the signal can flow in only one direction, from input
to output. It may be divided into the following two types: single-layer
feedforward networks and multilayer feedforward networks.
• Feedback Network: As the name suggests, a feedback network has feedback
paths, which means the signal can flow in both directions using loops. This
makes it a nonlinear dynamic system, which changes continuously until it
reaches a state of equilibrium.
• Recurrent networks: They are feedback networks with closed loops, in which
the output goes back to the input again as feedback.
Feedback networks with closed loop are called Recurrent Networks. The response at the k+1’th instant depends on
the entire history of the network starting at k=0.
Automaton: A system with discrete time inputs and a discrete data representation is called an automaton
Hebbian Learning Rule(CO1)
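For reference, the standard Hebbian learning rule strengthens the connection between two units that are active together; with learning rate η, input x_i and output y_j, the weight update is:

```latex
\Delta w_{ij} = \eta \, x_i \, y_j, \qquad w_{ij}(t+1) = w_{ij}(t) + \Delta w_{ij}
```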
Minimization of error requires the weight changes to be in the negative gradient direction
Backpropagation of errors (for sigmoid units, with dk the desired output, Zk the actual output of output unit k, and Yj the output of hidden unit j):
• δk = Zk (1 - Zk)(dk - Zk)
• δj = Yj (1 - Yj) Σk δk Wjk
Weight updating (with learning rate η and momentum coefficient α; a NumPy sketch implementing these updates follows below):
• Wjk(t+1) = Wjk(t) + η δk Yj + α [Wjk(t) - Wjk(t-1)]
• bk(t+1) = bk(t) + η δk + α [bk(t) - bk(t-1)]
• Wij(t+1) = Wij(t) + η δj Xi + α [Wij(t) - Wij(t-1)]
• bj(t+1) = bj(t) + η δj + α [bj(t) - bj(t-1)]
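A minimal NumPy sketch of these backpropagation and momentum updates on a toy problem (XOR); the network size, learning rate and momentum values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem (XOR): 2 inputs -> 4 hidden sigmoid units -> 1 sigmoid output.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
d = np.array([[0], [1], [1], [0]], dtype=float)     # desired outputs d_k

def sigmoid(n):
    return 1.0 / (1.0 + np.exp(-n))

eta, alpha = 0.5, 0.8                                # learning rate and momentum
W_ij = rng.normal(0.0, 1.0, (2, 4)); b_j = np.zeros((1, 4))   # input -> hidden
W_jk = rng.normal(0.0, 1.0, (4, 1)); b_k = np.zeros((1, 1))   # hidden -> output
dW_ij = np.zeros_like(W_ij); db_j = np.zeros_like(b_j)
dW_jk = np.zeros_like(W_jk); db_k = np.zeros_like(b_k)

for epoch in range(10000):
    # Forward pass.
    Y = sigmoid(X @ W_ij + b_j)                      # hidden outputs Y_j
    Z = sigmoid(Y @ W_jk + b_k)                      # network outputs Z_k

    # Backpropagation of errors (the delta rules above).
    delta_k = Z * (1 - Z) * (d - Z)
    delta_j = Y * (1 - Y) * (delta_k @ W_jk.T)

    # Weight updates with momentum: new step = eta * gradient term + alpha * previous step.
    dW_jk = eta * (Y.T @ delta_k) + alpha * dW_jk
    db_k  = eta * delta_k.sum(axis=0, keepdims=True) + alpha * db_k
    dW_ij = eta * (X.T @ delta_j) + alpha * dW_ij
    db_j  = eta * delta_j.sum(axis=0, keepdims=True) + alpha * db_j
    W_jk += dW_jk; b_k += db_k
    W_ij += dW_ij; b_j += db_j

print("Outputs after training:", Z.ravel().round(3))   # should approach [0, 1, 1, 0]
```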
• 1. https://nptel.ac.in/courses/117/105/117105084/
• 2. https://nptel.ac.in/courses/106/106/106106184/
• 3. https://nptel.ac.in/courses/108/105/108105103/
• 4. https://www.youtube.com/watch?v=DKSZHN7jftI&list=PLZoTAELRMXVPGU70ZGsckrMdr0FteeRUi
• 5. https://www.youtube.com/watch?v=aPfkYu_qiF4&list=PLyqSpQzTE6M9gCgajvQbc68Hk_JKGBAYT
(a) Machine learning (b) Artificial intelligence (c) Deep learning (d) none of these
(a) Machine learning (b) Artificial intelligence (c) Deep learning (d) none of these
(a) All the mathematics in the form of a flow chart (b) Artificial intelligence algorithm (c)
Deep learning algorithm (d) none of these
(a) Scalability (b) Visualization of Data (c) Pipelining (d) all of these
(a) Whole work done at a time (b) whole work divided into small segments and then
executed in a parallel manner (c) copying work from another processor (d) none of
these
6. What is API?
(a) A programming interface (b) After programming interface (c)
Application Programming Interface (d) none of these.
7. What is the main operation in TensorFlow?
(a) Computing (b) Calculation (c) Pipelining (d) passing values and
assigning the output to another tensor.
8. TensorFlow is the product of which company?
(a) Google research team (b) Amazon technical team (c) PayPal
(d) none of these
Q1. Which neural network has only one hidden layer between the input and output?
A. Shallow neural network
B. Deep neural network
C. Feed-forward neural networks
D. Recurrent neural networks
Q3. Deep learning algorithms are _______ more accurate than machine learning algorithms in image
classification.
A. 33%
B. 37%
C. 40%
D. 41%
Q4. Which of the following functions can be used as an activation function in the output layer if we wish
to predict the probabilities of n classes (p1, p2, …, pn) such that the sum of p over all n classes equals 1?
A. Softmax
B. ReLu
C. Sigmoid
D. Tanh
Q5. Which of the following would have a constant input in each epoch of training a Deep Learning model?
A. Weight between input and hidden layer
B. Weight between hidden and output layer
C. Biases of all hidden layer neurons
D. Activation function of output layer
Q6. If during training we do not obtain the accurate output, which value does the neural network
change to get the accurate output?
(a) bias (b) perceptron (c) weight (d) all values can change
Thank You