UNIT-III Deep Learning Notes

CONVOLUTIONAL NEURAL NETWORK
DEFINITION
IBM DEFINITION:
IBM defines Convolutional Neural Networks (CNNs) as a specialized type of artificial neural network
designed to process structured grid-like data, such as images, by leveraging the spatial and temporal
dependencies between pixels. IBM emphasizes CNNs' ability to mimic the human visual system. By
scanning small parts of an image (using filters), CNNs recognize essential features and combine them
to understand the overall image.
A Convolutional Neural Network (CNN) is a type of Deep Learning neural network architecture
commonly used in Computer Vision. Computer vision is a field of Artificial Intelligence that enables a
computer to understand and interpret the image or visual data.
Artificial Neural Networks perform well across many machine learning tasks and are applied to
many kinds of data, such as images, audio, and text. Different types of neural networks suit
different purposes: for predicting a sequence of words we use Recurrent Neural Networks
(more precisely, an LSTM), while for image classification we use Convolutional Neural
Networks.
ARCHITECTURE EXPLANATION:
Neural Networks: Layers and Functionality
In a regular Neural Network there are three types of layers:
1. Input Layers: It’s the layer in which we give input to our model. The
number of neurons in this layer is equal to the total number of features
in our data (number of pixels in the case of an image).
2. Hidden Layer: The input from the input layer is then fed into the
hidden layer. There can be many hidden layers depending on our
model and data size. Each hidden layer can have a different number of
neurons, which is chosen as a hyperparameter. The output of each layer
is computed by multiplying the output of the previous layer by that
layer's learnable weight matrix, adding the learnable biases, and then
applying an activation function, which makes the network nonlinear.
3. Output Layer: The output from the last hidden layer is then fed into a
function such as sigmoid or softmax, which converts the raw score of
each class into a probability for that class.
Feeding the data into the model and computing the output of each layer
as described above is called feedforward. We then calculate the error
using an error (loss) function; common choices are cross-entropy and
squared error. The error function measures how well the network is
performing. After that, we propagate the error backward through the
model by calculating the derivatives of the loss with respect to the
weights. This step is called backpropagation, and the resulting
gradients are used to update the weights so that the loss is minimized.
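The feedforward step and the error calculation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a full training loop: the layer sizes, the random weights, and the function names (relu, softmax, forward, cross_entropy) are assumptions chosen only for the example.

```python
import numpy as np

def relu(z):
    # Element-wise activation: negative values become 0
    return np.maximum(0.0, z)

def softmax(z):
    # Subtract the row maximum for numerical stability
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(x, W1, b1, W2, b2):
    # Hidden layer: matrix multiplication, bias, activation
    h = relu(x @ W1 + b1)
    # Output layer: class scores turned into probabilities
    return softmax(h @ W2 + b2)

def cross_entropy(probs, y_onehot):
    # Mean negative log-probability of the correct class
    eps = 1e-12
    return float(-np.mean(np.sum(y_onehot * np.log(probs + eps), axis=1)))

rng = np.random.default_rng(0)
x = rng.random((4, 6))                      # 4 samples, 6 features (illustrative)
W1 = rng.normal(0, 0.1, (6, 5)); b1 = np.zeros(5)
W2 = rng.normal(0, 0.1, (5, 3)); b2 = np.zeros(3)
y = np.eye(3)[[0, 2, 1, 0]]                 # one-hot labels for 3 classes

probs = forward(x, W1, b1, W2, b2)
loss = cross_entropy(probs, y)
print(probs.shape, loss)
```

Backpropagation would then compute the derivative of loss with respect to W1, b1, W2, and b2 and adjust them to reduce the loss.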
Convolutional Neural Network
A Convolutional Neural Network (CNN) is an extension of the artificial
neural network (ANN) that is predominantly used to extract features
from grid-like data, for example visual datasets such as images
or videos, where spatial patterns play an important role.
CNN Architecture
A Convolutional Neural Network consists of multiple layers: the input
layer, convolutional layers, pooling layers, and fully connected layers.
Simple CNN architecture
The convolutional layer applies filters to the input image to extract
features, the pooling layer downsamples the feature maps to reduce
computation, and the fully connected layer makes the final prediction. The
network learns the optimal filters through backpropagation and gradient
descent.
How Do Convolutional Layers Work?
Convolutional Neural Networks, or ConvNets, are neural networks that share
their parameters. Imagine you have an image. It can be represented as a
cuboid with a width and height (the dimensions of the image) and a depth
(i.e. the channels, as images generally have red, green, and blue channels).
Now imagine taking a small patch of this image and running a small
neural network, called a filter or kernel, on it, with, say, K outputs
represented vertically as a column. Now slide that neural network across the
whole image. As a result, we get another image-like volume with a different
width, height, and depth. Instead of just the R, G, and B channels, we now have
more channels but a smaller width and height. This operation is
called convolution. If the patch size were the same as that of the image, it
would be a regular neural network. Because the patch is small and its weights
are reused at every position, we have far fewer weights.
Image source: Deep Learning Udacity
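To make the saving from parameter sharing concrete, the sketch below counts weights for a 32 x 32 x 3 input when 12 outputs are produced either by a fully connected layer (patch as large as the image) or by 12 shared 5 x 5 filters. The sizes and the helper names are assumptions picked for illustration, and biases are ignored.

```python
def dense_weights(height, width, channels, units):
    # Every output unit is connected to every input value
    return height * width * channels * units

def conv_weights(kernel, channels, filters):
    # Each filter has kernel x kernel x channels weights, reused at every position
    return kernel * kernel * channels * filters

dense = dense_weights(32, 32, 3, 12)   # patch as large as the image
conv = conv_weights(5, 3, 12)          # twelve shared 5x5 filters

print(dense, conv)                     # 36864 900
```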
Mathematical Overview of Convolution

Now let’s look at the mathematics involved in the
convolution process.
• Convolution layers consist of a set of learnable filters (or kernels)
having small widths and heights and the same depth as that of the input
volume (3 if the input is an RGB image).
• For example, suppose we run a convolution on an image with
dimensions 34x34x3. The possible size of the filters is a x a x 3, where
‘a’ can be a value such as 3, 5, or 7, but small compared to the image
dimension.
• During the forward pass, we slide each filter across the whole input
volume step by step. The step size is called the stride (commonly 1 or 2;
values such as 3 or 4 are sometimes used for high-dimensional images).
At each position we compute the dot product between the kernel weights
and the corresponding patch of the input volume.
• As we slide our filters we get a 2-D output for each filter, and
stacking these outputs gives an output volume with a depth
equal to the number of filters. The network learns all the filters.
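The sliding dot product described above can be sketched directly in NumPy. This is a deliberately simple, loop-based version without padding; the function name conv2d, the random input, and the choice of 8 filters of size 5 x 5 are assumptions made for the example, and real libraries use much faster implementations.

```python
import numpy as np

def conv2d(volume, filters, stride=1):
    """Slide each filter over the input volume without padding.

    volume:  array of shape (H, W, C)
    filters: array of shape (K, F, F, C), a bank of K filters
    returns: array of shape (H_out, W_out, K)
    """
    H, W, C = volume.shape
    K, F, _, _ = filters.shape
    H_out = (H - F) // stride + 1
    W_out = (W - F) // stride + 1
    out = np.zeros((H_out, W_out, K))
    for i in range(H_out):
        for j in range(W_out):
            r, c = i * stride, j * stride
            patch = volume[r:r + F, c:c + F, :]
            # One dot product per filter between its weights and the patch
            out[i, j, :] = np.tensordot(filters, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
image = rng.random((34, 34, 3))        # the 34x34x3 example from the text
bank = rng.random((8, 5, 5, 3))        # 8 filters of size 5x5x3

out1 = conv2d(image, bank, stride=1)
out2 = conv2d(image, bank, stride=2)
print(out1.shape, out2.shape)          # (30, 30, 8) (15, 15, 8)
```

The depth of the output equals the number of filters, and a larger stride gives a smaller output.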
Layers Used to Build ConvNets
A complete Convolutional Neural Network architecture is also known as a
ConvNet. A ConvNet is a sequence of layers, and every layer transforms
one volume into another through a differentiable function.
Types of layers:
Let’s take an example by running a ConvNet on an image of dimension 32 x
32 x 3.
• Input Layers: This is the layer in which we give input to our model. In
a CNN, the input is generally an image or a sequence of images.
This layer holds the raw input of the image with width 32, height 32,
and depth 3.
• Convolutional Layers: This is the layer that extracts
features from the input. It applies a set of learnable filters,
known as kernels, to the input image. The filters/kernels are
small matrices, usually of shape 2×2, 3×3, or 5×5. Each filter slides over
the input image and computes the dot product between the kernel weights
and the corresponding input image patch. The outputs of this layer are
referred to as feature maps. Suppose we use a total of 12 filters for this
layer, with stride 1 and zero padding that preserves the spatial size;
we then get an output volume of dimension 32 x 32 x 12.
• Activation Layer: Activation layers add nonlinearity to the network by
applying an element-wise activation function to the output of the
convolution layer. Some common activation functions are ReLU:
max(0, x), Tanh, Leaky ReLU, etc. The volume remains unchanged,
hence the output volume will have dimensions 32 x 32 x 12.
• Pooling layer: This layer is periodically inserted in the ConvNet, and
its main function is to reduce the size of the volume, which makes the
computation faster, reduces memory use, and also helps to limit
overfitting. Two common types of pooling layers are max pooling and
average pooling. If we use a max pool with 2 x 2 filters and stride 2, the
resultant volume will be of dimension 16x16x12.
Image source: cs231n.stanford.edu
• Flattening: The resulting feature maps are flattened into a one-dimensional
vector after the convolution and pooling layers so they can
be passed into a fully connected layer for classification or
regression.
• Fully Connected Layers: These take the input from the previous layer
and compute the final classification or regression output.
Image source: cs231n.stanford.edu
• Output Layer: For classification tasks, the output from the fully
connected layers is then fed into a function such as sigmoid or
softmax, which converts the score of each class into the probability
of each class.
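The volume sizes quoted in the list above can be traced with a short NumPy sketch. The convolution output is replaced here by random numbers of shape 32 x 32 x 12, and the dense weights are random as well; both are stand-ins assumed only so that the shapes can be followed from the activation layer to the output layer. The helper name max_pool is also illustrative.

```python
import numpy as np

def max_pool(volume, size=2, stride=2):
    # Take the maximum of each size x size window, channel by channel
    H, W, C = volume.shape
    H_out = (H - size) // stride + 1
    W_out = (W - size) // stride + 1
    out = np.empty((H_out, W_out, C))
    for i in range(H_out):
        for j in range(W_out):
            r, c = i * stride, j * stride
            out[i, j] = volume[r:r + size, c:c + size].max(axis=(0, 1))
    return out

rng = np.random.default_rng(0)
feature_maps = rng.normal(size=(32, 32, 12))  # stand-in for the conv layer output
activated = np.maximum(0.0, feature_maps)     # ReLU: shape stays 32x32x12
pooled = max_pool(activated)                  # 2x2 max pool, stride 2
flat = pooled.reshape(-1)                     # flattening to a 1-D vector

# Fully connected layer with 10 outputs followed by softmax (random weights)
W = rng.normal(0, 0.01, (flat.size, 10))
b = np.zeros(10)
scores = flat @ W + b
probs = np.exp(scores - scores.max())
probs /= probs.sum()

print(activated.shape, pooled.shape, flat.shape, probs.shape)
```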
Advantages and Disadvantages of Convolutional
Neural Networks (CNNs)
Advantages of CNNs:
1. Good at detecting patterns and features in images, videos, and
audio signals.
2. Fairly robust to translation of features within an image; robustness to
rotation and scaling is more limited and usually relies on data augmentation.
3. End-to-end training, no need for manual feature extraction.
4. Can handle large amounts of data and achieve high accuracy.
Disadvantages of CNNs:
1. Computationally expensive to train and require a lot of memory.
2. Can be prone to overfitting if not enough data or proper
regularization is used.
3. Requires large amounts of labeled data.
4. Interpretability is limited, it’s hard to understand what the network
has learned.
CNN TYPES
Convolutional Neural Networks (CNNs) have revolutionized the
field of computer vision, powering advancements in image
recognition, object detection, and various other visual tasks. The
success of CNNs lies in their ability to automatically learn
hierarchical representations from data. In this section, we’ll explore
several well-known CNN architectures, each tailored to specific
challenges and use cases.
CNN Architecture
1. LeNet-5: The Pioneer
LeNet-5, introduced by Yann LeCun and his team in the 1990s, was
one of the first successful CNN architectures. Designed for
handwritten digit recognition, it laid the foundation for subsequent
CNN developments. LeNet-5 features convolutional layers,
subsampling layers, and fully connected layers, showcasing the core
elements of modern CNNs.
LeNet CNN
2. AlexNet: Igniting Deep Learning Resurgence
AlexNet, created by Alex Krizhevsky, Ilya Sutskever, and Geoffrey
Hinton, marked a turning point in deep learning. It won the
ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in
2012 using deep convolutional layers. The AlexNet
architecture was designed to be used with large-scale image datasets
and it achieved state-of-the-art results at the time of its publication.
AlexNet is composed of 5 convolutional layers with a combination of
max-pooling layers, 3 fully connected layers, and 2 dropout layers.
The activation function used in the hidden layers is ReLU, and the
activation function used in the output layer is softmax. The total number
of parameters in this architecture is around 60 million.
3. VGGNet: The Pursuit of Simplicity
The Visual Geometry Group (VGG) at the University of Oxford proposed
the VGGNet architecture. VGGNet is known for its simplicity,
featuring a uniform architecture with small receptive fields (3x3
convolutional kernels) and deep stacks of layers. Its straightforward
design contributed to its popularity, and VGG models come in
various depths (e.g., VGG16, VGG19).
4. GoogLeNet (Inception): Embracing Parallelism
GoogLeNet, winner of ILSVRC 2014, introduced the Inception
module, which employs parallel convolutional operations with
different kernel sizes. This architecture efficiently captures features
at multiple scales, promoting better generalization. GoogLeNet
showcased the benefits of inception modules for improving
performance.
Google Net Like Architecture
5. ResNet: Tackling Vanishing Gradients
Residual Networks, or ResNets, proposed by Kaiming He et al.,
addressed the challenge of training very deep networks. ResNets
introduce shortcut connections that bypass one or more layers,
allowing the gradient to flow more easily during backpropagation.
This architectural innovation facilitated the training of extremely
deep networks, reaching hundreds of layers.
6. MobileNet: Lightweight Efficiency
MobileNet, designed by Google, focuses on efficiency for mobile and
edge devices. It employs depthwise separable convolutions, which
factor a standard convolution into a depthwise convolution (one spatial
filter per input channel) followed by a 1x1 pointwise convolution, to
reduce the number of parameters and computations. MobileNet strikes a
balance between accuracy and computational efficiency, making it well
suited to resource-constrained environments.
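The parameter saving from a depthwise separable convolution can be checked with simple arithmetic. The sketch below compares a standard 3 x 3 convolution with its depthwise separable counterpart for 64 input and 128 output channels; these channel counts and the helper names are assumptions chosen for illustration, and biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    # One k x k x c_in filter per output channel
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    depthwise = k * k * c_in      # one k x k spatial filter per input channel
    pointwise = c_in * c_out      # 1x1 convolution that mixes the channels
    return depthwise + pointwise

std = standard_conv_params(3, 64, 128)
sep = depthwise_separable_params(3, 64, 128)

print(std, sep, round(std / sep, 1))   # 73728 8768 8.4
```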
Image Classification using CNN
In this section, we create an image classifier of our own that can tell
whether a given picture shows a dog, a cat, or something else, depending
on the data you feed it. To achieve this, we use one of the best-known
machine learning algorithms for image classification, the Convolutional
Neural Network (CNN).
In short, a CNN is a machine learning model that learns the features of
the training images and uses those features to predict the label of a new
image fed to it. This section does not explain how a CNN works and behaves
internally.
Let us see how to create our very
own cat-vs-dog image classifier. For the dataset we will use the Kaggle
cat-vs-dog dataset.
After getting the dataset, we need to preprocess the data a bit and
provide a label for each training image. The file name of each image in
the training set starts with either “cat” or “dog”, so we use that to our
advantage and one-hot encode the labels so that the machine can understand
them (cat = [1, 0], dog = [0, 1]).
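def label_img(img):
    # File names are assumed to look like 'cat.0.jpg', so the third
    # field from the end is the class name
    word_label = img.split('.')[-3]
    # DIY one-hot encoder
    if word_label == 'cat':
        return [1, 0]
    elif word_label == 'dog':
        return [0, 1]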
Libraries Required:
o TFLearn – Deep learning library featuring a higher-level API
for TensorFlow, used to create the layers of our CNN
o tqdm – Shows a smart progress meter for loops; used just for
convenience
o numpy – To process the image matrices
o opencv (cv2) – To process the images, e.g. converting them to
grayscale
o os – To access the file system and read the images from the
train and test directories on our machine
o random – To shuffle the data and avoid ordering bias
o matplotlib – To display the results of our predictions
o TensorFlow – Used here for TensorBoard, to inspect the loss and
other training curves recorded in the logs
TRAIN_DIR and TEST_DIR should be set to wherever the data is stored on your
machine. Experiment with the basic hyperparameters, such as the number of
epochs and the learning rate, to improve the accuracy. The images are
converted to grayscale so that each one is a 2-D matrix rather than a 3-D
volume, which keeps the example simpler for beginners.
Image classification using Convolutional Neural Networks (CNNs) is a popular approach for
analyzing visual data. Here's how you can implement a basic CNN in Python using
TensorFlow/Keras to classify images.
Steps for Implementation
1. Install Required Libraries
Make sure you have TensorFlow installed:
pip install tensorflow matplotlib numpy
2. Load and Prepare Dataset
For demonstration, let's use the CIFAR-10 dataset, which contains 60,000 32x32 color
images across 10 classes.
import tensorflow as tf
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical
import matplotlib.pyplot as plt

# Load CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Normalize the image pixel values (0-255) to range (0-1)
x_train = x_train / 255.0
x_test = x_test / 255.0

# Convert labels to one-hot encoding
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
3. Visualize the Dataset
# Display the first few training images
class_names = ['Airplane', 'Automobile', 'Bird', 'Cat', 'Deer', 'Dog',
               'Frog', 'Horse', 'Ship', 'Truck']
plt.figure(figsize=(10, 5))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(x_train[i])
    plt.title(class_names[y_train[i].argmax()])
    plt.axis('off')
plt.tight_layout()
plt.show()
4. Build a CNN Model
Define a CNN architecture with convolutional, pooling, and dense layers.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (
    Conv2D, MaxPooling2D, Flatten, Dense, Dropout
)

model = Sequential([
    # First Convolutional Layer
    Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    MaxPooling2D((2, 2)),

    # Second Convolutional Layer
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),

    # Flatten and Fully Connected Layers
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(10, activation='softmax')  # Output layer for 10 classes
])
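It can help to work out by hand what this model does to the volume size and how many parameters it has. The sketch below does the arithmetic in plain Python, assuming the Keras defaults for the layers above (no padding and stride 1 for Conv2D, stride equal to the pool size for MaxPooling2D); the helper names are illustrative.

```python
def conv_out(size, kernel, stride=1, padding=0):
    # Spatial output size of a convolution along one dimension
    return (size + 2 * padding - kernel) // stride + 1

def conv_params(kernel, c_in, c_out):
    # Weights plus one bias per filter
    return kernel * kernel * c_in * c_out + c_out

def dense_params(n_in, n_out):
    # Weights plus one bias per unit
    return n_in * n_out + n_out

s = conv_out(32, 3)    # 30 after Conv2D(32, (3, 3))
s = s // 2             # 15 after MaxPooling2D((2, 2))
s = conv_out(s, 3)     # 13 after Conv2D(64, (3, 3))
s = s // 2             # 6 after MaxPooling2D((2, 2))
flat = s * s * 64      # 2304 values after Flatten()

total = (conv_params(3, 3, 32) + conv_params(3, 32, 64)
         + dense_params(flat, 128) + dense_params(128, 10))

print(flat, total)     # 2304 315722
```

Most of the parameters sit in the first dense layer; Dropout and pooling add none.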
5. Compile the Model
Set up the loss function, optimizer, and evaluation metric:
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
6. Train the Model
Fit the model to the training data:
history = model.fit(
    x_train, y_train,
    validation_split=0.2,
    epochs=10,
    batch_size=64
)
7. Evaluate the Model
Test the model on the test dataset:
test_loss, test_accuracy = model.evaluate(x_test, y_test)

print(f"Test Accuracy: {test_accuracy * 100:.2f}%")
8. Predict and Visualize
Use the trained model to predict test images:
import numpy as np

# Predict on the test samples
predictions = model.predict(x_test)

# Display results for first 5 test images
for i in range(5):
    predicted = class_names[np.argmax(predictions[i])]
    actual = class_names[np.argmax(y_test[i])]
    plt.imshow(x_test[i])
    plt.title(f"Predicted: {predicted}, Actual: {actual}")
    plt.axis('off')
    plt.show()
Output
 Input: Images of objects (e.g., airplane, car, bird, etc.).

 Output: Predicted class of the image, along with the probability distribution across all
classes.
CNN PRACTICAL APPLICATION
HANDWRITTEN CHARACTER RECOGNITION USING MNIST DATASET
Handwritten character recognition using the MNIST dataset involves several steps, from data
preparation to model training and evaluation. Here’s a breakdown of the process:
1. Dataset Overview
• MNIST Dataset: The MNIST dataset contains 70,000 grayscale images of
handwritten digits (0-9). Each image is 28x28 pixels, making it a great benchmark for
image classification tasks.
 Split: The dataset is typically split into 60,000 training images and 10,000 testing
images.
2. Data Preprocessing
 Normalization: Pixel values are scaled to the range [0, 1] by dividing by 255.0. This
helps improve the training stability and convergence speed.
 Reshaping: Since most deep learning frameworks expect input with a channel
dimension, images are reshaped from (28, 28) to (28, 28, 1).
 Label Encoding: The labels (digits 0-9) are one-hot encoded. For example, the label
3 becomes [0, 0, 0, 1, 0, 0, 0, 0, 0, 0].
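The three preprocessing steps above can be sketched with NumPy alone. To keep the example self-contained, a small random array stands in for the MNIST images and the labels are made up; both are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for 5 MNIST images: integers 0-255, shape (5, 28, 28)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
labels = np.array([3, 0, 9, 1, 3])          # made-up digit labels

# Normalization: scale pixel values to [0, 1]
x = images.astype("float32") / 255.0

# Reshaping: add the channel dimension, (28, 28) -> (28, 28, 1)
x = x.reshape(-1, 28, 28, 1)

# Label encoding: row k of the identity matrix is the one-hot vector for digit k
y = np.eye(10, dtype="float32")[labels]

print(x.shape, y.shape)
print(y[0])   # one-hot vector for the label 3
```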
3. Model Architecture
 Convolutional Neural Network (CNN): CNNs are effective for image classification
because they can capture spatial hierarchies and features.
o Convolutional Layers: These layers apply filters to the input images to detect
features like edges and patterns.
o Activation Function: ReLU (Rectified Linear Unit) is commonly used to
introduce non-linearity.
o Pooling Layers: Max pooling layers reduce the spatial dimensions of the
feature maps, retaining the most important features and reducing computation.
o Dense Layers: After flattening the output from the convolutional layers, dense
layers (fully connected layers) are used to make the final classification. The
last layer typically uses a softmax activation function for multi-class
classification.
4. Training the Model
• Compilation: The model is compiled with a loss function (categorical cross-entropy
for multi-class problems) and an optimizer (like Adam) to update weights during
training.
 Fitting: The model is trained on the training set using backpropagation for a defined
number of epochs. The training process adjusts the model weights to minimize the
loss function.
5. Evaluation
 After training, the model is evaluated on the test set to assess its performance. The test
accuracy indicates how well the model can generalize to unseen data.
6. Results and Interpretation
 The final test accuracy gives a measure of the model’s performance. Commonly,
CNNs achieve accuracies above 98% on the MNIST dataset.
 Misclassifications can be analyzed to understand where the model struggles, which
can guide further improvements.
7. Applications
• Handwritten digit recognition can be applied in various fields, such as automated
form processing, bank check recognition, and postal code recognition.
This process provides a solid foundation for understanding how to approach handwritten
character recognition tasks using deep learning. You can further experiment with the model
architecture, hyperparameters, and data augmentation techniques to enhance performance.
PRACTICAL IMPLEMENTATION WITH CODE
Creating a handwritten digit recognition system using the MNIST dataset in Python involves
the following steps:
Prerequisites
1. Install Required Libraries: Use libraries like TensorFlow/Keras, NumPy,
Matplotlib, and Scikit-learn. Install them using:
pip install tensorflow numpy matplotlib scikit-learn
2. Dataset: MNIST is a dataset of 70,000 handwritten digits (0-9) with each image being
28x28 pixels. You can easily load it using TensorFlow or Keras.
Code Implementation
Here is a step-by-step guide:
1. Import Required Libraries

import numpy as np
import matplotlib.pyplot as plt  # used for the plots in steps 3 and 8
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.utils import to_categorical
2. Load the Dataset
# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Normalize the data (scale pixel values to 0-1 range)
x_train = x_train / 255.0
x_test = x_test / 255.0

# One-hot encode the labels
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
3. Visualize Sample Data

# Display a few training samples
for i in range(9):
    plt.subplot(3, 3, i + 1)
    plt.imshow(x_train[i], cmap='gray')
    plt.title(f"Label: {np.argmax(y_train[i])}")
    plt.axis('off')
plt.tight_layout()
plt.show()
4. Build the Neural Network Model

model = Sequential([
    Flatten(input_shape=(28, 28)),    # Flatten 2D images to 1D
    Dense(128, activation='relu'),    # First hidden layer
    Dense(64, activation='relu'),     # Second hidden layer
    Dense(10, activation='softmax')   # Output layer
])
5. Compile the Model

model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
6. Train the Model

history = model.fit(
    x_train, y_train,
    validation_split=0.2,
    epochs=10,
    batch_size=32
)
7. Evaluate the Model
test_loss, test_accuracy = model.evaluate(x_test, y_test)

print(f"Test Accuracy: {test_accuracy * 100:.2f}%")
8. Predict and Visualize

# Predict class probabilities for the test images
predictions = model.predict(x_test)

# Display some test results
for i in range(5):
    plt.imshow(x_test[i], cmap='gray')
    plt.title(f"Predicted: {np.argmax(predictions[i])}, Actual: {np.argmax(y_test[i])}")
    plt.axis('off')
    plt.show()
Output
 Accuracy: With a simple neural network, you can expect around 96-98% accuracy. To
improve further, you can use convolutional neural networks (CNNs).
Extensions
1. Use CNNs: Leverage layers like Conv2D and MaxPooling2D for better accuracy.
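2. Data Augmentation: Enhance training data with transformations like small rotations and shifts (flipping is usually avoided for digits, since a mirrored digit is no longer a valid example of its class).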
3. Save the Model: Save your trained model using model.save('mnist_model.h5') for
reuse.
Sample Input
1. Image Input
o The MNIST dataset contains grayscale images of digits. Each image is 28x28
pixels, with pixel values ranging from 0 to 255.
o Below is an example of one such image from the dataset:
Image:
A grayscale image representing the digit "5".
Numerical Representation (partial view):
[[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
...
[ 0 0 0 54 63 156 170 253 253 189 39 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
...
]
2. Input After Preprocessing
o The pixel values are normalized to the range [0, 1]:
[[0.0, 0.0, 0.0, ..., 0.0],
[0.0, 0.0, 0.0, ..., 0.0],
...
[0.0, 0.0, 0.5, ..., 0.0],
...
]
Sample Output
1. Predicted Class: After the image is fed into the trained model, the model outputs a
probability distribution over the 10 classes (digits 0-9). For example, for a test image at
index i that shows the digit 5, the output might look like this (illustrative, rounded values):
print(predictions[i])
Output:
[1.2e-05, 2.0e-06, 0.0003, 0.0001, 0.002, 0.995, 0.0002, 0.0003,
0.0001, 0.00007]
2. Interpretation: The model predicts that the input image corresponds to the digit 5, as
the 6th value (index 5) has the highest probability (0.995).
3. Visual Output: Display the result visually:
plt.imshow(x_test[i], cmap='gray')
plt.title(f"Predicted: {np.argmax(predictions[i])}, Actual: {np.argmax(y_test[i])}")
plt.axis('off')
plt.show()
Output Image:
o The grayscale image of "5" is shown with the prediction and actual label as
"5".
Summary of Input and Output
 Input: A 28x28 grayscale image of a handwritten digit (e.g., "5").

 Output: The predicted digit (e.g., 5) along with the probability distribution over all
digits.
APPLICATIONS OF CNN
Convolutional Neural Networks (CNNs) are widely used in various domains due to their ability to
extract features and patterns from images and structured data. Below are some notable applications
of CNNs:
1. Computer Vision
Image Classification
 Objective: Categorize images into predefined classes (e.g., cat, dog, airplane).
 Examples:
o Handwritten digit recognition (MNIST dataset).
o Classifying medical images (e.g., X-rays, MRIs).
Object Detection
 Objective: Identify objects within an image and locate them with bounding boxes.
 Examples:
o Autonomous vehicles detecting pedestrians, traffic signs, and other vehicles.
o Face detection in images (used in social media tagging).
Image Segmentation
 Objective: Assign a label to each pixel in an image.

 Examples:
o Tumor detection in medical imaging.
o Autonomous driving for road and lane segmentation.
Image Generation
 Objective: Generate realistic images using Generative Adversarial Networks (GANs).

 Examples:
o Creating synthetic faces.
o Enhancing image resolution (super-resolution).
Facial Recognition
 Objective: Identify individuals by analyzing facial features.

 Examples:
o Security systems (e.g., unlocking devices, surveillance).
o Personalization features in apps.
2. Healthcare
Medical Imaging
 Objective: Analyze medical images for diagnostic purposes.

 Examples:
o Detecting cancer from CT scans or MRIs.
o Classifying skin lesions as malignant or benign.
Drug Discovery
 Objective: Predict molecular interactions and properties.

 Examples:
o Predicting protein folding structures using 3D CNNs.
Pathology
 Objective: Automating the analysis of slides for diseases.

 Examples:
o Identifying infected cells in microscopic slides.
3. Autonomous Systems
Self-Driving Cars
 Objective: Process real-time video for decision-making.

 Examples:
o Detecting road signs, obstacles, and pedestrians.
o Lane detection for navigation.
Drones
 Objective: Enable drones to navigate and analyze environments.

 Examples:
o Agricultural monitoring using aerial imaging.
o Surveillance and reconnaissance tasks.
4. Natural Language Processing
Text Recognition
 Objective: Recognize and convert handwritten or printed text into digital format.
 Examples:
o Optical Character Recognition (OCR).
o Translating text from images in different languages.
Speech Recognition
 Objective: Analyze spectrograms of audio signals for speech-to-text conversion.

 Examples:
o Voice assistants like Siri or Alexa.
5. Gaming and Entertainment
Augmented and Virtual Reality (AR/VR)
 Objective: Enhance virtual environments using image analysis.

 Examples:
o Real-time object recognition in AR.
o Gesture recognition for gaming.
Style Transfer
 Objective: Transform images or videos into a specific artistic style.

 Examples:
o Apps like Prisma for artistic photo transformations.
6. Security and Surveillance
Video Surveillance
 Objective: Analyze video feeds to detect unusual behavior.

 Examples:
o Recognizing theft or violence in public areas.
o Detecting unauthorized entry in restricted areas.
Biometric Authentication
 Objective: Authenticate users based on physical characteristics.

 Examples:
o Iris and fingerprint recognition systems.
7. Robotics
Industrial Automation
 Objective: Enhance precision in automated tasks.
 Examples:
o Quality control in manufacturing using visual inspection.
o Robots picking and sorting items using visual cues.
Agriculture
 Objective: Use CNNs for crop monitoring.

 Examples:
o Identifying pest infestations.
o Assessing crop health from aerial images.
8. Astronomy
Celestial Object Detection
 Objective: Analyze telescope data for identifying celestial phenomena.

 Examples:
o Detecting exoplanets or distant galaxies.
o Classifying types of stars or supernovae.
Satellite Imaging
 Objective: Analyze satellite images for environmental monitoring.

 Examples:
o Detecting deforestation.
o Tracking urban development.
9. Fashion and Retail
Visual Search
 Objective: Find similar items based on an input image.

 Examples:
o Online shopping platforms (e.g., searching for clothes by image).
o Style matching in personal wardrobe apps.
Product Recommendation
 Objective: Recommend products based on visual features.

 Examples:
o Suggesting complementary items based on appearance.
10. Art and Creativity
Image Enhancement
 Objective: Improve image quality using AI.

 Examples:
o Removing noise or restoring old photos.
o Enhancing image resolution (e.g., for historical documents).
Content Creation
 Objective: Assist artists in creating new artworks.

 Examples:
o AI-generated paintings or animations.
KEY TERMINOLOGIES IN CNN
1. Convolutional Layer
 Definition: The core building block of CNNs, this layer applies filters (kernels) to the
input data to extract features like edges, textures, and patterns.
 Key Terms:
o Filter/Kernel: A small matrix (e.g., 3x3 or 5x5) used to scan the input data
and detect patterns.
o Stride: The step size with which the filter moves across the input. A stride of
1 means the filter moves one pixel at a time.
o Feature Map: The output of the convolution operation, showing detected
features.
2. Padding
 Definition: Adding extra pixels (usually zeros) around the edges of the input to
control the size of the output feature map.
 Types:
o Valid Padding: No padding; the feature map shrinks after convolution.
o Same Padding: Padding added to keep the output size the same as the input
size.
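The effect of padding on the output size can be written as a formula. This is the standard relation for a square input of side N, a filter of side F, padding P on each border, and stride S; the symbols are introduced here only for illustration.

```latex
O = \left\lfloor \frac{N + 2P - F}{S} \right\rfloor + 1,
\qquad
\text{valid: } P = 0,
\qquad
\text{same (for } S = 1,\ F \text{ odd): } P = \frac{F - 1}{2}
```

For example, with N = 32, F = 3, and S = 1, valid padding gives O = 30, while same padding (P = 1) gives O = 32.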
3. Activation Function
• Definition: Introduces non-linearity into the model, allowing it to learn complex
patterns.
 Common Activation Functions:
o ReLU (Rectified Linear Unit): Sets negative values to 0.
o Sigmoid: Maps output to a range between 0 and 1.
o Softmax: Converts outputs into probabilities for multi-class classification.
4. Pooling Layer
 Definition: Reduces the size of the feature map, making the model faster and less
prone to overfitting.
 Types:
o Max Pooling: Takes the maximum value in each patch.
o Average Pooling: Computes the average of each patch.
5. Fully Connected Layer (FC Layer)
 Definition: Connects all neurons from the previous layer to every neuron in this layer.
It's used at the end of the network for classification or regression tasks.
 Purpose: Combines features learned by convolutional layers for decision-making.
6. Dropout
• Definition: A regularization technique that randomly disables a fraction of neurons
during training to prevent overfitting.
• Dropout Rate: The fraction of neurons to drop (e.g., 0.5 = 50%).
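A minimal NumPy sketch of dropout is given below. It uses the common "inverted dropout" variant, in which the surviving activations are scaled up during training so that nothing needs to change at inference time; this variant and the function name dropout are assumptions made for the example, not something stated in these notes.

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    # At inference time (or with rate 0) the input passes through unchanged
    if not training or rate == 0.0:
        return x
    keep = 1.0 - rate
    # Each neuron is kept with probability `keep`
    mask = rng.random(x.shape) < keep
    # Scale the survivors so the expected activation stays the same
    return x * mask / keep

rng = np.random.default_rng(0)
activations = np.ones(1000)

dropped = dropout(activations, rate=0.5, rng=rng)
unchanged = dropout(activations, rate=0.5, rng=rng, training=False)

print((dropped == 0).mean())   # roughly half of the neurons are disabled
```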
7. Flattening
• Definition: Converts the multi-dimensional output of convolutional layers into a 1D
vector before feeding it to fully connected layers.
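8. Backpropagation
• Definition: The algorithm that computes the gradient of the loss function with
respect to every weight in the network by applying the chain rule backward from the
output layer. The optimizer then uses these gradients to update the weights.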
9. Epoch
 Definition: One complete pass through the entire training dataset.

 Related Term:
o Batch: A subset of the dataset processed before updating the model weights.
o Batch Size: The number of samples in a batch.
10. Loss Function
 Definition: Measures the error or difference between the predicted output and the
actual target.
 Common Loss Functions:
o Cross-Entropy Loss: Used for classification tasks.
o Mean Squared Error (MSE): Used for regression tasks.
11. Optimizer
 Definition: An algorithm used to adjust the model weights to minimize the loss
function.
 Popular Optimizers:
o SGD (Stochastic Gradient Descent).
o Adam (Adaptive Moment Estimation).
12. Epoch vs. Iteration
 Epoch: One full pass through the training data.
 Iteration: One update of model weights (i.e., one batch processed).
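The relationship between epochs and iterations is simple arithmetic. As an illustration, the sketch below uses the 50,000 training images of CIFAR-10 with a batch size of 64 and 10 epochs; the helper name is an assumption.

```python
import math

def iterations_per_epoch(num_samples, batch_size):
    # The last batch may be smaller than the rest, so round up.
    return math.ceil(num_samples / batch_size)

num_samples, batch_size, epochs = 50_000, 64, 10
per_epoch = iterations_per_epoch(num_samples, batch_size)
print(per_epoch)           # 782 iterations in one epoch
print(per_epoch * epochs)  # 7820 weight updates in total
```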
13. Learning Rate
 Definition: A hyperparameter that controls how much the model weights are updated
during training.
 Importance: Too high → training may overshoot the minimum and fail to converge; too low → training is slow.
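The effect of the learning rate can be seen on a toy problem. The sketch below is an illustration under simple assumptions: it minimizes f(w) = w**2 (whose gradient is 2*w) with plain gradient descent from a starting point of 5.0, and the three learning rates are chosen only to show the three regimes.

```python
def gradient_descent(lr, steps=20, w=5.0):
    """Minimize f(w) = w**2 (gradient 2*w) with plain gradient descent."""
    for _ in range(steps):
        w = w - lr * 2 * w
    return w

print(gradient_descent(lr=0.1))    # close to the minimum at 0
print(gradient_descent(lr=0.001))  # still near 5.0: training is too slow
print(gradient_descent(lr=1.1))    # magnitude grows every step: diverges
```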
14. Receptive Field
 Definition: The region of the input image that a particular feature in the feature map
is derived from. It increases as the network deepens.
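How the receptive field grows with depth can be computed layer by layer. The sketch below uses the standard recurrence (each layer adds (kernel - 1) times the current spacing between features, and the spacing is multiplied by the stride); the function name and the layer stacks are illustrative assumptions.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) tuples, from input to output."""
    rf, jump = 1, 1  # jump = spacing, in input pixels, between adjacent features
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

print(receptive_field([(3, 1)]))                  # 3: one 3x3 convolution
print(receptive_field([(3, 1), (3, 1)]))          # 5: two stacked 3x3 convolutions
print(receptive_field([(3, 1), (2, 2), (3, 1)]))  # 8: conv, 2x2 pool, conv
```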
15. Overfitting and Underfitting
 Overfitting: The model performs well on training data but poorly on unseen data.
 Underfitting: The model fails to capture the underlying patterns in the data.
16. Transfer Learning
 Definition: Using a pre-trained CNN (e.g., ResNet, VGG) on a new but similar task
to save time and improve accuracy.
17. Preprocessing
 Definition: Steps to prepare data for the model, such as resizing, normalization
(scaling pixel values to 0–1), and data augmentation (rotations, flips).
18. Data Augmentation
 Definition: Creating additional training data by applying transformations like
rotation, flipping, zooming, etc., to improve model generalization.
19. Training vs. Validation vs. Test Data
 Training Data: Used to train the model.
 Validation Data: Used to tune hyperparameters and check model performance during
training.
 Test Data: Used to evaluate final model performance.
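One simple way to obtain the three subsets is to shuffle the data once and cut it into non-overlapping parts. The sketch below is illustrative: the function name, the 80/10/10 proportions, and the fixed seed are assumptions, not values prescribed by these notes.

```python
import random

def train_val_test_split(samples, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle once, then cut into three non-overlapping subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(range(1000))
print(len(train), len(val), len(test))  # 800 100 100
```

Keeping the test subset untouched until the very end is what makes the final evaluation a fair estimate of performance on unseen data.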
20. Confusion Matrix
 Definition: A table summarizing the performance of a classification model. It shows
true positives, false positives, true negatives, and false negatives.
A confusion matrix is a performance evaluation tool for classification models. It is a table that
summarizes how well the predictions of a model match the actual labels of a dataset. Each row in
the matrix represents the instances of an actual class, while each column represents the instances of
a predicted class.
For a binary classification task (e.g., classifying an email as spam or not spam), the confusion
matrix is typically a 2x2 table:
                   Predicted: Positive    Predicted: Negative
Actual: Positive   True Positive (TP)     False Negative (FN)
Actual: Negative   False Positive (FP)    True Negative (TN)
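The four cells of a binary confusion matrix can be counted directly from the actual and predicted labels. The sketch below is a minimal illustration with made-up labels (1 = spam, 0 = not spam); accuracy, precision, and recall are standard metrics derived from the four counts and are included here as an addition to the notes.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FN, FP, TN) for a binary classification task."""
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive and predicted == positive:
            tp += 1
        elif actual == positive:
            fn += 1
        elif predicted == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn

y_true = [1, 1, 1, 0, 0, 0, 0, 1]  # actual labels
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]  # model predictions
tp, fn, fp, tn = confusion_counts(y_true, y_pred)
print(tp, fn, fp, tn)                  # 3 1 1 3
print((tp + tn) / len(y_true))         # accuracy: 0.75
print(tp / (tp + fp), tp / (tp + fn))  # precision 0.75, recall 0.75
```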
IMAGE AUGMENTATION IN PYTHON
Image augmentation is a common technique used in training convolutional neural networks (CNNs)
to artificially expand the training dataset by applying random transformations to the images. Below
is an example of Python code for image augmentation using the Keras library:
PYTHON CODE:
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Load a sample dataset (e.g., CIFAR-10)
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

# Normalize the pixel values to the range 0-1
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# Define the image augmentation generator
datagen = ImageDataGenerator(
    rotation_range=20,       # Random rotation of up to 20 degrees in either direction
    width_shift_range=0.2,   # Random horizontal shift (fraction of image width)
    height_shift_range=0.2,  # Random vertical shift (fraction of image height)
    shear_range=0.2,         # Shear transformation
    zoom_range=0.2,          # Random zoom
    horizontal_flip=True,    # Randomly flip images horizontally
    fill_mode='nearest'      # Fill mode for points outside boundaries
)

# Fit the generator to the training data (only required for feature-wise
# statistics such as featurewise_center; it has no effect on the options above)
datagen.fit(x_train)

# Visualize augmented images
def plot_augmented_images(generator, images, num_images=5):
    # Each call to next() returns a batch containing one randomly transformed image
    flow = generator.flow(images, batch_size=1)
    augmented_images = [next(flow)[0] for _ in range(num_images)]
    plt.figure(figsize=(10, 5))
    for i in range(num_images):
        plt.subplot(1, num_images, i + 1)
        plt.imshow(augmented_images[i])
        plt.axis('off')
    plt.show()

# Plot a few augmented examples
plot_augmented_images(datagen, x_train[:10])

# Build a small CNN to train on the augmented data
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model using augmented data; validation uses the unaugmented test images
model.fit(
    datagen.flow(x_train, y_train, batch_size=64),
    validation_data=(x_test, y_test),
    epochs=10
)