Introduction to Convolutional Neural Networks (CNNs)

MACHINE LEARNING – CONVOLUTIONAL NEURAL NETWORK
Introduction to Computer Vision
● Computer vision is concerned with the automatic extraction, analysis and understanding of useful information from a single image or a sequence of images.
- The British Machine Vision Association and Society for Pattern Recognition (BMVA)
(or)
● It is an interdisciplinary field that deals with how computers can be made to gain high-level understanding from digital images or videos.
- Wikipedia
What is a CNN (Convolutional Neural Network)?
● It is a class of deep learning models.
● Convolutional neural networks (ConvNets or CNNs) are one of the main categories of models used for image recognition, image classification, object detection, face recognition, etc.
● It is similar to a basic neural network. A CNN also has learnable parameters, i.e., weights and biases.
● CNNs are heavily used in computer vision.
● There are 3 basic components that define a CNN:
○ The Convolution Layer
○ The Pooling Layer
○ The Output Layer (or) Fully Connected Layer
Basic Structure of CNN
• Input Layer: Accepts input images as pixel data.
• Convolutional Layer: Applies filters to extract features.
• ReLU Layer: Introduces non-linearity to the network.
• Pooling Layer: Reduces spatial dimensions of feature maps.
• Fully Connected Layer: Final layer for classification.
Convolutional Layer
• Filters/Kernels: Detect specific features in input images.
• Stride: Controls the movement of filters across the input.
• Padding: Adds pixels around the input to maintain dimensions.
• Output: Produces feature maps indicating detected features.
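The way filter size, stride and padding interact can be sketched with the standard output-size formula. This is a minimal illustration, not something taken from the slides: the function name `conv_output_size` and its parameter names are hypothetical, and a square input and filter are assumed.

```python
def conv_output_size(n: int, k: int, padding: int = 0, stride: int = 1) -> int:
    """Output width (or height) for an n-wide input and a k-wide filter.

    Uses the standard formula floor((n + 2*padding - k) / stride) + 1.
    """
    if k > n + 2 * padding:
        raise ValueError("filter is larger than the padded input")
    return (n + 2 * padding - k) // stride + 1


# Hypothetical 6x6 input with a 3x3 filter.
print(conv_output_size(6, 3))             # no padding, stride 1 -> 4
print(conv_output_size(6, 3, padding=1))  # one ring of padding  -> 6
print(conv_output_size(6, 3, stride=2))   # stride 2, no padding -> 2
```

A larger stride shrinks the feature map, while padding grows it back toward the input size.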
Architecture of CNN
Convolution Layer
Images source: Analytics Vidhya
Padding in CNN
• Zero Padding: Adds zeros around the input image to preserve dimensions.
• Valid Padding: No padding; reduces the size of the output feature maps.
• Role: Helps preserve edge information during convolution.
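Zero padding itself is simple to write down. The sketch below assumes the image is a plain 2D list of numbers; the helper name `zero_pad` is hypothetical.

```python
def zero_pad(image, p=1):
    """Return a copy of a 2D list with p rings of zeros added around it."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]          # rows of zeros on top
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)  # zeros left and right
    padded.extend([0] * width for _ in range(p))      # rows of zeros below
    return padded


image = [[1, 2],
         [3, 4]]
for row in zero_pad(image):
    print(row)
# [0, 0, 0, 0]
# [0, 1, 2, 0]
# [0, 3, 4, 0]
# [0, 0, 0, 0]
```

With `p=0` the function returns the image unchanged, which corresponds to valid padding.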
The concept of stride:
● When the weight matrix (filter) moves 1 pixel at a time, this is called a stride of 1 (as in the case above).
What if we increase the stride value?
Images source: Analytics
• As the image above shows, increasing the stride value decreases the size of the output image (which may cause some features of the image to be lost).
• Padding the input image with zeros addresses this problem; for higher stride values, more than one layer of zeros can be added around the image.
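The shrinking effect of a larger stride can be seen in a small sliding-window sketch. The input and filter below are made-up values (all ones), the function name `conv2d` is hypothetical, and a square image and filter with no padding are assumed.

```python
def conv2d(image, kernel, stride=1):
    """Slide a square kernel over a square 2D list without padding.

    Like most deep learning libraries, this does not flip the kernel.
    """
    k = len(kernel)
    out_size = (len(image) - k) // stride + 1
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            top, left = i * stride, j * stride  # window origin in the input
            row.append(sum(image[top + a][left + b] * kernel[a][b]
                           for a in range(k) for b in range(k)))
        out.append(row)
    return out


image = [[1] * 7 for _ in range(7)]   # hypothetical 7x7 input of ones
kernel = [[1] * 3 for _ in range(3)]  # hypothetical 3x3 filter of ones

for s in (1, 2, 3):
    out = conv2d(image, kernel, stride=s)
    print(f"stride {s}: {len(out)}x{len(out[0])} output")
# stride 1: 5x5 output
# stride 2: 3x3 output
# stride 3: 2x2 output
```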
● When the 6x6 input is padded with zeros, we get an output with the same 6x6 dimensions. This is known as ‘Same Padding’.
● The middle 4x4 pixels remain the same; we have retained more information from the borders and also preserved the size of the image.
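The 6x6 example can be checked against the standard output-size formula. The slides do not state the filter size here; a 3x3 filter with stride 1 is assumed, which is consistent with the 4x4 middle region. The symbols $n$ (input size), $f$ (filter size), $p$ (padding) and $s$ (stride) are introduced only for this illustration.

```latex
\[
o = \left\lfloor \frac{n + 2p - f}{s} \right\rfloor + 1
\]
\[
\text{no padding } (p = 0):\quad o = \frac{6 + 0 - 3}{1} + 1 = 4
\qquad
\text{one ring of zeros } (p = 1):\quad o = \frac{6 + 2 - 3}{1} + 1 = 6
\]
```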

Pooling Layer
• Purpose: Reduces dimensionality and computation in the network.
• Max Pooling: Selects the maximum value from each pooling region.
• Average Pooling: Takes the average value from each pooling region.
• Impact: Retains important features and helps reduce overfitting.
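Both pooling variants can be sketched in a few lines. The example assumes non-overlapping 2x2 regions (stride equal to the window size), which is a common choice but not stated on the slide; the function name `pool2d` and the feature-map values are hypothetical.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling over a 2D list (stride equals the window size)."""
    if mode not in ("max", "avg"):
        raise ValueError("mode must be 'max' or 'avg'")
    agg = max if mode == "max" else (lambda values: sum(values) / len(values))
    rows, cols = len(feature_map), len(feature_map[0])
    out = []
    for i in range(0, rows - size + 1, size):
        out_row = []
        for j in range(0, cols - size + 1, size):
            window = [feature_map[i + a][j + b]
                      for a in range(size) for b in range(size)]
            out_row.append(agg(window))
        out.append(out_row)
    return out


fmap = [[1, 3, 2, 4],
        [5, 7, 6, 8],
        [9, 2, 1, 0],
        [3, 4, 5, 6]]
print(pool2d(fmap, mode="max"))  # [[7, 8], [9, 6]]
print(pool2d(fmap, mode="avg"))  # [[4.0, 5.0], [4.5, 3.0]]
```

Each 4x4 feature map becomes 2x2, so later layers process a quarter as many values.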
Basic Mathematics of CNN (B&W Image)
• Convolution: Applies a filter matrix across the image to detect features.
• Example: Sliding a 3x3 filter over a grayscale image, producing a feature map.
• ReLU: Applies non-linearity after convolution.
• Pooling: Reduces the size of the resulting feature map.
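The convolution and ReLU steps can be traced on a tiny grayscale example. The image and the vertical-edge filter below are made-up values chosen for illustration, and the helper names `conv2d` and `relu` are hypothetical; stride 1 and no padding are assumed.

```python
def conv2d(image, kernel):
    """Slide a square kernel over a 2D list with stride 1 and no padding."""
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(k) for b in range(k))
             for j in range(cols)]
            for i in range(rows)]


def relu(feature_map):
    """Replace every negative value with zero."""
    return [[max(0, v) for v in row] for row in feature_map]


# Hypothetical 4x6 grayscale image: a bright vertical band on a dark background.
image = [[0, 0, 9, 9, 0, 0] for _ in range(4)]
# Hypothetical 3x3 filter that responds to dark-to-bright vertical edges.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

feature_map = conv2d(image, kernel)
print(feature_map)        # [[27, 27, -27, -27], [27, 27, -27, -27]]
print(relu(feature_map))  # [[27, 27, 0, 0], [27, 27, 0, 0]]
```

The filter gives positive responses at the left edge of the band and negative ones at the right edge; ReLU keeps only the positive responses before pooling is applied.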
Basic Mathematics of CNN (Colored Image)
• Convolution: Applies a filter across all three RGB channels; the filter has one 2D slice per channel.
• Result: Produces a combined feature map from all channels.
• Example: Sliding a filter across an RGB image and summing up the per-channel feature maps.
• Pooling: Reduces the size of the resulting feature map while preserving important information.
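The per-channel summation can be sketched as follows, assuming the common convention in which a filter has one 2D slice per color channel and the three per-channel results are added into a single feature map. The image and filter values are made up, and the helper names `conv2d` and `conv_rgb` are hypothetical.

```python
def conv2d(channel, kernel):
    """Slide a square kernel over one 2D channel with stride 1 and no padding."""
    k = len(kernel)
    rows = len(channel) - k + 1
    cols = len(channel[0]) - k + 1
    return [[sum(channel[i + a][j + b] * kernel[a][b]
                 for a in range(k) for b in range(k))
             for j in range(cols)]
            for i in range(rows)]


def conv_rgb(image_rgb, filter_rgb):
    """Convolve each channel with its own filter slice, then sum the results."""
    per_channel = [conv2d(c, k) for c, k in zip(image_rgb, filter_rgb)]
    rows, cols = len(per_channel[0]), len(per_channel[0][0])
    return [[sum(m[i][j] for m in per_channel) for j in range(cols)]
            for i in range(rows)]


# Hypothetical 3x3 RGB image: R is all 1s, G all 2s, B all 3s.
image_rgb = [[[v] * 3 for _ in range(3)] for v in (1, 2, 3)]
# Hypothetical 2x2 filter slices: R slice all 1s, G slice all 0s, B slice all 1s.
filter_rgb = [[[w] * 2 for _ in range(2)] for w in (1, 0, 1)]

print(conv_rgb(image_rgb, filter_rgb))  # [[16, 16], [16, 16]]
```

Each output value is 4 (red) + 0 (green) + 12 (blue) = 16, so three input channels collapse into one feature map per filter.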
Fully Connected Layer
• Purpose: Flattens the feature maps and feeds them into one or more fully connected layers.
• Function: Combines features for final classification.
• Uses: Softmax or sigmoid activation functions for the output.
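The two output activations can be written out directly. This is a minimal sketch using only the standard library; the scores in `logits` are made-up values standing in for the raw outputs of the last fully connected layer.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(score):
    """Squash a single raw score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


logits = [2.0, 1.0, 0.1]   # hypothetical scores for three classes
probs = softmax(logits)
print(probs)               # the largest score gets the largest probability
print(sigmoid(0.0))        # 0.5
```

Softmax is typically used when exactly one of several classes applies, and sigmoid for a single yes/no output or for independent labels.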
Types of CNN
● Depending on the problem, different CNNs are used in computer vision.
● Five major computer vision tasks can be addressed using CNNs:
■ Image Classification
■ Object Detection
■ Object Tracking
■ Semantic Segmentation
■ Instance Segmentation
Types of CNN
Image Classification:
● For image classification we can use traditional CNN models, but there are also many architectures designed by researchers to decrease the error rate, often by increasing depth and the number of trainable parameters.
■ LeNet (1998)
■ AlexNet (2012)
■ ZFNet (2013)
■ GoogLeNet (2014)
■ VGGNet 16 (2014)
■ ResNet (2015)
LeNet-5 Architecture
• Designed for handwritten digit recognition (MNIST dataset).
• Structure: 2 convolutional layers, 2 subsampling layers, and 3 fully connected layers (the first of which is sometimes counted as a third convolutional layer).
• Key Feature: Simple and efficient, early CNN model.
AlexNet Architecture
• Winner of the ImageNet competition in 2012.
• Structure: 5 convolutional layers, 3 fully connected layers.
• Features: Uses ReLU, dropout, and data augmentation.
• Impact: Revolutionized deep learning and computer vision.
VGG-16 Architecture
• Uses 16 weight layers (13 convolutional, 3 fully connected).
• Features: Smaller filters (3x3) with deeper networks.
• Strength: Achieves high accuracy with a simple structure.
ResNet Architecture
• Introduces Residual Learning to combat vanishing gradients.
• Structure: Skip connections or shortcuts between layers.
• Impact: Allows very deep networks (e.g., ResNet-50, ResNet-101).
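The idea of a skip connection can be sketched as y = ReLU(F(x) + x). For brevity the sketch below uses small fully connected layers for F instead of convolutions, which is a simplification and not how ResNet itself is built; the function names and weights are hypothetical.

```python
def relu(v):
    """Element-wise ReLU on a list of numbers."""
    return [max(0.0, x) for x in v]


def linear(v, weights):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F(x) = W2 * relu(W1 * x)."""
    fx = linear(relu(linear(x, w1)), w2)
    return relu([f + xi for f, xi in zip(fx, x)])  # skip connection adds x back


x = [1.0, 2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]
print(residual_block(x, zeros, zeros))  # [1.0, 2.0]
```

With all-zero weights the block simply passes its (non-negative) input through, which hints at why stacking many such blocks does not have to make a network harder to train.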
Inception (GoogLeNet) Architecture
• Introduces Inception modules: parallel convolutional filters.
• Structure: Multiple filter sizes (1x1, 3x3, 5x5) in parallel.
• Impact: Efficient and scalable for large-scale image recognition.
Transfer Learning
• Concept: Uses a pre-trained model on a new but related task.
• Benefits: Speeds up training, requires less data, and improves performance.
• Example: Using a pre-trained model like ResNet for a new image classification task.
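One common way to apply this is to freeze the pre-trained layers and train only a new output layer. The sketch below illustrates that split with a toy stand-in: `FROZEN_WEIGHTS` and `backbone` are hypothetical placeholders for a real pre-trained network such as ResNet, and the data points are made up.

```python
import math

# Stand-in for pre-trained weights; in practice these come from a trained model.
FROZEN_WEIGHTS = [[0.5, -0.2], [0.1, 0.8]]


def backbone(x):
    """Frozen feature extractor: its weights are never updated."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)))
            for row in FROZEN_WEIGHTS]


def predict(head, features):
    """New classification head: a single sigmoid unit on top of the features."""
    z = sum(w * f for w, f in zip(head["w"], features)) + head["b"]
    return 1.0 / (1.0 + math.exp(-z))


def train_head(data, epochs=200, lr=0.5):
    """Fit only the head with gradient descent on the log loss."""
    head = {"w": [0.0, 0.0], "b": 0.0}
    for _ in range(epochs):
        for x, y in data:
            feats = backbone(x)
            err = predict(head, feats) - y  # gradient of log loss w.r.t. z
            head["w"] = [w - lr * err * f for w, f in zip(head["w"], feats)]
            head["b"] -= lr * err
    return head


# Hypothetical two-class data for the new task: (input, label).
data = [([2.0, 0.0], 1), ([3.0, 0.5], 1), ([0.0, 2.0], 0), ([0.5, 3.0], 0)]
head = train_head(data)
for x, y in data:
    print(y, round(predict(head, backbone(x)), 3))
```

Only the head's few parameters are learned, which is why this approach can work with little data and short training times.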
Object Localization
• Purpose: Identifies the location of objects within an image.
• Methods: Bounding box regression, Region Proposal Networks (RPNs).
• Applications: Object detection, image segmentation.

Landmark Detection
• Definition: Detects specific key points or landmarks within an image.
• Applications: Facial recognition, medical imaging (e.g., key anatomical points).
• Methods: CNNs used to detect and regress the position of landmarks.
Applications of Computer Vision
● Computer vision, an AI technology that allows computers to understand and label images, is now used in convenience stores, driverless car testing, daily medical diagnostics, and in monitoring the health of crops and livestock.
● Use cases for computer vision are found in the following areas:
■ Retail and Retail Security
■ Automotive
■ Healthcare
■ Banking
■ Agriculture
■ Industrial
Conclusion
• CNNs have revolutionized computer vision tasks.
• Architectures like LeNet, AlexNet, VGG, ResNet, and Inception paved the way for modern image processing.
• Transfer learning, object localization, and landmark detection expand the versatility of CNNs.
Thank you!
