CNN
▪ Example: Consider a discrete signal x_t which represents the position of a spaceship at time t, recorded by a laser sensor.
▪ Considering that the most recent measurements are more important, we would like to take a weighted average over x_t. The new estimate at time t is computed by the convolution

  $s_t = \sum_{a=0}^{\infty} x_{t-a}\, w_{-a} = (x * w)_t$

  where x is the input signal and w is the filter (also called the mask or kernel).
The Convolution Operation - 1D
▪ In practice, we would sum only over a small window.
▪ We just slide the filter over the input and compute the value of s_t based on a window around x_t. For example:

  x : 1.0   1.10  1.20  1.40  1.70  1.80  1.90  2.10  2.20
  s : 1.80  1.96  ...   (one output value per position of the sliding window)

▪ Use cases of 1-D convolution: audio signal processing, stock market analysis, time series analysis, etc.
Content adapted from : CS7015 Deep Learning, Dept. of CSE, IIT Madras
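To make this concrete, here is a minimal NumPy sketch of the windowed 1-D convolution. The 3-tap weights are illustrative (the slide does not specify them) and are chosen to weight recent samples more heavily.

```python
import numpy as np

# Input series from the slide and an assumed 3-tap filter (weights sum to 1,
# most recent sample weighted highest; purely illustrative).
x = np.array([1.00, 1.10, 1.20, 1.40, 1.70, 1.80, 1.90, 2.10, 2.20])
w = np.array([0.5, 0.3, 0.2])      # w[a] plays the role of w_{-a}, applied to x_{t-a}

def conv1d(x, w):
    """s_t = sum_a x[t-a] * w[a], computed only where the window fits."""
    k = len(w)
    return np.array([sum(x[t - a] * w[a] for a in range(k))
                     for t in range(k - 1, len(x))])

print(conv1d(x, w))
print(np.convolve(x, w, mode='valid'))   # NumPy computes the same thing
```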
Convolution in 2-D using Images : What is an Image?
▪ An image can be represented mathematically as a function f(x,y) which gives the intensity value at position (x,y), where f(x,y) ∈ {0, 1, ..., I_max − 1} and x, y ∈ {0, 1, ..., N − 1}.
▪ The larger the value of N, the greater the clarity (resolution) of the picture, but also the more data to be analyzed in the image.
▪ If the image is a gray-scale (8 bits per pixel) image, then it requires N² bytes of storage.
▪ If the image is colour (RGB), each pixel requires 3 bytes of storage space.
Here, N is the resolution of the image and I_max is the number of discretized brightness levels.
Convolution in 2-D using Images : What is an Image?
[Figure: how a digital camera captures an image. Source: D. Hoiem]
Convolution in 2-D using Images : What is an Image?
255 255 255 255 255 255 255 255 255 255 255 255
255 255 255 255 255 255 255 255 255 255 255 255
255 255 255 0 0 255 255 255 255 255 255 255
255 255 255 75 75 75 255 255 255 255 255 255
255 255 75 95 95 75 255 255 255 255 255 255
255 255 96 127 145 175 255 255 255 255 255 255
255 255 127 145 175 175 175 255 95 255 255 255
255 255 127 145 200 200 175 175 95 255 255 255
255 255 127 145 145 175 127 127 95 47 255 255
255 255 127 145 145 175 127 127 95 47 255 255
255 255 74 127 127 127 95 95 95 47 255 255
255 255 255 74 74 74 74 74 74 255 255 255
255 255 255 255 255 255 255 255 255 255 255 255
255 255 255 255 255 255 255 255 255 255 255 255
The Convolution Operation - 2D
▪ Images are good examples of 2-D inputs.
▪ A 2-D convolution of an image I using a filter K of size m × n is now defined as (looking at previous pixels):

  $S(i,j) = (I * K)(i,j) = \sum_{a=0}^{m-1} \sum_{b=0}^{n-1} I(i-a,\, j-b)\, K(a,b)$
The Convolution Operation - 2D
▪ Another way is to consider the center pixel as the reference pixel, and then look at its surrounding pixels:

  $S(i,j) = \sum_{a=-\lfloor m/2\rfloor}^{\lfloor m/2\rfloor} \; \sum_{b=-\lfloor n/2\rfloor}^{\lfloor n/2\rfloor} I(i-a,\, j-b)\, K\!\left(\tfrac{m}{2}+a,\ \tfrac{n}{2}+b\right)$

  [Figure: a 5 × 5 binary image with the pixel of interest at the centre of a 3 × 3 neighbourhood]

  0 1 0 0 1
  0 0 1 1 0
  1 0 0 0 1
  0 1 0 0 1
  0 0 1 0 1
Content adapted from : CS7015 Deep Learning, Dept. of CSE, IIT Madras
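A minimal NumPy sketch of the first ("previous pixels") definition above, restricted to positions where the filter fully overlaps the image. The tiny input values are arbitrary and only illustrate the mechanics.

```python
import numpy as np

def conv2d(I, K):
    """S(i, j) = sum_a sum_b I(i - a, j - b) * K(a, b), evaluated only where
    the m x n filter fully overlaps the image ('valid' convolution)."""
    m, n = K.shape
    H, W = I.shape
    S = np.zeros((H - m + 1, W - n + 1))
    for i in range(m - 1, H):          # (i, j) is the bottom-right corner of the window
        for j in range(n - 1, W):
            S[i - m + 1, j - n + 1] = sum(I[i - a, j - b] * K[a, b]
                                          for a in range(m) for b in range(n))
    return S

# Tiny illustrative example (values chosen arbitrarily)
I = np.arange(16, dtype=float).reshape(4, 4)
K = np.array([[0., 1.], [2., 3.]])
print(conv2d(I, K))                    # a 3 x 3 output for a 4 x 4 input and 2 x 2 filter
```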
The Convolution Operation - 2D
[Figure: a filter sliding over an input image to produce an output feature map. Source: https://developers.google.com/]
The Convolution Operation - 2D
▪ Smoothing filter: averages the pixels inside the window, blurring the image.
▪ Sharpening filter: emphasizes differences with neighbouring pixels, enhancing edges and fine detail.
The Convolution Operation – 2D : Various filters (edge detection)

Prewitt:
  Sx = [ -1  0  1        Sy = [  1  1  1
         -1  0  1                0  0  0
         -1  0  1 ]             -1 -1 -1 ]

Sobel:
  Sx = [ -1  0  1        Sy = [  1  2  1
         -2  0  2                0  0  0
         -1  0  1 ]             -1 -2 -1 ]

Laplacian:               Roberts:
  [ 0  1  0                Sx = [  0  1       Sy = [ 1  0
    1 -4  1                      -1  0 ]            0 -1 ]
    0  1  0 ]

[Figure: the input image and the results after applying horizontal and vertical edge detection filters]
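As a sketch of how such kernels are applied in practice, the snippet below uses SciPy's 2-D correlation on a small synthetic image containing a single vertical edge (the image is illustrative, not from the slide).

```python
import numpy as np
from scipy.signal import correlate2d

# Sobel kernels from the slide
Sx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
Sy = np.array([[ 1, 2, 1], [ 0, 0, 0], [-1, -2, -1]])

# Small synthetic image: dark left half, bright right half (a vertical edge)
img = np.zeros((8, 8))
img[:, 4:] = 255.0

gx = correlate2d(img, Sx, mode='same', boundary='symm')   # response to horizontal intensity change
gy = correlate2d(img, Sy, mode='same', boundary='symm')   # response to vertical intensity change
magnitude = np.hypot(gx, gy)                              # edge strength at each pixel

print(magnitude.round(1))   # large values along the vertical edge, ~0 elsewhere
```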
The Convolution Operation - 2D

Filter 1:                Input image (6 × 6):
   1 -1 -1                  1 0 0 0 0 1
  -1  1 -1                  0 1 0 0 1 0
  -1 -1  1                  0 0 1 1 0 0
                            1 0 0 0 1 0
stride = 1                  0 1 0 0 1 0
                            0 0 1 0 1 0

Taking the dot product of the filter with the first 3 × 3 window gives 3; sliding one column to the right gives −1.

Note: Stride is the number of units the kernel is shifted per slide over rows/columns.
The Convolution Operation - 2D

With the same Filter 1 and the same 6 × 6 input image, if stride = 2 the window jumps two columns at a time: the first two outputs along the top row are 3 and −3.
The Convolution Operation - 2D

Sliding Filter 1 over the full 6 × 6 input image with stride = 1 produces a 4 × 4 feature map:

   3 -1 -3 -1
  -3  1  0 -3
  -3 -3  0  1
   3 -2 -2 -1
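A minimal NumPy sketch that reproduces the feature map above. As on the slides, the "convolution" here is the un-flipped sliding dot product (cross-correlation) commonly used in deep learning.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide the kernel over the image and take the dot product at each position."""
    H, W = image.shape
    f = kernel.shape[0]                                  # assumes a square f x f kernel
    out_h, out_w = (H - f) // stride + 1, (W - f) // stride + 1
    fmap = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i*stride:i*stride+f, j*stride:j*stride+f]
            fmap[i, j] = np.sum(window * kernel)
    return fmap

image = np.array([[1,0,0,0,0,1],
                  [0,1,0,0,1,0],
                  [0,0,1,1,0,0],
                  [1,0,0,0,1,0],
                  [0,1,0,0,1,0],
                  [0,0,1,0,1,0]])
filter1 = np.array([[ 1,-1,-1],
                    [-1, 1,-1],
                    [-1,-1, 1]])

print(conv2d(image, filter1, stride=1))   # the 4 x 4 feature map from the slide
print(conv2d(image, filter1, stride=2))   # a 2 x 2 feature map when stride = 2
```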
The Convolution Operation - 2D

Repeating the same process with Filter 2 (stride = 1) produces another 4 × 4 feature map:

Filter 2:
  -1  1 -1
  -1  1 -1
  -1  1 -1
The Convolution Operation – RGB Images
[Figure: an RGB image as a stack of three channels (R, G, B). Source: Intuitively Understanding Convolutions for Deep Learning | by Irhum Shafkat | Towards Data Science]
The Convolution Operation – RGB Images : multiple filters

[Figure: K different filters (Filter 1, Filter 2, ..., Filter K), each a stack of 3 × 3 kernels with one kernel per input channel, are slid over the same 6 × 6 input image.]

K filters = K feature maps
Depth of feature map = No. of feature maps = No. of filters
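A small Keras sketch (assuming TensorFlow is installed; the filter count K = 4 and the random input are illustrative) showing that the number of filters determines the depth of the output feature map.

```python
import numpy as np
import tensorflow as tf

# A 6 x 6 RGB image (batch of 1). The values are random; only the shapes matter here.
x = np.random.rand(1, 6, 6, 3).astype("float32")

# K = 4 filters, each of spatial extent 3 x 3 and depth 3 (matching the input channels).
conv = tf.keras.layers.Conv2D(filters=4, kernel_size=3, use_bias=False)

y = conv(x)
print(y.shape)               # (1, 4, 4, 4): a 4 x 4 feature map of depth K = 4
w = conv.get_weights()[0]
print(w.shape)               # (3, 3, 3, 4): kernel_h, kernel_w, input channels, K
```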
The Convolution Operation : Terminologies
[Figure: a 3-channel 6 × 6 input image and a 3-channel 3 × 3 filter, illustrating the terms below.]
1. Depth of an Input Image = No. of channels in the Input Image = Depth of a filter
2. Assuming square filters, Spatial Extent (F) of a filter is the size of the filter
The Convolution Operation : Zero Padding
▪ Convolving a 4 × 4 input with a 3 × 3 filter (no padding) gives only a 2 × 2 output; padding the input with a border of zeros lets the output keep the original 4 × 4 size.
[Figure: zero padding adds a border of zeros around the input before convolving. Source: Intuitively Understanding Convolutions for Deep Learning | by Irhum Shafkat | Towards Data Science]
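A short sketch of the effect of zero padding, using SciPy's correlate2d on an arbitrary 4 × 4 input.

```python
import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(0)
x = rng.integers(0, 10, size=(4, 4)).astype(float)    # a 4 x 4 input (values arbitrary)
k = np.ones((3, 3)) / 9.0                             # a 3 x 3 averaging filter

print(correlate2d(x, k, mode='valid').shape)          # (2, 2): no padding shrinks the output

x_padded = np.pad(x, pad_width=1)                     # add a border of zeros (P = 1)
print(correlate2d(x_padded, k, mode='valid').shape)   # (4, 4): same size as the input
```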
Convolutional Neural Network (CNN) : At a glance

Input image → Convolution → Pooling → ... (can repeat many times) → Fully Connected Feedforward network → output (cat | dog)

Filter 1:              Filter 2:
   1 -1 -1               -1  1 -1
  -1  1 -1               -1  1 -1
  -1 -1  1               -1  1 -1

Feature map (Filter 1):    Feature map (Filter 2):
   3 -1 -3 -1                -1 -1 -1 -1
  -3  1  0 -3                -1 -1 -2  1
  -3 -3  0  1                -1 -1 -2  1
   3 -2 -2 -1                -1  0 -4  3

Pooling is then applied to each feature map:
• Max Pooling
• Average Pooling
Source: CS 898: Deep Learning and Its Applications, University of Waterloo, Canada.
Pooling
▪ Pooling slides a small window over each feature map and replaces the window contents with a single summary value (e.g. the maximum or the average). The stride of the pooling window is typically equal to the window size, so that the windows do not overlap.
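A minimal NumPy sketch of max and average pooling with a 2 × 2 window and stride 2, applied to the Filter 1 feature map from the earlier slides.

```python
import numpy as np

def pool2d(fmap, size=2, stride=2, mode="max"):
    """Slide a size x size window over the feature map with the given stride
    and keep one summary value (max or average) per window."""
    H, W = fmap.shape
    out_h, out_w = (H - size) // stride + 1, (W - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = fmap[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = window.max() if mode == "max" else window.mean()
    return out

# The 4 x 4 feature map produced by Filter 1 on the earlier slides
fmap = np.array([[ 3, -1, -3, -1],
                 [-3,  1,  0, -3],
                 [-3, -3,  0,  1],
                 [ 3, -2, -2, -1]])

print(pool2d(fmap, mode="max"))    # 2 x 2 result: [[3, 0], [3, 1]]
print(pool2d(fmap, mode="avg"))    # 2 x 2 average-pooled map
```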
Why Pooling ?
▪ Pooling subsamples each feature map (a "bird" is still recognized as a bird after subsampling), making the representation smaller and more robust to small shifts.

Output size of a convolution (or pooling) layer, for an input of size W1 × H1 × D1, K filters of spatial extent F, stride S and zero padding P:

  $W_2 = \frac{W_1 - F + 2P}{S} + 1, \qquad H_2 = \frac{H_1 - F + 2P}{S} + 1, \qquad D_2 = K$
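A small helper (hypothetical, for illustration) that evaluates these formulas:

```python
def conv_output_size(w1, h1, f, s=1, p=0, k=1):
    """Output width, height and depth of a convolution layer with K filters of
    spatial extent F, stride S and zero padding P."""
    w2 = (w1 - f + 2 * p) // s + 1
    h2 = (h1 - f + 2 * p) // s + 1
    return w2, h2, k

print(conv_output_size(6, 6, f=3, s=1, p=0, k=1))   # (4, 4, 1): the 4 x 4 feature map above
print(conv_output_size(6, 6, f=3, s=2, p=0, k=1))   # (2, 2, 1): with stride 2
print(conv_output_size(4, 4, f=3, s=1, p=1, k=1))   # (4, 4, 1): zero padding preserves size
```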
Important properties of CNN
▪ Sparse Connectivity
▪ Shared weights
▪ Equivariant representation
Properties of CNN

Filter 1:              Input image (6 × 6):
   1 -1 -1                1 0 0 0 0 1
  -1  1 -1                0 1 0 0 1 0
  -1 -1  1                0 0 1 1 0 0
                          1 0 0 0 1 0
                          0 1 0 0 1 0
                          0 0 1 0 1 0

[Figure: the image is flattened into 36 input units; each output of the feature map (e.g. the values 3 and −1) is connected to only the 9 inputs inside its 3 × 3 window.]

▪ Fewer parameters! Each output connects to only 9 inputs, not fully connected (Sparse Connectivity).
Source: CS 898: Deep Learning and Its Applications, University of Waterloo, Canada.
Properties of CNN

[Figure: the same setup as on the previous slide, but two different outputs of the feature map (the values 3 and −1) are highlighted: both use the same 9 filter weights, just applied to different 3 × 3 windows of the input.]

▪ Even fewer parameters! The same filter weights are reused at every location of the image (Shared weights).
Source: CS 898: Deep Learning and Its Applications, University of Waterloo, Canada.
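To make the savings concrete, a back-of-the-envelope comparison for the 6 × 6 image and 4 × 4 feature map above (bias terms ignored; the fully connected layer size is assumed for illustration):

```python
# Fully connected: every one of the 36 inputs (6 x 6 image) connects to every
# one of the 16 outputs (4 x 4 feature map) with its own weight.
fully_connected_params = 36 * 16             # = 576 weights

# Convolutional: each output looks at only 9 inputs (sparse connectivity), and
# all outputs reuse the same 9 weights of the 3 x 3 filter (shared weights).
conv_params = 3 * 3                          # = 9 weights for the whole feature map

print(fully_connected_params, conv_params)   # 576 vs 9
```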
Equivariance to translation
▪ A function f is equivariant to a function g if f(g(x)) = g(f(x)), i.e. the output changes in the same way as the input.
▪ Because the same weights are shared across the whole image, an object is detected irrespective of its position: if the object shifts in the input, its response in the feature map shifts by the same amount.
Source: Translational Invariance Vs Translational Equivariance | by Divyanshu Mishra | Towards Data Science
CNN vs Fully Connected NN
▪ Shared weights (together with sparse connectivity) give a CNN far fewer parameters than a fully connected network on the same input.
Convolutional Neural Network (CNN) : Non-linearity with activation

Input image → Convolution + ReLU → Pooling → Convolution + ReLU → Pooling → ... → Fully Connected Feedforward network → output (cat | dog)

Parameter counts for LeNet-5 (tanh activations in the hidden layers, sigmoid at the output, as annotated on the slide; pooling layers have 0 parameters):

  C1 convolution (6 filters, 5 × 5 × 1):    ((5*5*1)+1) * 6    = 156
  S2 pooling:                                                   = 0
  C3 convolution (16 filters, 5 × 5 × 6):   ((5*5*6)+1) * 16   = 2,416
  S4 pooling:                                                   = 0
  C5 fully connected (5*5*16 = 400 → 120):  (5*5*16)*120 + 120 = 48,120
  F6 fully connected (120 → 84):            84*120 + 84        = 10,164
  Output layer (84 → 10):                   84*10 + 10         = 850
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., & others. (1998). Gradient-based learning applied to document recognition.
Proceedings of the IEEE, 86(11), 2278–2324.
LeNet-5 Architecture for handwritten number recognition
Source: http://yann.lecun.com/
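A minimal Keras sketch consistent with the parameter counts above (assuming TensorFlow; 32 × 32 gray-scale inputs and average pooling as in the original LeNet-5, with a softmax output in place of the sigmoid annotated on the slide). This is an illustrative reconstruction, not the authors' exact model.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    tf.keras.Input(shape=(32, 32, 1)),              # 32 x 32 gray-scale digit
    layers.Conv2D(6, 5, activation="tanh"),         # C1: 156 parameters
    layers.AveragePooling2D(2),                     # S2: 0 parameters
    layers.Conv2D(16, 5, activation="tanh"),        # C3: 2,416 parameters
    layers.AveragePooling2D(2),                     # S4: 0 parameters
    layers.Flatten(),                               # 5 x 5 x 16 = 400 values
    layers.Dense(120, activation="tanh"),           # C5: 48,120 parameters
    layers.Dense(84, activation="tanh"),            # F6: 10,164 parameters
    layers.Dense(10, activation="softmax"),         # output: 850 parameters
])

model.summary()   # total trainable parameters: 61,706
```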
ImageNet Dataset
AlexNet (2012)
AlexNet Architecture

Parameter counts for the convolutional layers (pooling layers have 0 parameters):

  Conv1 (96 filters, 11 × 11 × 3):   ((11*11*3)+1) * 96  = 34,944
  Conv2 (256 filters, 5 × 5 × 96):   ((5*5*96)+1) * 256  = 614,656
  Conv3 (384 filters, 3 × 3 × 256):  ((3*3*256)+1) * 384 = 885,120
  Conv4 (384 filters, 3 × 3 × 384):  ((3*3*384)+1) * 384 = 1,327,488
  Conv5 (256 filters, 3 × 3 × 384):  ((3*3*384)+1) * 256 = 884,992

  Total parameters (including the fully connected layers): ≈ 62 M
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012).
Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems, 25.
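All of these counts follow the same formula, parameters = (F × F × D_in + 1) × K; a quick check:

```python
def conv_params(f, d_in, k):
    """Parameters of a conv layer: K filters of size F x F x D_in, each with a bias."""
    return (f * f * d_in + 1) * k

print(conv_params(11, 3, 96))     # 34,944    (Conv1)
print(conv_params(5, 96, 256))    # 614,656   (Conv2)
print(conv_params(3, 256, 384))   # 885,120   (Conv3)
print(conv_params(3, 384, 384))   # 1,327,488 (Conv4)
print(conv_params(3, 384, 256))   # 884,992   (Conv5)
```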
ZFNet Architecture (2013)
Zeiler, M. D., & Fergus, R. (2013). Visualizing and understanding convolutional networks.
In European conference on computer vision (pp. 818-833). Springer, Cham.
VGGNet Architecture (2014)
• This work reinforced the notion that convolutional neural networks need a deep stack of layers for the hierarchical representation of visual data to work.
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition , International Conference on Learning Representations (ICLR14)
GoogLeNet Architecture (2014)
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., ... & Rabinovich, A. (2015). Going deeper with convolutions.
In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR’15)
ResNet Architecture (2015)
▪ Effect of increasing the number of layers of a plain (shallow) CNN, evaluated on the CIFAR dataset: beyond a point, simply stacking more layers degrades performance, which motivates residual (skip) connections.
ResNet Architecture (2015)
ResNet-34
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition.
In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).
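The key building block is the residual (skip) connection, y = F(x) + x. Below is a minimal Keras sketch of one such block (assuming TensorFlow; batch normalization and the exact filter sizes of ResNet-34 are omitted for brevity, so this is an illustration rather than the authors' exact block).

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=64):
    """y = F(x) + x : two conv layers whose output is added back to the input."""
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.Add()([y, shortcut])          # the skip connection
    return layers.Activation("relu")(y)

inputs = tf.keras.Input(shape=(56, 56, 64))  # illustrative feature-map shape
outputs = residual_block(inputs)
model = tf.keras.Model(inputs, outputs)
model.summary()
```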