
DSE 3151 DEEP LEARNING

Convolutional Neural Networks

Dr. Rohini Rao & Dr. Abhilash K Pai


Dept. of Data Science and Computer Applications
MIT Manipal
The Convolution Operation - 1D
▪ Convolution is a linear operation on two functions of a real-valued argument: one function (the filter) is slid over the other, and at each position the overlapping values are multiplied elementwise and summed.

▪ Example: consider a discrete signal $x_t$ which represents the position of a spaceship at time $t$, recorded by a laser sensor.

▪ Now, suppose that this sensor is noisy.

▪ To obtain a less noisy measurement, we would like to average several measurements.

▪ Considering that the most recent measurements are more important, we take a weighted average over $x_t$. The new estimate at time $t$ is computed as the convolution:

$s_t = \sum_{a=0}^{\infty} x_{t-a} \, w_{-a} = (x \ast w)_t$

where $x$ is the input and $w$ is the filter (also called the mask or kernel).
The Convolution Operation - 1D

▪ In practice, we would sum only over a small window. For example:

$s_t = \sum_{a=0}^{6} x_{t-a} \, w_{-a}$

▪ We just slide the filter over the input and compute the value of $s_t$ based on a window around $x_t$. For the weights $w_{-6}, \ldots, w_0$ below, the first three window positions give $s = 1.80, 1.96, 2.11$:

w    0.01  0.01  0.02  0.02  0.04  0.40  0.50
x    1.00  1.10  1.20  1.40  1.70  1.80  1.90  2.10  2.20
s    1.80  1.96  2.11

▪ Use cases of 1-D convolution: audio signal processing, stock market analysis, time series analysis, etc.

Content adapted from: CS7015 Deep Learning, Dept. of CSE, IIT Madras
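A minimal NumPy sketch of this windowed 1-D convolution (the arrays are taken from the example above; the loop-based implementation is purely illustrative):

```python
import numpy as np

# Filter weights w_{-6} ... w_0 and the noisy signal from the example.
w = np.array([0.01, 0.01, 0.02, 0.02, 0.04, 0.40, 0.50])
x = np.array([1.00, 1.10, 1.20, 1.40, 1.70, 1.80, 1.90, 2.10, 2.20])

# s_t = sum_{a=0}^{6} x_{t-a} w_{-a}: slide a 7-wide window over x and take
# the dot product with the weights (the newest sample gets weight 0.5).
s = np.array([np.dot(x[t:t + len(w)], w) for t in range(len(x) - len(w) + 1)])
print(np.round(s, 2))  # [1.81 1.97 2.11] -- the slide's values, up to rounding
```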
Convolution in 2-D using Images : What is an Image?

What we see

What a computer sees


Convolution in 2-D using Images : What is an Image?

▪ An image can be represented mathematically as a function f(x,y) which gives the intensity value at position (x,y), where f(x,y) ∈ {0, 1, …, Imax − 1} and x, y ∈ {0, 1, …, N − 1}.

▪ The larger the value of N, the greater the clarity of the picture (higher resolution), but also the more data to be analyzed in the image.

▪ If the image is gray-scale (8 bits per pixel), it requires N² bytes of storage.

▪ If the image is color (RGB), each pixel requires 3 bytes of storage space.

Here N is the resolution of the image and Imax is the number of discretized brightness levels.
Convolution in 2-D using Images : What is an Image?

Digital camera

[Source: D. Hoiem]
Convolution in 2-D using Images : What is an Image?

▪ Sample the 2-D space on a regular grid.


▪ Quantize each sample, i.e., the photons arriving at each active cell are
integrated and then digitized.

[Source: D. Hoiem]

Convolution in 2-D using Images : What is an Image?

▪ A grid (matrix) of intensity values.

255 255 255 255 255 255 255 255 255 255 255 255
255 255 255 255 255 255 255 255 255 255 255 255
255 255 255 0 0 255 255 255 255 255 255 255
255 255 255 75 75 75 255 255 255 255 255 255
255 255 75 95 95 75 255 255 255 255 255 255
255 255 96 127 145 175 255 255 255 255 255 255
255 255 127 145 175 175 175 255 95 255 255 255
255 255 127 145 200 200 175 175 95 255 255 255
255 255 127 145 145 175 127 127 95 47 255 255
255 255 127 145 145 175 127 127 95 47 255 255
255 255 74 127 127 127 95 95 95 47 255 255
255 255 255 74 74 74 74 74 74 255 255 255
255 255 255 255 255 255 255 255 255 255 255 255
255 255 255 255 255 255 255 255 255 255 255 255

The Convolution Operation - 2D

▪ Images are good examples of 2-D inputs.

▪ A 2-D convolution of an image $I$ using a filter $K$ of size $m \times n$ is defined as (looking at preceding pixels):

$S_{ij} = (I \ast K)_{ij} = \sum_{a=0}^{m-1} \sum_{b=0}^{n-1} I_{i-a,\,j-b} \, K_{a,b}$

▪ In practice, one way is to look at the succeeding pixels instead:

$S_{ij} = (I \ast K)_{ij} = \sum_{a=0}^{m-1} \sum_{b=0}^{n-1} I_{i+a,\,j+b} \, K_{a,b}$
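A minimal sketch of the second ("succeeding pixels", i.e. cross-correlation) form in NumPy; deep learning frameworks implement this far more efficiently, so the nested loops here are only for illustration:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution in the 'succeeding pixels' form:
    S[i, j] = sum_{a,b} I[i+a, j+b] * K[a, b]."""
    m, n = kernel.shape
    H, W = image.shape
    out = np.zeros((H - m + 1, W - n + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Elementwise product of the window with the kernel, then sum.
            out[i, j] = np.sum(image[i:i + m, j:j + n] * kernel)
    return out
```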
The Convolution Operation - 2D

▪ Another way is to consider the center pixel as the reference pixel, and then look at its surrounding pixels:

$S_{ij} = (I \ast K)_{ij} = \sum_{a=-\lfloor m/2 \rfloor}^{\lfloor m/2 \rfloor} \; \sum_{b=-\lfloor n/2 \rfloor}^{\lfloor n/2 \rfloor} I_{i-a,\,j-b} \, K_{\lfloor m/2 \rfloor + a,\, \lfloor n/2 \rfloor + b}$

[Figure: a 5x5 binary image with the pixel of interest at the center of the filter window]

0 1 0 0 1
0 0 1 1 0
1 0 0 0 1
0 1 0 0 1
0 0 1 0 1

Content adapted from: CS7015 Deep Learning, Dept. of CSE, IIT Madras
The Convolution Operation - 2D

[Animation: a filter sliding step by step over the input image to produce the convolved feature map. Source: https://developers.google.com/]
The Convolution Operation - 2D

Smoothing Filter
The Convolution Operation - 2D

Sharpening Filter

The Convolution Operation - 2D

Filter for edge detection
The Convolution Operation – 2D : Various filters (edge detection)

Prewitt:
Sx =          Sy =
-1 0 1         1  1  1
-1 0 1         0  0  0
-1 0 1        -1 -1 -1

Sobel:
Sx =          Sy =
-1 0 1         1  2  1
-2 0 2         0  0  0
-1 0 1        -1 -2 -1

Laplacian:    Roberts:
0  1 0        Sx =      Sy =
1 -4 1         0 1       1  0
0  1 0        -1 0       0 -1

[Figure: the input image alongside the results of applying the horizontal and vertical edge detection filters]
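As a sketch, the Sobel pair above can be applied with the `conv2d` helper defined earlier (the toy image and the gradient-magnitude step are illustrative assumptions):

```python
import numpy as np

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])    # responds to horizontal intensity changes
sobel_y = np.array([[ 1,  2,  1],
                    [ 0,  0,  0],
                    [-1, -2, -1]])  # responds to vertical intensity changes

# Toy image: dark left half, bright right half -> one vertical edge.
img = np.zeros((5, 5))
img[:, 2:] = 255.0

gx = conv2d(img, sobel_x)           # large values along the vertical edge
gy = conv2d(img, sobel_y)           # ~0 here: there are no horizontal edges
magnitude = np.sqrt(gx**2 + gy**2)  # combined edge strength
```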
The Convolution Operation - 2D

Filter 1:
 1 -1 -1
-1  1 -1
-1 -1  1

stride = 1

Input image (6 x 6):
1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0

Taking the dot product of the filter with the first two 3x3 windows gives 3 and -1.

Note: the stride is the number of units the kernel is shifted per slide over the rows/columns.
The Convolution Operation - 2D

With the same Filter 1 and input image, if stride = 2, the window jumps two columns at a time, and the first two dot products along the top row are 3 and -3.

Note: the stride is the number of units the kernel is shifted per slide over the rows/columns.
The Convolution Operation - 2D

Convolving the 6 x 6 input image with Filter 1 at stride 1 yields a 4 x 4 feature map:

 3 -1 -3 -1
-3  1  0 -3
-3 -3  0  1
 3 -2 -2 -1
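A sketch that reproduces these numbers, extending the earlier `conv2d` with a stride parameter:

```python
import numpy as np

def conv2d_strided(image, kernel, stride=1):
    """Like conv2d above, but the window jumps `stride` units per step."""
    m, n = kernel.shape
    out_h = (image.shape[0] - m) // stride + 1
    out_w = (image.shape[1] - n) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = np.sum(image[r:r + m, c:c + n] * kernel)
    return out

img = np.array([[1,0,0,0,0,1],
                [0,1,0,0,1,0],
                [0,0,1,1,0,0],
                [1,0,0,0,1,0],
                [0,1,0,0,1,0],
                [0,0,1,0,1,0]])
filter1 = np.array([[ 1,-1,-1],
                    [-1, 1,-1],
                    [-1,-1, 1]])

print(conv2d_strided(img, filter1, stride=1))  # the 4x4 map above
print(conv2d_strided(img, filter1, stride=2))  # 2x2 map; top row [3, -3]
```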
The Convolution Operation - 2D

Filter 2:
-1  1 -1
-1  1 -1
-1  1 -1

stride = 1

Repeat for each filter! Convolving the same 6 x 6 input image with Filter 2 gives a second 4 x 4 feature map:

-1 -1 -1 -1
-1 -1 -2  1
-1 -1 -2  1
-1  0 -4  3

The two 4 x 4 feature maps are stacked, forming a 4 x 4 x 2 matrix.
The Convolution Operation – RGB Images

Apply the filter to the R, G, and B channels of the image and combine (sum) the resulting feature maps to obtain a single 2-D feature map.

Source: Intuitively Understanding Convolutions for Deep Learning | by Irhum Shafkat | Towards Data Science
The Convolution Operation – RGB Images: multiple filters

[Figure: K different 3x3 filters (Filter 1, Filter 2, …, Filter K) applied to the same input image]

K filters = K feature maps

Depth of feature map = no. of feature maps = no. of filters
The Convolution Operation : Terminologies

[Figure: the 6 x 6 input image and a 3 x 3 filter from the earlier example]

1. Depth of an input image = no. of channels in the input image = depth of a filter

2. Assuming square filters, the spatial extent (F) of a filter is the size of the filter
The Convolution Operation : Zero Padding

A 3x3 convolution of a 4x4 input produces only a 2x2 feature map. Pad the input with zeros and then convolve to obtain a feature map whose dimension equals the input image dimension.
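A quick sketch of the effect with NumPy's `np.pad` and the earlier `conv2d` helper (the averaging filter is an illustrative choice):

```python
import numpy as np

x = np.arange(16, dtype=float).reshape(4, 4)  # a 4x4 input
k = np.ones((3, 3)) / 9.0                     # a 3x3 averaging filter

print(conv2d(x, k).shape)              # (2, 2): 'valid' convolution shrinks the map
x_pad = np.pad(x, 1, mode="constant")  # add a one-pixel border of zeros (P = 1)
print(conv2d(x_pad, k).shape)          # (4, 4): same size as the input
```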
The Convolution Operation : Zero Padding

Input image size: 5x5
Feature map size: 5x5

Source: Intuitively Understanding Convolutions for Deep Learning | by Irhum Shafkat | Towards Data Science
Convolutional Neural Network (CNN) : At a glance

Input → Convolution → Pooling → … (can repeat many times) … → Flatten → Fully connected feedforward network → cat | dog

Source: CS 898: Deep Learning and Its Applications, University of Waterloo, Canada.
Pooling

Filter 1 and Filter 2 from earlier produce the two 4 x 4 feature maps below; pooling then downsamples each map:

 3 -1 -3 -1        -1 -1 -1 -1
-3  1  0 -3        -1 -1 -2  1
-3 -3  0  1        -1 -1 -2  1
 3 -2 -2 -1        -1  0 -4  3

• Max Pooling
• Average Pooling

Source: CS 898: Deep Learning and Its Applications, University of Waterloo, Canada.
Pooling

Max pooling keeps the maximum value in each window; average pooling keeps the mean. Like convolution, pooling has a stride, which determines how far the window moves per step.
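A minimal sketch of both pooling variants on the first feature map above (the 2x2 window and stride of 2 are the usual defaults, assumed here):

```python
import numpy as np

def pool2d(fmap, size=2, stride=2, mode="max"):
    """Slide a size x size window with the given stride and reduce it."""
    reduce_fn = np.max if mode == "max" else np.mean
    out_h = (fmap.shape[0] - size) // stride + 1
    out_w = (fmap.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = reduce_fn(fmap[r:r + size, c:c + size])
    return out

fmap = np.array([[ 3, -1, -3, -1],
                 [-3,  1,  0, -3],
                 [-3, -3,  0,  1],
                 [ 3, -2, -2, -1]])
print(pool2d(fmap, mode="max"))  # [[3. 0.] [3. 1.]]
print(pool2d(fmap, mode="avg"))  # [[ 0.   -1.75] [-1.25 -0.5 ]]
```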
Why Pooling ?

▪ Subsampling pixels will not change the object: a subsampled bird is still a bird.

▪ We can subsample the pixels to make the image smaller.

▪ Therefore, fewer parameters are needed to characterize the image.

Source: CS 898: Deep Learning and Its Applications, University of Waterloo, Canada.
Relation between i/p size, feature map size, filter size

Input image: W1 x H1 x D1
Filter: F x F x D1 (spatial extent F, depth equal to the input depth)
No. of filters = K, stride length = S, padding = P

Output feature map: W2 x H2 x D2, where

$W_2 = \dfrac{W_1 - F + 2P}{S} + 1$,  $H_2 = \dfrac{H_1 - F + 2P}{S} + 1$,  $D_2 = K$
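A small helper that evaluates these formulas (the AlexNet numbers in the second call are a standard sanity check, not taken from this slide):

```python
def conv_output_shape(W1, H1, F, K, S, P):
    """Feature map dimensions W2 x H2 x D2 from the formulas above."""
    W2 = (W1 - F + 2 * P) // S + 1
    H2 = (H1 - F + 2 * P) // S + 1
    return W2, H2, K   # depth of the output = number of filters

# 6x6 input, 3x3 filter, stride 1, no padding -> the 4x4 map seen earlier:
print(conv_output_shape(6, 6, F=3, K=1, S=1, P=0))        # (4, 4, 1)
# AlexNet conv1: 227x227 input, 11x11 filters, 96 of them, stride 4:
print(conv_output_shape(227, 227, F=11, K=96, S=4, P=0))  # (55, 55, 96)
```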
Important properties of CNN

▪ Sparse Connectivity

▪ Shared weights

▪ Equivariant representation

Properties of CNN

[Figure: the 6 x 6 image is flattened into inputs 1–36; one output neuron computed with Filter 1 connects only to the 9 inputs in its 3 x 3 window, producing the value 3; the next window produces -1]

Fewer parameters! Each output connects to only 9 inputs, not to all 36: this is Sparse Connectivity.

Source: CS 898: Deep Learning and Its Applications, University of Waterloo, Canada.
Properties of CNN

Is sparse connectivity good?

Ian Goodfellow et al., 2016
Properties of CNN

[Figure: the same network, but now the 9 filter weights are reused at every window position]

Even fewer parameters! The same weights are used at every window position: these are Shared Weights.

Source: CS 898: Deep Learning and Its Applications, University of Waterloo, Canada.
Equivariance to translation

▪ A function f is equivariant to a function g if f(g(x)) = g(f(x)), i.e., if the output changes in the same way as the input.

▪ In a CNN this is achieved by the concept of weight sharing.

▪ Since the same weights are shared across the whole image, an object produces the same filter responses wherever it occurs, so it will be detected irrespective of its position in the image.

Source: Translational Invariance Vs Translational Equivariance | by Divyanshu Mishra | Towards Data Science
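A small numerical check of f(g(x)) = g(f(x)) using the earlier `conv2d` helper, where g is a 2-pixel shift to the right (the blob image and shift amount are illustrative; the identity holds exactly as long as nothing is shifted out of the frame):

```python
import numpy as np

img = np.zeros((8, 8))
img[2:5, 2:5] = 1.0                 # a small square "object"
filt = np.ones((3, 3))              # any filter works

def shift_right(a, k=2):
    """Translate an array k pixels to the right, filling with zeros."""
    out = np.zeros_like(a)
    out[:, k:] = a[:, :-k]
    return out

lhs = conv2d(shift_right(img), filt)   # f(g(x)): convolve the shifted image
rhs = shift_right(conv2d(img, filt))   # g(f(x)): shift the convolved image
print(np.allclose(lhs, rhs))           # True: the response moves with the object
```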
CNN vs Fully Connected NN

▪ A CNN compresses the fully connected NN in two ways:

▪ Reducing the number of connections

▪ Shared weights

▪ Max pooling further reduces the parameters to characterize an image.

Convolutional Neural Network (CNN) : Non-linearity with activation

Input → Convolution + ReLU → Pooling → … (can repeat many times) … → Flatten → Fully connected feedforward network → cat | dog

Source: CS 898: Deep Learning and Its Applications, University of Waterloo, Canada.
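This pipeline is only a few lines in a modern framework. A minimal tf.keras sketch (the input size, filter counts, and dense widths are illustrative assumptions, not taken from the slide):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 3)),          # RGB input image
    tf.keras.layers.Conv2D(16, 3, activation="relu"),  # convolution + ReLU
    tf.keras.layers.MaxPooling2D(2),                   # pooling
    tf.keras.layers.Conv2D(32, 3, activation="relu"),  # repeat
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Flatten(),                         # flatten
    tf.keras.layers.Dense(64, activation="relu"),      # fully connected
    tf.keras.layers.Dense(2, activation="softmax"),    # cat | dog
])
model.summary()
```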
LeNet-5 Architecture for handwritten text recognition

Layer-wise hyperparameters and parameter counts:

C1 (conv, tanh): S=1, F=5, K=6, P=2 → #Param. = ((5*5*1)+1) * 6 = 156
S2 (pool): S=2, F=2, K=6, P=0 → #Param. = 0
C3 (conv, tanh): S=1, F=5, K=16, P=0 → #Param. = ((5*5*6)+1) * 16 = 2416
S4 (pool): S=2, F=2, K=16, P=0 → #Param. = 0
C5 (fully connected): #Param. = (5*5*16)*120 + 120 = 48120
F6 (fully connected): #Param. = 84*120 + 84 = 10164
Output (sigmoid): #Param. = 84*10 + 10 = 850

LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., et al. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
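These counts follow directly from "(filter volume + 1 bias) x number of filters" for convolution layers and "inputs x outputs + biases" for fully connected layers, as a quick sketch verifies:

```python
def conv_params(F, D_in, K):
    """((F*F*D_in) + 1) * K: weights plus one bias per filter."""
    return ((F * F * D_in) + 1) * K

def dense_params(n_in, n_out):
    """Fully connected layer: one weight per input-output pair, plus biases."""
    return n_in * n_out + n_out

print(conv_params(5, 1, 6))           # C1: 156
print(conv_params(5, 6, 16))          # C3: 2416
print(dense_params(5 * 5 * 16, 120))  # C5: 48120
print(dense_params(120, 84))          # F6: 10164
print(dense_params(84, 10))           # output: 850
```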
LeNet-5 Architecture for handwritten number recognition

Source: http://yann.lecun.com/

ImageNet Dataset

More than 14 million images, 22,000 image categories.

Deng, Jia, et al. "ImageNet: A large-scale hierarchical image database." IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2009.
ImageNet Large Scale Visual Recognition Challenge

• 1000 ImageNet categories

[Chart: classification error of the winning entries by year; ZFNet marked]
AlexNet (2012)

▪ Used the ReLU activation function instead of sigmoid and tanh.

▪ Used data augmentation techniques that consisted of image translations, horizontal reflections, and patch extractions.

▪ Implemented dropout layers.
AlexNet Architecture

Parameter counts of the convolution layers (the pooling layers have 0 parameters):

Conv1: ((11*11*3)+1) * 96 = 34944
Conv2: ((5*5*96)+1) * 256 = 614656
Conv3: ((3*3*256)+1) * 384 = 885120
Conv4: ((3*3*384)+1) * 384 = 1327488
Conv5: ((3*3*384)+1) * 256 = 884992

Total #Param. (including the fully connected layers): ~62M

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25.
ZFNet Architecture (2013)

• Used filters of size 7x7 instead of 11x11 in AlexNet.

• Used a Deconvnet to visualize the intermediate results.

Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In European Conference on Computer Vision (pp. 818-833). Springer, Cham.
ZFNet

Visualizing and Understanding Deep Neural Networks by Matt Zeiler - YouTube

VGGNet Architecture (2014)

Image Source: CS7015 Deep Learning, Dept. of CSE, IIT Madras

• Used filters of size 3x3 in all the convolution layers.

• 3 conv layers back-to-back have an effective receptive field of 7x7 (see the sketch below).

• Also called VGG-16, as it has 16 layers.

• This work reinforced the notion that convolutional neural networks need a deep stack of layers for the hierarchical representation of visual data to work.

Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. International Conference on Learning Representations (ICLR14).
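A quick sketch of the receptive-field claim: with stride 1, each additional FxF layer grows the receptive field by F − 1 pixels:

```python
def receptive_field(n_layers, F=3):
    """Receptive field of n stacked FxF convolutions with stride 1."""
    r = 1                    # one output pixel sees itself to start with
    for _ in range(n_layers):
        r += F - 1           # each layer adds (F - 1) pixels of context
    return r

print(receptive_field(1))    # 3
print(receptive_field(2))    # 5
print(receptive_field(3))    # 7 -> three 3x3 layers see a 7x7 region
```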
GoogleNet Architecture (2014)

• Most of the architectures discussed till now apply one of the following after each convolution operation:
  • Max pooling
  • 3x3 convolution
  • 5x5 convolution

• Idea: why can't we apply them all together at the same time and concatenate the feature maps?

• Problem: this would result in a large number of computations.

• Specifically, each element of the output requires O(F x F x D) computations.

Source: CS7015 Deep Learning, Dept. of CSE, IIT Madras

Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., ... & Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'15).
GoogleNet Architecture (2014)

• Solution: apply 1x1 convolutions.

• A 1x1 convolution aggregates along the depth.

• So, if we apply D1 1x1 convolutions (D1 < D), we will get an output of size W x H x D1.

• The number of computations per output element then reduces to O(F x F x D1).

• We can then apply the subsequent 3x3 and 5x5 filters on this reduced output.

Source: CS7015 Deep Learning, Dept. of CSE, IIT Madras

Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., ... & Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'15).
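The saving is easy to quantify with a back-of-the-envelope count of multiply-accumulate operations (all sizes below are illustrative assumptions, not GoogleNet's actual layer dimensions):

```python
# Producing K feature maps of size W x H from an input of depth D:
W, H, D, D1, F, K = 28, 28, 256, 64, 5, 128

direct  = W * H * K * (F * F * D)       # 5x5 convolution on the full depth
reduced = (W * H * D1 * D               # 1x1 convolution: depth D -> D1
           + W * H * K * (F * F * D1))  # then 5x5 on the thin volume

print(direct, reduced, round(direct / reduced, 1))  # ~3.7x fewer operations
```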
GoogleNet Architecture (2014)

• Also, we might want to use different dimensionality reductions (1x1 convolutions with different numbers of filters) before the 3x3 and 5x5 filters.

• We can also add a max-pooling layer followed by a 1x1 convolution.

• After this, we concatenate all these layers. This is called the Inception module.

• GoogleNet contains many such Inception modules.

[Figure: The Inception module]

Source: CS7015 Deep Learning, Dept. of CSE, IIT Madras

Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., ... & Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'15).
GoogleNet Architecture (2014)

[Figure: the full GoogleNet, ending in global average pooling]

• 12 times fewer parameters and 2 times more computations than AlexNet.

• Used global average pooling instead of flattening.

Source: CS7015 Deep Learning, Dept. of CSE, IIT Madras

Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., ... & Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'15).
ResNet Architecture (2015)

[Figure: effect of increasing the layers of a shallow CNN on the CIFAR dataset — the "shallow CNN + additional layers" network ends up with higher error than the shallow CNN alone]

Source: Residual Networks (ResNet) - Deep Learning - GeeksforGeeks

He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770-778).
ResNet Architecture (2015)

[Figure: the ResNet-34 architecture with skip (residual) connections]

Source: Residual Networks (ResNet) - Deep Learning - GeeksforGeeks

He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770-778).
ResNet Architecture (2015)

Source: CS7015 Deep Learning, Dept. of CSE, IIT Madras

He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770-778).
