CS490 Advanced Topics in Computing (Deep Learning): Lecture 16: Convolutional Neural Networks (CNNs)

CS490 – Advanced Topics in Computing
(Deep Learning)
Lecture 16: Convolutional Neural Networks (CNNs)
Dr. Muhammad Shahzad

Department Of Computing (DOC),

School of Electrical Engineering & Computer Science (SEECS),
National University of Sciences & Technology (NUST)
12/04/2021
Fully Connected Layer
Motivation: Deep Learning on Images
▪ A 64 x 64 x 3 image flattens into a 12288-dimensional input vector
▪ A 1000 x 1000 x 3 image flattens into a 3 Million-dimensional input vector
How many entries does the weight matrix 𝑤[1] have, assuming that the first hidden layer has 1000 units?
► Shape of 𝑤[1] is 1000 x 3M, i.e., 3 Billion entries!
► Adding the 1000 biases, we need to train more than 3 Billion parameters
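The arithmetic above can be checked in a few lines. This is a minimal sketch; the helper name `dense_param_count` is hypothetical, and it assumes one weight per (input value, hidden unit) pair plus one bias per hidden unit.

```python
def dense_param_count(input_shape, hidden_units):
    """Inputs, weights and total parameters of an FC layer fed a flattened image."""
    inputs = 1
    for dim in input_shape:  # flatten H x W x C into a single vector
        inputs *= dim
    weights = hidden_units * inputs  # w[1] has shape hidden_units x inputs
    return inputs, weights, weights + hidden_units  # + one bias per hidden unit


for shape in [(64, 64, 3), (1000, 1000, 3)]:
    n_in, n_w, total = dense_param_count(shape, hidden_units=1000)
    print(f"{shape}: {n_in:,} inputs, {n_w:,} weights, {total:,} parameters")
```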
Convolutional Neural Networks
▪ Similar to regular Neural Networks except that they make the explicit assumption that the inputs are images, which allows us to encode certain properties into the architecture
▪ These properties make the forward function more efficient to implement and vastly reduce the number of parameters in the network, e.g., by using local receptive fields and a parameter-sharing scheme
A ConvNet is made up of Layers
Every Layer has a simple API: it transforms an input 3D volume to an output 3D volume with some differentiable function that may or may not have parameters
Layers used to build ConvNets
▪ A ConvNet architecture is, in the simplest case, a list of Layers that transform the image volume into an output volume (e.g., holding the class scores)
▪ Three main types of layers that are stacked to build ConvNet
architectures:
► Convolutional Layer
► Pooling Layer
► Fully-Connected Layer (exactly as seen in regular Neural Networks)
▪ Each Layer accepts an input 3D volume and transforms it to an
output 3D volume through a differentiable function
▪ Each Layer may or may not have parameters (e.g. CONV/FC do,
RELU/POOL don’t)
▪ Each Layer may or may not have additional hyperparameters (e.g.
CONV/FC/POOL do, RELU doesn’t)
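To make the layer API concrete, here is a minimal framework-free sketch. Volumes are assumed to be nested lists indexed [row][col][channel], and the class names `ReLU` and `Sequential` are illustrative only, not taken from any particular library. ReLU has neither parameters nor hyperparameters, so its forward function is just an element-wise max.

```python
class ReLU:
    """Parameter-free layer: maps an H x W x C volume to an H x W x C volume."""
    params = ()  # nothing to learn

    def forward(self, volume):
        return [[[max(0.0, v) for v in pixel] for pixel in row] for row in volume]


class Sequential:
    """A ConvNet in its simplest form: a list of layers applied in order."""

    def __init__(self, *layers):
        self.layers = list(layers)

    def forward(self, volume):
        for layer in self.layers:
            volume = layer.forward(volume)  # 3D volume in, 3D volume out
        return volume


net = Sequential(ReLU())
x = [[[-1.0, 2.0], [3.0, -4.0]]]  # a 1 x 2 x 2 volume
print(net.forward(x))  # [[[0.0, 2.0], [3.0, 0.0]]]
```

Because every layer shares the same volume-in, volume-out contract, CONV, POOL and FC layers could be appended to the same list without changing `Sequential`.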
How does Convolution work?
Edge Detection Via Convolution Operation
▪ The 3x3 filter
1 0 -1
1 0 -1
1 0 -1
is slid across a 6x6 input image; each output entry is the sum of the element-wise products between the filter and the 3x3 image patch it currently covers:
3x1 + 1x1 + 2x1 + 0x0 + 5x0 + 7x0 + 1x(-1) + 8x(-1) + 2x(-1) = -5
0x1 + 5x1 + 7x1 + 1x0 + 8x0 + 2x0 + 2x(-1) + 9x(-1) + 5x(-1) = -4
1x1 + 8x1 + 2x1 + 2x0 + 9x0 + 5x0 + 7x(-1) + 3x(-1) + 1x(-1) = 0
...
1x1 + 6x1 + 2x1 + 7x0 + 2x0 + 3x0 + 8x(-1) + 8x(-1) + 9x(-1) = -16 (bottom-right patch)
▪ Resulting 4x4 output:
-5 -4 0 8
-10 -2 2 3
0 -2 -4 -7
-3 -2 -3 -16
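The sliding-window computation above fits in a short pure-Python function. This is a minimal sketch: the 6x6 input image is an assumption (the original image did not survive extraction), chosen so that it reproduces the four worked sums and the full 4x4 output shown. Like most deep learning libraries, it computes cross-correlation, i.e., the filter is not flipped.

```python
def conv2d(image, kernel):
    """Valid (no padding), stride-1 2D convolution as used in CNNs (no kernel flip)."""
    f = len(kernel)
    out_h = len(image) - f + 1
    out_w = len(image[0]) - f + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(f) for b in range(f))
             for j in range(out_w)]
            for i in range(out_h)]


# Assumed 6x6 input, consistent with the worked sums above.
image = [[3, 0, 1, 2, 7, 4],
         [1, 5, 8, 9, 3, 1],
         [2, 7, 2, 5, 1, 3],
         [0, 1, 3, 1, 7, 8],
         [4, 2, 1, 6, 2, 8],
         [2, 4, 5, 2, 3, 9]]
vertical_edge = [[1, 0, -1],
                 [1, 0, -1],
                 [1, 0, -1]]

for row in conv2d(image, vertical_edge):
    print(row)
```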
▪ Convolution of the image with a filter (also called a kernel, window, mask, or template) with different coefficient values results in a new filtered output image, e.g.,
► An image convolved with a filter with positive and equal coefficients results in a smoothed output image
► Similarly, we can also compute image derivatives to detect edges in the input image
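For the smoothing case, a 3x3 filter whose nine coefficients all equal 1/9 (a box or mean filter, picked here for illustration) replaces every pixel by the average of its neighbourhood. A minimal sketch, assuming the same valid, stride-1 sliding window as the worked example; the function name `box_blur` is hypothetical.

```python
def box_blur(image, f=3):
    """Valid, stride-1 convolution with an f x f filter whose coefficients all equal 1/f**2."""
    coeff = 1 / (f * f)  # equal coefficients can be factored out of the sum
    return [[sum(image[i + a][j + b] for a in range(f) for b in range(f)) * coeff
             for j in range(len(image[0]) - f + 1)]
            for i in range(len(image) - f + 1)]


# A single bright pixel in a dark 5x5 image...
impulse = [[0] * 5 for _ in range(5)]
impulse[2][2] = 9
# ...is spread evenly over every 3x3 window that contains it.
for row in box_blur(impulse):
    print([round(v, 6) for v in row])
```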
Any idea what could be the filter coefficients?

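The natural derivative operator can be defined as the difference between the intensities of neighbouring pixels:
$$\frac{\partial f}{\partial x} = f(x + 1) - f(x)$$
For a 3x3 neighbourhood with pixels labelled
z1 z2 z3
z4 z5 z6
z7 z8 z9
one such operator uses the coefficients z5 = -1, z6 = -1, z8 = 1, z9 = 1, i.e., it applies this difference between two adjacent rows.
[Figure: detected vertical edges and horizontal edges]
Applying the vertical edge filter to a patch whose left column is bright (10) and whose middle and right columns are dark (0) gives a strong response:
10x1 + 10x1 + 10x1 + 0x0 + 0x0 + 0x0 + 0x(-1) + 0x(-1) + 0x(-1) = 30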
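The response of 30 comes from a patch that straddles a bright-to-dark boundary. The sketch below applies the same vertical filter, and its transpose as a horizontal filter, to a hypothetical 6x6 image whose left half is 10 and right half is 0 (an assumed input chosen for illustration).

```python
def conv2d(image, kernel):
    """Valid, stride-1 2D convolution as used in CNNs (no kernel flip)."""
    f = len(kernel)
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(f) for b in range(f))
             for j in range(len(image[0]) - f + 1)]
            for i in range(len(image) - f + 1)]


image = [[10, 10, 10, 0, 0, 0] for _ in range(6)]  # bright left half, dark right half
vertical = [[1, 0, -1] for _ in range(3)]
horizontal = [list(col) for col in zip(*vertical)]  # transpose: 1 1 1 / 0 0 0 / -1 -1 -1

print(conv2d(image, vertical))    # every row is [0, 30, 30, 0]
print(conv2d(image, horizontal))  # all zeros: there is no horizontal edge
```

The two middle output columns light up exactly where the filter window covers the boundary between the two halves.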
Learning To Detect Edges
Hand-designed vertical edge filters:
Prewitt
1 0 -1
1 0 -1
1 0 -1
Sobel
1 0 -1
2 0 -2
1 0 -1
Scharr
3 0 -3
10 0 -10
3 0 -3
With the rise of deep learning, it is possible to learn these filter coefficients automatically via backpropagation, rather than hand-designing them, for a specific task, e.g., edge detection
[Figure: vertical edges, horizontal edges]
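A toy sketch of learning a filter, under stated assumptions: the "task" is to reproduce Sobel responses on random images (so the correct answer is known), the model is a single linear 3x3 filter, and it is trained by full-batch gradient descent on mean squared error. For one linear filter, the gradient below is what backpropagation would compute. The image count, learning rate and step count are arbitrary choices, and all function names are hypothetical.

```python
import random

SOBEL = [[1, 0, -1],
         [2, 0, -2],
         [1, 0, -1]]


def conv2d(image, kernel):
    """Valid, stride-1 2D convolution (no kernel flip, as in CNN libraries)."""
    f = len(kernel)
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(f) for b in range(f))
             for j in range(len(image[0]) - f + 1)]
            for i in range(len(image) - f + 1)]


def patches(image, f=3):
    """Every f x f patch, flattened row by row, in the same order as conv2d's output."""
    return [[image[i + a][j + b] for a in range(f) for b in range(f)]
            for i in range(len(image) - f + 1)
            for j in range(len(image[0]) - f + 1)]


def learn_filter(num_images=10, size=6, steps=400, lr=0.2, seed=0):
    """Fit a 3x3 filter by gradient descent on mean squared error to Sobel targets."""
    rng = random.Random(seed)
    data = []  # (flattened patch, target response) pairs
    for _ in range(num_images):
        img = [[rng.uniform(-1, 1) for _ in range(size)] for _ in range(size)]
        targets = [t for row in conv2d(img, SOBEL) for t in row]
        data.extend(zip(patches(img), targets))
    w = [0.0] * 9  # start from an all-zero filter
    n = len(data)
    for _ in range(steps):
        grad = [0.0] * 9
        for p, t in data:
            err = sum(wk * pk for wk, pk in zip(w, p)) - t  # prediction - target
            for k in range(9):
                grad[k] += 2 * err * p[k] / n  # d(MSE)/dw_k
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return [w[0:3], w[3:6], w[6:9]]


for row in learn_filter():
    print([round(v, 2) for v in row])
```

Starting from zeros, the learned coefficients approach the Sobel values; with a real task the targets would come from labels instead of a known filter.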
Spatial Dimensions: A Closer Look
▪ 7x7 input (spatially), 3x3 filter applied with stride 1 => 5x5 output
▪ 7x7 input (spatially), 3x3 filter applied with stride 2 => 3x3 output
▪ 7x7 input (spatially), 3x3 filter applied with stride 3 => doesn't fit! A 3x3 filter cannot be applied to a 7x7 input with stride 3
Output size?
(N - F) / stride + 1, where N is the input size and F is the filter size
E.g., with N = 7, F = 3:
stride 1 => (7 - 3)/1 + 1 = 5
stride 2 => (7 - 3)/2 + 1 = 3
stride 3 => (7 - 3)/3 + 1 = 2.33 (not an integer, so the filter doesn't fit)
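The output-size rule can be wrapped in a small helper that also rejects strides that don't fit. A minimal sketch; the name `conv_output_size` is hypothetical, and it assumes square inputs and filters with no padding.

```python
def conv_output_size(n, f, stride):
    """Output size (N - F) / stride + 1 for an N x N input and F x F filter (no padding)."""
    span = n - f
    if span < 0 or span % stride != 0:
        # (N - F) / stride is not a whole number: the filter doesn't fit.
        raise ValueError(f"{f}x{f} filter with stride {stride} doesn't fit a {n}x{n} input")
    return span // stride + 1


for stride in (1, 2, 3):
    try:
        size = conv_output_size(7, 3, stride)
        print(f"stride {stride}: {size}x{size} output")
    except ValueError as err:
        print(f"stride {stride}: {err}")
```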
Common Practice: Zero Padding At Borders
▪ Pad every border of the input with P pixels of zeros; with stride S, the output size becomes
(N + 2P - F)/S + 1
Valid vs Same Convolutions
▪ Valid convolution: no padding (P = 0); the spatial dimensions of the resulting image shrink after convolution
▪ Same convolution: the spatial dimensions of the resulting image stay the same after convolution
► Achieved via zero-padding:
(N + 2P - F)/S + 1 = N
For S = 1,
N + 2P - F + 1 = N
=> P = (F - 1)/2
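A minimal sketch of zero padding and the same-padding rule P = (F - 1)/2 for stride 1. The helper names are hypothetical; since P must be a whole number, the sketch rejects even filter sizes, which is one reason odd filter sizes are common.

```python
def zero_pad(image, p):
    """Surround a 2D image with p rows and columns of zeros on every side."""
    width = len(image[0]) + 2 * p
    top = [[0] * width for _ in range(p)]
    middle = [[0] * p + list(row) + [0] * p for row in image]
    bottom = [[0] * width for _ in range(p)]
    return top + middle + bottom


def same_padding(f):
    """P = (F - 1) / 2 keeps the output size equal to N when S = 1 (F must be odd)."""
    if f % 2 == 0:
        raise ValueError("P = (F - 1)/2 is only a whole number for odd F")
    return (f - 1) // 2


n = 6
for f in (1, 3, 5, 7):
    p = same_padding(f)
    out = n + 2 * p - f + 1  # (N + 2P - F)/S + 1 with S = 1
    print(f"F={f}: P={p}, output {out}x{out}")

print(zero_pad([[1, 2], [3, 4]], 1))  # [[0, 0, 0, 0], [0, 1, 2, 0], [0, 3, 4, 0], [0, 0, 0, 0]]
```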
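Convolution Layer
Convolution Over Volumes
▪ A 6x6x3 input convolved with a 3x3x3 filter gives a 4x4 output: the filter depth matches the 3 input channels, and each output value sums the products over all 3 channels
▪ Note that the filter now has 3x3x3 = 27 learnable coefficients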
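Convolution over volumes extends the 2D sliding window with a sum over channels. A minimal sketch, assuming H x W x C nested lists indexed [row][col][channel]; the name `conv_volume` is hypothetical.

```python
def conv_volume(volume, kernel):
    """Valid, stride-1 convolution of an H x W x C volume with one F x F x C filter.

    The products are summed over all F * F * C positions, so a single filter
    produces a single 2D output map, whatever the number of input channels.
    """
    f, c = len(kernel), len(kernel[0][0])
    if len(volume[0][0]) != c:
        raise ValueError("filter depth must match the number of input channels")
    out_h, out_w = len(volume) - f + 1, len(volume[0]) - f + 1
    return [[sum(volume[i + a][j + b][k] * kernel[a][b][k]
                 for a in range(f) for b in range(f) for k in range(c))
             for j in range(out_w)]
            for i in range(out_h)]


volume = [[[1, 1, 1] for _ in range(6)] for _ in range(6)]  # 6x6x3, all ones
kernel = [[[1, 1, 1] for _ in range(3)] for _ in range(3)]  # 3x3x3: 27 coefficients
out = conv_volume(volume, kernel)
print(len(out), len(out[0]), out[0][0])  # 4 4 27
```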
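Convolutional Layer: Neuron View
▪ Each output neuron is connected only to a local region of the input volume (extending through its full depth): its receptive field
Single Convolutional Layer with 6 5x5x3 filters
▪ Each 5x5x3 filter has 75 weights, so the weight matrix $w^{[1]}$ has 75 x 6 entries
▪ With input activations $a^{[0]}$ and output activations $a^{[1]}$:
$$z^{[1]} = w^{[1]} a^{[0]} + b^{[1]}, \qquad a^{[1]} = g(z^{[1]})$$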
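The neuron view can be made literal by flattening every receptive field into a 75-value vector (often called im2col) and taking its dot product with each flattened filter; the six flattened filters together form the 75 x 6 weight matrix (stored below as 6 rows of 75). A minimal sketch, assuming stride 1, no padding, ReLU as g, and random weights purely for illustration; all function names are hypothetical.

```python
import random


def im2col(volume, f):
    """Collect every F x F x C receptive field of an H x W x C volume as a flat list."""
    h, w, c = len(volume), len(volume[0]), len(volume[0][0])
    return [[volume[i + a][j + b][k]
             for a in range(f) for b in range(f) for k in range(c)]
            for i in range(h - f + 1) for j in range(w - f + 1)]


def conv_layer(volume, filters, biases):
    """Valid, stride-1 conv layer: z = w . a + b per receptive field, then a = ReLU(z).

    filters: K nested F x F x C lists, biases: K numbers. Returns H' x W' x K.
    """
    f = len(filters[0])
    out_w = len(volume[0]) - f + 1
    # Flatten each filter in the same (row, col, channel) order as im2col.
    w = [[v for row in flt for pixel in row for v in pixel] for flt in filters]
    acts = [[max(0.0, sum(wk * ak for wk, ak in zip(w_row, field)) + b)
             for w_row, b in zip(w, biases)]
            for field in im2col(volume, f)]
    # Regroup the per-position outputs into rows of the output volume.
    return [acts[r:r + out_w] for r in range(0, len(acts), out_w)]


rng = random.Random(0)
x = [[[rng.random() for _ in range(3)] for _ in range(32)] for _ in range(32)]  # 32x32x3
filters = [[[[rng.gauss(0, 0.1) for _ in range(3)] for _ in range(5)] for _ in range(5)]
           for _ in range(6)]  # six 5x5x3 filters
out = conv_layer(x, filters, biases=[0.0] * 6)
print(len(out), len(out[0]), len(out[0][0]))  # 28 28 6
print(len(im2col(x, 5)[0]) * len(filters))     # 450 weights = 75 x 6
```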
ConvNets
Flatten the last volume, e.g., a 24 x 24 x 10 volume into a 5760-d vector of neurons, and feed it to a Fully Connected (FC) layer followed by a softmax unit for prediction
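A minimal sketch of that final stage: flatten a 24 x 24 x 10 volume, apply one FC layer, then softmax. The number of classes (3), the random volume and the random weights are hypothetical stand-ins; only the flatten, FC and softmax steps come from the slide.

```python
import math
import random


def flatten(volume):
    """H x W x C volume -> vector of H*W*C values."""
    return [v for row in volume for pixel in row for v in pixel]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
volume = [[[rng.random() for _ in range(10)] for _ in range(24)] for _ in range(24)]
x = flatten(volume)  # 24 * 24 * 10 = 5760 values
num_classes = 3  # hypothetical; the slide does not fix the number of classes
W = [[rng.gauss(0, 0.01) for _ in range(len(x))] for _ in range(num_classes)]
b = [0.0] * num_classes
logits = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
probs = softmax(logits)
print(len(x), [round(p, 3) for p in probs])
```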
Example
Input volume: 32x32x3

10 5x5x3 filters with stride 1, pad 2
Output volume size?
(32+2*2-5)/1+1 = 32 spatially, so
32x32x10
Number of parameters in this layer?

each filter has 5*5*3 + 1 = 76 params (+1 for bias)
=> 76*10 = 760
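Both answers follow from the formulas used earlier, so they can be bundled into one helper. A minimal sketch; the name `conv_layer_summary` is hypothetical, and the integer (floor) division assumes the chosen F, S and P actually fit the input.

```python
def conv_layer_summary(in_shape, num_filters, f, stride, pad):
    """Output volume shape and parameter count of a conv layer with square F x F filters."""
    h, w, c = in_shape
    out_h = (h + 2 * pad - f) // stride + 1  # (N + 2P - F)/S + 1
    out_w = (w + 2 * pad - f) // stride + 1
    params = (f * f * c + 1) * num_filters  # F*F*C weights + 1 bias per filter
    return (out_h, out_w, num_filters), params


shape, params = conv_layer_summary((32, 32, 3), num_filters=10, f=5, stride=1, pad=2)
print(shape, params)  # (32, 32, 10) 760
```

Note that the parameter count does not depend on the spatial size of the input, only on the filter size, input depth and number of filters.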
ConvNet Dimensions
Common settings:
K (number of filters) = powers of 2, e.g., 32, 64, 128, 512
- F = 3, S = 1, P = 1
- F = 5, S = 1, P = 2
- F = 5, S = 2, P = ? (whatever fits)
- F = 1, S = 1, P = 0
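1x1 convolution
▪ On a single-channel input, a 1x1 filter simply multiplies every pixel by its coefficient (e.g., 2x)
1x1 convolution layer
▪ On a multi-channel input, each 1x1 filter computes a weighted sum across the channels at every spatial position, so the layer can change the number of channels without changing the spatial size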
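A 1x1 convolution is a per-position weighted sum across channels. A minimal sketch, assuming H x W x C nested lists and filters given as lists of C weights; the function name and the example values are hypothetical.

```python
def conv1x1(volume, filters):
    """1x1 convolution: at every (row, col), each filter takes a weighted sum over channels.

    volume: H x W x C nested lists; filters: K lists of C weights. Returns H x W x K.
    """
    return [[[sum(w * v for w, v in zip(flt, pixel)) for flt in filters]
             for pixel in row]
            for row in volume]


# Single channel, filter value 2: every pixel is simply doubled.
print(conv1x1([[[1], [2]], [[3], [4]]], [[2]]))  # [[[2], [4]], [[6], [8]]]
# Three channels mapped to two with two 1x1x3 filters; spatial size is unchanged.
vol = [[[1, 2, 3], [4, 5, 6]]]
print(conv1x1(vol, [[1, 1, 1], [1, 0, -1]]))  # [[[6, -2], [15, -2]]]
```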
Pooling Layer
▪ Makes the representations smaller and more manageable
▪ Operates over each activation map independently
MAX Pooling
Average-Pooling output:
3.75 1.25
4 2
MAX-Pooling
What would be the results of applying MAX-Pooling using F = 3 and S = 1?
MAX-Pooling (F = 3, S = 1) output:
9 9 5
9 9 5
8 6 9
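A minimal pooling sketch for a single activation map (a full pooling layer would repeat it for every channel independently). The 4x4 and 5x5 inputs are assumptions, since the slide images did not survive extraction; they were chosen so that they reproduce the results above. F = 2, S = 2 for the average-pooling case is also assumed. The function names are hypothetical.

```python
def pool2d(fmap, f, stride, reduce):
    """Pool one 2D activation map with an f x f window moved by `stride`."""
    out_h = (len(fmap) - f) // stride + 1
    out_w = (len(fmap[0]) - f) // stride + 1
    return [[reduce([fmap[i * stride + a][j * stride + b]
                     for a in range(f) for b in range(f)])
             for j in range(out_w)]
            for i in range(out_h)]


def mean(values):
    return sum(values) / len(values)


# Assumed inputs, consistent with the pooling results shown above.
x4 = [[1, 3, 2, 1], [2, 9, 1, 1], [1, 4, 2, 3], [5, 6, 1, 2]]
x5 = [[1, 3, 2, 1, 3], [2, 9, 1, 1, 5], [1, 3, 2, 3, 2], [8, 3, 5, 1, 0], [5, 6, 1, 2, 9]]

print(pool2d(x4, 2, 2, mean))  # [[3.75, 1.25], [4.0, 2.0]]
print(pool2d(x4, 2, 2, max))   # [[9, 2], [6, 3]]
print(pool2d(x5, 3, 1, max))   # [[9, 9, 5], [9, 9, 5], [8, 6, 9]]
```

Pooling has hyperparameters (F, S) but nothing to learn, which matches the earlier remark that POOL layers have no parameters.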
Pooling Dimensions
Common settings:
F = 2, S = 2
F = 3, S = 2
Example: ConvNets
Summary of Typical ConvNet Design
▪ ConvNets stack CONV, POOL, FC layers
▪ Trend towards smaller filters and deeper architectures
▪ Trend towards getting rid of POOL/FC layers (just CONV)
▪ Historically, architectures looked like
[(CONV-RELU)*N-POOL?]*M - (FC-RELU)*K, SOFTMAX
where N is usually up to ~5, M is large, 0 <= K <= 2
▪ However, recent advances such as ResNet/GoogLeNet have challenged this paradigm
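The historical pattern reads like a small grammar, so it can be expanded mechanically. A minimal sketch; the function name is hypothetical, and as a simplification it applies the optional POOL either to every block or to none.

```python
def build_architecture(n, m, k, pool=True):
    """Expand [(CONV-RELU)*N-POOL?]*M - (FC-RELU)*K, SOFTMAX into a flat layer list."""
    layers = []
    for _ in range(m):
        layers += ["CONV", "RELU"] * n
        if pool:  # the '?' marks POOL as optional
            layers.append("POOL")
    layers += ["FC", "RELU"] * k
    layers.append("SOFTMAX")
    return layers


print(" - ".join(build_architecture(n=2, m=2, k=1)))
# CONV - RELU - CONV - RELU - POOL - CONV - RELU - CONV - RELU - POOL - FC - RELU - SOFTMAX
```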
CNNs vs FC Neural Networks
Two major advantages of CNNs over FC neural networks:
▪ Parameter sharing
► A feature detector (such as a vertical edge detector) that is useful in one part of the image is probably useful in another part of the image (translational invariance)
► E.g., a 32 x 32 x 3 (= 3072) input convolved with 6 filters of size 5 x 5 x 3 results in a 28 x 28 x 6 (= 4704) volume
► For a regular neural network with dense connections, this means 3072 x 4704 ≈ 14 Million weights
► How many parameters do we need for the Conv layer? Only (75 + 1) x 6 = 456
▪ Sparsity of connections (i.e., local receptive field)
► In each layer, each output value depends only on a small number of inputs
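Both advantages can be quantified for the example above. A minimal sketch with hypothetical helper names; the dense count ignores biases, matching the slide's weight count.

```python
def dense_weight_count(in_shape, out_shape):
    """Weights needed to connect every input value to every output value."""
    n_in = in_shape[0] * in_shape[1] * in_shape[2]
    n_out = out_shape[0] * out_shape[1] * out_shape[2]
    return n_in * n_out


def conv_param_count(f, in_channels, num_filters):
    """Shared F x F x C weights plus one bias per filter, independent of image size."""
    return (f * f * in_channels + 1) * num_filters


dense = dense_weight_count((32, 32, 3), (28, 28, 6))
conv = conv_param_count(5, 3, 6)
print(f"dense: {dense:,} weights, conv: {conv} parameters")
# Sparsity of connections: number of inputs each output value depends on.
print(f"dense: {32 * 32 * 3} inputs, conv: {5 * 5 * 3} inputs")
```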
Acknowledgements
Various contents in this presentation have been taken from different books, lecture notes (particularly CS231n Stanford, MIT 6.S191, deeplearning.ai & neuralnetworksanddeeplearning.com), and the web. These solely belong to their owners and are used here only for clarifying various educational concepts. Any copyright infringement is not intended.
