CS490 Advanced Topics in Computing (Deep Learning): Lecture 16: Convolutional Neural Networks (CNNs)


CS490 – Advanced Topics in Computing

(Deep Learning)

Lecture 16: Convolutional Neural Networks (CNNs)

Dr. Muhammad Shahzad

[email protected]

Department Of Computing (DOC),

School of Electrical Engineering & Computer Science (SEECS),
National University of Sciences & Technology (NUST)

Fully Connected Layer

CS490 – Advanced Topics in Computing (Deep Learning) Lecture 16: Convolutional Neural Networks (CNNs) 2
Motivation: Deep Learning on Images
64 x 64 x 3 image => 12288-dimensional input vector
1000 x 1000 x 3 image => 3 Million-dimensional input vector

How many entries does the weight matrix w1 have, assuming that the
first hidden layer has 1000 units?
3 Billion! Shape of w1 is 1000 x 3M,
i.e., adding 1000 biases, we need to train more than 3 Billion parameters
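The count above can be checked with a few lines of Python. This is only a sketch: the helper name `fc_param_count` is ours, not something from the lecture, and it assumes the image is flattened into a single input vector.

```python
def fc_param_count(height, width, channels, hidden_units):
    """Weights and biases of one fully connected layer on a flattened image."""
    input_dim = height * width * channels   # length of the flattened input
    weights = input_dim * hidden_units      # one weight per (input, unit) pair
    biases = hidden_units                   # one bias per hidden unit
    return input_dim, weights, biases

small = fc_param_count(64, 64, 3, 1000)
large = fc_param_count(1000, 1000, 3, 1000)
print(small)  # (12288, 12288000, 1000)
print(large)  # (3000000, 3000000000, 1000)
```

Even the small 64 x 64 image already needs about 12 million weights in the first layer, which is the motivation for the architectures that follow.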
Convolutional Neural Networks

▪ Similar to regular Neural Networks except that they make the
explicit assumption that the inputs are images, which allows us to
encode certain properties into the architecture
▪ These then make the forward function more efficient to implement
and vastly reduce the number of parameters in the network, e.g.,
using local receptive fields and a parameter sharing scheme

A ConvNet is made up of Layers

Every Layer has a simple API: It transforms an input 3D volume to an output 3D
volume with some differentiable function that may or may not have parameters

Layers used to build ConvNets

▪ A ConvNet architecture is in the simplest case a list of Layers that
transform the image volume into an output volume (e.g. holding
the class scores)
▪ Three main types of layers that are stacked to build ConvNet
► Convolutional Layer
► Pooling Layer
► Fully-Connected Layer (exactly as seen in regular Neural Networks)
▪ Each Layer accepts an input 3D volume and transforms it to an
output 3D volume through a differentiable function
▪ Each Layer may or may not have parameters (e.g. CONV/FC do,
RELU/POOL don’t)
▪ Each Layer may or may not have additional hyperparameters (e.g.
CONV/FC/POOL do, RELU doesn’t)
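The "list of Layers" idea can be sketched in Python as follows. The class names `ReLULayer` and `FCLayer`, the method names, and the tiny 4 x 4 x 3 input are illustrative assumptions, not an API from the lecture; the point is only that every layer maps a 3D volume to a 3D volume, with or without parameters.

```python
import numpy as np

class ReLULayer:
    """Elementwise max(0, x): no parameters and no hyperparameters."""
    def forward(self, volume):
        return np.maximum(volume, 0.0)

    def num_params(self):
        return 0

class FCLayer:
    """Fully connected layer: has parameters (weights and biases)."""
    def __init__(self, in_dim, out_dim, rng):
        self.weights = 0.01 * rng.standard_normal((out_dim, in_dim))
        self.biases = np.zeros(out_dim)

    def forward(self, volume):
        scores = self.weights @ volume.reshape(-1) + self.biases
        return scores.reshape(1, 1, -1)  # keep the 3D-volume convention

    def num_params(self):
        return self.weights.size + self.biases.size

def forward_pass(layers, volume):
    """Feed the volume through the layers in order."""
    for layer in layers:
        volume = layer.forward(volume)
    return volume

rng = np.random.default_rng(0)
layers = [ReLULayer(), FCLayer(in_dim=4 * 4 * 3, out_dim=10, rng=rng)]
scores = forward_pass(layers, rng.standard_normal((4, 4, 3)))
param_counts = [layer.num_params() for layer in layers]
print(scores.shape)   # (1, 1, 10)
print(param_counts)   # [0, 490]
```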
How does Convolution work?

Edge Detection Via Convolution Operation

3x3 filter:
1 0 -1
1 0 -1
1 0 -1

Output, row 1, column 1:
3x1 + 1x1 + 2x1 + 0x0 + 5x0 + 7x0 + 1x(-1) + 8x(-1) + 2x(-1) = -5
Output, row 1, column 2:
0x1 + 5x1 + 7x1 + 1x0 + 8x0 + 2x0 + 2x(-1) + 9x(-1) + 5x(-1) = -4
Output, row 1, column 3:
1x1 + 8x1 + 2x1 + 2x0 + 9x0 + 5x0 + 7x(-1) + 3x(-1) + 1x(-1) = 0
Output, row 4, column 4:
1x1 + 6x1 + 2x1 + 7x0 + 2x0 + 3x0 + 8x(-1) + 8x(-1) + 9x(-1) = -16

Resulting 4x4 output:
-5 -4 0 8
-10 -2 2 3
0 -2 -4 -7
-3 -2 -3 -16
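The sliding-window sums above can be reproduced with a short NumPy sketch. The 6x6 input image is not present in the extracted text; the matrix below is an assumption, chosen so that it agrees with every product written out on the slides and with the 4x4 result. Note that, as in the slides, the filter is not flipped, so this is strictly a cross-correlation.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1, no padding, no kernel flip."""
    fh, fw = kernel.shape
    out_h = image.shape[0] - fh + 1
    out_w = image.shape[1] - fw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # elementwise product of the window and the filter, then sum
            out[i, j] = np.sum(image[i:i + fh, j:j + fw] * kernel)
    return out

# Assumed 6x6 input, consistent with the worked sums on the slides.
image = np.array([
    [3, 0, 1, 2, 7, 4],
    [1, 5, 8, 9, 3, 1],
    [2, 7, 2, 5, 1, 3],
    [0, 1, 3, 1, 7, 8],
    [4, 2, 1, 6, 2, 8],
    [2, 4, 5, 2, 3, 9],
], dtype=float)

kernel = np.array([[1, 0, -1]] * 3, dtype=float)

result = correlate2d_valid(image, kernel)
print(result.astype(int))
```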

How does Convolution work?

▪ Convolution of the image with a filter (also called kernel,
window, mask, or template) with different coefficient values
results in a new filtered output image, e.g.,
► Image convolved with a filter with positive and equal
coefficients results in smoothed output image

► Similarly we can also compute image derivatives to compute
edges in the input image

Any idea what could be the filter coefficients?
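For the smoothing case mentioned above, here is a minimal sketch assuming a 3x3 averaging (box) filter whose nine coefficients are all 1/9. The 5x5 test image with a single bright pixel is made up for illustration.

```python
import numpy as np

def filter2d_valid(image, kernel):
    """Apply `kernel` at every position where it fully fits (stride 1)."""
    fh, fw = kernel.shape
    out_h = image.shape[0] - fh + 1
    out_w = image.shape[1] - fw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + fh, j:j + fw] * kernel)
    return out

image = np.zeros((5, 5))
image[2, 2] = 9.0                 # one isolated bright pixel

box = np.full((3, 3), 1.0 / 9.0)  # positive and equal coefficients
smoothed = filter2d_valid(image, box)
print(smoothed)  # every entry is 1 (up to rounding): the spike is spread out
```

Every 3x3 window of this image contains the bright pixel, so its intensity of 9 is averaged into each output value.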

Edge Detection Via Convolution Operation

The natural derivative operator can be defined as the
difference between the intensity of neighbouring pixels:

df/dx = f(x + 1) − f(x)

z1 z2 z3
z4 z5 z6
z7 z8 z9
z5 = -1 z6 = -1
z8 = 1 z9 = 1
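The difference f(x + 1) − f(x) can be sketched on a 1D row of pixel intensities. The signal values are made up, and writing the difference as a correlation with the two-tap filter [-1, 1] is our illustration of why a derivative is just another filter.

```python
import numpy as np

f = np.array([10, 10, 10, 0, 0, 0], dtype=float)  # made-up row of intensities

# Direct definition: f(x + 1) - f(x) for every neighbouring pair.
derivative = f[1:] - f[:-1]

# Same result as sliding the filter [-1, 1] over the signal.
as_filter = np.correlate(f, np.array([-1.0, 1.0]), mode="valid")

print(derivative)  # nonzero only at the jump from 10 to 0
print(as_filter)
```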

Edge Detection Via Convolution Operation

Vertical edges

Horizontal edges

10x1 + 10x1 + 10x1 + 0x0 + 0x0 + 0x0 + 0x(-1) + 0x(-1) + 0x(-1) = 30
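The sum above can be placed in context with a small sketch. The image is an assumption (it is not in the extracted text): a 6x6 image whose left half is 10 and right half is 0, so it contains one vertical edge. The horizontal filter is taken as the transpose of the vertical one.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding."""
    fh, fw = kernel.shape
    out_h = image.shape[0] - fh + 1
    out_w = image.shape[1] - fw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + fh, j:j + fw] * kernel)
    return out

image = np.zeros((6, 6))
image[:, :3] = 10.0                      # bright left half, dark right half

vertical = np.array([[1, 0, -1]] * 3, dtype=float)
horizontal = vertical.T                  # rows of 1, 0, -1

vertical_response = correlate2d_valid(image, vertical)
horizontal_response = correlate2d_valid(image, horizontal)
print(vertical_response)    # 30 in the two middle columns, 0 elsewhere
print(horizontal_response)  # all zeros: the image has no horizontal edge
```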

Learning To Detect Edges

Prewitt     Sobel       Scharr
1  0 -1     1  0 -1      3  0  -3
1  0 -1     2  0 -2     10  0 -10
1  0 -1     1  0 -1      3  0  -3

With the rise of deep learning, it is possible to automatically learn
these filter coefficients more robustly via backpropagation for a
specific task, e.g., edge detection
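A toy illustration of learning filter coefficients is sketched below. Everything in it is an assumption made for the example: a random 16x16 training image, the Sobel response as the training target, a mean squared error loss, and plain gradient descent with a hand-derived gradient (for a single linear filter, this is what backpropagation reduces to).

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding."""
    fh, fw = kernel.shape
    out_h = image.shape[0] - fh + 1
    out_w = image.shape[1] - fw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + fh, j:j + fw] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((16, 16))      # synthetic training image
sobel = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=float)
target = correlate2d_valid(image, sobel)   # desired edge response

learned = np.zeros((3, 3))                 # filter coefficients to learn
learning_rate = 0.1
for step in range(300):
    error = correlate2d_valid(image, learned) - target
    # Gradient of the mean squared error with respect to each coefficient:
    # correlate the input image with the error map.
    grad = 2.0 * correlate2d_valid(image, error) / error.size
    learned -= learning_rate * grad

print(np.round(learned, 3))  # should be close to the Sobel coefficients
```

In a real ConvNet the target is not a hand-designed filter response but a task loss (e.g. classification), and the filters that emerge are whatever helps that task.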

Spatial Dimensions: A Closer Look

7x7 input, 3x3 filter applied with stride 1 => 5x5 output
7x7 input, 3x3 filter applied with stride 2 => 3x3 output
7x7 input, 3x3 filter applied with stride 3 => Doesn’t fit! Cannot apply
3x3 filter on 7x7 input (spatially) with stride 3

Output size?

(N - F) / stride + 1

E.g., with N = 7, F = 3:
stride 1 => (7 - 3)/1 + 1 = 5
stride 2 => (7 - 3)/2 + 1 = 3
stride 3 => (7 - 3)/3 + 1 = 2.33 (not an integer, so the filter doesn’t fit)
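The formula can be wrapped in a small helper. This is a sketch: the function name `conv_output_size` and the choice to return `None` when the stride does not divide N - F evenly are ours.

```python
def conv_output_size(n, f, stride):
    """Return (N - F)/stride + 1, or None when the filter does not fit evenly."""
    if (n - f) % stride != 0:
        return None  # e.g. N = 7, F = 3, stride 3
    return (n - f) // stride + 1

sizes = {stride: conv_output_size(7, 3, stride) for stride in (1, 2, 3)}
print(sizes)  # {1: 5, 2: 3, 3: None}
```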

Common Practice: Zero Padding At Borders

(N+2P-F)/stride + 1

Valid vs Same Convolutions

▪ Valid convolution: The spatial dimensions of the resulting image
shrink after convolution (no padding, P = 0)
▪ Same convolution: The spatial dimensions of the resulting image
stay the same after the convolution
► Achieved via zero-padding

(N+2P-F)/S + 1 = N

For S=1,
N+2P-F + 1 = N
=> P = (F-1)/2
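The result P = (F-1)/2 can be checked numerically. The helper names below are ours, and the sketch assumes stride 1 and an odd filter size F, since (F-1)/2 is not an integer for even F.

```python
import numpy as np

def same_padding(f):
    """Padding P = (F - 1)/2 that keeps the size unchanged at stride 1."""
    if f % 2 == 0:
        raise ValueError("symmetric 'same' padding needs an odd filter size")
    return (f - 1) // 2

def padded_output_size(n, f, stride, pad):
    """Output size (N + 2P - F)/stride + 1."""
    return (n + 2 * pad - f) // stride + 1

n = 7
results = []
for f in (1, 3, 5):
    p = same_padding(f)
    padded = np.pad(np.ones((n, n)), p, mode="constant")  # zeros on all borders
    results.append((f, p, padded.shape, padded_output_size(n, f, 1, p)))
print(results)
# [(1, 0, (7, 7), 7), (3, 1, (9, 9), 7), (5, 2, (11, 11), 7)]
```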

Convolution Layer

Convolution Over Volumes

6x6x3 input * 3x3x3 filter = 4x4 output

Note we have now 27 learnable coefficients
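A minimal sketch of convolution over a volume, assuming stride 1, no padding, and random values for both the 6x6x3 input and the 3x3x3 filter. The function name is ours. The filter spans all three channels, so each output value is a sum over 27 products.

```python
import numpy as np

def conv_volume_single_filter(volume, kernel):
    """volume: (H, W, C), kernel: (F, F, C) -> 2D map of shape (H-F+1, W-F+1)."""
    f = kernel.shape[0]
    out_h = volume.shape[0] - f + 1
    out_w = volume.shape[1] - f + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # multiply the F x F x C window by the filter and sum everything
            out[i, j] = np.sum(volume[i:i + f, j:j + f, :] * kernel)
    return out

rng = np.random.default_rng(0)
volume = rng.standard_normal((6, 6, 3))
kernel = rng.standard_normal((3, 3, 3))

feature_map = conv_volume_single_filter(volume, kernel)
print(feature_map.shape)  # (4, 4)
print(kernel.size)        # 27
```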

Convolutional Layer: Neuron View

Receptive Field

Single Convolutional Layer

with 6 filters of size 5x5x3

(75 x 6 weight entries)

a0 (input volume) -> a1 (output volume)
z1 = w1 a0 + b1
a1 = g(z1)
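The two equations can be sketched as a forward pass for one convolutional layer. Several things here are assumptions for illustration: a 32x32x3 input (borrowed from the later parameter-counting example), stride 1 with no padding, random weights, and ReLU as the activation g.

```python
import numpy as np

def conv_layer_forward(a_prev, weights, biases):
    """a_prev: (H, W, C); weights: (K, F, F, C); biases: (K,).

    Stride 1, no padding. Returns the activation volume (H-F+1, W-F+1, K).
    """
    k, f = weights.shape[0], weights.shape[1]
    out_h = a_prev.shape[0] - f + 1
    out_w = a_prev.shape[1] - f + 1
    z = np.zeros((out_h, out_w, k))
    for i in range(out_h):
        for j in range(out_w):
            patch = a_prev[i:i + f, j:j + f, :]
            # one dot product per filter: z1 = w1 a0 + b1
            z[i, j, :] = np.tensordot(weights, patch, axes=3) + biases
    return np.maximum(z, 0.0)  # a1 = g(z1), with g assumed to be ReLU

rng = np.random.default_rng(0)
a0 = rng.standard_normal((32, 32, 3))
w1 = 0.1 * rng.standard_normal((6, 5, 5, 3))  # 6 filters of size 5x5x3
b1 = np.zeros(6)

a1 = conv_layer_forward(a0, w1, b1)
print(a1.shape)  # (28, 28, 6)
print(w1.size)   # 450 = 75 x 6
```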


Flatten the last volume, e.g., 24 x 24 x 10 volume into 5760-d vector of
neurons and feed them to Fully Connected (FC) layer followed by a softmax
unit for prediction
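The flatten, FC, softmax step might look like the sketch below. The number of classes (10), the random volume, and the small random FC weights are assumptions made only to have something runnable.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over a 1D score vector."""
    shifted = scores - np.max(scores)  # avoids overflow in exp
    exp = np.exp(shifted)
    return exp / np.sum(exp)

rng = np.random.default_rng(0)
volume = rng.standard_normal((24, 24, 10))  # last conv/pool output
flat = volume.reshape(-1)                   # 24 * 24 * 10 = 5760 values

num_classes = 10                            # assumed, not stated in the slide
fc_weights = 0.01 * rng.standard_normal((num_classes, flat.size))
fc_biases = np.zeros(num_classes)

probs = softmax(fc_weights @ flat + fc_biases)
print(flat.shape)      # (5760,)
print(probs.shape)     # (10,)
print(probs.argmax())  # index of the predicted class
```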

Input volume: 32x32x3

10 5x5x3 filters with stride 1, pad 2

Output volume size?

(32+2*2-5)/1+1 = 32 spatially, so the output volume is 32x32x10

Number of parameters in this layer?

each filter has 5*5*3 + 1 = 76 params (+1 for bias)
=> 76*10 = 760
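Both answers can be reproduced with a short helper. This is a sketch and the name `conv_layer_summary` is ours; it assumes square inputs and filters and one bias per filter.

```python
def conv_layer_summary(n, channels, num_filters, f, stride, pad):
    """Return (output volume shape, number of learnable parameters)."""
    out = (n + 2 * pad - f) // stride + 1      # spatial output size
    params_per_filter = f * f * channels + 1   # weights plus one bias
    return (out, out, num_filters), params_per_filter * num_filters

shape, params = conv_layer_summary(n=32, channels=3, num_filters=10,
                                   f=5, stride=1, pad=2)
print(shape, params)  # (32, 32, 10) 760
```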

ConvNet Dimensions

Common settings:
K (number of filters) = powers of 2, e.g. 32, 64, 128, 512
- F = 3, S = 1, P = 1
- F = 5, S = 1, P = 2
- F = 5, S = 2, P = ? (whatever fits)
- F = 1, S = 1, P = 0

1x1 convolution


1x1 convolution layer

Pooling Layer

▪ Makes the representations smaller and more manageable

▪ Operates over each activation map independently

MAX Pooling


3.75 1.25

4 2


What would be the results of applying Max-Pool using

9 9 5
9 9 5
8 6 9
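Max pooling can be sketched as below. The slide's input and hyperparameters did not survive extraction, so the 5x5 activation map and the setting F = 3, S = 1 are assumptions, chosen because they reproduce the 3x3 result shown above.

```python
import numpy as np

def max_pool2d(activation_map, f, stride):
    """Take the maximum of every F x F window of one activation map."""
    out_h = (activation_map.shape[0] - f) // stride + 1
    out_w = (activation_map.shape[1] - f) // stride + 1
    out = np.zeros((out_h, out_w), dtype=activation_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = activation_map[r:r + f, c:c + f].max()
    return out

# Assumed 5x5 input; not taken from the extracted text.
activation_map = np.array([
    [1, 3, 2, 1, 3],
    [2, 9, 1, 1, 5],
    [1, 3, 2, 3, 2],
    [8, 3, 5, 1, 0],
    [5, 6, 1, 2, 9],
])

pooled = max_pool2d(activation_map, f=3, stride=1)
print(pooled)
```

Pooling has hyperparameters (F and S) but nothing to learn, and in a ConvNet it is applied to each activation map of the volume independently.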

Pooling Dimensions

Common settings:
F = 2, S = 2
F = 3, S = 2

Example: ConvNets

Summary of Typical ConvNet Design

▪ ConvNets stack CONV,POOL,FC layers

▪ Trend towards smaller filters and deeper architectures
▪ Trend towards getting rid of POOL/FC layers (just CONV)
▪ Historically architectures looked like
[(CONV-RELU)*N-POOL?]*M-(FC-RELU)*K,SOFTMAX
where N is usually up to ~5, M is large, 0 <= K <= 2

▪ However, recent advances such as ResNet/GoogLeNet have
challenged this paradigm

CNNs vs FC Neural Networks

Two major advantages of CNNs over FC neural networks

▪ Parameter sharing
► A feature detector (such as vertical edge detector) that is
useful in one part of the image is probably useful in another
part of the image (translational invariance)
► Example: a 32 x 32 x 3 (= 3072) input convolved with 6 filters of
size 5 x 5 x 3 results in a 28 x 28 x 6 volume (= 4704)
► For a regular neural network with dense connections, this means you
have 3072 x 4704 ≈ 14 Million weights
► How many parameters do we need for the Conv layer?
(75 + 1) x 6 = 456 only
▪ Sparsity of connections (i.e., local receptive field)
► In each layer, each output value depends only on a small
number of inputs
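The comparison in the example can be verified with plain arithmetic. The variable names are ours; the dense count covers weights only, as in the slide.

```python
input_size = 32 * 32 * 3    # 3072 input values
output_size = 28 * 28 * 6   # 4704 output values

# Dense layer: every output is connected to every input.
dense_weights = input_size * output_size

# Conv layer: 6 filters of size 5x5x3, each with one bias, shared everywhere.
conv_params = (5 * 5 * 3 + 1) * 6

print(dense_weights)                 # 14450688
print(conv_params)                   # 456
print(dense_weights // conv_params)  # 31690, i.e. roughly 30,000x fewer
```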


Various contents in this presentation have been taken from
different books, lecture notes (particularly CS231n Stanford, MIT
6.S191, deeplearning.ai & neuralnetworksanddeeplearning.com),
and the web. These solely belong to their owners and are here used
only for clarifying various educational concepts. Any copyright
infringement is not intended.
