CS60010: Deep Learning CNN - Part 1: Sudeshna Sarkar

CS60010: Deep Learning
CNN – Part 1
Sudeshna Sarkar
Spring 2019
1 Feb 2019
LeNet-5 (LeCun, 1998)
The original Convolutional Neural Network model goes back to 1989 (LeCun)
Lecture 7 Convolutional Neural Networks

CMSC 35246
Fully Connected Layer
Example: 200x200 image
40K hidden units
~2B parameters!!!
- Spatial correlation is local
- Waste of resources, and we do not have enough training samples anyway
Locally Connected Layer

40K hidden units
Filter size: 10x10
4M parameters
Note: This parameterization is good when the input image is registered (e.g., face recognition).
Locally Connected Layer
STATIONARITY? Statistics are similar at different locations

40K hidden units
Filter size: 10x10
4M parameters
Convolutional Layer
Share the same parameters across different locations (assuming input is stationary):
Convolutions with learned kernels
Convolution
Kernel
w7 w8 w9
w4 w5 w6
w1 w2 w3
Feature Map
Grayscale Image
Convolve image with kernel having weights w (learned by backpropagation)
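A minimal sketch of this sliding dot product, in plain Python and not taken from the lecture. The names `conv2d_valid`, `image` and `kernel` are illustrative assumptions, and the 6x6 image is made up. As in most deep learning libraries, the kernel is not flipped, so strictly speaking this is a cross-correlation.

```python
def conv2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image` (lists of rows) and return the feature map.

    Each output value is the dot product w^T x between the kernel weights
    and the image patch currently under the kernel. No padding is used.
    """
    n_rows, n_cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    out = []
    for r in range(0, n_rows - k_rows + 1, stride):
        row = []
        for c in range(0, n_cols - k_cols + 1, stride):
            total = 0.0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += kernel[i][j] * image[r + i][c + j]
            row.append(total)
        out.append(row)
    return out


# Hypothetical 6x6 grayscale image and a 3x3 kernel that sums each patch.
image = [[float(r * 6 + c) for c in range(6)] for r in range(6)]
kernel = [[1.0] * 3 for _ in range(3)]
feature_map = conv2d_valid(image, kernel)
print(len(feature_map), len(feature_map[0]))  # 4 4
```

In a trained network the nine kernel weights would be learned by backpropagation rather than fixed by hand as here.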

Convolution
w^T x

What is the number of parameters?

Learn Multiple Filters

Convolutional Layer
Learn multiple filters.
E.g.: 200x200 image

100 Filters
Filter size: 10x10
10K parameters
Ranzato
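The parameter counts quoted for the three layer types can be checked with a few multiplications. This is only a sketch of the arithmetic: the variable names are illustrative, and biases are ignored because the slide figures appear to ignore them too.

```python
# Parameter counts for the three layer types on a 200x200 image.
# Biases are left out (an assumption that matches the quoted figures).
image_pixels = 200 * 200          # 40K inputs
hidden_units = 40_000             # 40K hidden units
filter_weights = 10 * 10          # one 10x10 filter
num_filters = 100

fully_connected = image_pixels * hidden_units      # every unit sees every pixel
locally_connected = hidden_units * filter_weights  # every unit has its own 10x10 patch
convolutional = num_filters * filter_weights       # weights shared across locations

print(fully_connected)    # 1600000000  (the "~2B" figure)
print(locally_connected)  # 4000000     (4M)
print(convolutional)      # 10000       (10K)
```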
Output Size
We used stride of 1, kernel with receptive field of size 3 by 3
Output size: (N - K) / S + 1
In previous example: N = 6, K = 3, S = 1, Output size = 4

For N = 8, K = 3, S = 1, output size is 6
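The two cases can be reproduced with a one-line helper. The function name `conv_output_size` is an assumption made for illustration; it takes N as the input width, K as the kernel width and S as the stride.

```python
def conv_output_size(n, k, s=1):
    """Return (N - K) / S + 1 for an N x N input and a K x K kernel."""
    return (n - k) // s + 1


print(conv_output_size(6, 3, 1))  # 4
print(conv_output_size(8, 3, 1))  # 6
```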

[Figure: "before" (input layer, hidden layer, output layer) vs. "now"]
Fei-Fei Li & Andrej Karpathy Lecture 7 - 23 21 Jan 2015

Convolution
32x32x3 image (height 32, width 32, depth 3)
Convolution Layer
32x32x3 image
5x5x3 filter
Convolve the filter with the image, i.e. “slide over the image spatially, computing dot products”

Convolution Layer
Filters always extend the full depth of the input volume (a 5x5x3 filter for a 32x32x3 image)

Convolution Layer
1 number: the result of taking a dot product between the filter and a small 5x5x3 chunk of the image (i.e. 5*5*3 = 75-dimensional dot product + bias)

Convolution Layer
32x32x3 image, 5x5x3 filter
convolve (slide) over all spatial locations
activation map: 28x28x1

Convolution Layer
consider a second, green filter
32x32x3 image, 5x5x3 filter
convolve (slide) over all spatial locations
activation maps: 28x28x1 each

For example, if we had 6 5x5 filters, we’ll get 6 separate activation maps:
Convolution Layer: 32x32x3 image -> activation maps, 28x28x6
We stack these up to get a “new image” of size 28x28x6!
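A rough plain-Python sketch of such a layer follows, under stated assumptions: the image and the filter weights are random, the biases are zero, and `conv_layer` is a hypothetical name. It only shows how six 5x5x3 filters turn a 32x32x3 volume into a 28x28x6 one.

```python
import random


def conv_layer(volume, filters, biases):
    """Valid, stride-1 convolution of an H x W x D volume with F x F x D filters.

    `volume[y][x][d]` is the input and `filters[n][i][j][d]` is filter n.
    Returns the stacked activation maps, indexed as out[y][x][n].
    """
    height, width, depth = len(volume), len(volume[0]), len(volume[0][0])
    f = len(filters[0])
    out = []
    for y in range(height - f + 1):
        out_row = []
        for x in range(width - f + 1):
            values = []
            for filt, bias in zip(filters, biases):
                # F*F*D-dimensional dot product plus bias gives one number.
                total = bias
                for i in range(f):
                    for j in range(f):
                        for d in range(depth):
                            total += filt[i][j][d] * volume[y + i][x + j][d]
                values.append(total)
            out_row.append(values)
        out.append(out_row)
    return out


random.seed(0)  # made-up data, fixed only for repeatability
image = [[[random.random() for _ in range(3)] for _ in range(32)] for _ in range(32)]
filters = [[[[random.gauss(0, 0.1) for _ in range(3)] for _ in range(5)]
            for _ in range(5)] for _ in range(6)]
biases = [0.0] * 6

maps = conv_layer(image, filters, biases)
print(len(maps), len(maps[0]), len(maps[0][0]))  # 28 28 6
```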

Preview: ConvNet is a sequence of Convolutional Layers, interspersed with activation functions
32x32x3 -> [CONV, ReLU; e.g. 6 5x5x3 filters] -> 28x28x6 -> [CONV, ReLU; e.g. 10 5x5x6 filters] -> 24x24x10 -> [CONV, ReLU] -> ….
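The volume sizes in this preview can be traced layer by layer. The sketch below assumes no padding and stride 1; `conv_shape` is an illustrative helper, not part of any library.

```python
def conv_shape(shape, num_filters, f, stride=1):
    """Shape (H, W, D) after an unpadded conv layer with `num_filters` F x F x D filters."""
    h, w, _ = shape
    return ((h - f) // stride + 1, (w - f) // stride + 1, num_filters)


shapes = [(32, 32, 3)]
for num_filters, f in [(6, 5), (10, 5)]:   # the two layers named in the preview
    shapes.append(conv_shape(shapes[-1], num_filters, f))
print(shapes)  # [(32, 32, 3), (28, 28, 6), (24, 24, 10)]
```

The output depth of each layer equals its number of filters, and each filter's depth equals the depth of the volume it reads.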

Preview [From Yann LeCun slides]

Preview [From recent Yann LeCun slides]

convolving the first filter in the input gives
the first slice of depth in output volume
A closer look at spatial dimensions:
32x32x3 image, 5x5x3 filter
convolve (slide) over all spatial locations
activation map: 28x28x1

7x7 input (spatially)

assume 3x3 filter
=> 5x5 output

7x7 input (spatially)
assume 3x3 filter applied with stride 2
=> 3x3 output!

7x7 input (spatially)
assume 3x3 filter applied with stride 3?
doesn’t fit! cannot apply 3x3 filter on 7x7 input with stride 3.

Output size: (N - F) / stride + 1
(N = input size, F = filter size)
e.g. N = 7, F = 3:
stride 1 => (7 - 3)/1 + 1 = 5
stride 2 => (7 - 3)/2 + 1 = 3
stride 3 => (7 - 3)/3 + 1 = 2.33 :\
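One way to see why stride 3 fails is to list where the filter can start. The helper `window_starts` below is a hypothetical name used for illustration; with stride 3 the filter only covers columns 0 to 5, leaving one column that no window reaches.

```python
def window_starts(n, f, stride):
    """Start offsets of an F-wide window sliding over N positions."""
    return list(range(0, n - f + 1, stride))


for stride in (1, 2, 3):
    starts = window_starts(7, 3, stride)
    last_covered = starts[-1] + 3          # one past the last input column used
    fits = (7 - 3) % stride == 0           # the formula gives a whole number
    print(stride, starts, "unused columns:", 7 - last_covered, "fits:", fits)
# 1 [0, 1, 2, 3, 4] unused columns: 0 fits: True
# 2 [0, 2, 4] unused columns: 0 fits: True
# 3 [0, 3] unused columns: 1 fits: False
```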

In practice: Common to zero pad the border
e.g. input 7x7
3x3 filter, applied with stride 1
pad with 1 pixel border => what is the output?
(recall:) (N - F) / stride + 1
7x7 output! (the padded input is 9x9, so (9 - 3)/1 + 1 = 7)
in general, common to see CONV layers with stride 1, filters of size FxF, and zero-padding with (F-1)/2. (will preserve size spatially)
e.g. F = 3 => zero pad with 1
F = 5 => zero pad with 2
F = 7 => zero pad with 3
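A small sketch of zero padding and its effect on the output size. The helper names `zero_pad` and `padded_output_size` are assumptions made for this example; the formula is the one used in these slides with the padding added on both sides of the input, (N + 2P - F) / stride + 1.

```python
def zero_pad(grid, pad):
    """Surround a 2-D grid (list of rows) with `pad` rows and columns of zeros."""
    width = len(grid[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]
    for row in grid:
        padded.append([0] * pad + list(row) + [0] * pad)
    padded.extend([0] * width for _ in range(pad))
    return padded


def padded_output_size(n, f, stride=1, pad=0):
    """(N + 2P - F) / stride + 1."""
    return (n + 2 * pad - f) // stride + 1


grid = [[1] * 7 for _ in range(7)]
padded = zero_pad(grid, 1)
print(len(padded), len(padded[0]))                # 9 9
print(padded_output_size(7, 3, stride=1, pad=1))  # 7

# Padding with (F - 1) / 2 keeps a stride-1 layer the same size.
for f in (3, 5, 7):
    assert padded_output_size(32, f, pad=(f - 1) // 2) == 32
```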

Remember back to…
E.g. 32x32 input convolved repeatedly with 5x5 filters shrinks volumes spatially!
(32 -> 28 -> 24 ...). Shrinking too fast is not good, doesn’t work well.
[Figure: the CONV, ReLU stack from the preview, 32x32x3 -> 28x28x6 -> 24x24x10 -> ….]

Examples time:
Input volume: 32x32x3
10 5x5 filters with stride 1, pad 2
Output volume size: ?
(32+2*2-5)/1+1 = 32 spatially, so 32x32x10
Number of parameters in this layer?
each filter has 5*5*3 + 1 = 76 params (+1 for bias)
=> 76*10 = 760
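The parameter count generalizes to any filter size, input depth and number of filters. The function name `conv_layer_params` is illustrative; it assumes square filters that span the full input depth, with one bias per filter.

```python
def conv_layer_params(filter_size, in_depth, num_filters):
    """Weights plus one bias per filter."""
    per_filter = filter_size * filter_size * in_depth + 1
    return per_filter * num_filters


print(conv_layer_params(5, 3, 10))  # 760
```

The count does not depend on the 32x32 spatial size of the input, only on the filter shape and the number of filters.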

Learn Multiple Filters
If we use 100 filters, we get 100 feature maps
Figure: I. Kokkinos

In General
We have only considered a 2-D image as a running example

But we could operate on volumes (e.g. RGB Images would be
depth 3 input, filter would have same depth)
Image from Wikipedia

In General: Output Size
For convolutional layer:
• Suppose input is of size W1 × H1 × D1
• Filter size is K and stride S
• We obtain another volume of dimensions W2 × H2 × D2
• As before: W2 = (W1 − K)/S + 1 and H2 = (H1 − K)/S + 1
• Depths will be equal: the filter depth matches the input depth D1 (the output depth D2 is set by the number of filters)

Convnets
Layers used to build ConvNets:
• a stacked sequence of layers. 3 main types
• Convolutional Layer, Pooling Layer, and Fully-Connected Layer
• every layer of a ConvNet transforms one volume of activations to another through a differentiable function.
The replicated feature approach
• Use many different copies of the same feature detector with different positions.
• Could also replicate across scale and orientation (tricky and expensive)
• Replication greatly reduces the number of free parameters to be learned.
• Use several different feature types, each with its own map of replicated detectors.
• Allows each patch of image to be represented in several ways.
[Figure: The red connections all have the same weight.]
Backpropagation with weight constraints
• It’s easy to modify the backpropagation algorithm to incorporate linear constraints between the weights.
• We compute the gradients as usual, and then modify the gradients so that they satisfy the constraints.
• So if the weights started off satisfying the constraints, they will continue to satisfy them.
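As a toy illustration of the simplest such constraint, w1 = w2, the sketch below applies the same modified gradient to both weights. The slide does not say how the gradients are modified; summing the two individual gradients is an assumption here, chosen because it is the gradient of the error with respect to a single shared weight. The model, data and learning rate are made up.

```python
def tied_update(w1, w2, x1, x2, target, lr=0.1):
    """One gradient step on E = 0.5 * (w1*x1 + w2*x2 - target)**2 with w1 == w2."""
    error = w1 * x1 + w2 * x2 - target
    grad_w1 = error * x1          # dE/dw1, computed as usual
    grad_w2 = error * x2          # dE/dw2, computed as usual
    shared = grad_w1 + grad_w2    # modified gradient, used for both weights
    return w1 - lr * shared, w2 - lr * shared


w1 = w2 = 0.5                     # the weights start off satisfying the constraint
for _ in range(3):
    w1, w2 = tied_update(w1, w2, x1=1.0, x2=2.0, target=3.0)
    assert w1 == w2               # and still satisfy it after every update
```

In a convolutional layer the same idea applies to every position that shares a kernel weight.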
What does replicating the feature detectors achieve?
• Equivariant activities: Replicated features do not make the neural activities invariant to translation. The activities are equivariant.
[Figure: an image and its representation by active neurons; the translated image gives a translated representation]
• Invariant knowledge: If a feature is useful in some locations during training, detectors for that feature will be available in all locations during testing.
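Equivariance is easy to check in one dimension. In this sketch the signal and the kernel are made-up values and `conv1d` is an illustrative helper: shifting the input one step to the right shifts the output one step to the right, rather than leaving it unchanged.

```python
def conv1d(signal, kernel):
    """Valid 1-D sliding dot product (no kernel flip)."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]


kernel = [1, 0, -1]                      # hypothetical edge detector
signal = [0, 0, 1, 2, 1, 0, 0, 0]
shifted = [0] + signal[:-1]              # same pattern, one step to the right

out = conv1d(signal, kernel)
out_shifted = conv1d(shifted, kernel)
print(out)          # [-1, -2, 0, 2, 1, 0]
print(out_shifted)  # [0, -1, -2, 0, 2, 1]
```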
Pooling the outputs of replicated feature detectors
• Get a small amount of translational invariance at each level by averaging four neighboring replicated detectors to give a single output to the next level.
• This reduces the number of inputs to the next layer of feature extraction, thus allowing us to have many more different feature maps.
• Taking the maximum of the four works slightly better.
• Problem: After several levels of pooling, we have lost information about the precise positions of things.
• This makes it impossible to use the precise spatial relationships between high-level parts for recognition.
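A minimal sketch of pooling four neighboring detectors, assuming non-overlapping 2x2 blocks (the slide does not say whether the blocks overlap). The names `pool2x2` and `mean` and the feature map values are illustrative.

```python
def pool2x2(feature_map, reduce=max):
    """Combine non-overlapping 2x2 blocks with `reduce` (e.g. max or an average)."""
    pooled = []
    for r in range(0, len(feature_map) - 1, 2):
        row = []
        for c in range(0, len(feature_map[0]) - 1, 2):
            block = [feature_map[r][c], feature_map[r][c + 1],
                     feature_map[r + 1][c], feature_map[r + 1][c + 1]]
            row.append(reduce(block))
        pooled.append(row)
    return pooled


def mean(values):
    return sum(values) / len(values)


fmap = [[1, 3, 2, 0],
        [5, 7, 1, 1],
        [0, 0, 4, 8],
        [2, 2, 6, 2]]
print(pool2x2(fmap, max))   # [[7, 2], [2, 8]]
print(pool2x2(fmap, mean))  # [[4.0, 1.0], [1.0, 5.0]]
```

Either way the 4x4 map becomes 2x2, which is the reduction in inputs to the next layer, and the position of each value inside its block is discarded.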
