Deep Learning: Data Mining: Advanced Aspects


Deep Learning

Data mining: Advanced Aspects

Siham Tabik
University of Granada
[email protected]
Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 8 - 1 1 Feb 2016
Schedule

Deep Learning
• February 13th: Deep Neural Networks.
  • Convolutional Neural Networks (CNNs)
  • Classification & detection
  • Two case studies
• February 14th: Tensorflow.
  • Get familiar
  • Build our first CNN model
  • One case study
• February 20th: Recurrent Neural Networks.
  • Some theory
  • An example using TensorFlow
Logistics
• Forum for debates, …
https://github.com/SihamTabik/Patio_DL/issues
• Textbook “Deep Learning with Tensorflow”. Please download the PDF from PRADO.
Convolutional Neural Networks

Siham Tabik
University of Granada
[email protected]

Outline
1. What are ANNs and CNNs?
2. How do CNNs work?
 Convolution layers, pooling layers, FC layers, gradient, Backpropagation

3. Transfer learning
4. Data augmentation
5. One case study of CNN-classification model
6. CNNs for Detection
7. Two case studies
Artificial Neural Networks
Machine learning algorithms that learn from and predict on data.

[Diagram: the zoo of ANN architectures ... and others]
Source: http://www.asimovinstitute.org/neural-network-zoo/
By the way, what is image classification?
• Classification: CAT (single object)
• Classification + Localization: CAT (single object)
• Object Detection: CAT, DOG, DUCK (multiple objects)
• Instance Segmentation: CAT, DOG, DUCK (multiple objects)
Convolutional Neural Networks
PASCAL Visual Object Classes Challenge (2005-2012)
• Database: public dataset of 10,103 images & 20 object classes
• Annual competition: the PASCAL VOC Challenge
• Networks: the most accurate net wins the challenge
IMAGENET image classification challenge
• Database: public dataset of 14,197,122 images & 21,841 classes
• Annual competition: The Large Scale Visual Recognition Challenge (ILSVRC)
• Classification task: 1,431,167 images & 1,000 classes (1.2M train + 100,000 test)
• Detection task, segmentation task, …
• Networks: the most accurate net wins the challenge

Imagenet: A large-scale hierarchical image database. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei (2009)
IMAGENET image classification challenge

[Bar chart: ImageNet classification error by ILSVRC year]
2010: 28.0% · 2011: 26.0% · 2012: 16.4% · 2013: 11.7% · 2014: 7.3% / 6.7% · 2015: 3.5% · 2017: 2.3% · human: 5.1%

Russakovsky et al., arXiv, 2014
IMAGENET image classification challenge: winning networks
• 2012: AlexNet, 8 layers (Krizhevsky et al., NIPS)
• 2013: ZFNet, 8 layers (Zeiler and Fergus)
• 2014: GoogLeNet, 22 layers (Szegedy et al., arXiv) and VGG, 19 layers (Simonyan et al., arXiv)
• 2015: ResNet, 152 layers (He et al., CVPR)
• 2016: Inception & ResNet
Evolution of CNNs
• 1975: first implementation of convolution & pooling layers by K. Fukushima. Learning was costly; datasets were small; hardware had limited capacity.
• 1985: backpropagation. Datasets still small, hardware still limited.
• 1998: first CNN trained with backpropagation on MNIST: LeNet (0.8% error) by Y. LeCun.
• 2009: ImageNet, the first massive dataset.
• 2012: a CNN won the ILSVRC classification challenge, trained with backpropagation on powerful GPUs and CPUs.
CNN architectures
A lot of influential players built the foundation.
Outline
1. What are ANN?
2. What are CNNs?
3. How do CNNs work?
 Convolution layers, pooling layers, FC layers, Gradient, Backpropagation

4. Transfer learning
5. Data augmentation
6. One case study
7. CNNs for Detection
8. Two case studies
How do CNNs work?

A CNN takes a two-dimensional array of pixels as input and outputs a decision: X or O.

For example:
[image of an X] → CNN → X
[image of an O] → CNN → O

Trickier cases:
[distorted X] → CNN → X
[distorted O] → CNN → O
Deciding is hard

[Two 9×9 pixel grids side by side (-1 = white, +1 = black): a regular ‘X’ and a shifted, distorted ‘X’]
What computers see

[The two grids compared pixel by pixel: the positions marked X disagree between the two images, so a literal pixel-by-pixel comparison fails even though both images show an ‘X’]
Computers are literal

CNNs match pieces of the image
Features match pieces of the image

The three 3×3 features (shown once here; the slides repeat them as they are matched against different parts of the image):

Diagonal \       X-crossing       Diagonal /
 1 -1 -1          1 -1  1         -1 -1  1
-1  1 -1         -1  1 -1         -1  1 -1
-1 -1  1          1 -1  1          1 -1 -1
Filtering: The math behind the match
Feature (3×3 filter):
 1 -1 -1
-1  1 -1
-1 -1  1

Image (9×9 ‘X’):
-1 -1 -1 -1 -1 -1 -1 -1 -1
-1  1 -1 -1 -1 -1 -1  1 -1
-1 -1  1 -1 -1 -1  1 -1 -1
-1 -1 -1  1 -1  1 -1 -1 -1
-1 -1 -1 -1  1 -1 -1 -1 -1
-1 -1 -1  1 -1  1 -1 -1 -1
-1 -1  1 -1 -1 -1  1 -1 -1
-1  1 -1 -1 -1 -1 -1  1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1
Filtering: The math behind the match
To score the match at one position, line the feature up with a 3×3 patch of the image, multiply each feature pixel by the image pixel beneath it, add up the nine products and divide by 9 (the number of pixels). When the patch is identical to the feature, all nine products are 1 and the score is 9/9 = 1.00. When the patch differs, matching pixels contribute +1 and mismatching pixels -1; with two mismatches the score is (7 - 2)/9 ≈ 0.55, as in the slides' second example.
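The arithmetic above can be sketched in a few lines of NumPy (a minimal illustration; the feature values are the ones from the slides):

```python
import numpy as np

# The 3x3 diagonal feature from the slides
feature = np.array([[ 1, -1, -1],
                    [-1,  1, -1],
                    [-1, -1,  1]])

# A patch identical to the feature: all nine products are +1
patch = feature.copy()
score = (feature * patch).sum() / feature.size
print(score)  # 1.0 -> perfect match

# Flip two pixels: 7 products are +1, 2 are -1 -> (7 - 2) / 9
patch2 = feature.copy()
patch2[0, 1] = 1
patch2[1, 0] = 1
score2 = (feature * patch2).sum() / feature.size
print(round(score2, 2))  # 0.56 (i.e. 5/9, shown as 0.55 on the slides)
```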
Convolution: Trying every possible match
Sliding the feature across every position of the image (a convolution) records the match score at each position, turning the 9×9 image into a 7×7 map of scores:

 0.77 -0.11  0.11  0.33  0.55 -0.11  0.33
-0.11  1.00 -0.11  0.33 -0.11  0.11 -0.11
 0.11 -0.11  1.00 -0.33  0.11 -0.11  0.55
 0.33  0.33 -0.33  0.55 -0.33  0.33  0.33
 0.55 -0.11  0.11 -0.33  1.00 -0.11  0.11
-0.11  0.11 -0.11  0.33 -0.11  1.00 -0.11
 0.33 -0.11  0.55  0.33  0.11 -0.11  0.77
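The full sweep can be sketched as a plain double loop (a hedged NumPy sketch; `convolve_match` is a name chosen here, not a library call):

```python
import numpy as np

def convolve_match(image, feature):
    """Slide the feature over every position of the image and
    record the normalized match score at each position."""
    fh, fw = feature.shape
    out = np.zeros((image.shape[0] - fh + 1, image.shape[1] - fw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + fh, j:j + fw]
            out[i, j] = (patch * feature).sum() / feature.size
    return out

# 9x9 'X' image from the slides (-1 = white, +1 = black)
X = -np.ones((9, 9), dtype=int)
for k in range(1, 8):
    X[k, k] = 1       # main diagonal stroke
    X[k, 8 - k] = 1   # anti-diagonal stroke

feature = np.array([[ 1, -1, -1],
                    [-1,  1, -1],
                    [-1, -1,  1]])

fmap = convolve_match(X, feature)
print(fmap.shape)            # (7, 7): one score per 3x3 window position
print(round(fmap[0, 0], 2))  # 0.78 (7/9, shown as 0.77 on the slides)
```

The scores reproduce the map above: 1.00 wherever the diagonal feature sits exactly on the diagonal stroke, smaller values elsewhere.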
Convolution: Trying every possible match
Each of the three features is convolved with the image in the same way, giving one 7×7 map per feature. The \-diagonal feature produces the map shown above; the X-crossing feature produces:

 0.33 -0.55  0.11 -0.11  0.11 -0.55  0.33
-0.55  0.55 -0.55  0.33 -0.55  0.55 -0.55
 0.11 -0.55  0.55 -0.77  0.55 -0.55  0.11
-0.11  0.33 -0.77  1.00 -0.77  0.33 -0.11
 0.11 -0.55  0.55 -0.77  0.55 -0.55  0.11
-0.55  0.55 -0.55  0.33 -0.55  0.55 -0.55
 0.33 -0.55  0.11 -0.11  0.11 -0.55  0.33

and the /-diagonal feature produces the left-right mirror of the first map.
Convolution layer
One image becomes a stack of filtered images: convolving the input with each of the three features yields three 7×7 feature maps, one per feature. This stack of maps is the output of the convolution layer.
Pooling: Shrinking the image stack
Max pooling slides a 2×2 window over a feature map in steps (a stride) of 2 and keeps only the largest value in each window. The 7×7 map shrinks to 4×4:

1.00 0.33 0.55 0.33
0.33 1.00 0.33 0.55
0.55 0.33 1.00 0.11
0.33 0.55 0.11 0.77
Pooling layer
Pooling is applied to every map in the stack, shrinking each 7×7 feature map to a 4×4 map:

Feature \:             Feature X:             Feature /:
1.00 0.33 0.55 0.33    0.55 0.33 0.55 0.33    0.33 0.55 1.00 0.77
0.33 1.00 0.33 0.55    0.33 1.00 0.55 0.11    0.55 0.55 1.00 0.33
0.55 0.33 1.00 0.11    0.55 0.55 0.55 0.11    1.00 1.00 0.11 0.55
0.33 0.55 0.11 0.77    0.33 0.11 0.11 0.33    0.77 0.33 0.55 0.33
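Max pooling can be sketched directly on the 7×7 map from the slides (a minimal NumPy sketch; `max_pool` is a name chosen here):

```python
import numpy as np

def max_pool(fmap, size=2, stride=2):
    """Keep the largest value in each (size x size) window, moving
    the window `stride` pixels at a time. Partial windows at the
    border are kept too, which is why a 7x7 map shrinks to 4x4."""
    h, w = fmap.shape
    return np.array([[fmap[i:i + size, j:j + size].max()
                      for j in range(0, w, stride)]
                     for i in range(0, h, stride)])

# The 7x7 feature map of the \-diagonal feature, from the slides
fmap = np.array([
    [ 0.77, -0.11,  0.11,  0.33,  0.55, -0.11,  0.33],
    [-0.11,  1.00, -0.11,  0.33, -0.11,  0.11, -0.11],
    [ 0.11, -0.11,  1.00, -0.33,  0.11, -0.11,  0.55],
    [ 0.33,  0.33, -0.33,  0.55, -0.33,  0.33,  0.33],
    [ 0.55, -0.11,  0.11, -0.33,  1.00, -0.11,  0.11],
    [-0.11,  0.11, -0.11,  0.33, -0.11,  1.00, -0.11],
    [ 0.33, -0.11,  0.55,  0.33,  0.11, -0.11,  0.77],
])

pooled = max_pool(fmap)
print(pooled)  # the 4x4 map from the slides; first row 1.00 0.33 0.55 0.33
```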
Normalization: Rectified Linear Units (ReLUs)
A ReLU replaces every negative value by 0 and passes positive values through unchanged. Applied element by element to the first 7×7 feature map:

0.77 0    0.11 0.33 0.55 0    0.33
0    1.00 0    0.33 0    0.11 0
0.11 0    1.00 0    0.11 0    0.55
0.33 0.33 0    0.55 0    0.33 0.33
0.55 0    0.11 0    1.00 0    0.11
0    0.11 0    0.33 0    1.00 0
0.33 0    0.55 0.33 0.11 0    0.77

ReLU layer: the same operation is applied to every map in the stack.
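The rectification step is one line of NumPy (a minimal sketch using a row of the map above):

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: negative values become 0,
    everything else passes through unchanged."""
    return np.maximum(x, 0)

row = np.array([0.77, -0.11, 0.11, 0.33, 0.55, -0.11, 0.33])
print(relu(row))  # the two -0.11 entries become 0
```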
Layers get stacked
The output of one block of convolution, pooling and ReLU layers becomes the input of the next: the 9×9 image becomes a stack of three 4×4 maps after the first block.
Deep stacking
Each additional block shrinks the maps further while the features become higher level; after another round the image is reduced to a stack of 2×2 maps:

1.00 0.55    1.00 0.55    0.55 1.00
0.55 1.00    0.55 0.55    1.00 0.55
Fully connected layer
The final stack of small maps is flattened into a single vector of values: 1.00, 0.55, 0.55, 1.00, 1.00, 0.55, 0.55, 0.55, 0.55, 1.00, 1.00, 0.55.
Fully connected layer
Every element of the flattened vector is connected to both outputs, X and O. Each connection carries a learned weight, so every element casts a weighted vote for each class, and the class with the larger total wins. For a different input the vector might be 0.9, 0.65, 0.45, 0.87, 0.96, 0.73, 0.23, 0.63, 0.44, 0.89, 0.94, 0.53; the same weights then produce new vote totals for X and O.
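The voting is just a weighted sum per class. A hedged sketch (the weights below are random stand-ins, not the learned ones; a real network learns `W` by backpropagation):

```python
import numpy as np

# Flattened activations (the example values from the slides)
features = np.array([0.9, 0.65, 0.45, 0.87, 0.96, 0.73,
                     0.23, 0.63, 0.44, 0.89, 0.94, 0.53])

# One weight per (class, input) pair; random placeholders here
rng = np.random.default_rng(0)
W = rng.normal(size=(2, features.size))
b = np.zeros(2)

scores = W @ features + b                 # total weighted vote per class
winner = ["X", "O"][int(scores.argmax())]
print(scores, winner)
```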
Putting it all together
Image → [convolution → ReLU → pooling] × N → fully connected layer(s) → X or O: the stacked layers turn the raw pixels of the ‘X’ image into two class scores.
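The whole pipeline can be sketched end to end on the slides' 9×9 ‘X’ (a minimal NumPy sketch; a real CNN learns its filters rather than using this hand-made one, and the conv → ReLU → pool ordering here is one common convention):

```python
import numpy as np

def convolve(image, feature):
    """Normalized match score of the feature at every position."""
    fh, fw = feature.shape
    out = np.zeros((image.shape[0] - fh + 1, image.shape[1] - fw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (image[i:i + fh, j:j + fw] * feature).sum() / feature.size
    return out

def relu(x):
    return np.maximum(x, 0)

def max_pool(x, s=2):
    return np.array([[x[i:i + s, j:j + s].max()
                      for j in range(0, x.shape[1], s)]
                     for i in range(0, x.shape[0], s)])

# 9x9 'X' image and the diagonal feature from the slides
img = -np.ones((9, 9))
for k in range(1, 8):
    img[k, k] = img[k, 8 - k] = 1
feature = np.array([[1, -1, -1], [-1, 1, -1], [-1, -1, 1]])

out = max_pool(relu(convolve(img, feature)))   # 9x9 -> 7x7 -> 4x4
out = max_pool(relu(convolve(out, feature)))   # 4x4 -> 2x2 -> 1x1
flat = out.flatten()                           # input to the FC layer
print(out.shape, flat)
```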
Learning

Backpropagation
The weights Wi are the quantities the network learns. [The 9×9 ‘X’ image is the input.]
Backpropagation
Forward pass: the input image flows through the layers with weights W1, W2, W3, W4, W5 and produces an answer.

Error = right answer - actual answer

Backward pass: the error is sent back through the layers, and each weight is adjusted in the direction that reduces it.
Gradient descent
[Plot: the error as a function of one weight.] Each weight is nudged downhill along the slope of the error curve, in the direction that decreases the error.
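The downhill nudging can be sketched on a toy one-dimensional error curve (an illustrative sketch; real networks apply the same update to millions of weights using gradients from backpropagation):

```python
# Toy error curve: error(w) = (w - 3)^2, minimal at w = 3.
def grad(w):
    return 2 * (w - 3)        # slope of the error curve at w

w = 0.0                        # initial weight
lr = 0.1                       # learning rate: how big each nudge is
for _ in range(100):
    w -= lr * grad(w)          # step downhill, against the slope

print(round(w, 4))  # 3.0: the weight settles at the error minimum
```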
Limitations

Practical solutions
Outline
1. What are ANN?
2. What are CNNs?
3. How do CNNs work?
 Convolution layers, pooling layers, FC layers, Backpropagation
4. Transfer learning
5. Data augmentation
6. One case study
7. CNNs for Detection
8. Two case studies
Transfer Learning
• Given a new problem & new data
• Instead of training from scratch, reuse a pre-trained network and train only its last layers

[Diagram: the ‘X’/‘O’ network again; the early layers keep their pre-trained weights and only the final layers are retrained for the new problem.]
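The idea can be sketched in plain NumPy: a frozen stand-in for the pre-trained layers, plus a new last layer that is the only thing trained (everything here, names and data alike, is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pre-trained" layers: a fixed feature extractor whose weights
# are NOT updated (a stand-in for the early conv layers)
W_frozen = rng.normal(size=(20, 8))

def frozen_features(x):
    return np.maximum(x @ W_frozen, 0)   # linear map + ReLU

# Toy dataset for the new problem
X = rng.normal(size=(100, 20))
y = (X[:, 0] > 0).astype(float)

# Only the new last layer (a logistic-regression head) is trained
w_last = np.zeros(8)
lr = 0.1
for _ in range(200):
    h = frozen_features(X)
    p = 1 / (1 + np.exp(-(h @ w_last)))      # predicted probability
    w_last -= lr * h.T @ (p - y) / len(y)    # update the head only

acc = ((p > 0.5) == (y > 0.5)).mean()
print(acc)  # training accuracy of the retrained head
```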
Outline
1. What are ANN?
2. What are CNNs?
3. How do CNNs work?
 Convolution layers, pooling layers, FC layers, Backpropagation
4. Transfer learning
5. Data augmentation
6. Three case studies
Data augmentation
• Artificially increase the volume of the training dataset using transformations
• Objective: improve model robustness

Original → scaling, centring, translation, rotation, … etc.
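A few such transformations can be sketched with NumPy (an illustrative sketch; real pipelines use small-angle rotations, scalings, brightness changes and so on, and `augment` is a name chosen here):

```python
import numpy as np

def augment(img):
    """Generate simple label-preserving variants of one image:
    a mirror flip, a 1-pixel translation and a rotation."""
    return [
        img,
        np.fliplr(img),                 # mirror
        np.roll(img, shift=1, axis=1),  # translate right by 1 pixel
        np.rot90(img),                  # rotate 90 degrees
    ]

img = np.arange(81, dtype=float).reshape(9, 9)
batch = augment(img)
print(len(batch))  # 4 training images from 1 original
```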
Outline
1. What are ANN?
2. What are CNNs?
3. How do CNNs work?
 Convolution layers, pooling layers, FC layers, Backpropagation

4. Transfer learning
5. Data augmentation
6. One case study
7. CNNs for Detection
8. Two case studies
Case study 1: Data-augmentation for CNNs using MNIST
• Objective: analyze the benefit of data-augmentation and ensembles on CNNs
• Methodology:
  • MNIST (60,000 train + 30,000 test) & 10 classes
  • Three CNNs: LeNet, Network3, DropConnect

A snapshot of image pre-processing for convolutional neural networks: case study of MNIST. S. Tabik, D. Peralta, A. Herrera-Poyatos, F. Herrera. International Journal of Computational Intelligence Systems 10, 555–568
Case study 1: Data-augmentation for CNNs using MNIST

Data-augmentation techniques
Three LeNet-like CNNs:
1. LeNet: 2 Conv + 2 FC (ReLU)
2. Network3: 2 Conv + 3 FC (ReLU + Dropout)
3. DropConnect: 2 Conv + 2 FC (ReLU + DropConnect)
Case study 1: Data-augmentation for CNNs using MNIST

Test-set accuracies
Case study 1: Data-augmentation for CNNs using MNIST

Results: ensembles using the most-voted strategy
Error: 0.28% versus 0.23% for the state-of-the-art ensemble
Case study 1: Data-augmentation for CNNs using MNIST

Results: the 28 misclassified digits
ensemble-5 (Network3) vs. ensemble-5 (DropConnect)
Case study 1: Data-augmentation for CNNs using MNIST

The 13 handwritten digits misclassified by both ensemble-5 of DropConnect and ensemble-5 of Network3
Detection with CNNs
Siham Tabik
[email protected]
Outline
• What is detection?
• How do detection models work?
• The state-of-the art detection models
• Case study 1
• Case study 2
By the way, what is object detection?
Three classes: cat, dog and duck.
Given a test image:

Object Detection → CAT, DOG, DUCK
Object detection
Results of the ILSVRC object detection task over four years
How does object detection work?
• Object detection methods reformulate the detection problem into a classification problem
• They combine a selective search method with a classification model
Detection as Classification

CAT? NO

DOG? NO

Detection as Classification

CAT? YES!

DOG? NO

Detection as Classification

CAT? NO

DOG? NO

Detection as Classification
Problem: Need to test many positions and scales, and use a
computationally demanding classifier (CNN)

Solution: Only look at a tiny subset of possible positions

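The brute-force scheme above can be sketched in a few lines (an illustrative sketch with a toy image and a stand-in classifier; `sliding_window_detect` is a name chosen here):

```python
import numpy as np

def sliding_window_detect(image, classify, win=3, stride=1):
    """Detection as classification: run the classifier on every
    window position and return the positions it flags. Region
    proposals exist precisely to avoid scoring every window."""
    h, w = image.shape
    hits = []
    for i in range(0, h - win + 1, stride):
        for j in range(0, w - win + 1, stride):
            if classify(image[i:i + win, j:j + win]):
                hits.append((i, j))
    return hits

# Toy image with one 3x3 "object"; the stand-in classifier fires
# when the window sum is high enough
img = np.zeros((6, 6))
img[2:5, 2:5] = 1.0
hits = sliding_window_detect(img, lambda p: p.sum() >= 9)
print(hits)  # [(2, 2)]: only the window exactly covering the object
```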
Region Proposals
● Find “blobby” image regions that are likely to contain objects
● “Class-agnostic” object detector
● Look for “blob-like” regions

R-CNN (RP + CNN)
• The first CNN-based detection model was R-CNN
• Combines: VGG and Region Proposals
• Improved the state of the art on the object detection challenge by almost 50%

Girshick, Ross, et al. "Rich feature hierarchies for accurate object detection and semantic segmentation." 2014.
Fast R-CNN and Faster R-CNN

Girshick, Ross. "Fast R-CNN." 2015.
Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object detection with region proposal networks." 2015.

Faster R-CNN, SSD, R-FCN
Outline
• What is detection?
• How do detection models work?
• The state-of-the art detection models
• Case study 1
• Case study 2
Case study 3: Handgun alarm detection in videos
• Objective: develop a fast and accurate pistol detection model for videos
• Methodology:
  • A new database
  • Two models:
    • "VGG + Sliding Window"
    • "VGG + Region Proposal (Faster R-CNN)"

Automatic Handgun Detection Alarm in Videos Using Deep Learning. R. Olmos, S. Tabik, F. Herrera. Neurocomputing 275, 66–72
Case study 3: Handgun alarm detection in videos
A new database
Pistol class
Negative class
Results on the same dataset

Model           #TP  #FN  #TN  #FP  Precision  Recall   F1
Sliding window   97  207  298    6    94.17%   31.91%  47.67%
Faster R-CNN    304    0  247   57    81.21%  100.00%  91.43%
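The precision/recall/F1 columns follow from the raw counts (a small sketch; `detection_metrics` is a name chosen here, not from the paper; #TN is kept only to mirror the table, since it does not enter these three metrics):

```python
def detection_metrics(tp, fn, tn, fp):
    """Precision, recall and F1 from raw counts: precision is the
    fraction of flagged windows that really contain a pistol,
    recall the fraction of pistols that were flagged."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Sliding-window row of the table above
p, r, f1 = detection_metrics(tp=97, fn=207, tn=298, fp=6)
print(f"{p:.2%} {r:.2%} {f1:.2%}")  # 94.17% 31.91% 47.67%
```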


Results on images
Results on YouTube videos
Results on Skyfall
Results on real surveillance videos
Outline
• What is detection?
• How do detection models work?
• The state-of-the art detection models
• Case study 1
• Case study 2
Case study 2: Protected plant species detection using Google Earth imagery
• Objective: Ziziphus lotus detection in Google Earth images
• Methodology:
  • Design a new database to train CNNs
  • Three models: "CNNs + Sliding Window", "CNNs + Preprocessing", "OBIA"

Deep-Learning Convolutional Neural Networks for scattered shrub detection with Google Earth Imagery. E. Guirado, S. Tabik, D. Alcaraz-Segura, J. Cabello, F. Herrera. Remote Sensing 9(12): 1220 (2017)
Case study 2: Protected plant species detection using Google Earth imagery
• Design a new database to train CNNs

Case study 2: Protected plant species detection using Google Earth imagery
• Database for training the CNNs
• A two-class problem
Case study 2: Protected plant species detection using Google Earth imagery

Preprocessing: (pixel value < 100)
Case study 2: Protected plant species detection using Google Earth imagery
• Results (GoogLeNet + sliding window)
• It takes 291 min
Case study 2: Protected plant species detection using Google Earth imagery
Results: OBIA (12 hours) vs. CNNs (35.4 seconds)

Model       Precision  Recall   F1
CNNs+prp.     98.57%   95.83%  97.18%
OBIA          77.65%   91.67%  84.08%

Case study 2: Protected plant species detection using Google Earth imagery
Results: OBIA (12 hours) vs. CNNs (24.1 seconds)

Model       Precision  Recall   F1
CNNs+prp.     92.68%   92.68%  92.68%
OBIA          72.41%   51.00%  60.00%