Lec4 SIFT HoG
Some slides were adapted/taken from various sources, including 3D Computer Vision of Prof. Hee, NUS, Air Lab Summer
School, The Robotic Institute, CMU, Computer Vision of Prof. Mubarak Shah, UCF, Computer Vision of Prof. William Hoff,
Colorado School of Mines and many more. We thankfully acknowledge them. Students are requested to use this material for
their study only and NOT to distribute it.
Finding the “same” thing across images
Categories: find a bottle. Instances: find these two objects (possibly deformed).
Slide Credit: James Hays
Overview of Keypoint Matching
1. Find a set of distinctive keypoints
2. Define a region around each keypoint
3. Extract and normalize the region content
4. Compute a local descriptor from the normalized region
5. Match local descriptors: regions A and B match when $d(f_A, f_B) < T$
Goals for Keypoints
Description of patches
Features Descriptors
Choosing interest points
Where would you
tell your friend to
meet you?
Kristen Grauman
Goal: interest operator repeatability
• We want to detect (at least some of) the
same points in both images.
• Must provide some invariance to geometric
and photometric differences between the two
views.
Kristen Grauman
Some patches can be localized
or matched with higher accuracy than
others.
To continue…
Some Mathematical Preliminaries
Derivative of Gaussian filter
[Figure: a Gaussian filter convolved with [1 -1] gives a derivative of Gaussian filter]
Image derivatives
Definition
Approximation
Convolution kernels
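$f_x = \begin{bmatrix} 1 & -1 \end{bmatrix}, \quad f_y = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$
For an image $I$: $I_x = I * \begin{bmatrix} 1 & -1 \end{bmatrix}, \quad I_y = I * \begin{bmatrix} 1 \\ -1 \end{bmatrix}$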
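The difference kernels can be tried on a tiny image. This is a minimal sketch, assuming a grayscale image stored as a list of rows and a backward difference with the border set to zero; the function names are illustrative, and the sign of the result depends on how the kernel is oriented.

```python
def derivative_x(image):
    """Horizontal difference I(y, x) - I(y, x-1); the first column is set to 0."""
    return [[0 if x == 0 else row[x] - row[x - 1] for x in range(len(row))]
            for row in image]

def derivative_y(image):
    """Vertical difference I(y, x) - I(y-1, x); the first row is set to 0."""
    h, w = len(image), len(image[0])
    return [[0 if y == 0 else image[y][x] - image[y - 1][x] for x in range(w)]
            for y in range(h)]

# A vertical step edge: intensity jumps from 0 to 10 between columns 1 and 2.
image = [[0, 0, 10, 10],
         [0, 0, 10, 10],
         [0, 0, 10, 10]]

ix = derivative_x(image)  # responds only at the column where the step occurs
iy = derivative_y(image)  # zero everywhere: nothing changes vertically
```

The horizontal derivative fires at the edge while the vertical one stays at zero, which is why the two derivatives together describe edge direction.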
Increasing noise
$g(x, y) = e^{-(x^2 + y^2) / (2\sigma^2)}$
Scale of Gaussian
– As $\sigma$ increases, more pixels are involved in the average
– As $\sigma$ increases, the image is more blurred
– As $\sigma$ increases, noise is more effectively suppressed
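The effect of the scale can be checked in one dimension. This is a minimal sketch, assuming a kernel truncated at three standard deviations, normalized to sum to one, and replicated borders; the helper names and the test signal are illustrative.

```python
import math

def gaussian_kernel_1d(sigma):
    """Sampled Gaussian, truncated at 3*sigma and normalized to sum to 1."""
    radius = int(math.ceil(3 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth_1d(signal, kernel):
    """Weighted average around each sample; border samples are replicated."""
    r = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - r, 0), n - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out

def spread(values):
    return max(values) - min(values)

# Rapidly alternating values stand in for noise.
noisy = [0, 4, 0, 4, 0, 4, 0, 4, 0, 4, 0, 4]
small = smooth_1d(noisy, gaussian_kernel_1d(0.5))
large = smooth_1d(noisy, gaussian_kernel_1d(2.0))
```

The larger sigma uses a wider kernel (more pixels in the average) and leaves a smaller spread between the highest and lowest output values.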
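• The Laplacian
$\nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}$
$\nabla^2 f \approx [f(x+1,y) + f(x-1,y) + f(x,y+1) + f(x,y-1)] - 4f(x,y)$
• Example kernels:
$\begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix} \quad \begin{bmatrix} 1 & 1 & 1 \\ 1 & -8 & 1 \\ 1 & 1 & 1 \end{bmatrix} \quad \begin{bmatrix} -1 & 2 & -1 \\ 2 & -4 & 2 \\ -1 & 2 & -1 \end{bmatrix}$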
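The four-neighbor approximation of the Laplacian can be written directly as code. This is a minimal sketch, assuming a list-of-rows image and leaving border pixels at zero; the function name is illustrative.

```python
def laplacian(image):
    """[f(x+1,y) + f(x-1,y) + f(x,y+1) + f(x,y-1)] - 4 f(x,y) at interior pixels."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = (image[y][x + 1] + image[y][x - 1]
                         + image[y + 1][x] + image[y - 1][x]
                         - 4 * image[y][x])
    return out

# A single bright pixel on a dark background.
image = [[0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0],
         [0, 0, 9, 0, 0],
         [0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0]]
response = laplacian(image)
```

The response is strongly negative at the bright pixel and positive at its four direct neighbors, which is the blob-like behavior that scale selection builds on.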
Gaussian smoothing
Find Laplacian
[Figure: Gaussian profiles over x from -3 to 3 for standard deviations 1, 3, and 6]
List of (x, y, σ)
K. Grauman, B. Leibe
Automatic scale selection
Intuition:
• Find scale that gives local maxima of some function
f in both position and scale.
[Figure: response f plotted against region size for Image 1 and Image 2, with region sizes s1 and s2 marked]
K. Grauman
Automatic Scale Selection
• Function responses for increasing scale (scale signature)
$f(I_{i_1 \ldots i_m}(x, \sigma))$
K. Grauman, B. Leibe
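A scale signature can be computed for a synthetic blob. This is a minimal sketch, assuming the signature function is the magnitude of the scale-normalized Laplacian of Gaussian ($\sigma^2$ times the LoG response), which is one common choice and not necessarily the one plotted on the slides; the disk image, the candidate scales, and the function names are illustrative.

```python
import math

def log_value(dx, dy, sigma):
    """Laplacian of Gaussian evaluated at offset (dx, dy)."""
    r2 = dx * dx + dy * dy
    s2 = sigma * sigma
    return (r2 - 2 * s2) / (2 * math.pi * s2 ** 3) * math.exp(-r2 / (2 * s2))

def scale_signature(image, cx, cy, sigmas):
    """|sigma^2 * LoG response| at (cx, cy) for each candidate sigma."""
    h, w = len(image), len(image[0])
    signature = []
    for sigma in sigmas:
        total = 0.0
        for y in range(h):
            for x in range(w):
                if image[y][x]:
                    total += image[y][x] * log_value(x - cx, y - cy, sigma)
        signature.append(abs(sigma * sigma * total))
    return signature

# Synthetic bright disk of radius 10 centered in a 41x41 image.
size, radius = 41, 10
c = size // 2
image = [[1.0 if (x - c) ** 2 + (y - c) ** 2 <= radius ** 2 else 0.0
          for x in range(size)] for y in range(size)]

sigmas = [3, 5, 7, 9, 11]
signature = scale_signature(image, c, c, sigmas)
best_sigma = sigmas[signature.index(max(signature))]
```

Among these candidates the signature peaks at sigma 7. For an ideal disk of radius r the continuous maximum lies at r/√2, about 7.07 here, so the selected scale follows the blob size.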
What Is A Useful Signature Function?
K. Grauman, B. Leibe
Scale-space blob detector: Example
$k = \sqrt{2}$
Building a Scale Space
$G(x, y, k\sigma) = \frac{1}{2\pi (k\sigma)^2} e^{-(x^2 + y^2) / (2 k^2 \sigma^2)}$
[Figure: stack of Gaussian-blurred images at scales kσ, k²σ, k³σ, k⁴σ]
σ = .707187.6; k = √2
How many scales per
octave?
Tested on a collection of 32 real images drawn from a diverse range, including outdoor scenes, human faces, aerial photographs, and industrial images.
3 scales per octave
Initial value of sigma: 1.6
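The blur levels inside one octave follow from these two numbers. This is a minimal sketch, assuming the convention from Lowe's SIFT paper that k = 2^(1/s) for s scales per octave and that s + 3 blurred images are built per octave; neither detail is stated on the slides, and the function name is illustrative.

```python
def octave_sigmas(sigma0=1.6, scales_per_octave=3):
    """Blur levels sigma0 * k**i for one octave, with k = 2 ** (1 / s).

    Assumption: s + 3 images per octave, so that extrema can be searched
    at s scales with a neighbor level above and below each one.
    """
    k = 2 ** (1.0 / scales_per_octave)
    return [sigma0 * k ** i for i in range(scales_per_octave + 3)]

sigmas = octave_sigmas()        # 3 scales per octave, initial sigma 1.6
sigmas_s2 = octave_sigmas(1.6, 2)  # with s = 2 the factor k equals sqrt(2)
```

After s steps the blur has doubled (1.6 becomes 3.2 for s = 3), which is the point at which the next octave starts from a downsampled image.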
Scale Space Peak Detection
Compare a pixel (X) with 26
pixels in current and adjacent
scales (Green Circles)
Select a pixel (X) if it is larger or smaller than all 26 neighbors
Detecting every extremum is computationally expensive
Detect the most stable subset with a coarse sampling of scales
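The 26-neighbor comparison is short enough to write out. This is a minimal sketch, assuming three adjacent scale levels given as lists of rows and a pixel that is not on the image border; the function and variable names are illustrative.

```python
def is_extremum(below, current, above, y, x):
    """True if current[y][x] is strictly larger or strictly smaller than all
    26 neighbors: 8 in its own level and 9 in each adjacent scale level."""
    center = current[y][x]
    neighbors = []
    for level_index, level in enumerate((below, current, above)):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if level_index == 1 and dy == 0 and dx == 0:
                    continue  # skip the pixel itself
                neighbors.append(level[y + dy][x + dx])
    return center > max(neighbors) or center < min(neighbors)

flat = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
peak = [[1, 1, 1], [1, 5, 1], [1, 1, 1]]
pit = [[1, 1, 1], [1, -5, 1], [1, 1, 1]]

peak_found = is_extremum(flat, peak, flat, 1, 1)
pit_found = is_extremum(flat, pit, flat, 1, 1)
flat_found = is_extremum(flat, flat, flat, 1, 1)
```

Both maxima and minima count as candidates; a flat neighborhood yields no candidate because the comparison is strict.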
Key Point Localization
Candidates are chosen from extrema
detection
Use of gradient orientation histograms
Robust representation
Similarity to IT cortex
Complex neurons respond to a gradient at a
particular orientation.
Location of the feature can shift over a small
receptive field.
Tomaso Poggio, MIT
Matching local features
Image 1 Image 2
• To generate candidate matches, find patches that have the most similar appearance or SIFT descriptor
• Simplest approach: compare them all, take the closest (or closest k, or within a thresholded distance)
Kristen Grauman
[Figure: query image with its 1st, 2nd, 3rd, and 4th nearest neighbors (NN)]
Good match if $\frac{\text{distance to first match}}{\text{distance to second match}} < 0.8$
Ambiguous matches
• At what distance do we have a good match?
• To add robustness to matching, can consider ratio : distance to
best match / distance to second best match
• If low, first match looks good.
• If high, could be ambiguous match.
Kristen Grauman
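The ratio test fits in a few lines of brute-force matching. This is a minimal sketch, assuming descriptors are plain lists of floats compared with Euclidean distance and a 0.8 ratio threshold; the function names and the toy descriptors are illustrative.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def ratio_test_matches(desc1, desc2, max_ratio=0.8):
    """For each descriptor in desc1, keep its nearest neighbor in desc2 only if
    best distance / second-best distance is below max_ratio.
    Assumes desc2 holds at least two descriptors."""
    matches = []
    for i, d in enumerate(desc1):
        ranked = sorted((euclidean(d, e), j) for j, e in enumerate(desc2))
        (best, j), (second, _) = ranked[0], ranked[1]
        if second > 0 and best / second < max_ratio:
            matches.append((i, j))
    return matches

desc1 = [[0.0, 0.0], [5.0, 5.0]]
desc2 = [[0.1, 0.0], [5.0, 5.1], [5.1, 5.0], [10.0, 10.0]]
matches = ratio_test_matches(desc1, desc2)
```

The first descriptor has one clear nearest neighbor and is kept. The second has two candidates at essentially the same distance, so its ratio is close to 1 and it is rejected as ambiguous.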
The ratio of the distance to the closest match to the distance to the second-closest match
SIFT Detector
Generate Scale Space of an Image
Detect Peaks in Scale Space (extrema)
Localize Interest Points (Taylor Series)
Remove outliers (remove response
along edges)
Assign Orientation
SIFT Descriptor
Compute relative orientation and magnitude in a
16x16 neighborhood at key point
Form weighted histogram (8 bin) for 4x4 regions
Weight by magnitude and spatial Gaussian
Concatenate 16 histograms in one long vector of 128 dimensions
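The layout of the 128 values can be made concrete. This is a minimal sketch, assuming the 16x16 gradient magnitudes and relative orientations (in radians) are already computed, hard assignment to the nearest of 8 bins, and a spatial Gaussian with sigma equal to half the window width; it omits the interpolation and normalization steps of a full implementation, and the function name is illustrative.

```python
import math

def sift_like_descriptor(magnitude, orientation):
    """Concatenate 8-bin orientation histograms over a 4x4 grid of 4x4-pixel
    regions. Inputs are 16x16 lists; orientation is in radians, measured
    relative to the keypoint orientation."""
    center, sigma = 7.5, 8.0  # assumed: window center and half the window width
    descriptor = []
    for cell_y in range(4):
        for cell_x in range(4):
            hist = [0.0] * 8
            for y in range(cell_y * 4, cell_y * 4 + 4):
                for x in range(cell_x * 4, cell_x * 4 + 4):
                    angle = orientation[y][x] % (2 * math.pi)
                    b = int(angle / (2 * math.pi) * 8) % 8
                    weight = math.exp(-((x - center) ** 2 + (y - center) ** 2)
                                      / (2 * sigma ** 2))
                    hist[b] += magnitude[y][x] * weight
            descriptor.extend(hist)
    return descriptor

magnitude = [[1.0] * 16 for _ in range(16)]
orientation = [[0.0] * 16 for _ in range(16)]
descriptor = sift_like_descriptor(magnitude, orientation)
```

With every gradient pointing the same way, only one bin per region is filled, and the regions near the center receive larger values because of the Gaussian weight.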
Dr. Edgar Seemann
HOG Steps
HOG feature extraction
Compute centered horizontal and vertical gradients with no smoothing
Compute gradient orientation and magnitudes
For color image, pick the color channel with the highest gradient magnitude for each
pixel.
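Centered: $\begin{bmatrix} -1 & 0 & 1 \end{bmatrix}$ and $\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}$
Gradient magnitude: $s = \sqrt{s_x^2 + s_y^2}$
Gradient orientation: $\theta = \arctan\left(\frac{s_y}{s_x}\right)$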
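The gradient step can be sketched for a single-channel image. This is a minimal sketch, assuming a list-of-rows grayscale image, replicated border pixels, and unsigned orientations folded into 0 to 180 degrees; `atan2` is used in place of a plain arctan so that a zero horizontal gradient does not cause a division by zero, and the function name is illustrative.

```python
import math

def centered_gradients(image):
    """Apply the centered masks [-1 0 1] horizontally and vertically with no
    smoothing; return per-pixel magnitude and orientation in [0, 180)."""
    h, w = len(image), len(image[0])
    magnitude = [[0.0] * w for _ in range(h)]
    orientation = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
            sy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
            magnitude[y][x] = math.hypot(sx, sy)
            orientation[y][x] = math.degrees(math.atan2(sy, sx)) % 180.0
    return magnitude, orientation

# Intensity increases left to right, then top to bottom.
horizontal_ramp = [[0, 1, 2, 3]] * 3
vertical_ramp = [[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3]]

mag_h, ori_h = centered_gradients(horizontal_ramp)
mag_v, ori_v = centered_gradients(vertical_ramp)
```

For a color image the same function would be run per channel, keeping at each pixel the channel with the largest magnitude.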
Blocks, Cells
[Figure: two overlapping blocks, Block 1 and Block 2]
16x16 blocks of 50% overlap.
7x15=105 blocks in total
Cells (8 by 8)
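The block count follows from the window size and the stride. This is a minimal sketch, assuming the 64 by 128 detection window, 8x8 cells, 2x2 cells per block, 9 orientation bins, and a stride of half a block for the 50% overlap; the function name is illustrative.

```python
def hog_layout(window_w=64, window_h=128, cell=8, block_cells=2, bins=9):
    """Return (blocks across, blocks down, total blocks, feature length)."""
    block = cell * block_cells           # 16-pixel blocks
    stride = block // 2                  # 50% overlap -> 8-pixel stride
    blocks_x = (window_w - block) // stride + 1
    blocks_y = (window_h - block) // stride + 1
    n_blocks = blocks_x * blocks_y
    feature_length = n_blocks * block_cells * block_cells * bins
    return blocks_x, blocks_y, n_blocks, feature_length

layout = hog_layout()
```

This reproduces 7 x 15 = 105 blocks, and 105 blocks x 4 cells x 9 bins gives the 3,780-dimensional HOG vector.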
Votes
Each block consists of 2x2 cells, each of size 8x8
Quantize the gradient orientation into 9 bins (0-180 degrees)
The vote is the gradient magnitude
Bin centers
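The voting step for one cell can be sketched as follows. This is a minimal sketch, assuming bin centers at 10, 30, ..., 170 degrees and that each vote is split linearly between the two nearest bin centers, wrapping around at 0 and 180 degrees; the slides only state that the vote is the gradient magnitude, so the interpolation is an assumption, and the function name is illustrative.

```python
import math

def cell_histogram(magnitudes, orientations, bins=9):
    """Accumulate magnitude votes into an unsigned orientation histogram.
    magnitudes and orientations are flat lists for the pixels of one cell;
    orientations are in degrees."""
    width = 180.0 / bins                      # 20 degrees per bin
    hist = [0.0] * bins
    for m, theta in zip(magnitudes, orientations):
        pos = (theta % 180.0) / width - 0.5   # position measured in bin centers
        lower = math.floor(pos)
        frac = pos - lower
        hist[lower % bins] += m * (1.0 - frac)
        hist[(lower + 1) % bins] += m * frac
    return hist

# One gradient exactly on a bin center, one halfway between two centers.
hist = cell_histogram([2.0, 4.0], [10.0, 20.0])
```

The first vote lands entirely in bin 0, the second is split equally between bins 0 and 1, and the total of the histogram equals the sum of the magnitudes.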
Visualization
Results
SIFT                          HOG
128 dimensional vector        3,780 dimensional vector
16 by 16 window               64 by 128 window
4x4 sub-window (16 total)     16 by 16 overlapping blocks
8 bin histogram               Each block consists of 2 by 2 cells, each of 8 by 8
                              9 bin histogram
To continue…