Lecture 7: DoG, SIFT (CS131)

Lecture 7: Finding Features (part 2/2)
Professor Fei-Fei Li

Stanford Vision Lab
Fei-Fei Li, Lecture 7, 2-Oct-14

What will we learn today?
•  Local invariant features
–  Motivation
–  Requirements, invariances   [Previous lecture (#6)]
•  Keypoint localization
–  Harris corner detector
•  Scale invariant region selection
–  Automatic scale selection
–  Difference-of-Gaussian (DoG) detector
•  SIFT: an image region descriptor
Some background reading: David Lowe, IJCV 2004

A quick review
–  Motivation
–  Requirements, invariances

Quick review: Harris Corner Detector
Slide credit: Alyosha Efros

•  “Flat” region: no change in all directions
•  “Edge”: no change along the edge direction
•  “Corner”: significant change in all directions

$\theta = \det(M) - \alpha\,\mathrm{trace}(M)^2 = \lambda_1\lambda_2 - \alpha(\lambda_1 + \lambda_2)^2$
[Figure: the (λ1, λ2) plane divided into “Corner” (θ > 0), “Edge” (θ < 0), and “Flat” region (|θ| small)]
Slide credit: Kristen Grauman
•  Fast approximation
–  Avoid computing the eigenvalues
–  α: constant (0.04 to 0.06)
Slide adapted from Darya Frolova, Denis Simakov
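A minimal sketch of the fast approximation in Python. It assumes the 2×2 second-moment matrix M = [[a, b], [b, c]] has already been accumulated from image gradients. α = 0.05 is picked from the slide's range, and `flat_eps` is a hypothetical threshold for the flat region. Only det and trace are used, so no eigenvalues are computed.

```python
def harris_response(m, alpha=0.05):
    """theta = det(M) - alpha * trace(M)^2 for a symmetric 2x2 matrix m = [[a, b], [b, c]]."""
    (a, b), (_, c) = m
    det = a * c - b * b      # equals lambda1 * lambda2
    trace = a + c            # equals lambda1 + lambda2
    return det - alpha * trace * trace


def classify(m, alpha=0.05, flat_eps=1e-3):
    """Label a point from its Harris response; flat_eps is a hypothetical threshold."""
    theta = harris_response(m, alpha)
    if abs(theta) < flat_eps:
        return "flat"
    return "corner" if theta > 0 else "edge"


print(classify([[10.0, 0.0], [0.0, 9.0]]))    # corner: both eigenvalues large
print(classify([[10.0, 0.0], [0.0, 0.1]]))    # edge: one large, one small
print(classify([[0.01, 0.0], [0.0, 0.01]]))   # flat: both small
```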

•  Translation invariance
•  Rotation invariance
•  Scale invariance?

[Figure: a corner that, when magnified, has all of its points classified as edges]
Not invariant to image scale!


Scale Invariant Detection
•  Consider regions (e.g. circles) of different sizes around a point
•  Regions of corresponding sizes will look the same in both images

•  The problem: how do we choose corresponding circles independently in each image?

•  Solution:
–  Design a function on the region (circle) that is “scale invariant” (the same for corresponding regions, even if they are at different scales)
Example: average intensity. For corresponding regions (even of different sizes) it will be the same.
–  For a point in one image, we can treat this function as a function of region size (circle radius)

[Figure: f plotted against region size for Image 1 and Image 2, where Image 2 is Image 1 at scale = 1/2]

•  Common approach: take a local maximum of this function
•  Observation: the region size at which the maximum is achieved should be invariant to image scale.
Important: this scale invariant region size is found in each image independently!
[Figure: f versus region size for Image 1 and Image 2 (scale = 1/2), peaking at region sizes s1 and s2 respectively]
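To make the idea concrete, here is a 1D sketch under illustrative assumptions (not the lecture's code). The “image” is a bright bar. The signature f is the magnitude of the scale-normalized second-derivative-of-Gaussian response at the bar's centre, which is a 1D version of the Laplacian kernel the lecture uses for scale selection. Each image picks its scale on its own. For a bar of width W, the continuous response peaks at σ = W/2, so halving the bar should roughly halve the selected σ. This signature has a single peak, so its largest value is the local maximum. All function names are hypothetical.

```python
import math


def gaussian_dxx(x, sigma):
    """Second derivative of a 1D Gaussian: G''(x) = G(x) * (x^2 - sigma^2) / sigma^4."""
    g = math.exp(-x * x / (2 * sigma * sigma)) / (math.sqrt(2 * math.pi) * sigma)
    return g * (x * x - sigma * sigma) / sigma ** 4


def signature(signal, centre, sigma):
    """Hypothetical f: |sigma^2 * (G'' * signal)| evaluated at one pixel."""
    response = sum(v * gaussian_dxx(centre - i, sigma) for i, v in enumerate(signal))
    return abs(sigma * sigma * response)


def bar_image(width, length=256):
    """1D 'image': a bright bar of the given width in a dark background."""
    c = length // 2
    return [1.0 if abs(i - c) < width / 2 else 0.0 for i in range(length)]


def characteristic_scale(signal, centre, sigmas):
    """Region size (sigma) where f peaks, found from this image alone."""
    return max(sigmas, key=lambda s: signature(signal, centre, s))


sigmas = [0.5 * n for n in range(1, 81)]                  # candidate sizes 0.5 .. 40
s1 = characteristic_scale(bar_image(40), 128, sigmas)     # "Image 1"
s2 = characteristic_scale(bar_image(20), 128, sigmas)     # "Image 2": same scene at scale 1/2
print(s1, s2, s1 / s2)                                    # ratio close to 2
```

The two images are never compared directly. Each one only looks for its own peak, yet the ratio s1/s2 still recovers the scale factor between them.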

•  A “good” function for scale detection has one stable sharp peak
[Figure: three plots of f versus region size; two without a single sharp peak are marked “bad”, the one with a stable sharp peak is marked “Good”]
•  For usual images: a good function would be one that responds to contrast (sharp local intensity change)

•  Functions for determining scale: $f = \text{Kernel} * \text{Image}$
Kernels:
$L = \sigma^2 \left( G_{xx}(x, y, \sigma) + G_{yy}(x, y, \sigma) \right)$ (Laplacian)
$DoG = G(x, y, k\sigma) - G(x, y, \sigma)$ (Difference of Gaussians)
where Gaussian $G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}}$
Note: both kernels are invariant to scale and rotation
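A quick numerical check of how the two kernels relate, written in Python with the kernels in closed form and evaluated only at the kernel centre. The function names are illustrative. At the centre, the ratio DoG / ((k − 1) · σ²∇²G) works out to (k + 1)/(2k²) for every σ. That constant does not depend on scale, so it does not move extrema, and it tends to 1 as k → 1.

```python
import math


def gaussian(x, y, sigma):
    """G(x, y, sigma) = exp(-(x^2 + y^2) / (2 sigma^2)) / (2 pi sigma^2)."""
    return math.exp(-(x * x + y * y) / (2 * sigma * sigma)) / (2 * math.pi * sigma * sigma)


def normalized_laplacian(x, y, sigma):
    """sigma^2 * (G_xx + G_yy), using G_xx + G_yy = G * (x^2 + y^2 - 2 sigma^2) / sigma^4."""
    r2 = x * x + y * y
    return sigma * sigma * gaussian(x, y, sigma) * (r2 - 2 * sigma * sigma) / sigma ** 4


def dog(x, y, sigma, k):
    """Difference of Gaussians G(x, y, k sigma) - G(x, y, sigma)."""
    return gaussian(x, y, k * sigma) - gaussian(x, y, sigma)


def centre_ratio(sigma, k):
    """DoG divided by (k - 1) times the normalized Laplacian, at the kernel centre."""
    return dog(0.0, 0.0, sigma, k) / ((k - 1) * normalized_laplacian(0.0, 0.0, sigma))


k = 2 ** (1 / 3)
print([round(centre_ratio(s, k), 4) for s in (1.0, 2.0, 4.0, 8.0)])  # one value, repeated
print(round(centre_ratio(2.0, 1.001), 4))                            # close to 1 as k -> 1
```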

[Figure: responses based on trace M = λ1 + λ2 and det M = λ1λ2, each plotted against scale; from Lindeberg 1998]
Blob detection: Marr 1982; Voorhees and Poggio 1987; Blostein and Ahuja 1989; …

Scale Invariant Detectors
•  Harris-Laplacian¹
Find local maximum of:
–  Harris corner detector in space (image coordinates)
–  Laplacian in scale
•  SIFT (Lowe)²
Find local maximum of:
–  Difference of Gaussians in space and scale
[Figure: for each method, a scale axis above the (x, y) image plane; Harris-Laplacian uses Harris over x, y and the Laplacian along scale, while SIFT uses DoG along all three]
¹ K. Mikolajczyk, C. Schmid. “Indexing Based on Scale Invariant Interest Points”. ICCV 2001
² D. Lowe. “Distinctive Image Features from Scale-Invariant Keypoints”. IJCV 2004

Scale Invariant Detectors
•  Experimental evaluation of detectors w.r.t. scale change
$\text{Repeatability rate} = \frac{\#\,\text{correspondences}}{\#\,\text{possible correspondences}}$
K. Mikolajczyk, C. Schmid. “Indexing Based on Scale Invariant Interest Points”. ICCV 2001
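A simplified sketch of the repeatability rate. It rests on assumptions that are not in the slide. The transformation between the images is taken to be a known pure scaling. A point counts as repeated if an unused detection lies within a hypothetical tolerance `tol` of its mapped position. The number of possible correspondences is taken as the smaller of the two detection counts. The published evaluation handles these details more carefully.

```python
import math


def repeatability(points1, points2, scale, tol=1.5):
    """# correspondences / # possible correspondences for two sets of detections.

    Assumptions (not from the lecture): image 2 is image 1 resized by `scale`,
    a point counts as repeated if an unused detection in image 2 lies within
    `tol` pixels of its mapped position, and the number of possible
    correspondences is min(len(points1), len(points2)).
    """
    if not points1 or not points2:
        return 0.0
    unused = list(points2)
    found = 0
    for x, y in points1:
        mx, my = x * scale, y * scale            # where the point should reappear
        for j, (u, v) in enumerate(unused):
            if math.hypot(mx - u, my - v) <= tol:
                del unused[j]                    # one-to-one matching
                found += 1
                break
    return found / min(len(points1), len(points2))


pts1 = [(10, 10), (40, 22), (60, 80), (90, 15)]
pts2 = [(5, 5), (20, 11.5), (31, 39), (70, 70)]   # detections in the half-size image
print(repeatability(pts1, pts2, scale=0.5))        # 0.75
```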

Scale Invariant Detection: Summary
•  Given: two images of the same scene with a large scale difference between them
•  Goal: find the same interest points independently in each image
•  Solution: search for maxima of suitable functions in scale and in space (over the image)
Methods:
1.  Harris-Laplacian [Mikolajczyk, Schmid]: maximize Laplacian over scale, Harris’ measure of corner response over the image
2.  SIFT [Lowe]: maximize Difference of Gaussians over scale and space
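A sketch of the SIFT search for extrema over space and scale. To keep it self-contained and free of convolution code, the test image is assumed to be a single Gaussian blob with std 3. Under that assumption every blurred level has a closed form. A real implementation would blur an actual image. As in Lowe's paper, each DoG sample is compared with its 26 neighbours: 8 in its own level and 9 in each adjacent level. Function names are hypothetical.

```python
import math


def blurred_blob(x, y, sigma, blob_sigma):
    """L(x, y, sigma) for an image that is a unit-mass Gaussian blob of std blob_sigma.

    Blurring a Gaussian with a Gaussian gives a Gaussian whose variances add,
    so no convolution is needed for this synthetic test image.
    """
    s2 = sigma * sigma + blob_sigma * blob_sigma
    return math.exp(-(x * x + y * y) / (2 * s2)) / (2 * math.pi * s2)


def dog_stack(size, sigmas, k, blob_sigma):
    """D[i][y][x] = L(k * sigma_i) - L(sigma_i) on a size x size grid centred on the blob."""
    c = size // 2
    return [[[blurred_blob(x - c, y - c, k * s, blob_sigma) - blurred_blob(x - c, y - c, s, blob_sigma)
              for x in range(size)]
             for y in range(size)]
            for s in sigmas]


def find_extrema(D):
    """(scale_index, y, x) of samples above or below all 26 neighbours in space and scale."""
    found = []
    for i in range(1, len(D) - 1):
        for y in range(1, len(D[0]) - 1):
            for x in range(1, len(D[0][0]) - 1):
                v = D[i][y][x]
                neighbours = [D[i + di][y + dy][x + dx]
                              for di in (-1, 0, 1) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                              if (di, dy, dx) != (0, 0, 0)]
                if v > max(neighbours) or v < min(neighbours):
                    found.append((i, y, x))
    return found


k = 2 ** (1 / 3)                                 # constant ratio between adjacent scales
sigmas = [k ** i for i in range(10)]             # sigma_i = k^i, starting at 1.0
D = dog_stack(15, sigmas, k, blob_sigma=3.0)
extrema = find_extrema(D)
print((4, 7, 7) in extrema)                      # True: a minimum at the blob centre, scale index 4
```

For a Gaussian blob of std b, the DoG response at the centre peaks where σ√k = b. Here σ_4·√k ≈ 2.8, close to the blob's std of 3, so the selected scale tracks the blob's size.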


Local Descriptors
•  We know how to detect points
•  Next question: how do we describe them for matching?

Point descriptor should be:
1.  Invariant
2.  Distinctive
CVPR 2003 Tutorial

Recognition and Matching Based on Local Invariant Features
David Lowe
Computer Science Department
University of British Columbia

Invariant Local Features
•  Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters
Advantages of invariant local features
•  Locality: features are local, so robust to occlusion and clutter (no prior segmentation)
•  Distinctiveness: individual features can be matched to a large database of objects
•  Quantity: many features can be generated for even small objects
•  Efficiency: close to real-time performance
•  Extensibility: can easily be extended to a wide range of differing feature types, with each adding robustness

Scale invariance
Requires a method to repeatably select points in location and scale:
•  The only reasonable scale-space kernel is a Gaussian (Koenderink, 1984; Lindeberg, 1994)
•  An efficient choice is to detect peaks in the difference-of-Gaussian pyramid (Burt & Adelson, 1983; Crowley & Parker, 1984; but examining more scales)
[Figure: pyramid construction by repeatedly blurring, subtracting, and resampling]
•  Difference-of-Gaussian with a constant ratio of scales is a close approximation to Lindeberg’s scale-normalized Laplacian (can be shown from the heat diffusion equation)
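The heat-diffusion argument can be written out in a few lines. This is the derivation given in Lowe's IJCV 2004 paper, restated in this lecture's notation. Here σ²∇²G is exactly the scale-normalized Laplacian L = σ²(Gxx + Gyy) from the kernel slide.

```latex
\frac{\partial G}{\partial \sigma} = \sigma \nabla^2 G
  \qquad \text{(heat diffusion equation, with } \sigma \text{ as the scale parameter)}

\sigma \nabla^2 G = \frac{\partial G}{\partial \sigma}
  \approx \frac{G(x, y, k\sigma) - G(x, y, \sigma)}{k\sigma - \sigma}

\Longrightarrow \quad
G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\, \sigma^2 \nabla^2 G
```

The factor (k − 1) is the same at every scale when the ratio k is constant. It therefore rescales the DoG response without moving its extrema, and the approximation becomes exact as k → 1.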
Becoming rotation invariant
•  We are given a keypoint and its scale from DoG
•  We will select a characteristic orientation for the keypoint (based on the most prominent gradient there; discussed next slide)
•  We will describe all features relative to this orientation
•  This causes features to be rotation invariant!
–  If the keypoint appears rotated in another image, the features will be the same, because they’re relative to the characteristic orientation
[Figure: histogram of gradient orientations over the range 0 to 2π]

Becoming rotation invariant
•  Choosing the characteristic orientation:
•  Use the blurred image associated with the keypoint’s scale. Look at pixels in a square around it (say, size 16x16)
•  Compute the gradient direction at each pixel (this is easy, using vertical and horizontal edge filters)
•  Create a histogram of these local gradient directions
•  Keypoint orientation = the peak of that histogram
•  Minor details: we’ll also weight each pixel’s histogram contribution by the magnitude of its gradient and by how close it is to the keypoint
•  Now each keypoint has four stable coordinates (x, y, scale, orientation). Next we must give it a “fingerprint.”
[Figure: histogram of gradient orientations over the range 0 to 2π]
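A sketch of the orientation histogram described above. It makes several assumptions. The patch is already cut from the blurred image and centred on the keypoint. Gradients come from simple [-1, 0, 1] edge filters. The histogram uses 36 bins of 10 degrees, the number Lowe's paper uses. The Gaussian distance weight with std of half the patch width is an assumed choice. The function name is hypothetical.

```python
import math


def dominant_orientation(patch, n_bins=36):
    """Peak of a magnitude- and distance-weighted histogram of gradient directions.

    patch: square list of rows from the blurred image at the keypoint's scale,
    centred on the keypoint. Returns an angle in [0, 2*pi).
    """
    h, w = len(patch), len(patch[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    sigma = 0.5 * w                        # assumed width of the distance weighting
    bin_width = 2 * math.pi / n_bins
    hist = [0.0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]   # horizontal edge filter [-1, 0, 1]
            dy = patch[y + 1][x] - patch[y - 1][x]   # vertical edge filter [-1, 0, 1]^T
            magnitude = math.hypot(dx, dy)
            direction = math.atan2(dy, dx) % (2 * math.pi)
            closeness = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
            hist[int(round(direction / bin_width)) % n_bins] += magnitude * closeness
    peak = max(range(n_bins), key=lambda b: hist[b])
    return peak * bin_width


ramp_x = [[float(x) for x in range(16)] for y in range(16)]    # brightness grows to the right
ramp_y = [[float(y) for x in range(16)] for y in range(16)]    # brightness grows downward (rows)
print(dominant_orientation(ramp_x), dominant_orientation(ramp_y))
```

Rotating the image content changes the returned peak by the same angle, up to bin resolution. For example, the two ramps above differ by a quarter turn.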

Example of keypoint detection
Threshold on value at DoG peak and on ratio of principal curvatures (Harris approach)
(a) 233x189 image
(b) 832 DoG extrema
(c) 729 left after peak value threshold
(d) 536 left after testing ratio of principal curvatures
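A sketch of the two rejection tests. It uses the contrast threshold |D| < 0.03 (image values in [0, 1]) and the curvature-ratio limit r = 10 reported in Lowe's paper. It checks Tr(H)²/Det(H) < (r + 1)²/r on the 2×2 Hessian of the DoG level. The paper's sub-pixel interpolation step is omitted here, and the function name is hypothetical.

```python
def keep_keypoint(D, x, y, contrast_thresh=0.03, r=10.0):
    """Decide whether a DoG extremum at (x, y) survives the contrast and edge tests.

    D is one DoG level as a list of rows (image values assumed in [0, 1]).
    Thresholds follow Lowe (2004); sub-pixel refinement is skipped.
    """
    value = D[y][x]
    if abs(value) < contrast_thresh:            # weak peak: reject
        return False
    # 2x2 Hessian of D from finite differences
    dxx = D[y][x + 1] - 2 * value + D[y][x - 1]
    dyy = D[y + 1][x] - 2 * value + D[y - 1][x]
    dxy = (D[y + 1][x + 1] - D[y + 1][x - 1] - D[y - 1][x + 1] + D[y - 1][x - 1]) / 4
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:                                 # curvatures of opposite sign, or an edge
        return False
    return trace * trace / det < (r + 1) ** 2 / r   # principal-curvature ratio below r


blob = [[0.0, 0.0, 0.0], [0.0, -0.1, 0.0], [0.0, 0.0, 0.0]]      # curved in both directions
ridge = [[0.0, 0.0, 0.0], [-0.1, -0.1, -0.1], [0.0, 0.0, 0.0]]   # flat along x: an edge
print(keep_keypoint(blob, 1, 1), keep_keypoint(ridge, 1, 1))      # True False
```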

Repeatability vs. number of scales sampled per octave
David G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, 60, 2 (2004), pp. 91-110
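For reference, Lowe's paper settles on s = 3 scales per octave, starting from σ0 = 1.6 with k = 2^(1/s). It uses s + 3 blurred images per octave so that extrema can be searched over a complete octave of DoG levels. A small sketch of that schedule (the function name is hypothetical):

```python
def octave_sigmas(sigma0=1.6, s=3):
    """Gaussian blur levels for one octave: s + 3 images spaced by k = 2 ** (1 / s)."""
    k = 2 ** (1.0 / s)
    return [sigma0 * k ** i for i in range(s + 3)]


sigmas = octave_sigmas()
dog_levels = len(sigmas) - 1          # adjacent pairs are subtracted: s + 2 DoG images
print([round(x, 3) for x in sigmas])  # the level at index s is 2 * sigma0
```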

SIFT descriptor formation
•  Use the blurred image associated with the keypoint’s scale

•  Take image gradients over a 16x16 array of locations.
•  To become rotation invariant, rotate the gradient directions AND locations by (−keypoint orientation)
–  Now we’ve cancelled out rotation and have gradients expressed at locations relative to keypoint orientation θ
–  We could also have just rotated the whole image by −θ, but that would be slower.

•  Using precise gradient locations is fragile. We’d like to allow some “slop” in the image, and still produce a very similar descriptor
•  Create an array of orientation histograms (a 4x4 array is shown)
•  Put the rotated gradients into their local orientation histograms
–  A gradient’s contribution is divided among the nearby histograms based on distance. If it’s halfway between two histogram locations, it gives a half contribution to both.
–  Also, scale down gradient contributions for gradients far from the center
•  The SIFT paper reports the best results with 8 orientation bins per histogram and a 4x4 histogram array.
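A simplified sketch of descriptor formation. Each gradient's location and direction are rotated by −θ, binned into a 4x4 grid of 8-bin orientation histograms, and weighted by magnitude and by a Gaussian window. The window's σ is half the 16-pixel width, which matches Lowe's paper. One simplification is labeled here: each gradient is hard-binned into a single cell and orientation bin, instead of having its contribution divided among nearby histograms as the slide describes. The function name is hypothetical.

```python
import math


def sift_like_descriptor(image, kx, ky, theta, half=8, n_cells=4, n_bins=8):
    """Simplified 4 x 4 x 8 = 128-D descriptor around keypoint (kx, ky) with orientation theta.

    image: blurred image at the keypoint's scale, as a list of rows.
    Simplification: hard binning, no interpolation between neighbouring histograms.
    """
    cell = 2 * half / n_cells                    # 16 / 4 = 4 pixels per cell
    desc = [0.0] * (n_cells * n_cells * n_bins)
    cos_t, sin_t = math.cos(-theta), math.sin(-theta)
    r = int(math.ceil(half * math.sqrt(2)))      # covers the rotated 16x16 square
    for y in range(ky - r, ky + r + 1):
        for x in range(kx - r, kx + r + 1):
            if not (0 < y < len(image) - 1 and 0 < x < len(image[0]) - 1):
                continue
            ox, oy = x - kx, y - ky
            u = cos_t * ox - sin_t * oy          # location rotated by -theta
            v = sin_t * ox + cos_t * oy
            if not (-half <= u < half and -half <= v < half):
                continue
            dx = image[y][x + 1] - image[y][x - 1]
            dy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(dx, dy)
            angle = (math.atan2(dy, dx) - theta) % (2 * math.pi)    # direction relative to theta
            weight = math.exp(-(u * u + v * v) / (2 * half * half))  # far gradients count less
            cx = int((u + half) // cell)
            cy = int((v + half) // cell)
            b = int(angle / (2 * math.pi / n_bins)) % n_bins
            desc[(cy * n_cells + cx) * n_bins + b] += magnitude * weight
    return desc


img = [[float(x) for x in range(41)] for y in range(41)]   # uniform gradient pointing along +x
d = sift_like_descriptor(img, 20, 20, 0.0)
print(len(d))                                               # 128
```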

•  8 orientation bins per histogram and a 4x4 histogram array yield 8 × 4 × 4 = 128 numbers.
•  So a SIFT descriptor is a length-128 vector, which is invariant to rotation (because we rotated the descriptor) and scale (because we worked with the scaled image from DoG)
•  We can compare each vector from image A to each vector from image B to find matching keypoints!
–  Euclidean “distance” between descriptor vectors gives a good measure of keypoint similarity
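A brute-force sketch of that comparison. Every descriptor from image A is compared with every descriptor from image B, and the closest one by Euclidean distance is kept. Toy 4-D vectors stand in for 128-D descriptors, and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def nearest_neighbours(descs_a, descs_b):
    """For each descriptor from image A, the index of the closest descriptor from image B."""
    return [min(range(len(descs_b)), key=lambda j: euclidean(d, descs_b[j])) for d in descs_a]


# Toy 4-D vectors standing in for 128-D SIFT descriptors.
image_a = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
image_b = [[0.0, 0.9, 0.1, 0.0], [0.95, 0.0, 0.0, 0.05]]
print(nearest_neighbours(image_a, image_b))   # [1, 0]
```

This is quadratic in the number of features. For large databases (such as the 30,000-feature experiments below), approximate nearest-neighbour search is the usual substitute.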

•  Adding robustness to illumination changes:

•  Remember that the descriptor is made of gradients (differences between pixels), so it’s already invariant to changes in brightness (e.g. adding 10 to all image pixels yields the exact same descriptor)
•  A higher-contrast photo will increase the magnitude of gradients linearly. So, to correct for contrast changes, normalize the vector (scale to length 1.0)
•  Very large image gradients are usually from unreliable 3D illumination effects (glare, etc.). So, to reduce their effect, clamp all values in the vector to be ≤ 0.2 (an experimentally tuned value). Then normalize the vector again.
•  The result is a vector that is fairly invariant to illumination changes.
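The three illumination steps map directly onto a few lines: normalize, clamp at 0.2, normalize again. A sketch with hypothetical function names, assuming descriptor entries are non-negative histogram values:

```python
import math


def normalize(vec):
    """Scale a vector to unit Euclidean length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def illumination_normalize(desc, clamp=0.2):
    """Unit-normalize (contrast), clamp large entries (glare), then renormalize."""
    unit = normalize(desc)
    clamped = [min(v, clamp) for v in unit]
    return normalize(clamped)


d = [3.0, 1.0, 1.0, 0.5]
a = illumination_normalize(d)
b = illumination_normalize([10 * v for v in d])   # same patch, 10x the contrast
print(a)
print(b)                                          # matches a up to rounding
```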

Sensitivity to number of histogram orientations
David G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, 60, 2 (2004), pp. 91-110

Feature stability to noise
•  Match features after random change in image scale & orientation, with differing levels of image noise
•  Find nearest neighbor in database of 30,000 features

Feature stability to affine change
•  Match features after random change in image scale & orientation, with 2% image noise, and affine distortion
•  Find nearest neighbor in database of 30,000 features

Distinctiveness of features
•  Vary size of database of features, with 30 degree affine change, 2% image noise
•  Measure % correct for single nearest neighbor match

Ratio of distances reliable for matching
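A sketch of matching with the distance ratio. A match is kept only when the nearest descriptor is clearly closer than the second nearest. The 0.8 threshold is the value used in Lowe's paper. The function name and toy 2-D vectors are illustrative.

```python
import math


def ratio_test_matches(descs_a, descs_b, max_ratio=0.8):
    """Pairs (i, j) where B's nearest descriptor j is clearly closer than the runner-up."""
    matches = []
    for i, d in enumerate(descs_a):
        ranked = sorted((math.sqrt(sum((p - q) ** 2 for p, q in zip(d, e))), j)
                        for j, e in enumerate(descs_b))
        if len(ranked) >= 2 and ranked[0][0] < max_ratio * ranked[1][0]:
            matches.append((i, ranked[0][1]))
    return matches


image_a = [[1.0, 0.0], [0.5, 0.5], [0.1, 0.9]]
image_b = [[1.0, 0.0], [0.0, 1.0]]
print(ratio_test_matches(image_a, image_b))   # [(0, 0), (2, 1)]
```

The middle vector is equally far from both candidates, so it is dropped as ambiguous, even though a plain nearest-neighbour search would still have returned a match for it.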

Nice SIFT resources
•  VLFeat toolbox:
–  http://www.vlfeat.org/overview/sift.html
•  An online tutorial:
http://www.aishack.in/2010/05/sift-scale-invariant-feature-transform/
•  Wikipedia:
http://en.wikipedia.org/wiki/Scale-invariant_feature_transform

Applications of local invariant features
•  Wide baseline stereo
•  Motion tracking
•  Panoramas
•  Mobile robot navigation
•  3D reconstruction
•  Recognition
•  …

Automatic mosaicing
http://www.cs.ubc.ca/~mbrown/autostitch/autostitch.html
Wide baseline stereo
[Image from T. Tuytelaars ECCV 2006 tutorial]

Recognition of specific objects, scenes
Schmid and Mohr 1997; Sivic and Zisserman, 2003; Rothganger et al. 2003; Lowe 2002

What have we learned this week?
–  Motivation
–  Requirements, invariances   [Previous lecture (#6)]
[Today (#7)]
Some background reading: R. Szeliski, Ch. 14.1.1; David Lowe, IJCV 2004
