06 Local Features


C280 Computer Vision
Prof. Trevor Darrell
[email protected]

Lecture 6: Local Features
Last Time: Image Pyramids
• Review of Fourier Transform
• Sampling and Aliasing
• Image Pyramids
• Applications: Blending and noise removal
Today: Feature Detection and Matching
• Local features
• Pyramids for invariant feature detection
• Invariant descriptors
• Matching
Image matching

by Diva Sian

by swashford
Harder case

by Diva Sian
by scgbt
Harder still?

NASA Mars Rover images


Answer below (look for tiny colored squares…)

NASA Mars Rover images


with SIFT feature matches
Figure by Noah Snavely
Local features and alignment

• We need to match (align) images


• Global methods are sensitive to occlusion, lighting, and parallax
effects, so look for local features that match well.
• How would you do it by eye?

[Darya Frolova and Denis Simakov]


Local features and alignment
• Detect feature points in both images

[Darya Frolova and Denis Simakov]


Local features and alignment
• Detect feature points in both images
• Find corresponding pairs

[Darya Frolova and Denis Simakov]


Local features and alignment
• Detect feature points in both images
• Find corresponding pairs
• Use these pairs to align images

[Darya Frolova and Denis Simakov]


Local features and alignment
• Problem 1:
– Detect the same point independently in both
images

no chance to match!

We need a repeatable detector

[Darya Frolova and Denis Simakov]


Local features and alignment
• Problem 2:
– For each point correctly recognize the
corresponding one

We need a reliable and distinctive descriptor

[Darya Frolova and Denis Simakov]


Geometric transformations
Photometric transformations

Figure from T. Tuytelaars ECCV 2006 tutorial


And other nuisances…
• Noise
• Blur
• Compression artifacts
• …
Invariant local features
Subset of local feature types designed to be invariant to
common geometric and photometric transformations.

Basic steps:
1) Detect distinctive interest points
2) Extract invariant descriptors

Figure: David Lowe


Main questions
• Where will the interest points come from?
– What are salient features that we’ll detect in
multiple views?
• How to describe a local region?
• How to establish correspondences, i.e., compute matches?
Finding Corners

Key property: in the region around a corner, the image gradient has
two or more dominant directions. Corners are repeatable and distinctive.

C. Harris and M. Stephens. "A Combined Corner and Edge Detector."
Proceedings of the 4th Alvey Vision Conference: pages 147–151.

Source: Lana Lazebnik


Corners as distinctive interest points
We should easily recognize the point by looking through a small window.
Shifting a window in any direction should give a large change in intensity.

“flat” region: no change in all directions
“edge”: no change along the edge direction
“corner”: significant change in all directions

Source: A. Efros
Harris Detector formulation

Change of intensity for the shift [u,v]:

$E(u,v) = \sum_{x,y} w(x,y)\,[I(x+u,\,y+v) - I(x,y)]^2$

where $w(x,y)$ is the window function, $I(x+u,\,y+v)$ the shifted intensity, and $I(x,y)$ the intensity.

Window function $w(x,y)$: 1 in window, 0 outside, or a Gaussian.
Source: R. Szeliski
Harris Detector formulation
This measure of change can be approximated by:

$E(u,v) \approx [u\ v]\, M \begin{bmatrix} u \\ v \end{bmatrix}$

where M is a 2×2 matrix computed from image derivatives:

$M = \sum_{x,y} w(x,y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$

(sum over the image region we are checking for a corner; the off-diagonal entries are the gradient with respect to x times the gradient with respect to y)
What does this matrix reveal?

First, consider an axis-aligned corner:

$M = \sum \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix} = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}$

This means the dominant gradient directions align with the x or y axis.
If either λ is close to 0, then this is not a corner, so look for
locations where both eigenvalues are large.
What if we have a corner that is not aligned with the
image axes?
Slide credit: David Jacobs
General Case
1 0  1
Since M is symmetric, we have M  R   R
 0 2 
We can visualize M as an ellipse with axis
lengths determined by the eigenvalues and
orientation determined by R
direction of the
fastest change
direction of the
slowest change

(max)-1/2
(min)-1/2

Slide adapted form Darya Frolova, Denis Simakov.


Interpreting the eigenvalues
Classification of image points using eigenvalues of M:

• “Corner”: λ1 and λ2 are large, λ1 ~ λ2; E increases in all directions
• “Edge”: λ1 >> λ2 (or λ2 >> λ1)
• “Flat” region: λ1 and λ2 are small; E is almost constant in all directions
Corner response function
R  det( M )   trace( M ) 2  12   (1  2 ) 2
α: constant
t t (0.04
(0 04 tto 0.06)
0 06)
“Edge”
R<0 “Corner”
R>0

|R| small
ll
“Flat” “Edge”
region R<0
Harris Corner Detector
• Algorithm steps:
– Compute the M matrix within all image windows to get their R scores
– Find points with large corner response (R > threshold)
– Take the points of local maxima of R (see the sketch below)
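Below is a minimal NumPy/SciPy sketch of these steps, assuming a grayscale image; the Gaussian window sigma, α = 0.05 (within the 0.04–0.06 range above), the relative threshold, and the suppression neighborhood are illustrative choices, not values from the slides.

```python
# Minimal Harris detector sketch (NumPy/SciPy).
import numpy as np
from scipy import ndimage

def harris_response(img, sigma=1.0, alpha=0.05):
    # Image derivatives Ix, Iy
    Ix = ndimage.sobel(img.astype(float), axis=1)
    Iy = ndimage.sobel(img.astype(float), axis=0)
    # Entries of M: products of derivatives summed under a Gaussian window
    Sxx = ndimage.gaussian_filter(Ix * Ix, sigma)
    Syy = ndimage.gaussian_filter(Iy * Iy, sigma)
    Sxy = ndimage.gaussian_filter(Ix * Iy, sigma)
    # R = det(M) - alpha * trace(M)^2, per pixel
    return (Sxx * Syy - Sxy ** 2) - alpha * (Sxx + Syy) ** 2

def harris_corners(img, rel_threshold=0.01, nms_size=5):
    R = harris_response(img)
    strong = R > rel_threshold * R.max()                   # large corner response
    local_max = R == ndimage.maximum_filter(R, nms_size)   # local maxima of R
    return np.argwhere(strong & local_max)                 # (row, col) corner points
```

For example, `harris_corners(gray_image)` returns an array of (row, col) corner positions.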
Harris Detector: Workflow

Slide adapted from Darya Frolova, Denis Simakov, Weizmann Institute.


Harris Detector: Workflow
Compute corner response R
Harris Detector: Workflow
Find points with large corner response: R>threshold
Harris Detector: Workflow
Take only the points of local maxima of R
Harris Detector: Workflow
Harris Detector: Properties
• Rotation invariance

Ellipse rotates but its shape (i.e. eigenvalues) remains the same.

Corner response R is invariant to image rotation.


Harris Detector: Properties

• Not invariant to image scale

At a fine scale, all points along the curve are classified as edges; only
at a coarser scale does it register as a corner.
• How can we detect scale invariant
interest points?
How to cope with transformations?
• Exhaustive search
• Invariance
• Robustness
Exhaustive search
• Multi-scale approach

Slide from T. Tuytelaars ECCV 2006 tutorial
Invariance
• Extract patch from each image individually
Automatic scale selection
• Solution:
– Design a function on the region which is “scale invariant” (the same
for corresponding regions, even if they are at different scales).
Example: average intensity. For corresponding regions (even of
different sizes) it will be the same.
– For a point in one image, we can consider it as a function of region
size (patch width).

(Figure: f plotted against region size for Image 1 and Image 2, where Image 2 is at scale 1/2.)


Automatic scale selection
• Common approach:
Take a local maximum of this function.

Observation: the region size for which the maximum is achieved should be
invariant to image scale.

Important: this scale-invariant region size is found in each image
independently!

(Figure: f peaks at region size s1 in Image 1 and s2 in Image 2, scale = 1/2.)

Automatic Scale Selection
$f(I_{i_1 \ldots i_m}(x, \sigma)) = f(I_{i_1 \ldots i_m}(x', \sigma'))$

Same operator responses if the patch contains the same image up to a scale factor.

K. Grauman, B. Leibe
Automatic Scale Selection
• Function responses for increasing scale (scale signature)

$f(I_{i_1 \ldots i_m}(x, \sigma))$   $f(I_{i_1 \ldots i_m}(x', \sigma'))$

K. Grauman, B. Leibe
Scale selection
• Use the scale determined by detector to compute
descriptor in a normalized frame

[Images from T. Tuytelaars]


What Is A Useful Signature Function?
• Laplacian-of-Gaussian = “blob” detector
K. Grauman, B. Leibe
Characteristic scale
We define the characteristic scale as the scale that produces a peak of
the Laplacian response.

characteristic scale
T. Lindeberg (1998). "Feature detection with automatic scale selection."
International Journal of Computer Vision 30 (2): pp. 77–116. Source: Lana Lazebnik
Laplacian-of-Gaussian (LoG)
• Interest points: local maxima in scale space of the Laplacian-of-Gaussian (sketched below)

$L = \sigma^2 \left( L_{xx}(\sigma) + L_{yy}(\sigma) \right)$

⇒ List of (x, y, σ)

K. Grauman, B. Leibe
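A small sketch of this detector, assuming SciPy is available: the σ² factor scale-normalizes the LoG responses so they are comparable across scales, and interest points are joint local maxima over (x, y, σ). The σ list and threshold are illustrative.

```python
# Scale-space LoG blob detector sketch.
import numpy as np
from scipy import ndimage

def log_interest_points(img, sigmas=(1.6, 2.3, 3.2, 4.5, 6.4), threshold=0.03):
    img = img.astype(float)
    # Negate so that bright blobs give positive peaks
    stack = np.stack([-(s ** 2) * ndimage.gaussian_laplace(img, s)
                      for s in sigmas])
    # Local maxima jointly over scale and space
    peaks = (stack == ndimage.maximum_filter(stack, size=3)) & (stack > threshold)
    s_idx, ys, xs = np.nonzero(peaks)
    return [(x, y, sigmas[i]) for i, y, x in zip(s_idx, ys, xs)]  # list of (x, y, σ)
```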
Scale-space blob detector: Example

Source: Lana Lazebnik

Key point localization with DoG
• Detect maxima of difference-of-Gaussian (DoG) in scale space
• Then reject points with low contrast (threshold)
• Eliminate edge responses

(Pyramid construction: blur, subtract, resample.)

Candidate keypoints: list of (x, y, σ) (see the sketch below)
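A hedged sketch of the DoG extrema step, simplified to a single octave with no resampling and no edge-response test; sigma0, k, and the contrast threshold are illustrative values, not the slide's.

```python
# Single-octave DoG sketch: blur at successive scales, subtract adjacent
# levels, and keep scale-space extrema that pass a contrast threshold.
import numpy as np
from scipy import ndimage

def dog_candidates(img, sigma0=1.6, k=2 ** (1 / 3), n_levels=6, contrast=0.03):
    sigmas = [sigma0 * k ** i for i in range(n_levels)]
    blurred = [ndimage.gaussian_filter(img.astype(float), s) for s in sigmas]
    dog = np.stack([b2 - b1 for b1, b2 in zip(blurred, blurred[1:])])
    high_contrast = np.abs(dog) > contrast          # reject low-contrast points
    maxima = dog == ndimage.maximum_filter(dog, size=3)
    minima = dog == ndimage.minimum_filter(dog, size=3)
    s, y, x = np.nonzero((maxima | minima) & high_contrast)
    return [(xi, yi, sigmas[si]) for si, yi, xi in zip(s, y, x)]  # (x, y, σ)
```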
Example of keypoint detection

(a) 233×189 image
(b) 832 DoG extrema
(c) 729 left after peak value threshold
(d) 536 left after testing ratio of principal curvatures (removing edge responses)
Scale Invariant Detection: Summary

• Given: two images of the same scene with a large scale difference between them
• Goal: find the same interest points independently in each image
• Solution: search for maxima of suitable functions in scale and in space (over the image)
Main questions
• Where will the interest points come from?
– What are salient features that we’ll detect in
multiple views?
• How to describe a local region?
• How to establish correspondences, i.e., compute matches?
Local descriptors
• We know how to detect points
• Next question:
How to describe them for matching?

?
Point descriptor should be:
1. Invariant
2. Distinctive
Local descriptors
• Simplest descriptor: list of intensities within a patch.
• What is this going to be invariant to?
Feature descriptors
Disadvantage of patches as descriptors:
• Small shifts can affect matching score a lot

Solution: histograms

Source: Lana Lazebnik
Feature descriptors: SIFT
Scale Invariant Feature Transform
Descriptor computation:
• Divide patch into 4x4 sub-patches: 16 cells
• Compute histogram of gradient orientations (8 reference angles) for all
pixels inside each sub-patch
• Resulting descriptor: 4x4x8 = 128 dimensions (see the sketch below)

David G. Lowe. "Distinctive image features from scale-invariant keypoints." IJCV 60 (2), pp. 91-110, 2004.
Source: Lana Lazebnik
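To make the 4x4x8 layout concrete, here is a simplified sketch for one 16×16 patch. Real SIFT additionally applies Gaussian weighting, trilinear interpolation across bins, and clipped renormalization, all omitted here.

```python
# Simplified SIFT-style descriptor: 4x4 cells x 8 orientation bins = 128-d.
import numpy as np

def sift_like_descriptor(patch):
    assert patch.shape == (16, 16)
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)                          # gradient magnitude
    ori = np.arctan2(gy, gx) % (2 * np.pi)          # orientation in [0, 2π)
    bins = np.minimum((ori / (2 * np.pi / 8)).astype(int), 7)
    desc = np.zeros((4, 4, 8))                      # 4x4 cells, 8 bins each
    for i in range(16):
        for j in range(16):
            desc[i // 4, j // 4, bins[i, j]] += mag[i, j]
    desc = desc.ravel()                             # 128-dimensional vector
    return desc / (np.linalg.norm(desc) + 1e-7)     # unit-normalize
```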
Rotation Invariant Descriptors

• Harris corner response measure: depends only on the eigenvalues of the matrix M
Rotation Invariant Descriptors

• Find local orientation


Dominant direction of gradient for the image patch

• Rotate patch according to this angle


This puts the patches into a canonical
orientation.
Rotation Invariant Descriptors


Image from Matthew Brown


Feature descriptors: SIFT
Extraordinarily robust matching technique
• Can handle changes in viewpoint
– Up to about a 60-degree out-of-plane rotation
• Can handle significant changes in illumination
– Sometimes even day vs. night (below)
• Fast and efficient—can run in real time
• Lots of code available
– http://people.csail.mit.edu/albert/ladypack/wiki/index.php/Known_implementations_of_SIFT

Steve Seitz
Working with SIFT descriptors
• One image yields:
– n 128-dimensional descriptors: each one is a histogram of the gradient orientations within a patch [n x 128 matrix]
– n scale parameters specifying the size of each patch [n x 1 vector]
– n orientation parameters specifying the angle of the patch [n x 1 vector]
– n 2d points giving positions of the patches [n x 2 matrix]
(see the usage sketch below)
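A short usage sketch with OpenCV's SIFT (`cv2.SIFT_create`, available in the main package since OpenCV 4.4), producing the four outputs listed above; the image path is illustrative.

```python
import cv2
import numpy as np

img = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE)  # illustrative path
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)

print(descriptors.shape)                             # n x 128 matrix
scales = np.array([kp.size for kp in keypoints])     # n x 1: patch sizes
angles = np.array([kp.angle for kp in keypoints])    # n x 1: patch angles
points = np.array([kp.pt for kp in keypoints])       # n x 2: positions
```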
Affine Invariant Detection
(a proxy for invariance to perspective transformations)

• Above we considered:
Similarity transform (rotation + uniform scale)

• Now we go on to:
Affine transform (rotation + non-uniform scale)
Mikolajczyk: Harris Laplace
1. Initialization: Multiscale Harris corner detection (compute the Harris function, detect local maxima)


Mikolajczyk: Harris Laplace
1. Initialization: Multiscale Harris corner
detection
2. Scale selection based on Laplacian
Harris points

Harris-Laplace points
Mikolajczyk: Harris Affine
► Based on Harris Laplace
► Using normalization / deskewing

rotate rescale
Mikolajczyk: Harris Affine
1. Detect multi-scale Harris points
2. Automatically select the scales
3. Adapt affine shape based on second order moment matrix
4. Refine point location
Mikolajczyk: affine invariant interest points
1. Initialization: Multiscale Harris corner detection
2. Iterative algorithm:
1. Normalize window (deskewing)
2. Select integration scale (max. of LoG)
3. Select differentiation scale (max. of λmin/λmax)
4. Detect spatial localization (Harris)
5. Compute new affine transformation
6. Go to step 2 (unless stop criterion is met)
Harris Affine
Affine Invariant Detection :
Summary
• Under affine transformation, we do not know in advance
shapes of the corresponding regions
• Ellipse given by geometric covariance matrix of a region
robustly approximates this region
• For corresponding regions ellipses also correspond

Other Methods:
1. Search for extremum along rays [Tuytelaars, Van Gool]
2. Maximally Stable Extremal Regions [Matas et al.]
Feature detector and descriptor summary
• Stable (repeatable) feature points can be detected regardless of image changes
– Scale: search for correct scale as maximum of appropriate function
– Affine: approximate regions with ellipses (this operation is affine invariant)

• Invariant and distinctive descriptors can be computed
– Invariant moments
– Normalizing with respect to scale and affine transformation
More on feature detection/description
Main questions
• Where will the interest points come from?
– What are salient features that we’ll detect in
multiple views?
• How to describe a local region?
• How to establish correspondences, i.e., compute matches?
Feature descriptors
We know how to detect and describe good points
Next question: How to match them?

?
Feature matching
Given a feature in I1, how to find the best match in I2?
1. Define a distance function that compares two descriptors
2. Test all the features in I2, find the one with minimum distance
Feature distance
How to define the difference between two features f1, f2?
• Simple approach is SSD(f1, f2)
– sum of square differences between entries of the two descriptors
– can give good scores to very ambiguous (bad) matches

(Figure: feature f1 in image I1 and feature f2 in image I2.)
Feature distance
How to define the difference between two features f1, f2?
• Better approach: ratio distance = SSD(f1, f2) / SSD(f1, f2’)
– f2 is best SSD match to f1 in I2
– f2’ is 2nd best SSD match to f1 in I2
– gives values close to 1 for ambiguous matches, small values for distinctive ones

(Figure: f1 in I1; best match f2 and second-best match f2’ in I2. A matching sketch follows below.)
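A sketch of ratio-distance matching over two descriptor arrays (shape n×128): a match is kept only when best-SSD / second-best-SSD is below a cutoff. The 0.8 value follows Lowe's commonly cited suggestion but is a tunable choice.

```python
import numpy as np

def match_ratio(desc1, desc2, ratio=0.8):
    matches = []
    for i, f1 in enumerate(desc1):
        ssd = np.sum((desc2 - f1) ** 2, axis=1)   # SSD to every feature in I2
        best, second = np.argsort(ssd)[:2]        # two nearest neighbors
        if ssd[best] < ratio * ssd[second]:       # unambiguous: ratio is small
            matches.append((i, best))             # (index in I1, index in I2)
    return matches
```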
Evaluating the results
How can we measure the performance of a feature matcher?

(Figure: candidate matches with feature distances 50, 75, and 200.)
True/false positives

(Figure: the match at distance 50 is a true match; the match at distance 200 is a false match.)

The distance threshold affects performance:
• True positives = # of detected matches that are correct
– Suppose we want to maximize these—how to choose threshold?
• False positives = # of detected matches that are incorrect
– Suppose we want to minimize these—how to choose threshold?
Evaluating the results
How can we measure the performance of a feature matcher?

true positive rate = # true positives / # matching features (positives)

false positive rate = # false positives / # unmatched features (negatives)

(Figure: one operating point at false positive rate 0.1 and true positive rate 0.7.)
Evaluating the results
How can we measure the performance of a feature matcher?
ROC curve (“Receiver Operating Characteristic”)

true positive rate = # true positives / # matching features (positives)

false positive rate = # false positives / # unmatched features (negatives)

(Figure: ROC curve passing through the operating point (0.1, 0.7).)
ROC Curves
• Generated by counting # correct/incorrect matches, for different thresholds
• Want to maximize area under the curve (AUC)
• Useful for comparing different feature matching methods (a sketch follows below)
• For more info: http://en.wikipedia.org/wiki/Receiver_operating_characteristic
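A sketch of generating such a curve, assuming each candidate match has a distance and a ground-truth correct/incorrect label; sorting by distance is equivalent to sweeping the threshold.

```python
import numpy as np

def roc_curve(distances, is_correct):
    order = np.argsort(distances)                  # sweep threshold low -> high
    correct = np.asarray(is_correct, dtype=float)[order]
    tp = np.cumsum(correct)                        # accepted and correct
    fp = np.cumsum(1 - correct)                    # accepted but incorrect
    tpr = tp / correct.sum()                       # true positives / positives
    fpr = fp / (len(correct) - correct.sum())      # false positives / negatives
    # Trapezoidal area under the curve (AUC)
    auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)
    return fpr, tpr, auc
```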
Advanced local features topics
• Self-Similarity
• Space-Time

Human actions in computer vision

Ivan Laptev
INRIA Rennes, France
[email protected]

Summer school, June 30 - July 11, 2008, Lotus Hill, China


Motivation

Goal:
Interpretation
of dynamic
scenes

… non-rigid object motion … camera motion … complex background motion

Common methods: camera stabilization, segmentation, tracking
Common problems: complex background motion, changes in appearance
 No global assumptions about the scene
Space-time
No global assumptions ⇒ consider local spatio-temporal neighborhoods
Space-Time interest points
What neighborhoods to consider?

Distinctive neighborhoods ⇐ high image variation in space and time ⇐ look at the distribution of the gradient

Definitions:
• Original image sequence
• Space-time Gaussian with covariance
• Gaussian derivative of
• Space-time gradient
• Second-moment matrix
Space-Time interest points

Properties of the second-moment matrix:

It defines a second-order approximation for the local distribution of the
gradient within a neighborhood:
• 1D space-time variation, e.g. a moving bar
• 2D space-time variation, e.g. a moving ball
• 3D space-time variation, e.g. a jumping ball

Large eigenvalues can be detected by the local maxima of H over (x, y, t)
(similar to the Harris operator [Harris and Stephens, 1988]); a sketch follows below.
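Since the slide's formulas were lost in extraction, here is a hedged sketch in the spirit of Laptev's operator: build a 3×3 second-moment matrix μ over (x, y, t) from smoothed gradient products and take local maxima of H = det(μ) − k·trace(μ)³. The scales and k are illustrative choices.

```python
import numpy as np
from scipy import ndimage

def spacetime_interest(video, sigma=2.0, tau=2.0, k=0.005):
    v = video.astype(float)                       # video shape: (t, y, x)
    L = ndimage.gaussian_filter(v, (tau, sigma, sigma))  # space-time smoothing
    Lt, Ly, Lx = np.gradient(L)                   # space-time gradient
    grads = (Lx, Ly, Lt)
    # Smoothed products of derivatives form the 3x3 matrix mu at each voxel
    mu = np.empty(v.shape + (3, 3))
    for a in range(3):
        for b in range(3):
            mu[..., a, b] = ndimage.gaussian_filter(
                grads[a] * grads[b], (2 * tau, 2 * sigma, 2 * sigma))
    H = np.linalg.det(mu) - k * np.trace(mu, axis1=-2, axis2=-1) ** 3
    peaks = (H == ndimage.maximum_filter(H, size=5)) & (H > 0)
    return np.argwhere(peaks)                     # (t, y, x) interest points
```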


Space-Time interest points

(Detected events: accelerations, split/merge, appearance/disappearance.)
Space-Time interest points
Motion event detection
Space-Time interest points
Motion event detection: complex background
Spatio-temporal scale selection

Stability to size
changes, e.g.
camera zoom
Spatio-temporal scale selection

Selection of
temporal scales
captures the
frequency of
events
Sequence alignment
• Given a data sequence with the current moment t0, detect and classify
interest points in the time window of length tw: (t0, t0 − tw)

• Transform model features according to X and for each model feature
fm,i = (xm,i, ym,i, tm,i, σm,i, τm,i, cm,i) compute its distance di to the
closest data feature fd,j with cd,j = cm,i

• Define the “fit function” D of model configuration X as a sum of
distances of all features weighted w.r.t. their “age” (t0 − tm), such that
recent features get more influence on the matching
Sequence alignment

At each moment t0, minimize D with respect to X using the standard
Gauss-Newton minimization method.

(Figure: data features vs. model features.)
Experiments
Experiments
Today: Feature Detection and Matching
• Local features
• Pyramids for invariant feature detection
• Local descriptors
• Matching
Lots of applications
Features are used for:
• Image alignment (e.g., mosaics)
• 3D reconstruction
• Motion tracking
• Object recognition
• Indexing and database retrieval
• Robot navigation
• … other
Object recognition (David Lowe)
Snaptell
http://snaptell.com/demos/DemoLarge.htm
Photo Tourism
Slide Credits
• Bill Freeman
• Kristen Grauman
• Steve Seitz
• Ivan Laptev
• Tinne Tuytelaars
Next time: Texture
