Town - ComputerVision - L5
[Figure: σ = 1 pixel; σ = 3 pixels]
Scale Invariant Detection
• Consider regions (e.g. circles) of different sizes around a point
• Regions of corresponding sizes will look the same in both images
Scale Invariant Detection
• The problem: how do we choose corresponding circles independently in each image?
Feature Descriptors
Dr Chris Town
Feature detection
Suppose you’re looking for corners
Local measure of feature uniqueness
– How does the window change when you shift it?
“flat” region: no change in all directions
“edge”: no change along the edge direction
“corner”: significant change in all directions
Scale invariant detection
Key idea: find the scale that gives a local maximum response in both position and scale (use a Laplacian approximated by the difference between two Gaussian-filtered images with different sigmas)
[Figure: G0 = Image (high resolution); blur; G1 = (G0 * gaussian) ↓ 2; blur]
Source: Irani & Basri
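The scale-selection idea can be illustrated with a minimal sketch that uses only the Python standard library. It evaluates a difference-of-Gaussians response at the centre of a synthetic blob for several candidate sigmas and keeps the sigma with the strongest response. The helper names, the ratio k = 1.6 between the two sigmas, the candidate list and the synthetic Gaussian blobs are all assumptions made for illustration; they are not taken from the slides.

```python
import math

def gaussian(x, y, sigma):
    """Value of a normalised 2D Gaussian with standard deviation sigma."""
    return math.exp(-(x * x + y * y) / (2.0 * sigma * sigma)) / (2.0 * math.pi * sigma * sigma)

def dog_response_at_centre(image, sigma, k=1.6):
    """Response of G(k*sigma) - G(sigma) evaluated at the centre pixel only."""
    half = len(image) // 2
    total = 0.0
    for row_index, row in enumerate(image):
        for col_index, value in enumerate(row):
            dx, dy = col_index - half, row_index - half
            total += value * (gaussian(dx, dy, k * sigma) - gaussian(dx, dy, sigma))
    return total

def characteristic_scale(image, sigmas, k=1.6):
    """Candidate sigma whose DoG response at the centre has the largest magnitude."""
    return max(sigmas, key=lambda s: abs(dog_response_at_centre(image, s, k)))

def make_blob(size, blob_sigma):
    """Synthetic test image: a bright Gaussian blob centred in a square image."""
    half = size // 2
    return [[math.exp(-((x - half) ** 2 + (y - half) ** 2) / (2.0 * blob_sigma ** 2))
             for x in range(size)] for y in range(size)]

SIGMAS = [2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 6.0]
small = characteristic_scale(make_blob(61, 4.0), SIGMAS)
large = characteristic_scale(make_blob(61, 6.0), SIGMAS)
print(small, large)
```

The larger blob selects a larger sigma, which is the property that lets corresponding regions be chosen independently in each image. A full detector would search over position as well as scale; this sketch fixes the position at the blob centre to keep the example short.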
DoG approximation to LoG
• We can efficiently approximate the (scale-normalised)
Laplacian of a Gaussian with a difference of Gaussians:
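The approximation can be written out as follows. This follows the argument in Lowe (2004), cited in these notes; the symbol k (the constant ratio between the two sigmas) is notation assumed here because the slide's own formula did not survive extraction. The first relation is the heat diffusion equation satisfied by the Gaussian, the second replaces the derivative by a finite difference.

```latex
\frac{\partial G}{\partial \sigma} = \sigma \nabla^2 G
\quad\Longrightarrow\quad
\sigma \nabla^2 G \approx \frac{G(x, y, k\sigma) - G(x, y, \sigma)}{k\sigma - \sigma}
\quad\Longrightarrow\quad
G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\,\sigma^2 \nabla^2 G
```

The factor (k - 1) is constant over all scales, so it does not affect the location of extrema, and the σ² factor is exactly the scale normalisation.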
B. Leibe
Scale-Space Pyramid
• Multiple scales must be examined to identify scale-invariant features
• An efficient approach is to compute the Difference of Gaussian (DoG) pyramid (Burt & Adelson, 1983)
[Figure: Gaussian pyramid (blur, resample) and Laplacian pyramid (subtract). Labels: (I − F3 G3) x3, (I − F2 G2) x2, F1 G1 x1.]
Notice that each layer shows detail at a particular scale: these are, basically, bandpass-filtered versions of the image.
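The blur, resample and subtract steps can be sketched on a 1D signal, which keeps the code short while showing the same structure as the 2D case. This is a minimal sketch under stated assumptions: the 5-tap binomial kernel, the replicated borders and all function names are illustrative choices, and the reading of G as "blur then subsample" and F as "upsample then blur" is an interpretation of the figure labels rather than something the slides spell out.

```python
KERNEL = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]  # assumed 5-tap binomial low-pass filter

def blur(signal):
    """Low-pass filter a 1D signal, replicating the border samples."""
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, weight in enumerate(KERNEL):
            j = min(max(i + k - 2, 0), n - 1)
            acc += weight * signal[j]
        out.append(acc)
    return out

def reduce_level(signal):
    """G: blur, then keep every second sample."""
    return blur(signal)[::2]

def expand_level(signal, size):
    """F: insert zeros between samples, then blur (gain 2 compensates for the zeros)."""
    upsampled = [0.0] * size
    for i, value in enumerate(signal):
        upsampled[2 * i] = 2.0 * value
    return blur(upsampled)

def build_pyramids(signal, levels):
    """Return (gaussian, laplacian) pyramids as lists of levels, finest first."""
    gaussian = [list(signal)]
    for _ in range(levels - 1):
        gaussian.append(reduce_level(gaussian[-1]))
    laplacian = []
    for fine, coarse in zip(gaussian, gaussian[1:]):
        predicted = expand_level(coarse, len(fine))
        laplacian.append([a - b for a, b in zip(fine, predicted)])  # (I - F G) x
    laplacian.append(gaussian[-1])  # coarsest level is kept as a low-pass residual
    return gaussian, laplacian

def reconstruct(laplacian):
    """Invert the Laplacian pyramid by expanding and adding, coarse to fine."""
    current = laplacian[-1]
    for band in reversed(laplacian[:-1]):
        predicted = expand_level(current, len(band))
        current = [a + b for a, b in zip(band, predicted)]
    return current

signal = [float((i * 7) % 11) for i in range(16)]
gaussian, laplacian = build_pyramids(signal, 3)
restored = reconstruct(laplacian)
```

Each Laplacian level stores what the next coarser Gaussian level cannot predict, so the band-pass layers plus the coarsest low-pass level are enough to rebuild the original signal.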
Showing, at full resolution, the information captured at each level of a Gaussian (top) and Laplacian (bottom) pyramid.
SIFT – Scale Invariant Feature Transform
Therefore
(threshold)
• Eliminate edge responses
Candidate keypoints: list of (x, y, σ)
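A minimal sketch of how such a candidate list might be produced from a stack of difference-of-Gaussian layers is shown below. It keeps points that are strict extrema among their 26 neighbours in position and scale, drops weak responses with a contrast threshold, and rejects edge-like points with the principal-curvature ratio test from Lowe (2004). The function names, the threshold of 0.03 (which assumes pixel values scaled to [0, 1]) and the ratio r = 10 are assumptions for illustration, and sub-pixel refinement is omitted.

```python
def is_extremum(stack, s, y, x):
    """True if the sample is strictly above or below all 26 scale-space neighbours."""
    centre = stack[s][y][x]
    neighbours = [stack[s + ds][y + dy][x + dx]
                  for ds in (-1, 0, 1) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                  if (ds, dy, dx) != (0, 0, 0)]
    return centre > max(neighbours) or centre < min(neighbours)

def is_edge_like(layer, y, x, r=10.0):
    """Reject points whose 2x2 Hessian has a large ratio of principal curvatures."""
    dxx = layer[y][x + 1] - 2.0 * layer[y][x] + layer[y][x - 1]
    dyy = layer[y + 1][x] - 2.0 * layer[y][x] + layer[y - 1][x]
    dxy = (layer[y + 1][x + 1] - layer[y + 1][x - 1]
           - layer[y - 1][x + 1] + layer[y - 1][x - 1]) / 4.0
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0.0:
        return True  # curvatures of opposite sign (or zero): not a stable blob
    return trace * trace / det >= (r + 1.0) ** 2 / r

def candidate_keypoints(stack, sigmas, contrast_threshold=0.03):
    """Return a list of (x, y, sigma) for the interior layers of a DoG stack."""
    keypoints = []
    for s in range(1, len(stack) - 1):
        layer = stack[s]
        for y in range(1, len(layer) - 1):
            for x in range(1, len(layer[0]) - 1):
                if abs(layer[y][x]) < contrast_threshold:
                    continue
                if not is_extremum(stack, s, y, x):
                    continue
                if is_edge_like(layer, y, x):
                    continue
                keypoints.append((x, y, sigmas[s]))
    return keypoints

def blank(size=7):
    return [[0.0] * size for _ in range(size)]

# A compact blob-like peak in the middle layer is kept.
blob_stack = [blank(), blank(), blank()]
blob_stack[1][3][3] = 1.0
for dy, dx in ((0, 1), (0, -1), (1, 0), (-1, 0)):
    blob_stack[1][3 + dy][3 + dx] = 0.5

# A ridge (strong in one direction only) is rejected as an edge response.
ridge_stack = [blank(), blank(), blank()]
ridge_stack[1][3] = [0.9, 0.9, 0.99, 1.0, 0.99, 0.9, 0.9]

blob_keypoints = candidate_keypoints(blob_stack, [1.0, 1.6, 2.56])
ridge_keypoints = candidate_keypoints(ridge_stack, [1.0, 1.6, 2.56])
```

The ridge has a peak that passes the extremum and contrast checks, but its curvature across the ridge is far larger than along it, so the ratio test discards it.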
Slide credit: David Lowe
David G. Lowe. "Distinctive image features from scale-invariant keypoints." IJCV 60 (2), pp. 91-110, 2004.
Slide credit: Svetlana Lazebnik, Matthew Brown
Orientation Normalisation: Computation
[Lowe, SIFT, 1999]
• Compute orientation histogram
[Figure: orientation histogram over 0 to 2π]
Feature stability to noise
• Match features after random change in image scale & orientation, with differing levels of image noise
• Find nearest neighbor in database of 30,000 features
Slide adapted from David Lowe
D. Lowe, 2004
Slide credit: Steve Seitz
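The orientation histogram step can be sketched as follows: gradient directions in a patch around the keypoint are accumulated into bins covering 0 to 2π, weighted by gradient magnitude, and the peak bin gives the dominant orientation. The function names and the choice of 36 bins (10 degrees each, as in Lowe 2004) are assumptions; the Gaussian weighting and peak interpolation of the full method are left out. Angles are measured in image coordinates, with y increasing downwards.

```python
import math

def orientation_histogram(patch, num_bins=36):
    """Magnitude-weighted histogram of gradient directions over [0, 2*pi)."""
    histogram = [0.0] * num_bins
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]  # central differences
            gy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.atan2(gy, gx) % (2.0 * math.pi)
            index = int(angle / (2.0 * math.pi) * num_bins) % num_bins
            histogram[index] += magnitude
    return histogram

def dominant_orientation(patch, num_bins=36):
    """Centre angle (radians) of the histogram bin with the largest weight."""
    histogram = orientation_histogram(patch, num_bins)
    peak = max(range(num_bins), key=lambda i: histogram[i])
    return (peak + 0.5) * 2.0 * math.pi / num_bins

# Synthetic patch whose intensity increases equally in x and y.
diagonal_ramp = [[float(x + y) for x in range(8)] for y in range(8)]
theta = dominant_orientation(diagonal_ramp)
```

Rotating the descriptor window by the dominant orientation before computing the descriptor is what makes the resulting feature invariant to in-plane rotation.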
Feature matching
Slides from Steve Seitz and Rick Szeliski
Image stitching
Brown, Lowe, 2007
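Feature matching by nearest neighbour can be sketched in a few lines. Each query descriptor is compared with every database descriptor by Euclidean distance, and the match is kept only if the best distance is clearly smaller than the second best. That ratio test, with a threshold of 0.8, comes from Lowe (2004) rather than from these slides, and the function names and toy 2D descriptors are hypothetical. A brute-force search is used here for clarity; a database of tens of thousands of features would normally use an approximate index.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors of equal length."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def match_features(queries, database, max_ratio=0.8):
    """Return (query_index, database_index) pairs that pass the ratio test.

    Assumes the database holds at least two descriptors.
    """
    matches = []
    for query_index, query in enumerate(queries):
        ranked = sorted((squared_distance(query, entry), index)
                        for index, entry in enumerate(database))
        (best, best_index), (second, _) = ranked[0], ranked[1]
        # Compare squared distances, so the ratio is squared as well.
        if best < (max_ratio ** 2) * second:
            matches.append((query_index, best_index))
    return matches

database = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
queries = [[0.5, 0.0],   # clearly closest to database[0]
           [5.0, 0.0]]   # equally close to database[0] and database[1]
matches = match_features(queries, database)
```

The second query is ambiguous, so it is dropped rather than matched arbitrarily; this trades a few lost matches for far fewer false ones.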
Recognition with Local Features
• Image content is transformed into local features that are invariant to translation, rotation, and scale
• Goal: Verify if they belong to a consistent configuration
[Figure: Fourier transform = * pixel image; Gaussian pyramid = * pixel image; Laplacian pyramid = * pixel image]
Edge Fitting
• Edge Detection:
– The process of labeling the locations in the image where the gray
level’s “rate of change” is high.
• OUTPUT: “edgels”: locations, direction, strength
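A minimal sketch of an edgel detector is given below. It estimates the gray-level rate of change with the Sobel operator, which is an assumption here since the slide does not name a particular operator, and reports every pixel whose gradient magnitude reaches a threshold as a tuple (x, y, direction, strength). The direction is that of the gradient, which is perpendicular to the edge itself; the function name and threshold are hypothetical.

```python
import math

def detect_edgels(image, threshold):
    """Return (x, y, direction, strength) for interior pixels with a strong gradient."""
    edgels = []
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            # Sobel estimates of the horizontal and vertical derivatives.
            gx = ((image[y - 1][x + 1] + 2 * image[y][x + 1] + image[y + 1][x + 1])
                  - (image[y - 1][x - 1] + 2 * image[y][x - 1] + image[y + 1][x - 1]))
            gy = ((image[y + 1][x - 1] + 2 * image[y + 1][x] + image[y + 1][x + 1])
                  - (image[y - 1][x - 1] + 2 * image[y - 1][x] + image[y - 1][x + 1]))
            strength = math.hypot(gx, gy)
            if strength >= threshold:
                edgels.append((x, y, math.atan2(gy, gx), strength))
    return edgels

# Vertical step edge: dark on the left three columns, bright on the right three.
step_image = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
edgels = detect_edgels(step_image, threshold=1.0)
```

On the step image the edgels sit on the two columns either side of the step, all with the gradient pointing along +x. A practical detector would smooth first and thin the response to single-pixel edges.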
Framework for snakes
• A higher level process or a user initialises any curve close to the object boundary.
• The snake then starts deforming and moving towards the desired object boundary.
• In the end it completely “shrink-wraps” around the object.
courtesy
Modeling
• The contour is defined in the (x, y) plane of an image as a parametric curve v(s) = (x(s), y(s))
• The contour is said to possess an energy (Esnake), which is defined as the sum of three energy terms:
$$E_{\text{snake}} = E_{\text{internal}} + E_{\text{external}} + E_{\text{constraint}}$$
• The energy terms are defined such that the final position of the contour will have minimum energy (Emin)
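The energy of a discretised contour can be sketched as follows. The internal term uses the common finite-difference form with an elasticity weight alpha and a bending weight beta, and the external term sums a precomputed image-energy map at the contour points. These weights, the factor of one half, the function name and the decision to treat the constraint term as zero are all assumptions for illustration; the slides only state that the total is a sum of three terms.

```python
def snake_energy(points, energy_map, alpha=1.0, beta=1.0):
    """Return (internal, external) energy of a closed contour.

    points: list of integer (x, y) pixel positions, treated as a closed loop.
    energy_map: 2D list indexed [y][x], low where the contour should settle.
    The constraint term is assumed to be zero in this sketch.
    """
    n = len(points)
    internal = 0.0
    external = 0.0
    for i in range(n):
        px, py = points[i - 1]           # previous point (wraps around)
        cx, cy = points[i]
        nx, ny = points[(i + 1) % n]     # next point (wraps around)
        stretch = (nx - cx) ** 2 + (ny - cy) ** 2                   # first difference
        bend = (nx - 2 * cx + px) ** 2 + (ny - 2 * cy + py) ** 2    # second difference
        internal += 0.5 * (alpha * stretch + beta * bend)
        external += energy_map[cy][cx]
    return internal, external

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
flat_map = [[-1.0, -1.0], [-1.0, -1.0]]
internal, external = snake_energy(square, flat_map)
total = internal + external
```

Minimising this total over the point positions, for example by greedy local moves or gradient descent, is what makes the contour deform towards the object boundary.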
• The snake is also considered to behave like a thin metal strip giving rise to bending energy.
• Generated by the bending energy of the contour.
• Characteristics (refer diagram):
External energy of the contour (Eext)
• Image fitting
$$E_{\text{ext}} = \int_s E_{\text{image}}(v(s))\,ds$$
For example
[Videos: leafmv.mpg, dancemv.mpg]
Generating Functions
Since the wavelets are dilates, translates, and rotates of each other, such a transform
seeks to extract image structure in a way that may be invariant to dilation, translation,
and rotation of the original image or pattern.
Gabor wavelets
$$\psi_c(x, y) = e^{-\frac{x^2 + y^2}{2\sigma^2}} \cos(2\pi u_0 x)$$
$$\psi_s(x, y) = e^{-\frac{x^2 + y^2}{2\sigma^2}} \sin(2\pi u_0 x)$$
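A Gabor pair can be sampled on a pixel grid with a few lines of code. The sketch assumes the usual form of the pair, a Gaussian envelope with standard deviation sigma multiplied by cos(2π u0 x) for the even wavelet and sin(2π u0 x) for the odd one; the function names, the kernel radius and the parameter values are hypothetical.

```python
import math

def gabor_pair(x, y, sigma, u0):
    """Return (even, odd) Gabor values at (x, y) for a carrier along the x axis."""
    envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
    phase = 2.0 * math.pi * u0 * x
    return envelope * math.cos(phase), envelope * math.sin(phase)

def gabor_kernels(radius, sigma, u0):
    """Sample both wavelets on a (2*radius + 1) square grid centred on the origin."""
    coords = range(-radius, radius + 1)
    even = [[gabor_pair(x, y, sigma, u0)[0] for x in coords] for y in coords]
    odd = [[gabor_pair(x, y, sigma, u0)[1] for x in coords] for y in coords]
    return even, odd

even_kernel, odd_kernel = gabor_kernels(radius=6, sigma=2.0, u0=0.125)
```

Dilating sigma and 1/u0 together, and rotating the (x, y) coordinates before evaluation, would give the dilated and rotated members of the family; convolving an image with the even and odd kernels gives responses in phase quadrature.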
Wavelet pyramid = * pixel image
Ortho-normal transform (like Fourier transform), but with localized basis functions.
Steerable pyramid = * pixel image
Multiple orientations at one scale, multiple orientations at the next scale, the next scale…
Over-complete representation, but non-aliased subbands.