Town - ComputerVision - L5

Recap: Smoothing with a Gaussian
Computer Vision Recall: parameter σ is the “scale” / “width” / “spread” of the

Gaussian kernel, and controls the amount of smoothing.
Computer Science Tripos Part II

…
Dr Christopher Town
Dr Chris Town Dr Chris Town
Recap: Effect of σ on derivatives
σ = 1 pixel σ = 3 pixels
The apparent structures differ depending on Gaussian’s

scale parameter.
Larger values: larger scale edges detected

Smaller values: finer features detected
1
Scale Invariant Detection Scale Invariant Detection
• Consider regions (e.g. circles) of different sizes • The problem: how do we choose corresponding
around a point circles independently in each image?
• Regions of corresponding sizes will look the same
in both images
Scale Invariant Detection Scale Invariant Detection:

• Solution:
Summary
– Design a function on the region (circle), which is “scale • Given: two images of the same scene with a large
invariant” (the same for corresponding regions, even if scale difference between them
they are at different scales)
• Goal: find the same interest points independently
f Image 1 f Image 2
in each image
scale = 1/2 • Solution: search for extrema of suitable functions
in scale and in space (over the image)
Methods:
region size region size
1. Harris-Laplacian [Mikolajczyk, Schmid]: maximise Laplacian over scale,
Harris measure of corner response over the image
2. SIFT [Lowe]: maximise Difference of Gaussians over scale and space
Image Matching Invariant local features

-Algorithm for finding points and representing their patches should produce
similar results even when conditions vary
-Buzzword is “invariance”
– geometric invariance: translation, rotation, scale
– photometric invariance: brightness, exposure, …
Feature Descriptors
2
Feature detection Scale invariant detection
Suppose you’re looking for corners
Local measure of feature uniqueness
– How does the window change when you shift it?
Key idea: find scale that gives local maximum response in both
position and scale: use a Laplacian approximated by difference
between two Gaussian filtered images with different sigmas)
“flat” region: “edge”: “corner”:
no change in all no change along significant change
directions the edge direction in all directions
Slide adapted from Darya Frolova, Denis Simakov, Weizmann Institute.

Gaussian Pyramid The Gaussian Pyramid

Low resolution G4  (G3 * gaussian)  2
All the extra levels
add very little
blur )  2
G3  (G2 * gaussian
overhead for memory
blur
or computation! G2  (G1 * gaussian)  2
blur
G1  (G0 * gaussian)  2
G0  Image
blur
High resolution
Source: Irani & Basri Dr Chris Town Source: Irani & Basri Dr Chris Town
The Laplacian Pyramid Laplacian ~ Difference of Gaussian

Li  Gi  expand(Gi 1 )
Gaussian Pyramid Laplacian Pyramid
- =
Gn Ln  Gn
G2
- = L2
G1 DoG = Difference of Gaussians
L1 Cheap approximation – no derivatives needed.
- =
G0
L0
- =
- =
Why is this useful?
17 B. Leibe
DrSource:
ChrisIrani
Town
& Basri Dr Chris Town
3
DoG approximation to LoG
• We can efficiently approximate the (scale-normalised)
Laplacian of a Gaussian with a difference of Gaussians:
B. Leibe
Scale-Space Pyramid
• Multiple scales must be examined to identify scale-invariant
features
• An efficient function is to compute the Difference of Gaussian
(DOG) pyramid (Burt & Adelson, 1983)
Resample
Blur
Subtract
Gaussian pyramid
Laplacian pyramid algorithm

x1 G1 x1  x2 x2 x3
( I  F3G3 ) x3
( I  F2G2 ) x2
Notice that each layer shows detail
at a particular scale --- these are,
basically, bandpass filtered versions
of the image.
F1G1 x1
Laplacian pyramid
Dr Chris Town ( I  F1G1 ) x1 Dr Chris Town
4
Showing, at full resolution, the information captured at each
SIFT – Scale Invariant Feature Transform
level of a Gaussian (top) and Laplacian (bottom) pyramid.
http://www-bcs.mit.edu/people/adelson/pub_pdfs/pyramid83.pdf Dr Chris Town From: David Lowe (2004) Dr Chris Town
DoG approximates scale-normalised Laplacian of a Gaussian
(heat diffusion equation)
Octave increment in scale of the Gaussian Pyramid
followed by factor-of-two downsampling (for efficiency).

To achieve better performance, each octave i is further
divided into s intervals.
Remember that we defined neighbouring scales as
So starting with some , the next scale parameter will

be , followed by etc., so that after s sub-
levels of the pyramid we have a complete octave with
Therefore
5
Key point localization with DoG Example of Keypoint Detection

• Detect extrema of (a) 233x189 image
difference-of-Gaussian (b) 832 DoG extrema

(c) 729 left after peak
(DoG) in scale space value threshold
(d) 536 left after testing
• Then reject points with ratio of principle
curvatures (removing
low contrast edge responses)
(threshold)
• Eliminate edge
responses Candidate keypoints:
list of (x,y,σ)
Slide credit: David Lowe Dr Chris Town Slide credit: David Lowe Dr Chris Town
Feature Descriptors: SIFT Rotation Invariant Descriptors

• Scale Invariant Feature Transform • Find local orientation
• Descriptor computation: – Dominant direction of gradient
– Divide patch into 4x4 sub-patches: 16 cells for the image patch
– Compute histogram of gradient orientations (8 reference • Rotate patch according to this angle
angles) for all pixels inside each sub-patch – This puts the patches into a canonical orientation.
– Resulting descriptor: 4x4x8 = 128 dimensions
David G. Lowe. "Distinctive image features from scale-invariant keypoints.” IJCV 60 (2), pp.
91-110, 2004.
Slide credit: Svetlana Lazebnik Dr Chris Town Slide credit: Svetlana Lazebnik, Matthew Brown Dr Chris Town
6
Orientation Normalisation: Computation
Feature stability to noise
• Match features after random change in image scale &
[Lowe, SIFT, 1999] orientation, with differing levels of image noise
• Compute orientation histogram • Find nearest neighbor in database of 30,000 features
• Select dominant orientation

• Normalise: rotate to fixed orientation
0 2p
37
Slide adapted from David Lowe Dr Chris Town Dr Chris Town
Feature stability to affine change Distinctiveness of features

• Match features after random change in image scale & • Vary size of database of features, with 30 degree affine change,
orientation, with 2% image noise, and affine distortion 2% image noise
• Find nearest neighbor in database of 30,000 features • Measure % correct for single nearest neighbor match
Working with SIFT Descriptors SIFT

• One image yields:
– n 128-dimensional descriptors: each
one is a histogram of the gradient
orientations within a patch
• [n x 128 matrix]
– n scale parameters specifying the
size of each patch
• [n x 1 vector]
– n orientation parameters specifying
the angle of the patch
• [n x 1 vector]
– n 2D points giving positions of the
patches
• [n x 2 matrix]
Slide credit: Steve Seitz Dr Chris Town D. Lowe, 2004 Dr Chris Town
7
Feature matching Image stitching
Slides from Steve Seitz and Rick Szeliski Dr Chris Town Brown, Lowe, 2007 Dr Chris Town
Nearest-neighbor matching Approximate k-d tree matching
• Solve following problem for all feature vectors, x:

Key idea:
 Search k-d tree bins in
• Nearest-neighbour matching is the major computational order of distance from

bottleneck query
 Requires use of a priority
– Linear search performs dn2 operations for n features
queue
and d dimensions
– No exact methods are faster than linear search for d>10
– Approximate methods can be much faster, but at the
cost of missing some correct matches. Failure rate gets
worse for large datasets.
K-d tree construction K-d tree query

Simple 2D example
4 l1 6 l9
l5 7 l1
4 l1 6 l9
7 l1 q l6
l5
8 l3 l2 l3
l6 l2 5
l2 8 l3 l2 l3 9
5
l83 l10 10 l
9 7 l4 l5 l7 l6
l83 l10 10
l7 l4 l5 l7 l6 1
l4
2
11
1 2 l8 l10 l9
l4 11 2 5 4 11 8
l8 2 5 4 11 l10 8 l9
1 3 9 10 6 7
1 3 9 10 6 7
Slide credit: Anna Atramentov Slide credit: Anna Atramentov

8
Recognition with Local Features Fourier transform
• Image content is transformed into local features
that are invariant to translation, rotation, and
scale
• Goal: Verify if they belong to a consistent = *
configuration
Fourier Fourier bases pixel domain

transform are global: image
each transform
coefficient
Local Features, depends on all
e.g. SIFT
pixel locations.
49 Dr Chris Town Dr Chris Town
Slide credit: David Lowe
Gaussian pyramid Laplacian pyramid
= * = *
Gaussian pixel image Laplacian pixel image
pyramid pyramid
Overcomplete representation. Overcomplete representation.

Low-pass filters, sampled Transformed pixels represent
appropriately for their blur. Dr Chris Town bandpassed image information. Dr Chris Town
Edge Fitting
• Edge Detection:
– The process of labeling the locations in the image where the gray
level’s “rate of change” is high.
• OUTPUT: “edgels” locations,
direction, strength
• Edge Integration, Contour fitting:

– The process of combining “local” and perhaps sparse and non-
contiguous “edgel”-data into meaningful, long edge curves (or closed
contours) for segmentation
• OUTPUT: edges/curves consistent with the local data
9
Framework for snakes Modeling
• The contour is defined in the (x, y) plane of an image as a
• A higher level process or a user initialises any curve parametric curve
close to the object boundary. v(s)=(x(s), y(s))
• The snake then starts deforming and moving towards
the desired object boundary. • Contour is said to possess an energy (Esnake) which is defined as
the sum of the three energy terms.
• In the end it completely “shrink-wraps” around the
object. courtesy
Esnake  Eint ernal  Eexternal  Econstra int
• The energy terms are defined in a way such that the final position of
the contour will have minimum energy (Emin)
• Therefore our problem of detecting objects reduces to an energy

minimisation problem.
(Diagram courtesy “Snakes, shapes, gradient vector

flow”, Xu, Prince)
Dr Chris Town A. Poonawala Dr Chris Town
Internal Energy (Eint ) Elastic force

• Depends on the intrinsic properties of the curve.
• Generated by elastic potential energy of the curve.
• Sum of elastic energy and bending energy.
Felastic   v ss
Elastic Energy (Eelastic):
• The curve is treated as an elastic rubber band
possessing elastic potential energy. • Characteristics (refer diagram)
• It discourages stretching by introducing tension.
1 dv(s )
E elastic   ( s ) | v s | 2 ds
2 s
vs 
ds
• Weight (s) allows us to control elastic energy along

different parts of the contour. Considered to be
constant  for many applications.
• Responsible for shrinking of the contour.
A. Poonawala Dr Chris Town A. Poonawala Dr Chris Town
Bending Energy (Ebending): Bending force
• The snake is also considered to behave like a thin metal • Generated by the bending energy of the contour.
strip giving rise to bending energy. • Characteristics (refer diagram):
• It is defined as sum of squared curvature of the contour.

1
E b end in g   ( s ) | v ss | 2 ds
2 s
• (s) plays a similar role to (s).
• Bending energy is minimum for a circle. Initial curve Final curve deformed by
(High bending energy) bending force. (low bending
energy)
• Total internal energy of the snake can be defined as
1 • Thus the bending energy tries to smooth out the curve.
Eint  Eelastic Ebending   | vs |2 | vss |2)ds
s
2
A. Poonawala Dr Chris Town A. Poonawala Dr Chris Town
10
External energy of the contour (Eext)
• Image fitting
Eext   Eimage (v ( s )) ds
s
For example
A. Poonawala Dr Chris Town Dr Chris Town
leafmv.mpg
dancemv.mpg
http://www.robots.ox.ac.uk/~ab/dynamics.html Dr Chris Town Dr Chris Town
11
Generating Functions
Since the wavelets are dilates, translates, and rotates of each other, such a transform
seeks to extract image structure in a way that may be invariant to dilation, translation,
and rotation of the original image or pattern.
Gabor wavelets
x 2 y 2
 2
 c (x, y)  e 2 cos2u0 x 
u0=0 U0=0.1 U0=0.2
x2 y2
 2
 s (x, y)  e 2
sin2u0 x 
A. Torralba Dr Chris Town A. Torralba Dr Chris Town
Dilation and rotation Frequency, orientation and

symmetry (phase)
12
Wavelet (QMF) transform Steerable pyramid
Multiple
orientations at
Wavelet
= = one scale
*
pyramid *
Steerable pixel image
Ortho-normal pixel image pyramid
Multiple
transform (like orientations at
Fourier transform), the next scale
Over-complete
but with localized
representation,
basis functions.
the next scale… but non-aliased
subbands.
13

Town - ComputerVision - L5

Uploaded by

Copyright:

Available Formats

Town - ComputerVision - L5

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Town - ComputerVision - L5

Uploaded by

Copyright:

Available Formats

Recap: Smoothing with a Gaussian

Computer Vision Recall: parameter σ is the “scale” / “width” / “spread” of the

Computer Science Tripos Part II

Dr Chris Town Dr Chris Town

Recap: Effect of σ on derivatives

The apparent structures differ depending on Gaussian’s

Larger values: larger scale edges detected

Dr Chris Town Dr Chris Town

Dr Chris Town Dr Chris Town

Scale Invariant Detection Scale Invariant Detection:

Dr Chris Town Dr Chris Town

Image Matching Invariant local features

Slide adapted from Darya Frolova, Denis Simakov, Weizmann Institute.

Gaussian Pyramid The Gaussian Pyramid

The Laplacian Pyramid Laplacian ~ Difference of Gaussian

Dr Chris Town Dr Chris Town

Laplacian pyramid algorithm

Dr Chris Town ( I  F1G1 ) x1 Dr Chris Town

http://www-bcs.mit.edu/people/adelson/pub_pdfs/pyramid83.pdf Dr Chris Town From: David Lowe (2004) Dr Chris Town

DoG approximates scale-normalised Laplacian of a Gaussian

(heat diffusion equation)

Dr Chris Town Dr Chris Town

Octave increment in scale of the Gaussian Pyramid

followed by factor-of-two downsampling (for efficiency).

Remember that we defined neighbouring scales as

So starting with some , the next scale parameter will

Key point localization with DoG Example of Keypoint Detection

difference-of-Gaussian (b) 832 DoG extrema

Feature Descriptors: SIFT Rotation Invariant Descriptors

• Select dominant orientation

Feature stability to affine change Distinctiveness of features

Dr Chris Town Dr Chris Town

Working with SIFT Descriptors SIFT

Nearest-neighbor matching Approximate k-d tree matching

• Solve following problem for all feature vectors, x:

• Nearest-neighbour matching is the major computational order of distance from

Dr Chris Town Dr Chris Town

K-d tree construction K-d tree query

Slide credit: Anna Atramentov Slide credit: Anna Atramentov

Fourier Fourier bases pixel domain

Gaussian pyramid Laplacian pyramid

Overcomplete representation. Overcomplete representation.

• Edge Integration, Contour fitting:

Dr Chris Town Dr Chris Town

• Therefore our problem of detecting objects reduces to an energy

(Diagram courtesy “Snakes, shapes, gradient vector

Dr Chris Town A. Poonawala Dr Chris Town

Internal Energy (Eint ) Elastic force

• Weight (s) allows us to control elastic energy along

Bending Energy (Ebending): Bending force

• It is defined as sum of squared curvature of the contour.

A. Poonawala Dr Chris Town Dr Chris Town

http://www.robots.ox.ac.uk/~ab/dynamics.html Dr Chris Town Dr Chris Town

Dr Chris Town Dr Chris Town

Dr Chris Town Dr Chris Town

u0=0 U0=0.1 U0=0.2

A. Torralba Dr Chris Town A. Torralba Dr Chris Town

Dilation and rotation Frequency, orientation and

Dr Chris Town Dr Chris Town

Dr Chris Town Dr Chris Town