Town - ComputerVision - L5

Download as pdf or txt
Download as pdf or txt
You are on page 1of 13

Recap: Smoothing with a Gaussian

Computer Vision Recall: parameter σ is the “scale” / “width” / “spread” of the


Gaussian kernel, and controls the amount of smoothing.

Computer Science Tripos Part II



Dr Christopher Town

Dr Chris Town Dr Chris Town

Recap: Effect of σ on derivatives

σ = 1 pixel σ = 3 pixels

The apparent structures differ depending on Gaussian’s


scale parameter.

Larger values: larger scale edges detected


Smaller values: finer features detected
Dr Chris Town Dr Chris Town

Dr Chris Town Dr Chris Town

1
Scale Invariant Detection Scale Invariant Detection

• Consider regions (e.g. circles) of different sizes • The problem: how do we choose corresponding
around a point circles independently in each image?
• Regions of corresponding sizes will look the same
in both images

Dr Chris Town Dr Chris Town

Scale Invariant Detection Scale Invariant Detection:


• Solution:
Summary
– Design a function on the region (circle), which is “scale • Given: two images of the same scene with a large
invariant” (the same for corresponding regions, even if scale difference between them
they are at different scales)
• Goal: find the same interest points independently
f Image 1 f Image 2
in each image
scale = 1/2 • Solution: search for extrema of suitable functions
in scale and in space (over the image)
Methods:
region size region size
1. Harris-Laplacian [Mikolajczyk, Schmid]: maximise Laplacian over scale,
Harris measure of corner response over the image
2. SIFT [Lowe]: maximise Difference of Gaussians over scale and space

Dr Chris Town Dr Chris Town

Image Matching Invariant local features


-Algorithm for finding points and representing their patches should produce
similar results even when conditions vary
-Buzzword is “invariance”
– geometric invariance: translation, rotation, scale
– photometric invariance: brightness, exposure, …

Feature Descriptors
Dr Chris Town Dr Chris Town

2
Feature detection Scale invariant detection
Suppose you’re looking for corners
Local measure of feature uniqueness
– How does the window change when you shift it?

Key idea: find scale that gives local maximum response in both
position and scale: use a Laplacian approximated by difference
between two Gaussian filtered images with different sigmas)
“flat” region: “edge”: “corner”:
no change in all no change along significant change
directions the edge direction in all directions

Slide adapted from Darya Frolova, Denis Simakov, Weizmann Institute.


Dr Chris Town Dr Chris Town

Gaussian Pyramid The Gaussian Pyramid


Low resolution G4  (G3 * gaussian)  2
All the extra levels
add very little
blur )  2
G3  (G2 * gaussian
overhead for memory
blur
or computation! G2  (G1 * gaussian)  2

blur
G1  (G0 * gaussian)  2

G0  Image
blur

High resolution
Source: Irani & Basri Dr Chris Town Source: Irani & Basri Dr Chris Town

The Laplacian Pyramid Laplacian ~ Difference of Gaussian


Li  Gi  expand(Gi 1 )
Gaussian Pyramid Laplacian Pyramid
- =
Gn Ln  Gn
G2
- = L2
G1 DoG = Difference of Gaussians
L1 Cheap approximation – no derivatives needed.
- =
G0
L0
- =
- =
Why is this useful?
17 B. Leibe
DrSource:
ChrisIrani
Town
& Basri Dr Chris Town

3
DoG approximation to LoG
• We can efficiently approximate the (scale-normalised)
Laplacian of a Gaussian with a difference of Gaussians:

B. Leibe
Dr Chris Town Dr Chris Town

Scale-Space Pyramid
• Multiple scales must be examined to identify scale-invariant
features
• An efficient function is to compute the Difference of Gaussian
(DOG) pyramid (Burt & Adelson, 1983)

Resample

Blur

Subtract
Gaussian pyramid

Dr Chris Town Dr Chris Town

Laplacian pyramid algorithm


x1 G1 x1  x2 x2 x3

( I  F3G3 ) x3

( I  F2G2 ) x2
Notice that each layer shows detail
at a particular scale --- these are,
basically, bandpass filtered versions
of the image.
F1G1 x1

Laplacian pyramid

Dr Chris Town ( I  F1G1 ) x1 Dr Chris Town

4
Showing, at full resolution, the information captured at each
SIFT – Scale Invariant Feature Transform
level of a Gaussian (top) and Laplacian (bottom) pyramid.

http://www-bcs.mit.edu/people/adelson/pub_pdfs/pyramid83.pdf Dr Chris Town From: David Lowe (2004) Dr Chris Town

DoG approximates scale-normalised Laplacian of a Gaussian

(heat diffusion equation)

Dr Chris Town Dr Chris Town

Octave increment in scale of the Gaussian Pyramid

followed by factor-of-two downsampling (for efficiency).


To achieve better performance, each octave i is further
divided into s intervals.

Remember that we defined neighbouring scales as

So starting with some , the next scale parameter will


be , followed by etc., so that after s sub-
levels of the pyramid we have a complete octave with

Therefore
Dr Chris Town Dr Chris Town

5
Dr Chris Town Dr Chris Town

Key point localization with DoG Example of Keypoint Detection


• Detect extrema of (a) 233x189 image

difference-of-Gaussian (b) 832 DoG extrema


(c) 729 left after peak
(DoG) in scale space value threshold
(d) 536 left after testing
• Then reject points with ratio of principle
curvatures (removing
low contrast edge responses)

(threshold)
• Eliminate edge
responses Candidate keypoints:
list of (x,y,σ)

Slide credit: David Lowe Dr Chris Town Slide credit: David Lowe Dr Chris Town

Feature Descriptors: SIFT Rotation Invariant Descriptors


• Scale Invariant Feature Transform • Find local orientation
• Descriptor computation: – Dominant direction of gradient
– Divide patch into 4x4 sub-patches: 16 cells for the image patch
– Compute histogram of gradient orientations (8 reference • Rotate patch according to this angle
angles) for all pixels inside each sub-patch – This puts the patches into a canonical orientation.
– Resulting descriptor: 4x4x8 = 128 dimensions

David G. Lowe. "Distinctive image features from scale-invariant keypoints.” IJCV 60 (2), pp.
91-110, 2004.

Slide credit: Svetlana Lazebnik Dr Chris Town Slide credit: Svetlana Lazebnik, Matthew Brown Dr Chris Town

6
Orientation Normalisation: Computation
Feature stability to noise
• Match features after random change in image scale &
[Lowe, SIFT, 1999] orientation, with differing levels of image noise
• Compute orientation histogram • Find nearest neighbor in database of 30,000 features

• Select dominant orientation


• Normalise: rotate to fixed orientation

0 2p
37
Slide adapted from David Lowe Dr Chris Town Dr Chris Town

Feature stability to affine change Distinctiveness of features


• Match features after random change in image scale & • Vary size of database of features, with 30 degree affine change,
orientation, with 2% image noise, and affine distortion 2% image noise
• Find nearest neighbor in database of 30,000 features • Measure % correct for single nearest neighbor match

Dr Chris Town Dr Chris Town

Working with SIFT Descriptors SIFT


• One image yields:
– n 128-dimensional descriptors: each
one is a histogram of the gradient
orientations within a patch
• [n x 128 matrix]
– n scale parameters specifying the
size of each patch
• [n x 1 vector]
– n orientation parameters specifying
the angle of the patch
• [n x 1 vector]
– n 2D points giving positions of the
patches
• [n x 2 matrix]

Slide credit: Steve Seitz Dr Chris Town D. Lowe, 2004 Dr Chris Town

7
Feature matching Image stitching

Slides from Steve Seitz and Rick Szeliski Dr Chris Town Brown, Lowe, 2007 Dr Chris Town

Nearest-neighbor matching Approximate k-d tree matching

• Solve following problem for all feature vectors, x:


Key idea:
 Search k-d tree bins in

• Nearest-neighbour matching is the major computational order of distance from


bottleneck query
 Requires use of a priority
– Linear search performs dn2 operations for n features
queue
and d dimensions
– No exact methods are faster than linear search for d>10
– Approximate methods can be much faster, but at the
cost of missing some correct matches. Failure rate gets
worse for large datasets.

Dr Chris Town Dr Chris Town

K-d tree construction K-d tree query


Simple 2D example
4 l1 6 l9
l5 7 l1
4 l1 6 l9
7 l1 q l6
l5
8 l3 l2 l3
l6 l2 5
l2 8 l3 l2 l3 9
5
l83 l10 10 l
9 7 l4 l5 l7 l6
l83 l10 10
l7 l4 l5 l7 l6 1
l4
2
11
1 2 l8 l10 l9
l4 11 2 5 4 11 8
l8 2 5 4 11 l10 8 l9
1 3 9 10 6 7
1 3 9 10 6 7

Slide credit: Anna Atramentov Slide credit: Anna Atramentov


Dr Chris Town Dr Chris Town

8
Recognition with Local Features Fourier transform
• Image content is transformed into local features
that are invariant to translation, rotation, and
scale
• Goal: Verify if they belong to a consistent = *
configuration

Fourier Fourier bases pixel domain


transform are global: image
each transform
coefficient
Local Features, depends on all
e.g. SIFT
pixel locations.
49 Dr Chris Town Dr Chris Town
Slide credit: David Lowe

Gaussian pyramid Laplacian pyramid

= * = *
Gaussian pixel image Laplacian pixel image
pyramid pyramid

Overcomplete representation. Overcomplete representation.


Low-pass filters, sampled Transformed pixels represent
appropriately for their blur. Dr Chris Town bandpassed image information. Dr Chris Town

Edge Fitting
• Edge Detection:
– The process of labeling the locations in the image where the gray
level’s “rate of change” is high.
• OUTPUT: “edgels” locations,
direction, strength

• Edge Integration, Contour fitting:


– The process of combining “local” and perhaps sparse and non-
contiguous “edgel”-data into meaningful, long edge curves (or closed
contours) for segmentation
• OUTPUT: edges/curves consistent with the local data

Dr Chris Town Dr Chris Town

9
Framework for snakes Modeling
• The contour is defined in the (x, y) plane of an image as a
• A higher level process or a user initialises any curve parametric curve
close to the object boundary. v(s)=(x(s), y(s))
• The snake then starts deforming and moving towards
the desired object boundary. • Contour is said to possess an energy (Esnake) which is defined as
the sum of the three energy terms.
• In the end it completely “shrink-wraps” around the
object. courtesy
Esnake  Eint ernal  Eexternal  Econstra int

• The energy terms are defined in a way such that the final position of
the contour will have minimum energy (Emin)

• Therefore our problem of detecting objects reduces to an energy


minimisation problem.

(Diagram courtesy “Snakes, shapes, gradient vector


flow”, Xu, Prince)

Dr Chris Town A. Poonawala Dr Chris Town

Internal Energy (Eint ) Elastic force


• Depends on the intrinsic properties of the curve.
• Generated by elastic potential energy of the curve.
• Sum of elastic energy and bending energy.
Felastic   v ss
Elastic Energy (Eelastic):
• The curve is treated as an elastic rubber band
possessing elastic potential energy. • Characteristics (refer diagram)
• It discourages stretching by introducing tension.
1 dv(s )
E elastic   ( s ) | v s | 2 ds
2 s
vs 
ds

• Weight (s) allows us to control elastic energy along


different parts of the contour. Considered to be
constant  for many applications.
• Responsible for shrinking of the contour.
A. Poonawala Dr Chris Town A. Poonawala Dr Chris Town

Bending Energy (Ebending): Bending force

• The snake is also considered to behave like a thin metal • Generated by the bending energy of the contour.
strip giving rise to bending energy. • Characteristics (refer diagram):

• It is defined as sum of squared curvature of the contour.


1
E b end in g   ( s ) | v ss | 2 ds
2 s
• (s) plays a similar role to (s).
• Bending energy is minimum for a circle. Initial curve Final curve deformed by
(High bending energy) bending force. (low bending
energy)
• Total internal energy of the snake can be defined as
1 • Thus the bending energy tries to smooth out the curve.
Eint  Eelastic Ebending   | vs |2 | vss |2)ds
s
2
A. Poonawala Dr Chris Town A. Poonawala Dr Chris Town

10
External energy of the contour (Eext)
• Image fitting

Eext   Eimage (v ( s )) ds
s

For example

A. Poonawala Dr Chris Town Dr Chris Town

leafmv.mpg

dancemv.mpg

http://www.robots.ox.ac.uk/~ab/dynamics.html Dr Chris Town Dr Chris Town

Dr Chris Town Dr Chris Town

11
Generating Functions

Since the wavelets are dilates, translates, and rotates of each other, such a transform
seeks to extract image structure in a way that may be invariant to dilation, translation,
and rotation of the original image or pattern.

Dr Chris Town Dr Chris Town

Gabor wavelets
x 2 y 2
 2
 c (x, y)  e 2 cos2u0 x 

u0=0 U0=0.1 U0=0.2

x2 y2
 2
 s (x, y)  e 2
sin2u0 x 

A. Torralba Dr Chris Town A. Torralba Dr Chris Town

Dilation and rotation Frequency, orientation and


symmetry (phase)

Dr Chris Town Dr Chris Town

12
Dr Chris Town Dr Chris Town

Dr Chris Town Dr Chris Town

Wavelet (QMF) transform Steerable pyramid

Multiple
orientations at
Wavelet
= = one scale
*
pyramid *
Steerable pixel image
Ortho-normal pixel image pyramid
Multiple
transform (like orientations at
Fourier transform), the next scale
Over-complete
but with localized
representation,
basis functions.
the next scale… but non-aliased
subbands.
Dr Chris Town Dr Chris Town

13

You might also like