Lec4 SIFT HoG


Interest Point Detectors

Scale Invariant Feature Transform (SIFT)


and
Histogram of Oriented Gradients (HOG)

Some slides were adapted/taken from various sources, including 3D Computer Vision of Prof. Hee, NUS, Air Lab Summer
School, The Robotic Institute, CMU, Computer Vision of Prof. Mubarak Shah, UCF, Computer Vision of Prof. William Hoff,
Colorado School of Mines and many more. We thankfully acknowledge them. Students are requested to use this material for
their study only and NOT to distribute it.
Finding the “same” thing across images
Categories: find a bottle. Can't do, unless you do not care about a few errors…
Instances: find these two objects. Can nail it.
Building a Panorama

M. Brown and D. G. Lowe. Recognising Panoramas. ICCV 2003


What is an interest point
 Expressive texture
 The point at which the direction of the
boundary of object changes abruptly
 Intersection point between two or more
edge segments
Synthetic & Real
Interest Points

Corners are indicated in red


Properties of Interest Point
Detectors
 Detect all (or most) true interest points
 No false interest points
 Well localized.
 Robust with respect to noise.
 Efficient detection
Possible Approaches to
Corner Detection
 Based on brightness of images
 Usually image derivatives

 Based on boundary extraction


 First step: edge detection
 Curvature analysis of edges
Image Derivative and Average

13-Feb-22 CS 574 Lecture 2 9


Derivative



Discrete Derivative



Discrete Derivative
Finite Difference



Example: 1D
(Backward Difference)



Derivative in 2D



Derivatives of Images

X direction Centre difference Y direction Centre difference



Derivatives of Images



Where can we use it?
 Automate object tracking
 Point matching for computing disparity
 Stereo calibration
 Estimation of fundamental matrix
 Motion based segmentation
 Recognition
 3D object reconstruction
 Robot navigation
 Image retrieval and indexing
Correspondence across views

• Correspondence: matching points, patches, edges, or regions across images

Slide Credit: James Hays


Example: estimating the “fundamental matrix”
that relates two views

Slide from Silvio Savarese


Example: structure from motion

Slide Credit: James Hays


This class: interest points
original
• Suppose you have to
click on some point,
go away and come
back after I deform the
image, and click on the
same points again.
– Which points would
you choose?

deformed
Slide Credit: James Hays
Overview of Keypoint Matching
1. Find a set of distinctive key-points
2. Define a region around each keypoint
3. Extract and normalize the region content
4. Compute a local descriptor from the normalized region
5. Match local descriptors: d(f_A, f_B) < T
Goals for Keypoints

Detect points that are repeatable and distinctive


Key trade-offs

Detection of interest points:
• More repeatable: robust detection, precise localization
• More points: robust to occlusion, works with less texture

Description of patches:
• More distinctive: minimize wrong matches, maximize correct matches
• More flexible: robust to expected variations

Slide Credit: James Hays
Invariant Local Features

Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters

Features Descriptors
Choosing interest points
Where would you
tell your friend to
meet you?

Slide Credit: James Hays


Feature extraction: Corners

9300 Harris Corners Pkwy, Charlotte, NC

Slides from Rick Szeliski, Svetlana Lazebnik, and Kristen Grauman
Local features: main components
1) Detection: identify the interest points

2) Description: extract a feature vector descriptor surrounding each interest point, e.g. x1 = [x1^(1), …, xd^(1)]

3) Matching: determine correspondence between descriptors in two views, e.g. x2 = [x1^(2), …, xd^(2)]

Kristen Grauman
Goal: interest operator repeatability
• We want to detect (at least some of) the
same points in both images.

No chance to find true matches!

• Yet we have to be able to run the detection


procedure independently per image.
Kristen Grauman
Goal: descriptor distinctiveness
• We want to be able to reliably determine
which point goes with which.

?
• Must provide some invariance to geometric
and photometric differences between the two
views.
Kristen Grauman
Some patches can be localized
or matched with higher accuracy than
others.
To continue…
Some Mathematical Preliminaries
Derivative of Gaussian filter

* [1 -1] =



Tradeoff between smoothing and localization



Tradeoff between smoothing and localization

1 pixel



Tradeoff between smoothing and localization

1 pixel 3 pixels



Tradeoff between smoothing and localization

1 pixel 3 pixels 7 pixels



Tradeoff between smoothing and localization

1 pixel 3 pixels 7 pixels

• Smoothed derivative removes noise, but blurs


edge. Also finds edges at different “scales”.



Detecting Discontinuities

 Image derivatives

 Convolve image with derivative filters



Detecting Discontinuities

 Image derivatives

 Convolve image with derivative filters


Backward difference [-1 1]

Forward difference [1 -1]


Central difference [-1 0 1]

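The difference filters above can be checked numerically. A minimal sketch, assuming NumPy is available (the step signal `f` is an invented example; note that `np.convolve` flips the kernel, so `[1, -1]` realizes the backward difference f[x] − f[x−1]):

```python
import numpy as np

# A 1-D step edge
f = np.array([0, 0, 0, 10, 10, 10], dtype=float)

# np.convolve flips the kernel, so [1, -1] computes f[x] - f[x-1]
backward = np.convolve(f, [1, -1], mode='same')

# [1, 0, -1] computes f[x] - f[x-2]; halving gives the central difference
central = np.convolve(f, [1, 0, -1], mode='same') / 2
```

The backward difference fires only at the step (index 3), while the central difference spreads a half-height response over the two pixels around the step (ignoring the boundary artifact at the last sample).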


Derivative in Two-Dimensions

 Definition

 Approximation

 Convolution kernels

13-Feb-22 CS 574 Lecture 2a 43


Derivative in Two-Dimensions

 Definition

 Approximation

 Convolution kernels: f_x = [−1 1], f_y = [−1 1]ᵀ


Image Derivatives

Image I: I_x = I * [−1 1], I_y = I * [−1 1]ᵀ



Derivatives and Noise

Strongly affected by noise:
• Obvious reason: image noise results in pixels that look very different from their neighbors
• The larger the noise, the stronger the response

What is to be done?
• Neighboring pixels look alike; pixels along an edge look alike
• Image smoothing should help: force pixels different from their neighbors (possibly noise) to look like their neighbors



Derivatives and Noise

Increasing noise

Zero mean additive gaussian noise



Image Smoothing

 Expect pixels to “be like” their neighbors


– Relatively few reflectance changes
 Generally expect noise to be independent
from pixel to pixel
– Smoothing suppresses noise



Gaussian Smoothing

g(x, y) = e^{−(x² + y²) / 2σ²}

 Scale of Gaussian: σ
– As σ increases, more pixels are involved in the average
– As σ increases, the image is more blurred
– As σ increases, noise is more effectively suppressed

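The effect of σ can be illustrated with a sampled 1-D Gaussian. A minimal sketch assuming NumPy (the 3σ truncation radius and the noisy test signal are our choices, not from the slide):

```python
import numpy as np

def gaussian_kernel1d(sigma):
    """Sampled 1-D Gaussian g(x) = exp(-x^2 / 2 sigma^2), normalized to sum to 1."""
    radius = int(3 * sigma)             # 3-sigma support captures ~99.7% of the mass
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()

# Smoothing pulls an isolated noise spike toward its neighbors
noisy = np.array([0, 0, 0, 9, 0, 0, 0], dtype=float)
smooth = np.convolve(noisy, gaussian_kernel1d(1.0), mode='same')
```

The smoothed spike is much lower than the original, while the total mass is preserved: averaging over more pixels (larger σ) suppresses isolated noise more strongly.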


Marr Hildreth Edge Detector
 Smooth image by Gaussian filter → S
 Apply Laplacian to S
– Used in mechanics, electromagnetics, wave theory, quantum
mechanics and Laplace equation
 Find zero crossings
– Scan along each row, record an edge point at the location of
zero-crossing.
– Repeat above step along each column



Spatial Filters – sharpening
• The Laplacian
• second derivative of a two dimensional function f(x,y)

2 f 2 f
 f  2  2
2

x y
= [f(x+1,y)+f(x-1,y)+f(x,y+1)+f(x,y-1)]-4f(x,y)

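The 4-neighbour finite-difference formula above maps directly to a convolution mask. A small self-contained sketch, assuming NumPy (the flat and step test images are invented; since the mask is symmetric, correlation and convolution coincide):

```python
import numpy as np

# Mask realizing [f(x+1,y)+f(x-1,y)+f(x,y+1)+f(x,y-1)] - 4 f(x,y)
LAP4 = np.array([[0,  1, 0],
                 [1, -4, 1],
                 [0,  1, 0]], dtype=float)

def laplacian4(img):
    """Apply the 4-neighbour Laplacian with zero padding ('same' output)."""
    p = np.pad(img, 1)
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(p[i:i + 3, j:j + 3] * LAP4)
    return out

flat = np.full((5, 5), 7.0)                   # constant region: zero response inside
step = np.zeros((5, 5)); step[:, 3:] = 10.0   # vertical step edge
```

On the flat patch the interior response is zero; across the step the response flips sign, which is exactly the zero-crossing behaviour the Marr-Hildreth detector exploits.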


Spatial Filters – sharpening
• The Laplacian
• use a convolution mask to approximate

0  1  0        1  1  1       -1  2 -1
1 -4  1        1 -8  1        2 -4  2
0  1  0        1  1  1       -1  2 -1



Spatial Filters – sharpening

• The Laplacian
• example:





Marr Hildreth Edge Detector

 Gaussian smoothing

 Find Laplacian





Marr Hildreth Edge Detector



Gaussian

Unnormalized Gaussian g(x) with standard deviation σ = 1:

x:     -3    -2    -1    0     1     2     3
g(x): .011  .13   .6    1    .6   .13  .011



Gaussian



2-D Gaussian



2-D Gaussian



LoG Filter



Finding Zero Crossings



On the Separability of Gaussian



Separability



Example

I → I * ∇²g → zero crossings of ∇²S



Example

 1

 3

 6



LoG Algorithm

 Apply LoG to the image, either:
– Use the 2D filter ∇²g(x, y)
– Or use four 1D filters: g(x), g_xx(x), g(y), g_yy(y)

 Find zero-crossings from each row

 Find slope of zero-crossings
 Apply threshold to slope and mark edges

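The row-wise zero-crossing scan in the algorithm can be sketched as follows. This is a simplified illustration assuming NumPy; the slope threshold default and the toy response values are our own, and the complementary column-wise scan is omitted:

```python
import numpy as np

def zero_crossings_rows(resp, slope_th=0.0):
    """Mark an edge where the LoG response changes sign along a row
    and the local slope exceeds a threshold."""
    sign_change = np.signbit(resp[:, :-1]) != np.signbit(resp[:, 1:])
    slope = np.abs(resp[:, 1:] - resp[:, :-1])
    edges = np.zeros(resp.shape, dtype=bool)
    edges[:, :-1] = sign_change & (slope > slope_th)
    return edges

# A response row that crosses zero between columns 1 and 2
resp = np.array([[-4.0, -1.0, 2.0, 3.0]])
```

Only the position where the response changes sign is marked; raising `slope_th` then discards weak (noisy) zero crossings, which is the thresholding step in the algorithm above.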


Interest Point Detectors

Scale Invariant Feature Transform (SIFT)


and
Histogram of Oriented Gradients (HOG)
SIFT - Key Point Extraction
 Stands for Scale Invariant Feature
Transform
 Patented by the University of British
Columbia
Similar to the one used in primate visual
system (human, ape, monkey, etc.)
Transforms image data into scale-
invariant coordinates
D. Lowe. Distinctive image features from scale-invariant keypoints. International Journal
of Computer Vision, 2004.
Goal
 Extract distinctive invariant features
 Correctly matched against a large database
of features from many images
 Invariance to image scale and rotation
 Robustness to
 Affine (rotation, scale, shear) distortion,
 Change in 3D viewpoint,
 Addition of noise,
 Change in illumination.
Advantages
Locality: features are local, so robust to
occlusion and clutter
Distinctiveness: individual features can be
matched to a large database of objects
Quantity: many features can be generated
for even small objects
 Efficiency: close to real-time performance
Invariant Local Features
Steps for Extracting Key
Points (SIFT Points)
 Scale space peak selection
 Potential locations for finding features
 Key point localization
 Accurately locating the feature key points
 Orientation Assignment
 Assigning orientation to the key points
 Key point descriptor
 Describing the key point as a high
dimensional vector (128) (SIFT Descriptor)
Scales
 What should be sigma value for Canny
and LoG edge detection?
 If use multiple sigma values (scales),
how do you combine multiple edge
maps?
 Marr-Hildreth:
 Spatial coincidence assumption:
 Zero crossings that coincide over several
scales are physically significant.
Scale Space

scale scale

Multiple smoothed versions of a signal; zero crossings at multiple scales


Scale Space

Scale Space Interval Tree


Scale Space (Witkin, IJCAI
1983)
 Apply whole spectrum of scales
 Plot zerocrossings vs scales in a scale-space
 Interpret scale space contours
 Contours are arches, open at the bottom, closed
at the top
 Interval tree
 Each interval corresponds to a node in a tree,
 whose parent node represents the larger interval from which it emerged, and
 whose offspring represent smaller intervals.
 Stability of a node is the scale range over which the
interval exists.
Scale Space
 Top level description
 Iteratively remove nodes from the tree,
 splicing out nodes that are less stable than
any of their parents and offspring
Scale Space

A top-level description of several signals using the stability criterion.
Laplacian-of-Gaussian (LoG)
• Interest points: local maxima in the scale space of the Laplacian-of-Gaussian

L = σ² (L_xx(σ) + L_yy(σ))

→ List of (x, y, σ)

K. Grauman, B. Leibe
Automatic scale selection
Intuition:
• Find scale that gives local maxima of some function
f in both position and scale.

(plot: f versus region size for Image 1 and Image 2; the maxima occur at region sizes s1 and s2)

K. Grauman
Automatic Scale Selection

f (I i1im (x,  ))  f (I i1im (x,  ))

How to find corresponding patch


sizes?
K. Grauman, B. Leibe
Automatic Scale Selection
• Function responses for increasing scale (scale signature)

f (I i i (x,  ))
1 m
f (I i i (x,  ))
1 m
K. Grauman, B. Leibe
What Is A Useful Signature Function?

• Laplacian-of-Gaussian = “blob” detector

K. Grauman, B. Leibe
Scale-space blob detector: Example

Source: Lana Lazebnik


Sigma = 2
Sigma = 2.5
Sigma = 3.1
Sigma = 3.9
Sigma = 6.1

Building a Scale Space
 All scales must be examined to identify scale-
invariant features
 An efficient approach is to compute a Laplacian
pyramid (difference of Gaussians) (Burt & Adelson, 1983)
Approximation of LoG by
Difference of Gaussians
∂G/∂σ = σ ∇²G    (heat equation)

∂G/∂σ ≈ (G(x, y, kσ) − G(x, y, σ)) / (kσ − σ)

G(x, y, kσ) − G(x, y, σ) ≈ (k − 1) σ² ∇²G

Typical values: σ = 1.6; k = √2
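The DoG-approximates-LoG relation can be verified numerically. A sketch assuming NumPy, using a k very close to 1 so the finite difference tracks the derivative tightly (the actual pyramid uses k = √2, where the approximation is coarser but has the same shape; `RADIUS` is an arbitrary support choice):

```python
import numpy as np

RADIUS = 10

def gaussian2d(sigma):
    """Normalized 2-D Gaussian sampled on a (2R+1)^2 grid."""
    y, x = np.mgrid[-RADIUS:RADIUS + 1, -RADIUS:RADIUS + 1].astype(float)
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)

def log2d(sigma):
    """Analytic Laplacian of the normalized 2-D Gaussian."""
    y, x = np.mgrid[-RADIUS:RADIUS + 1, -RADIUS:RADIUS + 1].astype(float)
    r2 = x**2 + y**2
    return (r2 - 2 * sigma**2) / (2 * np.pi * sigma**6) * np.exp(-r2 / (2 * sigma**2))

sigma, k = 1.6, 1.001
dog = gaussian2d(k * sigma) - gaussian2d(sigma)     # difference of Gaussians
approx = (k - 1) * sigma**2 * log2d(sigma)          # (k - 1) sigma^2 * LoG
```

With k this close to 1 the two arrays agree to well under 1% of the peak magnitude, confirming the heat-equation derivation above.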
Steps for Extracting Key
Points (SIFT Points)
 Scale space peak selection
 Potential locations for finding features
 Key point localization
 Accurately locating the feature key points
 Orientation Assignment
 Assigning orientation to the key points
 Key point descriptor
 Describing the key point as a high
dimensional vector (128) (SIFT Descriptor)
Building a Scale Space

G(x, y, kσ) = 1 / (2π(kσ)²) · e^{−(x² + y²) / 2(kσ)²}

Octave scales: σ, kσ, k²σ, k³σ, k⁴σ, …

σ = 1/√2 ≈ 0.707; k = √2
How many scales per
octave?
 A collection of 32 real images drawn from a
diverse range, including
 outdoor scenes, human faces, aerial photographs, and
industrial images

 Each image was then subjected to a range
of transformations:
 rotation, scaling, affine stretch, change in brightness
and contrast, and addition of image noise.
How many scales per
octave?

3 scales per octave

Initial value of sigma: 1.6
Scale Space Peak Detection
 Compare a pixel (X) with 26
pixels in current and adjacent
scales (Green Circles)
 Select a pixel (X) if

larger/smaller than all 26 pixels


 Large number of extrema,

computationally expensive
 Detect the most stable subset
with a coarse sampling of scales
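The 26-neighbour comparison can be written directly on a 3-D DoG stack. A minimal sketch assuming NumPy (the random stack and the planted peak are purely illustrative):

```python
import numpy as np

def is_extremum(dog, s, y, x):
    """True if dog[s, y, x] is strictly greater or strictly smaller than all
    26 neighbours in the 3x3x3 cube spanning adjacent scales."""
    v = dog[s, y, x]
    cube = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
    others = np.delete(cube.ravel(), 13)   # index 13 is the centre voxel
    return bool(np.all(v > others) or np.all(v < others))

rng = np.random.default_rng(0)
dog = rng.standard_normal((3, 5, 5))
dog[1, 2, 2] = 10.0                        # plant an obvious maximum
```

A point is kept only if it beats every one of its 26 neighbours; a tie with any neighbour (e.g. an equal value next door) disqualifies it.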
Key Point Localization
 Candidates are chosen from extrema
detection

original image extrema locations


Initial Outlier Rejection
1. Low contrast candidates
2. Poorly localized candidates along an edge
 Taylor series expansion of DOG, D.

 Minima or maxima are located at x̂ = −(∂²D/∂x²)⁻¹ ∂D/∂x

 Value of D(x̂) at the minima/maxima must be
large: |D(x̂)| > th.
Initial Outlier Rejection

from 832 key points to 729 key points, th=0.03.


Further Outlier Rejection
 DOG has strong response along
edge
 Assume DOG as a surface
 Compute principal curvatures (PC)
 Poorly defined peak will have very

low curvature along the edge, high


across the edge
Further Outlier Rejection
 Use trace and determinant, similar to the Harris corner detector
 Compute the Hessian of D (principal curvatures)

Tr(H) = D_xx + D_yy = λ₁ + λ₂

Det(H) = D_xx D_yy − D_xy² = λ₁ λ₂

 Remove outliers by evaluating (with λ₁ = r λ₂):

Tr(H)² / Det(H) = (λ₁ + λ₂)² / (λ₁ λ₂) = (r λ₂ + λ₂)² / (r λ₂²) = (r + 1)² / r
Further Outlier Rejection
 The quantity (r + 1)² / r is minimum when the eigenvalues
are equal (r = 1)

 It increases with r

Tr(H)² / Det(H) = (r + 1)² / r

 Eliminate key points if

Tr(H)² / Det(H) > (r + 1)² / r,   with r = 10
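The curvature-ratio test reduces to a few lines once D_xx, D_yy and D_xy are available. A sketch (the sample Hessian entries are invented for illustration):

```python
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep a keypoint only if Tr(H)^2 / Det(H) < (r + 1)^2 / r,
    i.e. the ratio of principal curvatures is below r."""
    tr = dxx + dyy
    det = dxx * dyy - dxy ** 2
    if det <= 0:              # curvatures of opposite signs: reject outright
        return False
    return tr * tr / det < (r + 1) ** 2 / r

# Balanced curvatures (peak-like) pass; one dominant curvature (edge-like) fails
corner_like = passes_edge_test(5.0, 5.0, 0.0)
edge_like = passes_edge_test(100.0, 1.0, 0.0)
```

For r = 10 the threshold (r + 1)²/r is 12.1: the balanced case gives 4 and passes, the edge-like case gives about 102 and is rejected.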
Further Outlier Rejection

from 729 key points to 536 key points.


Orientation Assignment
 To achieve rotation invariance
Compute central derivatives, gradient
magnitude and direction of L at the
scale of the key point (x, y)

L is the Gaussian-smoothed image at that scale


Orientation Assignment
 Create a weighted direction
histogram in a neighborhood of
a key point (36 bins)
 Weights are
 Gradient magnitudes
 Spatial Gaussian filter with
σ = 1.5 × scale of key point
Orientation Assignment
 Select the peak as direction of the key point
Introduce additional key points at same
location
 if another peak is within 80% of max peak of
the histogram with different directions
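The 36-bin weighted orientation histogram and the 80% secondary-peak rule can be sketched as follows. NumPy assumed; the sample gradients are invented, and the Gaussian spatial weighting is omitted for brevity:

```python
import numpy as np

def orientation_histogram(mag, ori, n_bins=36):
    """Histogram of gradient directions, each sample weighted by its magnitude."""
    bins = ((ori % 360) // (360 / n_bins)).astype(int)   # 10-degree bins
    hist = np.zeros(n_bins)
    np.add.at(hist, bins, mag)                           # accumulate weighted votes
    return hist

mag = np.array([1.0, 2.0, 3.0, 0.5])
ori = np.array([5.0, 12.0, 185.0, 359.0])                # degrees
hist = orientation_histogram(mag, ori)

# Peak direction, plus any secondary peak within 80% of the maximum
peaks = np.where(hist >= 0.8 * hist.max())[0]
```

Each bin index in `peaks` would yield a key point with its own direction; here only the 185-degree gradient is strong enough to form a peak.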
Local Image Descriptors at
Key Points
 Possible descriptor
 Store intensity samples in the
neighborhood
Sensitive to lighting changes, 3D object

transformation
 Use of gradient orientation histograms
 Robust representation
Similarity to IT cortex
Complex neurons respond to a gradient at a
particular orientation.
Location of the feature can shift over a small
receptive field.

Tomaso Poggio, MIT

• Edelman, Intrator, and Poggio (1997)
• The function of these cells allows for matching and recognition of 3D objects from a range of viewpoints.

• Experiments show better recognition accuracy for 3D objects rotated in depth by up to 20 degrees
Extraction of Local Image
Descriptors at Key Points
 Compute relative orientation and magnitude in a
16x16 neighborhood at key point
 Form weighted histogram (8 bin) for 4x4 regions
 Weight by magnitude and spatial Gaussian
 Concatenate 16 histograms in one long vector of 128 dimensions

 Example for 8x8 to 2x2 descriptors


Descriptor Regions (n by n)
Extraction of Local Image
Descriptors at Key Points
 Store numbers in a vector
 Normalize to unit vector (UN)
 Illumination invariance (affine changes)
 For non-linear intensity transforms
 Clamp unit vector components to a maximum of 0.2
(limit the influence of gradients larger than 0.2)
 Renormalize to unit vector
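The normalize, clamp, renormalize step is short enough to write out. A sketch assuming NumPy (the example vector with one dominant component is invented):

```python
import numpy as np

def normalize_descriptor(v, clamp=0.2):
    """Unit-normalize (affine illumination invariance), clamp large
    components at 0.2 (non-linear intensity robustness), renormalize."""
    v = v / np.linalg.norm(v)
    v = np.minimum(v, clamp)
    return v / np.linalg.norm(v)

# One dominant gradient would otherwise dominate descriptor distances
desc = np.array([10.0] + [0.1] * 127)
desc = normalize_descriptor(desc)
```

The clamp caps the dominant component before the final renormalization, so no single gradient can dominate matching (components may exceed 0.2 again after the second normalization, which is expected).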
Key point matching
 Match the key points against a database of key points
obtained from training images.
Find the nearest neighbor i.e. a key point with
minimum Euclidean distance.
 Efficient Nearest Neighbor matching
 Look at the ratio of distances between the best and 2nd best
match (threshold 0.8)

Matching local features

Kristen Grauman
Matching local features

Image 1 Image 2
• To generate candidate matches, find patches that have
the most similar appearance or SIFT descriptor
• Simplest approach: compare them all, take the closest
(or closest k, or within a thresholded distance)
Kristen Grauman
(figure: a query image patch and its 1st-4th nearest-neighbour patches)
if (distance to first match) / (distance to second match) < .8 → good match
Ambiguous matches

????
Image 1 Image 2
• At what distance do we have a good match?
• To add robustness to matching, can consider ratio : distance to
best match / distance to second best match
• If low, first match looks good.
• If high, could be ambiguous match.
Kristen Grauman
The ratio of distance
from the closest to the distance of the
second closest
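Nearest-neighbour matching with the distance-ratio test can be sketched as a brute-force loop. NumPy assumed; the 2-D toy descriptors stand in for 128-D SIFT vectors:

```python
import numpy as np

def match_ratio_test(desc1, desc2, ratio=0.8):
    """Accept the nearest neighbour in desc2 only if
    d(best) / d(second best) < ratio."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        j1, j2 = np.argsort(dists)[:2]
        if dists[j1] < ratio * dists[j2]:
            matches.append((i, int(j1)))
    return matches

desc1 = np.array([[0.0, 0.0], [5.0, 5.0]])
desc2 = np.array([[0.1, 0.0], [3.0, 0.0], [5.0, 5.1]])
matches = match_ratio_test(desc1, desc2)
```

Both toy descriptors have a clear nearest neighbour that is much closer than the runner-up, so both pass the ratio test; an ambiguous point with two near-equal distances would be dropped.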
SIFT Detector
 Generate Scale Space of an Image
 Detect Peaks in Scale Space (extrema)
 Localize Interest Points (Taylor Series)
 Remove outliers (remove response
along edges)
 Assign Orientation

SIFT Descriptor
 Compute relative orientation and magnitude in a
16x16 neighborhood at key point
 Form weighted histogram (8 bin) for 4x4 regions
 Weight by magnitude and spatial Gaussian
 Concatenate 16 histograms in one long vector of 128 dimensions

 Example for 8x8 to 2x2 descriptors


Reference

D. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 2004.
Histograms of Oriented Gradients for Human Detection
(HOG)
N. Dalal and B. Triggs
CVPR 2005
Cited by 8908

Dr. Edgar Seemann
HOG Steps
 HOG feature extraction
 Compute centered horizontal and vertical gradients with no smoothing
 Compute gradient orientation and magnitudes
For color image, pick the color channel with the highest gradient magnitude for each
pixel.

 For a 64x128 image,


 Divide the image into 16x16 blocks of 50% overlap.
 7x15=105 blocks in total
 Each block should consist of 2x2 cells with size 8x8.
 Quantize the gradient orientation into 9 bins
 The vote is the gradient magnitude
 Interpolate votes between neighboring bin centers.
The vote can also be weighted with Gaussian to downweight the pixels near the edges of
the block.
 Concatenate histograms (Feature dimension: 105x4x9 = 3,780)
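The block/cell bookkeeping in the steps above can be checked with a few lines of arithmetic (plain Python; the variable names are ours):

```python
# 64x128 detection window, 8x8 cells, 16x16 blocks, 50% overlap (stride 8)
win_w, win_h = 64, 128
cell, block, stride = 8, 16, 8

blocks_x = (win_w - block) // stride + 1          # blocks across the window
blocks_y = (win_h - block) // stride + 1          # blocks down the window
cells_per_block = (block // cell) ** 2            # 2x2 cells per block
n_bins = 9

feature_dim = blocks_x * blocks_y * cells_per_block * n_bins
```

This reproduces the counts quoted above: 7 x 15 = 105 blocks and a 3,780-dimensional feature vector.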
Computing Gradients

 Centered: f′(x) = lim_{h→0} (f(x + h) − f(x − h)) / 2h

 Filter masks in x and y directions:
 x: [−1 0 1],  y: [−1 0 1]ᵀ

 Gradient
 Magnitude: s = √(s_x² + s_y²)
 Orientation: θ = arctan(s_y / s_x)

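The centred-difference gradients, magnitude and orientation take only a few NumPy lines. The 3x3 test image is invented; like HOG, the plain [−1 0 1] mask is applied without the 1/2 factor:

```python
import numpy as np

img = np.array([[1.0, 2.0, 4.0],
                [1.0, 3.0, 5.0],
                [2.0, 4.0, 6.0]])

# Centred differences with [-1 0 1] masks (no smoothing, as in HOG)
sx = np.zeros_like(img); sy = np.zeros_like(img)
sx[:, 1:-1] = img[:, 2:] - img[:, :-2]     # horizontal gradient
sy[1:-1, :] = img[2:, :] - img[:-2, :]     # vertical gradient

mag = np.hypot(sx, sy)                     # sqrt(sx^2 + sy^2)
ori = np.degrees(np.arctan2(sy, sx))       # gradient orientation in degrees
```

At the centre pixel the horizontal gradient is 4 and the vertical gradient is 2, giving magnitude √20 and an orientation of about 26.6 degrees.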
Blocks, Cells
Block 2
Block 1
 16x16 blocks of 50% overlap.
 7x15=105 blocks in total

 Each block should consist of 2x2 cells with size 8x8.

Cells
(8 by 8)
Votes
 Each block consists of 2x2 cells with
size 8x8
 Quantize the gradient orientation into 9 bins (0-180)
 The vote is the gradient magnitude
Bin centers

 Interpolate votes linearly between neighboring bin centers.
 Example: if θ = 85 degrees.
 Distances to the bin centers (bin 70 and bin 90)
are 15 and 5 degrees, respectively.
 Hence bin 70 receives 5/20 = 1/4 of the vote and bin 90 receives 15/20 = 3/4.

 The vote can also be weighted with Gaussian to


down weight the pixels near the edges of the block.
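The linear interpolation of a vote between the two nearest bin centres, matching the θ = 85° example above, can be sketched like this. NumPy assumed; bin centres at 10°, 30°, …, 170° as implied by the example, with wrap-around for the unsigned 0-180 range:

```python
import numpy as np

def vote(theta, magnitude, n_bins=9, bin_width=20.0):
    """Split one gradient vote linearly between the two nearest bin centres
    (centres at 10, 30, ..., 170 degrees)."""
    hist = np.zeros(n_bins)
    b = (theta - bin_width / 2) / bin_width   # fractional position among centres
    lo = int(np.floor(b)) % n_bins            # wrap around for unsigned angles
    hi = (lo + 1) % n_bins
    frac = b - np.floor(b)
    hist[lo] += magnitude * (1 - frac)        # share proportional to proximity
    hist[hi] += magnitude * frac
    return hist

h = vote(85.0, 1.0)   # 1/4 of the vote to the 70-degree bin, 3/4 to the 90-degree bin
```

Bin index 3 is the 70-degree centre and index 4 the 90-degree centre, so the split reproduces the 1/4 and 3/4 ratios worked out above.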
Final Feature Vector
 Concatenate histograms
 Make it a 1D vector of length 3780.

 Visualization

Results

Navneet Dalal and Bill Triggs, “Histograms of Oriented Gradients for Human Detection”, CVPR 2005
SIFT vs HOG

SIFT:
 128-dimensional vector
 16 by 16 window
 4x4 sub-windows (16 total)
 8-bin histogram per sub-window

HOG:
 3,780-dimensional vector
 64 by 128 window
 16 by 16 blocks with 50% overlap
 Each block consists of 2 by 2 cells, each 8 by 8
 9-bin histogram per cell
To continue…
