Lec4 SIFT HoG


Interest Point Detectors

Scale Invariant Feature Transform (SIFT)


and
Histogram of Oriented Gradients (HOG)

Some slides were adapted/taken from various sources, including 3D Computer Vision of Prof. Hee, NUS, Air Lab Summer
School, The Robotic Institute, CMU, Computer Vision of Prof. Mubarak Shah, UCF, Computer Vision of Prof. William Hoff,
Colorado School of Mines and many more. We thankfully acknowledge them. Students are requested to use this material for
their study only and NOT to distribute it.
Finding the “same” thing across images
Categories: find a bottle. Can't do, unless you do not care about a few errors…
Instances: find these two objects. Can nail it.
Building a Panorama

M. Brown and D. G. Lowe. Recognising Panoramas. ICCV 2003


What is an interest point
 Expressive texture
 The point at which the direction of the
boundary of object changes abruptly
 Intersection point between two or more
edge segments
Synthetic & Real
Interest Points

Corners are indicated in red


Properties of Interest Point
Detectors
 Detect all (or most) true interest points
 No false interest points
 Well localized.
 Robust with respect to noise.
 Efficient detection
Possible Approaches to
Corner Detection
 Based on brightness of images
 Usually image derivatives

 Based on boundary extraction


 First step: edge detection
 Curvature analysis of edges
Image Derivative and Average

13-Feb-22 CS 574 Lecture 2 9


Derivative



Discrete Derivative



Discrete Derivative
Finite Difference



Example: 1D
(Backward Difference)



Derivative in 2D



Derivatives of Images

X direction Centre difference Y direction Centre difference



Derivatives of Images



Where can we use it?
 Automate object tracking
 Point matching for computing disparity
 Stereo calibration
 Estimation of fundamental matrix
 Motion based segmentation
 Recognition
 3D object reconstruction
 Robot navigation
 Image retrieval and indexing
Correspondence across views

• Correspondence: matching points, patches, edges, or regions across images

Slide Credit: James Hays


Example: estimating the “fundamental matrix”
that relates two views

Slide from Silvio Savarese


Example: structure from motion

Slide Credit: James Hays


This class: interest points
original
• Suppose you have to
click on some point,
go away and come
back after I deform the
image, and click on the
same points again.
– Which points would
you choose?

deformed
Slide Credit: James Hays
Overview of Keypoint Matching
1. Find a set of distinctive key-points
2. Define a region around each keypoint
3. Extract and normalize the region content
4. Compute a local descriptor from the normalized region
5. Match local descriptors: d(f_A, f_B) < T
Goals for Keypoints

Detect points that are repeatable and distinctive


Key trade-offs

Detection of interest points:
• More repeatable: robust detection, precise localization
• More points: robust to occlusion, works with less texture

Description of patches:
• More distinctive: minimize wrong matches, maximize correct matches
• More flexible: robust to expected variations

Slide Credit: James Hays
Invariant Local Features

Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters

Features Descriptors
Choosing interest points
Where would you
tell your friend to
meet you?

Slide Credit: James Hays


Feature extraction: Corners

9300 Harris Corners Pkwy, Charlotte, NC

Slides from Rick Szeliski, Svetlana Lazebnik, and Kristen Grauman
Local features: main components
1) Detection: identify the interest points

2) Description: extract a feature vector descriptor surrounding each interest point, e.g. x1 = [x1^(1), …, xd^(1)]

3) Matching: determine correspondence between descriptors in two views, e.g. x2 = [x1^(2), …, xd^(2)]

Kristen Grauman
Goal: interest operator repeatability
• We want to detect (at least some of) the
same points in both images.

No chance to find true matches!

• Yet we have to be able to run the detection


procedure independently per image.
Kristen Grauman
Goal: descriptor distinctiveness
• We want to be able to reliably determine
which point goes with which.

?
• Must provide some invariance to geometric
and photometric differences between the two
views.
Kristen Grauman
Some patches can be localized
or matched with higher accuracy than
others.
To continue…
Some Mathematical Preliminaries
Derivative of Gaussian filter

* [1 -1] =



Tradeoff between smoothing and localization



Tradeoff between smoothing and localization

1 pixel



Tradeoff between smoothing and localization

1 pixel 3 pixels



Tradeoff between smoothing and localization

1 pixel 3 pixels 7 pixels



Tradeoff between smoothing and localization

1 pixel 3 pixels 7 pixels

• Smoothed derivative removes noise, but blurs


edge. Also finds edges at different “scales”.



Detecting Discontinuities

 Image derivatives

 Convolve image with derivative filters



Detecting Discontinuities

 Image derivatives

 Convolve image with derivative filters


Backward difference [-1 1]

Forward difference [1 -1]


Central difference [-1 0 1]

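The difference filters above can be checked numerically. A minimal sketch, assuming NumPy is available (the step signal `f` is an invented example; note that `np.convolve` flips the kernel, so `[1, -1]` realizes the backward difference f[x] − f[x−1]):

```python
import numpy as np

# A 1-D step edge
f = np.array([0, 0, 0, 10, 10, 10], dtype=float)

# np.convolve flips the kernel, so [1, -1] computes f[x] - f[x-1]
backward = np.convolve(f, [1, -1], mode='same')

# [1, 0, -1] computes f[x] - f[x-2]; halving gives the central difference
central = np.convolve(f, [1, 0, -1], mode='same') / 2
```

The backward difference fires only at the step (index 3), while the central difference spreads a half-height response over the two pixels around the step (ignoring the boundary artifact at the last sample).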


Derivative in Two-Dimensions

 Definition

 Approximation

 Convolution kernels

13-Feb-22 CS 574 Lecture 2a 43


Derivative in Two-Dimensions

 Definition

 Approximation

 Convolution kernels: f_x = [−1 1], f_y = [−1 1]ᵀ


Image Derivatives

Image I: I_x = I * [−1 1], I_y = I * [−1 1]ᵀ



Derivatives and Noise

Strongly affected by noise:
• Obvious reason: image noise results in pixels that look very different from their neighbors
• The larger the noise, the stronger the response

What is to be done?
• Neighboring pixels look alike; pixels along an edge look alike
• Image smoothing should help: force pixels different from their neighbors (possibly noise) to look like their neighbors



Derivatives and Noise

Increasing noise

Zero mean additive gaussian noise



Image Smoothing

 Expect pixels to “be like” their neighbors


– Relatively few reflectance changes
 Generally expect noise to be independent
from pixel to pixel
– Smoothing suppresses noise



Gaussian Smoothing

g(x, y) = e^{−(x² + y²) / 2σ²}

 Scale of Gaussian: σ
– As σ increases, more pixels are involved in the average
– As σ increases, the image is more blurred
– As σ increases, noise is more effectively suppressed

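The effect of σ can be illustrated with a sampled 1-D Gaussian. A minimal sketch assuming NumPy (the 3σ truncation radius and the noisy test signal are our choices, not from the slide):

```python
import numpy as np

def gaussian_kernel1d(sigma):
    """Sampled 1-D Gaussian g(x) = exp(-x^2 / 2 sigma^2), normalized to sum to 1."""
    radius = int(3 * sigma)             # 3-sigma support captures ~99.7% of the mass
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()

# Smoothing pulls an isolated noise spike toward its neighbors
noisy = np.array([0, 0, 0, 9, 0, 0, 0], dtype=float)
smooth = np.convolve(noisy, gaussian_kernel1d(1.0), mode='same')
```

The smoothed spike is much lower than the original, while the total mass is preserved: averaging over more pixels (larger σ) suppresses isolated noise more strongly.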


Marr Hildreth Edge Detector
 Smooth image by Gaussian filter → S
 Apply Laplacian to S
– Used in mechanics, electromagnetics, wave theory, quantum
mechanics and Laplace equation
 Find zero crossings
– Scan along each row, record an edge point at the location of
zero-crossing.
– Repeat above step along each column



Spatial Filters – sharpening
• The Laplacian
• second derivative of a two dimensional function f(x,y)

2 f 2 f
 f  2  2
2

x y
= [f(x+1,y)+f(x-1,y)+f(x,y+1)+f(x,y-1)]-4f(x,y)

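The 4-neighbour finite-difference formula above maps directly to a convolution mask. A small self-contained sketch, assuming NumPy (the flat and step test images are invented; since the mask is symmetric, correlation and convolution coincide):

```python
import numpy as np

# Mask realizing [f(x+1,y)+f(x-1,y)+f(x,y+1)+f(x,y-1)] - 4 f(x,y)
LAP4 = np.array([[0,  1, 0],
                 [1, -4, 1],
                 [0,  1, 0]], dtype=float)

def laplacian4(img):
    """Apply the 4-neighbour Laplacian with zero padding ('same' output)."""
    p = np.pad(img, 1)
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(p[i:i + 3, j:j + 3] * LAP4)
    return out

flat = np.full((5, 5), 7.0)                   # constant region: zero response inside
step = np.zeros((5, 5)); step[:, 3:] = 10.0   # vertical step edge
```

On the flat patch the interior response is zero; across the step the response flips sign, which is exactly the zero-crossing behaviour the Marr-Hildreth detector exploits.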


Spatial Filters – sharpening
• The Laplacian
• use a convolution mask to approximate

0  1  0        1  1  1       -1  2 -1
1 -4  1        1 -8  1        2 -4  2
0  1  0        1  1  1       -1  2 -1



Spatial Filters – sharpening

• The Laplacian
• example:





Marr Hildreth Edge Detector

 Gaussian smoothing

 Find Laplacian





Marr Hildreth Edge Detector



Gaussian

Unnormalized Gaussian g(x) with standard deviation σ = 1:

x:     -3    -2    -1    0     1     2     3
g(x): .011  .13   .6    1    .6   .13  .011



Gaussian



2-D Gaussian



2-D Gaussian



LoG Filter



Finding Zero Crossings



On the Separability of Gaussian



Separability



Example

I → I * ∇²g → zero crossings of ∇²S



Example

 1

 3

 6



LoG Algorithm

 Apply LoG to the image, either:
– Use the 2D filter ∇²g(x, y)
– Or use four 1D filters: g(x), g_xx(x), g(y), g_yy(y)

 Find zero-crossings from each row

 Find slope of zero-crossings
 Apply threshold to slope and mark edges

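The row-wise zero-crossing scan in the algorithm can be sketched as follows. This is a simplified illustration assuming NumPy; the slope threshold default and the toy response values are our own, and the complementary column-wise scan is omitted:

```python
import numpy as np

def zero_crossings_rows(resp, slope_th=0.0):
    """Mark an edge where the LoG response changes sign along a row
    and the local slope exceeds a threshold."""
    sign_change = np.signbit(resp[:, :-1]) != np.signbit(resp[:, 1:])
    slope = np.abs(resp[:, 1:] - resp[:, :-1])
    edges = np.zeros(resp.shape, dtype=bool)
    edges[:, :-1] = sign_change & (slope > slope_th)
    return edges

# A response row that crosses zero between columns 1 and 2
resp = np.array([[-4.0, -1.0, 2.0, 3.0]])
```

Only the position where the response changes sign is marked; raising `slope_th` then discards weak (noisy) zero crossings, which is the thresholding step in the algorithm above.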


Interest Point Detectors

Scale Invariant Feature Transform (SIFT)


and
Histogram of Oriented Gradients (HOG)
SIFT - Key Point Extraction
 Stands for Scale Invariant Feature
Transform
 Patented by the University of British
Columbia
Similar to the one used in primate visual
system (human, ape, monkey, etc.)
Transforms image data into scale-
invariant coordinates
D. Lowe. Distinctive image features from scale-invariant keypoints. International Journal
of Computer Vision, 2004.
Goal
 Extract distinctive invariant features
 Correctly matched against a large database
of features from many images
 Invariance to image scale and rotation
 Robustness to
 Affine (rotation, scale, shear) distortion,
 Change in 3D viewpoint,
 Addition of noise,
 Change in illumination.
Advantages
Locality: features are local, so robust to
occlusion and clutter
Distinctiveness: individual features can be
matched to a large database of objects
Quantity: many features can be generated
for even small objects
 Efficiency: close to real-time performance
Invariant Local Features
Steps for Extracting Key
Points (SIFT Points)
 Scale space peak selection
 Potential locations for finding features
 Key point localization
 Accurately locating the feature key points
 Orientation Assignment
 Assigning orientation to the key points
 Key point descriptor
 Describing the key point as a high
dimensional vector (128) (SIFT Descriptor)
Scales
 What should be sigma value for Canny
and LoG edge detection?
 If use multiple sigma values (scales),
how do you combine multiple edge
maps?
 Marr-Hildreth:
 Spatial coincidence assumption:
 Zero crossings that coincide over several
scales are physically significant.
Scale Space

scale scale

Multiple smoothed versions of a signal; zero crossings at multiple scales


Scale Space

Scale Space Interval Tree


Scale Space (Witkin, IJCAI
1983)
 Apply whole spectrum of scales
 Plot zerocrossings vs scales in a scale-space
 Interpret scale space contours
 Contours are arches, open at the bottom, closed
at the top
 Interval tree
 Each interval corresponds to a node in a tree,
 whose parent node represents the larger interval from which it emerged, and
 whose offspring represent smaller intervals.
 Stability of a node is the scale range over which the
interval exists.
Scale Space
 Top level description
 Iteratively remove nodes from the tree,
 splicing out nodes that are less stable than
any of their parents and offspring
Scale Space

A top-level description of several signals using the stability criterion.
Laplacian-of-Gaussian (LoG)
• Interest points: local maxima in the scale space of the Laplacian-of-Gaussian

L = σ² (L_xx(σ) + L_yy(σ))

→ List of (x, y, σ)

K. Grauman, B. Leibe
Automatic scale selection
Intuition:
• Find scale that gives local maxima of some function
f in both position and scale.

(plot: f versus region size for Image 1 and Image 2; the maxima occur at region sizes s1 and s2)

K. Grauman
Automatic Scale Selection

f (I i1im (x,  ))  f (I i1im (x,  ))

How to find corresponding patch


sizes?
K. Grauman, B. Leibe
Automatic Scale Selection
• Function responses for increasing scale (scale signature)

f (I i i (x,  ))
1 m
f (I i i (x,  ))
1 m
K. Grauman, B. Leibe
What Is A Useful Signature Function?

• Laplacian-of-Gaussian = “blob” detector

K. Grauman, B. Leibe
Scale-space blob detector: Example

Source: Lana Lazebnik


Sigma = 2
Sigma = 2.5
Sigma = 3.1
Sigma = 3.9
Sigma = 6.1

Building a Scale Space
 All scales must be examined to identify scale-
invariant features
 An efficient approach is to compute a Laplacian
pyramid (difference of Gaussians) (Burt & Adelson, 1983)
Approximation of LoG by
Difference of Gaussians
∂G/∂σ = σ ∇²G    (heat equation)

∂G/∂σ ≈ (G(x, y, kσ) − G(x, y, σ)) / (kσ − σ)

G(x, y, kσ) − G(x, y, σ) ≈ (k − 1) σ² ∇²G

Typical values: σ = 1.6; k = √2
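The DoG-approximates-LoG relation can be verified numerically. A sketch assuming NumPy, using a k very close to 1 so the finite difference tracks the derivative tightly (the actual pyramid uses k = √2, where the approximation is coarser but has the same shape; `RADIUS` is an arbitrary support choice):

```python
import numpy as np

RADIUS = 10

def gaussian2d(sigma):
    """Normalized 2-D Gaussian sampled on a (2R+1)^2 grid."""
    y, x = np.mgrid[-RADIUS:RADIUS + 1, -RADIUS:RADIUS + 1].astype(float)
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)

def log2d(sigma):
    """Analytic Laplacian of the normalized 2-D Gaussian."""
    y, x = np.mgrid[-RADIUS:RADIUS + 1, -RADIUS:RADIUS + 1].astype(float)
    r2 = x**2 + y**2
    return (r2 - 2 * sigma**2) / (2 * np.pi * sigma**6) * np.exp(-r2 / (2 * sigma**2))

sigma, k = 1.6, 1.001
dog = gaussian2d(k * sigma) - gaussian2d(sigma)     # difference of Gaussians
approx = (k - 1) * sigma**2 * log2d(sigma)          # (k - 1) sigma^2 * LoG
```

With k this close to 1 the two arrays agree to well under 1% of the peak magnitude, confirming the heat-equation derivation above.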
Steps for Extracting Key
Points (SIFT Points)
 Scale space peak selection
 Potential locations for finding features
 Key point localization
 Accurately locating the feature key points
 Orientation Assignment
 Assigning orientation to the key points
 Key point descriptor
 Describing the key point as a high
dimensional vector (128) (SIFT Descriptor)
Building a Scale Space

G(x, y, kσ) = 1 / (2π(kσ)²) · e^{−(x² + y²) / 2(kσ)²}

Octave scales: σ, kσ, k²σ, k³σ, k⁴σ, …

σ = 1/√2 ≈ 0.707; k = √2
How many scales per
octave?
 A collection of 32 real images drawn from a
diverse range, including
 outdoor scenes, human faces, aerial photographs, and
industrial images

 Each image was then subjected to a range
of transformations:
 rotation, scaling, affine stretch, change in brightness
and contrast, and addition of image noise.
How many scales per
octave?

3 scales per octave

Initial value of sigma: 1.6
Scale Space Peak Detection
 Compare a pixel (X) with 26
pixels in current and adjacent
scales (Green Circles)
 Select a pixel (X) if

larger/smaller than all 26 pixels


 Large number of extrema,

computationally expensive
 Detect the most stable subset
with a coarse sampling of scales
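The 26-neighbour comparison can be written directly on a 3-D DoG stack. A minimal sketch assuming NumPy (the random stack and the planted peak are purely illustrative):

```python
import numpy as np

def is_extremum(dog, s, y, x):
    """True if dog[s, y, x] is strictly greater or strictly smaller than all
    26 neighbours in the 3x3x3 cube spanning adjacent scales."""
    v = dog[s, y, x]
    cube = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
    others = np.delete(cube.ravel(), 13)   # index 13 is the centre voxel
    return bool(np.all(v > others) or np.all(v < others))

rng = np.random.default_rng(0)
dog = rng.standard_normal((3, 5, 5))
dog[1, 2, 2] = 10.0                        # plant an obvious maximum
```

A point is kept only if it beats every one of its 26 neighbours; a tie with any neighbour (e.g. an equal value next door) disqualifies it.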
Key Point Localization
 Candidates are chosen from extrema
detection

original image extrema locations


Initial Outlier Rejection
1. Low contrast candidates
2. Poorly localized candidates along an edge
 Taylor series expansion of DOG, D.

 Minima or maxima are located at x̂ = −(∂²D/∂x²)⁻¹ ∂D/∂x

 Value of D(x̂) at the minima/maxima must be
large: |D(x̂)| > th.
Initial Outlier Rejection

from 832 key points to 729 key points, th=0.03.


Further Outlier Rejection
 DOG has strong response along
edge
 Assume DOG as a surface
 Compute principal curvatures (PC)
 Poorly defined peak will have very

low curvature along the edge, high


across the edge
Further Outlier Rejection
 Use trace and determinant, similar to the Harris corner detector
 Compute the Hessian of D (principal curvatures)

Tr(H) = D_xx + D_yy = λ₁ + λ₂

Det(H) = D_xx D_yy − D_xy² = λ₁ λ₂

 Remove outliers by evaluating (with λ₁ = r λ₂):

Tr(H)² / Det(H) = (λ₁ + λ₂)² / (λ₁ λ₂) = (r λ₂ + λ₂)² / (r λ₂²) = (r + 1)² / r
Further Outlier Rejection
 The quantity (r + 1)² / r is minimum when the eigenvalues
are equal (r = 1)

 It increases with r

Tr(H)² / Det(H) = (r + 1)² / r

 Eliminate key points if

Tr(H)² / Det(H) > (r + 1)² / r,   with r = 10
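The curvature-ratio test reduces to a few lines once D_xx, D_yy and D_xy are available. A sketch (the sample Hessian entries are invented for illustration):

```python
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep a keypoint only if Tr(H)^2 / Det(H) < (r + 1)^2 / r,
    i.e. the ratio of principal curvatures is below r."""
    tr = dxx + dyy
    det = dxx * dyy - dxy ** 2
    if det <= 0:              # curvatures of opposite signs: reject outright
        return False
    return tr * tr / det < (r + 1) ** 2 / r

# Balanced curvatures (peak-like) pass; one dominant curvature (edge-like) fails
corner_like = passes_edge_test(5.0, 5.0, 0.0)
edge_like = passes_edge_test(100.0, 1.0, 0.0)
```

For r = 10 the threshold (r + 1)²/r is 12.1: the balanced case gives 4 and passes, the edge-like case gives about 102 and is rejected.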
Further Outlier Rejection

from 729 key points to 536 key points.


Orientation Assignment
 To achieve rotation invariance
Compute central derivatives, gradient
magnitude and direction of L at the
scale of the key point (x, y)

L is the Gaussian-smoothed image at that scale


Orientation Assignment
 Create a weighted direction
histogram in a neighborhood of
a key point (36 bins)
 Weights are
 Gradient magnitudes
 Spatial Gaussian filter with
σ = 1.5 × scale of key point
Orientation Assignment
 Select the peak as direction of the key point
Introduce additional key points at same
location
 if another peak is within 80% of max peak of
the histogram with different directions
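The 36-bin weighted orientation histogram and the 80% secondary-peak rule can be sketched as follows. NumPy assumed; the sample gradients are invented, and the Gaussian spatial weighting is omitted for brevity:

```python
import numpy as np

def orientation_histogram(mag, ori, n_bins=36):
    """Histogram of gradient directions, each sample weighted by its magnitude."""
    bins = ((ori % 360) // (360 / n_bins)).astype(int)   # 10-degree bins
    hist = np.zeros(n_bins)
    np.add.at(hist, bins, mag)                           # accumulate weighted votes
    return hist

mag = np.array([1.0, 2.0, 3.0, 0.5])
ori = np.array([5.0, 12.0, 185.0, 359.0])                # degrees
hist = orientation_histogram(mag, ori)

# Peak direction, plus any secondary peak within 80% of the maximum
peaks = np.where(hist >= 0.8 * hist.max())[0]
```

Each bin index in `peaks` would yield a key point with its own direction; here only the 185-degree gradient is strong enough to form a peak.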
Local Image Descriptors at
Key Points
 Possible descriptor
 Store intensity samples in the
neighborhood
Sensitive to lighting changes, 3D object

transformation
 Use of gradient orientation histograms
 Robust representation
Similarity to IT cortex
Complex neurons respond to a gradient at a
particular orientation.
Location of the feature can shift over a small
receptive field.

Tomaso Poggio, MIT

• Edelman, Intrator, and Poggio (1997)
• The function of these cells allows for matching and recognition of 3D objects from a range of viewpoints.

• Experiments show better recognition accuracy for 3D objects rotated in depth by up to 20 degrees
Extraction of Local Image
Descriptors at Key Points
 Compute relative orientation and magnitude in a
16x16 neighborhood at key point
 Form weighted histogram (8 bin) for 4x4 regions
 Weight by magnitude and spatial Gaussian
 Concatenate 16 histograms in one long vector of 128 dimensions

 Example for 8x8 to 2x2 descriptors


Descriptor Regions (n by n)
Extraction of Local Image
Descriptors at Key Points
 Store numbers in a vector
 Normalize to unit vector (UN)
 Illumination invariance (affine changes)
 For non-linear intensity transforms
 Clamp unit vector components to a maximum of 0.2
(limit the influence of gradients larger than 0.2)
 Renormalize to unit vector
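The normalize, clamp, renormalize step is short enough to write out. A sketch assuming NumPy (the example vector with one dominant component is invented):

```python
import numpy as np

def normalize_descriptor(v, clamp=0.2):
    """Unit-normalize (affine illumination invariance), clamp large
    components at 0.2 (non-linear intensity robustness), renormalize."""
    v = v / np.linalg.norm(v)
    v = np.minimum(v, clamp)
    return v / np.linalg.norm(v)

# One dominant gradient would otherwise dominate descriptor distances
desc = np.array([10.0] + [0.1] * 127)
desc = normalize_descriptor(desc)
```

The clamp caps the dominant component before the final renormalization, so no single gradient can dominate matching (components may exceed 0.2 again after the second normalization, which is expected).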
Key point matching
 Match the key points against a database of key points
obtained from training images.
Find the nearest neighbor i.e. a key point with
minimum Euclidean distance.
 Efficient Nearest Neighbor matching
 Look at the ratio of distances between the best and 2nd best
match (threshold 0.8)

Matching local features

Kristen Grauman
Matching local features

Image 1 Image 2
• To generate candidate matches, find patches that have
the most similar appearance or SIFT descriptor
• Simplest approach: compare them all, take the closest
(or closest k, or within a thresholded distance)
Kristen Grauman
(figure: a query image patch and its 1st-4th nearest-neighbour patches)
if (distance to first match) / (distance to second match) < .8 → good match
Ambiguous matches

????
Image 1 Image 2
• At what distance do we have a good match?
• To add robustness to matching, can consider ratio : distance to
best match / distance to second best match
• If low, first match looks good.
• If high, could be ambiguous match.
Kristen Grauman
The ratio of distance
from the closest to the distance of the
second closest
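Nearest-neighbour matching with the distance-ratio test can be sketched as a brute-force loop. NumPy assumed; the 2-D toy descriptors stand in for 128-D SIFT vectors:

```python
import numpy as np

def match_ratio_test(desc1, desc2, ratio=0.8):
    """Accept the nearest neighbour in desc2 only if
    d(best) / d(second best) < ratio."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        j1, j2 = np.argsort(dists)[:2]
        if dists[j1] < ratio * dists[j2]:
            matches.append((i, int(j1)))
    return matches

desc1 = np.array([[0.0, 0.0], [5.0, 5.0]])
desc2 = np.array([[0.1, 0.0], [3.0, 0.0], [5.0, 5.1]])
matches = match_ratio_test(desc1, desc2)
```

Both toy descriptors have a clear nearest neighbour that is much closer than the runner-up, so both pass the ratio test; an ambiguous point with two near-equal distances would be dropped.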
SIFT Detector
 Generate Scale Space of an Image
 Detect Peaks in Scale Space (extrema)
 Localize Interest Points (Taylor Series)
 Remove outliers (remove response
along edges)
 Assign Orientation

SIFT Descriptor
 Compute relative orientation and magnitude in a
16x16 neighborhood at key point
 Form weighted histogram (8 bin) for 4x4 regions
 Weight by magnitude and spatial Gaussian
 Concatenate 16 histograms in one long vector of 128 dimensions

 Example for 8x8 to 2x2 descriptors


Reference

D. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 2004.
Histograms of Oriented Gradients for Human Detection
(HOG)
N. Dalal and B. Triggs
CVPR 2005
Cited by 8908

Dr. Edgar Seemann
HOG Steps
 HOG feature extraction
 Compute centered horizontal and vertical gradients with no smoothing
 Compute gradient orientation and magnitudes
For color image, pick the color channel with the highest gradient magnitude for each
pixel.

 For a 64x128 image,


 Divide the image into 16x16 blocks of 50% overlap.
 7x15=105 blocks in total
 Each block should consist of 2x2 cells with size 8x8.
 Quantize the gradient orientation into 9 bins
 The vote is the gradient magnitude
 Interpolate votes between neighboring bin centers.
The vote can also be weighted with Gaussian to downweight the pixels near the edges of
the block.
 Concatenate histograms (Feature dimension: 105x4x9 = 3,780)
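The block/cell bookkeeping in the steps above can be checked with a few lines of arithmetic (plain Python; the variable names are ours):

```python
# 64x128 detection window, 8x8 cells, 16x16 blocks, 50% overlap (stride 8)
win_w, win_h = 64, 128
cell, block, stride = 8, 16, 8

blocks_x = (win_w - block) // stride + 1          # blocks across the window
blocks_y = (win_h - block) // stride + 1          # blocks down the window
cells_per_block = (block // cell) ** 2            # 2x2 cells per block
n_bins = 9

feature_dim = blocks_x * blocks_y * cells_per_block * n_bins
```

This reproduces the counts quoted above: 7 x 15 = 105 blocks and a 3,780-dimensional feature vector.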
Computing Gradients

 Centered: f′(x) = lim_{h→0} (f(x + h) − f(x − h)) / 2h

 Filter masks in x and y directions:
 x: [−1 0 1],  y: [−1 0 1]ᵀ

 Gradient
 Magnitude: s = √(s_x² + s_y²)
 Orientation: θ = arctan(s_y / s_x)

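The centred-difference gradients, magnitude and orientation take only a few NumPy lines. The 3x3 test image is invented; like HOG, the plain [−1 0 1] mask is applied without the 1/2 factor:

```python
import numpy as np

img = np.array([[1.0, 2.0, 4.0],
                [1.0, 3.0, 5.0],
                [2.0, 4.0, 6.0]])

# Centred differences with [-1 0 1] masks (no smoothing, as in HOG)
sx = np.zeros_like(img); sy = np.zeros_like(img)
sx[:, 1:-1] = img[:, 2:] - img[:, :-2]     # horizontal gradient
sy[1:-1, :] = img[2:, :] - img[:-2, :]     # vertical gradient

mag = np.hypot(sx, sy)                     # sqrt(sx^2 + sy^2)
ori = np.degrees(np.arctan2(sy, sx))       # gradient orientation in degrees
```

At the centre pixel the horizontal gradient is 4 and the vertical gradient is 2, giving magnitude √20 and an orientation of about 26.6 degrees.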
Blocks, Cells
Block 2
Block 1
 16x16 blocks of 50% overlap.
 7x15=105 blocks in total

 Each block should consist of 2x2 cells with size 8x8.

Cells
(8 by 8)
Votes
 Each block consists of 2x2 cells with
size 8x8
 Quantize the gradient orientation into 9 bins (0-180)
 The vote is the gradient magnitude
Bin centers

 Interpolate votes linearly between neighboring bin centers.
 Example: if θ = 85 degrees.
 Distances to the bin centers (bin 70 and bin 90)
are 15 and 5 degrees, respectively.
 Hence bin 70 receives 5/20 = 1/4 of the vote and bin 90 receives 15/20 = 3/4.

 The vote can also be weighted with Gaussian to


down weight the pixels near the edges of the block.
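The linear interpolation of a vote between the two nearest bin centres, matching the θ = 85° example above, can be sketched like this. NumPy assumed; bin centres at 10°, 30°, …, 170° as implied by the example, with wrap-around for the unsigned 0-180 range:

```python
import numpy as np

def vote(theta, magnitude, n_bins=9, bin_width=20.0):
    """Split one gradient vote linearly between the two nearest bin centres
    (centres at 10, 30, ..., 170 degrees)."""
    hist = np.zeros(n_bins)
    b = (theta - bin_width / 2) / bin_width   # fractional position among centres
    lo = int(np.floor(b)) % n_bins            # wrap around for unsigned angles
    hi = (lo + 1) % n_bins
    frac = b - np.floor(b)
    hist[lo] += magnitude * (1 - frac)        # share proportional to proximity
    hist[hi] += magnitude * frac
    return hist

h = vote(85.0, 1.0)   # 1/4 of the vote to the 70-degree bin, 3/4 to the 90-degree bin
```

Bin index 3 is the 70-degree centre and index 4 the 90-degree centre, so the split reproduces the 1/4 and 3/4 ratios worked out above.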
Final Feature Vector
 Concatenate histograms
 Make it a 1D vector of length 3780.

 Visualization

Results

Navneet Dalal and Bill Triggs, “Histograms of Oriented Gradients for Human Detection”, CVPR 2005
SIFT vs HOG

SIFT:
 128-dimensional vector
 16 by 16 window
 4x4 sub-windows (16 total)
 8-bin histogram per sub-window

HOG:
 3,780-dimensional vector
 64 by 128 window
 16 by 16 blocks with 50% overlap
 Each block consists of 2 by 2 cells, each 8 by 8
 9-bin histogram per cell
To continue…
