Lecture7 DoG SIFT Cs131


Lecture 7:
Finding Features (part 2/2)

Professor Fei-Fei Li
Stanford Vision Lab

Fei-Fei Li    Lecture 7    2-Oct-14


What will we learn today?
•  Local invariant features
–  Motivation
–  Requirements, invariances        Previous lecture (#6)
•  Keypoint localization
–  Harris corner detector
•  Scale invariant region selection
–  Automatic scale selection
–  Difference-of-Gaussian (DoG) detector
•  SIFT: an image region descriptor
Some background reading: David Lowe, IJCV 2004


A quick review
•  Local invariant features
–  Motivation
–  Requirements, invariances
•  Keypoint localization
–  Harris corner detector
•  Scale invariant region selection
–  Automatic scale selection
–  Difference-of-Gaussian (DoG) detector
•  SIFT: an image region descriptor


Quick review: Harris Corner Detector
•  “flat” region: no change in all directions
•  “edge”: no change along the edge direction
•  “corner”: significant change in all directions
Slide credit: Alyosha Efros


Quick review: Harris Corner Detector
$\theta = \det(M) - \alpha \, \operatorname{trace}(M)^2 = \lambda_1 \lambda_2 - \alpha (\lambda_1 + \lambda_2)^2$
•  Fast approximation
–  Avoid computing the eigenvalues
–  α: constant (0.04 to 0.06)
[Figure: the (λ1, λ2) plane divided into a “Corner” region (θ > 0), two “Edge” regions (θ < 0), and a “Flat” region]
Slide credit: Kristen Grauman

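The response formula above can be sketched in a few lines. This is a minimal illustration, not the lecture's code: it assumes the entries of the second-moment matrix M = [[sxx, sxy], [sxy, syy]] have already been accumulated over a window, and the function names and the small `eps` band used for "flat" are hypothetical choices.

```python
import math

def harris_response(sxx, syy, sxy, alpha=0.05):
    """theta = det(M) - alpha * trace(M)^2 for M = [[sxx, sxy], [sxy, syy]]."""
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - alpha * trace * trace

def eigenvalues(sxx, syy, sxy):
    """Eigenvalues of the symmetric 2x2 matrix; used only as a cross-check."""
    mean = (sxx + syy) / 2.0
    radius = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy * sxy)
    return mean + radius, mean - radius

def classify(theta, eps=1e-6):
    """Label a response; eps is an assumed tolerance for 'flat'."""
    if theta > eps:
        return "corner"
    if theta < -eps:
        return "edge"
    return "flat"

# Two strong gradient directions, one strong direction, and no gradient at all.
print(classify(harris_response(1.0, 1.0, 0.0)))
print(classify(harris_response(1.0, 0.0, 0.0)))
print(classify(harris_response(0.0, 0.0, 0.0)))
```

Computing θ from the determinant and trace gives the same number as the eigenvalue form, which is why the eigenvalues never need to be computed explicitly.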
Quick  review:  Harris  Corner  Detector  

Slide adapted from Darya Frolova, Denis Simakov


Quick review: Harris Corner Detector
•  Translation invariance
•  Rotation invariance
•  Scale invariance?
[Figure: a corner at one scale; at a larger scale, all points will be classified as edges!]
Not invariant to image scale!
Slide credit: Kristen Grauman


What will we learn today?
•  Local invariant features
–  Motivation
–  Requirements, invariances
•  Keypoint localization
–  Harris corner detector
•  Scale invariant region selection
–  Automatic scale selection
–  Difference-of-Gaussian (DoG) detector
•  SIFT: an image region descriptor


Scale Invariant Detection
•  Consider regions (e.g. circles) of different sizes around a point
•  Regions of corresponding sizes will look the same in both images


Scale Invariant Detection
•  The problem: how do we choose corresponding circles independently in each image?


Scale Invariant Detection
•  Solution:
–  Design a function on the region (circle), which is “scale invariant” (the same for corresponding regions, even if they are at different scales)
   Example: average intensity. For corresponding regions (even of different sizes) it will be the same.
–  For a point in one image, we can consider it as a function of region size (circle radius)
[Figure: f plotted against region size for Image 1 and for Image 2 (scale = 1/2)]

Scale Invariant Detection
•  Common approach: take a local maximum of this function
•  Observation: the region size for which the maximum is achieved should be invariant to image scale.
Important: this scale invariant region size is found in each image independently!
[Figure: f plotted against region size for Image 1 (maximum at s1) and for Image 2, scale = 1/2 (maximum at s2)]

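The idea of picking the maximum over region size independently in each image can be tried on a toy signal. The sketch below is an assumption-laden illustration: it works in 1D for brevity, uses the scale-normalized second derivative of a Gaussian as the function f, and uses synthetic Gaussian blobs of width 8 and 4 to stand in for "Image 1" and "Image 2 (scale = 1/2)". All names are hypothetical.

```python
import math

def normalized_laplacian_response(signal, centre, sigma):
    """|sigma^2 * (d^2/dx^2 G_sigma * signal)| evaluated at index `centre`."""
    radius = int(4 * sigma) + 1
    total = 0.0
    for dx in range(-radius, radius + 1):
        i = centre - dx
        if 0 <= i < len(signal):
            g = math.exp(-dx * dx / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
            total += g * (dx * dx - sigma ** 2) / sigma ** 4 * signal[i]
    return abs(sigma ** 2 * total)

def characteristic_scale(signal, centre, sigmas):
    """Scale at which the response at `centre` is largest."""
    return max(sigmas, key=lambda s: normalized_laplacian_response(signal, centre, s))

def gaussian_blob(width, n=401):
    """Synthetic 1D blob centred in a signal of length n."""
    centre = n // 2
    return [math.exp(-((i - centre) ** 2) / (2 * width ** 2)) for i in range(n)], centre

sigmas = [2.0 * 2 ** (i / 8) for i in range(25)]    # 2 .. 16 in geometric steps
signal_1, c1 = gaussian_blob(width=8.0)
signal_2, c2 = gaussian_blob(width=4.0)             # the same blob at scale = 1/2
s1 = characteristic_scale(signal_1, c1, sigmas)
s2 = characteristic_scale(signal_2, c2, sigmas)
print(s1 / s2)  # ratio of the independently selected sizes, close to 2
```

Each signal is processed on its own, yet the two selected sizes differ by the scale factor between the signals, which is the behaviour the slide asks for.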
Scale Invariant Detection
•  A “good” function for scale detection: has one stable sharp peak
[Figure: three plots of f against region size, two labeled “bad” and one labeled “Good!”]
•  For usual images: a good function would be one which responds to contrast (sharp local intensity change)


Scale Invariant Detection
•  Functions for determining scale: $f = \text{Kernel} * \text{Image}$
Kernels:
$L = \sigma^2 \left( G_{xx}(x, y, \sigma) + G_{yy}(x, y, \sigma) \right)$    (Laplacian)
$\mathrm{DoG} = G(x, y, k\sigma) - G(x, y, \sigma)$    (Difference of Gaussians)
where the Gaussian is
$G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} \, e^{-\frac{x^2 + y^2}{2\sigma^2}}$
Note: both kernels are invariant to scale and rotation
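The two kernels can be written down directly from their definitions. The following is a small sketch under stated assumptions: the Gaussian uses the standard 2D normalization 1/(2πσ²), the kernels are evaluated pointwise rather than sampled onto a grid, and the function names are hypothetical.

```python
import math

def gaussian(x, y, sigma):
    """2D Gaussian G(x, y, sigma)."""
    return math.exp(-(x * x + y * y) / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)

def laplacian_kernel(x, y, sigma):
    """Scale-normalized Laplacian: sigma^2 * (G_xx + G_yy)."""
    r2 = x * x + y * y
    return sigma ** 2 * gaussian(x, y, sigma) * (r2 - 2 * sigma ** 2) / sigma ** 4

def dog_kernel(x, y, sigma, k):
    """Difference of Gaussians: G(x, y, k*sigma) - G(x, y, sigma)."""
    return gaussian(x, y, k * sigma) - gaussian(x, y, sigma)

# Both kernels depend only on x^2 + y^2, so (3, 4) and (5, 0) give the same value.
print(laplacian_kernel(3, 4, 2.0), laplacian_kernel(5, 0, 2.0))
print(dog_kernel(3, 4, 2.0, 1.6), dog_kernel(5, 0, 2.0, 1.6))
```

Because each kernel is a function of the radius alone, rotating the image about the kernel centre leaves the response unchanged. For k close to 1, `dog_kernel(...) / (k - 1)` is also numerically close to `laplacian_kernel(...)`.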


$\det M = \lambda_1 \lambda_2$
$\operatorname{trace} M = \lambda_1 + \lambda_2$
[Figure: trace and det responses plotted against scale. From Lindeberg 1998]
Blob detection: Marr 1982; Voorhees and Poggio 1987; Blostein and Ahuja 1989; …


Scale Invariant Detectors
•  Harris-Laplacian¹
   Find local maximum of:
–  Harris corner detector in space (image coordinates)
–  Laplacian in scale
[Figure: (x, y, scale) volume with Harris acting along x and y and the Laplacian acting along scale]
•  SIFT (Lowe)²
   Find local maximum of:
–  Difference of Gaussians in space and scale
[Figure: (x, y, scale) volume with DoG acting along x, y, and scale]
¹ K. Mikolajczyk, C. Schmid. “Indexing Based on Scale Invariant Interest Points”. ICCV 2001
² D. Lowe. “Distinctive Image Features from Scale-Invariant Keypoints”. IJCV 2004


Scale Invariant Detectors
•  Experimental evaluation of detectors w.r.t. scale change
Repeatability rate = (# correspondences) / (# possible correspondences)
K. Mikolajczyk, C. Schmid. “Indexing Based on Scale Invariant Interest Points”. ICCV 2001


Scale Invariant Detection: Summary
•  Given: two images of the same scene with a large scale difference between them
•  Goal: find the same interest points independently in each image
•  Solution: search for maxima of suitable functions in scale and in space (over the image)
Methods:
1.  Harris-Laplacian [Mikolajczyk, Schmid]: maximize Laplacian over scale, Harris’ measure of corner response over the image
2.  SIFT [Lowe]: maximize Difference of Gaussians over scale and space
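For the SIFT method in the summary, "maximum over scale and space" can be sketched as a neighbourhood test on a stack of DoG images. This is an illustrative sketch only: it assumes the DoG responses are stored as `dog[scale][y][x]` nested lists, and it uses the 26-neighbour comparison described in Lowe's IJCV 2004 paper (the slide itself does not spell out the neighbourhood). The function name is hypothetical.

```python
def is_scale_space_extremum(dog, s, y, x):
    """True if dog[s][y][x] is strictly above or strictly below all 26
    neighbours in the 3x3x3 block around it (scale, row, column)."""
    centre = dog[s][y][x]
    neighbours = [
        dog[s + ds][y + dy][x + dx]
        for ds in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (ds, dy, dx) != (0, 0, 0)
    ]
    return centre > max(neighbours) or centre < min(neighbours)

# Toy 3x3x3 stack with a single peak in the middle.
stack = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
stack[1][1][1] = 1.0
print(is_scale_space_extremum(stack, 1, 1, 1))
```

The caller is expected to stay one sample away from the borders of the stack in all three dimensions.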


What will we learn today?
•  Local invariant features
–  Motivation
–  Requirements, invariances
•  Keypoint localization
–  Harris corner detector
•  Scale invariant region selection
–  Automatic scale selection
–  Difference-of-Gaussian (DoG) detector
•  SIFT: an image region descriptor


Local Descriptors
•  We know how to detect points
•  Next question: how to describe them for matching?
Point descriptor should be:
1.  Invariant
2.  Distinctive
Slide credit: Kristen Grauman

CVPR 2003 Tutorial

Recognition and Matching Based on Local Invariant Features
David Lowe
Computer Science Department
University of British Columbia


Invariant Local Features
•  Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters

Advantages of invariant local features
•  Locality: features are local, so robust to occlusion and clutter (no prior segmentation)
•  Distinctiveness: individual features can be matched to a large database of objects
•  Quantity: many features can be generated for even small objects
•  Efficiency: close to real-time performance
•  Extensibility: can easily be extended to wide range of differing feature types, with each adding robustness


Scale invariance
Requires a method to repeatably select points in location and scale:
•  The only reasonable scale-space kernel is a Gaussian (Koenderink, 1984; Lindeberg, 1994)
•  An efficient choice is to detect peaks in the difference of Gaussian pyramid (Burt & Adelson, 1983; Crowley & Parker, 1984 – but examining more scales)
•  Difference-of-Gaussian with constant ratio of scales is a close approximation to Lindeberg’s scale-normalized Laplacian (can be shown from the heat diffusion equation)
[Figure: pyramid construction steps: Blur, Subtract, Resample]

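The heat-diffusion argument mentioned in the last bullet can be written out in three steps. This follows the derivation given in Lowe's IJCV 2004 paper; the finite-difference step is an approximation that holds when k is close to 1.

```latex
% Heat diffusion equation, parameterized by sigma:
\frac{\partial G}{\partial \sigma} = \sigma \nabla^2 G

% Finite-difference approximation of the derivative at nearby scales k\sigma and \sigma:
\sigma \nabla^2 G = \frac{\partial G}{\partial \sigma}
  \approx \frac{G(x, y, k\sigma) - G(x, y, \sigma)}{k\sigma - \sigma}

% Rearranging:
G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\, \sigma^2 \nabla^2 G
```

Since k is held constant across scales, the factor (k − 1) is the same everywhere and does not change where the extrema are located.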
Becoming rotation invariant
•  We are given a keypoint and its scale from DoG
•  We will select a characteristic orientation for the keypoint (based on the most prominent gradient there; discussed next slide)
•  We will describe all features relative to this orientation
•  Causes features to be rotation invariant!
–  If the keypoint appears rotated in another image, the features will be the same, because they’re relative to the characteristic orientation
[Figure: histogram of gradient orientations from 0 to 2π]


Becoming rotation invariant
•  Choosing characteristic orientation:
•  Use the blurred image associated with the keypoint’s scale. Look at pixels in a square around it (say, size 16x16)
•  Compute gradient direction at each pixel (this is easy, using vertical and horizontal edge filters)
•  Create a histogram of these local gradient directions
•  Keypoint orientation = the peak of that histogram
•  Minor details: we’ll also weight each pixel’s histogram contribution by the magnitude of its gradient and how close it is to the keypoint
•  Now, each keypoint has stable 2D coordinates (x, y, scale, orientation). Now we must give it a “fingerprint.”
[Figure: histogram of gradient orientations from 0 to 2π]
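The orientation steps above can be sketched as follows. This is a minimal illustration, not the lecture's implementation: the patch is assumed to be a square list of lists already taken from the blurred image at the keypoint's scale, gradients use central differences, and the 36-bin histogram and the Gaussian centre weighting width are assumptions (36 bins is the value used in Lowe's paper; the slide does not give one). The function name is hypothetical.

```python
import math

def dominant_orientation(patch, n_bins=36, sigma=None):
    """Peak of the magnitude- and distance-weighted gradient direction histogram.

    Returns the centre angle of the winning bin, in radians in [0, 2*pi).
    """
    size = len(patch)
    centre = (size - 1) / 2.0
    if sigma is None:
        sigma = size / 2.0            # assumed width of the centre weighting
    bin_width = 2 * math.pi / n_bins
    hist = [0.0] * n_bins
    for y in range(1, size - 1):
        for x in range(1, size - 1):
            gx = (patch[y][x + 1] - patch[y][x - 1]) / 2.0   # horizontal edge filter
            gy = (patch[y + 1][x] - patch[y - 1][x]) / 2.0   # vertical edge filter
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)
            weight = math.exp(-((x - centre) ** 2 + (y - centre) ** 2) / (2 * sigma ** 2))
            hist[int(round(angle / bin_width)) % n_bins] += magnitude * weight
    peak = max(range(n_bins), key=hist.__getitem__)
    return peak * bin_width

# A 16x16 ramp that brightens along x has all gradients pointing along +x.
ramp_x = [[float(x) for x in range(16)] for _ in range(16)]
print(dominant_orientation(ramp_x))
```

Angles are measured in image coordinates (y increasing downward), so a ramp that brightens along y gives an orientation of π/2 in this convention.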


Example of keypoint detection
Threshold on value at DoG peak and on ratio of principal curvatures (Harris approach)
(a)  233x189 image
(b)  832 DoG extrema
(c)  729 left after peak value threshold
(d)  536 left after testing ratio of principal curvatures
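The two thresholds named on this slide can be sketched as a single filter. This is a hedged illustration: the inputs are assumed to be the DoG value at the extremum and the second derivatives of the DoG image there, and the default thresholds (0.03 for contrast with pixel values in [0, 1], and r = 10 for the curvature ratio) are taken from Lowe's IJCV 2004 paper rather than from the slide. The function name is hypothetical.

```python
def keep_keypoint(value, dxx, dyy, dxy, contrast_thresh=0.03, r=10.0):
    """Apply the peak-value threshold and the principal-curvature ratio test.

    value          -- DoG response at the extremum
    dxx, dyy, dxy  -- second derivatives of the DoG image at the extremum
    """
    if abs(value) < contrast_thresh:
        return False                       # low-contrast peak
    det = dxx * dyy - dxy * dxy
    trace = dxx + dyy
    if det <= 0:
        return False                       # curvatures have opposite signs
    # trace^2 / det grows as the two principal curvatures become unequal,
    # so a large value indicates an edge-like response.
    return trace * trace / det < (r + 1) ** 2 / r

print(keep_keypoint(0.10, -1.0, -1.0, 0.0))    # blob-like: equal curvatures
print(keep_keypoint(0.10, -1.0, -0.01, 0.0))   # edge-like: very unequal curvatures
```

As with the Harris measure, the test uses only the trace and determinant of a 2x2 matrix, so no eigenvalues are computed.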


Repeatability vs number of scales sampled per octave
David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60, 2 (2004), pp. 91-110


SIFT descriptor formation
•  Use the blurred image associated with the keypoint’s scale
•  Take image gradients over a 16x16 array of locations.
•  To become rotation invariant, rotate the gradient directions AND locations by (-keypoint orientation)
–  Now we’ve cancelled out rotation and have gradients expressed at locations relative to keypoint orientation θ
–  We could also have just rotated the whole image by -θ, but that would be slower.


SIFT descriptor formation
•  Using precise gradient locations is fragile. We’d like to allow some “slop” in the image, and still produce a very similar descriptor
•  Create array of orientation histograms (a 4x4 array is shown)
•  Put the rotated gradients into their local orientation histograms
–  A gradient’s contribution is divided among the nearby histograms based on distance. If it’s halfway between two histogram locations, it gives a half contribution to both.
–  Also, scale down gradient contributions for gradients far from the center
•  The SIFT authors found that best results were with 8 orientation bins per histogram, and a 4x4 histogram array.
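A simplified version of the descriptor layout can be sketched as below. Several things here are assumptions rather than the method on the slides: gradients are supplied as precomputed 16x16 magnitude and angle arrays, each gradient goes into a single histogram cell and a single orientation bin (hard binning) instead of being divided among neighbours, and the centre weighting width is taken as half the window size. The function name is hypothetical.

```python
import math

def sift_like_descriptor(mags, angles, theta, grid=4, n_bins=8):
    """Hard-binned 4x4x8 descriptor sketch (no interpolation between cells)."""
    size = len(mags)                       # 16 on the slide
    cell = size / grid                     # samples per histogram cell
    centre = (size - 1) / 2.0
    sigma = size / 2.0                     # assumed centre-weighting width
    cos_t, sin_t = math.cos(-theta), math.sin(-theta)
    bin_width = 2 * math.pi / n_bins
    desc = [0.0] * (grid * grid * n_bins)
    for y in range(size):
        for x in range(size):
            dx, dy = x - centre, y - centre
            # Rotate the sample location by -theta about the keypoint.
            rx = cos_t * dx - sin_t * dy
            ry = sin_t * dx + cos_t * dy
            col = math.floor((rx + centre + 0.5) / cell)
            row = math.floor((ry + centre + 0.5) / cell)
            if not (0 <= col < grid and 0 <= row < grid):
                continue                   # rotated outside the histogram array
            # Rotate the gradient direction by -theta as well.
            rel = (angles[y][x] - theta) % (2 * math.pi)
            b = int(rel / bin_width) % n_bins
            weight = math.exp(-(dx * dx + dy * dy) / (2 * sigma ** 2))
            desc[(row * grid + col) * n_bins + b] += mags[y][x] * weight
    return desc

ones = [[1.0] * 16 for _ in range(16)]
zeros = [[0.0] * 16 for _ in range(16)]
descriptor = sift_like_descriptor(ones, zeros, theta=0.0)
print(len(descriptor))  # 128
```

Replacing the hard assignment with a split across the neighbouring cells and bins, as the slide describes, makes the descriptor change smoothly when a gradient moves slightly.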


SIFT descriptor formation
•  8 orientation bins per histogram, and a 4x4 histogram array, yields 8 x 4x4 = 128 numbers.
•  So a SIFT descriptor is a length 128 vector, which is invariant to rotation (because we rotated the descriptor) and scale (because we worked with the scaled image from DoG)
•  We can compare each vector from image A to each vector from image B to find matching keypoints!
–  Euclidean “distance” between descriptor vectors gives a good measure of keypoint similarity
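Comparing every vector from image A with every vector from image B can be sketched as a brute-force nearest-neighbour search. This is an illustration only: the function name is hypothetical, the toy vectors have 3 entries instead of 128, and a real system would normally use a faster search structure.

```python
import math

def match_descriptors(descs_a, descs_b):
    """For each descriptor in A, find the closest descriptor in B.

    Returns a list of (index_in_a, index_in_b, euclidean_distance).
    """
    matches = []
    for i, da in enumerate(descs_a):
        j = min(range(len(descs_b)), key=lambda n: math.dist(da, descs_b[n]))
        matches.append((i, j, math.dist(da, descs_b[j])))
    return matches

# Toy 3-entry "descriptors" standing in for 128-entry SIFT vectors.
image_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
image_b = [[0.0, 0.9, 0.1], [0.9, 0.1, 0.0]]
for i, j, d in match_descriptors(image_a, image_b):
    print(i, j, round(d, 3))
```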


SIFT descriptor formation
•  Adding robustness to illumination changes:
•  Remember that the descriptor is made of gradients (differences between pixels), so it’s already invariant to changes in brightness (e.g. adding 10 to all image pixels yields the exact same descriptor)
•  A higher-contrast photo will increase the magnitude of gradients linearly. So, to correct for contrast changes, normalize the vector (scale to length 1.0)
•  Very large image gradients are usually from unreliable 3D illumination effects (glare, etc). So, to reduce their effect, clamp all values in the vector to be ≤ 0.2 (an experimentally tuned value). Then normalize the vector again.
•  Result is a vector which is fairly invariant to illumination changes.
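The normalize, clamp, normalize sequence described above is short enough to write out. This is a minimal sketch with a hypothetical function name; the only behaviour added beyond the slide is leaving an all-zero vector unchanged to avoid dividing by zero.

```python
import math

def normalize_descriptor(vec, clamp=0.2):
    """Scale to unit length, clamp large entries, then scale to unit length again."""
    def unit(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v] if norm > 0 else list(v)

    v = unit(vec)                      # corrects for contrast changes
    v = [min(x, clamp) for x in v]     # limits the effect of very large gradients
    return unit(v)

raw = [1.0] * 127 + [100.0]            # one entry dominated by glare, say
out = normalize_descriptor(raw)
brighter = normalize_descriptor([2.0 * x for x in raw])   # contrast doubled
print(max(abs(a - b) for a, b in zip(out, brighter)))     # essentially zero
```

Doubling every gradient magnitude leaves the result unchanged, which is the contrast invariance the slide describes. Note that after the second normalization individual entries may again exceed the clamp value.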


Sensitivity to number of histogram orientations
David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60, 2 (2004), pp. 91-110


Feature stability to noise
•  Match features after random change in image scale & orientation, with differing levels of image noise
•  Find nearest neighbor in database of 30,000 features


Feature stability to affine change
•  Match features after random change in image scale & orientation, with 2% image noise, and affine distortion
•  Find nearest neighbor in database of 30,000 features


Distinctiveness of features
•  Vary size of database of features, with 30 degree affine change, 2% image noise
•  Measure % correct for single nearest neighbor match


Ratio of distances reliable for matching
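The slide gives only the title, so the following sketch fills in the test as it is described in Lowe's IJCV 2004 paper: a match is accepted only when the distance to the nearest neighbour is clearly smaller than the distance to the second nearest. The 0.8 threshold is the value reported in that paper; the function name and the toy 2-entry vectors are hypothetical.

```python
import math

def ratio_test_match(desc, database, max_ratio=0.8):
    """Index of the nearest database descriptor, or None if the match is ambiguous."""
    if len(database) < 2:
        return None
    ranked = sorted((math.dist(desc, d), i) for i, d in enumerate(database))
    (d1, best), (d2, _) = ranked[0], ranked[1]
    if d2 == 0 or d1 / d2 >= max_ratio:
        return None                    # second-best is almost as close: reject
    return best

database = [[0.0, 0.0], [10.0, 0.0], [10.0, 1.0]]
print(ratio_test_match([1.0, 0.0], database))   # one clear nearest neighbour
print(ratio_test_match([5.0, 0.0], database))   # two candidates equally close
```

A distinctive feature has one close neighbour and a much more distant runner-up, whereas a feature from repeated or cluttered structure tends to have several neighbours at similar distances.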


Nice SIFT resources
•  VLFeat toolbox:
–  http://www.vlfeat.org/overview/sift.html
•  An online tutorial:
   http://www.aishack.in/2010/05/sift-scale-invariant-feature-transform/
•  Wikipedia:
   http://en.wikipedia.org/wiki/Scale-invariant_feature_transform


Applications of local invariant features
•  Wide baseline stereo
•  Motion tracking
•  Panoramas
•  Mobile robot navigation
•  3D reconstruction
•  Recognition
•  …


Automatic mosaicing
http://www.cs.ubc.ca/~mbrown/autostitch/autostitch.html


Wide baseline stereo
[Image from T. Tuytelaars ECCV 2006 tutorial]


Recognition of specific objects, scenes
Schmid and Mohr 1997; Sivic and Zisserman, 2003; Rothganger et al. 2003; Lowe 2002


What have we learned this week?
•  Local invariant features
–  Motivation
–  Requirements, invariances        Previous lecture (#6)
•  Keypoint localization
–  Harris corner detector
•  Scale invariant region selection
–  Automatic scale selection        today (#7)
–  Difference-of-Gaussian (DoG) detector
•  SIFT: an image region descriptor
Some background reading: R. Szeliski, Ch 14.1.1; David Lowe, IJCV 2004