Assignment 4.1: Reading Lowe's SIFT Paper: Leo Dorst & Rein Van Den Boomgaard April 18, 2020
The first half of this week’s assignment is to read and understand a famous scientific paper,
about the Scale Invariant Feature Transform, by Lowe, 2004. It is a classic and has changed the
field: up to 2020 it has already had 56248 citations (and counting).
Reading scientific papers is an important skill, and one of the learning goals of your bachelor's programme.
We help you by providing a small general guide, and more specific hints on reading the Lowe paper.
Leo’s online lectures (see the Modules on Canvas) help you with the first pass. He schedules reading
breaks that you really should use! You’ll have to read the paper sooner or later anyway,
and using the reading breaks (instead of jumping forward to the hints he gives) will help you find
out where the issues are for you, to return to in the second pass.
We have phrased small subproblems as part of the reading guide. Work on those to increase
your understanding; some of them will come up as theory questions in part 4.2 of the Assignment,
where you can put in your answers. (Some others may turn up in the final exam...) The remainder
of that assignment 4.2 is a programming exercise, where you will employ OpenCV functionality
to use the SIFT for stitching images together into a smooth mosaic.
Yes, it is a different assignment than usual; you really should do most of your understanding
before starting the programming. That is why we will hold off putting the programming part
online. We try to be helpful, in a patronizing way.
1.2 Pass 1
Then, read the paper through once, trying to understand it, but don’t spend too much effort on
things you don’t get immediately. Often these will become clear later on. There may be various
reasons for this:
• The paper may be badly written, using things before they are properly introduced, or using
inconvenient symbols or concepts. Also, the main line and the side tracks are not always
clearly separated.
• The paper may use strange abstractions for very concrete things, without you (or the author)
actually needing them at the abstract level. This is sometimes done to impress.
• There may be excursions that are interesting to specialists, but not required for understand-
ing the paper at the level you need it.
• The author may assume common knowledge you don’t have (yet).
• The paper may be the wrong paper to read for what you want to know; but when you
see the author refer to other material in context you may be able to home in on the right
information anyway.
The first pass of reading will help you understand the structure and identify the various difficulties
you will have in understanding the paper, sometimes even already resolving them.
1.3 Pass 2
When you have decided that this is really the paper you want to read, and where the interesting
bits are for you, you read it again. In this second pass, you should really make an effort to
understand everything relevant (you will have identified the relevant parts in pass one). Follow
the reasoning in detail, and fill in the blanks using the references and some background material (including
an internet search for mathematical terms you may not know). This is not a linear pass: you have
to deconstruct the paper! This pass should be very rewarding, since you are increasing your knowledge
considerably in doing this. (Always scout the references for useful literature you may not yet
know!) Make notes in the margin, or attach your notes to the paper. You don’t want this effort
to get lost, so that you might have to do it again... At the end of this pass, you should know the
relevant things well enough to be able to explain them to others.
1.4 Pass 3
After you have read and understood it and have thought about it for a few days or weeks, go
back a third time. You will have the satisfaction of seeing it all fall into place, and often some of
the obscure or more abstract bits are now suddenly clear, deepening your understanding of the
subject and the field. Summarize the salient points in 5 or so bullets on the title page, so that
when you see the paper again you will remember what you liked about it (or didn’t).
In brief, personalize the paper so that it becomes part of your ever-expanding knowledge.
Abstract
• The abstract should be clear enough - can you already relate this to your mosaicing problem?
1. Introduction
• An introduction should contain motivation and an overview of the method. And it does!
Understand what each paragraph is saying about method or use.
2. Related research
• Research should progress, so it is important that the author shows the reader what was done
before with more or less the same goals, and how (according to the author) this did not
resolve the problem fully; or how earlier work contained some good ideas that the author
will use. If you are in the field, you will recognize the references; if not, the problem the
author actually addresses should become clearer as he uses the difference with the references
to define his approach. You do not need to look up all these references; but if you are in
the field, you may find some unexpected things you did not know about, and which may be
relevant to you.
• Highlight a few (one or two) terms per paragraph as a means to make yourself aware of its
purpose.
• If you did not look it up before, $\nabla^2$ is the Laplacian; in 2D images it can be computed as
the trace of the Hessian, i.e., $\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}$.
• Your first pass of reading should tell you that the order of this section is messy. He wants
to use his own method (extrema of difference of Gaussians) because it is efficient. Oh, and
actually it is a good approximation of the Laplacian (look up what that is!). That is of course
the really fundamental relationship to local structure. After having given (1), he derives it
(he could/should have done that first), but does not quite return to ‘his’ D. Close the gap
yourself!
• What is an octave? Why does he ‘need s + 3 images in the stack of blurred images for each
octave’?
• Equation (2) is familiar, equation (3) you should be able to derive (in some years, it is even
part of an exam!), and the consequence (4) you should derive for yourself.
• He needs to do all this on the Laplacian or D image. And then, on page 11, he is doing
the Hessian and derivative of D by local differences. Is that consistent? Do you realize that
these are third and fourth derivatives of scale space image data?!
• Make sure that you understand the equation on page 11: compared to (2) on page 10, the
1/2 looks like a typo. Is it? (You may want to consult the text gradient.pdf in the Canvas
Module.)
• Count the number of floating point operations - do you get about 20? If not, what does he
also include in his calculation?
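To experiment with the localization step discussed in the bullets above, here is a minimal numerical sketch. It assumes a 3x3x3 neighbourhood of D values indexed as D[scale, y, x], estimates the gradient and Hessian by local differences, and solves for the offset of the extremum and the interpolated value there. The function name refine_extremum and the planted test function are made up for illustration; this is not Lowe's code, and it is no substitute for deriving (3) and (4) yourself.

```python
import numpy as np

def refine_extremum(D):
    """Quadratic fit around the centre sample of a 3x3x3 block D[scale, y, x].

    Returns the offset (dx, dy, dscale) of the fitted extremum relative to
    the centre sample, and the interpolated value at that extremum.
    """
    c = D[1, 1, 1]
    # Gradient by central differences, ordered (x, y, scale).
    g = 0.5 * np.array([
        D[1, 1, 2] - D[1, 1, 0],
        D[1, 2, 1] - D[1, 0, 1],
        D[2, 1, 1] - D[0, 1, 1],
    ])
    # Hessian by second differences of neighbouring samples.
    dxx = D[1, 1, 2] - 2 * c + D[1, 1, 0]
    dyy = D[1, 2, 1] - 2 * c + D[1, 0, 1]
    dss = D[2, 1, 1] - 2 * c + D[0, 1, 1]
    dxy = 0.25 * (D[1, 2, 2] - D[1, 2, 0] - D[1, 0, 2] + D[1, 0, 0])
    dxs = 0.25 * (D[2, 1, 2] - D[2, 1, 0] - D[0, 1, 2] + D[0, 1, 0])
    dys = 0.25 * (D[2, 2, 1] - D[2, 0, 1] - D[0, 2, 1] + D[0, 0, 1])
    H = np.array([[dxx, dxy, dxs],
                  [dxy, dyy, dys],
                  [dxs, dys, dss]])
    # Setting the derivative of the quadratic model to zero gives H @ offset = -g.
    offset = -np.linalg.solve(H, g)
    # Value of the quadratic model at the extremum.
    value = c + 0.5 * g @ offset
    return offset, value

# Hypothetical test data: an exactly quadratic function with its maximum
# planted at (x, y, scale) = (0.2, -0.1, 0.3) and peak value 5.
t = np.array([-1.0, 0.0, 1.0])
s, y, x = np.meshgrid(t, t, t, indexing="ij")
u, v, w = x - 0.2, y + 0.1, s - 0.3
D = 5.0 - u**2 - 2.0 * v**2 - 0.5 * w**2 + 0.3 * u * v

offset, value = refine_extremum(D)
print("offset (x, y, scale):", offset)
print("interpolated value:", value)
```

Because the test function is exactly quadratic, the local differences are exact and the planted offset and peak value are recovered up to rounding. On real D images the quadratic model is only an approximation, which is one reason to think about how reliable these difference estimates are.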
5. Orientation assignment
• The features are not going to be orientation invariant, but the method will be. Resolve that
paradox in your mind!
• In the formula, he does not use the atan2 function. I think he should (and probably does).
Why? (The lecture also touches upon this point.)
• He computes the gradients - where in his total set of computations is that done, do you
think - everywhere or only at the peaks? And is this still the gradient of D (as above) or of
something else?
• ‘Finally a parabola is fit’, make sure you understand to what. This is in fact a 1D version
of the localization of extrema we saw before.
• Again he puts in some experiments for this step. We can feel him developing and testing his
modules step by step. Educational it is.
• What does he mean by ‘affine changes in illumination’? Gather this from the surrounding
text!
• That bit about 0.2: yuck, how ad hoc!
• Look up RANSAC on Wikipedia and contemplate why it would fulfill our needs here.
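To get a feel for RANSAC before looking it up in detail, here is a minimal sketch on synthetic data. It deliberately swaps in the simplest possible model, a pure 2D translation between matched points, rather than the transformation you would need for a real mosaic. The function name ransac_translation, its parameters (n_iters, tol, seed) and the synthetic matches are assumptions made up for this illustration.

```python
import numpy as np

def ransac_translation(src, dst, n_iters=200, tol=1.0, seed=0):
    """Estimate a 2D translation dst ~ src + t from matches that contain outliers."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        # Minimal sample: a single match already fixes a translation.
        i = rng.integers(len(src))
        t = dst[i] - src[i]
        # Count the matches that agree with this hypothesis.
        residuals = np.linalg.norm(src + t - dst, axis=1)
        inliers = residuals < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on the largest consensus set (least squares = mean displacement).
    t_best = (dst[best_inliers] - src[best_inliers]).mean(axis=0)
    return t_best, best_inliers

# Synthetic matches: 50 points, of which the first 15 are wrong matches.
rng = np.random.default_rng(1)
src = rng.uniform(0.0, 100.0, size=(50, 2))
true_t = np.array([12.0, -7.0])
dst = src + true_t + rng.normal(0.0, 0.1, size=src.shape)
dst[:15] = rng.uniform(0.0, 100.0, size=(15, 2))

t_est, inliers = ransac_translation(src, dst)
print("estimated translation:", t_est)
print("number of inliers:", int(inliers.sum()))
```

A plain least-squares fit over all 50 matches would be dragged away by the 15 wrong ones. The consensus step ignores them, which is the property to think about when some SIFT matches between two images are simply incorrect.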
8 Recognition examples
9 Conclusions
• You read these in your first pass. Do you agree with his conclusions? Are there some that
are missing?