Relating Egomotion and Image Evolution

Barak A. Pearlmutter
Leonid Gurvits

Present Address: NEC Research Institute, 4 Independence Way, Princeton, NJ 08540
March 1995
Report Number SCR-95-TR-536
Abstract

By considering the dynamics of the apparent motion of a stationary object relative to a moving observer, we construct partial differential equations that relate the changes in an image to the motion of the observer. These come in two varieties: a first-order system that describes the coevolution of the egocentric radial distances to objects and the visual scene, and a second-order system that does not involve any distances or other geometry. The latter leads, via the calculus of variations, to a novel technique for recovering egomotion from image sequences, a so-called visual yaw detector, which is tested on real data. For expository purposes the derivation is carried out in two dimensions, but the approach extends immediately to three.
1. Introduction

Using a special camera mounted on the roof of a motorcar, which captures a narrow 360-degree strip along the horizon, we are interested in recovering the angular and forward velocity of the vehicle. We sometimes know the forward velocity from other sensory modalities, such as a sensor attached to the speedometer or odometer. In that case, we need only recover the angular velocity. To this end, we have developed a theory of the evolution of such images. Because of the particular domain of interest, we are primarily interested in the two-dimensional world described above. However, the techniques we develop extend immediately to three dimensions.

For a full description of the visual sensor based navigation system, see Gorr et al. (1995), Novak et al. (1995), Novak and Hancock (1995), Lin and Judd (1995), or the pending patents (Siemens Docket Nos. 93E7601, 94E7541, 94E7617, and 94E7618 US).
2. Point movement
[Figure: geometry of a stationary point relative to the moving observer; the diagram labels the quantities u, y, θ, τ, and v.]
If the fixed point is x ahead of the observer and y to the side, then by definition tan τ = y/x. Let v and u be the observer's forward and lateral velocities and ω its angular velocity. By the linearity of differentiation dx/dt = ωy − v and dy/dt = −ωx − u. Taking the derivative with respect to time of y cos τ = x sin τ gives

v \sin\tau - u \cos\tau = (\dot\tau + \omega)(x \cos\tau + y \sin\tau)    (1)

and taking the derivative with respect to time again gets rid of x and y to yield

\ddot\tau + \dot\omega = (\dot\tau + \omega)\,\frac{\dot v \sin\tau - \dot u \cos\tau + (2\dot\tau + \omega)(v \cos\tau + u \sin\tau)}{v \sin\tau - u \cos\tau}    (2)

In the special case of no lateral motion, u = 0, this simplifies to

\ddot\tau + \dot\omega = (\dot\tau + \omega)\left(\frac{\dot v}{v} + (2\dot\tau + \omega)\cot\tau\right)    (3)
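As a quick numerical sanity check of these relations (my own sketch, not part of the report; it assumes NumPy and an arbitrary made-up trajectory and world point), one can simulate an observer moving past a stationary point and confirm that finite-difference derivatives of τ satisfy equation 1:

import numpy as np

# A short simulated trajectory: the observer translates and turns while a
# single world point stays fixed.  All numbers here are made up for the check.
dt = 1e-4
t = np.arange(0.0, 2.0, dt)
v = 10.0 + 2.0 * np.sin(t)            # forward speed (m/s)
u = np.zeros_like(t)                  # no lateral motion
omega = 0.3 * np.cos(0.5 * t)         # yaw rate (rad/s)

heading = np.cumsum(omega) * dt                  # observer heading, world frame
px = np.cumsum(v * np.cos(heading)) * dt         # observer position, world frame
py = np.cumsum(v * np.sin(heading)) * dt
P = np.array([30.0, 12.0])                       # the fixed world point

# Point coordinates in the observer frame: x ahead, y to the side.
dx_w, dy_w = P[0] - px, P[1] - py
x = np.cos(heading) * dx_w + np.sin(heading) * dy_w
y = -np.sin(heading) * dx_w + np.cos(heading) * dy_w
tau = np.arctan2(y, x)
tau_dot = np.gradient(tau, dt)

# Equation 1: v sin(tau) - u cos(tau) = (tau_dot + omega)(x cos(tau) + y sin(tau))
lhs = v * np.sin(tau) - u * np.cos(tau)
rhs = (tau_dot + omega) * (x * np.cos(tau) + y * np.sin(tau))
print("max |lhs - rhs| =", np.max(np.abs(lhs - rhs)[5:-5]))   # ~0 up to discretization error

The residual shrinks with the step size, as one expects of a finite-difference check.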
3. Isointensity contours
We would like to characterize the contour lines of a function f(t, τ), so that we can
check if they satisfy some given differential equation.
[Figure: a contour line τ = τ(t) of the function f(t, τ), passing through a point (t′, τ′); the horizontal axis is t.]
Consider an arbitrary point (t₀, τ₀). For notational convenience, let τ(t) be the contour line that runs through (t₀, τ₀), so for instance τ₀ = τ(t₀). Since τ(t) is a contour line, by definition f(t, τ(t)) is constant. Using subscripts to denote partial derivatives, (d/dt) f(t, τ(t)) = f_t + f_τ τ̇(t) = 0, so

\dot\tau = -f_t / f_\tau    (5)

is the slope of the contour line through an arbitrary point (t, τ). Similarly,

\frac{d^2}{dt^2} f(t, \tau(t)) = f_{tt} + \dot\tau f_{t\tau} + f_\tau \ddot\tau + (f_{t\tau} + f_{\tau\tau}\,\dot\tau)\,\dot\tau = 0

so

\ddot\tau = -f_{tt}/f_\tau + 2 f_t f_{t\tau}/f_\tau^2 - f_t^2 f_{\tau\tau}/f_\tau^3    (6)

gives the second derivative of the contour line.
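To make equations 5 and 6 concrete, here is a minimal sketch (my illustration, not from the report; it assumes NumPy and a synthetic image f(t, τ) on a regular grid) that estimates the contour slope τ̇ and curvature τ̈ at every grid point from finite-difference partial derivatives:

import numpy as np

# Synthetic image: a pattern that drifts in angle over time, f(t, tau).
nt, ntau = 400, 360
t = np.linspace(0.0, 4.0, nt)
tau = np.linspace(0.0, 2 * np.pi, ntau, endpoint=False)
T, TAU = np.meshgrid(t, tau, indexing="ij")
f = np.sin(3 * (TAU - 0.2 * T)) + 0.5 * np.cos(7 * (TAU - 0.2 * T))

dt = t[1] - t[0]
dtau = tau[1] - tau[0]

# First and second partial derivatives by central differences.
f_t = np.gradient(f, dt, axis=0)
f_tau = np.gradient(f, dtau, axis=1)
f_tt = np.gradient(f_t, dt, axis=0)
f_ttau = np.gradient(f_t, dtau, axis=1)
f_tautau = np.gradient(f_tau, dtau, axis=1)

# Equation 5: slope of the iso-intensity contour through each pixel.
tau_dot = -f_t / f_tau

# Equation 6: curvature of the contour through each pixel.
tau_ddot = (-f_tt / f_tau
            + 2 * f_t * f_ttau / f_tau**2
            - f_t**2 * f_tautau / f_tau**3)

# Every contour of this synthetic image drifts at 0.2 rad per unit time,
# so tau_dot should be ~0.2 and tau_ddot ~0 wherever f_tau is not tiny.
mask = np.abs(f_tau) > 0.5
print(np.median(tau_dot[mask]), np.median(np.abs(tau_ddot[mask])))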
3.1. Image evolution

Let f(t, τ) be the image intensity measured at time t and angle τ. Assuming Lambertian reflection and no occlusion, the contour lines of f are the angular paths taken by particular visible points in the world. Hence the contour line going through each point (t, τ) must satisfy equation 3. We therefore substitute in the first two derivatives of the contour line. This results in a condition on the partial derivatives of f,

\dot\omega f_\tau^3 - (f_{tt} f_\tau^2 - 2 f_t f_{t\tau} f_\tau + f_t^2 f_{\tau\tau}) = (\omega f_\tau - f_t)\, f_\tau \left( f_\tau \frac{\dot v}{v} + (\omega f_\tau - 2 f_t)\cot\tau \right)    (7)

This equation will be obeyed wherever the assumptions made above are valid.
4. Distances
4.1. Distance from local gradients

The distance R to the observed object in direction τ at time t has a surprisingly simple form. Substituting R = x cos τ + y sin τ into equation 1 and solving for R gives

R = \frac{v \sin\tau - u \cos\tau}{\omega + \dot\tau}    (8)

which can be put in terms of image intensity,

R = f_\tau \, \frac{v \sin\tau - u \cos\tau}{\omega f_\tau - f_t}    (9)

For the image intensity along the path of a visible point, g(t) = f(t, τ(t)), the lighting and reflectance assumptions made above give ġ = 0, so f_t = −τ̇ f_τ. Using equation 8 for τ̇,

f_t = \left( \frac{u \cos\tau - v \sin\tau}{R} + \omega \right) f_\tau    (10)

Regarding the distance as a field R(t, τ) over the image, the distance to a visible point changes at the closing speed −(v cos τ + u sin τ), so the same argument gives

R_t = \left( \frac{u \cos\tau - v \sin\tau}{R} + \omega \right) R_\tau - (v \cos\tau + u \sin\tau)

Substituting r = 1/R into this last pair of equations, with R_t = −r_t/r² and R_τ = −r_τ/r², yields

f_t = ((u \cos\tau - v \sin\tau)\, r + \omega)\, f_\tau    (11)

r_t = ((u \cos\tau - v \sin\tau)\, r + \omega)\, r_\tau + (v \cos\tau + u \sin\tau)\, r^2    (12)
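Equation 9 is directly usable when the egomotion is known. The following minimal sketch (my illustration, not the report's code; it assumes NumPy, a strip sequence f indexed as f[frame, angle_bin], and known v, u, and ω per frame) estimates the distance map from local image gradients:

import numpy as np

def distance_from_gradients(f, tau, dt, v, u, omega, grad_floor=1e-3):
    """Estimate R(t, tau) via equation 9 from a strip sequence.

    f      : array (n_frames, n_bins), image intensity f(t, tau)
    tau    : array (n_bins,), the angle of each bin (radians)
    dt     : time between frames
    v, u   : forward and lateral speed per frame, arrays (n_frames,)
    omega  : yaw rate per frame, array (n_frames,)
    """
    dtau = tau[1] - tau[0]
    f_t = np.gradient(f, dt, axis=0)      # temporal image gradient
    f_tau = np.gradient(f, dtau, axis=1)  # angular image gradient

    num = f_tau * (v[:, None] * np.sin(tau) - u[:, None] * np.cos(tau))
    den = omega[:, None] * f_tau - f_t

    with np.errstate(divide="ignore", invalid="ignore"):
        R = num / den
    R[np.abs(den) < grad_floor] = np.nan  # tiny gradients give unreliable estimates
    return R

Negative or wildly large values flag places where the assumptions (Lambertian surfaces, no occlusion, non-vanishing gradients) break down.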
5. Visual yaw detector
5.1. Moving observer

Knowing f(t, τ), v = const, and u = 0, we would like to estimate ω. We might expect the measured f to be rather noisy. Substituting the known values into equation 7 and considering some fixed particular value of τ gives a Riccati equation,

e(\omega, \dot\omega, t; \tau) = A(t, \tau)\,\dot\omega + B(t, \tau)\,\omega^2 + C(t, \tau)\,\omega + D(t, \tau) = 0    (13)

where A, B, C, and D are combinations of the partial derivatives of f, all easy to compute from the image data. With noisy measurements equation 13 will not hold exactly, so we instead seek the ω(t) that minimizes the total violation E = ∫ F(ω, ω̇, t) dt, where¹

F(\omega, \dot\omega, t) = I\{ L(e(\omega, \dot\omega, t; \tau)) \}    (14)

L(ξ) = ξ² is a loss function² and we define I{h(τ)} = ∫₀^{2π} h(τ) dτ. Note that I{·} is linear and commutes with d/dt. Also note that I{(∂/∂τ) h(τ)} = 0 for periodic h, so for instance I{f_τ} = 0 and I{f f_τ} = I{(∂/∂τ) f²/2} = 0. The extremal ω(t) can be found using the calculus of variations. E is of the form required by Euler's equation, which states that the functional E is extremal when ω satisfies

\frac{\partial F}{\partial \omega} = \frac{d}{dt}\,\frac{\partial F}{\partial \dot\omega}
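For readability, the two sides of Euler's equation evaluate to (an intermediate step expanded here, using L(ξ) = ξ² and e = Aω̇ + Bω² + Cω + D; the report itself does not display it):

\frac{\partial F}{\partial \omega} = I\{2e\,(2B\omega + C)\}, \qquad
\frac{d}{dt}\frac{\partial F}{\partial \dot\omega} = I\{2\dot e A + 2 e \dot A\},
\qquad\text{where}\quad
\dot e = A\ddot\omega + \dot A\dot\omega + 2B\omega\dot\omega + \dot B\omega^2 + C\dot\omega + \dot C\omega + \dot D .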
Evaluating and simplifying this gives a second-order ordinary differential equation whose coefficients are easy to calculate numerically from visual data,

I\{A^2\}\,\frac{d^2\omega}{dt^2}
+ I\{2\dot A A\}\,\frac{d\omega}{dt}
- I\{2B^2\}\,\omega^3
+ I\{\dot A B + A\dot B - 3BC\}\,\omega^2
+ I\{\dot A C + A\dot C - 2BD - C^2\}\,\omega
+ I\{\dot A D + A\dot D - CD\} = 0    (15)

where Ẋ = (d/dt)X.
¹ It would be reasonable to add a regularizer R(ω) to F, to encourage ω to be smooth.
² In the presence of occlusion and other local violations of equation 7 we would want to use a more robust estimator than simple squared error. Unfortunately this tends to make minimizing E more difficult. An ordinary differential equation still results if L is a polynomial, but higher degrees in L would make for higher powers of ω and ω̇ in equation 15.
[Figure 1 here: estimated angular velocity (deg/m ×3), ranging over roughly ±1, plotted against distance (km) from 0 to 3.]
Figure 1 Equation 17 applied to a database of visual strips taken from around the horizon during a 3 km trek on a rural route, sampled every meter. Estimated ω is shown as a function of distance. No smoothing has been done at any point in the processing, except that the strips were downsampled to 120 bins around the horizon.
[Figure 2 here: estimated gain change (dB/m), ranging over roughly ±2, plotted against distance (km) from 0 to 3.]
Figure 2 Equation 19 applied to a database of visual strips taken from around the horizon during a 3 km trek on
a rural route, sampled every meter. Estimated log gain change is shown as a function of distance. No
smoothing has been done at any point in the processing, except that the strips were downsampled to 120
bins around the horizon. Note the spikes, which correspond to glitches in the image acquisition process.
An online ODE solver, in which only the most recent few values of ω are subject to change, can be used in settings that require running estimates of ω.
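As an illustration of how equation 15 might be used (a sketch of my own, not the report's implementation: it assumes NumPy/SciPy, that the Riccati coefficients A, B, C, D of equation 13 have already been sampled on a (t, τ) grid, and hypothetical initial values for ω and ω̇), one can integrate the variational ODE as a first-order system:

import numpy as np
from scipy.integrate import solve_ivp

def integrate_equation_15(A, B, C, D, t, omega0, omega_dot0):
    """Integrate equation 15 given sampled Riccati coefficients.

    A, B, C, D : arrays (n_t, n_tau), the coefficients of equation 13
    t          : array (n_t,), sample times
    omega0, omega_dot0 : assumed initial conditions for omega and its derivative
    """
    dt = t[1] - t[0]
    dtau = 2 * np.pi / A.shape[1]

    def I(x):                      # I{.}: integrate over the angle tau
        return np.sum(x, axis=1) * dtau

    Ad, Bd, Cd, Dd = (np.gradient(X, dt, axis=0) for X in (A, B, C, D))

    # Time-varying coefficients of equation 15, interpolated between samples.
    def interp(c):
        return lambda s: np.interp(s, t, c)

    c2 = interp(I(A**2))
    c1 = interp(I(2 * Ad * A))
    k3 = interp(-I(2 * B**2))
    k2 = interp(I(Ad * B + A * Bd - 3 * B * C))
    k1 = interp(I(Ad * C + A * Cd - 2 * B * D - C**2))
    k0 = interp(I(Ad * D + A * Dd - C * D))

    def rhs(s, y):                 # y = (omega, omega_dot)
        w, wd = y
        wdd = -(c1(s) * wd + k3(s) * w**3 + k2(s) * w**2 + k1(s) * w + k0(s)) / c2(s)
        return [wd, wdd]

    sol = solve_ivp(rhs, (t[0], t[-1]), [omega0, omega_dot0], t_eval=t, max_step=dt)
    return sol.y[0]                # the estimated omega(t)

An online variant would re-integrate only over a trailing window of recent samples, as suggested above.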
5.2. Stationary rotating observer

Equation 15 inherits an implicit assumption that v ≠ 0, i.e. that the observer has non-negligible proper motion, from equation 7. Under some circumstances we might wish to make the opposite assumption. The analog of equation 7 when u = v = 0 is simply

f_t = \omega f_\tau    (16)

Applying the calculus of variations as above with F = I{(ω f_τ − f_t)²} gives³ the trivial differential equation

\omega = \frac{I\{f_t f_\tau\}}{I\{f_\tau^2\}} = \frac{\int_0^{2\pi} f_t f_\tau \, d\tau}{\int_0^{2\pi} f_\tau^2 \, d\tau}    (17)
³ Many different choices of the quantity to be minimized would, in the absence of noise, yield the same ω. But with noise, the results will be different. How can we choose the best one? Assuming the noise in the measured f is independent zero-mean Gaussian with constant variance, we should choose a quantity to integrate over both t and τ whose variance is the same everywhere. Within that constraint, we should choose the one with the highest signal-to-noise ratio, in other words, the lowest ratio of standard deviation to mean. This is the F used in the text. A similar analysis must be applied to equation 14.
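Equation 17 reduces to a one-line computation per frame. A minimal sketch (mine, not the report's code; it assumes NumPy and a strip sequence f[frame, angle_bin] sampled uniformly around the horizon, as in Figure 1):

import numpy as np

def yaw_from_strips(f, dt):
    """Per-frame yaw-rate estimate via equation 17.

    f  : array (n_frames, n_bins), intensity strips around the horizon
    dt : time (or distance) between successive strips
    """
    n_bins = f.shape[1]
    dtau = 2 * np.pi / n_bins
    f_t = np.gradient(f, dt, axis=0)       # temporal gradient
    f_tau = np.gradient(f, dtau, axis=1)   # angular gradient (the strip is periodic,
                                           # so wrap-around differences would be slightly better)
    num = np.sum(f_t * f_tau, axis=1)      # ~ I{ f_t f_tau }; the dtau factors cancel in the ratio
    den = np.sum(f_tau ** 2, axis=1)       # ~ I{ f_tau^2 }
    return num / den                       # omega, one estimate per frame

Each strip yields one number; plotting these against distance gives the kind of trace shown in Figure 1.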
If the camera adds a constant extra signal to the entire visual field, as in a fluctuating background noise process, we can call this background noise n₂(t) and model it as f_t = ω f_τ + n₂. A more common type of noise is present in cameras with automatic gain control, which fluctuates due to factors outside the portion of the visual field being used here. Under these conditions, the change to a pixel's value attributable to camera gain change is proportional to that pixel's value. If the derivative of the log gain is n₁(t) then the process can be modeled as f_t = ω f_τ + n₁ f. Combining these two noise processes yields

f_t = \omega f_\tau + n_1 f + n_2    (18)

Using F = I{(ω f_τ − f_t + n₁ f + n₂)²} gives
\begin{pmatrix} \omega \\ n_1 \\ n_2 \end{pmatrix}
= \left[ I\left\{ \begin{pmatrix} f_\tau \\ f \\ 1 \end{pmatrix}\begin{pmatrix} f_\tau \\ f \\ 1 \end{pmatrix}^{\!T} \right\} \right]^{-1}
I\left\{ f_t \begin{pmatrix} f_\tau \\ f \\ 1 \end{pmatrix} \right\}    (19)

= \begin{pmatrix} I\{f_\tau^2\} & I\{f f_\tau\} & I\{f_\tau\} \\ I\{f f_\tau\} & I\{f^2\} & I\{f\} \\ I\{f_\tau\} & I\{f\} & I\{1\} \end{pmatrix}^{-1}
I\left\{ f_t \begin{pmatrix} f_\tau \\ f \\ 1 \end{pmatrix} \right\}
which shows that the influences of these noise sources and the influence of rotation are orthogonal, and therefore the recovered ω should not be sensitive to these two particular kinds of noise. However, the recovered gain changes n₁(t) appear to detect camera frame acquisition errors quite well, as shown in figure 2.
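A sketch of the corresponding estimator (again my own illustration, under the same assumptions as the previous sketch, using NumPy's least-squares solver rather than an explicit matrix inverse):

import numpy as np

def yaw_gain_offset(f, dt):
    """Per-frame estimates of (omega, n1, n2) via equation 19."""
    n_frames, n_bins = f.shape
    dtau = 2 * np.pi / n_bins
    f_t = np.gradient(f, dt, axis=0)
    f_tau = np.gradient(f, dtau, axis=1)

    out = np.empty((n_frames, 3))
    for k in range(n_frames):
        # Columns of the design matrix: f_tau, f, and 1; target: f_t.
        X = np.stack([f_tau[k], f[k], np.ones(n_bins)], axis=1)
        out[k], *_ = np.linalg.lstsq(X, f_t[k], rcond=None)
    return out   # columns: omega, n1 (log-gain rate), n2 (offset rate)

Solving the stacked regression with lstsq satisfies the same normal equations as equation 19 but is numerically better conditioned than forming and inverting the 3×3 matrix explicitly.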
6. Discussion
The above presentation is for an observer in flatland. This is the character of the
application for which these techniques were developed, but the approach and equa-
tions extend to a full three dimensional setting. The only complication is the tedious
algebra which arises from the coordinate systems that must be used on the surface
of a sphere.
6.1. Scale

In many of the equations above, the velocity appears as v̇/v = (d/dt) log v. It is clear from the physics of the situation that this must be so, for even if everything but the distances is known, the velocity can be determined uniquely only up to a scale factor. In other words, if v(t) is a solution to the equation, then so is av(t) for any a > 0, because (d/dt) log(av) = (d/dt) log v. This is of course common in computer vision.
6.2. Other approaches

There are many approaches to recovering egomotion or distances from image sequences.
We have taken a gradient-based approach, similar to that of Horn and Weldon (1988),
Negahdaripour and Horn (1987), in contrast to the alternative correlation based ap-
proaches (Hassenstein and Reichardt, 1956; Poggio and Reichardt, 1973). Gradi-
ents are more susceptible to noise than correlations, and present difficulties when the
spatial and temporal discretization imposed by modern digital computer technology
makes for image movements greater than a few pixels per time step (Hildreth and
Koch, 1987). This latter problem can be ameliorated by spatially downsampling the
image, or, since often the ideal sampling rate varies across a single image, by adaptive
multiscale techniques. On the other hand, there are theoretical reasons to believe
that gradient methods perform better than correlation methods in the high signal to
noise regime (Potters and Bialek, 1994).
The approach taken here is a direct one, since image gradients are used directly, rather
than being used to calculate the optical flow (Horn and Schunck, 1981; Poggio, Yang,
and Torre, 1989; Verri and Poggio, 1989) which in turn constitutes the input to an
egomotion module (Uras et al., 1989).
6.3. Noise model

The equations shown in the text are optimized for a small amount of additive Gaussian noise, and fall into the MLE framework, since they are derived from a least-squares formulation and incorporate no regularizer or prior on the egomotion. Unfortunately that is not the sort of noise actually encountered in practice. Instead, there are a few sources of noise, of which, at least for our applications, Gaussian camera noise is the least significant. More severe are lack of stationarity of the world, such as other moving vehicles; occlusion and revelation; specularities and reflections; and camera blooming from the sun and reflections thereof.
Also, an additive Gaussian noise model is a poor model of the world. In essence, it
assumes that an object’s intensity and distance follow a random walk. Instead, it is
our observations which are corruptions of an underlying unchanging stationary object.
Some of these problems could be dealt with in part by complicating the model, for
instance including occlusion processes which flow around the visual field according
to the same equation by which objects do. These could be created by the broken
spring models so popular in computer vision, which of course correspond to robust
estimators, which themselves might improve performance (Poggio, Torre, and Koch,
1985; Koch, 1988).
Another route for improvement would be to attempt to maintain a world model via an
occupancy grid (Moravec, 1988). This would have the added benefit that it could be
matched against a database, hopefully making the system more robust to seasonal
variation in the visual appearance of objects. Unfortunately the computational burden
might exceed the capacities of our target platform.
6.4. Color

Lastly, the current system does not make use of color information. Since color can be more stable with respect to viewing angle than intensity, it would make sense to incorporate color in a non-trivial fashion into a system capable of detecting long-range motion. However, here we use a gradient method, which will operate properly only for small changes to the visual field, on the order of a pixel or less. It is interesting to note that in primates short-range motion makes little use of color, while long-range motion is quite sensitive to color boundaries (Ramachandran and Gregory, 1978).
References
Tomasi, C. and Kanade, T. (1992). Shape and motion from image streams under orthography. International Journal of Computer Vision, 9(2), 137–154.
Uras, S., Girosi, F., Verri, A., and Torre, V. (1989). A computational approach to motion
perception. Biological Cybernetics, 60, 79–87.
Verri, A. and Poggio, T. (1989). Motion field and optical flow: Qualitative properties. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11, 490–498.