Recovering 3D Structure From Images and Inertial Sensors
Jorge Lobo
Institute of Systems and Robotics,
Electrical Engineering Department,
University of Coimbra,
3030 Coimbra, Portugal,
Tel: +351-39-796303
Fax: +351-39-406672
[email protected]

Jorge Dias
Institute of Systems and Robotics,
Electrical Engineering Department,
University of Coimbra,
3030 Coimbra, Portugal,
Tel: +351-39-796219
Fax: +351-39-406672
[email protected]
ABSTRACT
Advanced sensor systems, exploiting high-integrity sensing and multiple sensorial modalities, have been significantly increasing the capabilities of autonomous vehicles and enlarging the application potential of vision systems. This article describes the cooperation between two relevant sensors: image and inertial sensors. Inertial sensors coupled to an active vision system can provide valuable information, as happens with the vestibular system in humans and other animals. Visual and inertial sensing are two sensory modalities that can be exploited to give robust solutions for image segmentation and for the recovery of three-dimensional structure from images.
This article presents our recent results on the use and integration of these two modalities. In a first example we use the inertial information to infer one of the intrinsic parameters of the visual sensor, the focal length, after defining the horizontal plane (horizon). The second example relies on the segmentation of images by labelling the vertical structures of the image scene. In a third example we use the integration of inertial and visual information to detect the regions of the scene over which a vehicle can be driven: in our case, the levelled ground plane.
1 INTRODUCTION
Advanced sensor systems, exploiting high-integrity sensing and multiple sensorial modalities, have been significantly increasing the capabilities of autonomous vehicles and enlarging the potential applications of vision systems. This article describes the cooperation between two relevant sensors: image and inertial sensors. Visual and inertial sensing are two sensory modalities
that can be exploited to give robust solutions for image segmentation and for the recovery of three-dimensional structure from images. The cooperation between these two sensory modalities may be useful for the elaboration of high-level representations, such as multi-modality three-dimensional maps, and for the detection of specific three-dimensional structures in the images, such as levelled ground or vertical structures.
Inertial sensors are a class of sensors useful for internal sensing; they do not depend on external references. In humans and other animals the vestibular system of the inner ear gives inertial information essential for navigation, orientation and equilibrium of the body. In humans this sensorial system is crucial for several visual tasks and for head stabilisation. It is a sensorial modality that co-operates with other sensorial systems and gives essential information for everyday tasks. One example is the co-operation between the vestibular sensorial system and the visual system. It is well known that, in humans, the information provided by the vestibular system is used during the execution of navigational and visual movements, as described by Carpenter [6]. The inertial information is also important for head-stabilisation behaviours, including the control of body posture and equilibrium.
The inertial information can also be useful in applications with autonomous systems and artificial vision. In the case of active vision systems, the inertial information provides a second sensing modality that gives useful information for image stabilisation, control of pursuit movements, or ego-motion determination when the active vision system is used on a mobile platform. This kind of sensorial information is also crucial for tasks with artificial autonomous systems where the notion of horizontal or vertical is important - see [7] for example.
A vision system with inertial sensors obtains a partial estimate of self-motion or absolute orientation from the inertial information. Gravity, rigid-body acceleration and instantaneous angular velocity can be measured by these sensors; the instantaneous velocity, the angular position and the linear translation of the vision system can also be obtained. Combining these quantities with visual information can be useful to estimate the instantaneous visual motion, to segment images (for example, separating moving objects from the background), to estimate the orientation of the vision system with respect to the horizontal plane, or to recover depth maps based on the estimated self-motion.
This article presents our recent results on the use and integration of these two modalities. In a first example we use the inertial information to infer one of the intrinsic parameters of the visual sensor, the focal length, after defining the horizontal plane (horizon). The second example relies on the segmentation of images by labelling the vertical structures of the image scene. In a third example we use the integration of inertial and visual information to detect the regions of the scene over which a vehicle can be driven: in our case the levelled ground plane. The solution is based on information about the scene obtained during a process of visual fixation, complemented by the information provided by the inertial sensors.
2 NOTATION AND BACKGROUND
A projection point \vec{p} = (u, v) in each camera image is related to a 3D point \vec{P} = (X, Y, Z) by the perspective relations

    u = f\,\frac{X}{Z}, \qquad v = f\,\frac{Y}{Z}    (1)

where u and v are the pixel co-ordinates with origin at the image centre, f is the focal length, and \vec{P} is expressed in the camera referential.
Suppose an image unit sphere onto which every point \vec{X} in the world is projected. The projection is

    \vec{X} \mapsto \vec{q} = \frac{\vec{X}}{\|\vec{X}\|}    (2)

Note that \vec{q} = (q_1, q_2, q_3) is a unit vector and that the projection is not defined for \vec{X} = 0. Projection onto the unit sphere is related to projection onto a plane by

    (u, v) = \left( \frac{f q_1}{q_3}, \; \frac{f q_2}{q_3} \right)    (3)
Figure 1: Projection onto Unit Sphere.
Given f, the projection onto the sphere can be computed from the projection onto the plane, and conversely.
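For illustration, relations (2) and (3) amount to the following small Python sketch (the function names are ours and not from the paper; image coordinates are centred and f is given in pixels):

    import numpy as np

    def plane_to_sphere(u, v, f):
        """Map an image point (u, v), taken with focal length f, to the
        unit-sphere projection q = X / ||X|| of any point along its ray."""
        ray = np.array([u, v, f], dtype=float)   # a point on the viewing ray, at Z = f
        return ray / np.linalg.norm(ray)         # eq. (2) applied to that point

    def sphere_to_plane(q, f):
        """Map a unit-sphere projection q = (q1, q2, q3) back to the image
        plane Z = f using eq. (3). Undefined for q3 = 0 (points at infinity)."""
        q1, q2, q3 = q
        return f * q1 / q3, f * q2 / q3

    # round trip: an image point maps to the sphere and back to itself
    u, v = sphere_to_plane(plane_to_sphere(120.0, -35.0, 600.0), 600.0)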
The angle \theta between two unit vectors \vec{q} and \vec{q}\,' is

    \cos\theta = \vec{q}^{\,T} \vec{q}\,'    (4)

and two vectors \vec{q} and \vec{q}\,' are conjugate to each other if

    \vec{q}^{\,T} \vec{q}\,' = 0    (5)

2.1 Vanishing Line
A line l on the image plane whose equation is A u + B v + C = 0 has homogeneous coordinates \vec{l} = (A, B, C/f). This line is the intersection of the image plane Z = f with a plane A X + B Y + C Z / f = 0 passing through the viewpoint. The vector (A, B, C/f) is perpendicular to this plane, and l has the direction

    \vec{n} = \frac{\vec{l}}{\|\vec{l}\|}    (6)
A planar surface with a unit normal vector not parallel to the image plane has, when projected, a vanishing line \vec{n} [17] - see figure 2. Since the vanishing line is determined solely by the orientation of the planar surface, the projections of planar surfaces that are parallel in the scene define a common vanishing line. A vanishing line is the set of all vanishing points.
If unit vectors \vec{q} and \vec{q}\,' are conjugate, and (u, v) and (u', v') are the corresponding image points, then

    \vec{q}^{\,T} \vec{q}\,' = u u' + v v' + f^2 = 0    (7)

and the focal length f can be estimated.
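As a minimal illustration (ours, not from the paper; it assumes the two vanishing points are expressed in pixel coordinates with origin at the image centre), the focal length follows directly from (7):

    import math

    def focal_from_conjugate_vanishing_points(p, p_prime):
        """Estimate f from two conjugate vanishing points (u, v) and (u', v'),
        centred image coordinates, using u u' + v v' + f^2 = 0 (eq. 7)."""
        u, v = p
        u_p, v_p = p_prime
        s = u * u_p + v * v_p
        if s >= 0:
            raise ValueError("points are not conjugate: u u' + v v' must be negative")
        return math.sqrt(-s)

    # e.g. vanishing points of two orthogonal ground-plane directions
    f = focal_from_conjugate_vanishing_points((540.0, -120.0), (-610.0, -150.0))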
Figure 2: Vanishing line.
3 THE PROTOTYPE

Figure 3: The inertial system prototype.
To study the integration of the inertial information with vision systems, including active vision systems, we decided to develop an inertial system prototype composed of low-cost inertial sensors, together with their mechanical mounting and the necessary processing electronics. The sensors used in the prototype include a three-axial accelerometer, three gyroscopes and a dual-axis inclinometer. They were mounted on an acrylic cube as seen in figure 3.

The three-axial accelerometer chosen for the system minimises eventual alignment problems and did not add much to the equivalent cost of three separate single-axis sensors. The device used was Summit Instruments' 34103A three-axial capacitive accelerometer. In order to keep track of rotation about the x-, y- and z-axes, three gyroscopes were used; the piezoelectric vibrating prism gyroscope Gyrostar ENV-011D, built by Murata, was chosen. To measure tilt about the x- and y-axes a dual-axis AccuStar electronic inclinometer, built by Lucas Sensing Systems, was used - see [12] for details.

This type of inertial system, known as a strap-down system, has been developed and used for navigation in mobile robotics and terrestrial vehicles. To estimate position with inertial systems, a double integration of the accelerometer data is performed. For now we only extract the camera's attitude from the accelerometer data. This is done by keeping track of the gravity vector \vec{g}. The accelerometer by Summit Instruments comes with a factory calibration table providing sensitivity, offset and alignment data. The complete system is initially calibrated by placing it on a flat, levelled tabletop and estimating the offsets. The current implementation estimates the camera's attitude only when it is still; the gyros are intended to track all rotations, so that the attitude can also be known when accelerations other than gravity are present [18].

3.1 The Vision System

Figure 4: The active vision system.
The inertial system prototype can be mounted onto a single camera or onto an active vision system such as the one in figure 4. For the active vision system, the whole system runs on a 120 MHz Pentium PC running a public domain Unix operating system, FreeBSD [15]. An Advantech PCL818HD data acquisition card was included to handle the inertial sensor acquisition. A Matrox Meteor framegrabber was used to capture the images from the Sony XC-999 cameras. The FreeBSD driver for the Meteor was reprogrammed to handle the Advantech board and enable sensor data acquisition synchronous with the captured frames. This system, whilst not delivering top performance, is very flexible and reconfigurable.
3.2 The Mobile System
To study the integration of the inertial information in artificial autonomous systems we developed a prototype based on the same inertial system but with a different computational architecture. Figure 5 shows the architecture of the system that supervises the active vision system, the moving platform and the inertial system. The inertial system is continuously monitored by the onboard host computer.

To handle the inertial data acquisition the Advantech card was again used. This card has input filters, A/D and D/A converters and an EISA bus interface, and it is connected along with a framegrabber for image acquisition and processing.
Figure 5: System Architecture. The inertial system processing board uses the onboard master processing unit as its host computer.
4 RESULTS

4.1 First Example: The horizon line and focal length estimation
The measurements \vec{a} taken by the tri-axial accelerometer include the sensed gravity vector \vec{g} summed with the body's acceleration \vec{a}_b, as stated in (8). Figure 6 shows the relevant coordinate systems.

    \vec{a} = \vec{g} + \vec{a}_b = \left[ g_x ,\; g_y ,\; g_z \right]^T + \left[ a_x ,\; a_y ,\; a_z \right]^T    (8)

Figure 6: World, cube and image coordinates.

Assuming that the camera is still, then \vec{a}_b = 0 and the measured acceleration \vec{a} = \vec{g} gives the gravity vector in the camera's referential. This vector is normal to the world ground plane, and we can write the equation of a level ground plane as

    X g_x + Y g_y + Z g_z = h    (9)

in the camera's referential, where h is the camera's height above ground level.

If we intersect this plane (with h = 0) with the image plane we get the horizon line.

Using two conjugate points on the horizon line (two vanishing points), the focal distance f can be estimated from (7). These two vanishing points can be determined from the intersection of two orthogonal lines belonging to the ground plane.

So, making Z = f, where f is the focal distance of the camera, and performing the perspective transformation to image coordinates, the horizon line is given by

    \frac{s_u u}{f} g_x + \frac{s_v v}{f} g_y + g_z = 0, \qquad \text{i.e.} \qquad v = -\frac{s_u}{s_v}\,\frac{g_x}{g_y}\, u - \frac{f}{s_v}\,\frac{g_z}{g_y}    (10)

where u and v are the image horizontal and vertical coordinates, and s_u and s_v the respective scaling factors. Assuming that image acquisition maintains a square pixel ratio we have s_u = s_v = s. This can reasonably be accomplished if some care is taken in programming the framegrabber (see [16]). Equation (10) can then be re-written as

    v = -\frac{g_x}{g_y}\, u - \frac{f}{s}\,\frac{g_z}{g_y}    (11)

If the camera is levelled with the ground this simply becomes v = 0, since g_x = 0, g_y = 1, g_z = 0. In fact, with the square pixel assumption, the scaling factor only becomes relevant when the camera is pitched, i.e. g_z \neq 0.
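As a concrete illustration of (11), the artificial horizon can be traced directly from the accelerometer reading. The following is a minimal Python sketch (ours, not the authors' implementation); it assumes a still camera, square pixels, centred pixel coordinates and a known value of f/s in pixels:

    import numpy as np

    def horizon_line(g, f_over_s, u_coords):
        """Return v(u) for the horizon, eq. (11): v = -(gx/gy) u - (f/s)(gz/gy).
        g is the measured acceleration in the camera frame (camera still)."""
        gx, gy, gz = g / np.linalg.norm(g)
        if abs(gy) < 1e-6:
            raise ValueError("camera axis nearly aligned with gravity: horizon not defined this way")
        return -(gx / gy) * np.asarray(u_coords) - f_over_s * (gz / gy)

    # trace the horizon across a 640-pixel-wide image (centred coordinates)
    u = np.array([-320.0, 320.0])
    v = horizon_line(np.array([0.05, 0.98, 0.12]), 593.0, u)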
In an image the horizon can be found from two distinct vanishing points [17]. With a suitable calibration target (e.g. a square with well-defined edges) the unknown scaling factor can be determined. In fact a single vanishing point is enough, since the slope of the horizon line is already known from the accelerometer data.
The quality of the estimate of f/s depends on the noise level in the accelerometer data. In our tests the artificial horizon would jog up and down a couple of pixels when the accelerometer data was not properly filtered. Nevertheless it provides a reasonable estimate for a completely un-calibrated camera. In the experimental tests a calibration target was used, as can be seen in figure 7. The camera pose was chosen so as to have a strong perspective distortion. Having estimated this scaling factor, as above or from some other calibration procedure, the artificial horizon can be correctly traced onto the image.

Figure 7: Image vanishing points and the estimated horizon, traced with the initial estimate f = 500 and with f = 593 calibrated from a single vanishing point.
4.2 Second Example: Detection of vertical lines
If the camera only rotates about the world x and z axes, the vertical features will remain parallel in the image, since they remain frontal and parallel to the image plane. The slope of these vertical lines is also given by the accelerometer data, as explained above. These lines are given, in image coordinates, by the equation

    u = \frac{g_x}{g_y}\, v + u_0    (12)

This is just the general line perpendicular to the horizon line given in (11).
In order to detect vertical lines we extract the edges in the image using a Sobel filter. The filter was applied to estimate the gradient magnitude at each pixel. By choosing an appropriate threshold for the gradient, the potential edge points can be identified. The square of the gradient was used in our application to allow faster integer computation. To obtain only the vertical edges, we could compare the pixel gradient with the expected gradient for the vertical lines given by the slope of the line in (12). But this alone leads to erroneous results, since the pixel gradient provides only very local information and is affected by pixel quantization. In order to extract the vertical lines in the image a simplified Hough transform was used. The Hough transform was only applied for the relevant slope, i.e. a histogram of the image edge points was computed along the vertical direction given by the accelerometers. Each edge point (u, v) contributed to the histogram at position

    \text{histogram position} = (u, v) \cdot (g_x, -g_y)    (13)
Figure 8: Detected vertical lines.
This histogram, besides counting the number of
points, also kept track of the endpoints so that the
line could be correctly traced when mapped back to
the image. By thresholding this histogram the relevant vertical edges in the image can be found.
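A simplified sketch of this voting scheme follows (our reconstruction of the idea rather than the original implementation; the SciPy Sobel filter, the threshold and the bin count are our choices). Edge points vote into bins indexed by the intercept u_0 of (12), and well-populated bins give the vertical edges.

    import numpy as np
    from scipy import ndimage

    def vertical_line_histogram(image, g, grad_thresh=100.0, n_bins=512):
        """Bin Sobel edge points by the intercept u0 of the vertical-line
        equation (12), u = (gx/gy) v + u0, and return the vote histogram."""
        img = image.astype(float)
        du = ndimage.sobel(img, axis=1)          # gradient along u (columns)
        dv = ndimage.sobel(img, axis=0)          # gradient along v (rows)
        mag2 = du**2 + dv**2                     # squared gradient magnitude
        v_idx, u_idx = np.nonzero(mag2 > grad_thresh**2)   # edge pixels
        gx, gy = g[0], g[1]                      # assumes gy is not near zero
        u0 = u_idx - (gx / gy) * v_idx           # intercept of eq. (12) for each edge point
        hist, bin_edges = np.histogram(u0, bins=n_bins)
        return hist, bin_edges

Bins whose count exceeds a threshold correspond to vertical edges; the pixels that voted into them can then be traced back onto the image, as described above.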
Figure 8 shows the results obtained with the system running at 10 frames per second. The right image shows the initial edge points resulting from thresholding the estimated gradient given by the Sobel filter. The left image shows the identified vertical edges.
4.3 Third Example: Identifying the ground plane
The inertial data is used to keep track of the gravity vector, allowing the identification of the vertical in the images and the segmentation of the levelled ground plane. By performing visual fixation on a ground plane point, and knowing the 3D vector normal to level ground, we can determine the ground plane. The image can therefore be segmented, and the ground plane along which a mobile robot or vehicle can move identified. For on-the-fly visualisation of the segmented images and the detected ground points a VRML viewer is used.
4.3.1 System Geometry

For the ground plane segmentation we used the inertial unit placed at the middle of the active vision system, at the point {Cy}, as seen in figure 9. The head coordinate frame referential, or Cyclop frame {Cy}, is defined as having its origin at the centre of the baseline between the two stereo cameras.

Figure 9: System Geometry.
A projection point \vec{p} = (u, v) in each camera image is related to a 3D point \vec{P} = (X, Y, Z) by the perspective relations. If we know \vec{P} = (X, Y, Z), finding the projection (u, v) is trivial. The reverse problem involves matching points between the left and right images. Establishing this correspondence gives us enough equations to determine the 3D co-ordinates, if a few vision system parameters are known. However, if visual fixation is used the geometry is simplified and the reconstruction of the 3D fixated point becomes straightforward, as can be seen in figure 10. Notice that visual fixation can be achieved by controlling the active vision system, and the geometry generated by the process allows a fast and robust 3D reconstruction of the fixation point - see [9] and [10] for details.
4.3.2 Inclinometer gives the ground plane orientation

The inclinometer data can be used to determine the orientation of the ground plane. In order to locate this plane in space, at least one point belonging to the ground plane must be known. When the vehicle is stationary or moving at constant speed the inclinometer gives the direction of \vec{g} relative to the Cyclop referential {Cy}. Assuming the ground is levelled, and with \theta_x and \theta_y being the sensed angles about the x- and y-axes, the normal to the ground plane will be

    \hat{n} = -\frac{\vec{g}}{\|\vec{g}\|} = \frac{1}{\sqrt{1 - \sin^2\theta_x \sin^2\theta_y}} \left[ -\cos\theta_x \sin\theta_y ,\; \cos\theta_y \sin\theta_x ,\; \cos\theta_y \cos\theta_x \right]^T    (14)

given in the Cyclop frame of reference. Using this inertial information, the equation of the ground plane will be given by

    \hat{n} \cdot \vec{P} + h = 0    (15)

where \vec{P} is a point in the plane and h is the distance from the origin of {Cy} down to the ground plane.
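For illustration, a minimal Python sketch of (14) and (15) follows (our reconstruction, not the authors' code; the tolerance value is arbitrary):

    import numpy as np

    def ground_normal(theta_x, theta_y):
        """Unit normal to a levelled ground plane in the Cyclop frame,
        from the inclinometer angles (eq. 14); normalised numerically."""
        n = np.array([-np.cos(theta_x) * np.sin(theta_y),
                       np.cos(theta_y) * np.sin(theta_x),
                       np.cos(theta_y) * np.cos(theta_x)])
        return n / np.linalg.norm(n)

    def on_ground_plane(P, n_hat, h, tol=0.02):
        """Check eq. (15): a point P (Cyclop frame) lies on the plane
        if n_hat . P + h is approximately zero."""
        return abs(np.dot(n_hat, P) + h) < tol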
4.3.3 Estimation of the ground plane

To obtain a point belonging to the ground plane it is necessary to establish a mechanism to achieve visual fixation. This mechanism was developed in our laboratory and is described in [9] and [10]. If the active vision system fixates on a point that belongs to the ground plane, the ground plane can be determined in the Cyclop referential {Cy} using the inclinometer data. Hence, any other corresponding point in the image can be identified as belonging or not to the ground plane.
If the fixation point is identified as belonging to the ground plane, the value of h in (15) can be determined. As seen in figure 10 (where only \theta_y is non-null, to keep the diagram simple), h will be given by

    h = -\hat{n} \cdot \vec{P}_f    (16)

When visual fixation is obtained for a ground point, and assuming symmetric vergence (i.e. \alpha = \alpha_R = -\alpha_L), from (16) and (14) we have

    h = -\hat{n} \cdot \vec{P}_f = \frac{b \sin\theta_y \cos\alpha}{2 \sin\alpha}    (17)

where b is the baseline between the two stereo cameras, as can easily be seen in figure 10 (where \theta_x is null, although (17) still holds for any \theta_x).

Figure 10: Ground plane point fixated. The point \vec{P}_f in the ground plane is fixated by the active vision system. The geometry of this configuration corresponds to a state named visual fixation.

This value of h will be used to determine whether other points in the image belong to the level plane passing through the image centre point (i.e. the fixation point).
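A small numerical sketch of (17) (ours; the baseline is in metres and the angles in radians):

    import math

    def height_from_fixation(baseline, theta_y, alpha):
        """Cyclop height above the ground plane for a fixated ground point,
        with symmetric vergence alpha (eq. 17)."""
        return baseline * math.sin(theta_y) * math.cos(alpha) / (2.0 * math.sin(alpha))

    # e.g. 30 cm baseline, head pitched 25 degrees down, 10 degrees of vergence
    h = height_from_fixation(0.30, math.radians(25.0), math.radians(10.0))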
4.3.4 Image Segmentation of the ground plane

An algorithm for the image segmentation of the ground plane can now be presented, based on the solution of (16).
To express a world point \vec{P}, given in a camera referential, in the Cyclop referential {Cy} we have

    {}^{Cy}\vec{P} = {}^{Cy}T_{C_R} \, {}^{C_R}\vec{P} = {}^{Cy}T_R \, {}^{R}T_{C_R} \, {}^{C_R}\vec{P}    (18)

    {}^{Cy}\vec{P} = {}^{Cy}T_{C_L} \, {}^{C_L}\vec{P} = {}^{Cy}T_L \, {}^{L}T_{C_L} \, {}^{C_L}\vec{P}    (19)

From (18), the point expressed in the Cyclop referential is given by

    {}^{Cy}\vec{P} = {}^{Cy}T_{C_R} \, {}^{C_R}\vec{P}(u_R, v_R, \lambda_R)    (20)

where \lambda_R represents an unknown value (depending on the depth from the camera). Substituting (20) in (15), \lambda_R can be determined and hence \vec{P} is completely known in the Cyclop referential. Expressing \vec{P} in the {C_L} referential by

    {}^{C_L}\vec{P} = {}^{C_L}T_{Cy} \, {}^{Cy}\vec{P} = \left( {}^{Cy}T_{C_L} \right)^{-1} {}^{Cy}\vec{P}    (21)

the corresponding point of interest (u_L, v_L), generated by the projection of \vec{P} in the left image, is given by

    u_L = S_u f \frac{X}{Z}, \qquad v_L = S_v f \frac{Y}{Z}    (22)

where (X, Y, Z) are the co-ordinates of {}^{C_L}\vec{P}.
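Putting (15), (20), (21) and (22) together, the right-to-left collineation for assumed ground points can be sketched as follows (an illustrative reconstruction under our assumptions: the homogeneous 4x4 transforms {}^{Cy}T_{C_R} and {}^{Cy}T_{C_L} and the parameters f, S_u, S_v are taken as known from the head geometry and calibration):

    import numpy as np

    def right_to_left_ground_collineation(uR, vR, T_Cy_CR, T_Cy_CL, n_hat, h, f, Su, Sv):
        """Map a right-image point assumed to lie on the ground plane to the
        left image: back-project its ray, intersect it with the plane (15),
        express the 3D point in the left camera frame (21) and project it (22)."""
        d_CR = np.array([uR / (Su * f), vR / (Sv * f), 1.0, 0.0])   # ray direction (homogeneous)
        o_CR = np.array([0.0, 0.0, 0.0, 1.0])                       # right camera centre
        d_Cy = T_Cy_CR @ d_CR                                       # ray in the Cyclop frame (20)
        o_Cy = T_Cy_CR @ o_CR
        n4 = np.append(n_hat, 0.0)
        lam = -(n4 @ o_Cy + h) / (n4 @ d_Cy)                        # intersect with plane (15)
        P_Cy = o_Cy + lam * d_Cy                                    # ground point, Cyclop frame
        P_CL = np.linalg.inv(T_Cy_CL) @ P_Cy                        # eq. (21)
        X, Y, Z = P_CL[:3]
        return Su * f * X / Z, Sv * f * Y / Z                       # eq. (22)

The neighbourhood of the predicted left-image point is then compared with the original right-image point of interest; a match labels the point as ground, a mismatch as a probable obstacle, as described below.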
Figure 11: Identified ground patch.
Starting with a point of interest in one of the images, say the right image (u_R, v_R), it is possible to establish the collineation to the left image (u_L, v_L). Taking \lambda_R out of (17) and (20) and substituting in (21) and (22), we get a set of equations that give this collineation for a given set of stereo images.

Defining a point in the right image, the corresponding point and its neighbourhood in the left image can then be tested for a match with the original point of interest in the right image. If there is a match, the point belongs to the ground plane. If there is no match, the point must be something other than the floor, possibly an obstacle.

Figure 11 shows the matched ground plane points of interest. Graham's algorithm [14] was used for the computation of the convex polygon enclosing the set of points. Adjusting a convex polygon to the set of points can, however, lead to erroneous ground patch segmentation.

4.3.5 Viewing with VRML

For visualisation of the detected ground points a VRML (Virtual Reality Modelling Language) world is generated [1]. VRML has become the standard for describing 3D scenes on the Internet. A web browser with the appropriate plug-in lets the user see the images from the robot's viewpoint, or change the viewpoint and "fly" around the 3D world viewed by the vision system. The identified ground plane patch can be mapped onto the 3D scene, as seen in figure 12.

Figure 12: VRML view of ground patch with detected points.
5 CONCLUSIONS
The integration of inertial sensors with vision opens a new field for the development of applications based on or related to inertial information. In this article we described three examples of the integration of inertial information and vision.

Inertial sensors coupled to mobile vision systems enable some interesting results. The accelerometer data can provide an artificial horizon, short of a vertical scale factor. This factor can be estimated using image vanishing points or other camera calibration methods. The accelerometer data also gives information about the vertical in the image. Knowing this vertical is very important for extracting relevant image features. In the tests the vertical edges in the image were correctly identified. Future work involves refining the final stage of the vertical edge extraction to include edge tracking and hysteresis. We also intend to use the inertial data in more dynamic ways, by using the gyro data as well as the accelerometers.
Information about the ground plane extracted from the inertial sensors is used to identify the floor plane, using visual fixation with an active vision system. By fixating the visual system on a point in the ground plane, the 3D position of the plane in space is obtained. Any other corresponding point in the stereo images can then be identified as belonging or not to the ground plane, and segmentation of the image is therefore accomplished. Some ground detection results were presented. For on-the-fly visualisation of the detected points a VRML world is generated, which can be viewed in any web browser with the appropriate plug-in. VRML opens many other possibilities, such as tele-operation or path-planning environments.
Acknowledgements
Financial support for this work was partially provided by FCT - Fundação para a Ciência e Tecnologia, of the Portuguese Government.
References
[1] A. L. Ames, D. R. Nadeau and J. L. Moreland, VRML 2.0 Sourcebook, John Wiley and Sons, ISBN 0-471-16507-7, 2nd edition, 1997.

[2] Jorge Dias, Helder Araujo, Carlos Paredes, Jorge Batista, "Optical Normal Flow Estimation on Log-Polar Images. A Solution for Real-Time Binocular Vision", Real-Time Imaging, Vol. 3, pp. 213-228, 1997.

[3] Rami Guissin, Shimon Ullman, "Direct Computation of the Focus of Expansion From Velocity Field Measurements", IEEE Transactions, 1991, pp. 146-155.

[4] Michal Irani, Benny Rousso and Shmuel Peleg, "Recovery of Ego-Motion Using Image Stabilization", IEEE Transactions, 1994, pp. 454-460.

[5] V. Sundareswaran, P. Bouthemy and F. Chaumette, "Visual servoing using dynamic image parameters", INRIA, RR no. 2336, Août 1994.

[6] H. Carpenter, Movements of the Eyes, London, Pion Limited, 2nd edition, 1988, ISBN 0-85086-109-8.

[7] T. Vieville and O. D. Faugeras, "Computation of Inertial Information on a Robot", Fifth International Symposium on Robotics Research, ed. H. Miura, S. Arimoto, pp. 57-65, MIT Press, 1989.

[8] Jorge Lobo, Paulo Lucas, Jorge Dias, A. T. de Almeida, "Inertial Navigation System for Mobile Land Vehicles", Proceedings of the 1995 IEEE International Symposium on Industrial Electronics, July 1995, Athens, Greece, pp. 843-848.

[9] Jorge Dias, Carlos Paredes, Inacio Fonseca, H. Araujo, J. Batista, A. T. de Almeida, "Simulating Pursuit with Machines - Experiments with Robots and Artificial Vision", IEEE Trans. on Robotics and Automation, Feb. 1998, Vol. 14, No. 1, pp. 1-18.

[10] Carlos Paredes, Jorge Dias, A. T. de Almeida, "Detecting Movements Using Fixation", Proceedings of the 2nd Portuguese Conference on Automation and Control, September 1996, Oporto, Portugal, pp. 741-746.

[11] T. Vieville, A Few Steps Towards 3D Active Vision, Springer Series in Information Sciences, Springer Verlag, 1997.

[12] Jorge Lobo and Jorge Dias, "Integration of Inertial Information with Vision towards Robot Autonomy", Proc. of the 1997 IEEE International Symposium on Industrial Electronics, pp. 825-830, Guimaraes, Portugal, July 1997.

[13] S. M. Smith and J. M. Brady, "SUSAN - a new approach to low level image processing", Int. Journal of Computer Vision, 23(1):45-78, May 1997.

[14] Joseph O'Rourke, Computational Geometry in C, Cambridge University Press, 1993, ISBN 0-512-22592-2.

[15] FreeBSD Inc., FreeBSD, http://www.freebsd.org.

[16] Charles A. Poynton, A Technical Introduction to Digital Video, John Wiley and Sons, ISBN 0-471-12253-X, 1996.

[17] K. Kanatani, Geometric Computation for Machine Vision, Oxford Science Publications, Oxford, 1993.

[18] R. Collinson, Introduction to Avionics, Chapman & Hall, London, 1996, ISBN 0-412-48250-9.