Recovering 3D Structure From Images and Inertial Sensors

Jorge Lobo
Institute of Systems and Robotics, Electrical Engineering Department, University of Coimbra, 3030 Coimbra, Portugal
Tel: +351-39-796303  Fax: +351-39-406672  [email protected]

Jorge Dias
Institute of Systems and Robotics, Electrical Engineering Department, University of Coimbra, 3030 Coimbra, Portugal
Tel: +351-39-796219  Fax: +351-39-406672  [email protected]

ABSTRACT

Advanced sensor systems, exploring high integrity and multiple sensorial modalities, have been significantly increasing the capabilities of autonomous vehicles and enlarging the application potential of vision systems. This article describes the cooperation between two relevant sensors: image and inertial sensors. Inertial sensors coupled to an active vision system can provide valuable information, as the vestibular system does in humans and other animals. Visual and inertial sensing are two sensory modalities that can be explored to give robust solutions for image segmentation and for recovering three-dimensional structure from images. This article presents our recent results on the use and integration of these two modalities. In a first example we use the inertial information to infer one of the intrinsic parameters of the visual sensor, the focal length, after defining the horizontal plane (horizon). The second example relies on segmentation of images by labelling the vertical structures of the image scene. In a third example we use the integration of inertial and visual information to detect the regions of the scene over which a vehicle can be driven: in our case the levelled ground plane.

1 INTRODUCTION

Advanced sensor systems, exploring high integrity and multiple sensorial modalities, have been significantly increasing the capabilities of autonomous vehicles and enlarging the potential applications of vision systems. This article describes the cooperation between two relevant sensors: image and inertial sensors. Visual and inertial sensing are two sensory modalities that can be explored to give robust solutions for image segmentation and for recovering three-dimensional structure from images. The cooperation between these two sensory modalities may be useful for the elaboration of high-level representations such as multimodal three-dimensional maps, and for the detection of specific three-dimensional structures in the images such as levelled ground or vertical structures.

Inertial sensors are a class of sensors useful for internal sensing, and they do not depend on external references. In humans and other animals the vestibular system of the inner ear gives inertial information essential for navigation, orientation and equilibrium of the body. In humans this sensorial system is crucial for several visual tasks and for head stabilisation. It is a sensorial modality that co-operates with other sensorial systems and gives essential information for everyday tasks. One example of co-operation is between the vestibular system and the visual system. It is well known that, in humans, the information provided by the vestibular system is used during the execution of navigational and visual movements, as described by Carpenter [6]. The inertial information is also important for head-stabilisation behaviours, including the control of body posture and equilibrium. The inertial information can also be useful in applications with autonomous systems and artificial vision.
In the case of active vision systems, the inertial information provides a second sensing modality that gives useful information for image stabilisation, control of pursuit movements, or ego-motion determination when the active vision system is used with a mobile platform. This kind of sensorial information is also crucial for tasks with artificial autonomous systems where the notion of horizontal or vertical is important (see [7] for an example). A vision system with inertial sensors obtains a partial estimation of self-motion or absolute orientation from the inertial information. Gravity, the rigid body acceleration and the instantaneous angular velocity can be measured with these sensors, and the angular position and linear translation of the vision system can also be obtained. Combining these quantities with visual information can be useful to estimate the instantaneous visual motion, to segment images (for example into moving objects and background), to estimate the vision system's orientation with respect to the horizontal plane, or to recover depth maps based on the estimated self-motion.

This article presents our recent results on the use and integration of these two modalities. In a first example we use the inertial information to infer one of the intrinsic parameters of the visual sensor, the focal length, after defining the horizontal plane (horizon). The second example relies on segmentation of images by labelling the vertical structures of the image scene. In a third example we use the integration of inertial and visual information to detect the regions of the scene over which a vehicle can be driven: in our case the levelled ground plane. The solution is based on information about the scene obtained during a process of visual fixation, complemented by the information provided by the inertial sensors.

2 NOTATION AND BACKGROUND

A projection point \vec{p} = (u, v) in each camera image is related to a 3D point \vec{P} = (X, Y, Z) by the perspective relations

  u = f X/Z ,  v = f Y/Z   (1)

where u and v are the pixel coordinates with origin at the image centre, f is the focal length, and \vec{P} is expressed in the camera referential.

Suppose an image unit sphere onto which every point \vec{X} in the world is projected. The projection is

  \vec{X} \mapsto \vec{q} = \vec{X} / \|\vec{X}\|   (2)

Note that \vec{q} = (q_1, q_2, q_3) is a unit vector and the projection is not defined for \vec{X} = 0. Projection onto the unit sphere is related to projection onto a plane by

  (u, v) = ( f q_1 / q_3 , f q_2 / q_3 )   (3)

Given f, the projection onto the sphere can be computed from the projection onto the plane and conversely (see figure 1).

Figure 1: Projection onto the unit sphere.

The angle between two unit vectors \vec{q} and \vec{q}' is

  \cos\theta = \vec{q}^T \vec{q}'   (4)

and two vectors \vec{q} and \vec{q}' are conjugate to each other if

  \vec{q}^T \vec{q}' = 0   (5)

2.1 Vanishing Line

A line l on the image plane whose equation is A u + B v + C = 0 has homogeneous coordinates \vec{l} = (A, B, C/f). This line is the intersection of the image plane Z = f with a plane A X + B Y + (C/f) Z = 0 passing through the viewpoint. The vector (A, B, C/f) is perpendicular to this plane and l has the direction

  \vec{n} = \pm \vec{l} / \|\vec{l}\|   (6)

A planar surface with a unit normal vector not parallel to the image plane has, when projected, a vanishing line \vec{n} [17] (see figure 2). Since the vanishing line is determined solely by the orientation of the planar surface, the projections of planar surfaces that are parallel in the scene define a common vanishing line. A vanishing line is the set of all vanishing points.
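To make the notation concrete, the short Python sketch below implements relations (1)-(3) and (6) with NumPy. It is only an illustration of the formulas above, not code from the described system; the focal length value and the test point are arbitrary, and image coordinates follow the conventions defined in this section.

```python
import numpy as np

def project_to_plane(P, f):
    """Perspective projection (1): image point (u, v) of a 3D point
    P = (X, Y, Z) expressed in the camera referential."""
    X, Y, Z = P
    return np.array([f * X / Z, f * Y / Z])

def project_to_sphere(P):
    """Projection onto the unit sphere (2): q = P / ||P||."""
    P = np.asarray(P, dtype=float)
    return P / np.linalg.norm(P)

def vanishing_line(n_hat, f):
    """Vanishing line of a planar surface with unit normal (nx, ny, nz) [17]:
    the image line nx*u + ny*v + nz*f = 0, returned as coefficients (a, b, c)
    of a*u + b*v + c = 0."""
    nx, ny, nz = n_hat
    return np.array([nx, ny, nz * f])

if __name__ == "__main__":
    f = 500.0                        # focal length in pixel units (arbitrary)
    P = np.array([1.0, -0.5, 4.0])   # 3D point in the camera referential
    u, v = project_to_plane(P, f)
    q = project_to_sphere(P)
    # Relation (3): both projections describe the same viewing ray.
    assert np.allclose([f * q[0] / q[2], f * q[1] / q[2]], [u, v])
    # A level plane (unit normal along the camera y-axis) has the horizon
    # v = 0 as its vanishing line when the camera is levelled.
    print(vanishing_line([0.0, 1.0, 0.0], f))   # -> [0. 1. 0.], i.e. v = 0
```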
If the unit vectors \vec{q} and \vec{q}' are conjugate and (u, v) and (u', v') are the image points of the corresponding scene points, then

  \vec{q}^T \vec{q}' = u u' + v v' + f^2 = 0   (7)

and the focal length f can be estimated.

Figure 2: Vanishing line.

3 THE PROTOTYPE

To study the integration of inertial information with vision systems, including active vision systems, we decided to develop an inertial system prototype composed of low-cost inertial sensors. Their mechanical mounting and the necessary electronics for signal processing were also designed. The sensors used in the prototype system include a tri-axial accelerometer, three gyroscopes and a dual-axis inclinometer. They were mounted on an acrylic cube as seen in figure 3. The tri-axial accelerometer chosen for the system, while minimising eventual alignment problems, did not add much to the equivalent cost of three separate single-axis sensors. The device used was Summit Instruments' 34103A tri-axial capacitive accelerometer. In order to keep track of rotation about the x-, y- and z-axis three gyroscopes were used. The piezoelectric vibrating prism gyroscope Gyrostar ENV-011D built by Murata was chosen. To measure tilt about the x- and y-axis a dual-axis AccuStar electronic inclinometer, built by Lucas Sensing Systems, was used (see [12] for details).

These types of inertial systems, known as strap-down systems, have been developed and used for navigation in mobile robotics and terrestrial vehicles. To estimate position with inertial systems, a double integration of the accelerometer data is performed. For now we only extract the camera's attitude from the accelerometer data. This is done by keeping track of the gravity vector \vec{g}. The accelerometer by Summit Instruments comes with a factory calibration table providing sensitivity, offset and alignment data. The complete system is initially calibrated by placing it on a flat, levelled tabletop and estimating the offsets. The current implementation estimates the camera's attitude when it is still. The gyros are intended to track all rotations so that the attitude can be known when accelerations other than gravity are present [18].

Figure 3: The inertial system prototype.
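As a rough illustration of how the attitude is kept from the accelerometer while the camera is still, the sketch below low-pass filters bias-compensated accelerometer samples to track the gravity vector and derives the ground-plane normal used later in (14). The filtering scheme and the smoothing constant are illustrative assumptions, not the filter actually used in the prototype.

```python
import numpy as np

def track_gravity(samples, offsets=np.zeros(3), alpha=0.1):
    """Track the gravity vector in the sensor frame from raw tri-axial
    accelerometer samples, assuming the camera is still (body acceleration
    negligible). A simple exponential low-pass filter stands in for the
    filtering mentioned in the text; alpha is an illustrative constant."""
    g = None
    for a in np.asarray(samples, dtype=float):
        a = a - offsets                          # apply the calibration offsets
        g = a if g is None else (1.0 - alpha) * g + alpha * a
    return g

def ground_plane_normal(g):
    """Normal to a levelled ground plane in the sensor frame, n = -g/||g||
    (the quantity used later in equation (14))."""
    g = np.asarray(g, dtype=float)
    return -g / np.linalg.norm(g)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic still-camera readings: gravity plus a little sensor noise.
    true_g = np.array([0.3, 9.75, 0.9])
    samples = true_g + 0.05 * rng.standard_normal((200, 3))
    g = track_gravity(samples)
    print("gravity:", g.round(2), "plane normal:", ground_plane_normal(g).round(3))
```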
3.1 The Vision System

The inertial system prototype can be mounted onto a single camera or onto an active vision system such as the one in figure 4. For the active vision system the whole system runs on a 120 MHz Pentium PC running a public domain unix operating system, FreeBSD [15]. An Advantech PCL818HD data acquisition card was included to handle the inertial sensor acquisition. A Matrox Meteor framegrabber was used to capture the images from the Sony XC-999 cameras. The FreeBSD driver for the Meteor was reprogrammed to handle the Advantech board and enable sensor data acquisition synchronous with the captured frames. This system, whilst not enabling top performance, is very flexible and reconfigurable.

Figure 4: The active vision system.

3.2 The Mobile System

To study the integration of the inertial information in artificial autonomous systems we developed a prototype based on the same inertial system but with a different computational architecture. Figure 5 shows the architecture of the system that supervises the active vision system, the moving platform and the inertial system. The inertial system is continuously monitored by the onboard host computer. To handle the inertial data acquisition the Advantech card was also used. This card has input filters, A/D and D/A converters and an EISA bus interface, and it is connected alongside a framegrabber for image acquisition and processing.

Figure 5: System architecture. The inertial system processing board uses the master processing unit as host computer.

4 RESULTS

4.1 First Example: The horizon line and focal length estimation

The measurements \vec{a} taken by the tri-axial accelerometer include the sensed gravity vector \vec{g} summed with the body's acceleration \vec{a}_b, as stated in (8). Figure 6 shows the relevant coordinate systems.

  \vec{a} = \vec{g} + \vec{a}_b = (g_x, g_y, g_z)^T + (a_x, a_y, a_z)^T   (8)

Figure 6: World, cube and image coordinates.

Assuming that the camera is still, then \vec{a}_b = 0 and the measured acceleration \vec{a} = \vec{g} gives the gravity vector in the camera's referential. This vector is normal to the world ground plane, and we can write the equation of a level ground plane as

  X g_x + Y g_y + Z g_z = h   (9)

in the camera's referential, where h is the camera's height above ground level. If we intersect this plane (with h = 0) with the image plane we get the horizon line. Using two conjugate points on the horizon line (two vanishing points), the focal distance f can be estimated from (7). These two vanishing points can be determined from two orthogonal sets of parallel lines belonging to the ground plane. So making Z = f, where f is the focal distance of the camera, and performing the perspective transformation to image coordinates, the horizon line is given by

  s_u u g_x + s_v v g_y + f g_z = 0 ,  i.e.  v = -\frac{s_u}{s_v}\frac{g_x}{g_y} u - \frac{f}{s_v}\frac{g_z}{g_y}   (10)

where u and v are the image horizontal and vertical coordinates, and s_u and s_v the respective scaling factors. Assuming that image acquisition maintains a square pixel ratio we have s_u = s_v = s. This can reasonably be accomplished if some care is taken in programming the framegrabber (see [16]). Equation (10) can then be re-written as

  v = -\frac{g_x}{g_y} u - \frac{f}{s}\frac{g_z}{g_y}   (11)

If the camera is levelled with the ground this simply becomes v = 0, since g_x = 0, g_y = 1, g_z = 0. In fact, with the square pixel assumption, the scaling factor only becomes relevant when the camera is pitched, i.e. g_z != 0. In an image the horizon can be found by having two distinct vanishing points [17]. With a suitable calibration target (e.g. a square with well defined edges) the unknown scaling factor can be determined. In fact only one vanishing point is enough, since the slope of the line is already known. The quality of the estimation of f/s depends on the noise level in the accelerometer data. In our tests the artificial horizon would jog up and down a couple of pixels when the accelerometer data was not properly filtered. Nevertheless it provides a reasonable estimate for a completely uncalibrated camera. In the experimental tests a calibration target was used, as can be seen in figure 7. The camera pose was chosen so as to have a good perspective distortion. Having estimated this scaling factor, as above or from some other calibration procedure, the artificial horizon can be correctly traced onto the image.

Figure 7: Image vanishing points and the estimated horizon (the horizon traced from accelerometer data with the initial estimate f = 500 and with f = 593 calibrated from a single vanishing point).
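The following sketch puts (7), (10) and (11) together: it estimates the focal length in pixel units (i.e. f/s) from two conjugate vanishing points and then traces the horizon predicted by the accelerometer. It is an illustration only; the vanishing-point coordinates and the gravity vector are made-up values, and image coordinates are assumed to have their origin at the image centre, as in section 2.

```python
import numpy as np

def focal_from_vanishing_points(vp1, vp2):
    """Focal length in pixel units from two conjugate vanishing points,
    using relation (7): u*u' + v*v' + f^2 = 0."""
    u1, v1 = vp1
    u2, v2 = vp2
    f_sq = -(u1 * u2 + v1 * v2)
    if f_sq <= 0:
        raise ValueError("the two vanishing points are not conjugate")
    return float(np.sqrt(f_sq))

def horizon_line(g, f_pix):
    """Slope m and intercept b of the horizon line (11), v = m*u + b, from the
    gravity vector g sensed in the camera frame (square pixels assumed)."""
    gx, gy, gz = np.asarray(g, dtype=float) / np.linalg.norm(g)
    if abs(gy) < 1e-9:
        raise ValueError("horizon is (nearly) vertical in the image")
    return -gx / gy, -f_pix * gz / gy

if __name__ == "__main__":
    # Two vanishing points of orthogonal ground-plane directions (pixels from
    # the image centre) and a slightly pitched camera: illustrative values.
    f_pix = focal_from_vanishing_points((400.0, -150.0), (-500.0, -180.0))
    m, b = horizon_line([0.02, 0.99, 0.14], f_pix)
    print(f"f = {f_pix:.1f} px, horizon: v = {m:.3f}*u + {b:.1f}")
```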
4.2 Second Example: Detection of vertical lines

If the camera only rotates about the world x- and z-axis, vertical features remain parallel in the image since they remain frontal and parallel to the image plane. The slope of these vertical lines is also given by the accelerometer data, as explained above. These lines are given, in image coordinates, by

  u = \frac{g_x}{g_y} v + u_0   (12)

which is just a general line perpendicular to the horizon line given in (11).

In order to detect vertical lines we extract the edges in the image using a Sobel filter. The filter was applied to estimate the gradient magnitude at each pixel. By choosing an appropriate threshold for the gradient, the potential edge pixels can be identified. The square of the gradient was used in our application to allow faster integer computation. To obtain only the vertical edges we could compare the pixel gradient with the expected gradient for the vertical lines given by the slope in (12). But this leads to erroneous results, since the pixel gradient provides very local information and is affected by pixel quantization. In order to extract the vertical lines in the image a simplified Hough transform was therefore used. The Hough transform was only applied for the relevant slope, i.e. a histogram of the image edge points was built along the vertical direction given by the accelerometers, so that each edge point (u, v) contributed to the histogram at position

  histogram position = (u, v) \cdot (g_y, -g_x)   (13)

This histogram, besides counting the number of points, also keeps track of the endpoints so that each line can be correctly traced when mapped back to the image. By thresholding this histogram the relevant vertical edges in the image can be found. Figure 8 shows the results obtained with the system running at 10 frames per second: the right image shows the initial edge points resulting from thresholding the estimated gradient given by the Sobel filter, and the left image shows the identified vertical edges. A sketch of this histogram voting scheme is given below.

Figure 8: Detected vertical lines.
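A compact NumPy sketch of the scheme is given next, assuming an 8-bit grayscale image as input. The Sobel implementation, the bin count and the thresholds are illustrative choices, not the values used in the real-time system, and endpoint tracking is omitted.

```python
import numpy as np

def sobel_gradient_sq(img):
    """Squared gradient magnitude from 3x3 Sobel kernels, computed with array
    shifts so that only integer arithmetic is needed (borders are ignored)."""
    im = img.astype(np.int64)
    gx = (im[:-2, 2:] + 2 * im[1:-1, 2:] + im[2:, 2:]
          - im[:-2, :-2] - 2 * im[1:-1, :-2] - im[2:, :-2])
    gy = (im[2:, :-2] + 2 * im[2:, 1:-1] + im[2:, 2:]
          - im[:-2, :-2] - 2 * im[:-2, 1:-1] - im[:-2, 2:])
    return gx * gx + gy * gy

def detect_vertical_lines(img, g, grad_thresh, vote_thresh, n_bins=512):
    """Simplified Hough transform of (13): each edge point votes in a 1D
    histogram indexed by its projection (u, v).(gy, -gx), which is constant
    along the image vertical defined by the accelerometer; bins with enough
    votes are kept as vertical lines."""
    gx, gy, _ = np.asarray(g, dtype=float) / np.linalg.norm(g)
    grad = sobel_gradient_sq(img)
    vs, us = np.nonzero(grad > grad_thresh)        # edge pixels (row v, column u)
    pos = us * gy - vs * gx                        # histogram position per point
    hist, edges = np.histogram(pos, bins=n_bins)
    centres = 0.5 * (edges[:-1] + edges[1:])
    return centres[hist > vote_thresh]             # positions of the vertical lines

if __name__ == "__main__":
    # Synthetic image with two bright vertical stripes on a dark background.
    img = np.zeros((240, 320), dtype=np.uint8)
    img[:, 80:83] = 255
    img[:, 200:203] = 255
    lines = detect_vertical_lines(img, g=[0.0, 1.0, 0.0],
                                  grad_thresh=10000, vote_thresh=150)
    print("vertical lines found near u =", np.round(lines, 1))
```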
4.3 Third Example: Identifying the ground plane

The inertial data is used to keep track of the gravity vector, allowing the identification of the vertical in the images and the segmentation of the levelled ground plane. By performing visual fixation on a ground plane point, and knowing the 3D vector normal to level ground, we can determine the ground plane. The image can therefore be segmented, and the ground plane along which a mobile robot or vehicle can move identified. For on-the-fly visualisation of the segmented images and the detected ground points a VRML viewer is used.

4.3.1 System geometry

For the ground plane segmentation we used the inertial unit placed at the middle of an active vision system, at the point {Cy}, as seen in figure 9. The head coordinate frame, or Cyclop referential {Cy}, is defined as having its origin at the centre of the baseline between the two stereo cameras. A projection point \vec{p} = (u, v) in each camera image is related to a 3D point \vec{P} = (X, Y, Z) by the perspective relations. If we know \vec{P} = (X, Y, Z), finding the projection (u, v) is trivial. The reverse problem involves matching points between the left and right images. Establishing this correspondence gives us enough equations to determine the 3D coordinates, if a few vision system parameters are known. However, if visual fixation is used the geometry is simplified and the reconstruction of the 3D fixated point becomes straightforward, as can be seen in figure 10. Notice that visual fixation can be achieved by controlling the active vision system, and the geometry generated by the process allows a fast and robust 3D reconstruction of the fixation point (see [9] and [10] for details).

Figure 9: System geometry.

4.3.2 Inclinometer gives the ground plane orientation

The inclinometer data can be used to determine the orientation of the ground plane. In order to locate this plane in space at least one point belonging to the ground plane must be known. When the vehicle is stationary or moving at constant speed the inclinometer gives the direction of \vec{g} relative to the Cyclop referential {Cy}. Assuming the ground is levelled, and with \theta_x and \theta_y being the sensed angles along the x- and y-axis, the normal to the ground plane will be

  \hat{n} = -\frac{\vec{g}}{\|\vec{g}\|} = \frac{1}{\sqrt{1 - \sin^2\theta_x - \sin^2\theta_y}} \begin{bmatrix} -\cos\theta_x \sin\theta_y \\ \cos\theta_y \sin\theta_x \\ \cos\theta_y \cos\theta_x \end{bmatrix}   (14)

given in the Cyclop frame of reference. Using this inertial information, the equation for the ground plane is given by

  \hat{n} \cdot \vec{P} + h = 0   (15)

where \vec{P} is a point in the plane and h is the distance from the origin of {Cy} down to the ground plane.

4.3.3 Estimation of the ground plane

To obtain a point belonging to the ground plane it is necessary to establish a mechanism to achieve visual fixation. This mechanism was developed in our laboratory and is described in [9] and [10]. If the active vision system fixates on a point that belongs to the ground plane, the ground plane can be determined in the Cyclop referential {Cy} using the inclinometer data. Hence any other corresponding point in the image can be identified as belonging or not to the ground plane. If the fixation point is identified as belonging to the ground plane, the value of h in (15) can be determined. As seen in figure 10 (where only \theta_y is non-null to keep the diagram simple), h is given by

  h = -\hat{n} \cdot \vec{P}_f   (16)

When visual fixation is obtained for a ground point and assuming symmetric vergence (i.e. \theta = \theta_R = -\theta_L), from (16) and (14) we have

  h = -\hat{n} \cdot \vec{P}_f = \frac{b \sin\theta_y \cos\theta}{2 \sin\theta}   (17)

as can easily be seen in figure 10 (where \theta_x is null, but (17) still holds for any \theta_x). This value of h is used to determine whether other points in the image belong to the level plane passing through the image centre point (i.e. the fixation point).

Figure 10: Ground plane point fixated. The point \vec{P}_f in the ground plane is visualised by the active vision system. The geometry of this visualisation corresponds to a state named visual fixation.

4.3.4 Image segmentation of the ground plane

An algorithm for the image segmentation of the ground plane can now be presented, based on the solution of (16). To express a world point \vec{P}, given in a camera referential, in the Cyclop referential {Cy} we have

  {}^{Cy}\vec{P} = {}^{Cy}T_{C_R} \, {}^{C_R}\vec{P} = {}^{Cy}T_R \, {}^{R}T_{C_R} \, {}^{C_R}\vec{P}   (18)
  {}^{Cy}\vec{P} = {}^{Cy}T_{C_L} \, {}^{C_L}\vec{P} = {}^{Cy}T_L \, {}^{L}T_{C_L} \, {}^{C_L}\vec{P}   (19)

From (18) the point expressed in the Cyclop referential is given by

  {}^{Cy}\vec{P} = {}^{Cy}T_{C_R} \, {}^{C_R}\vec{P}(u_R, v_R, \lambda_R)   (20)

where \lambda_R represents an unknown value (depending on the depth from the camera). Substituting (20) in (15), \lambda_R can be determined and hence \vec{P} is completely known in the Cyclop referential. Expressing \vec{P} in the {C_L} referential by

  {}^{C_L}\vec{P} = {}^{C_L}T_{Cy} \, {}^{Cy}\vec{P} = ({}^{Cy}T_{C_L})^{-1} \, {}^{Cy}\vec{P}   (21)

the corresponding point of interest (u_L, v_L) generated by the projection of \vec{P} in the left image is given by

  u_L = S_u f X/Z ,  v_L = S_v f Y/Z   (22)

where (X, Y, Z) are the coordinates of {}^{C_L}\vec{P}.

Starting with a point of interest in one of the images, say (u_R, v_R) in the right image, it is thus possible to establish the collineation to the left image point (u_L, v_L). Using h from (17), \lambda_R can be taken from (15) and (20) and substituted in (21) and (22), giving a set of equations that define this collineation for a given set of stereo images. Defining a point in the right image, the corresponding point and its neighbourhood in the left image can then be tested for a match with the original point of interest in the right image. If there is a match, the point belongs to the ground plane. If there is no match, the point must be something other than the floor, possibly an obstacle. Figure 11 shows the matched ground plane points of interest. Graham's algorithm [14] was used to compute the convex polygon enclosing the set of points. Adjusting a convex polygon to the set of points can, however, lead to erroneous ground patch segmentation. A sketch of this ground-plane collineation test is given below.

Figure 11: Identified ground patch.
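The sketch below reproduces this chain of computations for a toy symmetric-vergence stereo head. The frame conventions (cameras on the Cyclop x-axis, optical axes along z, pinhole model u = f X/Z) and all numeric values are assumptions made for the illustration and do not necessarily match the transforms {}^{Cy}T_{C_R} and {}^{Cy}T_{C_L} of the real head; it only shows how a right-image point is mapped to its predicted left-image match under the ground-plane hypothesis.

```python
import numpy as np

def rot_y(a):
    """Rotation about the y-axis by angle a (radians)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

class StereoHead:
    """Toy symmetric-vergence stereo head in the Cyclop frame {Cy}: cameras at
    (+-b/2, 0, 0), each rotated about its own y-axis by the vergence angle so
    that the optical axes meet on the Cyclop z-axis."""

    def __init__(self, baseline, vergence, f):
        self.f, self.b, self.theta = f, baseline, vergence
        self.R_r, self.t_r = rot_y(-vergence), np.array([+baseline / 2, 0.0, 0.0])
        self.R_l, self.t_l = rot_y(+vergence), np.array([-baseline / 2, 0.0, 0.0])

    def fixation_point(self):
        """Intersection of the two optical axes, on the Cyclop z-axis."""
        d = self.b * np.cos(self.theta) / (2.0 * np.sin(self.theta))
        return np.array([0.0, 0.0, d])

    def ground_point_from_right(self, uR, vR, n_hat, h):
        """Back-project the right-image point (uR, vR) and intersect its ray
        with the ground plane n.P + h = 0 of (15); cf. (20)."""
        d = self.R_r @ np.array([uR, vR, self.f])       # ray direction in {Cy}
        lam = -(n_hat @ self.t_r + h) / (n_hat @ d)     # unknown depth factor
        return self.t_r + lam * d

    def project_left(self, P):
        """Predicted left-image point of a 3D point P in {Cy}; cf. (21), (22)."""
        Pc = self.R_l.T @ (P - self.t_l)
        return self.f * Pc[0] / Pc[2], self.f * Pc[1] / Pc[2]

if __name__ == "__main__":
    head = StereoHead(baseline=0.3, vergence=np.radians(8.0), f=500.0)
    pitch = np.radians(20.0)                    # head pitched down towards the ground
    n_hat = np.array([0.0, -np.cos(pitch), -np.sin(pitch)])   # ground normal in {Cy}
    h = -n_hat @ head.fixation_point()          # plane offset, equation (16)
    # Where should a right-image point appear in the left image if it lies on
    # the ground plane?  Compare this prediction with the actual left image.
    P = head.ground_point_from_right(30.0, 40.0, n_hat, h)
    uL, vL = head.project_left(P)
    print(f"predicted left-image match: ({uL:.1f}, {vL:.1f})")
```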
4.3.5 Viewing with VRML

For visualisation of the detected ground points a VRML (Virtual Reality Modelling Language) world is generated [1]. VRML has become the standard for describing 3D scenes on the Internet. A web browser with the appropriate plug-in lets the user see the images from the robot's viewpoint, or change the viewpoint and "fly" around the 3D world viewed by the vision system. The identified ground plane patch can be mapped onto the 3D scene, as seen in figure 12.

Figure 12: VRML view of the ground patch with the detected points.

5 CONCLUSIONS

The integration of inertial sensors with vision opens a new field for the development of applications based on or related to inertial information. In this article we described three examples of integration of inertial information and vision. Inertial sensors coupled to mobile vision systems enable some interesting results. The accelerometer data can provide an artificial horizon, short of a vertical scale factor. This factor can be estimated using image vanishing points or other camera calibration methods. The accelerometer data also gives information about the vertical in the image. Knowing this vertical is very important for extracting relevant image features; in the tests the vertical edges in the image were correctly identified. Future work involves refining the final stage of the vertical edge extraction to include edge tracking and hysteresis. We also intend to use the inertial data in more dynamic ways, by using the gyro data as well as the accelerometers.

Information about the ground plane extracted from the inertial sensors is used to identify the floor plane using visual fixation with an active vision system. By fixating the visual system on a point in the ground plane, the 3D position of the plane in space is obtained. Any other corresponding point in the stereo images can then be identified as belonging or not to the ground plane, and segmentation of the image is therefore accomplished. Some ground detection results were presented. For on-the-fly visualisation of the detected points a VRML world is generated, which can be viewed in any web browser with the appropriate plug-in. VRML opens many other possibilities such as tele-operation or path-planning environments.

Acknowledgements

Financial support for this work was partially provided by FCT - Fundação para a Ciência e Tecnologia, from the Portuguese Government.

References
[1] A. L. Ames, D. R. Nadeau and J. L. Moreland, VRML 2.0 Sourcebook, John Wiley and Sons, 2nd edition, 1997, ISBN 0-471-16507-7.
[2] Jorge Dias, Helder Araujo, Carlos Paredes, Jorge Batista, "Optical Normal Flow Estimation on Log-Polar Images. A Solution for Real-Time Binocular Vision", Real-Time Imaging, Vol. 3, pp. 213-228, 1997.
[3] Rami Guissin, Shimon Ullman, "Direct Computation of the Focus of Expansion From Velocity Field Measurements", IEEE Transactions, 1991, pp. 146-155.
[4] Michal Irani, Benny Rousso and Shmuel Peleg, "Recovery of Ego-Motion Using Image Stabilization", IEEE Transactions, 1994, pp. 454-460.
[5] V. Sundareswaran, P. Bouthemy and F. Chaumette, "Visual servoing using dynamic image parameters", INRIA, RR no. 2336, August 1994.
[6] H. Carpenter, Movements of the Eyes, Pion Limited, London, 2nd edition, 1988, ISBN 0-85086-109-8.
[7] T. Vieville and O. D. Faugeras, "Computation of Inertial Information on a Robot", Fifth International Symposium on Robotics Research, ed. H. Miura, S. Arimoto, pp. 57-65, MIT Press, 1989.
[8] Jorge Lobo, Paulo Lucas, Jorge Dias, A. T. de Almeida, "Inertial Navigation System for Mobile Land Vehicles", Proceedings of the 1995 IEEE International Symposium on Industrial Electronics, July 1995, Athens, Greece, pp. 843-848.
[9] Jorge Dias, Carlos Paredes, Inacio Fonseca, H. Araujo, J. Batista, A. T. de Almeida, "Simulating Pursuit with Machines - Experiments with Robots and Artificial Vision", IEEE Trans. on Robotics and Automation, Feb. 1998, Vol. 14, No. 1, pp. 1-18.
[10] Carlos Paredes, Jorge Dias, A. T. de Almeida, "Detecting Movements Using Fixation", Proceedings of the 2nd Portuguese Conference on Automation and Control, September 1996, Oporto, Portugal, pp. 741-746.
[11] T. Vieville, A Few Steps Towards 3D Active Vision, Springer Series in Information Sciences, Springer Verlag, 1997.
[12] Jorge Lobo and Jorge Dias, "Integration of Inertial Information with Vision towards Robot Autonomy", Proc. of the 1997 IEEE International Symposium on Industrial Electronics, pp. 825-830, Guimaraes, Portugal, July 1997.
[13] S. M. Smith and J. M. Brady, "SUSAN - a new approach to low level image processing", Int. Journal of Computer Vision, 23(1):45-78, May 1997.
[14] Joseph O'Rourke, Computational Geometry in C, Cambridge University Press, 1993, ISBN 0512-22592-2.
[15] FreeBSD Inc., FreeBSD, http://www.freebsd.org.
[16] Charles A. Poynton, A Technical Introduction to Digital Video, John Wiley and Sons, 1996, ISBN 0-471-12253-X.
[17] K. Kanatani, Geometric Computation for Machine Vision, Oxford Science Publications, Oxford, 1993.
[18] R. Collinson, Introduction to Avionics, Chapman & Hall, London, 1996, ISBN 0-412-48250-9.