Polygonal mesh mapping
Louis-Kenzo Cahier, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno
Graduate School of Informatics, Kyoto University
Introduction

In the past decade, time-of-flight cameras – optical cameras that measure the depth of each pixel – have become available. Combined with a traditional color camera, this type of sensor provides Red-Green-Blue-Depth (RGBD) images at high frequencies. With the mass-produced Microsoft Kinect introduced in late 2010, 30 Hz 640x480 RGB images are available for less than $200. This source of information, while not exempt from problems, is what the stereo vision community has been striving to obtain for years.

We believe this new wealth of information, combined with state-of-the-art Simultaneous Localization And Mapping (SLAM) techniques such as visual SLAM [1], sets the stage for a practical revolution in robotics and artificial intelligence. Indeed, we now have at our disposal methods for sensing geometric, visual and auditory information, all spatialized in the same global reference frame and bound together by SLAM. Because those modalities are the dominant forms of information for humans, robots now have all the essential tools for moving in and interacting with the physical world (geometry), recognizing common objects (vision), and communicating with humans (audition).

As a sensor, time-of-flight technology has both advantages and disadvantages. At around 10 frames per second, it can collect 3D data of a scene much faster than a pan-tilt laser scan. This increase in speed is especially significant for scene reconstruction on mobile robots, which cannot wait the 5-10 s required to obtain one 3D scan from a laser. The major drawback of TOF technology is its high frame-to-frame noise, especially off reflective surfaces. It is this unreliability that we attack in this paper, to improve the TOF sensor's utility for robotics, vision, and other fields.

In the following sections, we describe a new method to transform TOF data into a probabilistically smoothed polygonal mesh. We first explain how the robot internally represents its surroundings as a mesh. We then describe a method for maintaining this mesh in an online manner, through triangulation of point estimates and convergence to stable values through maximum likelihood estimation.
1 Scene representation

The scene is represented internally by the robot as a polygonal mesh, possibly having several connected components. The mesh is defined as a pair M = (V, T), where V is a set of vertices and T a set of triangles. Each vertex is simply a point in R^3, and each triangle is a triple (x_i, y_i, z_i) ∈ (N*)^3 of indices to vertices in V.

The triangular nature of the mesh representation is not essential; the essential properties of our scene representation are that it represents the environment's geometric surface, persists through time, is built incrementally, and can be arbitrarily simplified or complexified as attention is focused or drifts away. This work focuses on the means to construct and maintain a triangular mesh representation efficiently from high-frequency depth data as typically returned by time-of-flight cameras, ignoring – but preparing for – object segmentation and more synthetic properties such as generic descriptors of shape, visual appearance, auditory emissions, articulated degree-of-freedom models, etc.
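To make the representation concrete, the following is a minimal C++ sketch of one way such a mesh could be laid out in memory. The type and field names are our own, not taken from the paper, and the per-vertex effective number of observations Γ introduced later in Section 2 is included here for completeness.

```cpp
// Minimal sketch (not the authors' code) of a mesh data structure matching
// the definition M = (V, T): vertices are points in R^3, triangles are index
// triples into the vertex array. The per-vertex "effective number of
// observations" Gamma used for maximum-likelihood updates is included.
#include <array>
#include <vector>

struct Vertex {
    std::array<double, 3> position;  // point in R^3
    double gamma = 1.0;              // effective number of observations, set to 1 on creation
};

struct Triangle {
    std::array<std::size_t, 3> vertices;  // indices into Mesh::vertices
};

struct Mesh {
    std::vector<Vertex>   vertices;   // V
    std::vector<Triangle> triangles;  // T
};
```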
2 Probabilistic Polygonal Mesh

Let us suppose that at time step t we have a 3D mesh representing part of the environment of the robot. A new depth scan S_t ∈ M_n,p(R^3) is acquired. The points in S_t are all sensed as the result of light bouncing off some surface and returning through the optical center O of the sensor. This type of data is more structured and richer than general point clouds, for which we do not have the ray along which light for the sensed pixel passed. In particular, we know that the normal of the real surface at each sensed point makes an angle of less than 90° with the ray of light that bounced off it towards the sensor.

For each element S_t(x, y) of S_t, a virtual ray is cast through it from the position of the sensor's optical center, which may give rise to an intersection point I_t(x, y) with the existing mesh.

Incremental triangulation
We then consider all triangles of the forms S_t(x, y) S_t(x, y+1) S_t(x+1, y) or S_t(x+1, y) S_t(x, y+1) S_t(x+1, y+1); any such triangle is added to the mesh M if none of its vertices has an intersection found in the ray-casting step and it satisfies a continuity criterion. This continuity criterion aims at separating estimates that are neighboring in pixel space but whose observed points are in fact not neighbors on the same surface. This happens for example when observing two objects roughly aligned with the sensor, creating an overlap.

Moreover, neighboring estimates are sensed through two projection lines that cross in O but get farther apart as the distance from the sensor increases, making neighboring estimates of a given surface grow apart as the surface gets farther away. A simple application of Thales' intercept theorem shows that the distance between estimates is proportional to the distance of the object. We account for this by computing a continuity factor for each edge of the candidate triangles, (d_max − d_min)/d_min, which is proportional to the drop in distance between the estimates but inversely proportional to the distance of the object. We empirically set a maximum threshold of 2.5% on the continuity factor of the edges of potential triangles.

This procedure incrementally builds a mesh from an empty start, creating new connected components when parts of the environment are newly observed. However, estimates whose ray intersected the existing mesh are not used during the triangulation step.
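As an illustration, here is a small C++ sketch of the continuity test described above and of the acceptance decision for a candidate triangle; the function signatures and the placement of the 2.5% threshold are our assumptions rather than the authors' implementation.

```cpp
// Sketch (our assumption, not the authors' code) of the continuity criterion
// used to accept or reject a candidate triangle between pixel-neighboring
// estimates of one depth scan.
#include <algorithm>

constexpr double kMaxContinuityFactor = 0.025;  // empirical 2.5% threshold from the paper

// Continuity factor (d_max - d_min) / d_min for the edge between two estimates
// whose distances to the optical center O are d_a and d_b.
inline bool edgeIsContinuous(double d_a, double d_b) {
    const double d_min = std::min(d_a, d_b);
    const double d_max = std::max(d_a, d_b);
    return (d_max - d_min) / d_min <= kMaxContinuityFactor;
}

// A candidate triangle is added to the mesh only if none of its three vertices'
// rays intersected the existing mesh (hit0..hit2) and all three of its edges
// satisfy the continuity criterion (d0..d2 are the vertices' sensed distances).
inline bool acceptTriangle(double d0, double d1, double d2,
                           bool hit0, bool hit1, bool hit2) {
    if (hit0 || hit1 || hit2) return false;
    return edgeIsContinuous(d0, d1) &&
           edgeIsContinuous(d1, d2) &&
           edgeIsContinuous(d0, d2);
}
```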
Figure 1: Example comparing directly triangulated data (above) with a PPM built over several scans (below).

Figure 2: Experimental setup, with robot facing white wall.
Online vertex estimation
Each estimate S_t(x, y) with an intersection point is a candidate for updating the position of the intersected face's vertices. For each such face vertex V, we compute the angle α = ∠V O S_t(x, y), the angular gain γ = 1 − min(α/ρ, 1), where ρ is the angular resolution of the sensor (here 0.24°), and the innovation vector I = (||O S_t(x, y)|| − ||OV||) · OV/||OV||, which serves as the error signal. γ is called a gain because it is a number between 0 and 1 used to modulate how much an estimate will be incorporated as a new estimate for a vertex. This is because the value of each pixel does not only measure the distance along a line that passes through the center of the pixel, but rather integrates the distances of points viewable from the optical center through the extent of the pixel. We notice that if α > ρ then γ = 0, which means that the estimate's distance integration area does not contain V at all and the influence is null; conversely, if α = 0 then γ = 1 and the estimate is fully incorporated. The innovation vector is collinear with OV, and its magnitude equals the difference in measured depth.

One possibility for incorporating the estimate S_t(x, y) is to move the candidate vertex with the lowest α using the innovation vector modulated by the angular gain: V ← V + γI.
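The per-estimate computation can be sketched as follows in C++; the vector helpers, the constant names, and the assumption that positions are expressed relative to the optical center O are ours, not the paper's.

```cpp
// Sketch (assumed names, not the authors' code) of the angular gain,
// innovation vector, and simple gain-modulated vertex update.
#include <algorithm>
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;

constexpr double kAngularResolution = 0.24 * 3.14159265358979323846 / 180.0;  // rho, in radians

inline double norm(const Vec3& v) {
    return std::sqrt(v[0]*v[0] + v[1]*v[1] + v[2]*v[2]);
}
inline double dot(const Vec3& a, const Vec3& b) {
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2];
}

// Angle alpha = angle(V, O, S_t(x, y)) seen from the optical center O.
// Both v and s are expressed relative to O.
inline double angleAtO(const Vec3& v, const Vec3& s) {
    return std::acos(std::clamp(dot(v, s) / (norm(v) * norm(s)), -1.0, 1.0));
}

// Angular gain gamma = 1 - min(alpha / rho, 1).
inline double angularGain(double alpha) {
    return 1.0 - std::min(alpha / kAngularResolution, 1.0);
}

// Simple update V <- V + gamma * I, where the innovation vector I is collinear
// with OV and its magnitude is the difference in measured depth.
inline Vec3 updateVertex(const Vec3& v, const Vec3& s) {
    const double gamma = angularGain(angleAtO(v, s));
    const double scale = gamma * (norm(s) - norm(v)) / norm(v);
    return { v[0] + scale * v[0], v[1] + scale * v[1], v[2] + scale * v[2] };
}
```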
Maximum Likelihood vertex estimation
However, this simple update would not guarantee convergence even for a zero-mean noise sensor. Rather, we propose using a maximum likelihood estimation from successive estimates, for each vertex. We modify the mesh data structure to add a real number to each vertex, which we call the effective number of observations Γ; when a vertex is first added as part of a new triangle, Γ = 1.

Looking at the lifecycle of a vertex that receives N innovation vectors I_1, ..., I_N with angular gains γ_1, ..., γ_N, we now incorporate new measurements as Γ_n = Γ_{n-1} + γ_n and V_n = (Γ_{n-1} V_{n-1} + γ_n (V_{n-1} + I_n)) / Γ_n, which is the online weighted arithmetic average of the (V_{i-1} + I_i), i ∈ {1, ..., n}, for the vertex, i.e. the maximum likelihood estimate.
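A minimal C++ sketch of this running weighted average, with names of our own choosing:

```cpp
// Sketch (assumed, not the authors' code) of the online maximum-likelihood
// vertex update: a running weighted average of the corrected positions
// V_{n-1} + I_n, weighted by the angular gains gamma_n.
#include <array>

struct EstimatedVertex {
    std::array<double, 3> position;      // V_n
    double effectiveObservations = 1.0;  // Gamma, set to 1 when the vertex is created
};

// Incorporate one new measurement with gain gamma and innovation vector I:
//   Gamma_n = Gamma_{n-1} + gamma
//   V_n     = (Gamma_{n-1} * V_{n-1} + gamma * (V_{n-1} + I)) / Gamma_n
inline void incorporate(EstimatedVertex& v, const std::array<double, 3>& innovation, double gamma) {
    const double previousGamma = v.effectiveObservations;
    v.effectiveObservations += gamma;
    for (int i = 0; i < 3; ++i) {
        v.position[i] = (previousGamma * v.position[i] +
                         gamma * (v.position[i] + innovation[i])) / v.effectiveObservations;
    }
}
```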
3 Real-time Implementation
We implemented our PPM algorithm on our custom-made mobile robot named Kappa, which is equipped with a Swiss Ranger TOF camera as its distance sensor. The code is written as a C++ ROS node for portability to other systems. Mesh processing was performed on an off-board computer with two 2.4 GHz quad-core Intel Xeon processors, although no parallel processing was used, and in particular no multi-core optimization.

4 Evaluation

Experimental setup and results
In our experiment, we placed a white wall approximately 1.3 m away from the robot and faced the TOF camera towards it such that the wall filled the sensor's field of view entirely. The robot remained still during the entire experiment. Our goal here is to see how well the raw data represents a relatively flat surface, compared with our PPM. We recorded 200 frames of TOF camera images and tested the two methods as follows: 1) extract the mesh generated from the raw point cloud data or the PPM; 2) fit a plane to the vertices of the mesh using linear least squares; 3) extract the correlation coefficient between the mesh and the plane. The results are plotted in Fig. 3. The higher the coefficient, the better the mesh fits the plane. Note the quick increase and stability of the PPM correlation compared with the lower and more erratic precision of the raw point clouds.

Figure 3: Comparison of plane reconstruction performance between PPM (blue) and raw data (red) over 200 frames.
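The paper does not spell out how the correlation coefficient is computed; one plausible reading, sketched below in C++, fits z = ax + by + c to the vertices by linear least squares (normal equations solved with Cramer's rule) and reports the Pearson correlation between measured and fitted depths. All function names are ours.

```cpp
// Sketch (our interpretation, not the authors' evaluation code) of steps 2 and 3:
// fit a plane z = a*x + b*y + c to the mesh vertices by linear least squares,
// then compute the Pearson correlation between measured and fitted depths.
#include <array>
#include <cmath>
#include <vector>

using Vec3 = std::array<double, 3>;

static double det3(const double m[3][3]) {
    return m[0][0]*(m[1][1]*m[2][2] - m[1][2]*m[2][1])
         - m[0][1]*(m[1][0]*m[2][2] - m[1][2]*m[2][0])
         + m[0][2]*(m[1][0]*m[2][1] - m[1][1]*m[2][0]);
}

// Solve the 3x3 normal equations for (a, b, c) with Cramer's rule.
std::array<double, 3> fitPlane(const std::vector<Vec3>& pts) {
    double sxx = 0, sxy = 0, syy = 0, sx = 0, sy = 0, sxz = 0, syz = 0, sz = 0;
    for (const auto& p : pts) {
        sxx += p[0]*p[0]; sxy += p[0]*p[1]; syy += p[1]*p[1];
        sx  += p[0];      sy  += p[1];      sz  += p[2];
        sxz += p[0]*p[2]; syz += p[1]*p[2];
    }
    const double n = static_cast<double>(pts.size());
    const double A [3][3] = {{sxx, sxy, sx},  {sxy, syy, sy},  {sx, sy, n}};
    const double Aa[3][3] = {{sxz, sxy, sx},  {syz, syy, sy},  {sz, sy, n}};
    const double Ab[3][3] = {{sxx, sxz, sx},  {sxy, syz, sy},  {sx, sz, n}};
    const double Ac[3][3] = {{sxx, sxy, sxz}, {sxy, syy, syz}, {sx, sy, sz}};
    const double d = det3(A);
    return {det3(Aa)/d, det3(Ab)/d, det3(Ac)/d};
}

// Pearson correlation between the vertices' z values and the fitted plane's z values.
double planeCorrelation(const std::vector<Vec3>& pts) {
    const auto [a, b, c] = fitPlane(pts);
    double mz = 0, mf = 0;
    for (const auto& p : pts) { mz += p[2]; mf += a*p[0] + b*p[1] + c; }
    mz /= pts.size(); mf /= pts.size();
    double cov = 0, vz = 0, vf = 0;
    for (const auto& p : pts) {
        const double dz = p[2] - mz, df = a*p[0] + b*p[1] + c - mf;
        cov += dz*df; vz += dz*dz; vf += df*df;
    }
    return cov / std::sqrt(vz * vf);
}
```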
5 Conclusions and future work
We have presented a probabilistic method to generate stable 3D data from a time-of-flight camera, using a mesh representation. Our future work includes the integration of new meshes, such that the mobile robot can connect its current representation of the world to any new data it sees [2] [3].

This research was supported in part by Global Centers Of Excellence (GCOE).

References
[1] N. Karlsson, E. di Bernardo, J. Ostrowski, L. Goncalves, P. Pirjanian, and M. Munich, "The vSLAM Algorithm for Robust Localization and Mapping," ICRA, pp. 24–29, 2005.
[2] Z. C. Marton, R. B. Rusu, and M. Beetz, "On fast surface reconstruction methods for large and noisy point clouds," ICRA, pp. 3218–3223, May 2009.
[3] R. Rusu, Z. Marton, N. Blodow, A. Holzbach, and M. Beetz, "Model-based and learned semantic object labeling in 3D point cloud maps of kitchen environments," IROS, pp. 3601–3608, 2009.