
The 73rd National Convention of the Information Processing Society of Japan (IPSJ)

1T-5

Time-of-flight camera based probabilistic polygonal mesh mapping
Louis-Kenzo Cahier, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno
Graduate School of Informatics, Kyoto University

Introduction

In the past decade, time-of-flight cameras – optical cameras that measure the depth of each pixel – have become available. Combined with a traditional color camera, this type of sensor provides Red-Green-Blue-Depth (RGBD) images at high frequencies. With the mass-produced Microsoft Kinect introduced in late 2010, 30 Hz 640 × 480 RGB images are available for less than $200. This source of information, while not exempt from problems, is what the stereo vision community has been striving to obtain for years.

We believe this new wealth of information, combined with state-of-the-art Simultaneous Localization And Mapping (SLAM) techniques such as visual SLAM [1], sets the stage for a practical revolution in robotics and artificial intelligence. Indeed, we now have at our disposal methods for sensing geometric, visual and auditory information, all spatialized in the same global reference frame and bound together by SLAM. Because those modalities are the dominant forms of information for humans, robots now have all the essential tools for moving in and interacting with the physical world (geometry), recognizing common objects (vision), and communicating with humans (audition).

As a sensor itself, time-of-flight technology has several advantages and disadvantages. At around 10 frames per second, it is possible to collect 3D data of a scene much faster than with a pan-tilt laser scan. This increase in speed is especially significant for scene reconstruction on mobile robots, which cannot wait the 5-10 s required to obtain one 3D scan from a laser. The major drawback of TOF technology is its high frame-to-frame noise, especially off reflective surfaces. It is this unreliability that we attack in this paper, to improve the TOF sensor's utility for robotics, vision, and other fields.

In the following sections, we describe a new method to transform TOF data into a probabilistically smoothed polygonal mesh. We first explain how the robot can internally represent its surroundings as a mesh. Then we describe a method for maintaining this mesh in an online manner, through triangulation of point estimates and convergence to stable values through maximum likelihood estimation.

1 Scene representation

The scene is represented internally by the robot as a polygonal mesh possibly having several connected components. The mesh is defined as a pair M = (V, T), where V is a set of vertices and T a set of triangles. Each vertex is simply a point in R3, and each triangle is a triple (xi, yi, zi) ∈ (N∗)3 of indices to vertices in V.

The triangular nature of the mesh representation is not essential in itself; the essential properties of our scene representation are that it represents the environment's geometric surface, persists through time, is built incrementally, and can be arbitrarily simplified or complexified as attention is focused or drifts away. This work focuses on the means to construct and maintain a triangular mesh representation efficiently from high-frequency depth data, as typically returned by time-of-flight cameras, ignoring – but preparing for – object segmentation and more synthetic properties such as generic descriptors of shape, visual appearance, auditory emissions, articulated degree-of-freedom models, etc.
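The paper does not include code for this data structure; the following minimal C++ sketch (ours, with illustrative names, not the authors' implementation) shows one way the pair M = (V, T) could be laid out, with the per-vertex effective number of observations Γ introduced in Section 2 stored alongside each vertex.

    #include <array>
    #include <cstddef>
    #include <vector>

    // One possible encoding of the scene mesh M = (V, T): a set of vertices and a
    // set of triangles given as index triples into the vertex set.
    struct Vertex {
        std::array<double, 3> position;       // a point in R^3
        double effective_observations = 1.0;  // Γ, set to 1 when the vertex is created
    };

    struct Triangle {
        std::array<std::size_t, 3> vertices;  // (x_i, y_i, z_i): indices into Mesh::vertices
    };

    struct Mesh {
        std::vector<Vertex> vertices;     // V
        std::vector<Triangle> triangles;  // T; several connected components are allowed
    };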
2 Probabilistic Polygonal Mesh

Let us suppose that at time step t we have a 3D mesh representing part of the environment of the robot. A new depth scan St ∈ Mn,p(R3) is acquired. The points in St are all sensed as the result of light bouncing off some surface and returning through the optical center O of the sensor. This type of data is more structured and richer than general point clouds, for which we do not have the ray along which the light for the sensed pixel passed. In particular, we know that the normal of the real surface at each sensed point makes an angle of less than 90◦ with the ray of light that bounced off it towards the sensor.

For each element St(x, y) of St, a virtual ray is cast through it from the position of the sensor's optical center, which may give rise to an intersection point It(x, y) with the existing mesh.

Incremental triangulation   We then consider all triangles of the form St(x, y) St(x, y+1) St(x+1, y) or St(x+1, y) St(x, y+1) St(x+1, y+1); any such triangle is added to the mesh M if none of its vertices have an intersection found in the ray-casting step and it satisfies a continuity criterion. The continuity criterion aims at separating estimates that are neighboring in pixel space but whose observed points are in fact not neighbors on the same surface. This happens, for example, when observing two objects roughly aligned with the sensor, creating an overlap.

Moreover, neighboring estimates are sensed through two projection lines that cross in O but get farther apart as the distance from the sensor increases, making neighboring estimates of a given surface grow apart as the surface gets farther away. A simple application of Thales' intercept theorem shows that the distance between estimates is proportional to the distance of the object. We account for this by computing a continuity factor for each edge of a candidate triangle, (dmax − dmin)/dmin, which is proportional to the drop in distance between the estimates but inversely proportional to the distance of the object. We empirically set a maximum threshold of 2.5% for the continuity factor of the edges of potential triangles.

This procedure incrementally builds a mesh from an empty start, creating new connected components when parts of the environment are newly observed. Estimates whose ray intersected the existing mesh, however, are not used during the triangulation step.
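A minimal sketch (not from the paper) of how the continuity criterion above could be checked for a candidate triangle; the (dmax − dmin)/dmin factor and the 2.5% threshold follow the text, while the function names and the use of per-estimate distances to O are our own illustration.

    #include <algorithm>

    // Continuity factor of an edge between two neighboring estimates, using their
    // distances to the optical center O: (d_max - d_min) / d_min, compared against
    // the empirical 2.5% threshold quoted in the text.
    bool edge_is_continuous(double distance_a, double distance_b,
                            double max_continuity_factor = 0.025) {
        const double d_min = std::min(distance_a, distance_b);
        const double d_max = std::max(distance_a, distance_b);
        if (d_min <= 0.0) return false;  // guard against missing or invalid depth
        return (d_max - d_min) / d_min <= max_continuity_factor;
    }

    // A candidate triangle built from three neighboring estimates is kept only if
    // every one of its edges satisfies the continuity criterion (the ray-casting
    // test against the existing mesh is handled separately, as described above).
    bool triangle_is_continuous(double d0, double d1, double d2) {
        return edge_is_continuous(d0, d1) &&
               edge_is_continuous(d1, d2) &&
               edge_is_continuous(d0, d2);
    }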
Online vertex estimation   Each estimate St(x, y) with an intersection point is a candidate for updating the position of the intersected face's vertices. For each such face vertex V, we compute the angle α = ∠V O St(x, y), the angular gain γ = 1 − min(α/ρ, 1), where ρ is the angular resolution of the sensor (here 0.24◦), and the innovation vector ~I = (‖OSt(x, y)‖ − ‖OV‖) −→OV/‖OV‖, which serves as the error signal.

γ is called a gain because it is a number between 0 and 1 used to modulate how much an estimate will be incorporated as a new estimate for a vertex. This is because the value of each pixel does not only measure the distance along a line that passes through the center of the pixel, but rather integrates the distances of the points viewable from the optical center through the extent of the pixel. We notice that if α > ρ then γ = 0, which means that the estimate's distance integration area does not contain V at all and its influence is null; conversely, if α = 0 then γ = 1 and the estimate is fully incorporated. The innovation vector is colinear to −→OV, and its magnitude is equal to the difference in measured depth.

One possibility for incorporating the estimate St(x, y) is to move the candidate vertex with the lowest α using the innovation vector modulated by the angular gain: V = V + γ~I.

Maximum Likelihood vertex estimation   However, this would not guarantee convergence even for a zero-mean noise sensor. Rather, we propose using a maximum likelihood estimate built from the successive estimates of each vertex. We modify the mesh data structure to add a real number to each vertex, which we call the effective number of observations Γ; when a vertex is first added as part of a new triangle, Γ = 1.

Looking at the lifecycle of a vertex that receives N innovation vectors ~I1, ..., ~IN with angular gains γ1, ..., γN, we now incorporate new measurements as Γn = Γn−1 + γn and Vn = (Γn−1 Vn−1 + γn (Vn−1 + ~In)) / Γn, which is the online weighted arithmetic average of the (Vi−1 + ~Ii) for i ∈ {1, ..., n}, i.e. the maximum likelihood estimate for the vertex.
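As an illustration only (not the authors' ROS node), the following C++ sketch combines the angular gain, the innovation vector and the maximum likelihood update above. It reuses the hypothetical Vertex type from the earlier sketch and assumes all points are expressed in the sensor frame, with the optical center O at the origin and the angle α already computed by the caller.

    #include <algorithm>
    #include <array>
    #include <cmath>

    using Vec3 = std::array<double, 3>;

    static double norm(const Vec3& v) {
        return std::sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
    }

    // Incorporate one depth estimate s = S_t(x, y) into an intersected face vertex.
    // `alpha` is the angle between OV and O S_t(x, y); `rho` is the sensor's angular
    // resolution (0.24 degrees here), expressed in the same unit as `alpha`.
    void incorporate_estimate(Vertex& vertex, const Vec3& s, double alpha, double rho) {
        // Angular gain γ = 1 − min(α/ρ, 1): 0 when the pixel's integration area
        // does not contain V, 1 when the two rays coincide.
        const double gamma = 1.0 - std::min(alpha / rho, 1.0);
        if (gamma <= 0.0) return;

        // Innovation vector ~I: colinear with OV, with magnitude equal to the
        // difference between the measured depth and the current vertex depth.
        const Vec3 v = vertex.position;
        const double depth_v = norm(v);
        const double depth_s = norm(s);
        if (depth_v <= 0.0) return;
        const double scale = (depth_s - depth_v) / depth_v;
        const Vec3 innovation = {v[0] * scale, v[1] * scale, v[2] * scale};

        // Maximum likelihood update: online weighted average of V_{n-1} + ~I_n,
        // with the effective number of observations Γ_n = Γ_{n-1} + γ_n.
        const double gamma_prev = vertex.effective_observations;
        const double gamma_new = gamma_prev + gamma;
        for (int i = 0; i < 3; ++i) {
            const double target = v[i] + innovation[i];
            vertex.position[i] = (gamma_prev * v[i] + gamma * target) / gamma_new;
        }
        vertex.effective_observations = gamma_new;
    }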
3 Real-time Implementation

We implemented our PPM algorithm on our custom-made mobile robot named Kappa, which is equipped with a Swiss Ranger TOF camera as a distance sensor. The code is written as a C++ ROS node for portability to other systems. Mesh processing was performed on an off-board computer with two 2.4 GHz Intel Xeon quad-core processors, though no parallel processing was used, and in particular no multi-core optimization.

4 Evaluation

[Figure 1: Example comparing directly triangulated data (above) with a PPM built over several scans (below).]
[Figure 2: Experimental setup, with robot facing white wall.]

Experimental setup and results   In our experiment, we placed a white wall approximately 1.3 m away from the robot and faced the TOF camera towards it such that the wall filled the sensor's field of view entirely. The robot remained still during the entire experiment. Our goal here is to measure how well the raw data represents a relatively flat surface, versus our PPM. We recorded 200 frames of TOF camera images and tested the two methods as follows: 1) extract the mesh generated from the raw point cloud data or from the PPM; 2) fit a plane to the vertices of the mesh using linear least squares; 3) extract the correlation coefficient between the mesh and the plane. The results are plotted in Fig. 3. The higher the coefficient, the better the mesh fits the plane. Note the quick increase and stability of the PPM correlation compared with the lower and more erratic precision of the raw point clouds.

[Figure 3: Comparison of plane reconstruction performance between PPM (blue) and raw data (red) over 200 frames.]
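The paper does not give the plane-fitting code; the following is a possible sketch of steps 2) and 3), assuming the Eigen library, fitting the plane z = ax + by + c by linear least squares, and reporting a coefficient of determination as one way to realize the correlation coefficient mentioned above.

    #include <Eigen/Dense>
    #include <array>
    #include <vector>

    // Fit the plane z = a*x + b*y + c to the mesh vertices by linear least squares
    // and return a goodness-of-fit score in [0, 1] (1 = perfectly planar mesh).
    double plane_fit_score(const std::vector<std::array<double, 3>>& vertices) {
        const int n = static_cast<int>(vertices.size());
        if (n < 3) return 0.0;  // not enough vertices to define a plane
        Eigen::MatrixXd A(n, 3);
        Eigen::VectorXd z(n);
        for (int i = 0; i < n; ++i) {
            A(i, 0) = vertices[i][0];
            A(i, 1) = vertices[i][1];
            A(i, 2) = 1.0;
            z(i) = vertices[i][2];
        }
        const Eigen::Vector3d coeffs = A.colPivHouseholderQr().solve(z);
        const double ss_res = (z - A * coeffs).squaredNorm();
        const double ss_tot = (z.array() - z.mean()).matrix().squaredNorm();
        return ss_tot > 0.0 ? 1.0 - ss_res / ss_tot : 1.0;
    }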
5 Conclusions and future work

We have presented a probabilistic method to generate stable 3D data from a time-of-flight camera, using a mesh representation. Our future work includes the integration of new meshes, such that the mobile robot can connect its current representation of the world to any new data it sees [2][3].

This research was supported in part by the Global Centers Of Excellence (GCOE) program.

References
[1] N. Karlsson, E. di Bernardo, J. Ostrowski, L. Goncalves, P. Pirjanian, and M. Munich, "The vSLAM Algorithm for Robust Localization and Mapping," in ICRA, pp. 24–29, 2005.
[2] Z. C. Marton, R. B. Rusu, and M. Beetz, "On fast surface reconstruction methods for large and noisy point clouds," in ICRA, pp. 3218–3223, May 2009.
[3] R. Rusu, Z. Marton, N. Blodow, A. Holzbach, and M. Beetz, "Model-based and learned semantic object labeling in 3D point cloud maps of kitchen environments," in IROS, pp. 3601–3608, IEEE, 2009.

Copyright 2011 Information Processing Society of Japan. All Rights Reserved.