
LPG-SLAM: a Light-weight Probabilistic Graph-based SLAM

2020, HAL (Le Centre pour la Communication Scientifique Directe)


Kathia Melbouci and Fawzi Nashashibi

2020 16th International Conference on Control, Automation, Robotics and Vision (ICARCV), December 13-15, 2020, Shenzhen / Virtual, China.
HAL Id: hal-03081646, https://hal.inria.fr/hal-03081646, submitted on 18 Dec 2020.

Abstract— Most current autonomous navigation solutions critically rely on SLAM systems for localisation, especially in GPS-denied environments but also in urban and indoor environments. Their efficacy and efficiency thus depend on the ability of the underlying SLAM method to map large-scale environments in a data-efficient manner. State-of-the-art systems, while accurate, often require powerful hardware such as GPUs, and need careful tuning of hyperparameters in order to adapt to the user's needs. In this paper, we propose a light-weight but accurate probabilistic 2D graph-based SLAM system. We validate our approach on sequences from the KITTI dataset as well as on data gathered by our experimental platform.

Index Terms— SLAM, mapping, 2D LIDAR, graph-based optimization, loop closure.

I. INTRODUCTION

Simultaneous Localization And Mapping (SLAM) is critical in urban environments, especially in the case of a degraded or missing GPS signal. In the past decades, significant progress has been made towards that goal. State-of-the-art solutions rely on the fusion of complementary exteroceptive (e.g. LIDAR, camera) and proprioceptive (e.g. IMU, odometry) sensors. While sensor graphs that use passive vision as their primary source of geometric information about the environment (e.g. [1], [2], [3]) are inexpensive, they often fail in poorly textured environments or in scenes with challenging lighting conditions, and overall have inferior accuracy compared to methods that use active sensors such as laser scanners. Indeed, the dense and fine-grained geometric information that such sensors provide results in less uncertainty in depth values and in robustness to challenging environmental changes. The accuracy and robustness of solutions exploiting 3D sensors (e.g. [4], [5], [6]), however, come at the cost of increased computational complexity, which makes them difficult to deploy on low-end hardware.

In this work, we present a real-time method that exploits 3D LIDAR data while being much less demanding in terms of computation and retaining state-of-the-art accuracy. In particular, we show that in situations where the main goal is accurate localization, a 2D occupancy grid computed from 3D data is sufficient. Furthermore, our method has fewer hyper-parameters and requires less fine-tuning when deployed in different situations. Our method is closely related to the work of [7], as both use the same probabilistic framework. However, their solution is less accurate, as it uses minimal map-matching and does not take advantage of loop closure. We validate our approach on sequences from the KITTI dataset as well as on data gathered using our experimental apparatus, described in section IV.

The next section relates our work to the state of the art. We detail the proposed system in section III. Quantitative and qualitative results are presented and discussed in section IV, and closing remarks and future directions are given in section V.

(This work has been supported by the French project CAMPUS, a PIA project funded by the French ADEME, the Agence de l'Environnement et de la Maîtrise de l'Energie. Both authors are with the Robotics and Intelligent Transportation Systems (RITS) team, Inria Paris, 56 rue du Charolais, Paris 75012, France.)

II. RELATED WORK

In this section, we review advances that have been made in the graph-based approach to SLAM [8] in large-scale urban environments based on LIDAR data, and position ourselves with respect to recent works.

One of the first graph-based SLAM approaches targeting large-scale and unstructured environments is described in [9]. The authors present a framework for view-based SLAM using 2D laser range measurements. They construct local maps from sequences of aligned scans using a robust iterative scan-matching technique incorporated into an extended Kalman filter. To enable the detection of loop closures without depending on prior knowledge, they perform map matching via histogram cross-correlation and entropy sequence projections.
Adding a global optimization step to the Kalman filtering process increases the computational complexity, which makes the approach unfit for high-frequency applications. Exploiting the local-map idea mentioned above, [10] propose a 2D real-time graph-based LIDAR SLAM which operates on a filtered point cloud. Pairs of local maps and associated scans are stored in memory, as well as the position of each scan. At fixed time intervals, the scan matcher tries to find the position of the current scan in the stored local maps by correlative scan matching [11], and adds it as an edge in the graph. Subsequently, the optimizer estimates the best locations for the scans and the associated local maps, which reduces to a classical nonlinear least-squares problem. In order to achieve real-time results on modest hardware, this process follows the sparse pose adjustment method described in [12], implemented in a separate thread. However, the size of each local map constructed by the front-end grows with the number of scans that are inserted into it. Larger numbers of scans in a map lead to more accurate scan-matching, but increase the time that is necessary to detect loop closures. Here, we make use of their proposed background loop detection process. In contrast to their work, however, the aforementioned trade-off between accuracy and efficiency resulting from map size is handled dynamically by our system.

In [4], the authors extend the ideas described above to 3D data. Local maps are encoded by surfels [13] associated with a rendered vertex and normal map.
Thus the detection of loop closures is no longer done by correlative scan matching but via frame-to-model ICP [14]. Each observed scan is aligned with the known map by minimizing a point-to-plane error with respect to the rendered vertex map. A loop closure is included in the optimization step if the observed scan is consistent with the newly rendered map at the predicted position. While graph optimization is performed on a separate thread, the method still suffers from high time complexity, as all 3D data are processed.

In order to better approximate the real world and reject incorrect loop closures, different works have focused on data association strategies. In [15], the posterior distribution over the robot's location is modeled as a max-mixture of Gaussians. While their approach is useful when dealing with perceptual aliasing and multiple loop closure hypotheses, we found that an M-estimator produced satisfactory results in our experiments. Since data association is a major source of error accumulation and drift, some works have mainly focused on improving the scan-matching process ([15], [16]). For example, IMLS-SLAM [6] forgoes the use of a back-end to concentrate on scan-to-model matching, which results in detailed and accurate maps. However, their method is very costly and does not run in real time.

In this work we present a complete solution for graph-based probabilistic SLAM using 2D occupancy grids which operates in real time thanks to an efficient map-management process, and which dynamically handles the trade-off between accuracy and efficiency. In particular, our experimental results demonstrate that accurate localization is possible using 2D occupancy grids, which is far less expensive than processing the entire 3D point cloud.

III. SYSTEM OVERVIEW

An overview of our system is given in figure 1. The input consists of LIDAR scan data, which can be an accumulation of 2D scans or one revolution of a 3D Velodyne LIDAR. We preprocess the data in order to remove noise, dynamic objects and the road plane. Subsequently, we estimate the most likely current pose via multi-resolution map matching and then construct local submaps. The submaps are subdivided and stored in a memory-efficient manner. They are later retrieved by the optimizer which, as part of our back-end, is responsible for large-scale corrections. In the following, we detail the main components of our SLAM.

Fig. 1. An overview of our system's main components and their interactions.

A. Map construction

We represent the environment with an occupancy grid map m which, in order to filter out noise and dynamic objects, is estimated via a reflection map [17]. Given a set of vehicle poses x_{0:t} and a set of sensor measurements z_{1:t}, the map is estimated by solving

    m* = argmax_m P(m | z_1, ..., z_t, x_0, ..., x_t)    (1)

Assuming a uniform prior on the map and following Bayes' rule, we obtain

    m* = argmax_m P(z_1, ..., z_t | m, x_1, ..., x_t)
       = argmax_m ∏_{t=1}^{N} P(z_t | m, x_t)
       = argmax_m ∑_{t=1}^{N} ln P(z_t | m, x_t)    (2)

whose numerical solution is given by counting the number of times the laser scan hits and misses each cell [17]. For real-time map reconstruction, we need to sample the input point cloud, since the number of points provided by recent 3D sensors (e.g. Velodyne [18]) can reach ∼ 200,000 points per revolution, which makes processing the entire point cloud in real time infeasible on modest hardware. We adopt the adaptive point cloud sampling described in [5].
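As a concrete reading of the counting solution of equation (2), the sketch below shows a minimal reflection-map update in C++. It is our illustration, not the authors' code: the grid layout, the ray-traversal interface and all identifiers are assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal reflection-map sketch (our assumption of the data structure):
// each cell stores hit and miss counts; occupancy is the hit ratio [17].
struct ReflectionMap {
    int width, height;
    double resolution;                      // meters per cell (assumed)
    std::vector<std::uint32_t> hits, misses;

    ReflectionMap(int w, int h, double res)
        : width(w), height(h), resolution(res),
          hits(static_cast<std::size_t>(w) * h, 0),
          misses(static_cast<std::size_t>(w) * h, 0) {}

    int index(int cx, int cy) const { return cy * width + cx; }

    // Integrate one laser beam: every cell traversed before the endpoint
    // counts as a miss, the endpoint cell counts as a hit.
    void integrateBeam(const std::vector<int>& traversedCells, int endCell) {
        for (int c : traversedCells) ++misses[c];
        ++hits[endCell];
    }

    // Estimated P(cell occupied) = hits / (hits + misses).
    double occupancy(int cx, int cy) const {
        const int i = index(cx, cy);
        const std::uint32_t n = hits[i] + misses[i];
        return n == 0 ? 0.5 : static_cast<double>(hits[i]) / n;  // 0.5 = unknown
    }
};
```

The ray traversal itself (e.g. Bresenham on the grid) is left behind the `traversedCells` argument to keep the sketch short.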
B. Map management

Online matching is critical for navigation, and its failure is difficult to rectify without external information (i.e. additional sensors). Local maps therefore have to be large enough, but continuously loading large maps into memory is time-consuming. To handle this, we adopt a strategy inspired by the work of [7]. As shown in figure 2, we consider five equal-sized cells for each local map. The first one, noted l_0 and shown in purple, is centered on the vehicle's position and is kept in memory for immediate scan-matching. To obtain the four other cells l_1, ..., l_4, we subdivide the local map as a 2 × 2 grid and compute each cell's orientation o_i with respect to the vehicle (shown as a yellow arrow). This results in four pairs (l_i, o_i) that we compress and store in secondary memory. They are retrieved only in case of scan-matching failure with l_0, in which case their order of selection is determined by the precomputed o_i. These cells are kept only temporarily, until a new local map is built, at which point they are replaced by its five subdivisions. A sketch of this bookkeeping is given below.

Fig. 2. Map cells used for scan-matching. The purple cell, centered around the vehicle (yellow arrow), is kept for immediate scan matching. The four cells resulting from subdividing the map in a 2 × 2 grid are compressed and stored along with their orientations with respect to the vehicle.
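The following sketch shows one way the five-cell bookkeeping could be organized. It is hypothetical: the types, the compression step and the retrieval API are our assumptions; only the ranking of stored cells by their precomputed orientation o_i relative to the vehicle follows the text.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <array>
#include <vector>

// Hypothetical five-cell local map: l_0 stays in RAM, l_1..l_4 are
// compressed in secondary memory together with their orientations o_i.
struct StoredCell {
    std::vector<std::uint8_t> compressedGrid;  // cell l_i (compression assumed)
    double orientation;                        // o_i, radians w.r.t. the vehicle
};

struct LocalMap {
    std::vector<float> l0;                 // active cell, kept for matching
    std::array<StoredCell, 4> stored;      // l_1..l_4

    // On scan-matching failure with l_0, candidate cells are tried in
    // increasing angular distance between o_i and the vehicle heading.
    std::vector<int> retrievalOrder(double vehicleHeading) const {
        const double kTwoPi = 6.283185307179586;
        auto angularDistance = [&](int i) {
            double d = std::fabs(stored[i].orientation - vehicleHeading);
            d = std::fmod(d, kTwoPi);
            return std::min(d, kTwoPi - d);        // wrap to [0, pi]
        };
        std::vector<int> order = {0, 1, 2, 3};
        std::sort(order.begin(), order.end(),
                  [&](int a, int b) { return angularDistance(a) < angularDistance(b); });
        return order;
    }
};
```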
C. Pose estimation

The vehicle's pose is estimated from successive laser scan alignments. A scan S_j = {s_1, ..., s_n} is a set of n range measurements, with s_i = [x_i, y_i, z_i]^T expressed in the sensor's reference frame. At time t, a scan S_t is matched with the previously mapped environment m to estimate the current pose x_t. This is done by maximizing the following likelihood:

    p(x_{0:t}, m | z_{1:t}, u_{1:t}) = η p(x_0) ∏_t [ p(z_t | x_t, m) p(x_t, m | x_{t-1}, u_{1:t}) ]    (3)

where η ∈ R is a normalization constant, and where p(x_t, m | x_{t-1}, u_{1:t}) and p(z_t | x_t, m) respectively correspond to the motion and measurement models, detailed in the following subsections.

a) Motion model: Without additional sensors, a constant-velocity model is assumed. This assumption is required for LIDAR-based systems with a low rotation rate: the frequency of recent 3D LIDARs is around 10 Hz, and a car moving at 100 km/h travels ≃ 2.8 m per 1/10 second, a distance over which the sensor is blind to changes. We therefore assume that the vehicle follows the constant-velocity model

    x_t = x_{t-1} + v_{t-1} δt    (4)

In the absence of other sensors, the initial velocity is estimated by aligning two laser scans. Otherwise, it is estimated via an IMU/GPS fusion using Kalman filtering, which provides a 3D position estimate including velocity.

b) Measurement model: To avoid convergence to local minima, most works use an exhaustive search over a set of candidates via correlative scan matching (e.g. [16], [7], [19]). In this work we follow a similar approach and use multi-resolution scan-to-map matching to compute p(z_t | x_t, m). At time step t, the scan S_t is projected onto a previously estimated part of the environment, given by a reference map m_r with position and orientation (p_r, θ_r), which encodes the log-likelihood of observing a point at a specific location. The pose of the current scan in the global coordinate frame, noted P_t ∈ SE(2), is computed by solving

    P_t = argmin_{P_t} ∑_{i=1}^{n} f_m(P_t s_i)    (5)

where the function f_m finds the nearest occupied cell that lies on the ray starting from p_r and pointing in the direction of θ_r, and then returns its distance to s_i. We cast equation (5) as a nonlinear least-squares minimization. To make the scan-matching more reliable and efficient for online mapping applications, we use multi-resolution scan-matching over three levels of the grid map (one more level than [16]).
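As a rough illustration of the coarse-to-fine search, the self-contained sketch below scores pose candidates on a pyramid of occupancy grids, the winner at each level seeding a narrower window at the next. The grid representation, scoring function, window sizes and all identifiers are our assumptions, not the paper's implementation.

```cpp
#include <cmath>
#include <limits>
#include <vector>

struct Pose2  { double x, y, theta; };   // an element of SE(2)
struct Point2 { double x, y; };          // one scan point s_i (z dropped)

struct GridLevel {
    int width, height;
    double resolution;                   // meters per cell
    std::vector<double> occ;             // occupancy in [0, 1], row-major

    // Cost of scan point s under pose p: low when the transformed point
    // P_t * s_i lands on an occupied cell (a stand-in for f_m in eq. (5)).
    double score(const Pose2& p, const Point2& s) const {
        const double wx = p.x + std::cos(p.theta) * s.x - std::sin(p.theta) * s.y;
        const double wy = p.y + std::sin(p.theta) * s.x + std::cos(p.theta) * s.y;
        const int cx = static_cast<int>(wx / resolution);
        const int cy = static_cast<int>(wy / resolution);
        if (cx < 0 || cy < 0 || cx >= width || cy >= height) return 1.0;
        return 1.0 - occ[cy * width + cx];
    }
};

// Exhaustive search at each level; levels are ordered coarse to fine.
Pose2 multiResolutionMatch(const std::vector<GridLevel>& levels,
                           const std::vector<Point2>& scan, Pose2 guess) {
    double window = 2.0;                 // meters, assumed initial window
    double angWindow = 0.2;              // radians, assumed
    for (const GridLevel& level : levels) {
        double bestCost = std::numeric_limits<double>::max();
        Pose2 bestPose = guess;
        for (double dx = -window; dx <= window; dx += level.resolution)
            for (double dy = -window; dy <= window; dy += level.resolution)
                for (double dt = -angWindow; dt <= angWindow; dt += 0.01) {
                    const Pose2 cand{guess.x + dx, guess.y + dy, guess.theta + dt};
                    double cost = 0.0;
                    for (const Point2& s : scan) cost += level.score(cand, s);
                    if (cost < bestCost) { bestCost = cost; bestPose = cand; }
                }
        guess = bestPose;                // refine around the current winner
        window *= 0.25;                  // shrink windows for the finer level
        angWindow *= 0.5;
    }
    return guess;
}
```

In practice the angular step would also scale with the level resolution; it is fixed here only to keep the sketch short.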
D. Pose graph optimization

In our system, loop closure is performed as soon as high-confidence candidates become available. As previously mentioned (§II), we handle the issue of multiple candidates using an M-estimator. Given the current scan S_t at time t, we rank local map subdivisions based on their orientation relative to the vehicle (§III-B) and load them from secondary memory. We then perform many-to-many matching between all the scans from the candidate maps and the scans from S_t. Once a candidate is selected, we optimize

    argmin_P ∑_{⟨i,j⟩} e_{ij}(P)^T Λ_{ij} e_{ij}(P)    (6)

using the Levenberg-Marquardt algorithm. The error state e_{ij} between node i and node j is defined as

    e_{ij} = (log_{SE(2)}(P_i^{-1} P_j P_{ij}))^∨    (7)

where ∨ : se(2) → R^3 maps the tangent space of the corresponding Lie group to Euclidean space, and where P_{ij} denotes the relative pose between the two frames, which for loop closure candidates is given by the matching process. At the end of the optimization, all local maps in secondary memory are updated.
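To make the SE(2) geometry of equation (7) explicit, here is a minimal plain-C++ evaluation of the edge residual; in the actual system this residual is minimized with Levenberg-Marquardt via the CERES [20] solver, which we do not reproduce here. The small-angle guard and all names are our additions.

```cpp
#include <array>
#include <cmath>

struct SE2 { double x, y, theta; };      // a pose (x, y, theta)

SE2 inverse(const SE2& p) {              // (R, t)^{-1} = (R^T, -R^T t)
    const double c = std::cos(p.theta), s = std::sin(p.theta);
    return { -(c * p.x + s * p.y), s * p.x - c * p.y, -p.theta };
}

SE2 compose(const SE2& a, const SE2& b) {  // (R_a, t_a) * (R_b, t_b)
    const double c = std::cos(a.theta), s = std::sin(a.theta);
    return { a.x + c * b.x - s * b.y, a.y + s * b.x + c * b.y,
             a.theta + b.theta };
}

// log_SE(2) followed by the vee map: group element -> (u1, u2, w) in R^3.
std::array<double, 3> logVee(const SE2& p) {
    const double w = std::atan2(std::sin(p.theta), std::cos(p.theta)); // wrapped
    double alpha = 1.0, beta = 0.0;      // V(w) = alpha*I + beta*J
    if (std::fabs(w) > 1e-9) {           // small-angle guard (our addition)
        alpha = std::sin(w) / w;
        beta  = (1.0 - std::cos(w)) / w;
    }
    const double den = alpha * alpha + beta * beta;  // V^{-1} = (aI - bJ)/den
    return { ( alpha * p.x + beta  * p.y) / den,
             (-beta  * p.x + alpha * p.y) / den,
             w };
}

// e_ij = (log_SE(2)(P_i^{-1} P_j P_ij))^vee, as in equation (7).
std::array<double, 3> edgeResidual(const SE2& Pi, const SE2& Pj, const SE2& Pij) {
    return logVee(compose(compose(inverse(Pi), Pj), Pij));
}
```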
IV. EXPERIMENTAL RESULTS

We have implemented the proposed graph-based SLAM in C++, using the CERES [20] library for the back-end optimization. All of the presented experiments have been performed on a laptop computer equipped with an Intel Core i5-8350U CPU @ 1.70 GHz (4 cores), running Windows 10. We present quantitative and qualitative results on the KITTI odometry dataset [21], as well as qualitative results on the "campus dataset" that we have gathered using a VLP-16 Puck (https://www.cadden.fr/familles-de-produit/lidar-2d-et-3d/lidar-3d-velodyne) mounted on top of a car. In all cases, we apply the following preprocessing to the data: 1) we filter the point cloud in order to remove the road and dynamic objects, keeping only the data that correspond to the static structure of the scene; 2) as our SLAM is based on 2D grid maps in order to be deployable on modest hardware, we remove the z component from the data. As our method is closely related to the state-of-the-art method PML-SLAM [7], we use their estimations as a baseline for comparison and show that our system consistently produces more accurate results.

A. Experiments on the KITTI odometry dataset

Results are presented for the challenging sequences 00, 05 and 07. The corresponding RMSE values of relative translational and rotational errors are reported in tables I and II, respectively. Our average translational error over the three sequences is below 2 m (1.28 m), while PML-SLAM's is more than twice as large (2.91 m on average, up to 5 m on sequence 00). Similarly, our orientation estimation is more accurate (∼ 3.6° vs ∼ 6.7° on average). Trajectories as estimated by both PML-SLAM and our method are shown in figure 3 for visual appreciation. In all three cases, our estimated trajectory is closer to the ground truth than the estimation produced by PML-SLAM, which is subject to considerable drift as it does not perform any loop closure. Among all sequences, sequence 00 is particularly challenging due to weaker geometric constraints from the environment. In particular, during initialization from LIDAR data only, the point cloud provides ambiguous constraints in the direction of motion (up to a translation).

TABLE I
RMSE of relative translational error (in meters) on the KITTI odometry dataset.

    method      seq. 00    seq. 05    seq. 07
    Ours        2.00       1.40       0.45
    PML-SLAM    5.01       2.38       1.34

TABLE II
RMSE of relative rotational error (in degrees) on the KITTI odometry dataset.

    method      seq. 00    seq. 05    seq. 07
    Ours        2.45       3.34       5.15
    PML-SLAM    4.00       10.44      5.58

Fig. 3. Results on the KITTI odometry benchmark (sequences 00, 05, 07): 2D trajectories as estimated by our method and PML-SLAM.

B. Campus dataset

This dataset consists of two sequences, campus-A and campus-B, that we have gathered using a VLP-16 sensor mounted on top of a car moving at ∼ 30 m/s. Unfortunately, we do not have any reliable ground-truth data for those sequences, so we overlay the estimated trajectories on satellite images from Google Maps (figures 4 and 5). Both sequences contain loops. The campus-A sequence is 459 m long and consists of a ∼ 285 m loop around buildings followed by a straight trajectory of ∼ 174 m with numerous dynamic objects (moving cars, pedestrians). The second sequence, campus-B, is ∼ 296 m long; the vehicle travels around 40 m on a straight line and then goes through a loop of ∼ 156 m.

Figure 4 compares the trajectory estimated by our method (yellow) with the trajectory estimated by PML-SLAM (red) on the campus-A sequence. As expected, the loop closures performed by our system correct the drift accumulated by PML-SLAM. Similar results can be observed in figure 5 for campus-B.

Fig. 4. Estimated trajectories from our method (yellow) and PML-SLAM (red) overlaid on Google Maps satellite images for the campus-A sequence. The trajectory starts in the area denoted A and ends in the area denoted B.

Fig. 5. Estimated trajectories from our method (yellow) and PML-SLAM (red) overlaid on Google Maps satellite images for the campus-B sequence. The trajectory starts in the area denoted A and ends in the area denoted B.

C. Running time and memory consumption

Each new pass through our entire system (matching, pose and map estimation, loop closure and map update) takes on average ∼ 60 ms on the hardware described above, which is well within the acquisition period of the active sensor (∼ 100 ms at 10 Hz). Our system's memory consumption consistently remains below 1.5 GB.

V. CONCLUSION AND FUTURE WORKS

We have introduced LPG-SLAM, a light-weight probabilistic 2D graph-based SLAM which exploits data from high-frequency 3D sensors such as Velodyne LIDAR. We showed that 2D occupancy grids obtained by projecting 3D point clouds still provide strong geometric constraints that are sufficient for accurate localization on challenging benchmarks. Unlike most state-of-the-art methods, which require powerful hardware such as GPUs, our formulation results in a lower-dimensional optimization problem which can be solved efficiently on low-end hardware. We saw, however, that in some sequences the constraints from LIDAR data can be ambiguous. Future works will therefore include extending our framework into a more general sensor fusion framework where, among other sensors, we take advantage of GPS and odometry.

REFERENCES

[1] S. Leutenegger, S. Lynen, M. Bosse, R. Siegwart, and P. Furgale, "Keyframe-based visual-inertial odometry using nonlinear optimization," The International Journal of Robotics Research, vol. 34, no. 3, pp. 314-334, 2015.
[2] J. Engel, V. Koltun, and D. Cremers, "Direct sparse odometry," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 3, pp. 611-625, 2017.
[3] A. Salehi, V. Gay-Bellile, S. Bourgeois, N. Allezard, and F. Chausse, "Large-scale, drift-free SLAM using highly robustified building model constraints," in International Conference on Intelligent Robots and Systems (IROS). IEEE, 2017, pp. 1586-1593.
[4] J. Behley and C. Stachniss, "Efficient surfel-based SLAM using 3D laser range data in urban environments," in Proc. of Robotics: Science and Systems (RSS), 2018.
[5] W. Hess, D. Kohler, H. Rapp, and D. Andor, "Real-time loop closure in 2D lidar SLAM," in International Conference on Robotics and Automation (ICRA). IEEE, 2016, pp. 1271-1278.
[6] J.-E. Deschaud, "IMLS-SLAM: scan-to-model matching based on 3D data," in International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 2480-2485.
[7] Z. Alsayed, G. Bresson, F. Nashashibi, and A. Verroust-Blondet, "PML-SLAM: a solution for localization in large-scale urban environments," in Workshop on Planning, Perception and Navigation for Intelligent Vehicles (PPNIV). IEEE, 2015.
[8] G. Grisetti, R. Kümmerle, C. Stachniss, and W. Burgard, "A tutorial on graph-based SLAM," IEEE Intelligent Transportation Systems Magazine, vol. 2, no. 4, pp. 31-43, 2010.
[9] M. Bosse and R. Zlot, "Map matching and data association for large-scale two-dimensional laser scan-based SLAM," The International Journal of Robotics Research (IJRR), vol. 27, no. 6, pp. 667-691, 2008.
[10] W. Hess, D. Kohler, H. Rapp, and D. Andor, "Real-time loop closure in 2D lidar SLAM," in International Conference on Robotics and Automation (ICRA). IEEE, May 2016, pp. 1271-1278.
[11] E. B. Olson, "Real-time correlative scan matching," in International Conference on Robotics and Automation (ICRA). IEEE, 2009, pp. 4387-4393.
[12] K. Konolige, G. Grisetti, R. Kümmerle, W. Burgard, B. Limketkai, and R. Vincent, "Efficient sparse pose adjustment for 2D mapping," in International Conference on Intelligent Robots and Systems (IROS). IEEE, 2010, pp. 22-29.
[13] H. Pfister, M. Zwicker, J. van Baar, and M. Gross, "Surfels: Surface elements as rendering primitives," in Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, 2000, pp. 335-342.
[14] R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison, P. Kohli, J. Shotton, S. Hodges, and A. Fitzgibbon, "KinectFusion: Real-time dense surface mapping and tracking," in International Symposium on Mixed and Augmented Reality (ISMAR). IEEE, 2011, pp. 127-136.
[15] E. Olson and P. Agarwal, "Inference on networks of mixtures for robust robot mapping," The International Journal of Robotics Research (IJRR), vol. 32, no. 7, pp. 826-840, 2013.
[16] E. Olson, "M3RSM: Many-to-many multi-resolution scan matching," in International Conference on Robotics and Automation (ICRA). IEEE, May 2015.
[17] W. Burgard, C. Stachniss, M. Bennewitz, G. Grisetti, and K. Arras, "Introduction to mobile robotics: Probabilistic sensor models," lecture notes, University of Freiburg, 2011, pp. 2-3.
[18] Velodyne Lidar, "High definition real-time 3D lidar," https://velodynelidar.com/products/hdl-64e/, April 2020.
[19] J. Xie, F. Nashashibi, M. Parent, and O. Garcia-Favrot, "A real-time robust SLAM for large-scale outdoor environments," 2010.
[20] S. Agarwal, K. Mierle, and others, "Ceres solver," http://ceres-solver.org.
[21] A. Geiger, P. Lenz, and R. Urtasun, "Are we ready for autonomous driving? The KITTI vision benchmark suite," in Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2012.