LPG-SLAM: a Light-weight Probabilistic Graph-based SLAM
Kathia Melbouci1 and Fawzi Nashashibi1
Abstract— Most current autonomous navigation solutions critically rely on SLAM systems for localization, especially in GPS-denied environments but also in urban and indoor environments. Their efficacy and efficiency thus depend on the ability of the underlying SLAM method to map large-scale environments in a data-efficient manner. State of the art systems, while accurate, often require powerful hardware such as GPUs, and need careful tuning of hyper-parameters in order to adapt to the user's needs. In this paper, we propose a light-weight but accurate probabilistic 2D graph-based SLAM system. We validate our approach on sequences from the KITTI dataset as well as on data gathered by our experimental platform.
Index Terms— SLAM, Mapping, 2D LIDAR, Graph-based optimization, loop closure.
I. Introduction
Simultaneous Localization And Mapping (SLAM) is critical in urban environments, especially in case of degraded or missing GPS signal. In the past decades, significant progress has been made towards that goal. State of the art solutions rely on fusion between complementary exteroceptive (e.g. LIDAR, camera) and proprioceptive (e.g. IMU, odometry) sensors. While sensor graphs that make use of passive vision as their primary source of geometric information about the environment (e.g. [1], [2], [3]) are inexpensive, they will often fail in poorly textured environments or in scenes with challenging lighting conditions, and will overall have inferior accuracy compared to methods that make use of active vision sensors such as laser sensors. Indeed, the dense and fine-grained geometric information that such sensors provide results in less uncertainty in depth values and in robustness to challenging environmental changes. The accuracy and robustness of solutions exploiting 3D sensors (e.g. [4], [5], [6]), however, comes at the cost of increased computational complexity, which makes them difficult to deploy on low-end hardware. In this work, we present a real-time method capable of exploiting 3D data from LIDAR which is much less demanding in terms of computation while retaining state of the art accuracy. In particular, we show that in situations where the main goal is accurate localization, a 2D occupancy grid computed from 3D data is sufficient. Furthermore, our method has fewer hyper-parameters and requires less fine-tuning during deployment in different situations. Our method is closely related to the work of [7], as they use the same probabilistic framework. However, their solution is less accurate, as it relies on a minimal map-matching and does not take advantage of loop closure. We validate our approach on sequences from the KITTI dataset as well as on data gathered using our experimental apparatus, described in section IV.

The next section is dedicated to our work's relation with the state of the art. We detail the proposed system in section III. Quantitative and qualitative results are presented and discussed in section IV, and closing remarks and future directions are given in section V.
* This work has been supported by the French project CAMPUS, a PIA project funded by the French ADEME (Agence de
l’Environnement et de la Maîtrise de l’Energie).
1 Robotics and Intelligent Transportation Systems (RITS) team,
Inria Paris, 56 rue du Charolais, Paris 75012, France
II. Related work

In this section, we review advances that have been made in the graph-based approach to SLAM [8] in large-scale urban environments based on LIDAR data, and position ourselves with respect to recent works.

One of the first graph-based SLAM approaches targeting large-scale and unstructured environments is described in [9]. The authors present a framework for view-based SLAM using 2D laser range measurements. They construct local maps of a sequence of aligned scans using a robust iterative scan matching technique incorporated into an extended Kalman filter. To enable the detection of loop closures without depending on prior knowledge, they perform map matching via histogram cross-correlation and entropy sequence projections. Adding the global optimization step to the Kalman filter process grows the computational complexity, which makes the approach unfit for high-frequency applications.

Exploiting the local map idea mentioned above, [10] propose a 2D real-time graph-based LIDAR SLAM which operates on a filtered point cloud. Pairs of local maps and associated scans are stored in memory, as well as the position of each scan. At fixed time intervals, the scan matcher tries to find the position of the current scan in the stored local maps by correlative scan matching ([11]), and adds it as an edge in the graph. Subsequently, the optimizer estimates the best locations for the scans and the associated local maps, which reduces to a classical non-linear least-squares problem. In order to achieve real-time results on modest hardware, this process follows the sparse pose adjustment method described in [12], implemented in a separate thread. However, the size of each local map constructed by the front-end grows with the number of scans that are inserted into it. Larger numbers of scans in a map lead to more accurate scan matching, but increase the time that is necessary to
detect loop closures. Here, we make use of their proposed background loop detection process. In contrast to their work however, the aforementioned trade-off between accuracy and efficiency resulting from map size is handled dynamically by our system.
In [4], the authors extend the ideas described above
to 3D data. Local maps are encoded by surfels [13]
associated with a rendered vertex and normal map.
Thus the detection of loop closures is no longer done by correlative scan matching but via frame-to-model ICP [14]. Each observed scan is aligned with the known map by minimizing a point-to-plane error with respect to the rendered vertex map. A loop closure is included in the optimization step if the observed scan is consistent with the newly rendered map at the predicted position. While graph optimization is performed on a separate thread, the method still suffers from high time complexity as it processes all 3D data.
In order to better approximate the real world and
reject incorrect loop closures, different works have focused on data association strategies. In [15], the posterior
distribution over the robot’s location is modeled as a
maximum-mixture of Gaussians. While their approach is
useful when dealing with perceptual aliasing and multiple
loop closure hypotheses, we found that an M-estimator
produced satisfactory results in our experiments.
Since data association is a major source of error
accumulation and drift, some works have mainly focused
on improving the scan matching process ([15], [16]). For
example, IMLS-SLAM [6] forgoes the use of a backend
to concentrate on scan-to-model matching, which results
in detailed and accurate maps. However, their method
is very costly and doesn’t run in real-time.
In this work, we present a complete solution for graph-based probabilistic SLAM using 2D occupancy grids which operates in real time thanks to an efficient map management process, and which dynamically handles the trade-off between accuracy and efficiency. In particular, our experimental results demonstrate that accurate localization is possible using 2D occupancy grids, which is less expensive than processing the entire 3D point cloud.
III. System overview
An overview of our system is given in figure 1. The input consists of LIDAR scan data, which can be an accumulation of 2D scans or one revolution from a 3D Velodyne LIDAR. We preprocess the data in order to remove noise, dynamic objects and the road plane. Subsequently, we estimate the most likely current pose via multi-resolution map matching and then construct local submaps. The submaps are subdivided and stored in a memory-efficient manner. They are later retrieved by the optimizer, which, as part of our backend, is responsible for large-scale corrections.
In the following, we detail the main components of our
SLAM.
Fig. 1. An overview of our system’s main components and their
interactions.
A. Map construction
We represent the environment with an occupancy grid map m which, in order to filter out noise and dynamic objects, is estimated by computing a reflection map [17].
Given a set of vehicle poses $x_{0:t}$ and a set of sensor measurements $z_{1:t}$, the map is estimated by solving

$$m^* = \operatorname*{argmax}_{m} P(m \mid z_1, \dots, z_t, x_0, \dots, x_t) \tag{1}$$

Assuming a uniform prior on the map and following Bayes' rule, we obtain

$$
\begin{aligned}
m^* &= \operatorname*{argmax}_{m} P(z_1, \dots, z_t \mid m, x_1, \dots, x_t) \\
    &= \operatorname*{argmax}_{m} \prod_{t=1}^{N} P(z_t \mid m, x_t) \\
    &= \operatorname*{argmax}_{m} \sum_{t=1}^{N} \ln P(z_t \mid m, x_t)
\end{aligned} \tag{2}
$$

where the last equality holds because the logarithm is monotonic. The numerical solution is given by counting the number of times the laser scan hits and misses each cell [17].
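To make this counting solution concrete, the following minimal Python sketch (our illustration, not the paper's implementation; the cell size and helper names are assumptions) accumulates hit and miss counts per cell with a Bresenham ray-trace and recovers the per-cell reflection probability:

```python
from collections import defaultdict

import numpy as np

RESOLUTION = 0.1  # cell size in meters (illustrative value, not from the paper)

def world_to_cell(p):
    # Map a 2D world point to integer grid indices.
    return (int(np.floor(p[0] / RESOLUTION)), int(np.floor(p[1] / RESOLUTION)))

def bresenham(c0, c1):
    # Grid cells traversed on the segment from cell c0 to cell c1.
    (x0, y0), (x1, y1) = c0, c1
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx, sy = (1 if x0 < x1 else -1), (1 if y0 < y1 else -1)
    err, cells = dx + dy, []
    while True:
        cells.append((x0, y0))
        if (x0, y0) == (x1, y1):
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy
    return cells

def integrate_scan(hits, misses, sensor_xy, endpoints_xy):
    # Accumulate per-cell hit/miss counts for one laser scan.
    origin = world_to_cell(sensor_xy)
    for end in endpoints_xy:
        ray = bresenham(origin, world_to_cell(end))
        for cell in ray[:-1]:   # cells the beam passed through: misses
            misses[cell] += 1
        hits[ray[-1]] += 1      # cell where the beam reflected: a hit

def reflection(hits, misses, cell):
    # ML estimate of the cell's occupancy: the reflection ratio of eq. (2).
    n = hits[cell] + misses[cell]
    return hits[cell] / n if n else 0.5  # 0.5 encodes "never observed"

hits, misses = defaultdict(int), defaultdict(int)
integrate_scan(hits, misses, (0.0, 0.0), [(1.0, 0.2), (0.8, -0.4)])
```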
For real-time map reconstruction, we need to sample the input point cloud, since the number of points provided by recent 3D sensors (e.g. Velodyne [18]) can reach ∼ 200000 points per revolution, which makes processing the entire point cloud in real time infeasible on modest hardware. We use the adaptive point cloud sampling described in [5].
B. Map management
Online matching is critical for navigation, and its
failure is difficult to rectify without external information
(i.e. additional sensors). Thus, local maps have to be
large enough.
Loading large maps in memory continuously is time
consuming. To handle this, we adopt a strategy inspired
by the work of [7]. As shown in figure 2, we consider five
equal-sized cells for each local map. The first one, noted l0
and shown in purple, is centered on the vehicle's position, and is kept in memory for immediate scan matching. To obtain the four other cells l1, ..., l4, we subdivide the local map as a 2 × 2 grid and compute each cell's orientation oi with respect to the vehicle (shown as a yellow arrow). This results in four pairs (li, oi) that we compress and store in secondary memory. They will be retrieved only in case of scan-matching failure with l0, in which case their order of selection will be determined by the precomputed oi. These cells are only kept temporarily until a new local map is built, at which point they are replaced by its five subdivisions.

Fig. 2. Map cells used for scan matching. The purple cell, centered around the vehicle (yellow arrow), is kept for immediate scan matching. The four cells resulting from subdividing the map in a 2 × 2 grid are compressed and stored along with their orientations with respect to the vehicle.
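To make this bookkeeping concrete, here is a minimal Python sketch (our illustration; the naming, the pickle/zlib stand-in for compression, and the angular retrieval heuristic are assumptions, not the paper's implementation):

```python
import math
import pickle
import zlib

def subdivide(local_map, map_origin, map_size, vehicle_xy):
    # Split a local map (list of rows) into a 2 x 2 grid of cells l1..l4 and
    # record each cell's orientation o_i with respect to the vehicle.
    h, w = len(local_map), len(local_map[0])
    cells = []
    for gy in range(2):
        for gx in range(2):
            patch = [row[gx * w // 2:(gx + 1) * w // 2]
                     for row in local_map[gy * h // 2:(gy + 1) * h // 2]]
            # Cell center in world coordinates.
            cx = map_origin[0] + (gx + 0.5) * map_size / 2.0
            cy = map_origin[1] + (gy + 0.5) * map_size / 2.0
            o_i = math.atan2(cy - vehicle_xy[1], cx - vehicle_xy[0])
            # "Compress and store in secondary memory", stubbed as bytes here.
            cells.append((zlib.compress(pickle.dumps(patch)), o_i))
    return cells

def retrieval_order(cells, vehicle_heading):
    # On scan-matching failure with l0, try the stored cells whose precomputed
    # orientation o_i is closest to the vehicle's current heading first.
    def gap(o):
        return abs(math.atan2(math.sin(o - vehicle_heading),
                              math.cos(o - vehicle_heading)))
    return sorted(cells, key=lambda cell: gap(cell[1]))
```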
C. Pose estimation
The vehicle's pose is estimated from successive laser scan alignments. A scan $S_j = \{s_1, \dots, s_n\}$ is a set of $n$ range measurements, with $s_i = [x_i, y_i, z_i]^T$ expressed in the sensor's reference frame. At time $t$, a scan $S_t$ is matched with the previously mapped environment $m$ to estimate the current pose $x_t$. This is done by maximizing the following likelihood:

$$p(x_{0:t}, m \mid z_{1:t}, u_{1:t}) = \eta\, p(x_0) \prod_{t} \big[ p(z_t \mid x_t, m)\, p(x_t, m \mid x_{t-1}, u_{1:t}) \big] \tag{3}$$

where $\eta \in \mathbb{R}$ is a normalization constant, and where $p(x_t, m \mid x_{t-1}, u_{1:t})$ and $p(z_t \mid x_t, m)$ respectively correspond to the motion and measurement models, which will be detailed in the following subsections.
a) Motion model: Without additional sensors, a constant velocity model is assumed. This assumption is required for LIDAR-based systems with a low rotation rate: the frequency of recent 3D LIDARs is around 10 Hz, and a car moving at 100 km/h travels ≃ 2.8 m per 1/10 of a second. As the sensor is blind to changes that happen during that distance, we assume that the vehicle follows the constant velocity model

$$x_t = x_{t-1} + v_{t-1}\, \delta t \tag{4}$$

In the absence of other sensors, the initial velocity is estimated by aligning two laser scans. Otherwise, it is estimated via an IMU/GPS fusion using Kalman filtering, which provides a 3D position estimate including velocity.

b) Measurement model: To avoid convergence to local minima, most works use an exhaustive search over a set of candidates via correlative scan matching (e.g. [16], [7], [19]). In this work we follow a similar approach and use multi-resolution scan-to-map matching to compute $p(z_t \mid x_t, m)$.
At time step $t$, the scan $S_t$ is projected on a previously estimated part of the environment given by a reference map $m_r$ with position and orientation $(p_r, \theta_r)$, which encodes the log-likelihood of observing a point at a specific location. The pose of the current scan in the global coordinate frame, noted $P_t \in SE(2)$, is computed by solving

$$P_t = \operatorname*{argmin}_{P_t} \sum_{i \in N} f_m(P_t\, s_i) \tag{5}$$

where the function $f_m$ finds the nearest occupied cell that lies on the ray starting from $p_r$ and pointing in the direction of $\theta_r$, and then returns its distance to $s_i$. We cast equation 5 as a non-linear least-squares minimization.
To make the scan-matching more reliable and efficient
for online mapping applications, we use multi-resolution
scan-matching over three levels of the grid map (one
more level than [16]).
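The sketch below illustrates this coarse-to-fine search in Python (a brute-force toy version under our own assumptions: a max-pooled three-level pyramid, a log-odds sum as score, and placeholder window and step parameters; the paper's actual scorer and settings are not reproduced here):

```python
import numpy as np

def max_pool2(grid):
    # Coarser pyramid level: each cell keeps the max of a 2x2 block, so a
    # coarse score over-approximates the best fine score (no false pruning).
    h, w = grid.shape[0] // 2 * 2, grid.shape[1] // 2 * 2
    g = grid[:h, :w]
    return np.maximum.reduce([g[0::2, 0::2], g[0::2, 1::2],
                              g[1::2, 0::2], g[1::2, 1::2]])

def score(grid, pts, pose):
    # Sum of map log-odds at the projected scan points for a candidate pose.
    # pts and pose translation are expressed in this level's cell units.
    c, s = np.cos(pose[2]), np.sin(pose[2])
    xy = pts @ np.array([[c, s], [-s, c]]) + pose[:2]
    ij = np.round(xy).astype(int)
    ok = (ij[:, 0] >= 0) & (ij[:, 0] < grid.shape[0]) & \
         (ij[:, 1] >= 0) & (ij[:, 1] < grid.shape[1])
    return grid[ij[ok, 0], ij[ok, 1]].sum()

def match(grid, pts, guess, window=4, d_theta=0.02, n_theta=5):
    # Three-level coarse-to-fine exhaustive search around an initial guess.
    # grid: fine log-odds map; pts: scan in fine cell units; guess: (x, y, th).
    levels = [grid, max_pool2(grid), max_pool2(max_pool2(grid))]
    best = np.asarray(guess, dtype=float)
    for lvl in (2, 1, 0):  # coarse -> fine
        step = 2 ** lvl
        g = levels[lvl]
        candidates = [np.array([best[0] + dx * step, best[1] + dy * step,
                                best[2] + k * d_theta * step])
                      for dx in range(-window, window + 1)
                      for dy in range(-window, window + 1)
                      for k in range(-n_theta, n_theta + 1)]
        best = max(candidates,
                   key=lambda p: score(g, pts / step,
                                       np.array([p[0] / step, p[1] / step, p[2]])))
    return best
```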
D. Pose graph optimization
In our system, loop closure is performed as soon as
high-confidence candidates become available. As previously mentioned (§II), we handle the issue of multiple
candidates using an M-estimator. Given the current scan
St at time t, we rank local map subdivisions based on
their orientation relative to the vehicle (§III-B) and
load them from secondary memory. We then perform
many-to-many matching between all the scans from the
candidate maps and the scans from St . Once a candidate
is selected, we optimize
$$\operatorname*{argmin}_{P} \sum_{k} e_{ij}^{T}(s)\, \Lambda_{ij}\, e_{ij}(s) \tag{6}$$

using the Levenberg-Marquardt algorithm. The error state $e_{ij}$ between node $i$ and node $j$ is defined as

$$e_{ij} = \left( \log_{SE(2)}\!\left( P_i^{-1} P_j P_{ij} \right) \right)^{\vee} \tag{7}$$

where $\vee : \mathfrak{se}(2) \to \mathbb{R}^3$ maps the tangent space of the corresponding Lie group to Euclidean space, and where $P_{ij}$ denotes the relative pose between the two frames, which for loop closure candidates is given by the matching process.
At the end of the optimization, all local maps in
secondary memory are updated.
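For concreteness, a minimal Python sketch of the residual (7) and of one cost term of (6) follows (our illustration; the small-angle shortcut in the log map and the measurement convention for $P_{ij}$ are assumptions, and the full Levenberg-Marquardt solve, which the paper delegates to Ceres, is omitted):

```python
import numpy as np

def se2_mat(x, y, theta):
    # Homogeneous 3x3 matrix of an SE(2) pose.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]])

def se2_log_vee(T):
    # (log_SE(2)(T))^vee mapped to R^3 as (x, y, theta). For the small
    # residuals met near convergence, the translational part of the exact
    # log map is commonly approximated by the matrix translation.
    return np.array([T[0, 2], T[1, 2], np.arctan2(T[1, 0], T[0, 0])])

def edge_error(P_i, P_j, P_ij):
    # Residual of equation (7); it vanishes when the estimated poses
    # P_i, P_j agree with the measured relative transform P_ij.
    return se2_log_vee(np.linalg.inv(P_i) @ P_j @ P_ij)

def edge_cost(P_i, P_j, P_ij, Lambda_ij):
    # One term of objective (6): e_ij^T Lambda_ij e_ij.
    e = edge_error(P_i, P_j, P_ij)
    return float(e @ Lambda_ij @ e)

# Consistent toy edge: the cost is (numerically) zero.
P_i, P_j = se2_mat(0.0, 0.0, 0.0), se2_mat(1.0, 0.0, 0.1)
P_ij = np.linalg.inv(np.linalg.inv(P_i) @ P_j)  # convention matching eq. (7)
assert edge_cost(P_i, P_j, P_ij, np.eye(3)) < 1e-12
```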
IV. Experimental results

TABLE I
RMSE of relative translational error (in meters) on the KITTI odometry dataset.

method        00      05      07
Ours         2.00    1.40    0.45
PML-SLAM     5.01    2.38    1.34
We have implemented the proposed graph-based SLAM in the C++ programming language, and used the Ceres [20] library for backend optimization. All of the presented experiments have been performed on a laptop computer equipped with an Intel Core i5-8350U CPU @ 1.70 GHz (4 cores), running Windows 10.

We present quantitative and qualitative results on the KITTI odometry dataset [21], as well as qualitative results on the "campus dataset" that we have gathered using a VLP-16 Puck 1 mounted on top of a car. In all cases, we apply the following preprocessing to the data: 1) we filter the point cloud in order to remove the road and dynamic objects, keeping only the data that correspond to the static structure of the scene; 2) as our SLAM is based on 2D grid maps in order to be deployable on modest hardware, we remove the z component from the data. A minimal sketch of this preprocessing is given below.
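The sketch assumes a simple height band stands in for the road and dynamic-object filtering (the thresholds are illustrative, not the paper's):

```python
import numpy as np

def to_2d_scan(points, z_min=0.3, z_max=2.0):
    # Keep points in a height band above the road plane (coarse stand-in
    # for ground and dynamic-object removal), then drop the z component.
    #
    # points : (N, 3) array of LIDAR returns in the vehicle frame
    # returns: (M, 2) array used by the 2D grid-map pipeline
    keep = (points[:, 2] > z_min) & (points[:, 2] < z_max)
    return points[keep, :2]
```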
As our method is closely related to the state of the art method PML-SLAM [7], we use their estimations as a baseline for comparison and show that our system consistently produces more accurate results.
A. Experiments on the KITTI odometry dataset
Results are presented for the challenging sequences 00, 05 and 07. The corresponding RMSE values of relative translational and rotational errors are reported in tables I and II, respectively. We notice that our average translational error over the three sequences is below 2 m, while PML-SLAM's average error is more than twice as large. Similarly, our orientation estimation is more accurate (∼ 3° vs ∼ 6° on average). Trajectories as estimated by both PML-SLAM and our method are shown in figure 3 for visual appreciation. In all three cases, it can be seen that our estimated trajectory is closer to the ground truth than the estimation produced by PML-SLAM, which is subject to considerable drift as it does not perform any loop closure. Among all sequences, sequence 00 is particularly challenging due to weaker geometric constraints from the environment. In particular, during initialization from LIDAR data only, the point cloud provides ambiguous constraints in the direction of motion (up to a translation).

TABLE II
RMSE of relative rotational error (in degrees) on the KITTI odometry dataset.

method        00      05      07
Ours         2.45    3.34    5.15
PML-SLAM     4.00   10.44    5.58
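For reference, the metric can be computed as in the following sketch (a standard relative pose error in Python; the paper's exact evaluation protocol, e.g. the frame offset, may differ):

```python
import numpy as np

def relative_translation_rmse(gt, est, delta=1):
    # RMSE of the translational part of relative pose errors.
    #
    # gt, est : lists of 3x3 SE(2) pose matrices (ground truth / estimate)
    # delta   : frame offset over which relative motion is compared
    errs = []
    for k in range(len(gt) - delta):
        rel_gt = np.linalg.inv(gt[k]) @ gt[k + delta]
        rel_est = np.linalg.inv(est[k]) @ est[k + delta]
        d = np.linalg.inv(rel_gt) @ rel_est
        errs.append(np.hypot(d[0, 2], d[1, 2]))
    return float(np.sqrt(np.mean(np.square(errs))))
```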
B. Campus dataset
This dataset consists of two sequences, campus-A and campus-B, that we have gathered using a VLP-16 sensor 1 mounted on top of a car moving at ∼ 30 m/s. Unfortunately, we do not have any reliable ground-truth data for these sequences, so we overlay the estimated
trajectories over satellite images from Google Maps (figures 4 and 5). Both sequences contain loops. The campus-A sequence is 459 m long, consisting of a ∼ 285 m loop around buildings followed by a straight trajectory of ∼ 174 m where there are numerous dynamic objects (moving cars, pedestrians). The second sequence, campus-B, is ∼ 296 meters long. In this loop, the vehicle travels around 40 m on a straight line and then goes through a loop of ∼ 156 m. Figure 4 shows the comparison between the trajectory estimated by our method (yellow) and the trajectory estimated by PML-SLAM (red) on the campus-A sequence. As expected, the loop closures performed by our system correct the drift accumulated by PML-SLAM. Similar results can be observed in figure 5 for campus-B.

1 https://www.cadden.fr/familles-de-produit/lidar-2d-et-3d/lidar-3d-velodyne
C. Running time and memory consumption
All experiments have been performed on an Intel Core i5-8350U CPU @ 1.70 GHz (4 cores), running Windows 10. Each new pass through our entire system (matching, pose and map estimation, loop closure and map update) takes on average ∼ 60 ms, well below the active sensor's ∼ 100 ms acquisition period at 10 Hz. Our system's memory consumption consistently remains below 1.5 GB.
V. Conclusion and future works

We have introduced LPG-SLAM, a light-weight probabilistic 2D graph-based SLAM which exploits data from high-frequency 3D sensors such as Velodyne LIDAR. We showed that 2D occupancy grids obtained from projecting 3D point clouds still provide strong geometric constraints that are sufficient for accurate localization on challenging benchmarks. Unlike most state of the art methods, which require powerful hardware such as GPUs, our formulation results in a lower-dimensional optimization problem which can be efficiently solved on low-end hardware. We saw however that in some sequences the constraints from LIDAR data can be ambiguous. Future works will therefore include extending our framework into a more general sensor fusion framework where, among other sensors, we take advantage of GPS and odometry.
Fig. 3. Results on the KITTI odometry benchmark (sequences 00, 05, 07): 2D trajectories as estimated by our method and PML-SLAM.
Fig. 4. Estimated trajectories from our method (yellow) and PML-SLAM (red) overlaid on Google Maps satellite images for the campus-A sequence. The trajectory starts in the area denoted A and ends in the area denoted B.
Fig. 5. Estimated trajectories from our method (yellow) and PML-SLAM (red) overlaid on Google Maps satellite images for the campus-B sequence. The trajectory starts in the area denoted A and ends in the area denoted B.
References
[1] S. Leutenegger, S. Lynen, M. Bosse, R. Siegwart, and P. Furgale, "Keyframe-based visual-inertial odometry using nonlinear optimization," The International Journal of Robotics Research, vol. 34, no. 3, pp. 314–334, 2015.
[2] J. Engel, V. Koltun, and D. Cremers, "Direct sparse odometry," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 3, pp. 611–625, 2017.
[3] A. Salehi, V. Gay-Bellile, S. Bourgeois, N. Allezard, and F. Chausse, "Large-scale, drift-free SLAM using highly robustified building model constraints," in International Conference on Intelligent Robots and Systems (IROS). IEEE, 2017, pp. 1586–1593.
[4] J. Behley and C. Stachniss, "Efficient surfel-based SLAM using 3D laser range data in urban environments," in Proc. of Robotics: Science and Systems (RSS), 2018.
[5] W. Hess, D. Kohler, H. Rapp, and D. Andor, "Real-time loop closure in 2D lidar SLAM," in International Conference on Robotics and Automation (ICRA). IEEE, 2016, pp. 1271–1278.
[6] J.-E. Deschaud, "IMLS-SLAM: scan-to-model matching based on 3D data," in International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 2480–2485.
[7] Z. Alsayed, G. Bresson, F. Nashashibi, and A. Verroust-Blondet, "PML-SLAM: a solution for localization in large-scale urban environments," in Workshop on Planning, Perception and Navigation for Intelligent Vehicles (PPNIV). IEEE, 2015.
[8] G. Grisetti, R. Kummerle, C. Stachniss, and W. Burgard, "A tutorial on graph-based SLAM," Intelligent Transportation Systems Magazine, vol. 2, no. 4, pp. 31–43, 2010.
[9] M. Bosse and R. Zlot, "Map matching and data association for large-scale two-dimensional laser scan-based SLAM," The International Journal of Robotics Research (IJRR), vol. 27, no. 6, pp. 667–691, 2008.
[10] W. Hess, D. Kohler, H. Rapp, and D. Andor, "Real-time loop closure in 2D lidar SLAM," in International Conference on Robotics and Automation (ICRA). IEEE, May 2016, pp. 1271–1278.
[11] E. B. Olson, "Real-time correlative scan matching," in International Conference on Robotics and Automation (ICRA). IEEE, 2009, pp. 4387–4393.
[12] K. Konolige, G. Grisetti, R. Kümmerle, W. Burgard, B. Limketkai, and R. Vincent, "Efficient sparse pose adjustment for 2D mapping," in International Conference on Intelligent Robots and Systems (IROS). IEEE, 2010, pp. 22–29.
[13] H. Pfister, M. Zwicker, J. Van Baar, and M. Gross, "Surfels: Surface elements as rendering primitives," in Proceedings of the 27th annual conference on Computer graphics and interactive techniques, 2000, pp. 335–342.
[14] R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison, P. Kohli, J. Shotton, S. Hodges, and A. Fitzgibbon, "KinectFusion: Real-time dense surface mapping and tracking," in International Symposium on Mixed and Augmented Reality (ISMAR). IEEE, 2011, pp. 127–136.
[15] E. Olson and P. Agarwal, "Inference on networks of mixtures for robust robot mapping," The International Journal of Robotics Research (IJRR), vol. 32, no. 7, pp. 826–840, 2013.
[16] E. Olson, "M3RSM: Many-to-many multi-resolution scan matching," in International Conference on Robotics and Automation (ICRA). IEEE, May 2015.
[17] W. Burgard, C. Stachniss, M. Bennewitz, G. Grisetti, and K. Arras, "Introduction to mobile robotics," Probabilistic Sensor Models, University of Freiburg, pp. 2–3, 2011.
[18] Velodyne Lidar, "High definition real-time 3D lidar," https://velodynelidar.com/products/hdl-64e/, April 2020.
[19] J. Xie, F. Nashashibi, M. Parent, and O. Garcia-Favrot, "A real-time robust SLAM for large-scale outdoor environments," 2010.
[20] S. Agarwal, K. Mierle, and others, "Ceres Solver," http://ceres-solver.org.
[21] A. Geiger, P. Lenz, and R. Urtasun, "Are we ready for autonomous driving? The KITTI vision benchmark suite," in Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2012.