Australasian Conference on Robotics and Automation (ACRA), December 2-4, 2009, Sydney, Australia
3D Landmarks Extraction from a Range Imager Data for SLAM
Jianguo Jack Wang, Gibson Hu, Shoudong Huang and Gamini Dissanayake
ARC Centre of Excellence for Autonomous Systems
Faculty of Engineering and Information Technology
University of Technology, Sydney
PO Box 123 Broadway, Ultimo, NSW 2007, Australia
Email: [email protected]
Abstract
This paper introduces a new 3D landmark
extraction method using the range and intensity
images captured by a single range camera.
Speeded up robust features (SURF) detection and
matching is used to extract and match features from
the intensity images. The range image information
is used to transfer the selected 2D features into 3D
points. The range measurement bias and uncertainty
of the range camera are analysed, and their models
are developed for improving the range estimation.
After outlier detection and removal using random
sample consensus (RANSAC), reliable 3D points
are obtained. 3D landmarks for simultaneous
localisation and mapping (SLAM) are selected
from the 3D points considering several factors,
such as the uncertainty and geometry of their
locations. Because of the availability of the SURF
descriptor, the data association in SLAM has been
performed using both the geometry and the
descriptor information. The proposed method is
tested in unstructured indoor environments,
where the range camera moves in six degrees of
freedom. Experimental results demonstrate the
success of the proposed 3D landmark extraction
method for SLAM.
1 Introduction
A fundamental issue in mobile robotics is navigation and
mapping. Only a robot capable of navigating in a safe and
reliable way can achieve true autonomy. State-of-the-art
approaches for metric localization and mapping are
probabilistic methods explicitly considering uncertainty
information modelling sensor noise and imperfections in
robot motion. These approaches can be categorized by their
map representation, such as features, grids or raw data.
In feature-based approaches it is assumed that the
physical environment can be modelled by geometric
features [Weingarten, 2006]. Due to the small quantity of
data required to represent these features, the resulting
maps are compact and the associated algorithms efficient.
On the other hand, they require a reliable feature extraction
mechanism, which depends on the quality of the sensor data
and the chosen feature type.
Landmark-based simultaneous localisation and mapping (SLAM) is the process of building a map of landmarks in the environment while simultaneously using the map to localize the robot. Apart from the estimation or optimization algorithm used in SLAM, landmark extraction, selection and association are the key steps for a successful SLAM implementation. This paper introduces a novel 3D landmark selection and association method for SLAM with data collected from a SwissRanger range camera, for robotic search and rescue applications.
1.1. Related Work
Laser scanners are among the most popular sensors employed in 2D and 3D SLAM because of the accurate range information they provide. The most common approach is to rotate a 2D laser scanner to generate 3D scans [Cole and Newman, 2006] [Lingemann et al., 2005] [Nuchter et al., 2007]. The 3D map is usually obtained by scan matching and trajectory optimization. However, using laser sensors alone has a number of important limitations: (i) it is difficult to provide an appropriate representation of the map, (ii) the number of laser points is large and the computation cost is high, and (iii) data association is not easy, especially when a loop is closed [Karlsson et al., 2005].
Vision based SLAM has attracted more attention
recently due to the low cost, light weight and low power
requirements of the camera sensor [Konolige and Agrawal
2008]. In particular, approaches to efficiently represent
the salient regions of an image such as Scale Invariant
Feature Transformation (SIFT) [Lowe, 2004] and SURF
[Bay et al., 2008], provide reliable data association which
is an essential component of SLAM. However, an image
from a 2D camera cannot provide depth information
unless stereo vision systems are used [Se et al., 2005]
[Paz et al., 2008]. Although there have been some
interesting SLAM algorithms using a single camera, in
general they are not robust or reliable enough to provide
accurate depth information, particularly at longer
distances.
Combining range sensors with cameras allows more accurate 3D maps to be built. Recently, some research along this direction has
been published. For example, [Cole and Newman, 2006]
make use of a tilting 2D laser scanner and a camera for
SLAM in outdoor environments. The camera is mainly
used in the loop closure detection. [Ohno and Tadokoro,
2005] use a 3D scanner to generate the 3D map, and then
use the dense texture images to recover the 3D shape.
Ellekilde et al. [2007] proposed a combination of sensors in the form of a digital passive camera with one low resolution range camera, the CSEM SwissRanger SR-2
[Oggier et al., 2004]. They concluded that the resolution
of the SwissRanger is inadequate to be able to extract
reliable geometric features of the environment directly
from the intensity image with SIFT. Thus visually salient
features extracted from a conventional camera are used to
help in associating range images and representing the
environment.
1.2. Contributions
SwissRanger is a time-of-flight range imaging device that delivers distance as well as grey-level (i.e. intensity) images. Therefore, in principle, it is possible to extract 2D feature points from the intensity image as from common images. We demonstrate in this paper that reliable 2D features can be detected from the SwissRanger's intensity images with SURF [Bay et al., 2008]. Data collected by the SwissRanger alone can be processed to provide reliable 3D landmarks for the navigation of robots with six degrees of freedom. This approach not only reduces the complexity of the robot navigation and mapping system by avoiding sensor fusion, but also satisfies the special requirements of applications where insufficient natural light is available and visible illumination is forbidden, such as victim search and rescue.
The proposed landmark extraction method combines several strategies. First, 2D features are extracted from intensity images and matched by SURF. An adequate number of matches is obtained by adjusting a threshold parameter. Then 2D constraints are applied to remove obvious mismatches. 3D feature points are then interpolated from the four closest range measurements of each feature. After that, 3D rotation and translation RANSAC [Fischler and Bolles, 1981] is applied to exclude outliers. The number of points in each frame is controlled in the landmark selection process. Considering the uncertainty and geometry of their locations, good quality 3D points are further selected as landmarks. The SURF descriptors of all the landmarks are kept for robust data association and loop closure detection. Each landmark's localisation quality can be represented by its range measurement uncertainty. Field tests with ground truth have been conducted to collect data for performance evaluation.
1.3. Framework
In order to achieve robust 3D landmark extraction from
SwissRanger data, it is necessary to design a proper data
processing procedure, which is illustrated in Figure 1.
SURF features are extracted from the 2D intensity image and matched against those from previous images. After removing outliers, matched features are scored and selected for 3D point interpolation. The range images are rectified to remove camera lens distortion and range measurement bias. Then the 3D positions of the selected features are computed from the corresponding pixels in the range image. After coordinate transformation and uncertainty estimation, RANSAC is applied to remove poor quality 3D points and outliers, and to compute an initial estimate of the relative camera pose. If adequate 3D points have been found, a 3D landmark selection algorithm is applied based on the uncertainty and geometry of their locations. Otherwise, all the points are registered as landmarks.
Figure 1. Flowchart of 3D landmark selection. (Intensity image: extract SURF features, match features, remove 2D outliers, score and select matches. Range image: rectification, 3D point interpolation, coordinate transformation, uncertainty estimation. 3D RT RANSAC then tests for enough inliers: if yes, select 3D landmarks; if no, select all the 3D points and estimate the pose. Finally, continuously observed landmarks are searched and the landmarks and poses are passed to 3D SLAM and data association.)
Data association is conducted after the landmark selection. One challenge in data association is how to efficiently and reliably search previously observed landmarks to find the re-observed ones. The details of the important steps in this flowchart are presented in the following sections. The rest of the paper is organized as follows. Intensity image processing is described in Section 2, including feature detection, matching and selection. Range image processing consists of image rectification, measurement uncertainty estimation, 3D point interpolation, outlier detection and landmark selection; these are introduced in Section 3. Section 4 addresses data association, and Section 5 covers the field tests and results analysis, followed by the conclusion.
2 Intensity Image Processing
A SwissRanger camera consists of a dedicated two-dimensional image sensor and a modulated near-infrared (NIR) light source. Every pixel on the sensor samples the amount of modulated light reflected by objects in the scene four times per modulation period, at equal intervals. The range measurement is calculated from these four samples, and the intensity value is their average. The intensity images captured by a SwissRanger camera are essentially the same as images from common cameras, but have low resolution. Despite the low resolution, we found that some feature extraction methods developed for common images can still be effective on the intensity images.
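As an illustration, the four-sample demodulation can be sketched as follows. This is the standard four-bucket scheme for continuous-wave time-of-flight cameras; the 20 MHz modulation frequency (consistent with the 7.5 m non-ambiguity range noted in Section 3) and the sample ordering are our assumptions, not SwissRanger specifics.

```python
import numpy as np

C = 299792458.0    # speed of light (m/s)
MOD_FREQ = 20e6    # assumed modulation frequency; c/(2f) gives 7.5 m

def demodulate(a0, a1, a2, a3, mod_freq=MOD_FREQ):
    """Four-bucket demodulation: a0..a3 are per-pixel samples taken at
    0, 90, 180 and 270 degrees of the modulation period."""
    phase = np.mod(np.arctan2(a3 - a1, a0 - a2), 2.0 * np.pi)
    distance = (C / (2.0 * mod_freq)) * phase / (2.0 * np.pi)
    intensity = (a0 + a1 + a2 + a3) / 4.0          # average -> intensity image
    amplitude = 0.5 * np.hypot(a3 - a1, a0 - a2)   # reflected signal strength
    return distance, intensity, amplitude
```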
2.1 SURF Feature Extraction
A wide variety of scale and rotation invariant feature
extraction methods have already been proposed for
finding correspondences between images. SIFT and
SURF methods are the most popular ones. Their main strength is that they produce a feature descriptor that allows quick comparisons with other features and is rich enough to make these comparisons highly discriminatory. The robust data association between their features is particularly effective. We tested both methods and found that SURF detects more features and is more suitable for our application.
The descriptor generated by SURF describes the distribution of Haar-wavelet responses within the interest point neighbourhood, using only 64 dimensions. This reduces the time for feature computation and matching, and simultaneously increases robustness. Its indexing step is based on the sign of the Laplacian, which improves not only the matching speed, but also the robustness of the descriptor.
2.2 Feature Matching
There are two popular strategies for matching feature descriptors. One is the nearest-similarity-ratio (NSR) matching strategy, where the ratio between the best and second-best matches must be below a certain threshold to guarantee reliable matching. The other is the similarity-threshold-based (STB) matching strategy, which is better suited to representing the distribution of the descriptor in its feature space [Bay et al., 2008].
The feature matching algorithm provided in the SURF package is based on NSR [Torr, 2002]. It cannot avoid multi-matching, and its threshold needs to be adjusted adaptively when a particular number of matches is expected. As both NSR and STB matching strategies have pros and cons, better matching results can be anticipated from a judicious combination of the two.
A new feature matching and selection algorithm is proposed in this paper by combining the NSR and STB matching strategies. It avoids multi-matching and provides the required number of matched feature pairs. First, the threshold parameter of NSR is set reasonably high to allow enough matches to be detected. Then constraints based on statistics of the features' 2D displacement are applied to remove evident mismatches. Finally, the similarity scores of the matches are sorted and the desired number of best ones is selected for further processing.
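A minimal sketch of this combined NSR/STB procedure is given below. The ratio, displacement and selection parameters are illustrative, and a median/MAD test stands in for the unspecified 2D displacement statistics.

```python
import numpy as np

def match_features(desc_a, desc_b, pts_a, pts_b,
                   nsr_ratio=0.8, n_wanted=30, disp_sigma=3.0):
    """Illustrative sketch of the combined NSR/STB matching.
    desc_a, desc_b: (Na, 64) and (Nb, 64) SURF descriptor arrays;
    pts_a, pts_b: the corresponding 2D pixel coordinates."""
    # Pairwise Euclidean distances between descriptors.
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    order = np.argsort(d, axis=1)
    best, second = order[:, 0], order[:, 1]
    idx = np.arange(len(desc_a))
    ratio = d[idx, best] / d[idx, second]
    # NSR with a deliberately generous ratio so enough candidates survive.
    cand = np.where(ratio < nsr_ratio)[0]
    # One-to-one enforcement (avoid multi-matching): for each feature in
    # image b keep only the closest candidate from image a.
    pairs = {}
    for i in cand:
        j = best[i]
        if j not in pairs or d[i, j] < d[pairs[j], j]:
            pairs[j] = i
    matches = [(i, j) for j, i in pairs.items()]
    if not matches:
        return []
    # 2D displacement constraint: drop matches whose image-plane shift
    # deviates from the median by more than disp_sigma robust deviations.
    disp = np.array([pts_b[j] - pts_a[i] for i, j in matches])
    med = np.median(disp, axis=0)
    mad = np.median(np.abs(disp - med), axis=0) + 1e-9
    keep = np.all(np.abs(disp - med) < disp_sigma * mad, axis=1)
    matches = [m for m, k in zip(matches, keep) if k]
    # STB-style scoring: sort by descriptor distance, keep the best ones.
    matches.sort(key=lambda m: d[m[0], m[1]])
    return matches[:n_wanted]
```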
Figure 2 shows an example of SURF feature matching in a pair of SwissRanger images. The right two images are the pair of intensity images and the left two are range images. There are 53 matched pairs after NSR, shown in the right images; some mismatched pairs clearly appear. After the proposed process, the preferred number of best matches (30 in this case) is selected, as shown in the left images. Most of them (if not all) are inliers.
In order to investigate how the SURF feature matching score affects the reliability of the selected features, we conducted field tests relating the matching score to the reliability of the SURF features. The test results show that features with high reliability also have high matching scores. This indicates that our approach of combining NSR and STB to select reliable features as landmarks is feasible.
Figure 2. Example of SURF feature matching
It should be noted that sufficient overlap between images is required for a significant number of features to be matched. Note also that a set of SURF feature descriptors is saved for each landmark. This is very important for data association, as detailed in Section 4, and is one of the major advantages of the proposed method using SURF features.
3 Range Image Processing
The SwissRanger cameras are based on the principle of
emitting modulated NIR light with a selected frequency
and then measuring the phase shift of the reflection
[CSEM, 2006]. The distance measurements are encoded as 16-bit integers. Ideally the mapping should be such that 0 corresponds to 0 m and 65535 to 7.5 m, the known non-ambiguity range of the sensor. The cameras have been calibrated to compensate for offset and scaling errors.
In addition to range images, intensity images are also captured; these are related to the reflected signal's strength. The S/N at each pixel has been shown to correspond directly to the strength of the reflected signal [Oggier et al., 2004]. It is thus reasonable to assume that the noise of an acquired distance measurement is primarily related to the variance of the signal's strength. Robbins et al. [2009] analysed this relation experimentally. We have investigated the uncertainty and bias of the range measurement and propose several models for their estimation.
3.1 Range Image Rectification
In order to calibrate the SwissRanger camera, a two-stage
procedure was adopted [Weingarten, 2006]. As the sensor
has a standard optical lens refracting the reflected light
onto the imaging chip, distortions and misalignments can
be compensated using common video camera calibration
techniques [Bouguet, 2008].
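For illustration, the rectification step might look as follows, with OpenCV's undistortion standing in for the Bouguet toolbox and placeholder intrinsics; the values shown are illustrative, not the actual calibration results.

```python
import numpy as np
import cv2

# Placeholder intrinsics for a 176x144 SwissRanger-sized image; these are
# illustrative numbers, not the actual calibration.
K = np.array([[250.0,   0.0, 88.0],
              [  0.0, 250.0, 72.0],
              [  0.0,   0.0,  1.0]])
dist = np.array([-0.3, 0.1, 0.0, 0.0, 0.0])   # k1, k2, p1, p2, k3

def rectify_image(img):
    """Undo lens distortion once K and dist have been estimated from
    checkerboard images, as in the two-stage calibration above."""
    return cv2.undistort(img, K, dist)
```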
The relation between range resolution and distance is provided in the product manual [CSEM, 2006]. Weingarten [2006] proposed an empirical relation to calibrate the range measurement, which depends only on the range measurement itself and omits other factors. It has been demonstrated that for a large range of illumination levels the range uncertainty is essentially limited only by the noise of the available light [Oggier et al., 2004]. This allows the obtainable range uncertainty to be reliably predicted if the S/N can be measured. In other words, for every range measurement there is a reliable prediction of its uncertainty, which can be exploited by robot navigation algorithms.
Figure 3 shows the range and intensity images of a calibration board. The range image (left) clearly shows that the range measurement noise corresponding to the black squares in the intensity image (right) is much higher than that in the white squares.
Figure 3. Range and intensity image of a calibration board
We found that the S/N affects not only the uncertainty of the range measurement, but also the range bias. Previous research also observed this and suggested a formula for uncertainty estimation [May et al., 2006]; however, no quantitative model for the bias has been proposed.
Two models are proposed here for estimating range measurement uncertainty. The first directly calculates the standard deviation (STD) of the neighbourhood range measurements. This is accurate when an object surface is flat and perpendicular to the range direction, but inaccurate in other situations.
The relation between the neighbourhood STD and intensity was investigated; some of the results are listed in Table 1. The neighbourhood STD is clearly highly correlated with the mean intensity value.
Figure 4 plots the test data showing the relationship between the intensity of features and their range measurement uncertainty. This is similar to previous test results [Ellekilde et al., 2007].

Figure 4. Relationship between intensity and uncertainty

Although the cameras have been calibrated to compensate for range offset and scaling errors, they still exhibit a range bias caused by intensity variation. Figure 5 is an example plot showing the relationship between SwissRanger image intensity and range bias. The blue dots are samples of range measurements with the corresponding intensities, the red stars are the medians of the samples, and the red line is the true distance.
Table 1. Relation between neighbourhood STD and intensity.
STD (mm):  30.43  11.82  8.53  5.26  3.32  2.20
Intensity:   328    428   824  2056  4940  6368
The model we propose calculates the range uncertainty from both the range and intensity measurements, as expressed in the following equation. It is based on the fact that the S/N determines the range measurement uncertainty, which can be estimated under constant background light conditions. Positioning the SwissRanger camera at a known distance from the planar board shown in Figure 3, a set of images was acquired and the noise at each pixel was statistically determined. A function was then fitted to the points obtained by plotting the uncertainty as a function of intensity and range.
δ_r = r · (a + b / (I − c)^k)
where I is the intensity measurement; a is the minimum range measurement noise; and c is the minimum intensity needed to obtain a range measurement, which is determined by both the sensor and the background noise. b and k are the other two parameters of the equation. a, b, c and k can be estimated by range image calibration as -19.1882, 3000, 300 and 0.5 respectively.
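For illustration, the model and a possible refitting procedure can be sketched as follows; scipy's curve_fit is our choice of fitting tool, not necessarily the one used for the calibration.

```python
import numpy as np
from scipy.optimize import curve_fit

def range_uncertainty(r, I, a=-19.1882, b=3000.0, c=300.0, k=0.5):
    """delta_r = r * (a + b / (I - c)^k); the default parameters are the
    values estimated from the range image calibration in the text.
    Valid only for I > c (c is the minimum usable intensity)."""
    return r * (a + b / np.power(I - c, k))

def fit_uncertainty_model(r_samples, I_samples, sigma_samples):
    """Refit (a, b, c, k) to calibration triples (range, intensity, STD)."""
    def model(x, a, b, c, k):
        r, I = x
        return r * (a + b / np.power(I - c, k))
    popt, _ = curve_fit(model, (r_samples, I_samples), sigma_samples,
                        p0=(-19.0, 3000.0, 300.0, 0.5))
    return popt
```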
Figure 5. The model of relative range bias and intensity
The range bias can also be modelled empirically by analysing the range image calibration results. Tables 2 and 3 are the proposed lookup tables for range bias estimation according to intensity and range, respectively.
Table 2. Range bias vs. intensity.
Bias (mm): -40.0  -20.0   0.0    0.0   28.0   92.0
Intensity:     1    300  1500  20000  24300  25600
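A sketch of applying such a lookup table by linear interpolation is shown below, using the Table 2 values; subtracting the interpolated bias from the measured range is our reading of how the tables are used, and the range-dependent correction of Table 3 (below) can be applied in the same way.

```python
import numpy as np

# Table 2 as a lookup table; np.interp interpolates linearly and clamps
# to the end values outside the tabulated intensity range.
BIAS_INTENSITY = np.array([1.0, 300.0, 1500.0, 20000.0, 24300.0, 25600.0])
BIAS_MM = np.array([-40.0, -20.0, 0.0, 0.0, 28.0, 92.0])

def correct_range(range_mm, intensity):
    """Subtract the interpolated intensity-dependent bias from a
    measured range (both in millimetres)."""
    return range_mm - np.interp(intensity, BIAS_INTENSITY, BIAS_MM)
```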
Table 3. Range bias vs. range.
Bias (mm): 30.0  30.0  120.0  140.0  140.0  110.0  320.0
Range (m):  0.1   1.0    1.5    2.0    2.5    3.0    7.5

3.2 3D Feature Point Interpolation
Once matched pairs of SURF features are obtained from the 2D intensity images, it is necessary to find the corresponding 3D positions of the features for the SLAM implementation. The range images are first rectified, in both the image and range aspects, according to the camera and range calibration results. Then 3D interpolation is applied to calculate the range of each feature. After that, the 3D positions of the point features are transformed to the Cartesian coordinate system. It should be noted that the intensity images do not need to be rectified, because all their pixels exactly match those in the corresponding range images.
There is an option to change the order of the range image processing shown in Figure 1 if only rectified features, rather than all the range data, are required. The range image rectification block can be placed after the 3D feature interpolation so that only the selected 3D feature points are rectified. This further reduces the computational load.
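A minimal sketch of this step is given below, assuming a pinhole model with intrinsics (fx, fy, cx, cy) from the camera calibration and radial (along-ray) range measurements.

```python
import numpy as np

def feature_point_3d(u, v, range_img, fx, fy, cx, cy):
    """Back-project a sub-pixel SURF location (u, v) to a 3D point.
    Assumes (u, v) lies at least one pixel inside the image border and
    that the camera reports radial distance along the pixel ray."""
    u0, v0 = int(np.floor(u)), int(np.floor(v))
    du, dv = u - u0, v - v0
    # Bilinear interpolation over the four closest range measurements.
    r = ((1 - du) * (1 - dv) * range_img[v0, u0] +
         du * (1 - dv) * range_img[v0, u0 + 1] +
         (1 - du) * dv * range_img[v0 + 1, u0] +
         du * dv * range_img[v0 + 1, u0 + 1])
    # Unit ray through the pixel, scaled to the interpolated radial range.
    ray = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    return r * ray / np.linalg.norm(ray)
```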
3.3 3D Rotation and Translation RANSAC
The initial matches between the two sets of 3D points in two frames are obtained from SURF matching, as described earlier. The matching process is not unambiguous and can produce mismatched pairs. In order to make the landmark selection more robust and to estimate the relationship between the two camera poses, 3D rotation and translation RANSAC [Fischler and Bolles, 1981] is applied to refine the matches.
This algorithm uses the typical RANSAC approach to find the inliers among the matched 3D feature points. N pairs of matched points are first randomly selected from the data set. Least-squares 3D point fitting is applied to compute the rotation and translation that best fit the N pairs of matched feature points. This transformation is then applied to all the points in the data set to count the pairs that are consistent with it. The random selection of N pairs and the consistency check are repeated M times to find the transformation with the maximal number of consistent pairs. The resulting subset of matches regarded as correct is then used to calculate the camera pose by least-squares fitting over all of them.
Two methods are used for the least-squares 3D point fitting to improve robustness. One is the least-squares 3D point fitting technique proposed by Arun et al. [1987], which requires no initial value for the transformation but cannot take the uncertainty of the point positions into account. The other is an iterated linearized least-squares approach, in which the uncertainty of the point positions can be considered but an initial value of the transformation is needed. One way to obtain the initial value is to use the result from Arun et al. [1987].
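The following sketch combines the closed-form fitting of Arun et al. [1987] with the RANSAC loop described above; N = 3, M = 200 and the inlier tolerance are illustrative parameters, and the uncertainty-weighted iterated least-squares refinement is omitted.

```python
import numpy as np

def fit_rigid(P, Q):
    """Least-squares rotation R and translation t with Q ~ R @ P + t,
    following Arun et al. [1987] (SVD of the cross-covariance)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # no reflection
    R = Vt.T @ D @ U.T
    return R, cq - R @ cp

def ransac_rigid(P, Q, n=3, m=200, tol=0.05, rng=np.random.default_rng(0)):
    """3D rotation-and-translation RANSAC over matched point sets P, Q
    (each Nx3); tol is the inlier distance threshold in metres."""
    best_inliers = np.zeros(len(P), bool)
    for _ in range(m):
        idx = rng.choice(len(P), n, replace=False)
        R, t = fit_rigid(P[idx], Q[idx])
        err = np.linalg.norm((P @ R.T + t) - Q, axis=1)
        inliers = err < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Final pose from a least-squares fit over all consistent matches.
    R, t = fit_rigid(P[best_inliers], Q[best_inliers])
    return R, t, best_inliers
```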
As a large number of matched 3D points between two frames is usually detected, a subset must be selected to limit the number of landmarks. A quality measure is used to prefer points with the smallest range measurement error while avoiding features clustered within a small area.
4 Data Association
One of the key tasks in SLAM is finding the correct data association: a robot has to determine whether a detected landmark corresponds to a previously seen landmark or to a new one. Vision-based SLAM approaches often use the similarity of feature descriptors from algorithms such as SIFT or SURF as the similarity measurement. If the similarity between two descriptors is above a certain threshold, the landmarks are considered to be the same. This technique can only provide good correspondences when the feature has been observed from similar viewing angles [Gil et al., 2006].
Since SURF or SIFT feature descriptors are only partially invariant to affine projection, the descriptor of the same feature may be significantly different when observed from different viewpoints. For example, a robot may be unable to make the correct data association when moving through the same corridor from different directions. The descriptor also changes as the scale factor or brightness changes, or when image blur occurs, as analysed by [Bay et al., 2008].
[Gil et al., 2006] proposed a method to deal with data association in the context of SIFT features from a pattern classification point of view. The key idea of their method is to track visual landmarks over several consecutive frames and select only those features that stay comparably stable under different viewing angles. This reduces the number of landmarks in the resulting map representation. However, the method requires that a feature be tracked over several consecutive frames before it is selected as a landmark, a condition that is hard to meet when the robot is turning or moving fast.
In this work we propose a different approach to data association in the context of SURF features. A set of SURF feature descriptors is associated with each landmark. The selection procedure for the descriptors is as follows. As soon as a landmark is selected, the first SURF feature descriptor is associated with it. Subsequent descriptors of the landmark are discarded unless they differ to a certain degree from the first descriptor and the other selected ones. At the same time, the estimated camera pose number associated with each selected descriptor is also saved to improve data association.
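A minimal sketch of this descriptor-set bookkeeping is given below; the admission threshold is illustrative.

```python
import numpy as np

class Landmark:
    """Per-landmark descriptor set: a new descriptor is stored only if it
    differs sufficiently from every descriptor already kept."""
    def __init__(self, first_descriptor, pose_id, min_dist=0.25):
        self.descriptors = [np.asarray(first_descriptor)]
        self.pose_ids = [pose_id]        # camera pose of each kept descriptor
        self.min_dist = min_dist         # illustrative admission threshold

    def observe(self, descriptor, pose_id):
        d = np.asarray(descriptor)
        dists = [np.linalg.norm(d - s) for s in self.descriptors]
        if min(dists) > self.min_dist:   # sufficiently different: keep it
            self.descriptors.append(d)
            self.pose_ids.append(pose_id)
```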
Data association can be performed either before or during SLAM. Performing data association during SLAM allows the more accurate landmark and pose information to be used, and is therefore adopted in this approach. Figure 6 shows the flowchart of the proposed data association process. The aim of this data association algorithm is to be robust as well as computationally efficient.
First, the geometry of possible matches is estimated. SURF descriptor matching is then applied with an adaptive matching ratio to guarantee a sufficient number of matches without losing match quality. This approach greatly increases data association accuracy.
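The gating-then-matching step of Figure 6 can be sketched as follows; the gate radius and the fixed ratio (standing in for the adaptive ratio above) are illustrative values, and the landmark type assumes a global position plus the descriptor set sketched earlier.

```python
import numpy as np
from dataclasses import dataclass, field

@dataclass
class MapLandmark:
    position: np.ndarray                       # estimated global 3D position
    descriptors: list = field(default_factory=list)  # kept SURF descriptors

def associate(obs_xyz, obs_desc, landmarks, gate=0.3, ratio=0.7):
    """Geometry-then-descriptor association: gate candidates by distance
    in global coordinates, then match against each candidate's whole
    descriptor set with a ratio test."""
    pairs = []
    for x, d in zip(obs_xyz, obs_desc):
        # Geometric gating: only landmarks near the observation's
        # predicted global position are candidates.
        near = [lm for lm in landmarks
                if np.linalg.norm(lm.position - x) < gate]
        if not near:
            continue                           # no candidate: a new landmark
        # Best descriptor distance per candidate landmark.
        scores = [min(np.linalg.norm(d - s) for s in lm.descriptors)
                  for lm in near]
        best = int(np.argmin(scores))
        if len(scores) == 1 or scores[best] < ratio * sorted(scores)[1]:
            pairs.append((tuple(x), near[best]))
    return pairs
```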
The landmarks can be further filtered according to observation frequency. A step interval is set up at which all landmarks in the global map are checked for consistency. Only key landmarks are kept, with the decision based on the state of the camera movement when the observation was made and on repetitiveness; this reduces the number of landmarks and allows SLAM to perform much more efficiently.
Figure 6. Data association flow chart. (Get new observations and poses; convert the observations into global coordinates; for each observation, find the nearest neighbouring landmarks and SURF-match it against each close feature's descriptor set and best descriptors; classify the descriptors so that new landmarks get an ID with their descriptors while re-observed ones have the descriptor added and their ID updated; then update the global positions using SLAM and remove non-key landmarks.)
To achieve a good estimate of the global landmark positions and odometry, our SLAM optimisation is performed at the end of each non-key landmark removal step. Assuming there is not a large degree of uncertainty in the trajectory, a good estimate of possible loop closures can be calculated.
5 Tests for SLAM
In order to verify the performance of proposed 3D
landmark extraction method, field test data with ground
truth were collected.
5.1 The experiment setup
The experiment is set up to mimic a real search and rescue environment. A SwissRanger camera is mounted on a push trolley, exactly 0.52 m above the ground, which is pushed along a set path marked on the ground. The path also has elevation in the z axis so that it represents a more realistic 3D SLAM situation. Features such as blocks, crates, mannequins and blankets are scattered randomly throughout the environment.
Figure 7a. The test field with ground truth

Figure 7b. The test equipment settings
Data are collected by sampling the camera frames at a set rate. The intensity and range images are stored while the trolley is pushed around the circuit. The trajectory is designed so that similar features can be re-observed while also closing a loop. Figure 7a shows an example of the environment, and Figure 7b shows the test equipment settings.
5.2 The SLAM algorithm
The SLAM algorithm implemented is a recently developed efficient algorithm called 3D I-SLSJF, an iterative sparse local submap joining algorithm [Hu et al., 2009]. This approach is computationally more efficient than the typical maximum likelihood (ML) method and also shows much better accuracy than 3D EKF (extended Kalman filter) SLAM. The method essentially builds several local maps with high local feature accuracy and then joins them to form a global map, both using the least-squares approach. The algorithm exploits the sparseness of the map joining process, making it computationally more efficient. Since the algorithm is incremental in nature, it can easily be implemented in a real-time system.
5.3 Test Results Comparison
The selected 3D landmarks and camera poses from the output of the 3D rotation and translation RANSAC are used as the input to the 3D I-SLSJF SLAM algorithm, as shown in Figure 1. The initial estimates of the camera poses are given a very high uncertainty covariance, so that the model depends heavily on the observations.
Figure 8a shows the results based on odometry only (the odometry is obtained from the relative poses computed by the least-squares fitting between consecutive frames), in which all the landmarks in each frame are treated as new landmarks. Compared with the ground truth trajectory, shown as the green line in the figure, the error is quite large.
Figure 8a. Navigation result with odometry estimation.

SLAM results without applying the proposed data association for the 3D landmarks are shown in Figure 8b, where the continuously observed landmarks are identified. The results show that I-SLSJF SLAM can greatly reduce the error produced by feature matching, and it improves on the pure odometry estimate to some extent.

Figure 8b. SLAM result without data association.
Figure 8c shows the SLAM results with loop closure, using the proposed data association for the 3D landmark data. The navigation result improves significantly when a loop closure is experienced. As with all SLAM methods, the result is still not fully satisfactory, presumably due to the range measurement bias and some other imperfections. Accurate data association and landmark identification are very important to avoid convergence towards a wrong solution.
Figure 8c. SLAM results with the loop closure information.
6 Conclusion
This paper demonstrated that it is possible to extract and match 3D point features using SwissRanger data alone for successful SLAM where the camera moves in six degrees of freedom without odometry information. Although the quality of the intensity images obtained from the SwissRanger is not very good, SURF feature detection can successfully extract and match features in the image plane. The 3D positions of the matched features are then obtained from the range image. After proper outlier removal and data association steps, the extracted 3D landmarks are used in SLAM to obtain an estimate of the camera poses and landmark positions.
This work shows that a single SwissRanger is sufficient for SLAM, without the need to combine it with other sensors. However, there are still some limitations to the current research. For example, the ambiguity of the range measurement needs to be analysed further. The 3D I-SLSJF can still be slow if too many landmarks are selected for SLAM. The success rate of the method largely depends on the data association associated with the feature matching algorithm.
Further research is necessary to make the feature selection and matching more efficient and reliable, so that they can be used in real-time SLAM.
Acknowledgements
This research is partially supported by the Australian Research Council (ARC) Linkage research project "Precision three-dimensional localization system for underground mining vehicles, offering improved productivity and personnel safety" (LP0884112).
The authors thank Dr. Alen Alempijevic, Dr. Jaime Valls Miro and Mr Daniel Hordern for their help in SwissRanger camera configuration and calibration.
References
[Arun et al., 1987] Arun, K. S., T. S. Huang, and S. D. Blostein. Least-squares fitting of two 3-D point sets, IEEE Transactions on Pattern Analysis and Machine Intelligence, 9(5), 698-700.
[Bay et al., 2008] Bay, H., A. Ess, T. Tuytelaars, and L. V. Gool. SURF: Speeded Up Robust Features, Computer Vision and Image Understanding (CVIU), 110(3), 346-359.
[Bouguet, 2008] Bouguet, J.-Y. Camera Calibration
Toolbox for Matlab.
[Cole and Newman, 2006] Cole, D. M., and P. Newman.
Using laser range data for 3D SLAM in outdoor
environments, in Proceedings of IEEE International
Conference on Robotics and Automation.
[CSEM, 2006] CSEM. SwissRanger SR-3000 Manual,
Version 1.02.
[Ellekilde et al., 2007] Ellekilde, L.-P., S. Huang, J. Valls
Miro, and G. Dissanayake. Dense 3D Map Construction
for Indoor Search and Rescue, Journal of Field Robotics,
24(1/2), 71-89.
[Fischler and Bolles, 1981] Fischler, M., and R. Bolles. Random Sample Consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Communications of the Association for Computing Machinery, 24, 381-395.
[Gil et al., 2006] Gil, A., O. Reinoso, O. M. Mozos, C.
Stachniss, and W. Burgard. Improving Data Association
in Vision-based SLAM, in Proceedings of 2006 IEEE/RSJ
International Conference on Intelligent Robots and
Systems, Beijing, China, October 9 - 15.
[Hu et al., 2009] Hu, G., S. Huang, and G. Dissanayake.
3D I-SLSJF: A Consistent Sparse Local Submap Joining
Algorithm for Building Large-Scale 3D Maps, in 48th
IEEE Conference on Decision and Control, Shanghai,
China.
[Karlsson et al., 2005] Karlsson, N., E. Di Bernardo, J.
Ostrowski, L. Goncalves, P. Pirjanian, and M. E. Munich.
The vSLAM algorithm for robust localization and
mapping, in Proceedings of
IEEE International
Conference on Robotics and Automation, Barcelona,
Spain.
[Konolige and Agrawal, 2008] Konolige, K., and M. Agrawal. FrameSLAM: from Bundle Adjustment to Real-time Visual Mapping, IEEE Transactions on Robotics, 24(5), 1066-1077.
[Lingemann et al., 2005] Lingemann, K., A. Nüchter, J.
Hertzberg, and H. Surmann. High-speed laser localization
for mobile robots, Robotics and Autonomous Systems, 51,
275–296.
[Lowe, 2004] Lowe, D. G. Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision, 60(2), 91-110.
[May et al., 2006] May, S., G. Werner, H. Surmann, and
K. Pervölz. 3D time-of-flight cameras for mobile robotics,
in Proceedings of the IEEE International Conference on
Intelligent Robot and Systems, Beijing, China.
[Nuchter et al., 2007] Nuchter, A., K. Lingemann, J. Hertzberg, and H. Surmann. 6D SLAM - 3D Mapping Outdoor Environments, Journal of Field Robotics, 24(8-9), 699-722.
[Oggier et al., 2004] Oggier, T., M. Lehmann, R. Kaufmann, M. Schweizer, M. Richter, P. Metzler, G. Lang, F. Lustenberger, and N. Blanc. An all-solid-state optical range camera for 3D real-time imaging with sub-centimeter depth resolution (SwissRanger), in Proceedings of SPIE, the International Society for Optical Engineering.
[Ohno and Tadokoro, 2005] Ohno, K., and S. Tadokoro.
Dense 3D map building based on LRF data and color
image fusion, in Proceedings of the IEEE International
Conference on Intelligent Robot and Systems.
[Paz et al., 2008] Paz, L. M., P. Pinies, J. D. Tardos, and J. Neira. Large-Scale 6-DOF SLAM With Stereo-in-Hand, IEEE Transactions on Robotics, 24(5), 946-957.
[Robbins et al., 2009] Robbins, S., B. Murawski, and B. Schroeder. Photogrammetric calibration and colorization of the SwissRanger SR-3100 3-D range imaging sensor, Optical Engineering, 48(5), 053603, May 2009.
[Saez et al., 2005] Saez, J. M., F. Escolano, and A.
Peñalver. First steps towards stereo-based 6DOF SLAM
for the visually impaired, in Proceedings of the IEEE
Computer Society Conference on Computer Vision and
Pattern Recognition—Workshops, San Diego, CA, USA.
[Se et al., 2005] Se, S., D. G. Lowe, and J. Little.
Vision-based global localization and mapping for mobile
robots, IEEE Transactions on Robotics, 21(3), 364–375.
[Torr, 2002] Torr, P. H. S. A Structure and Motion
Toolkit in Matlab: Interactive Adventures in S and M.
[Weingarten, 2006] Weingarten, J. Feature-based 3D SLAM, PhD thesis, Swiss Federal Institute of Technology, Lausanne.