2.1 ms, feature extraction: 3.1 ms, initial rotation estimation: 1.0 ms, feature matching: 6.0 ms, inlier detection: 2.2 ms, and motion estimation required less than 0.1 ms. Runtimes on the MAV are slightly higher due to the slower clock speed, but are well within real-time.
B. Mapping
Visual odometry provides locally accurate pose estimates; however, global consistency is needed for metric map generation and navigation over long time-scales. We therefore integrate our visual odometry system with our previous work in RGB-D Mapping [8]. This section focuses on the key decisions required for real-time operation; we refer readers to our previous publication for details on the original algorithm, which emphasizes mapping accuracy [8].

Fig. 3. A plot showing the ground truth trajectory of the vehicle during position hold (X-deviation versus Y-deviation, in meters). The red dot near the center is the origin around which the vehicle was hovering. Duration: 90 s; mean speed: 0.10 m/s; mean position deviation: 6.2 cm; maximum position deviation: 19 cm.

Unlike the local pose estimates needed for maintaining stable flight, map updates and global pose updates are not required at a high frequency and can therefore be processed on an offboard computer. The groundstation computer detects
loop closures, computes global pose corrections, and constructs a 3D log-likelihood occupancy grid map. For coarse navigation, we found a 10 cm resolution to provide a useful balance between map size and precision. Depth data is downsampled to 128×96 prior to a voxel map update to increase the update speed, resulting in a spacing between rays of approximately 5 cm at a range of 6 m. Incorporating a single frame into the voxel map currently takes approximately 1.5 ms.
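As an illustration, a log-odds voxel update of this kind might look as follows. This is a sketch rather than the flight code: the grid extent, origin, and log-odds increments are assumptions, and the coarse per-voxel stepping stands in for an exact ray-traversal algorithm.

```python
import numpy as np

RES = 0.10                                         # voxel edge length: 10 cm
GRID = np.zeros((200, 200, 60), dtype=np.float32)  # log-odds grid, 20 x 20 x 6 m (assumed)
ORIGIN = np.array([-10.0, -10.0, 0.0])             # world position of voxel (0, 0, 0)
L_HIT, L_MISS = 0.85, -0.4                         # log-odds increments (assumed)

def to_cell(p):
    """World point -> voxel index, clamped to the grid bounds."""
    c = np.floor((p - ORIGIN) / RES).astype(int)
    return tuple(np.clip(c, 0, np.array(GRID.shape) - 1))

def integrate_ray(sensor, endpoint):
    """Decrement cells along the ray (free space), increment the endpoint cell."""
    direction = endpoint - sensor
    steps = max(int(np.linalg.norm(direction) / RES), 1)
    for i in range(steps):                         # coarse march through free space
        GRID[to_cell(sensor + direction * i / steps)] += L_MISS
    GRID[to_cell(endpoint)] += L_HIT

def integrate_frame(sensor, endpoints):
    """endpoints: Nx3 world-frame hits from a 128x96 downsampled depth image."""
    for p in endpoints:
        integrate_ray(sensor, p)
```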
We employ a keyframe approach to loop closure: new frames are matched against a small set of keyframes to detect loop closures, using a fast image matching procedure [8]. When a new keyframe is added, a RANSAC procedure over FAST keypoints [16] with Calonder randomized tree descriptors [5] is run. The correspondence is accepted if there are at least 10 inliers with small enough reprojection errors after RANSAC. The final refined relative pose between keyframes is obtained by solving a two-frame sparse bundle adjustment system [13], which minimizes overall reprojection error. The final pose graph is optimized using TORO [6].
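The following sketch illustrates the acceptance test, though not the authors' implementation: it hypothesizes a relative pose by RANSAC over matched 3D feature positions and scores agreement by 3D distance, whereas the paper scores image-space reprojection error; apart from the 10-inlier minimum, the thresholds are assumptions.

```python
import numpy as np

def rigid_transform(src, dst):
    """Least-squares R, t with dst ~ R @ src + t (Kabsch/SVD method)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # fix reflections
    R = Vt.T @ D @ U.T
    return R, cd - R @ cs

def ransac_loop_closure(pts_a, pts_b, iters=200, tol=0.03, min_inliers=10):
    """pts_a, pts_b: Nx3 matched feature positions (meters) in two keyframes.
    Returns (R, t, inlier_mask), or None to reject the candidate loop closure."""
    rng = np.random.default_rng(0)
    best_inliers = None
    for _ in range(iters):
        sample = rng.choice(len(pts_a), size=3, replace=False)   # minimal sample
        R, t = rigid_transform(pts_a[sample], pts_b[sample])
        err = np.linalg.norm(pts_b - (pts_a @ R.T + t), axis=1)
        inliers = err < tol
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < min_inliers:       # the 10-inlier acceptance test
        return None
    # Refit on all inliers; the paper refines further with a two-frame
    # sparse bundle adjustment, omitted here.
    R, t = rigid_transform(pts_a[best_inliers], pts_b[best_inliers])
    return R, t, best_inliers
```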
C. State estimation and control
To control the quadrotor, we integrate the new visual odometry and RGB-D Mapping algorithms into our system previously developed around 2D laser scan-matching and SLAM [2]. The motion estimates computed by the visual odometry are fused with measurements from the onboard IMU in an Extended Kalman Filter. The filter computes estimates of both the position and velocity, which are used by the PID position controller to stabilize the position of the vehicle.
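To make the fusion concrete, here is a minimal sketch of such a filter, reduced to its position-velocity core: IMU accelerations drive the prediction and visual-odometry positions drive the correction. The noise levels are assumed, the onboard filter additionally estimates quantities such as attitude and sensor biases, and with this linear model the EKF machinery reduces to a standard Kalman filter.

```python
import numpy as np

class PosVelFilter:
    """State x = [position, velocity] in the world frame."""
    def __init__(self):
        self.x = np.zeros(6)
        self.P = np.eye(6)
        self.Q = np.eye(6) * 1e-3            # process noise (assumed)
        self.R = np.eye(3) * 1e-2            # VO position noise (assumed)

    def predict(self, accel_world, dt):
        """Propagate with a gravity-compensated IMU acceleration."""
        F = np.eye(6)
        F[:3, 3:] = np.eye(3) * dt           # p <- p + v * dt
        self.x = F @ self.x
        self.x[3:] += accel_world * dt       # v <- v + a * dt
        self.P = F @ self.P @ F.T + self.Q

    def correct(self, z_pos):
        """Fuse a visual-odometry position measurement."""
        H = np.hstack([np.eye(3), np.zeros((3, 3))])   # measure position only
        S = H @ self.P @ H.T + self.R
        K = self.P @ H.T @ np.linalg.inv(S)
        self.x += K @ (z_pos - H @ self.x)
        self.P = (np.eye(6) - K @ H) @ self.P
```

The position and velocity estimates maintained this way are what the PID position controller consumes.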
We keep the SLAM process separate from the real-time control loop, instead having it provide corrections for the real-time position estimates. Since these position corrections are delayed significantly from when the measurement upon which they were based was taken, we incorporate the correction by retroactively modifying the appropriate position estimate in the state history. All future state estimates are then recomputed from this corrected position, resulting in globally consistent real-time state estimates.
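A minimal sketch of this bookkeeping follows, with poses simplified to translations; the actual system composes full poses and would match timestamps approximately rather than exactly.

```python
import numpy as np

class StateHistory:
    """Timestamped position estimates plus the relative motion between
    consecutive entries, so delayed SLAM corrections can be replayed forward."""
    def __init__(self):
        self.times, self.poses, self.deltas = [], [], []

    def append(self, t, delta):
        """delta: relative motion since the previous entry (from visual odometry)."""
        prev = self.poses[-1] if self.poses else np.zeros(3)
        self.times.append(t)
        self.deltas.append(np.asarray(delta, dtype=float))
        self.poses.append(prev + delta)

    def apply_correction(self, t, corrected):
        """Overwrite the estimate at time t, then recompute all later estimates."""
        i = self.times.index(t)               # exact-match lookup for brevity
        self.poses[i] = np.asarray(corrected, dtype=float)
        for j in range(i + 1, len(self.poses)):
            self.poses[j] = self.poses[j - 1] + self.deltas[j]

    def current(self):
        """The globally consistent real-time estimate."""
        return self.poses[-1]
```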
Fig. 4. Trajectories flown by the MAV in two navigation experiments, shown in panels (a) and (b).

Fig. 5. (a) Dense maximum-likelihood occupancy voxel map of the environment depicted in Fig. 4a, false-colored by height. Unknown/unobserved cells are also tracked, but not depicted here. (b) Using the voxel map generated for Fig. 4b, the vehicle plans a collision-free 3D trajectory (green).
III. EXPERIMENTS

We have conducted a number of autonomous flight experiments, both in a motion capture studio and in larger environments at a number of locations around the MIT campus and at the Intel Research office in Seattle. The vehicle flew autonomously with state estimates provided by the algorithms presented in this paper. The vehicle was commanded through the environment by a human operator selecting destination waypoints using a graphical interface.

Figure 3 shows an example where the MAV was commanded to hover at a target point, along with statistics about how well it achieved this goal. Ground truth was recorded by a motion capture system. In larger environments, the SLAM algorithm limits the global drift on its position estimates by detecting loop closures and correcting the trajectory estimates. The trajectory history can then be combined with the depth data to automatically generate maps that are useful both for a human operator's situational awareness and for autonomous path planning and decision making. Figure 5a shows an occupancy voxel map constructed during a flight. Figure 5b shows a rendering of the MAV's internal state estimates as it flew through the environment depicted in Figure 6b, and a path planned using the occupancy map and a simple dynamic programming search strategy. A video demonstrating autonomous flight and incremental mapping is available at: http://groups.csail.mit.edu/rrg/rss2011-mav.
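The paper does not detail the planner, but one "simple dynamic programming search strategy" over the voxel map can be sketched as a Dijkstra-style cost-to-goal pass followed by steepest descent; the 6-connected neighborhood and unit step costs are assumptions.

```python
import heapq
import numpy as np

NBRS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def cost_to_goal(occupied, goal):
    """Dynamic-programming pass: shortest free-space path cost to the goal
    from every voxel. occupied: 3D boolean array; goal: (i, j, k)."""
    cost = np.full(occupied.shape, np.inf)
    cost[goal] = 0.0
    heap = [(0.0, goal)]
    while heap:
        c, (i, j, k) = heapq.heappop(heap)
        if c > cost[i, j, k]:
            continue                          # stale queue entry
        for di, dj, dk in NBRS:
            n = (i + di, j + dj, k + dk)
            if all(0 <= n[a] < occupied.shape[a] for a in range(3)) \
                    and not occupied[n] and c + 1.0 < cost[n]:
                cost[n] = c + 1.0
                heapq.heappush(heap, (c + 1.0, n))
    return cost

def extract_path(cost, start):
    """Follow the steepest descent of the cost field from start to the goal."""
    if not np.isfinite(cost[start]):
        return []                             # start is unreachable
    path, cell = [start], start
    while cost[cell] > 0.0:
        i, j, k = cell
        nbrs = [(i + di, j + dj, k + dk) for di, dj, dk in NBRS]
        nbrs = [n for n in nbrs if all(0 <= n[a] < cost.shape[a] for a in range(3))]
        cell = min(nbrs, key=lambda n: cost[n])
        path.append(cell)
    return path
```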
IV. DISCUSSION AND FUTURE WORK

The system described in this paper enables autonomous MAV flight in many unknown indoor environments. However, there remain a great number of more challenging situations that would severely tax our system's abilities. Motion estimation algorithms based on matching visual features, such as ours and virtually all other visual odometry techniques, do not perform as well in regions with few visual features. In large open areas, the visible structure is often far beyond the maximum range of stereo-based sensors.
Handling these challenges will likely require the integration of other sensors, such as laser range-finders. As these sensors have different failure modes, they serve to complement each other's capabilities. Additional sensing modalities can reduce, but not eliminate, state estimation failures. Further robustness can be gained by designing planning and control systems able to respond appropriately when the state estimates are extremely uncertain [4], or to plan in ways that minimize future uncertainty [7].

Looking further afield, the current system allows the vehicle to reason about the world in places that it has mapped, or that are within the sensing horizon of its range sensors. However, it will usually not have information in the direction of travel. Extending the planning horizon will require us to break the dependence on metric range sensing and leverage the qualitative information about the environment available in monocular imagery. In doing so, we hope to continue to increase the capabilities of the vehicle to plan and explore in large-scale unknown environments.

ACKNOWLEDGMENTS

This research was supported by the Office of Naval Research under MURI N00014-07-1-0749 and the Army Research Office under the MAST CTA.
Fig. 6. Textured surfaces generated using sparse bundle adjustment, with data collected from autonomous flights.