Mahesh K. Singh
1 Introduction
Estimating the motion and pose of a mobile robot from visual information is one of the critical problems in machine vision and has been an active research area for several years. It has a wide scope of applications, ranging from accurate robot navigation and surveillance to augmented reality and real-time manipulation. A number of different sensing modalities have been used for pose estimation, e.g. vision, acoustics, infrared, ultrasonic waves, and radar. This paper discusses vision-based tracking of a robotic system. Controlling the motion of a robot requires continuous determination of its position and orientation (pose) in each image of a real-time sequence (video) [3].
In the literature, many works on tracking a mobile robot using 2D/3D visual information have been reported. Strelow et al. [17] have used a panoramic vision system for determining the trajectory of a six degree-of-freedom (DoF) robotic platform. Cozman et al. [1] have presented a novel method for reliable position estimation of a Viper system in a natural environment. Stephen et al. [12] have determined the localization of a planetary rover using a stereo camera and SIFT features as landmarks. Hartmut et al. [18] have presented an automatic robotic system for gauging and digitization of 3D indoor scenes using a 3D laser range finder. Lamon et al. [9] have presented a 3D position tracking method for a mobile robot in rough terrain using 3D odometry. Later, Lamon [8] has carried out a comprehensive study of 3D pose tracking methods. Saudi et al. [10] have described a 3D vision-based system for localization of a mobile robot in natural terrain; they obtained the 3D data by passive stereo and tracked the robot using the Binary Corner Detector (BCD) algorithm. Sheshadri et al. [13] have presented precise localization of a mobile rover by registration of scene data using an image correlation technique. Indrazno et al. [16] have demonstrated position-based visual tracking of a 7-DoF redundant robotic manipulator using 3D information obtained from a Kinect. Cupec et al. [2] have presented a pose tracking algorithm for a mobile robot based on extraction of planar segments from range images obtained from a Kinect. Zhang and Zhao [21] have presented a 3D pose tracking and localization method for an Autonomous Mobile Rover (AMR) using a ToF camera. All of the above methods estimate the pose of a mobile robot without modelling the visual sensor error. Therefore, these methods carry an inherent intrinsic error that needs to be addressed for more accurate motion analysis.
In this paper, we first present an error modelling technique for the Kinect sensor, which is used for rover motion planning. The Kinect sensor extracts the rover range information based on color markers mounted on the rover body, and a color histogram-based method is used to determine the pose during motion. We then present a Kalman filter based rover pose tracking scheme that provides a minimum-error path for the mobile rover in real time.
The remainder of the paper is structured as follows: the detailed description of the proposed method is given in Section 2, Section 3 presents the experimental analysis and results, and the paper is summarized in Section 4.
2 Proposed method
A Kinect sensor (V1) has been used to acquire depth information for real-time 3D sensing of the rover, as shown in Fig. 1(a). It consists of an infrared light emitter, an infrared camera of resolution 640 × 480, and an RGB camera of resolution 640 × 480 (which can also produce a resolution of 1280 × 1024). The IR emitter projects an infrared pattern onto the scene, and the infrared and RGB cameras simultaneously capture the depth and color information at a maximum rate of 30 frames/second. The operational range of the Kinect sensor (V1) is from 0.50 m to 4.5 m, with a field of view of 45° vertically, 58° horizontally and 70° diagonally. It is a low-cost, compact range sensing device; for a more detailed description of the Kinect sensor, the reader may refer to [5, 7, 14, 15, 22]. The depth information is computed from the relative disparity between the reference and projected planes. The reference plane, also called the calibrated plane, provides the known distance of each pixel from the Kinect.
Fig. 1. (a) Kinect sensor and its intrinsic coordinate frame; (b) determining the depth information from the measured disparity.

Fig. 1(b) illustrates the relation between the distance of a scene point k from the Kinect sensor, the given reference plane, and the computed disparity d [7]. The depth $z_k$ of the point k in scene space is calculated as

$$ z_k = \frac{z_0}{1 + \frac{z_0}{fb}\, d} \qquad (1) $$
where $z_0$ is the known distance between the reference plane and the sensor, f denotes the focal length of the infrared (IR) camera, and b denotes the base length. The value $z_k$ of a given point k, together with f, defines the imaging scale; the corresponding X and Y coordinates are therefore computed from the image coordinates and depend on the measured $z_k$. The Kinect emitter projects an IR dot pattern onto the scene, which can easily be corrupted or absorbed under unfavourable environmental lighting conditions. This affects the correlation and disparity measurement and produces noisy 3D information.
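To make the mapping of Eq. (1) concrete, the following minimal C++ sketch recovers the depth of a point from its measured disparity and back-projects its X and Y coordinates; the constants z0, f and b are illustrative placeholders, not the calibration values of the sensor used in this work.

```cpp
// Illustrative sketch of Eq. (1): recovering depth from the measured disparity d
// of a Kinect (V1). All constants below are assumed placeholder values.
#include <iostream>

int main() {
    const double z0 = 2.0;     // distance of the reference plane [m] (assumed)
    const double f  = 580.0;   // IR camera focal length [pixels] (assumed)
    const double b  = 0.075;   // emitter-camera baseline [m] (assumed)

    const double d  = 3.0;     // measured disparity [pixels] (example value)

    // Eq. (1): z_k = z0 / (1 + (z0 / (f * b)) * d)
    const double zk = z0 / (1.0 + (z0 / (f * b)) * d);

    // X, Y follow from the image coordinates (x, y) scaled by z_k / f
    const double x = 120.0, y = -40.0;   // pixel offsets from the principal point
    const double Xk = x * zk / f;
    const double Yk = y * zk / f;

    std::cout << "z_k = " << zk << " m, X = " << Xk << " m, Y = " << Yk << " m\n";
    return 0;
}
```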
Figure 2(a) shows the wheeled mobile rover that has been developed in our robotics lab. It is a 10-DoF rocker-bogie type robot with a 4-DoF manipulator mounted on its body. It is driven by eight Maxon EC series brushless DC motors, each controlled by an individual Maxon EPOS position control unit. The rover can navigate over any type of uneven terrain and can therefore be called an articulated all-terrain rover (ATR).
Fig. 2. (a) Side view of the rover on a sandy terrain and (b) pose of the rover w.r.t. the global coordinate frame.
In the indoor setup, we have positioned the Kinect sensor 170 cm above the terrain on an iron chassis attached to the ceiling. First, we define the global {G} and rover {R} coordinate systems through color markers for determining the tracking errors in the pose. The rover pose vector $H = [X, Y, Z, \varphi_x, \varphi_y, \varphi_z]^T$ relative to the global frame {G} is defined as shown in Fig. 2(b), where $(X, Y, Z)$ is the position vector and $(\varphi_x, \varphi_y, \varphi_z)$ are the orientations about the respective axes, i.e. the roll, pitch and yaw (heading) angles. Since the Kinect sensor measures the poses of both frames, a frame transformation has to be applied. Figure 3(a) illustrates the coordinate transformation scheme. The transformation of the Kinect camera frame {C} relative to the global frame {G} is denoted $T^G_C$ and the transformation of the rover frame {R} relative to the Kinect frame {C} is denoted $T^C_R$. Therefore, the transformation of the rover frame {R} relative to the global frame {G} is $T^G_R = T^G_C\, T^C_R$.
Fig. 3. (a) Transformations between the different frames and (b) axes of the ellipsoid proportional to the standard deviation in the respective directions.
Now the objective is to calculate $T^G_R = [r^G_R \;\; t^G_R]$ and $T^C_R = [r^C_R \;\; t^C_R]$. The positions of both frames are directly measured by the Kinect. For the measurement of the rotation matrices, three color markers have been placed on the rover and on the ground frame, as shown in Fig. 2(a). The blue marker is defined as the measuring point on the rover, and $\hat{X}$ is the normalized direction defined by the markers; together with the remaining axes it forms the rotation matrix $r_R$, so that

$$ T^C_R = \begin{bmatrix} r_R & t_R \\ 0 & 1 \end{bmatrix} \qquad (7) $$
Similarly, we can compute $T^G_C$ using three different sets of color markers.
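As an illustration of the frame construction and composition described above, the following C++ sketch (using the Eigen library) assembles a homogeneous transform of the form of Eq. (7) from three markers expressed in the Kinect frame and composes the rover pose relative to the global frame. The marker coordinates and the particular choice of axis construction are assumptions for illustration, not the measured values or the exact procedure used in this work.

```cpp
// Minimal sketch: build marker-defined frames and compose the rover pose.
#include <Eigen/Dense>
#include <iostream>

// 'origin' fixes the position, 'xMarker' fixes the X direction and
// 'yMarker' fixes the X-Y plane (all measured in the Kinect frame {C}).
Eigen::Matrix4d buildFrame(const Eigen::Vector3d& origin,
                           const Eigen::Vector3d& xMarker,
                           const Eigen::Vector3d& yMarker) {
    Eigen::Vector3d xhat = (xMarker - origin).normalized();
    Eigen::Vector3d zhat = xhat.cross((yMarker - origin).normalized()).normalized();
    Eigen::Vector3d yhat = zhat.cross(xhat);          // completes a right-handed frame

    Eigen::Matrix4d T = Eigen::Matrix4d::Identity();
    T.block<3,1>(0,0) = xhat;
    T.block<3,1>(0,1) = yhat;
    T.block<3,1>(0,2) = zhat;
    T.block<3,1>(0,3) = origin;                       // [r t; 0 1] as in Eq. (7)
    return T;
}

int main() {
    // Rover and ground marker triplets, both expressed in the Kinect frame {C}
    // (illustrative coordinates only).
    Eigen::Matrix4d T_C_R = buildFrame({0.2, 0.1, 1.8}, {0.5, 0.1, 1.8}, {0.2, 0.4, 1.8});
    Eigen::Matrix4d T_C_G = buildFrame({0.0, 0.0, 2.0}, {0.3, 0.0, 2.0}, {0.0, 0.3, 2.0});

    // Rover pose relative to the global frame: T^G_R = (T^C_G)^(-1) * T^C_R.
    Eigen::Matrix4d T_G_R = T_C_G.inverse() * T_C_R;
    std::cout << T_G_R << "\n";
    return 0;
}
```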
In this subsection, we present the error analysis of the 3D data obtained from the Kinect sensor based on the computed depth data (see Fig. 1(b)). In Equation (1), the constant parameters are known from the datasheet of the Kinect sensor, and the disparity d is assumed to be a normally distributed random variable. Using the propagation relationship between depth and disparity, we develop a mathematical model by computing the covariance matrix from the measured error, which represents the uncertainty in the spatial locations of the data samples acquired by the Kinect sensor.
The variance of the depth measurement ($\sigma_{z_k}^2$) can be computed from the variance of the disparity measurement ($\sigma_d^2$) as [7]

$$ \sigma_{z_k}^2 = \left( \frac{\partial z_k}{\partial d} \right)^2 \sigma_d^2 \qquad (8) $$
Carrying out the differentiation, we obtain

$$ \sigma_{z_k} = \frac{1}{fb}\, z_k^2\, \sigma_d \qquad (9) $$

Since the depth is used in the computation of the X and Y coordinates, the errors in X and Y can also be described as second-order functions of the depth [7], computed as

$$ \sigma_{X_k} = \frac{x}{f^2 b}\, z_k^2\, \sigma_d = \frac{x}{f}\, A z_k^2 \qquad (10) $$

$$ \sigma_{Y_k} = \frac{y}{f^2 b}\, z_k^2\, \sigma_d = \frac{y}{f}\, A z_k^2 \qquad (11) $$

$$ \sigma_{Z_k} = \frac{1}{f b}\, z_k^2\, \sigma_d = A z_k^2 \qquad (12) $$

where $A = \sigma_d/(fb)$ and $(x, y)$ are the image coordinates of the point.
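A short C++ sketch of the uncertainty model of Eqs. (9)-(12) is given below; it evaluates the standard deviations of X, Y and Z for a few depths. The values of f, b and the disparity noise sigma_d are assumed placeholders, not the calibrated figures for the sensor used here.

```cpp
// Per-point 3D uncertainty per Eqs. (9)-(12): standard deviations grow
// quadratically with depth. All constants are assumed placeholder values.
#include <iostream>

struct PointSigma { double sx, sy, sz; };

PointSigma pointUncertainty(double x, double y, double zk,
                            double f, double b, double sigma_d) {
    const double A = sigma_d / (f * b);    // common quadratic factor
    PointSigma s;
    s.sz = A * zk * zk;                    // Eq. (12): sigma_Z = A z_k^2
    s.sx = (x / f) * A * zk * zk;          // Eq. (10)
    s.sy = (y / f) * A * zk * zk;          // Eq. (11)
    return s;
}

int main() {
    const double f = 580.0, b = 0.075, sigma_d = 0.5;   // assumed values
    for (double z = 1.0; z <= 4.0; z += 1.0) {
        PointSigma s = pointUncertainty(100.0, 100.0, z, f, b, sigma_d);
        std::cout << "z = " << z << " m  sigma_Z = " << s.sz << " m\n";
    }
    return 0;
}
```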
Figure 3(b) shows the uncertainty (standard deviation) ellipsoid map of the 3D data over the entire measurable Cartesian space. These random errors affect the computation of the rover pose during motion; therefore, we need to calculate the random error in the orientation vectors of the ground and rover frames.
The uncertainties in the measurement of $\hat{X}$, $\hat{Y}$ and $\hat{Z}$ of the rover frame {R}, when measured relative to the Kinect sensor frame {C}, are calculated as follows. The variances in $\hat{X}$ and $\hat{Y}$ are obtained by adding the variances of the measurements of the two markers defining the respective axis. Since $\hat{Z}$ is perpendicular to both of these directions, its variance is obtained by treating each component of $\hat{X}$ and $\hat{Y}$ as an independent random variable and propagating the variance using the expression for the variance of the product of two Gaussian random variables [6].
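The expression referred to above for the variance of the product of two independent random variables U and V [6] can be written as

$$ \mathrm{Var}(UV) = \mu_U^2 \sigma_V^2 + \mu_V^2 \sigma_U^2 + \sigma_U^2 \sigma_V^2 $$

and, since each component of $\hat{Z} = \hat{X} \times \hat{Y}$ is a difference of two such products, its variance follows by applying this relation term by term.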
The rover state is propagated by a linear process model $x_k = A\, x_{k-1} + \epsilon_k$, where $x_k$ is the state of the dynamic system at the $k$-th time step, $A$ is the state transition matrix built from $2 \times 2$ blocks $\begin{bmatrix} 1 & \delta T \\ 0 & 1 \end{bmatrix}$, $\delta T$ is the sampling period, and $\epsilon_k$ denotes the process noise vector, characterized by a Gaussian distribution with zero mean.
The measurement model is described as

$$ z_k = B\, x_k + \gamma_k \qquad (18) $$

where $z_k$ is the measured pose vector at time step k and $B$ is again a block-diagonal matrix with $2 \times 2$ blocks $\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$. The random variables $\gamma_k$ and $\epsilon_k$ represent the measurement and process noise, respectively. They are independent of each other, with normal probability distributions $p(\gamma_k) \sim N(0, R)$ and $p(\epsilon_k) \sim N(0, Q)$, where $Q$ and $R$ denote the process noise covariance and the measurement noise covariance, the latter computed as described in Eq. (15). The a priori and a posteriori estimation errors and their covariances are then defined as $e_k^- = x_k - \hat{x}_k^-$, $P_k^- = E[e_k^- e_k^{-T}]$, $e_k = x_k - \hat{x}_k$ and $P_k = E[e_k e_k^T]$, where $\hat{x}_k$ denotes the a posteriori state estimate at time step k given the measurement $z_k$. The Kalman filter works in two steps: first the prediction step, where the future state of the dynamical system is estimated from the past measurements, and then the update step, where the current state of the dynamical system is computed from the obtained measurement.
Prediction:

$$ \hat{x}_k^- = A\, \hat{x}_{k-1} \qquad (19) $$

$$ P_k^- = A\, P_{k-1} A^T + Q_k $$

Update:

Compute the Kalman gain:

$$ K_k = P_k^- B^T \left( B\, P_k^- B^T + R_k \right)^{-1} \qquad (20) $$

Update the state estimate with the measurement:

$$ \hat{x}_k = \hat{x}_k^- + K_k \left( z_k - B\, \hat{x}_k^- \right) \qquad (21) $$

Update the error covariance:

$$ P_k = (I - K_k B)\, P_k^- \qquad (22) $$
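The predict/update cycle of Eqs. (19)-(22) can be illustrated with the following minimal C++ sketch (using Eigen) for a single pose component and its rate; the full filter stacks one such 2 × 2 block per pose component. The time step, noise levels and measurement sequence below are assumed values for illustration only.

```cpp
// Kalman filter predict/update sketch for one pose component, Eqs. (19)-(22).
#include <Eigen/Dense>
#include <iostream>

int main() {
    const double dt = 1.0 / 30.0;                      // Kinect frame period (assumed)

    Eigen::Matrix2d A, B, Q, R, P;
    A << 1.0, dt,                                      // constant-velocity block [1 dT; 0 1]
         0.0, 1.0;
    B << 1.0, 0.0,                                     // only the position is measured
         0.0, 0.0;
    Q = 0.01 * Eigen::Matrix2d::Identity();            // process noise (assumed)
    R = 0.05 * Eigen::Matrix2d::Identity();            // measurement noise, cf. Eq. (15)
    P << 0.02, 0.0,                                    // initial covariance block (assumed,
         0.0,  0.01;                                   // cf. P_{0,0} in Section 3)

    Eigen::Vector2d x(0.0, 0.0);                       // state: [position, velocity]

    const double measurements[] = {0.05, 0.11, 0.14, 0.21, 0.26};   // example data
    for (double zm : measurements) {
        Eigen::Vector2d z(zm, 0.0);

        // Prediction, Eq. (19)
        x = A * x;
        P = A * P * A.transpose() + Q;

        // Kalman gain, Eq. (20)
        Eigen::Matrix2d K = P * B.transpose() * (B * P * B.transpose() + R).inverse();

        // State and covariance update, Eqs. (21)-(22)
        x = x + K * (z - B * x);
        P = (Eigen::Matrix2d::Identity() - K * B) * P;

        std::cout << "filtered position: " << x(0) << "\n";
    }
    return 0;
}
```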
Fig. 5. Dynamic execution of pose estimations by Kalman filter and forward kinematics
at different time steps: (a) Rover 3D trajectory on a linear path and (b) Rover 3D
trajectory on ‘S’ shaped circular path.
3 Experimental Results
In this section, the experimental results of the above described pose tracking
method are presented. The proposed pose tracking method demonstrated in
Section 2 is coded in C++ programming language and implemented on an Intel(R)
Core(TM) i7-2600 CPU, 8GB RAM and windows 7 operating system.
The proposed approach is experimentally evaluated on the 3D dataset of terrain
obtained from a Kinect 3D sensor (V1) which is mounted above indoor created
terrain through ceiling on a mobile rover. The default Kalman filter parameters
are as follows: 𝑃0,0 is a diagonal block matrix of 2 × 2 diag [0.02 0.01] and 𝑅0 is
Kinect random covariance noise that is calculated in the previous section.
The rover is instructed to move along a predefined trajectory (a straight line) on flat terrain. The color and depth data from the Kinect sensor are automatically measured and stored to determine the rover motion along the optimum (minimum-error) pose direction using the Kalman filter. To assess the accuracy of the estimated parameters, the pose estimates are compared with the corresponding poses computed through the simulated forward kinematics of the mobile rover. For a more detailed study of the kinematic modeling and analysis of the mobile rover, the reader can refer to [19]. Extensive real-time experiments have been performed to compare the Kalman-filtered results with the forward kinematics simulated data. Figure 4(a) shows the variation of the X-direction position estimated by the Kalman filter together with the forward kinematics data during the motion of the rover at various time instants; the Kalman-filtered position data track the kinematic position with minimum error. Similarly, Figures 4(b-c) show the variation of the Y- and Z-direction positions estimated by the Kalman filter and by the kinematic model; here too, the Kalman-filtered data track the kinematic position with minimum error. Figure 4(d) shows the change in the orientation about the X-axis ($\varphi_x$) of the rover during motion; the filtered data produce minimum error along the motion. Similarly, Figures 4(e-f) show the change in the orientations about the Y- and Z-axes ($\varphi_y$, $\varphi_z$) of the rover during motion; in this case as well, the filtered data produce minimum orientation error. Figure 5(a) shows the 3D trajectory of the mobile rover tracked by the Kalman filter together with the forward kinematic path during the motion of the rover at various time instants; the Kalman-filtered 3D pose data track the kinematic path with minimum error. The comparison of the RMS pose error of the proposed method with other approaches is shown in Table 1, which clearly shows that the proposed method outperforms them with the minimum RMS pose error. Next, the rover moves on an 'S' shaped circular path on the terrain, and its 3D tracking trajectory at various time instants is shown in Fig. 5(b). The experimental results show that the rover moves on the terrain along the filtered pose direction and achieves the desired goals with minimum error.
Table 1. Comparison of the pose error of the proposed method with other approaches.
4 Conclusion
In this paper, we have presented a new method for pose tracking of a mobile rover using 3D information provided by the Kinect sensor. Since the Kinect sensor's 3D data are prone to unavoidable random errors that are not negligible for accurate pose tracking, we have presented a propagation error model for use during the computation of the pose. A Kalman filter is used to determine a more reliable tracking path from the erroneous 3D information. The experimental results have shown more precise tracking of the rover pose, which validates the proposed method. In the future, we intend to develop a real-time feedback controller that provides more accurate trajectory information for the mobile rover.
References