An Approach To Visual Servoing Based On Coded Light
Jordi Pagès, Christophe Collewet, François Chaumette, Joaquim Salvi
I. INTRODUCTION
Visual servoing is a widely used technique for controlling robots from data provided by visual sensors. The most typical configuration is eye-in-hand, in which a camera is attached to the end-effector of the robot. Typical tasks, such as positioning the robot with respect to an object or tracking a target, are then fulfilled by a control loop based on visual features extracted from the images [1].
All visual servoing techniques assume that visual measures can be extracted from the object, either to estimate its full or partial pose or to feed a given set of features into the control loop. Therefore, visual servoing offers no solution for positioning with respect to non-textured objects, or objects for which extracting visual features is too complex or too time consuming. Note that the sampling rate in visual servoing must be high enough not to penalise the dynamics of the end-effector and the stability of the control scheme.
A possible solution to this problem is to project structured light onto the objects in order to obtain visual features. There are few works in this field, and they are mainly based on the use of laser pointers and laser planes [2]-[4]. Furthermore, they are usually designed for positioning with respect to planar objects or to specific non-planar objects such as spheres. In this paper, we propose the use of coded structured light [5]. This is a powerful technique based on the projection of coded light patterns which provide robust visual features. It has been largely used in shape acquisition applications based on triangulation, but it has never been used in a visual servoing framework. With coded patterns, visual features are available independently of the object appearance, so that visual servoing techniques can overcome this limitation.
A. Formal definition

Let M be a matrix of dimensions $r \times v$ where each element is taken from an alphabet of $k$ elements $\{0, 1, \dots, k-1\}$. If M has the window property, i.e. each different submatrix of M of dimensions $n \times m$ appears exactly once, then M is a perfect map. If M contains all submatrices of $n \times m$ except the one filled by 0s, then M is called an M-array or pseudo-random array [11]. This kind of array has been widely used in pattern codification because the window property allows every different submatrix to be associated with an absolute position in the array. An example of a $4 \times 6$ binary M-array with window property $2 \times 2$ is

$$M = \begin{pmatrix} 0 & 1 & 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 1 & 1 & 1 & 0 \end{pmatrix} \qquad (1)$$
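Since every $2 \times 2$ window of (1) is distinct, each window pins down an absolute position in the array. As a quick sanity check, the window property can be verified exhaustively; the following Python sketch (a minimal illustration, function name ours, not part of the original paper) hashes every window and confirms none repeats:

```python
import numpy as np

def has_window_property(M, n, m):
    """Check that every n-by-m submatrix of M appears at most once,
    so each window identifies a unique absolute position in the array."""
    seen = set()
    rows, cols = M.shape
    for i in range(rows - n + 1):
        for j in range(cols - m + 1):
            key = M[i:i + n, j:j + m].tobytes()
            if key in seen:
                return False  # duplicated window: property violated
            seen.add(key)
    return True

# The 4x6 binary M-array of equation (1), checked with 2x2 windows.
M = np.array([[0, 1, 1, 1, 1, 0],
              [0, 0, 1, 1, 0, 0],
              [0, 1, 0, 0, 1, 0],
              [0, 1, 1, 1, 1, 0]])
assert has_window_property(M, 2, 2)
```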
This type of array can be constructed by folding a pseudo-random sequence [11], which is the one-dimensional variant of an M-array. In this case, however, the length of the pseudo-random sequence, the size of the resulting M-array and the size of its window are correlated. Therefore, a generic M-array of given dimensions with a given window property cannot always be constructed. In order to cope with this constraint, an alternative consists in generating a perfect submap. This type of array also has the window property, but not all the possible windows are included. Morano et al. [9] proposed a brute-force algorithm for generating perfect submaps. For example, in order to generate a $6 \times 6$ M-array with window property $3 \times 3$ using an alphabet of 3 elements, the procedure is as follows: firstly, a $3 \times 3$ subarray is randomly chosen and placed in the north-west vertex of the M-array that is being built, as shown in Fig. 2a. Then, consecutive random columns are added rightwards, as shown in Fig. 2b-c. The random elements are only inserted if the window property of the global array is maintained, i.e. if no repeated $3 \times 3$ subarrays appear. Similarly, random rows are added downwards from the initial subarray position, as shown in Fig. 2d-e. Afterwards, the remainder of the array is filled by completing rows and columns until the end. Note that certain combinations of array size, window size and alphabet length will not produce any result.
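The brute-force idea can be sketched in a few lines of Python. This is a simplified variant, not the exact procedure of [9]: cells are filled in raster order rather than by growing columns and rows, but the principle is the same — each new cell completes at most one window, which is accepted only if it has not been seen before, and the search restarts from scratch on a dead end:

```python
import random
import numpy as np

def generate_perfect_submap(rows, cols, n, k, max_restarts=10000):
    """Brute-force search for a rows x cols array over alphabet {1..k}
    in which every n x n window appears at most once (perfect submap).
    Simplified variant of Morano et al.: raster-order filling."""
    for _ in range(max_restarts):
        M = np.zeros((rows, cols), dtype=int)
        seen = set()
        ok = True
        for i in range(rows):
            for j in range(cols):
                for value in random.sample(range(1, k + 1), k):
                    M[i, j] = value
                    # The only window completed by cell (i, j) is the
                    # one whose bottom-right corner is (i, j).
                    if i >= n - 1 and j >= n - 1:
                        w = M[i - n + 1:i + 1, j - n + 1:j + 1].tobytes()
                        if w in seen:
                            continue  # repeated window, try next value
                        seen.add(w)
                    break
                else:
                    ok = False  # no value works: restart from scratch
                    break
            if not ok:
                break
        if ok:
            return M
    return None  # this size/window/alphabet combination produced nothing

M = generate_perfect_submap(6, 6, 3, 3)
```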
B. Pattern design
Fig. 2. Construction of a $6 \times 6$ perfect submap with window property $3 \times 3$ over an alphabet of 3 elements: a) initial random $3 \times 3$ subarray; b)-c) random columns added rightwards; d)-e) random rows added downwards; f) remainder of the array filled by completing rows and columns.
The time variation of the visual features $\mathbf{s}$ is related to the camera motion by

$$\dot{\mathbf{s}} = \mathbf{L_s} \mathbf{v} \qquad (2)$$

where $\mathbf{s}$ is a vector containing the visual feature values, $\mathbf{L_s}$ is the so-called interaction matrix, and $\mathbf{v} = (v_x, v_y, v_z, \omega_x, \omega_y, \omega_z)$ the camera velocity screw.
The goal of visual servoing consists in moving the robot from an initial relative robot-object pose to a desired one in which a desired set of visual features $\mathbf{s}^*$ is obtained. Most applications obtain the desired features $\mathbf{s}^*$ by using the teaching-by-showing approach. In this case, the robot is first moved to the desired position, an image is acquired and $\mathbf{s}^*$ is computed. This is useful, for example, for robots having bad odometry, such as mobile robots. In this case, the exact goal position can be reached from its surroundings by using the visual servoing approach.
A robotic task can be described by a function which must be regulated to 0 [12]. Concretely, when the number of visual features is higher than the m degrees of freedom of the camera, the task function is defined as the following m-dimensional vector

$$\mathbf{e} = \widehat{\mathbf{L}_\mathbf{s}}^{+} (\mathbf{s} - \mathbf{s}^*) \qquad (3)$$

where $\mathbf{s}$ are the visual features corresponding to the current state and $\widehat{\mathbf{L}_\mathbf{s}}^{+}$ is the pseudoinverse of a model or an approximation of the interaction matrix. A typical control law for cancelling the task function and therefore moving the robot to the desired position is [12]

$$\mathbf{v} = -\lambda \widehat{\mathbf{L}_\mathbf{s}}^{+} (\mathbf{s} - \mathbf{s}^*) \qquad (4)$$

where $\lambda$ is a positive gain. This control law is known to ensure local stability when

$$\widehat{\mathbf{L}_\mathbf{s}}^{+} \mathbf{L_s} > 0 \qquad (5)$$
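As an illustrative sketch (assuming numpy; the function name and default gain are ours, not from the paper), the task function (3) and control law (4) amount to a pseudoinverse and a gain:

```python
import numpy as np

def control_law(s, s_star, L_hat, lam=0.1):
    """Task function (3) and control law (4):
    e = pinv(L_hat) @ (s - s_star),  v = -lam * e.
    Returns the camera velocity screw (vx, vy, vz, wx, wy, wz)."""
    e = np.linalg.pinv(L_hat) @ (s - s_star)  # m-dimensional task function
    return -lam * e
```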
In our case, the visual features are the normalised coordinates of the $k$ points decoded in the pattern, stacked as

$$\mathbf{s} = (x_1, y_1, x_2, y_2, \dots, x_k, y_k) \qquad (6)$$

whose interaction matrix is obtained by stacking the well-known $2 \times 6$ interaction matrices of the image points

$$\mathbf{L_s} = \begin{pmatrix}
-1/Z_1 & 0 & x_1/Z_1 & x_1 y_1 & -(1+x_1^2) & y_1 \\
0 & -1/Z_1 & y_1/Z_1 & 1+y_1^2 & -x_1 y_1 & -x_1 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
-1/Z_k & 0 & x_k/Z_k & x_k y_k & -(1+x_k^2) & y_k \\
0 & -1/Z_k & y_k/Z_k & 1+y_k^2 & -x_k y_k & -x_k
\end{pmatrix} \qquad (8)$$

where $Z_i$ is the depth of the $i$-th point.
Fig. 5. a) Robot manipulator and the plane with the encoded pattern projected on it. b) Elliptic cylinder used in the second experiment.
A. Planar object
The first experiment consists in positioning the robot with
respect to a plane. Fig. 5a shows the robot manipulator and
the plane with the encoded pattern projected on it. The desired
position has been defined so that the camera is parallel to the
plane at a distance of 90 cm. The reference image acquired
in the desired position is shown in Fig. 6a. In this image
a total number of 370 coloured spots out of 400 have been
successfully decoded. The initial position of the robot in the
experiment has been defined from the desired position, by
moving the robot 5 cm along its X axis, 10 cm along
Y , 20 cm along Z, and rotations of 15 about X and
10 about Y have been applied. The image perceived in this
configuration is shown in Figure 6b. In this case, the number of
decoded points is 361. Matching points between the initial and
the desired images is straightforward thanks to the decoding
process of the pattern.
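A minimal sketch of this matching step, assuming each image has been decoded into a map from absolute pattern code to normalised point coordinates (the names and data layout are ours):

```python
def match_decoded_points(current, desired):
    """Pair points by the absolute code recovered from the pattern.
    `current` and `desired` map code id -> (x, y); only codes decoded
    in both images contribute to the feature vectors s and s*."""
    common = sorted(current.keys() & desired.keys())
    s = [c for code in common for c in current[code]]
    s_star = [c for code in common for c in desired[code]]
    return s, s_star, common
```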
The goal is then to move the camera back to the desired position by using visual servoing. At each iteration, the visual feature set $\mathbf{s}$ in (6) is filled with the points matched between the current and the desired image. The normalised coordinates of the points are obtained by using an approximation of the camera intrinsic parameters. The control law (4) is computed at each iteration with $\widehat{\mathbf{L}_\mathbf{s}} = \mathbf{L}_{\mathbf{s}^*}$. The results of the servoing are presented in Fig. 6c-d. Concretely, the camera velocities generated by the control law are plotted in Fig. 6c, and the norm of the task function, which decreases at each iteration, is shown in Fig. 6d. As can be seen, the behaviour of both the task function and the camera velocities is satisfactory, and the robot reaches the desired position without problems, as for classical image-based visual servoing.
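Putting the previous sketches together, one iteration of the loop with $\widehat{\mathbf{L}_\mathbf{s}} = \mathbf{L}_{\mathbf{s}^*}$ can be simulated on toy values; the code ids and coordinates below are made up purely for illustration, and the desired depth of 0.9 m echoes the first experiment:

```python
import numpy as np

# Hypothetical decoded points: code id -> normalised (x, y).
desired = {7: (0.00, 0.00), 12: (0.10, 0.05), 21: (-0.08, 0.12)}
current = {7: (0.02, -0.01), 12: (0.13, 0.04), 21: (-0.05, 0.15)}

s, s_star, codes = match_decoded_points(current, desired)
# Constant interaction matrix evaluated at the desired configuration.
L_star = interaction_matrix([desired[c] for c in codes], [0.9] * len(codes))
v = control_law(np.array(s), np.array(s_star), L_star)
print(v)  # camera velocity screw driving the feature error to zero
```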
B. Non-planar object
In the second experiment a non-planar object has been used. Concretely, the elliptic cylinder shown in Fig. 5b has been placed in the workspace. In this case, the desired position has been chosen so that the camera points towards the object's zone of maximum curvature with a certain angle, at a distance of about 60 cm. The desired image perceived in this configuration is shown in Fig. 6e. The number of successfully decoded points is 160. Then, the robot end-effector has been displaced 20 cm along X,
Fig. 6. First experiment: planar object. a) Desired image. b) Initial image. c) Camera velocities (m/s and rad/s) vs. time (in s). d) Norm of the task function vs. time (in s). Second experiment: elliptic cylinder. e) Desired image. f) Initial image. g) Camera velocities (m/s and rad/s) vs. time (in s). h) Norm of the task function vs. time (in s).