TINA: A 3D Vision System For Pick and Place: J Porrill, SB Pollard, TP Pridmore, JB Bowen, JEW Mayhew & JP Frisby
Abstract
We describe the Sheffield AP/RU 3D vision system for
robotics. The system currently supports model based object
recognition and location; its potential for robotics applications
is demonstrated by its guidance of a UMI robot arm in a pick
and place task. The system comprises:
1) The recovery of a sparse depth map using edge based
passive stereo triangulation.
2) The grouping, description and segmentation of edge segments to recover a 3D description of the scene geometry
in terms of straight lines and circular arcs.
3) The statistical combination of 3D descriptions for the
purpose of object model creation from multiple stereo
views, and the propagation of constraints for within view
refinement.
4) The matching of 3D wireframe models to 3D scene
descriptions, to recover an initial estimate of their position and orientation.
Introduction
The following is a brief description of the system. Edge
based binocular stereo is used to recover a depth map of the
scene from which a geometrical description comprising
straight lines and circular arcs is computed. Scene to scene
matching and statistical combination allows multiple stereo
views to be combined into more complete scene descriptions
with obvious application to autonomous navigation and path
planning. Here we show how a number of views of an object
can be integrated to form a useful visual model, which may
subsequently be used to identify the object in a cluttered
scene. The resulting position and attitude information is used
to guide the robot arm. Figure 1 illustrates the system in
operation.
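To make the first stage concrete, the sketch below shows one standard way to triangulate a matched edge point from two calibrated views: since noisy rays rarely intersect exactly, take the midpoint of the common perpendicular of the two viewing rays. This is an illustrative reconstruction, not TINA's actual code; the calibration and edge-matching machinery is assumed to have already produced the ray origins and directions.

```python
import numpy as np

def triangulate_midpoint(c_l, d_l, c_r, d_r):
    """Depth recovery for one matched edge point.
    c_l, c_r : optical centres of the left/right cameras.
    d_l, d_r : unit direction vectors of the viewing rays
               through the matched edge pixels.
    Returns the midpoint of the common perpendicular of the
    two (generally skew) rays, a standard 3D estimate."""
    w = c_l - c_r
    a, b, c = d_l @ d_l, d_l @ d_r, d_r @ d_r
    d, e = d_l @ w, d_r @ w
    denom = a * c - b * b            # zero only for parallel rays
    t_l = (b * e - c * d) / denom    # parameter along the left ray
    t_r = (a * e - b * d) / denom    # parameter along the right ray
    p_l = c_l + t_l * d_l            # closest point on the left ray
    p_r = c_r + t_r * d_r            # closest point on the right ray
    return 0.5 * (p_l + p_r)
```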
The system is part of a continuing research project: the scene description is currently being augmented with surface geometry and topological information. We are also exploring the use of predictive feed-forward to speed up the stereo algorithm. The remainder of the paper describes the modules comprising the system in more detail.
Figure 1. A visually guided robot arm.
Figures (a), (b) and (c) illustrate our visual system at work. A pair of Panasonic WV-CD50 CCD cameras are mounted on an adjustable stereo rig. Here they are positioned with optical centres approximately 15cm apart, with asymmetric convergent gaze of approximately 16 degrees, verged upon a robot workspace some 50cm distant. The 28mm Olympus lens (with effective focal length of approximately 18.5mm) gives a visual angle of about 27 degrees. The system is able to identify and accurately locate a modelled object in the cluttered scene. This information is used to compute a grasp plan for the known object (the plan is precompiled with respect to one corner of the object, which acts as its coordinate frame). The UMI robot, which is at a predetermined position with respect to the viewer centred coordinates of the visual system, is able to pick up the object.
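As a rough consistency check on these figures (assuming, as is typical for a 2/3-inch CCD such as the WV-CD50, an active sensor width of about 8.8mm, a number not given in the text):

$$\theta \;=\; 2\arctan\!\left(\frac{w}{2f}\right) \;=\; 2\arctan\!\left(\frac{8.8}{2 \times 18.5}\right) \;\approx\; 26.8^{\circ},$$

in agreement with the quoted visual angle of about 27 degrees.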
lel to all the lines they should be, that they are mutually perpendicular, and intersect at a single point. The result of applying this stage of the process is the position and attitude of the object in world coordinates. Figure 10 illustrates the SMM matching the compiled visual model in a number of scenes. The information provided by matching gives the RHS of the inverse kinematics equation which must be solved if our manipulator is to grasp the object (see figure 11).
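The role of the matcher's output in arm guidance can be sketched as a composition of homogeneous transforms: the precompiled grasp, expressed in the object's corner frame, is carried into robot coordinates via the matched object pose and the fixed camera-to-robot registration. The function and frame names below are illustrative, not from the TINA implementation.

```python
import numpy as np

def grasp_in_robot_frame(T_base_cam, T_cam_obj, T_obj_grasp):
    """Compose 4x4 homogeneous transforms:
    T_base_cam  : robot base -> camera (fixed registration)
    T_cam_obj   : camera -> object (position and attitude
                  recovered by model matching)
    T_obj_grasp : object corner frame -> grasp site
                  (the precompiled grasp plan)
    The result is the gripper goal in robot base coordinates,
    i.e. the right-hand side of the inverse kinematics
    equation to be solved for the joint angles."""
    return T_base_cam @ T_cam_obj @ T_obj_grasp
```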
Figure 9. The integration of linear edge geometry from multiple views.
Figure (a) illustrates the 3D data extracted from eight views of the object to be modelled (produced from IBM WINSOM CSG body modeller image data). To ensure a description of the model suitable for visual recognition, and to allow greater generality, we combine geometrical data from the multiple views of the object to produce a primitive visual model of it. The combination is achieved by incrementally matching each view to the next. Between each view the model is updated, novel features are added, and statistical estimation theory is used to enforce consistency amongst them (here only through the enforcement of parallelism and perpendicularity). Finally, only line features that have been identified in more than a single view appear in the final visual model (see (b)).
The positions of extremal boundaries are viewpoint dependent, and their treatment requires a degree of subtlety not yet present in our vision system: firstly to identify them, and secondly to treat them appropriately in the matching and geometrical integration processes. Though their positions are not viewpoint invariant, in the case of cylinders at least the relative orientation is stable across views, and this information could be exploited. In the figures here, both the circular arcs and the extremal boundaries are displayed for largely cosmetic purposes.
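A schematic of the integration process described above might look like the following: segments from each registered view are matched to the current model by direction and position, matched features are averaged (a simple stand-in for the paper's statistical combination), and only features observed in more than one view survive. The segment representation and tolerances are assumptions for illustration.

```python
import numpy as np

def integrate_views(views, pos_tol=5.0, ang_tol=np.deg2rad(5.0)):
    """views: per-view lists of line segments, each already
    registered into a common frame as (midpoint, unit_dir).
    Returns the features seen in more than one view."""
    model = []   # entries: [midpoint, unit_dir, n_observations]
    for segments in views:
        for m, d in segments:
            for entry in model:
                same_dir = abs(float(d @ entry[1])) > np.cos(ang_tol)
                close = np.linalg.norm(m - entry[0]) < pos_tol
                if same_dir and close:
                    n = entry[2]
                    entry[0] = (n * entry[0] + m) / (n + 1)  # average
                    entry[2] = n + 1
                    break
            else:                      # no match: a novel feature
                model.append([m, d, 1])
    return [e for e in model if e[2] > 1]
```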
REVgraph: the regions, edges, vertices graph.
One may regard the system as generating a sequence of representations, each spatially registered with respect to a coordinate system based on the left eye: image, edge map, depth map and geometrical description. In the initial stages of processing a pass-oriented approach may be appropriate, but we consider it desirable to provide easy and convenient access between the representations at a higher level of processing. The REVgraph is an environment, built in Franz Lisp, in which the lower level representations are all indexed in the same coordinate system. On top of this a number of tools have been, and are being, written for use in the development of higher level processes, which we envisage overlaying the geometrical frame with surface and topological information. Such processes will employ both qualitative and quantitative geometrical reasoning heuristics.
In order to aid debugging by keeping a history of reasoning, and to increase search efficiency by avoiding backtracking, the REVgraph contains a consistency maintenance system (CMS), to which any process may easily be interfaced. The CMS is our implementation of most of the good ideas in Doyle [1979] and DeKleer [1984], augmented with some of our own. The importance of truth maintenance in building geometrical models of objects was originally highlighted by Hermann [1985]. Details of the REVgraph and CMS implementation may be found in Bowen [1986]. Figure 12 illustrates a prototype wireframe completion algorithm and figure 13 some useful pairwise relationships that are made explicit within the REVgraph environment.
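To give the flavour of the CMS, here is a toy justification-based truth maintenance kernel in the spirit of Doyle [1979]: a node becomes believed when every antecedent of at least one of its justifications is believed. This is a bare sketch for exposition only; it omits the retraction, nogood handling and dependency-directed search control that the real CMS provides.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.believed = False
        self.justifications = []   # each: a list of antecedent Nodes

class CMS:
    """Toy justification-based consistency maintenance."""
    def __init__(self):
        self.nodes = {}

    def node(self, name):
        return self.nodes.setdefault(name, Node(name))

    def premise(self, name):
        self.node(name).justifications.append([])  # empty = always holds
        self._propagate()

    def justify(self, conclusion, *antecedents):
        c = self.node(conclusion)
        c.justifications.append([self.node(a) for a in antecedents])
        self._propagate()

    def _propagate(self):
        changed = True
        while changed:
            changed = False
            for n in self.nodes.values():
                supported = any(all(a.believed for a in just)
                                for just in n.justifications)
                if supported and not n.believed:
                    n.believed = True
                    changed = True
```

For example, cms.premise("edges 1,2 coplanar") followed by cms.justify("vertex at P", "edges 1,2 coplanar", "edges 1,2 intersect") records the support for a hypothesised vertex, so the reasoning history remains available for debugging.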
Figure 13. Pairwise relations.
The formation of a pairwise relations table is a utility in the REVgraph. It generates pairs of lines, and the geometric relations between them, according to user requests. In figure (a) all the lines perpendicular to the arrowed line have been generated, and in figure (b) all the lines parallel to it (that is, to within a certain tolerance).
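A minimal version of such a query might classify line pairs by the angle between their direction vectors, as sketched below; the representation and the 5 degree default tolerance are assumptions, not TINA's.

```python
import numpy as np

def pairwise_relations(lines, tol=np.deg2rad(5.0)):
    """lines: list of (name, unit_direction) for 3D line segments.
    Returns (name_i, name_j, relation) for pairs that are
    parallel or perpendicular to within the tolerance."""
    table = []
    for i, (name_i, d_i) in enumerate(lines):
        for name_j, d_j in lines[i + 1:]:
            c = abs(float(d_i @ d_j))     # |cos(angle)| between lines
            if c > np.cos(tol):           # angle near 0 or 180 degrees
                table.append((name_i, name_j, "parallel"))
            elif c < np.sin(tol):         # angle near 90 degrees
                table.append((name_i, name_j, "perpendicular"))
    return table
```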
Conclusions
We demonstrate the ability of our system to support visually guided pick and place in a visually cluttered but, in terms of trajectory planning, benign manipulator workspace. It is not appropriate at this time to ask how long the visual processing stages of the demonstration take; suffice it to say that they deliver geometrical information of sufficient quality, not only for the task in hand but also to serve as a starting point for the development of other visual and geometrical reasoning competences.
Acknowledgements
We are grateful to Dr Chris Brown for his valuable technical assistance. This research was supported by SERC project grant no. GR/D/1679.6-IKBS/025, awarded under the Alvey programme. Stephen Pollard is an SERC IT Research Fellow.
References
Arnold R. D. and T. O. Binford (1980) Geometric constraints
in stereo vision, Soc. Photo-Optical Instr. Engineers,
238, 281-292.
Bolles R.C., P. Horaud and M.J. Hannah (1983), 3DPO: A three dimensional part orientation system, Proc. IJCAI 8, Karlsruhe, West Germany, 116-120.
Bowen J.B. and J.E.W. Mayhew (1986), Consistency maintenance in the REVgraph environment, Alvey Computer Vision and Image Interpretation Meeting, University of Bristol, AIVRU Memo 20, and Image and Vision Computing (submitted).
Burt P. and B. Julesz (1980), Modifications of the classical notion of Panum's fusional area, Perception 9, 671-682.
Canny J.F. (1983), Finding edges and lines in images, MIT AI Memo 720.
DeKleer J. (1984), Choices without backtracking, Proc. National Conference on Artificial Intelligence.
Doyle J. (1979), A truth maintenance system, Artificial Intelligence 12, 231-272.
Durrant-Whyte H.F. (1985), Consistent integration and propagation of disparate sensor observations, Thesis, University of Pennsylvania.
Faugeras O.D., M. Hebert, J. Ponce and E. Pauchon (1984),
Object representation, identification, and positioning from
range data, Proc. 1st Int. Symp. on Robotics Res, J.M.
Brady and R. Paul (eds), MIT Press, 425-446.
Faugeras O.D. and M. Hebert (1985), The representation, recognition and positioning of 3D shapes from range data, Int. J. Robotics Res.
Faugeras O.D., N. Ayache and B. Faverjon (1986), Building visual maps by combining noisy stereo measurements, Proc. IEEE Int. Conf. on Robotics and Automation, San Francisco.
Grimson W.E.L. and T. Lozano-Perez (1984), Model based
recognition from sparse range or tactile data, Int. J.
Robotics Res. 3(3): 3-35.
Grimson W.E.L. and T. Lozano-Perez (1985), Recognition
and localisation of overlapping parts from sparse data in
two and three dimensions, Proc IEEE Int. Conf. on
Robotics and Automation, Silver Spring: IEEE Computer
Society Press, 61-66.
Pridmore T.P., J. Porrill and J.E.W. Mayhew (1986), Segmentation and description of binocularly viewed contours, Alvey Computer Vision and Image Interpretation Meeting, University of Bristol, and Image and Vision Computing 5(2), 132-138.
Trivedi H.P. and S.A. Lloyd (1985), The role of disparity gradient in stereo vision, Comp. Sys. Memo 165, GEC Hirst
Research Centre, Wembley, England.
Tsai R.Y. (1986), An efficient and accurate camera calibration
technique for 3D machine vision, Proc IEEE CVPR 86,
364-374.