
CAD-Based 3D Object Representation for Robot Vision

1987, Computer

Abstract

This paper presents a CAD-based approach to developing 3D object representations for robot vision applications. By utilizing the geometric designs from a CAD system, the authors show that multiple representations of objects—such as surface points, surface curvatures, and extended Gaussian images—can enhance recognition and manipulation capabilities. The systematic combination of these representations aims to facilitate more efficient and reliable object identification in industrial environments.

Bir Bhanu* and Chih-Cheng Ho, University of Utah

*Bhanu is now at Honeywell Systems & Research Center.

0018-9162/87/0800-0019$01.00 © 1987 IEEE

Computer vision researchers lack a systematic approach for building object models for industrial environments. We propose a CAD-based approach for building representations and models for applications involving 3D object recognition and manipulation.

The automated manufacturing system shown in Figure 1 has three key components: the CAD/CAM (computer-aided design/computer-aided manufacturing) system, the vision system, and the intelligent robot system. The CAD/CAM system supports the design, analysis, and manufacturing of each part of a product. The vision system integrates information from sensors such as TV cameras, laser range finders, and tactile and ultrasonic sensors. It provides the robot with information about the working environment and the location, identity, and quality of the designed parts. The intelligent robot aligns the inspected parts and performs assembly operations using tactile and force-torque sensors.

Most existing vision systems rely on models generated in an ad hoc manner that have no explicit relation to the CAD/CAM system originally used to design and manufacture these objects. We want a more unified system in which vision models are generated automatically from an existing CAD database. A CAD system contains an interactive design interface, graphic display utilities, model analysis tools, automatic manufacturing interfaces, etc. Although it is a suitable environment for design purposes, its representations and the models it generates do not contain all the features that are important in robot vision applications. Current vision systems use only one representation in their models.
However, there is no single representation, or matching technique based on a single representation, that can efficiently and reliably recognize different classes of objects. A systematic approach for building vision models employing multiple representations is to derive them from a CAD database and incorporate features crucial for object recognition and manipulation.1

In this article, we propose a CAD-based approach for building representations and models that can be used in diverse applications involving 3D object recognition and manipulation. There are two main steps in using this approach. First, we design the object's geometry using a CAD system, or extract its CAD model from the existing database if it has already been modeled. Second, we develop representations from the CAD model and construct features, possibly by combining multiple representations, that are crucial in 3D object recognition and manipulation.

In this work we used the Alpha_1 solid modeling system2 developed at the University of Utah. It utilizes spline-based boundary representation. We present the following six CAD-based representations: (1) surface points and normals, (2) surface curvatures, (3) generalized sweep, (4) polyhedral, (5) extended Gaussian image, and (6) object decomposition and hierarchical representation. The construction of vision models that organize and use these representations efficiently requires consideration of both image feature-extraction algorithms and feature-matching techniques. We do not include them here; these details can be found in a recent survey on model-based robot vision by Chin and Dyer.3

Figure 1. An automated manufacturing system.

3D object representations

A representation for shape as defined by Marr4 "is a formal scheme for describing shape or some aspects of shape together with rules that specify how the scheme is applied to any particular shape," and "the result of using a representation to describe a given shape is called a description of the shape in that representation." It is always possible to describe a given shape using different representations. The choice of a representation to obtain an efficient description depends not only on the kind of object to be described, but also on how the description is to be used.

Three general classes of 3D solid object representations used in computer graphics, CAD, and computer vision are (1) volume, (2) sweep, and (3) surface or boundary. Besides these, other 3D object representations are used in computer vision. A useful representation is the extended Gaussian image (EGI), in which each face of an object is mapped onto a unit sphere, called the Gaussian sphere, according to its orientation and area. Here we do not consider ambiguous schemes, such as the wireframe representation, which can be perceived as more than one object from a given description.5

Volume representation. An intuitive way to represent a solid object is to describe the space occupied by that object. Instead of enumerating a huge number of spatial points, we can use a combination of primitives, possibly with different shapes and sizes. Each of the primitives is then described by its geometrical parameters. Increasing generality of primitives and their combinations leads to different schemes: (1) pure primitive instancing, (2) spatial occupancy enumeration, (3) cell decompositions, and (4) constructive solid geometry (CSG). CSG is a superset of the other three volume representations and is commonly used in CAD geometric modeling systems. CSG represents an object as a binary tree (see Figure 2), where each leaf represents an instance of a primitive and each node represents an operation on its descendant(s). Primitives such as blocks, spheres, cylinders, and cones are first transformed (translation, rotation, and scaling) and then combined (union, intersection, or difference) from the bottom to the top of the tree.
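To make the tree concrete, here is a small point-membership sketch of a CSG tree. The primitives, class names, and the `contains` protocol are illustrative assumptions for this article, not code from the authors' system.

```python
# Hedged sketch: point-membership classification on a tiny CSG tree.
# Leaves classify a point against a primitive; interior nodes combine
# their children with a set operation.

class Sphere:
    def __init__(self, center, radius):
        self.center, self.radius = center, radius

    def contains(self, p):
        return sum((a - b) ** 2 for a, b in zip(p, self.center)) <= self.radius ** 2


class Block:
    def __init__(self, lo, hi):  # opposite corners of an axis-aligned box
        self.lo, self.hi = lo, hi

    def contains(self, p):
        return all(l <= x <= h for x, l, h in zip(p, self.lo, self.hi))


class Node:
    """Interior CSG node: 'union', 'intersection', or 'difference'."""

    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def contains(self, p):
        a, b = self.left.contains(p), self.right.contains(p)
        return {"union": a or b,
                "intersection": a and b,
                "difference": a and not b}[self.op]


# A block with a spherical bite taken out of one corner.
shape = Node("difference",
             Block((0, 0, 0), (2, 2, 2)),
             Sphere((2, 2, 2), 1.0))
```

Note that this classifies points against the volume only; as the article goes on to observe, recovering the tree itself from an observed surface is the hard direction.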
Regularized set operators5 help ensure the regularity and the validity of the combined object so that there are no dangling edges or faces. CSG representation is sufficient to cover most conventional, unsculptured objects. However, it cannot describe a large class of sculptured, free-form shapes precisely, even using a fairly large number of primitives. For unsculptured objects that contain primitives as their subparts, CSG can give concise descriptions. Similar objects may have the same subtrees, and different configurations of primitives may yield different objects. This representation is unambiguous but not unique, i.e., there can be different decompositions of an object using the same set of primitives.

The greatest drawback of using CSG in robot vision applications is that surface evaluation is always required, because we can only see the object's surface. The surface evaluation is computationally intensive. Moreover, since the primitives are not in general equal to the subparts of an object, we may not be able to perceive or derive any of the primitives from the object's surface. This makes the reconstruction of the CSG tree from a given scene, and the use of its topological information, almost impossible.

Sweep representation. A generalized sweep, or generalized cylinder (GC) or cone, is defined as the volume swept by a set of cross sections along an axis under some sweeping rule. It was first introduced by Binford6 in computer vision for the recognition of 3D curved objects. According to the formal definition of GC given by Shafer,7 a GC consists of four parts:

(1) There is a space curve, called the axis of the shape.
(2) At each point on the axis, at some fixed angle to the tangent to the axis, there is a cross-section plane.
(3) On each such cross-section plane, there is a planar curve which constitutes the cross section of the object on that plane.
(4) There is a transformation rule that specifies the transformation of the cross section as the cross-section plane is swept along the axis. This rule always imposes (at least) the constraint that the cross section changes smoothly.

The surface of the object is the union of these cross sections, and the volume swept by the closed cross-sectional curves is the GC. With different restrictions on the axis and the cross sections, together with their intersections and the manner in which cross sections blend, there is an exhaustive taxonomy of different classes of GCs.

Sweep representation is well suited for many man-made objects that have an axis of symmetry. It is very concise. Similar objects have similar axes and cross sections, and a small difference between similar objects will be reflected by a small difference in the axes or cross sections. Sweep representation is not unique in general. Although it is unambiguous under a given blending rule, there is no efficient way to describe the shapes at both ends of an arbitrary GC. We cannot precisely estimate the axis and cross sections without seeing the whole surface. However, we can use the axis alone in object recognition. By using hierarchical grouping of GCs, it is possible to describe more complex shapes. But to get a unique and reasonable decomposition of an object is a difficult problem. GCs have been used only to a very limited extent in CAD.

Figure 2. An example of constructive solid geometry (CSG) representation.

Boundary representation. Representing an object by its enclosing surface or boundary is the most commonly used scheme in both computer graphics and CAD. A solid is represented by segmenting its boundary into a finite number of bounded subsets, usually called faces or patches. If we use planar patches only, each face is represented by its edges and vertices, resulting in polyhedral or polyhedron-approximated objects. In order to describe curved surfaces efficiently, splines are used in many CAD systems. However, some geometrical computations are expensive in spline representation. For example, finding the intersection points of a line and a surface, or the intersection curves between two surfaces, is not easy in general. Polygonal approximations are still used in most applications.

Boundary representations (B-rep) can be derived from 3D range data directly. In fact, the data itself is a description of the object's shape in the form of surface points. (For an example of the advanced 3D range sensing technology available today, see the accompanying sidebar.) B-rep is unambiguous, but the validity is not guaranteed and it is also not unique: different tessellations of the surface and different polygonal approximations of each patch can still give the same object. Spline-embedded surface representation gives a concise and optimal approximation of the true shape. However, finding high-level features is not easy, and there are no spline forms for some computations, such as surface curvatures.

Extended Gaussian image. The extended Gaussian image9 (EGI) is a mapping of surface normals onto a unit sphere, called the Gaussian sphere, with the surface area as its weight. An alternative definition for a continuous case is a mapping which associates the inverse of the Gaussian curvature at a point on the surface of the object with the corresponding point on the Gaussian sphere. In discrete cases such as polyhedra, the EGI can also be obtained by placing at each point a weight equal to the sum of the surface area of the faces in the corresponding orientation. It can be computed easily and is used to find the object's orientation.

The inverse relationship between EGI and Gaussian curvature can be understood using the concept of Gaussian curvature as the spread of surface normals, that is, the area on the Gaussian sphere mapped from a unit area on the object's surface. Conversely, the EGI is the total area on the object's surface mapped onto a unit area on the Gaussian sphere.

A simple example is the EGI and Gaussian curvature on spheres of different radii. Shown in Figure 3 are three concentric spherical triangles with radii of 0.5, 1, and 2. The ratio of the areas of these triangles is 0.25 : 1 : 4. Since all the triangles map onto the same region of the Gaussian sphere, the EGI ratio is exactly the same as the area ratio. However, the patch on the smaller sphere has a larger Gaussian curvature. It is not hard to figure out that for a sphere of radius r, the Gaussian curvature is 1/r^2 and the EGI is r^2. A full mathematical proof of the inverse relationship is given by Horn.9

Figure 3. An illustration of the relationship between Gaussian curvature and extended Gaussian image (EGI).

EGI is used in the analysis, but not the synthesis, of surface shapes. It has the following properties:

(1) The center of mass of the extended Gaussian image is at the origin of the Gaussian sphere.
(2) The total mass of the EGI equals the total surface area of the object.
(3) It is unique for any convex object.
(4) For a convex object, the weight of each point of the EGI is equal to the inverse of the Gaussian curvature at the corresponding point on the original surface.
(5) The EGI does not depend on the position of the object. It allows us to determine the object's orientation before knowing its position. Since it represents objects explicitly by their orientation, the degrees of freedom are reduced from 6 (position plus orientation) to 3 (orientation only).
(6) The rotation of the object does not affect the relative weight distribution of its EGI.

For concave objects, different faces may have the same orientation and are mapped onto the same cell on the Gaussian sphere. Therefore, the EGI representation is not unique. Ikeuchi10 used a global approach for concave objects. Instead of mapping the whole object onto one EGI, he selected 60 different viewer directions and mapped the visible surfaces from each of these views onto an EGI. This is not only inefficient in storage (60 EGIs for one object), but it also cannot deal with occluded objects. In general, a local approach in which we decompose the boundary into patches and use one EGI for each patch seems better. However, subdividing the object to get a unique EGI for each part is a difficult problem.

Evaluation of 3D object representations for recognition requirements. The problem of 3D object recognition can be defined as follows11: Given (a) digitized sensory data corresponding to one particular, but arbitrary, view of the real world (possibly in a certain, known environment) and (b) knowledge, or models, of a set of distinguishable objects, find the following for each object in the set:

(1) Does the object appear in the digitized sensory data? If so, how many times does it occur?
(2) For each occurrence of the object, determine its location and orientation with respect to a known coordinate system.

Marr4 has given a set of criteria for the representation of a shape used in object recognition:

Scope: What kind of objects can it represent?
Accessibility: Can it be obtained inexpensively from the image?
Conciseness: Can it describe objects efficiently?
Uniqueness: Can a unique description of an object be obtained under different conditions?
Stability: Can it reflect the similarity between different objects?
Sensitivity: Can it reflect the differences between similar objects?

A summary of the 3D object representations using Marr's criteria is given in Table 1.

3D range sensor-phase shift detection
Robert E.
Sampson
Environmental Research Institute of Michigan

Three-dimensional imaging sensors and their informational content have been under investigation for many years, with two main sensor techniques, triangulation and time of flight, recognized as the fundamental technical approaches to the problem. The triangulation method can be implemented in a variety of ways, including stereographic, structured light, etc., all of which present a number of complex problems that need to be overcome. A preferred concept that greatly simplifies and enhances range imaging uses a modulated laser diode to optically measure the range to each point in the scene directly. In originally envisioned forms of such a system, a pulsed laser is directed at the scene and the time until a returned pulse is received is determined, which is proportional to the range to the object. In a practical adaptation of this concept recently developed at the Environmental Research Institute of Michigan (ERIM), shown schematically in Figure 1, the difference in phase between the transmitted and received signals is determined rather than the time of flight.

The major components of the sensor in Figure 1 include the laser diode and modulation source, a scanning mechanism (either polygon or nodding mirror), a photodetector, and the phase shift detection electronics. The difference in phase between the transmitted and received signals is determined, and the phase shift is directly related to range. This sensor yields both a normal reflectance image and an image in which each pixel value is directly proportional to the distance to the pixel area. This method of obtaining 3D images, often referred to as optical radar, has a wide variety of applications, including navigation, robot guidance, and inspection. Although these tasks may seem similar at first glance, the requirements differ greatly. At ERIM, the phase-detection method of 3D imaging has been employed in the development of a family of sensors.
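The phase-to-range relation the sidebar describes can be made concrete with a small worked example: for a modulation frequency f, a round-trip phase shift of Δφ corresponds to a range r = c·Δφ/(4π·f). The 10 MHz modulation frequency below is an assumed value for illustration, not a specification of the ERIM sensors.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def range_from_phase(phase_shift_rad: float, f_mod_hz: float) -> float:
    """Range from the phase difference between transmitted and received
    AM-CW laser signals. The light travels out and back (factor of 2),
    and one full cycle of phase corresponds to one modulation wavelength."""
    wavelength = C / f_mod_hz  # modulation wavelength in meters
    return (phase_shift_rad / (2 * math.pi)) * wavelength / 2

def ambiguity_interval(f_mod_hz: float) -> float:
    """Maximum unambiguous range: beyond this distance the phase wraps."""
    return (C / f_mod_hz) / 2

# Assumed 10 MHz modulation: the ~30 m modulation wavelength gives a
# ~15 m unambiguous interval, and a quarter-cycle shift lands at ~3.75 m.
r = range_from_phase(math.pi / 2, 10e6)
```

The same relation explains the resolution/ambiguity trade-off: raising the modulation frequency shrinks the unambiguous interval while making a given phase resolution correspond to a finer range resolution.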
One of these sensors was developed for autonomous vehicle navigation. This navigation sensor has a field of view (FOV) of plus or minus 40 degrees in the horizontal plane and covers a vertical FOV at depression angles of 15 to 45 degrees. It has a range resolution of 8 centimeters. Another, more advanced, navigation sensor uses multiple lasers that operate at various frequencies in the visible, near infrared, and shortwave infrared wavelengths. These multiple wavelengths allow not only determination of the range to the terrain, but also the reflectance values of the materials in the scene. This sensor has a FOV of 60 x 80 degrees and a range resolution of 2 centimeters.

A third sensor developed by ERIM has been employed to perform bin picking and other robot guidance tasks. This sensor has an adjustable FOV ranging from 1.6 x 1.6 degrees to 35 x 35 degrees and can operate over distances from 15 centimeters to over 90 centimeters. In the robot guidance mode the sensor has a resolution of 0.08 centimeters. It can also operate in an inspection mode, where it has a resolution of 0.003 centimeters. This sensor consists of three modules: the optics head, power supply, and electronic modules. The optics head can be mounted on a robot arm and connected via remote cables to the rack-mounted electronics and power supply. This sensor can provide range data at 100,000 measurements per second taken in a programmable scan pattern (any image size from one pixel to 1000 x 1000 pixels can be programmed). Image collection takes less than a second to over 10 seconds, depending upon image size. While image acquisition is slower than normal visible cameras, for range imagery the system is faster overall because we need not perform complex computational analysis to determine range; it is a direct output from the sensor.

Figure 2 includes an example of range imaging from the robot guidance sensor. Figure 2a is a normal visible image of a telephone. Figure 2b is a range image from the same view. In Figure 2b, the value of each pixel is directly proportional to range (light tone is closer, dark tone further away) rather than a measure of the intensity of reflections as in Figure 2a. In Figure 2c, the telephone image is processed into a perspective view using an ERIM-developed "cytocomputer" that provides high-speed range-image processing capability for the sensor.

Figure 1. 3D laser scanner simplified block diagram.

Figure 2. Visible image of a telephone (a), approximately 30 x 30 centimeters, taken with a normal photographic camera. Range image of a telephone (b), 300 pixels by 300 pixels by 8 bits, 25 x 25 x 20 centimeters deep, taken in 10 seconds with the range sensor. Perspective view from the range image (c), with the range image of (b) rotated by cytocomputer to give a different perspective of the scene; processing time: 0.4 seconds.

Model building using the Alpha_1 CAD system

We use the CAD system Alpha_1, an advanced experimental solid modeling system2 developed at the University of Utah. Alpha_1 models the geometry of solid objects by representing their boundaries as discrete B-splines. B-splines are an ideal design tool, simple yet powerful. Many common shapes can be represented exactly using rational B-splines. For example, all of the common primitive shapes (spheres, cylinders, ellipsoids, etc.) used in CSG systems fall into this category. Other advantages include good computational and representational properties of the spline approximation: the variation diminishing property, the convex hull property, and the local interpolation property.
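The B-spline machinery underlying such a system can be illustrated with the classic de Boor evaluation algorithm, a generic textbook sketch rather than Alpha_1's actual implementation (which, as noted below, uses the Oslo algorithm for subdivision):

```python
def find_span(x, knots, p, n):
    """Index k of the knot span containing x: knots[k] <= x < knots[k+1].
    n is the number of control points; valid spans are p .. n-1."""
    for k in range(p, n):
        if knots[k] <= x < knots[k + 1]:
            return k
    return n - 1  # x at the right end of the parameter domain

def de_boor(x, knots, ctrl, p):
    """Evaluate a degree-p B-spline curve at parameter x by repeated
    linear interpolation of the p+1 control points that influence x."""
    k = find_span(x, knots, p, len(ctrl))
    d = [ctrl[j + k - p] for j in range(p + 1)]
    for r in range(1, p + 1):
        for j in range(p, r - 1, -1):
            alpha = (x - knots[j + k - p]) / (knots[j + 1 + k - r] - knots[j + k - p])
            d[j] = (1 - alpha) * d[j - 1] + alpha * d[j]
    return d[p]

# A quadratic B-spline with clamped knots reduces to a Bezier curve:
# control values 0, 1, 0 give 2t(1-t), so the value at t = 0.5 is 0.5.
value = de_boor(0.5, [0, 0, 0, 1, 1, 1], [0.0, 1.0, 0.0], 2)
```

The convex hull and local support properties mentioned above follow directly from this recurrence: every intermediate point is a convex combination of the p+1 nearby control points.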
The single underlying mathematical formulation of Alpha_1 simplifies implementation, but it is sufficiently powerful to represent a very broad class of shapes. It is able to create images of the designed objects, to perform certain analysis functions on them, and to produce information for numerically controlled machining. It uses the Oslo algorithm12 for computing discrete B-splines. Subdivision, effected by the Oslo algorithm, supports various capabilities including the computation associated with Boolean operations, such as the intersection of two arbitrary surfaces. At present, tolerancing information is not included in object specification in the Alpha_1 system. Once it is available, we can make models in terms of classes of objects (rather than a single object) that are functionally equivalent and interchangeable in assembly operations.

Alpha_1 has powerful shape description facilities and supports several modeling paradigms. These include direct manipulation of the B-spline surfaces, creation and combination of primitive shapes using set operations, and high-level shape operators such as ruled surface, loft, bend, stretch, twist, warp, and sweep. The steps in building a CAD model using Alpha_1 follow:

(1) Analysis of the object. Usually a complex object is decomposed into simpler parts that can be designed more easily. Each of the subparts is called a shell and need not be closed.

(2) Design of parts and measurement of parameters. Geometric operators are used to design each subpart. Sometimes one shape can be designed using different paradigms, which may require a different set of parameters. For example, an arc can be specified by its center point and two end points. It can also be specified by two tangent lines and its radius. We want to use those parameters that can be measured easily and precisely.

(3) Validation of designs.
When designing the surface patches of each part, ensure that they have the correct orientation and that the adjacency information between patches is correct. Otherwise an invalid object may have been created and the combiner (used in the next step) will be unable to manipulate it. At present, the designer makes the correctness of orientation and adjacency explicit. We want the system to be able to generate such information automatically.

(4) Application of the combiner. The last step is to put all the parts of the object in the correct positions and orientations by performing appropriate transformations, then use the combiner to perform the required set operations on them. This results in the design of the complete object.

Two examples demonstrate some of the modeling paradigms of the Alpha_1 system.

Example 1: green piece. To design simple objects such as the "green piece" in Figure 4a, which has many local features, we build the complete object in a stepwise manner. First, we design the plate and all the holes, then the dent part and the scratches shown in Figure 4b. To design these parts, we first design curves using B-splines, then use various high-level operators for surface construction, such as revolving a curve about an axis, extruding a curve in some direction, and filling the surface between two curves. There are seven holes with threads in the green piece. We design each of these by filling two surfaces between two twisted curves. Figure 4c shows the line drawing and shaded display of the completed CAD model.

Example 2: Renault piece. For objects like the automobile "Renault piece" in Figure 5a, which contains sculptured surfaces, it is still possible to divide it into a set of simpler parts, although the decomposition may not be obvious. Here we divide it into five subparts in Figure 5b: small right head (upper left), base plate (upper right), left head (lower left), back bump (lower center), and neck (lower right). For the right head and the left head, we find all sharp edges and then construct the surfaces from them as we did in designing the green piece. For the base plate, the neck, and the back bump, first we design some pseudo edges, which are the intersections of the surface planes. Then we construct these surfaces but leave small gaps between them, where cubic patches are used to produce the rounded edges. Figure 5c shows the intersection curves of these parts, which are computed to obtain the complete object using set operations performed by the combiner. Figure 5d shows the completed CAD model of the Renault piece.

Table 1. Evaluation of 3D object representations.

Criterion       CSG     Sweep   B-rep   EGI
Scope           Fair    Fair    Good    Good
Accessibility   Poor    Poor    Good    Good
Conciseness     Fair    Good    Fair    Poor
Uniqueness      Poor    Poor    Fair    Poor
Stability       Fair    Good    Good    Fair
Sensitivity     Fair    Fair    Good    Fair

Note that Alpha_1 can be used to model a large class of sculptured mechanical parts that are not representable by either a CSG or a sweep model. Although the use of nonuniform rational B-splines allows significant flexibility in geometric modeling, spline representation does not explicitly exhibit important features used in most recognition techniques. Thus we want to construct descriptions based on other vision representations from this CAD model.

CAD-based 3D object representations

In this section we derive vision representations from the Alpha_1 CAD models. These representations can then be integrated into a vision model employing multiple representations. Appropriate models are used based on the results from different recognition tests.13

Surface points and normals representation. Surface points representation provides a universal discrete description of the object's boundaries. However, it requires a large amount of data to describe a given surface.
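The data-volume cost is easy to see with a toy sampler. The quarter-cylinder patch and spacings below are hypothetical stand-ins for the spline patches discussed in this article; the point is only that storing a point and normal per sample grows quadratically as the sampling resolution is refined.

```python
import math

def sample_cylinder_patch(radius, height, spacing):
    """Points and outward unit normals on a quarter cylinder, sampled at
    roughly `spacing` arc length along each parameter direction. A
    hypothetical stand-in for sampling a B-spline patch."""
    n_u = max(2, int((math.pi / 2 * radius) / spacing) + 1)  # along the arc
    n_v = max(2, int(height / spacing) + 1)                  # along the axis
    samples = []
    for i in range(n_u):
        theta = (math.pi / 2) * i / (n_u - 1)
        for j in range(n_v):
            z = height * j / (n_v - 1)
            point = (radius * math.cos(theta), radius * math.sin(theta), z)
            normal = (math.cos(theta), math.sin(theta), 0.0)  # radial
            samples.append((point, normal))
    return samples

# Halving the spacing roughly quadruples the number of stored samples.
coarse = sample_cylinder_patch(1.0, 2.0, 0.2)
fine = sample_cylinder_patch(1.0, 2.0, 0.1)
```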
Note that in a spline-based system, the surface normal is a byproduct of the surface (point) evaluation procedures, called knot insertion, spline refinement, or subdivision. Although this representation does not carry more information than the original CAD model, it provides the ability to communicate with other vision modules that create higher level descriptions based on data in this format. For example, region growing and edge detection algorithms are commonly used with 3D range data, and matching is done on the extracted symbolic features.14,15 Moreover, this representation generates synthetic data for arbitrary shapes as well as for regular objects (cubes, spheres, cylinders, etc.). For an example of the use of surface points in the parts localization problem in manufacturing, refer to the article by Gunnarsson and Prinz16 in this issue of Computer.

Surface points extraction. A simple technique to extract surface points from B-spline patches uses the subdivision method.1 In this technique a B-spline patch is first subdivided into smaller pieces that are within the given resolution, then the centroid of each of these small patches is computed. The points extracted by this method depend not only on the shape of the surface but also on its parameterization. However, after the set operations, some parts of an Alpha_1 CAD model are represented as polygons and some parts as B-spline surfaces. Our strategy is to subdivide all the surfaces into polygons to make the problem uniform. By applying a contour-filling algorithm,17 we get interior line segments of the polygons and then extract points along these segments at a desired resolution.

The main element of a contour-filling algorithm is to find the intersection segments of a line and the region enclosed by that contour, including the contour itself. This can be done by first splitting the line into segments at each intersection point of the line and the contour, then deciding which of these segments lie inside the region. An edge-based contour-filling algorithm described by Pavlidis18 requires an expensive preprocessing of the contour (sorting and marking the edges). It is used in applications, such as surface shading, where the same contour is used repeatedly. Since, in our case, the number of polygons in a model is usually very large and we extract only a small number of points from each one of them, we have developed a new algorithm17 that uses topological information of the contour at each intersection point to decide which segments lie inside the polygon. It gives linear computational complexity in the average case.

We obtain the surface normal vector at each sampled point by using bilinear interpolation of normals at the adjacent vertices. In Figures 6a and 6b we show the surface points on the green piece and on the Renault piece at 0.1- and 0.2-inch resolution, respectively. In Figure 6c, we show the surface normals on the green piece at 0.4-inch resolution.

Figure 4. Design of the green piece using the Alpha_1 CAD system. (a) shows the green piece object, (b) shows the subparts of the green piece CAD model, and (c) shows the designed CAD model for the green piece.

Figure 5. Design of the Renault piece using the Alpha_1 CAD system. (a) shows the Renault piece object, (b) shows the subparts of the Renault piece CAD model, (c) shows intersection curves of the subparts of the Renault piece, and (d) shows the designed CAD model of the Renault piece.

Figure 6. Surface points and normals representation. (a) shows surface points on the green piece (0.1-inch resolution), (b) shows the surface points on the Renault piece (0.2-inch resolution), and (c) shows the surface normals on the green piece (0.4-inch resolution).

Surface curvature representation. The local surface shape can be characterized by curvatures, which combine information of both the first and second derivatives. These derivatives have been used in various techniques for segmentation of 2D contours, 2D images, and 3D range data. Features such as edges, corners, and planar patches can also be defined quantitatively by curvatures. In differential geometry,8 the curvature of a 2D contour is defined as the change in the tangent vector per unit length. If this change has the same direction as the normal vector, the curvature is positive; otherwise it is negative. For 3D surfaces, the normal curvature at a point in one direction is defined as the curvature of the intersection curve of the surface and the plane containing this directional vector and the surface normal vector at this point. Therefore, each point has different values of normal curvature, one for each direction. Among these values, the maximum and the minimum are called the principal curvatures, and their corresponding directions are called the principal directions. The product of the principal curvatures is called Gaussian curvature, and their arithmetic average is called mean curvature.

Recently, various approaches that use curvatures as intrinsic characteristics of surfaces and describe shape by curvature have been addressed in the literature on computer vision.11,13 Curvature-based intrinsic features are very useful in object recognition techniques. We compute four basic types of surface curvatures (Gaussian, mean, maximum, and minimum) for a given CAD model designed with Alpha_1. The input CAD models may be in different forms: sampled surface points, continuous B-spline surfaces, and subdivided polygons. For sampled surface points, we use finite differences to approximate the first and second partial derivatives.
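The finite-difference route from sampled points to curvatures can be sketched as follows. This is a generic implementation of the standard first and second fundamental form equations cited in this article, not the authors' code; NumPy's central-difference `gradient` stands in for their difference scheme.

```python
import numpy as np

def surface_curvatures(X, du, dv):
    """Gaussian curvature K, mean curvature H, and principal curvatures
    (k1, k2) of a sampled parametric surface X(u, v).

    X: (nu, nv, 3) array of 3D surface points on a regular parameter grid.
    du, dv: grid spacings. Central differences are second-order accurate at
    interior samples; the boundary rows and columns are less reliable.
    """
    # First and second partial derivatives by finite differences.
    Xu = np.gradient(X, du, axis=0)
    Xv = np.gradient(X, dv, axis=1)
    Xuu = np.gradient(Xu, du, axis=0)
    Xvv = np.gradient(Xv, dv, axis=1)
    Xuv = np.gradient(Xu, dv, axis=1)

    # Unit surface normal from the tangent vectors.
    n = np.cross(Xu, Xv)
    n /= np.linalg.norm(n, axis=2, keepdims=True)

    # Coefficients of the first (E, F, G) and second (L, M, N) forms.
    E = np.sum(Xu * Xu, axis=2)
    F = np.sum(Xu * Xv, axis=2)
    G = np.sum(Xv * Xv, axis=2)
    L = np.sum(Xuu * n, axis=2)
    M = np.sum(Xuv * n, axis=2)
    N = np.sum(Xvv * n, axis=2)

    K = (L * N - M * M) / (E * G - F * F)                     # Gaussian
    H = (E * N + G * L - 2 * F * M) / (2 * (E * G - F * F))   # mean
    root = np.sqrt(np.maximum(H * H - K, 0.0))
    return K, H, H + root, H - root                           # K, H, k1, k2

# Demo on a sphere of radius 2 (away from the poles): the analytic values
# are K = 1/r^2 = 0.25 and |H| = 1/r = 0.5 everywhere.
u = np.linspace(0.4, np.pi - 0.4, 80)
v = np.linspace(0.4, 2 * np.pi - 0.4, 80)
uu, vv = np.meshgrid(u, v, indexing="ij")
sphere = 2.0 * np.stack(
    [np.sin(uu) * np.cos(vv), np.sin(uu) * np.sin(vv), np.cos(uu)], axis=2)
K, H, k1, k2 = surface_curvatures(sphere, u[1] - u[0], v[1] - v[0])
```

The same K and H arrays support the symbolic uses discussed later: thresholding max(|k1|, |k2|) marks edge points, and the sign pattern of (K, H) classifies each sample as peak, pit, saddle, ridge, valley, or plane.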
Then we apply standard equations to find the Gaussian, mean, and principal curvatures.8 Figure 7a exhibits samplings of surface points for one view of the Renault piece at various resolutions. Using this data, Figure 7b shows the edge points of the sampled Renault piece model, which are found by simply requiring the larger absolute value of the principal curvatures to fall above a threshold. We can observe the similarity of results on synthetic and real data. Also note that the curvature results even at low resolution are quite good. Like the edge points, planar patches can be found by using a low-pass filter.

We can obtain surface curvatures from continuous B-spline surfaces in a simple manner. The basic surface type in the Alpha_1 system is a tensor product B-spline patch. A convenient way to think of the tensor product surface is to think of the rows or columns of the matrix of spatial points (the control mesh) as a set of individual B-spline "control curves," with one knot vector and one order associated with each of them. The other knot vector and its order then describe how these curves are blended to form the surface. The derivative of this B-spline surface is another tensor product B-spline surface of lower order, formed by differentiating the control curves. Higher order derivatives are found by successive differentiation. The unit normal vector is found from the cross product of the first partial derivatives (the tangent vectors). The rest of the curvature computation is the same as before. Since vector normalization requires rational computations, the surface curvatures have no closed form in B-spline representation.

For models in the form of subdivided polygons, we use an alternative definition of Gaussian curvature, also used in EGI. The Gaussian curvature of a small polygon can be approximated by the ratio of the area of the region enclosed by the normals of its vertices on the Gaussian sphere to the actual area of the polygon.
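For the sampled-surface-points case described above, the finite-difference path can be sketched as follows. This is an illustrative simplification, assuming the data is a height field z = h(x, y) on a regular grid (a Monge patch), and it uses the standard formulas from differential geometry8 for Gaussian curvature K, mean curvature H, and principal curvatures k1, k2:

```python
import numpy as np

def curvatures(Z, d=1.0):
    """Gaussian (K), mean (H), and principal curvatures (k1 >= k2) of a
    surface given as a height field Z sampled on a grid of spacing d.
    np.gradient uses central finite differences in the interior, which
    approximates the first and second partial derivatives of h(x, y)."""
    hx, hy = np.gradient(Z, d)          # first partials along axis 0, 1
    hxx, hxy = np.gradient(hx, d)       # second partials
    _, hyy = np.gradient(hy, d)
    g = 1.0 + hx**2 + hy**2             # determinant of first fundamental form
    K = (hxx * hyy - hxy**2) / g**2
    H = ((1 + hy**2) * hxx - 2 * hx * hy * hxy + (1 + hx**2) * hyy) / (2 * g**1.5)
    disc = np.sqrt(np.maximum(H**2 - K, 0.0))   # clamp tiny negatives
    return K, H, H + disc, H - disc
```

At the apex of the paraboloid z = (x² + y²)/2, for example, both principal curvatures are 1, so K = H = 1 there, which the finite-difference result reproduces on the interior of the grid.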
Interpretation of curvature results. Figure 8 shows the results of the computation of curvatures for a Coons patch. In the two views of the Coons patch in Figure 8a, the surface contains four interesting parts: a peak, a pit, and two saddles. Figure 8b shows the four surface curvatures. In Figure 8b the black lines are isoparametric lines of curvatures (not lines of curvature) and the white lines are the zero crossings. Figure 9 shows the Gaussian curvature and the extrema of the principal curvatures of a teapot. The images in Figures 8b and 9 are generated by approximately mapping the curvature values onto the B-spline control mesh. From Figures 8 and 9 we can make the following observations:

(1) The zero crossings of Gaussian curvature do not necessarily correspond to step edges. They are just surface inflection points, a kind of critical point.
(2) Segmentation of range data based on the zero crossings of Gaussian curvature gives a meaningful decomposition of surface patches. They are clearly separated by the zero crossings of Gaussian curvature.
(3) The sign of the Gaussian and mean curvatures provides a useful symbolic description of the local surface shape.
(4) The local extrema (positive maximum or negative minimum) of the principal curvatures correspond to the object's edges.
(5) The local maxima of the smaller absolute values of the principal curvatures correspond to the object's corners.
(6) Conic surfaces (sphere, cylinder, and cone) and planes can be specified by the values of the principal curvatures.

Figure 7. Surface points and the extrema of principal curvatures. (a) shows a sampling of surface points at two resolutions (0.2-inch and 0.1-inch spacing) and the real range data taken with a laser range finder (0.12-inch resolution in the x direction and 0.08-inch resolution in the y direction). (b) shows edge points as the extrema of principal curvatures for the figures in (a).

Figure 8. Four basic types of surface curvatures of a Coons patch. (a) shows two different views of a Coons patch and (b) shows surface curvatures: maximum principal curvature (upper left), minimum principal curvature (upper right), Gaussian curvature (lower left), and mean curvature (lower right).

Figure 9. Gaussian curvature (upper right) and extrema of principal curvatures (lower left) for the teapot.

Generalized sweep representation. We can extract this representation directly from the CAD design procedures if the object is designed in this way. However, designing the axis, a set of cross sections, and their profile functions for a given shape is not straightforward. Sometimes, without special design tools, it even becomes impossible for some fairly complex objects. One possible solution is to design the object with other powerful construction operators, then extract the approximate cross sections and axis from the designed model's surfaces. For more complex objects, we can use hierarchical structures where different GCs are joined together. However, sophisticated decompositions are required in that case.

To generate GCs for simple objects, or simple subparts of a complex object, we have to find some cross sections and link them by an axis. For example, see the deformed ellipsoid in Figure 10a, modeled in the Alpha_1 system with several linear deformations of a surface of revolution. It became a warped GC (nonplanar cross sections) after these deformations. One possible way to generate a GC description for this object is:

(1) Slice it in some direction to find all the cross sections.
(2) Find the centroid of each cross section. This can be done by applying Green's theorem8 to the curve, as long as it is closed.
(3) Link all centroids to construct the axis.

Figure 10b shows a result of the above procedure.
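Step 2 above, the centroid of a closed cross section via Green's theorem,8 reduces to the familiar shoelace-style sums when the curve is approximated by a polygon. A minimal sketch:

```python
def polygon_centroid(pts):
    """Centroid of a closed planar curve approximated by the polygon
    `pts`, a list of (x, y) vertices in order. Green's theorem converts
    the area integrals for the centroid into sums over the boundary
    edges; the curve must be closed (the last vertex connects back to
    the first)."""
    a = cx = cy = 0.0
    n = len(pts)
    for i in range(n):
        (x0, y0), (x1, y1) = pts[i], pts[(i + 1) % n]
        cross = x0 * y1 - x1 * y0        # edge contribution to 2 * area
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5                             # signed area of the polygon
    return cx / (6.0 * a), cy / (6.0 * a)
```

For a 4-by-2 rectangle with a corner at the origin this returns (2.0, 1.0), its center, as expected; the signed area keeps the result correct for either traversal direction.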
It has the following properties: (1) All cross-section planes are parallel to each other. (2) All cross sections are planar but not necessarily circular. (3) The axis passes through the centroid of each cross section. (4) The angle between the axis and each cross-section plane is not necessarily 90 degrees, and each one has a different angle.

This parallel slicing method is very simple, and the result in Figure 10b looks good. However, using this technique we will have an infinite number of descriptions for a single object. For example, in Figure 10c, we used a different slicing direction and got a totally different result: a different axis and different cross sections. This kind of representation obtained from the parallel slicing method is useless in object recognition. More constraints are needed to get a unique description. The initial slicing direction is important in order to get a canonical axis that is invariant to rotation and translation. The splitting scheme given below is similar to one used in curve approximation.19 It uses the axis of inertia and generates GCs such that all cross sections are closed, cross sections do not intersect each other, the axis is orthogonal to each cross section, and the axis passes through the centroid of each cross section.

Iterative splitting method for generalized cylinder approximation. This algorithm has the following steps:

(1) Find the major axis of inertia, the one having minimum moment of inertia, and the extrema of the object along this axis.
(2) Find the cross sections near these extrema that are perpendicular to the major axis of inertia and connect their centroids as the first approximation to the axis.
(3) Find the cross section that passes through the midpoint of the approximated axis and is perpendicular to it. Connect its centroid to the two endpoints of the previous axis and split the axis and the object into two pieces.
(4) Repeat Step 3 on each of the subpieces recursively until the desired resolution is obtained, or the new cross section is not closed, i.e., it intersects other cross sections.
(5) Adjust the axis and cross sections recursively such that all cross sections are found at critical points of the axis and are perpendicular to the axis at their centroids.

This approach provides several advantages:

* It uses the axis of inertia in the initialization procedure to obtain the canonical axis of an object.
* The axis is perpendicular to the cross sections and passes through their centroids.
* The cross sections are closed planar curves and do not intersect each other.
* Since the whole surface is split during the recursion, its time complexity is improved from O(mn) to O(m log2 n), where m is the total number of polygons in the CAD model of a component and n is the number of cross sections.

The algorithm minimizes the number of possible GC representations for one object, to achieve the uniqueness property of a vision model, and does not restrict the shapes of the axis and cross sections, so it can represent a larger scope of objects. Figure 11a shows the results of the axis and cross-section extraction on the object shown in Figure 10a after one, three, and five iterations. Figure 11b shows the axis and cross sections for the helicopter shown in Figure 12b. It is a first-order approximation. A refined GC representation is obtained by including an angle test along the axis and a similarity test by moments on adjacent cross sections.

Figure 10. Generalized cylinder representation for a simple object showing (a) a deformed ellipsoid, (b) a generalized cylinder approximation of (a), and (c) another generalized cylinder approximation of (a).

Polyhedral representation. Polyhedral representation is widely used in computer vision because of its simplicity and good support of both geometrical and topological information.
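The recursive structure of Steps 3 and 4 can be sketched on an idealized axis. In the real algorithm each refinement slices the object and computes a cross-section centroid; this simplification, an assumption made to keep the sketch self-contained, replaces that step with direct evaluation of a parametric curve and splits at the parameter midpoint until the curve midpoint lies close to the chord midpoint:

```python
def split_axis(curve, t0, t1, tol):
    """Approximate the axis `curve` (a function t -> (x, y, z)) between
    t0 and t1 by a polyline: evaluate the midpoint, and if it deviates
    from the chord midpoint by more than `tol`, split into two halves
    recursively (the analogue of Steps 3-4 above)."""
    p0, p1 = curve(t0), curve(t1)
    tm = 0.5 * (t0 + t1)
    pm = curve(tm)
    chord_mid = tuple(0.5 * (a + b) for a, b in zip(p0, p1))
    err = sum((a - b) ** 2 for a, b in zip(pm, chord_mid)) ** 0.5
    if err <= tol:
        return [p0, p1]
    left = split_axis(curve, t0, tm, tol)
    right = split_axis(curve, tm, t1, tol)
    return left[:-1] + right   # drop the duplicated midpoint
```

Because the whole interval is halved at each level, the refinement mirrors the O(m log2 n) behavior claimed above: a straight axis stops immediately, while a curved one is subdivided only where the chord approximation is poor.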
A polyhedral model can be constructed from vertices, edges, and faces, where vertices are 3D spatial points, edges are straight-line segments between vertices, and faces are planar polygons enclosed by an ordered list of edges or their corresponding vertices. A set of geometrical and topological conditions, as given by Requicha,5 must be met for any valid polyhedral model. In this work, the geometrical conditions are assumed from the validity of the given CAD models. Most of the topological conditions are also ensured by the embedded linked-list data structure. However, due to the flexibility of the nonuniform B-splines used in the Alpha_1 system, different combinations of order, knot vector, and control points may result in different curves having identical geometry. For modeling of complex shapes it is very likely that adjacent patches will not have the same knot vector and control points along their common boundary, and that some patches will be adjacent to more than one patch on one side.

Currently Alpha_1 does subdivision and polygonal approximation locally, which leaves some gaps along adjacent patches because of the above situations. In order to get a valid polyhedral model, such that every edge is shared by two and only two polygons, we use a global approach in this research. Adjacency information on surface patches contains not only the common sides but also the ranges in which they are matched. For partially adjacent patches, and patches that are differently parameterized along the common boundary, we insert information on more than one adjacency into each side. This information is propagated proportionally to the subdivided patches whenever a subdivision occurs. In this global approach we first perform all the required subdivisions, then build polygons for each small subpatch that contain not only the subpatch's four corners but also the adjacent corners of all neighboring patches. Therefore, the adjacency information on the subpatch can be mapped onto each of the approximated polygons and still maintain the topological validity of the resulting polyhedron. Figure 12a shows the B-spline model for a teapot and its polyhedral approximation. Figure 12b shows a similar example of a helicopter.

Features can also be extracted from this polyhedral representation. For example, Figure 13 shows the results of edge detection on the teapot and helicopter models, obtained by thresholding the changes in the surface normal vector along adjacent faces.

Figure 11. Generalized cylinder approximation showing (a) results of the iterative splitting algorithm after one, three, and five iterations for the object shown in Figure 10a, and (b) generalized cylinder approximation for the helicopter shown in Figure 12b.

Figure 12. Polyhedral representation for a teapot and a helicopter. (a) shows a B-spline model for a teapot and its polyhedral approximation. (b) shows a B-spline model for a helicopter and its polyhedral approximation.

Figure 13. Extraction of features from the polyhedral representation showing (a) edge detection on a teapot and (b) edge detection on a helicopter.

Extended Gaussian image representation. Although there is a continuous expression9 for the EGI of some objects, such as solids of revolution, a uniform approach that allows both smooth surfaces and polyhedral objects to have the same EGI structure first approximates smooth surfaces by polygons and then maps each polygon onto the Gaussian sphere. To gain the advantages of the EGI, such as its invariant mapping under rotation, a tessellation of the Gaussian sphere should have cells such that (1) there is no overlap or gap between cells, (2) they have the same area, (3) they have the same shape, (4) they occur in a regular rounded pattern, and (5) there exists a formal scheme to obtain finer resolution that still has the above properties.

Unfortunately, these criteria cannot all be satisfied simultaneously. A simple tessellation by divisions of meridians and parallels has a higher density of cells at both the north and south poles. Although we can overcome this by having fewer strips at higher latitudes, this tessellation does not have a linear relationship of rotation between the object and its EGI mapping unless the rotational axis is vertical. Better tessellations result from projecting regular polyhedra onto a concentric unit sphere. These tessellations have been proved to be the optimal sampling of a sphere for the corresponding number of samples. However, there are only five regular polyhedra, and the maximum number of faces, 20, belongs to the icosahedron. Further subdivision of each face of these regular polyhedra is required to obtain finer resolution. A well-known method is geodesic division. For a tessellation based on the icosahedron, each triangle of the icosahedron is subdivided into four equal-sized triangles. After projecting these subdivided triangles onto the Gaussian sphere, we obtain an 80-face tessellation. By repeating the subdivision/projection procedure, a multiple-resolution tessellation of the Gaussian sphere is constructed hierarchically.

This structure consists of a set of concentric spherical shells. The outermost shell has the highest resolution of the geodesic tessellation and the innermost one is the basic icosahedron. On each face of the icosahedron is an inverted triangular pyramid that links the corresponding triangles at different resolution levels. This resembles the pyramidal image structure used in 2D computer vision and has similar properties and advantages. For example, the EGI weight of one cell is equal to the sum of the weights of its four descendents at the next level.

To construct the multiple-resolution EGI from a given CAD model, each B-spline patch is first subdivided into flat polygons within a given tolerance.
The area of each of the polygons is then accumulated in the corresponding cell at each level. The procedure to access the corresponding cell at one resolution from the polygon's normal vector follows:

(1) Determine into which of the triangles of the icosahedron the given normal falls.
(2) Determine into which of its four descendents the given normal falls.
(3) Repeat Step 2 until you reach the required resolution level.

The total number of tests needed to access one cell at level n, assuming the icosahedron is level 0, is 4n + 20 in the worst case. In fact, we find not only the cell at level n, but also all the corresponding cells from level 0 to level n - 1 simultaneously.

Figure 14 shows the results of the geodesic tessellation based on an icosahedron and its EGI mapping for a cylinder. Figure 14a shows the test cylinder, whose length is twice its diameter. Its EGI mappings are shown at each resolution level; the darker triangles have higher weight. Figure 14b is the icosahedron at level 0. Figures 14c to 14e are levels 1, 2, and 3 and have 80, 320, and 1280 faces, respectively. The orientation of the cylinder is clearly reflected in its EGIs. The two black triangles in Figure 14e are the images of the top and bottom circular faces of the cylinder. The side faces of the cylinder are mapped onto a circular strip on the Gaussian sphere.

Figure 14. Multiresolution geodesic tessellations of the Gaussian sphere and the corresponding EGIs for a cylinder: (a) a cylinder; (b) an icosahedron, level 0 (20 faces); (c) level 1 (80 faces); (d) level 2 (320 faces); and (e) level 3 (1280 faces).

For a concave object, we decompose the object's surface and build an EGI for each patch.

Object decomposition and hierarchical representation. Hierarchical representation has been commonly used in different domains.4,20,21 An approach to this representation requires two steps:
(1) Decompose the object into parts.
(2) Construct relational links between the parts decomposed in Step 1.

Figure 15. The CAD-based robot vision system.

From a computational point of view, hierarchical representation simplifies the complexity of the problem. For computer vision applications, it provides a solution for recognition of partially visible objects. In 3D object recognition, self-occlusion occurs even for a single object. Hierarchical structures based on decomposition of the object's surface and/or volume are necessary for any of the above representations in practical use. Moreover, psychological studies have given evidence of the role of parts in human visual recognition.22

Dividing objects into regular primitives (spheres, cubes, etc.) has been common in CSG systems. It is useful in CAD/CAM applications because of the analogy between set operations and mechanical manufacturing. However, this decomposition contains primitives that may not exist in the sensed data. Thus, it does not suit computer vision applications.

As described in the section "Surface curvature representation" above, decomposition of an object can be based on the shape of its local surface. Although this kind of partition does not necessarily correspond to the human visual mechanism, it is computationally simple and invariant to the viewer's position and direction. Different representations use different decomposition strategies. Generalized cylinder representation requires a volume-based decomposition. Surface curvature representation and EGI require a surface-based decomposition. For volume-based decomposition, we partition surfaces based on the generic intersections of surfaces (surfaces intersecting transversally). For surface-based decomposition, we use the zero crossings of the Gaussian curvature and the extrema of the principal curvatures. Note in Figure 9 that the Gaussian curvature and the extrema of the principal curvatures provide a good decomposition.

The CAD-based approach presented here allows the construction of vision models employing multiple representations for most of the objects found in industrial environments. It differs from using CAD tools to design features that can be visually measured. As summarized in Figure 15, the CAD-based vision model preparation procedure reveals a strong analogy to the image-understanding procedure. It needs some preprocessing of the input CAD designs. Decompositions or 3D segmentations are then performed on the model's shape, the physical surface. Finally, we extract features for different representations from each subpart and integrate them into the hierarchical multiple-representation vision models.

Our approach connects the object's image in the real world, the sensory data, with its image in the designer's mind, the CAD model. It also provides an automatic and systematic approach to building models using multiple representations on different parts of the same object. These multiple representations, and the multiple matching techniques based on them, are required in a flexible automated environment where robots equipped with multiple sensors operate.

Acknowledgments

This work was supported in part by National Science Foundation grants DCR-8506393, ECS-8307483, DMC-8502115, and MCS-8221750. We would like to thank the Alpha_1 group at the University of Utah for their cooperation on this project.

References

1. B. Bhanu and T.C. Henderson, "CAGD-Based 3D Vision," Proc. IEEE Int'l Conf. Robotics and Automation, Mar. 1985, pp. 411-417.
2. Alpha_1 Research Group, Alpha_1 Users Manual, Dept. of Computer Science, University of Utah, Jan. 1986.
3. R.T. Chin and C.R. Dyer, "Model-Based Recognition in Robot Vision," ACM Computing Surveys, Mar. 1986, pp. 67-108.
4. D. Marr, Vision, W.H. Freeman and Co., New York, 1982.
5. A.A.G. Requicha, "Representations for Rigid Solids: Theory, Methods, and Systems," ACM Computing Surveys, Dec. 1980, pp. 437-464.
6. T.O. Binford, "Visual Perception by Computer," Proc. IEEE Conf. Systems and Control, Dec. 1971.
7. S.A. Shafer, Shadows and Silhouettes in Computer Vision, Kluwer Academic Publishers, 1985.
8. I.D. Faux and M.J. Pratt, Computational Geometry for Design and Manufacture, John Wiley & Sons, New York, 1979.
9. B.K.P. Horn, "Extended Gaussian Images," Proc. IEEE, Dec. 1984, pp. 1671-1686.
10. K. Ikeuchi, "Generating an Interpretation Tree from a CAD Model to Represent Object Configurations for Bin-Picking Tasks," Tech. Report CMU-CS-86-144, Dept. of Computer Science, Carnegie Mellon Univ., Aug. 1986.
11. P.J. Besl and R.C. Jain, "Invariant Surface Characteristics for 3D Object Recognition in Range Images," Computer Vision, Graphics, and Image Processing, Jan. 1986, pp. 33-80.
12. E. Cohen, T. Lyche, and R.F. Riesenfeld, "Discrete B-splines and Subdivision Techniques in Computer-Aided Geometric Design and Computer Graphics," Computer Graphics and Image Processing, Oct. 1980, pp. 87-111.
13. T.C. Henderson and B. Bhanu, "Intrinsic Characteristics as the Interface Between CAD and Machine Vision Systems," Pattern Recognition Letters, Vol. 3, 1985, pp. 425-430.
14. B. Bhanu, "Representation and Shape Matching of 3D Objects," IEEE Trans. Pattern Analysis and Machine Intelligence, May 1984, pp. 340-351.
15. B. Bhanu et al., "Range Data Processing: Representation of Surfaces by Edges," Proc. 8th Int'l Conf. Pattern Recognition, Oct. 1986, pp. 236-238.
16. K.T. Gunnarsson and F.B. Prinz, "CAD Model-Based Localization of Parts in Manufacturing," Computer, Aug. 1987, this issue.
17. B. Bhanu, C.C. Ho, and T. Henderson, "3D Model Building for Computer Vision," Pattern Recognition Letters, May 1987, pp. 349-356.
18. T. Pavlidis, Algorithms for Graphics and Image Processing, Computer Science Press, 1982.
19. D.H. Ballard and C.M. Brown, Computer Vision, Prentice-Hall, Englewood Cliffs, N.J., 1982.
20. R. Nevatia and T.O. Binford, "Description and Recognition of Curved Objects," Artificial Intelligence, Vol. 8, 1977, pp. 77-98.
21. T. Phillips, R. Cannon, and A. Rosenfeld, "Decomposition and Approximation of Three-Dimensional Solids," Computer Vision, Graphics, and Image Processing, Mar. 1986, pp. 307-317.
22. I. Biederman, "Human Image Understanding: Recent Research and a Theory," Computer Vision, Graphics, and Image Processing, Vol. 32, 1985, pp. 29-73.

Bir Bhanu is the guest editor of this special issue on CAD-based robot vision. His photo and biography appear following the Guest Editor's Introduction in this issue.

Chih-Cheng Ho is a graduate student in the Department of Computer Science at the University of Utah. His research interests include computer vision, computer-aided geometric design, and computer graphics. He is also interested in Unix/C programming and small computer systems. Ho received his MS degree in computer science from the University of Utah and his BS degree in engineering from National Taiwan University.