Wire Vision 00
Manuela Veloso
Introduction
An important first step in many color vision tasks is to classify each pixel in an image into one of a discrete number of color classes. The leading approaches to accomplishing this task include linear color thresholding, nearest neighbor classification, color space thresholding, and probabilistic methods. Linear color thresholding partitions the color space with linear boundaries (e.g., planes in 3-dimensional spaces); a particular pixel is then classified according to the partition in which it lies. This method is convenient for learning systems such as neural networks (NNs) or multivariate decision trees (MDTs) (Brodley & Utgoff 1995). A second approach is nearest neighbor classification. Typically several hundred pre-classified exemplars are employed, each having a unique location in the color space.
Thresholding
The thresholding method described here can be used with general multidimensional color spaces that have discrete component color levels, but for the purposes of discussion the YUV color space will be used as an example. In our approach, each color class is specified as a set of six threshold values: two for each dimension in the color space, after the transformation if one is being used. The mechanism used for thresholding is an important efficiency consideration because the thresholding operation must be repeated for each color at each pixel in the image. One way to check if a pixel is a member of a particular color class is to use a set of comparisons similar to
if ((Y >= Ylowerthresh) AND (Y <= Yupperthresh) AND
    (U >= Ulowerthresh) AND (U <= Uupperthresh) AND
    (V >= Vlowerthresh) AND (V <= Vupperthresh))
  pixel_color = color_class;
to determine if a pixel with values Y, U, V should be grouped in the color class. Unfortunately this approach is rather inefficient because, once compiled, it could require as many as six conditional branches to determine membership in one color class for each pixel. This can be especially inefficient on pipelined processors with speculative instruction execution. Instead, our implementation uses a boolean-valued decomposition of the multidimensional threshold. Such a region can be represented as the product of three functions, one along each of the axes in the space (Figure 1). The decomposed representation is stored in arrays, with one array element for each value of a color component. Class membership can then be computed as the bitwise AND of the elements of each array indicated by the color component values:
pixel_in_class = YClass[Y] AND UClass[U] AND VClass[V];
The resulting boolean value of pixel_in_class indicates whether the pixel belongs to the class or not. This approach allows the system to scale linearly with the number of pixels and color space dimensions, and can be implemented as a few array lookups per pixel. The operation is much faster than the naive approach because the bitwise AND is a significantly lower cost operation than an integer compare on most modern processors. To illustrate the approach, consider the following example. Suppose we discretize the YUV color space to 10 levels in each dimension. Then orange, for example, might be represented by assigning the following values to the elements of each array:
YClass[] = {0,1,1,1,1,1,1,1,1,1};
UClass[] = {0,0,0,0,0,0,0,1,1,1};
VClass[] = {0,0,0,0,0,0,0,1,1,1};
Thus, to check if a pixel with color values (1,8,9) is a member of the color class orange, all we need to do is evaluate the expression YClass[1] AND UClass[8] AND VClass[9], which in this case resolves to 1, or true, indicating that the color is in the class orange.

(Figure panels: the binary signal decomposition of the threshold along Y, U, and V; its visualization as a threshold in the full color space.)

Figure 1: A three-dimensional region of the color space for classification is represented as a combination of three binary functions.

One of the most significant advantages of our approach is that it can determine a pixel's membership in multiple color classes simultaneously. By exploiting parallelism in the bitwise AND operation for integers, we can determine membership in several classes at once. As an example, suppose the region of the color space occupied by blue pixels were represented as follows:
YClass[] = {0,1,1,1,1,1,1,1,1,1};
UClass[] = {1,1,1,0,0,0,0,0,0,0};
VClass[] = {0,0,0,1,1,1,0,0,0,0};
Connected Regions
After the various color samples have been classified, connected regions are formed by examining the classified samples. This is typically an expensive operation that can severely impact real-time performance. Our connected components merging procedure is implemented in two stages for efficiency reasons.

The first stage is to compute a run length encoded (RLE) version of the classified image. In many robotic vision applications significant changes in adjacent image pixels are relatively infrequent. By grouping similar adjacent pixels into a single run we gain an opportunity for efficiency, because subsequent users of the data can operate on entire runs rather than on individual pixels. There is also the practical benefit that region merging need only look for vertical connectivity, because the horizontal components are merged in the transformation to the RLE image.

The merging method employs a tree-based union-find with path compression. This offers performance that is not only good in practice but also provides a hard algorithmic bound that is, for all practical purposes, linear (Tarjan 1983). The merging is performed in place on the classified RLE image, since each run contains a field with all the necessary information: an identifier indicating a run's parent element (the upper-leftmost member of the region). Initially, each run labels itself as its parent, resulting in a completely disjoint forest. The merging procedure scans adjacent rows and merges runs which are of the same color class and overlap under four-connectedness. This results in a disjoint forest where each run's parent pointer points upward toward the region's global parent. Thus a second pass is needed to compress all of the paths so that each run is labeled with its actual parent. Now each set of runs pointing to a single parent uniquely identifies a connected region. The process is illustrated in Figure 2.
Rather than build a separate set of arrays for each color, we can combine the arrays, using each bit position in an array element to represent the corresponding value for one color. So, for example, if each element in an array were a two-bit integer, we could combine the orange and blue representations as follows:
YClass[] = {00,11,11,11,11,11,11,11,11,11};
UClass[] = {01,01,01,00,00,00,00,10,10,10};
VClass[] = {00,00,00,01,01,01,00,10,10,10};
Here the first (high-order) bit in each element is used to represent orange and the second bit to represent blue. Thus we can check whether (1,8,9) is in one of the two classes by evaluating the single expression YClass[1] AND UClass[8] AND VClass[9]. The result is 10, indicating the color is in the orange class but not the blue. In our implementation, each array element is a 32-bit integer. It is therefore possible to evaluate membership in 32 distinct color classes at once with two AND operations. In contrast, the naive comparison approach could require 32 × 6, or up to 192, comparisons for the same operation. Additionally, due to the small size of the color class representation, the algorithm can take advantage of memory caching effects.
(Figure 2 panels: 1: Runs start as a fully disjoint forest. 2: Scanning adjacent lines, neighbors are merged.)
Figure 2: An example of how regions are grouped after run length encoding.
We have since discovered that the digitizer can capture images in YUV space directly, making it possible to eliminate the color space transformation step. With the transformation step eliminated, the system processes 160x120 images at 30 Hz with 25% utilization of the CPU.

The second successful application was for Carnegie Mellon's entry into the RoboCup-99 legged-robot league. These robots, provided by Sony, are quadrupeds similar to the commercially available Aibo entertainment robot. The robots play a game of three versus three soccer against other teams in a tournament. To play effectively, several objects must be recognized and processed, including the ball, teammates and opponents, two goals, and six location markers placed around the field. The hardware includes a camera producing 88x60 frames in the YUV color space at about 15 Hz. In this application color classification is done in hardware, removing the need for this step in the software system. Even with one step of the algorithm handled in hardware, however, limited computational resources require an optimized algorithm in order to leave time for higher-level processes like motion control, team behaviors, and localization. The system was modified slightly to include density-based region merging to overcome excessively noisy images that simple connectivity could not handle. The system proved to be robust at the RoboCup-99 competition, enabling our team to finish third in the international competition.

The third application of the system is as part of an entry for the RoboCup small-size league (F180). This domain involves a static camera tracking remotely controlled robots playing soccer on a small field. The system is currently being incorporated into an entry under development for the competition in 2000. Due to recent developments in mechanical platforms, the level of competition in this league has increased significantly.
To compete with current state-of-the-art systems, vision must be able to track 11 objects moving at up to 2 meters/second at full frame rates. Our system is able to process 320x240 color frames at 30 Hz on a 200 MHz Pentium II. In this mode the vision system uses approximately 45% of the available CPU resources; 160x120 images can be processed at 30 Hz using only 12% of the CPU. These results indicate that the system will operate with lower processor utilization than previous implementations, providing more resources for higher-level operations. Both should contribute to the robustness of the overall tracking system.
1. Rotate the color space.
2. Classify each pixel as one of up to 32 colors.
3. Run length encode each scanline according to color.
4. Group runs of the same color into blobs.
5. Sort blobs by color and size.
6. Return blob statistics.

The speed of our approach is due to a focus on efficient algorithms at each step. Step 1 is accomplished with a linear transformation. In Step 2 we discard a naive approach that would require up to 192 comparisons per pixel in favor of a faster calculation using two bitwise AND operations. Step 3 is linear in the number of pixels. Step 4 is accomplished using an efficient union-find algorithm. The sorting in Step 5 is accomplished with radix sort, while Step 6 is completed in a single pass over the resulting data structure.

The approach is intended primarily to accelerate low-level vision for use in real-time applications where hardware acceleration is either too expensive or unavailable. Its functionality is appropriate for providing input to higher-level routines which encode geometric and/or domain-specific processing. This tool enables formerly offline processes to run as part of a real-time intelligent vision system. The current system and its variants have been demonstrated successfully on three hardware platforms.
References
Brodley, C., and Utgoff, P. 1995. Multivariate decision trees. Machine Learning.
Brown, T., and Koplowitz, J. 1979. The weighted nearest neighbor rule for class dependent sample sizes. IEEE Transactions on Information Theory 617-619.
Jain, R.; Kasturi, R.; and Schunck, B. 1995. Machine Vision. McGraw-Hill.
Newton Laboratories. 1999. Cognachrome image capture device. http://www.newtonlabs.com.
Silk, E. 1999. Human detection and recognition for an hors d'oeuvres serving robot. http://www.cs.swarthmore.edu/ silk/robot/.
Tarjan, R. 1983. Data Structures and Network Algorithms. SIAM.
Conclusion
We have presented a new system for real-time segmentation of color images. It can classify each pixel in a 320x240 color image, find and merge blobs of up to 32 colors, and report their centroids, bounding boxes, and areas at 30 Hz. The primary contribution of this system is that it is a software-only approach implemented on general-purpose, inexpensive hardware (in our case a 200 MHz Pentium II processor with a $200 image digitizer). This provides a significant advantage over more expensive hardware-only solutions, or other, slower software approaches. The system operates on the image in several steps: