
2010 Workshops on Database and Expert Systems Applications

Hardware Implementation of a Neural-Network Recognition Module for Visual Servoing in a Mobile Robot

Maria Isabel de la Fuente¹, Javier Echanobe², Inés del Campo², Loreto Susperregui¹, Iñaki Maurtua¹
¹ Department of Autonomous and Intelligent Systems, Tekniker-IK4, Eibar, Spain (e-mail: [email protected])
² Department of Electricity and Electronics, Faculty of Sciences and Technology, University of the Basque Country, Leioa, Spain

Abstract—This paper describes the initial steps in the development of an object detection system for manipulation purposes, to be embedded in a mobile robot. The goal is to design a robotic system to aid workers in a manufacturing plant. The proposed implementation involves the integration of a Field Programmable Gate Array (FPGA) based electronic module with the manipulator arm of the robotic platform. The whole system is provided with a camera which captures images of the objects that can be found in the environment. The FPGA performs the object recognition tasks by means of a neural network. Additional image processing algorithms are used to convert the images obtained by the camera into useful information for the neural network.
Keywords—Neural networks; manipulation; mobile robotics; visual servoing; image processing; FPGA; SoPC; embedded systems

I. INTRODUCTION

Image recognition systems are widely used in different industries, for example in production plants to detect faulty components or to select a piece on a conveyor, and as surveillance systems capable of detecting intrusion, differentiating people or observing their motion. What all these systems have in common is the use of high-performance cameras and powerful computers with few constraints on power consumption, real-time behaviour, size or cost. In autonomous mobile systems such as the mobile manipulator presented in this paper, the framework is very different. Object positions and environmental conditions have to be acquired in real time. The term Visual Servoing refers to a useful capability for both manipulator arms and mobile robots [1]. Visual Servoing involves moving a robot or some part of a robot to a desired position using visual feedback [2]. It is a basic building block for purposeful robot behaviours such as manipulation, foraging, target pursuit and landmark-based navigation. It comprises object detection and recognition processes, which must be fast and stable in order to handle problems such as cluttered backgrounds, rotation, partial occlusion, scale variation and lighting changes. However, fast and computation-intensive tasks are difficult to implement in the small, low-power electronic systems required in robot-like platforms. The goal of this research is to develop an efficient hardware/software implementation of an object recognition system for an autonomous robot. This recognition system is based on an artificial neural network.

In addition, some image processing modules that provide the network with useful data have been designed. Other works that use neural-network-based systems for Visual Servoing can be found in [3], [4], [5]; however, most of those implementations are PC-based architectures. The implementation presented here is carried out in an FPGA (Field Programmable Gate Array). The very high integration of present FPGAs enables the accommodation of all the components of a typical embedded system (processor core, memory blocks, peripherals, specific hardware, ...) on a single chip, commonly referred to as a system-on-a-programmable-chip (SoPC). The design described here is based on such a SoPC. In particular, the neural network module is implemented as specific hardware, while the pre-processing modules (e.g., colour filtering, edge detection, etc.), together with the control of the whole system, are implemented as software on the embedded processor core.

The rest of the paper is organized as follows. Section II formulates the problem to be addressed and describes the robotic system together with the environment in which the robot operates. Section III describes the different modules of the recognition system, in particular the neural network module: topology, training procedure, recognition rates, etc. A hardware/software implementation on a SoPC is presented in Section IV. Finally, Section V presents conclusions and future work.

II. SYSTEM OVERVIEW

This article describes the initial steps toward the realization of a robotic system for assisting in manipulation activities. The goal of this research is to implement a system with visually-guided manipulation behaviour in order to support human activities. The robotic system consists of a mobile platform on top of which there is a manipulator arm equipped with a colour camera. Such a configuration is known as eye-in-hand manipulation (Figure 1). The camera provides visual information about the environment. This information has to be processed by an intelligent system able to make decisions based on it. The work presented here focuses on this decision system.

The target objects must be clearly distinguishable with respect to the background and other objects. Figure 3 shows the block diagram of the whole process. First, the colour camera is used for object image acquisition. Then the images are processed by the decision system in order to determine whether the desired object is in the image and to obtain its position. Finally, the grasping movements needed to handle the object are planned and the command is sent to the manipulator. The work presented in this paper concentrates on the object recognition module. The goal is to implement it in an FPGA to achieve a high-performance system.

Figure 1. Left: Robotic arm on the mobile platform. Right: End-effector of the manipulator with the eye-in-hand camera

The target scenario is the manufacturing plant located at Tekniker-IK4¹, a real manufacturing shop floor where machines and humans share the space while performing production activities (Figure 2). With regard to object detection, the shop floor can be characterized as an industrial environment, with high ceilings, metal halide lamps, high windows, etc. The lighting conditions change considerably from one day to another, and even between different locations along the path covered by the robot used in this work. In some cases the camera of the robot has to deal with exposure to direct light.

Figure 3. System overview. In the block diagram, the different stages of the system can be seen: first, the images are acquired by the eye-in-hand camera. Then, the images are processed to extract the characteristics of the objects. Later, the neural network identifies the object. Finally, a grasp planning module decides the movements that have to be made to grasp the object and sends the command to the manipulator

III. RECOGNITION SYSTEM

Figure 2. Manufacturing plant located at Tekniker-IK4, target scenario for the system

One of the manipulation activities in which a mobile robot can aid a human in this environment is locating an object, such as a hand tool, on the shop floor and bringing it to the worker. To achieve this goal, the specific object must first be identified by the robot's vision system. The robot uses the camera to identify the object in front of it. The target objects that the prototype is able to recognize have the following characteristics:
- rigid bodies, fixed in space;
- orientable surfaces, from a topological point of view.
¹ http://www.tekniker.es

The aim of the recognition system is to locate the target object and obtain its position relative to the robotic arm. The colour and shape of the desired object are known in advance, and the problem is to detect whether such an object is present in the environment. This is addressed in two stages: first, a pre-processing algorithm handles the image provided by the camera, filters the desired colour and outlines the objects of this colour; second, a neural network checks whether any of the obtained shapes corresponds to the target object. The initial approach to the recognition problem has been limited to the recognition of simple shapes, but in such a way that it can be extrapolated to any shape, for example those of hand tools. For the experiments performed so far, wood pieces of different colours (cubes, cylinders, rectangular prisms and triangular prisms) have been used. In this first approach, the objects are placed on a table and the robotic arm is positioned so that the camera obtains a top view of them. In addition, the light from above gives the upper surfaces of the pieces a deeper colour, which is the one considered for the filtering. Hence, four possible shapes have to be recognized: square, circle, rectangle and triangle, which correspond to the upper surfaces of these pieces, respectively (Figure 5).


Usually, image processing algorithms are implemented in software and run on a PC. However, in applications with tight response-time or power-consumption constraints (like the system described here), hardware-specific implementations are needed [6]. The main obstacle to realizing image recognition techniques in hardware is the high complexity of the existing algorithms. For this reason, this paper presents a method optimized for hardware implementation. The following subsections describe the two recognition stages of the system in detail.

A. Image pre-processing

The pre-processing stage converts the images obtained by the camera into useful information for the classification system. This process can be split into two steps: colour filtering and edge extraction.

1) Colour filtering: The colour camera provides the recognition system with images of the wood pieces as RGB values. Each image is then passed through a colour filter, so that the pixels of the image whose RGB values match the desired colour become white, while the rest become black. The original images are thus converted into binary images in which the objects of the target colour are white and the background is black.

2) Edge extraction algorithm: Once a binary image is obtained, the amount of information contained in it is reduced to preserve only what is most relevant for recognition. Different possibilities were considered [7]; finally, an edge extraction technique based on the chain-code algorithm was chosen. The bases of the chain-code algorithm were introduced in 1961 by H. Freeman [8], who described a method that permits the encoding of arbitrary geometric configurations so that a digital computer can manipulate them more easily. It is a lossless compression algorithm for binary images that provides a useful way to depict an object and to derive its features for later use in pattern recognition [9]. Chain codes represent the contour of an object by means of a sequence of small vectors of unit length, each one giving the direction of the contour at that point. The number of possible directions is fixed in advance, the 8-connected and 4-connected neighbourhood configurations being the most commonly used. The 4-connected set of directions, also referred to as the external chain code or crack code in some sources, is the one employed in this work (Figure 4). First of all, the region of the image where the object lies is determined, in terms of the density of white pixels. Then the origin of the object in the image is fixed; in the algorithm presented in this paper, the origin is the left-most white pixel of the first line of the object region. Once the origin is fixed, the object is outlined in a clockwise manner and the directions of the boundary are stored until the algorithm returns to the initial point. A minimal sketch of both pre-processing steps is given below.
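To make the two pre-processing steps concrete, the following C sketch shows a minimal version of each. The interleaved 8-bit RGB layout, the tolerance-based colour match, the direction encoding and the simplified square-tracing stopping criterion are illustrative assumptions, not the authors' exact implementation.

```c
#include <stdlib.h>

/* 1) Colour filtering: pixels close to the target colour (tr,tg,tb)
 * become 1 (white), everything else becomes 0 (black). */
void colour_filter(const unsigned char *rgb, unsigned char *bin,
                   int w, int h,
                   unsigned char tr, unsigned char tg, unsigned char tb,
                   int tol)
{
    for (int i = 0; i < w * h; i++) {
        int dr = rgb[3*i]     - tr;
        int dg = rgb[3*i + 1] - tg;
        int db = rgb[3*i + 2] - tb;
        bin[i] = (abs(dr) < tol && abs(dg) < tol && abs(db) < tol) ? 1 : 0;
    }
}

/* 2) Edge extraction: square-tracing boundary follower producing a
 * 4-direction chain code (0=right, 1=down, 2=left, 3=up).  This is a
 * simplified sketch: it stops on first return to the origin, which is
 * adequate for the convex blobs used in the paper's experiments.
 * Returns the code length, or -1 if the buffer is too small. */
int chain_code(const unsigned char *bin, int w, int h,
               int *code, int max_len)
{
    /* Origin: left-most white pixel of the first row of the object. */
    int sx = -1, sy = -1;
    for (int y = 0; y < h && sx < 0; y++)
        for (int x = 0; x < w; x++)
            if (bin[y*w + x]) { sx = x; sy = y; break; }
    if (sx < 0) return 0;                       /* empty image */

    static const int dx[4] = { 1, 0, -1, 0 };
    static const int dy[4] = { 0, 1, 0, -1 };

    int x = sx, y = sy, dir = 3, n = 0;         /* start moving up */
    do {
        int inside = (x >= 0 && x < w && y >= 0 && y < h && bin[y*w + x]);
        if (inside) {
            if (n >= max_len) return -1;
            code[n++] = dir;                    /* record boundary move */
            dir = (dir + 3) % 4;                /* white pixel: turn left */
        } else {
            dir = (dir + 1) % 4;                /* black pixel: turn right */
        }
        x += dx[dir];
        y += dy[dir];
    } while (x != sx || y != sy);
    return n;
}
```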

Figure 4. Example of the external chain-code of a binary image

Using this algorithm, each object is represented by a sequence of numbers whose length differs from case to case, depending on the size of the object in the image and on its shape. In order to equalize the lengths and further reduce the encoding, so that it can be used as the input to the neural network, the sequence is normalized by dividing it into a fixed number of smaller sub-sequences. Each of them is processed to obtain the slope between its end points. Thus, each object is represented by a fixed-length sequence containing the slopes of the contour. In turn, these slopes can only take a limited number of values, which makes the translation to a digital system more direct. A sketch of this normalization is given below.
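The following sketch illustrates this normalization under stated assumptions: the chain code is split into 16 equal arcs (matching the network's input size, Section III-B), each arc is reduced to the angle of the chord between its end points, and that angle is quantized to 32 levels (5 bits, matching the FSL1 payload width in Section IV-A). The quantization scheme itself is an illustrative choice.

```c
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define NSEG    16   /* fixed-length output: one slope per arc   */
#define NLEVELS 32   /* 5-bit quantization, an assumed encoding  */

/* Normalize a chain code of arbitrary length n into NSEG quantized
 * slopes, one per equal-length arc of the contour. */
void normalize_contour(const int *code, int n, int *slopes)
{
    static const int dx[4] = { 1, 0, -1, 0 };  /* 0=right,1=down,2=left,3=up */
    static const int dy[4] = { 0, 1, 0, -1 };

    for (int s = 0; s < NSEG; s++) {
        int i0 = s * n / NSEG, i1 = (s + 1) * n / NSEG;
        /* Displacement between the arc's end points. */
        int x = 0, y = 0;
        for (int i = i0; i < i1; i++) { x += dx[code[i]]; y += dy[code[i]]; }
        /* Chord angle in [-pi, pi), quantized to NLEVELS values. */
        double ang = atan2((double)y, (double)x);
        int q = (int)floor((ang + M_PI) / (2.0 * M_PI) * NLEVELS);
        slopes[s] = (q >= NLEVELS) ? NLEVELS - 1 : q;
    }
}
```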

Figure 5. Image pre-processing stages. The colour image is filtered in order to bring out the object of the desired colour. Then the contour of this object is obtained. This contour is divided into a fixed number of segments and the slope of each segment is calculated. The input data for the neural network are then available

B. Neural classification

The classification module consists of an artificial neural network where the inputs are the values provided by the image pre-processing stage. Based on these data, the neural network classifies the shape of the target object. The neural network has a multi-layer perceptron architecture, consisting of an input layer, a hidden layer and an output layer.


After performing different experiments, a suitable topology has been obtained which provides a trade-off between the number of neurons and an acceptable performance. The number of input neurons has been set to 16, which forces the sequence obtained from the image processing stage to be of this length. The hidden layer has 32 neurons with a tan-sigmoid activation function. Lastly, the output layer consists of 4 output neurons, one for each possible shape, with no activation function. The architecture of the neural network is presented in Figure 6, and a software reference model is sketched below.
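As a floating-point reference for the 16-32-4 topology just described, a forward pass can be written as below: tan-sigmoid hidden layer, linear output layer, and an argmax over the 4 outputs to pick the recognized shape. The weight and bias arrays are assumed to come from the Matlab training stage; the shape-to-index mapping is an illustrative assumption.

```c
#include <math.h>

#define N_IN  16
#define N_HID 32
#define N_OUT 4

/* Forward pass of the multi-layer perceptron of Figure 6. */
int classify(const double in[N_IN],
             const double w1[N_HID][N_IN],  const double b1[N_HID],
             const double w2[N_OUT][N_HID], const double b2[N_OUT])
{
    double hid[N_HID];

    /* Hidden layer: weighted sum plus tan-sigmoid activation. */
    for (int j = 0; j < N_HID; j++) {
        double acc = b1[j];
        for (int i = 0; i < N_IN; i++) acc += w1[j][i] * in[i];
        hid[j] = tanh(acc);
    }

    /* Output layer: weighted sum, no activation; the neuron with the
     * largest output identifies the shape. */
    int best = 0;
    double bestv = -INFINITY;
    for (int k = 0; k < N_OUT; k++) {
        double acc = b2[k];
        for (int j = 0; j < N_HID; j++) acc += w2[k][j] * hid[j];
        if (acc > bestv) { bestv = acc; best = k; }
    }
    return best;   /* assumed order: square, circle, rectangle, triangle */
}
```

The hardware described in Section IV implements exactly this computation, but with fixed-point arithmetic, a look-up table for the sigmoid and all 32 hidden neurons operating in parallel.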

Table I. Results of the classification performed by the neural network

Figure 6. Architecture of the proposed neural network. It is a multi-layer perceptron network that consists of 16 input neurons, 32 neurons in the hidden layer and 4 output neurons

The network is trained by means of the back-propagation algorithm, using the gradient-descent method. The training set is made up of 350 sequences obtained from different images of different object shapes, along with the corresponding target output for each sequence. This set is divided into three subsets: 60% of the samples are used for training, 20% for validation (useful for early stopping of the training process) and the remaining 20% for testing (to estimate the network's ability to generalize). The training of the network is performed in Matlab, by means of the Neural Network Toolbox [10]. Experiments have been made using different images to evaluate the performance of the proposed network. The highest success rate of the system in detecting the object has been 93.3%. Table I shows the results for each of the shapes considered for recognition. The mean squared error achieved with this configuration is 0.019 after 3×10⁵ training iterations.

IV. HARDWARE/SOFTWARE IMPLEMENTATION

Nowadays, so-called SoPCs (systems-on-a-programmable-chip) take advantage of the flexibility of software and the high performance of hardware. Their proliferation has been made possible by the high integration levels achieved in the microelectronics industry, which allow the inclusion of a small microprocessor inside the programmable chip.

This allows the design of efficient heterogeneous hardware/software architectures on a single chip.

Historically, the most common way to implement a neural network has been a program running on a personal computer or a workstation. This is because software implementations offer high flexibility and give users the possibility of modifying the topology of the network, the type of the processing elements or the learning rules according to the requirements of their application. However, the biological neural networks by which artificial neural networks are inspired operate in a highly parallel fashion, so implementing them on a sequential computer is not the most efficient approach. Dedicated hardware implementations, on the other hand, offer a number of important advantages: they exploit the inherent parallelism of neural networks and are much faster and more robust than software solutions [11]. Furthermore, they provide a physically small and low-power solution, useful for applications where including a personal computer or a workstation might not be feasible (such as autonomous robots). These are the main reasons why it was decided to implement the recognition algorithms on an embedded system and, more specifically, the neural network on the hardware partition of the system. The hardware/software architecture described here consists of three parts: the software partition, the hardware partition and the interface that performs the communication between them.

Figure 7. Internal architecture of the SoPC. The software and hardware partitions and the interface between them are shown

The software partition is built on a MicroBlaze (the soft-core processor from Xilinx) [12].


It includes the control of the complete system and, in this first approach, also the image pre-processing algorithms described above. In future versions, the aim is to migrate some of these algorithms to hardware, especially those suitable for parallel processing [13]. The hardware partition, on the other hand, contains the neural network designed for the object recognition task; it is explained in detail in subsection IV-B.

A. Interface between software and hardware partitions

The interface between the two partitions is based on the FSL (Fast Simplex Link) bus [14], which provides a fast and efficient communication mechanism. The FSL is a unidirectional point-to-point communication channel available on the MicroBlaze, which can be used to perform very fast data transfers between the register file of the processor and the hardware running on the FPGA. It implements a FIFO-based communication whose depth is configurable and can be as high as 8K (for a width of 32 bits). In the same way, the width of the FSL can also be configured. In the proposed system two FSL buses are used: the first one (FSL1) sends data from the software partition to the hardware partition, while the other one (FSL2) performs the data transfer in the opposite direction. The depth of FSL1 is set to 16 elements (16 input neurons), while FSL2 contains just one element (the recognized shape). Both buses have a width of 8 bits, although only 5 bits (FSL1) and 2 bits (FSL2) are necessary to encode the possible values of the inputs and the possible recognized shapes, respectively. Some of the remaining bits are used to transmit control data, and the others are left free for further extensions of the encoding or of the number of shapes considered. Figure 8 shows the structure of both FSL buses and, more specifically, how they should be loaded to transmit the data between the partitions; a sketch of the software side of this transfer is given below.

Figure 8. Fast Simplex Link (FSL) bus for the communication between both partitions. FSL1 transmits the input data for the neural network to the hardware partition. FSL2 sends the result of the recognition back to the software partition
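On the MicroBlaze side, the transfer could look like the following sketch. It assumes the blocking `putfsl`/`getfsl` access macros that Xilinx provides for MicroBlaze FSL channels (declared in `fsl.h` in the EDK toolchain); the channel IDs and the exact payload framing are illustrative assumptions consistent with the description above.

```c
#include "fsl.h"   /* Xilinx MicroBlaze FSL access macros (assumed) */

#define FSL1_ID 0  /* software -> hardware: 16 slope values   */
#define FSL2_ID 1  /* hardware -> software: recognized shape  */

/* Send the 16 contour slopes to the neural-network hardware and read
 * back the recognized shape.  Only the low 5 bits of each input and
 * the low 2 bits of the result carry data (Section IV-A); the use of
 * the remaining bits is assumed here. */
int recognize_shape(const int slopes[16])
{
    unsigned int word, result;

    for (int i = 0; i < 16; i++) {
        word = (unsigned int)(slopes[i] & 0x1F);  /* 5-bit payload  */
        putfsl(word, FSL1_ID);                    /* blocking write */
    }
    getfsl(result, FSL2_ID);                      /* blocking read  */
    return (int)(result & 0x3);                   /* 2-bit shape id */
}
```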

B. Architecture of the hardware partition

The architecture of the network is the one presented in Figure 6. In a hardware-implemented neural network, the processing elements (i.e., neurons) have to be independent and operate in parallel. They should be designed so that their internal calculations are optimized, while remaining simple enough that the chip area they occupy is as small as possible. Following these requirements, a very small but high-performance system can be achieved. The hardware architecture proposed in this paper comprises the following modules:
- a two-layer processing module: the hidden layer and the output layer (the input layer merely transmits the inputs);
- three ROM modules, which store the network parameters (weights) of the hidden layer, the parameters of the output layer and the sigmoid function, respectively;
- additional components, such as a multiplexer and a block that computes the maximum of its inputs;
- a circuit controller that governs the whole operation of the system.

The main component of the processing module is the neuron, which is essentially a MAC (multiply-accumulate) block. The MAC is loaded with an initial value (offset or bias); it then multiplies each input by its corresponding weight and accumulates the products to obtain the sum of all of them. It is a two-cycle synchronous component (see Figure 9 and the behavioural sketch after it). The total number of these MAC blocks is 36 (32 for the hidden layer and 4 for the output layer).

Figure 9. MAC schematic. This is the core of each neuron of the network. It is loaded with the offset or bias and then performs the multiply-accumulate operations
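As a rough behavioural model (not the VHDL itself), each MAC can be pictured as below. The 12-bit signed weights match the ROM1 word length given in the next paragraph, while the input width and accumulator width are assumptions, since the paper specifies only the memory word lengths.

```c
#include <stdint.h>

/* Behavioural model of one MAC neuron: load the bias, then perform
 * one multiply-accumulate per clock cycle. */
typedef struct { int32_t acc; } mac_t;

static inline void mac_load(mac_t *m, int32_t bias) { m->acc = bias; }

static inline void mac_step(mac_t *m, int16_t in, int16_t w12)
{
    m->acc += (int32_t)in * (int32_t)w12;  /* one hardware cycle */
}
```

In the FPGA, 36 of these blocks operate simultaneously, so the per-neuron loops of the software reference model in Section III-B collapse into the 17- and 33-cycle schedules described below.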



As for the ROM modules, the one that contains the weights of the first layer (ROM1) has a size of 512 weights with a word length of 12 bits, whilst the one corresponding to the second layer (ROM2) contains 128 weights with a word length of 8 bits. ROM1 is organized in 16 blocks of 32 weights, so that for each input the corresponding block of 32 weights is addressed and sent to the first layer of neurons. In the same way, ROM2 is divided into 32 blocks of 4 weights each. Finally, ROM3 is the memory that stores the precomputed sigmoid function; it contains 256 values with a word length of 8 bits (a sketch of how such a table can be precomputed offline is given below). The system controller, whose main component is a six-bit counter, provides the control signals for the whole system: the reset signals for all the modules, the enable signals for each block, the address signals for ROM1 and ROM2 and the selection signal for the multiplexer. The detailed operation of the whole system is described next.
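The contents of ROM3 can be generated offline, for example with the sketch below, which fills a 256-entry, 8-bit tan-sigmoid table. The input range covered by the table and the fixed-point scaling are assumptions, since the paper specifies only the table dimensions.

```c
#include <math.h>
#include <stdint.h>

/* Precompute the 256 x 8-bit activation table stored in ROM3.
 * Assumed mapping: the index spans inputs in [-4, 4), and tanh's
 * output in [-1, 1] is scaled to a signed 8-bit value (Q0.7). */
void build_sigmoid_rom(int8_t rom[256])
{
    for (int i = 0; i < 256; i++) {
        double x = -4.0 + 8.0 * i / 256.0;     /* index -> input value */
        rom[i] = (int8_t)lrint(tanh(x) * 127); /* output -> Q0.7       */
    }
}
```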


Figure 10. Schematic of the hardware partition. Here, the different components of the system can be observed: two layers of MAC blocks, two ROM memories with the weights of each layer, a ROM memory with the precomputed sigmoid function, a multiplexer, a maximum circuit and the system controller

The 16 input data arrive serially through the FSL1 bus, by means of the FIFO mechanism explained above. Each neuron (MAC block) receives the inputs serially, multiplies each of them by the corresponding weight (stored in ROM1) and adds up the products. The neuron needs only one clock cycle per input to process the MAC operation because, while the accumulate operation is being performed, the next data are already being multiplied, creating a pipeline. Furthermore, the 32 neurons of the layer work in parallel. Hence, only 17 clock cycles (one per input, plus one more for the first data before the pipeline starts) are needed to perform the calculations of the first layer, despite the fact that the inputs enter the system serially. Once the outputs of the first layer are available, they are used to address ROM3, which contains the activation function. This memory is shared by all the neurons of the first layer, and its outputs are the sigmoid values of its inputs. All the accesses to ROM3 are made in parallel, so only one clock cycle is required. This ROM module provides 32 outputs that act as the inputs to a 32-to-1 multiplexer. The multiplexer makes the inputs to the following layer of neurons arrive serially, so that this layer works like the first one. Thus, each of the 4 neurons of the last layer receives the 32 incoming data serially and performs the MAC operation, needing 33 clock cycles to finish this task (one per input plus an additional one, as in the previous layer).

Finally, the results of these MAC operations are fed to a module that calculates which of them has the maximum value, needing only one cycle to do so. The output of this module represents the shape recognized by the network, encoded in 2 bits. This datum is sent back to the software partition through the FSL2 bus.

C. System performance and resource usage

The implementation of the system presented in this paper has been carried out on a Xilinx Spartan-3A DSP FPGA with 3400K logic gates [15]. The hardware partition has been developed in VHDL with the help of the Xilinx ISE Design Suite. To decide the word lengths of all the signals, in order to obtain an acceptable trade-off between complexity and precision, a prior analysis was made in Matlab by means of the Fixed-Point Toolbox [16]. In terms of time, the recognition process requires a total of 52 clock cycles (17 for the first layer, 1 for the sigmoid ROM access, 33 for the second layer and 1 for the maximum circuit). This means that, with a clock frequency of 100 MHz, it takes only 0.52 µs to check the shape of the target object. This is a very short time, particularly in comparison with a possible software implementation. The main reason is that the hardware exploits parallelism and performs several operations simultaneously, while on a sequential processor only one operation can be performed at a time.


For example, in a processor executing one instruction per cycle, every single instruction (products, sums, memory accesses, comparisons) consumes one clock cycle. In the architecture presented, around 2000 such operations have to be performed, which gives a ratio between the hardware and software cycle counts of approximately 1/40 (52 cycles versus roughly 2000). Considering an embedded processor running at about 100 MHz (such as the MicroBlaze in some configurations), this means that the hardware implementation runs about 40 times faster than the software one would.

With respect to chip occupation, the complete system takes up only 2% of the available slices of the device, 1% of the LUT (look-up table) blocks and 28% of the DSP (digital signal processing) blocks, which represents a very small part of the resources of the FPGA. Thus, there is enough free space for the inclusion of a MicroBlaze with the necessary image processing algorithms in the software partition of the system. In addition, the hardware has been designed to be scalable in terms of the number of inputs to the network, the number of neurons, the word length of each element and even the number of shapes that can be recognized. Hence, thanks to the modular and flexible design of the proposed architecture and to the resources available in the device, the system is fully scalable, and its capabilities and precision can be enhanced.
V. CONCLUSIONS AND FURTHER WORK


In this paper, a prototype of a vision system for a robotic platform assisting in manipulation activities has been presented. Experiments have been carried out to test the object recognition system, yielding good results, as stated in Section III-B. More work will be done to strengthen the overall performance of this system, taking into account more variability in object shape and colour (real objects). An FPGA module for embedding the object recognition module within a robotic mobile platform is being developed. Up to now, the FPGA module includes the implementation of the neural network. As further work, the rest of the image processing algorithms should also be implemented on the chip. They would preferably be included in the hardware partition of the SoPC for performance reasons, but a prior analysis has to be made in order to study the feasibility of this option. In addition, the whole system has to be integrated with the robotic platform presented in Section II in order to perform the manipulation activities.
ACKNOWLEDGMENT

The authors would like to thank the Spanish Ministry of Science and Innovation for partial support of this work under Grant TEC2009-07415, the Spanish Ministry of Industry, Tourism and Trade for the funds provided through the project ROBAUCO (FIT-170200-2007-1), the Basque Government for grant IT419-10, aimed at supporting the activities of the research groups of the University of the Basque Country, and the Fundación Iñaki Goenaga for support through its Aid Grant for PhD programmes.

REFERENCES

[1] Siciliano B., Khatib O. (Eds.), Handbook of Robotics, Springer, 2008.
[2] Espiau B., Chaumette F., Rives P., "A New Approach to Visual Servoing in Robotics", Lecture Notes in Computer Science, Volume 708, Pages 106-136, 1991.
[3] Wells G., Venaille C., Torras C., "Vision-based robot positioning using neural networks", Image and Vision Computing, Elsevier B.V., Volume 14, Issue 10, Pages 715-732, December 1996.
[4] Wu Q.M.J., Stanley K., "Modular neural-visual servoing using a neural-fuzzy decision network", Proceedings of the IEEE International Conference on Robotics and Automation, Volume 4, Pages 3238-3243, 1997.
[5] Siebel N., Kassahun Y., "Learning neural networks for visual servoing using evolutionary methods", Proceedings of the 6th International Conference on Hybrid Intelligent Systems (HIS'06), Auckland, New Zealand, Pages 6-9, December 2006.
[6] Mahlknecht S., Oberhammer R., Novak G., "A Real-Time Image Recognition System for Tiny Autonomous Mobile Robots", Real-Time Systems, Volume 29, Issue 2-3, Pages 247-261, Kluwer Academic Publishers, March 2005.
[7] Russ J. C., The Image Processing Handbook, CRC Press, 2006.
[8] Freeman H., "On the encoding of arbitrary geometric configurations", IRE Transactions on Electronic Computers, EC-10, Pages 260-268, 1961.
[9] Sánchez-Cruz H., Bribiesca E., Rodríguez-Dagnino R. M., "Efficiency of chain codes to represent binary objects", Pattern Recognition, Elsevier B.V., Volume 40, Issue 6, Pages 1660-1674, June 2007.
[10] Demuth H., Beale M., Hagan M., Neural Network Toolbox 6 User's Guide, The MathWorks, Inc., 2010.
[11] Omondi A. R., Rajapakse J. C. (Eds.), FPGA Implementations of Neural Networks, Springer, 2006.
[12] MicroBlaze RISC 32-Bit Soft Processor, Product Brief, Xilinx, August 21, 2002.
[13] Kisacanin B., Bhattacharyya S. S., Chai S. (Eds.), Embedded Computer Vision (Advances in Pattern Recognition), Springer, 2008.
[14] Fast Simplex Link (FSL) Bus (v2.11b), Product Specification DS449, Xilinx, June 24, 2009.
[15] Spartan-3A DSP FPGA Video Starter Kit User Guide, UG456 (v2.1), Xilinx, March 15, 2010.
[16] Fixed-Point Toolbox 3 User's Guide, The MathWorks, Inc., 2010.

